Service Health
Service Health in APM provides an immediate, visual traffic-light assessment of the operational state of your monitored services within Coralogix.
This feature enables at-a-glance understanding of service reliability and helps reduce the time it takes to identify and respond to issues.
Key benefits
- Quick overview: Instantly see which services are healthy or experiencing issues, directly from the Service Catalog.
- Simplified visibility: Complex conditions and alerts are rolled up into a single indicator.
- Direct context: Drill down into metrics, traces, logs, or triggering alerts for details.
How it works
The Service Health Indicator assigns each service one of four statuses, based on its active incidents and their priorities:
- 🟩 Healthy (Green): No active incidents.
- 🟨 Warning (Yellow): At least one active incident with a low priority (P3, P4, P5).
- 🟥 Critical (Red): At least one active incident with a high priority (P1, P2).
- ⬛ Unknown (Gray): No incidents because no alerts are configured.
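The mapping above can be sketched as a small function. This is an illustrative model only, not Coralogix's implementation: the names `Health`, `service_health`, `active_priorities`, and `alerts_configured` are hypothetical, and the rule that a high-priority incident takes precedence when both high- and low-priority incidents are active is an assumption drawn from the "at least one" wording.

```python
from enum import Enum
from typing import Iterable


class Health(Enum):
    HEALTHY = "green"
    WARNING = "yellow"
    CRITICAL = "red"
    UNKNOWN = "gray"


# Priority groups as described in the status list above.
HIGH_PRIORITIES = {"P1", "P2"}
LOW_PRIORITIES = {"P3", "P4", "P5"}


def service_health(active_priorities: Iterable[str], alerts_configured: bool) -> Health:
    """Derive a status from the priorities of a service's active incidents."""
    if not alerts_configured:
        # No alerts means no incidents can exist, so health cannot be judged.
        return Health.UNKNOWN
    priorities = set(active_priorities)
    # Assumption: Critical is checked first, so it wins over Warning.
    if priorities & HIGH_PRIORITIES:
        return Health.CRITICAL
    if priorities & LOW_PRIORITIES:
        return Health.WARNING
    return Health.HEALTHY
```

For example, under these assumptions `service_health(["P4", "P1"], alerts_configured=True)` evaluates to `Health.CRITICAL`, while the same call with an empty list evaluates to `Health.HEALTHY`.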
This feature gives you a quick, intuitive way to understand the health of your services without drilling down into multiple dashboards. To start using it, configure alerts for your services.
Enabling Service Health
To enable Service Health, you first need to define an alert for your service. You can create an alert in two ways:
- From the APM overview page, by selecting the relevant metric.
- By adding service catalog labels when defining the alert.
To ensure your Service Health status is accurate and meaningful, you must configure well-defined alerts with priorities that directly reflect the real-world business impact of an issue.
Once an alert is configured:
- The service is shown as Healthy when no incidents are active.
- If an incident is triggered (or acknowledged), the service health status updates based on the incident’s priority.
- After the incident is resolved, the service automatically returns to the Healthy state.
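As a rough illustration of this lifecycle, the sketch below walks a single P2 incident through its states and records the status the service would show at each step. The `Incident` class, the state names, and `status_for` are hypothetical stand-ins chosen for this example; they are not part of any Coralogix API.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    priority: str  # "P1" through "P5"
    state: str     # "triggered", "acknowledged", or "resolved" (assumed names)


# Both triggered and acknowledged incidents count as active.
ACTIVE_STATES = {"triggered", "acknowledged"}


def status_for(incidents, alerts_configured=True):
    """Return the status label a service would show for its incidents."""
    if not alerts_configured:
        return "Unknown"
    active = [i.priority for i in incidents if i.state in ACTIVE_STATES]
    if any(p in ("P1", "P2") for p in active):
        return "Critical"
    if active:
        return "Warning"
    return "Healthy"


# Walk one P2 incident through its lifecycle.
incident = Incident(priority="P2", state="triggered")
timeline = []
for state in ("triggered", "acknowledged", "resolved"):
    incident.state = state
    timeline.append((state, status_for([incident])))
```

In this model the service stays Critical while the incident is triggered or acknowledged, and returns to Healthy once the incident is resolved.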
Best practices
- Base your service health rules on key APM metrics such as error rate, latency, throughput, and Apdex.
- Revisit thresholds regularly, especially after scaling events, new deployments, or architecture changes. Use historical data to refine what “Healthy” looks like for each service.
- When a service becomes unhealthy, use the Service Catalog drilldown to review its health metrics, logs, SLOs, and resources in one place and identify the root cause.
- The distinction between Critical (🟥 Red) and Warning (🟨 Yellow) is the most important part of this feature. Base the alert priority on the impact to the end user or the business, not just on the technical component that is affected.
- Avoid setting alert thresholds that are too sensitive, which can lead to a constantly "flapping" health status (switching between green, yellow, and red).
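One way to spot a flapping status is to count how often it changes over a recent window. The sketch below assumes you have recorded a chronological list of status samples for a service yourself; the function names and the `max_transitions` threshold are hypothetical and should be tuned to your own sampling interval.

```python
def count_transitions(statuses):
    """Count how many times consecutive status samples differ."""
    return sum(1 for prev, cur in zip(statuses, statuses[1:]) if prev != cur)


def is_flapping(statuses, max_transitions=3):
    """Flag a window of samples whose status changes more than the allowed number of times."""
    return count_transitions(statuses) > max_transitions


# Hypothetical samples taken at a fixed interval.
noisy = ["green", "yellow", "green", "red", "green", "green"]
steady = ["green", "green", "green", "green", "green", "yellow"]
```

Here `noisy` changes status four times and would be flagged with the default threshold, while `steady` changes once and would not. A flagged service is a hint that the underlying alert thresholds may be too sensitive.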