Service Level Objectives
Note
SLOs and SLO alerts are in Beta mode. Features may change, and some functionality may be limited.
Overview
Service Level Objectives (SLOs) help you define clear, measurable performance targets for any object that emits metrics—whether it's a service, application, infrastructure component, or custom-defined entity. These targets can be based on key indicators such as availability, latency, error rates, or other relevant metrics.
Once your SLOs are in place, you can monitor them in real time using dashboards and alerts that surface potential issues early. Over time, you can track how well each object is meeting its targets, with full visibility into error budgets and burn rates to support data-driven reliability decisions.
Use SLOs for:
- Proactive issue detection. SLOs act as benchmarks for measuring whether a service or any other metric-generating object is performing well or degrading. By monitoring the metrics that contribute to your SLOs, you can identify early signs of degradation or failure and take corrective action before issues escalate.
- Error budget tracking. Track error budgets to see whether each object stays within the failure limits its SLO allows, and decide how much downtime, degraded performance, or error rate is acceptable before corrective action is necessary.
- Automated alerts and remediation. Trigger alerts when SLOs are at risk or breached to enable teams to proactively resolve issues before they affect end-users, reducing downtime and ensuring services consistently meet customer expectations.
Core concepts
A service level indicator (SLI) is the measured value of an object's performance. It tracks metrics such as request success rate, response time, or uptime, and shows in real time whether the object complies with its SLO.
A service level objective (SLO) defines the target threshold for an SLI. It communicates reliability goals by setting a clear expectation—for example, “99.9% success rate over 30 days.”
SLO groups are subsets of an SLO, each defined by a unique combination of label values (via group-by logic). Each group independently tracks its own compliance and budget consumption.
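To illustrate how grouping by labels yields independently tracked SLIs, here is a minimal Python sketch. The event shape (a `labels` dictionary plus a `good` flag) and the function name `sli_by_group` are hypothetical and chosen only for illustration; they are not part of the Coralogix API.

```python
from collections import defaultdict


def sli_by_group(events, group_by):
    """Return {label-value tuple: SLI percent} for each unique label combination.

    `events` is a hypothetical list of dicts such as
    {"labels": {"service": "checkout", "region": "us"}, "good": True}.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [good events, total events]
    for event in events:
        # One group per unique combination of the group-by label values.
        key = tuple(event["labels"].get(label) for label in group_by)
        counts[key][1] += 1
        if event["good"]:
            counts[key][0] += 1
    return {key: good / total * 100 for key, (good, total) in counts.items()}


events = [
    {"labels": {"service": "checkout", "region": "us"}, "good": True},
    {"labels": {"service": "checkout", "region": "us"}, "good": False},
    {"labels": {"service": "checkout", "region": "eu"}, "good": True},
]
per_group = sli_by_group(events, group_by=["service", "region"])
```

Each key in `per_group` stands for one SLO group, so a degradation in one region does not hide behind healthy traffic in another.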
An error budget is the acceptable margin of failure for an SLO. For example, an SLO with a 99% availability target over a 7-day time frame allows 1% of errors over a rolling 7-day window. Once the budget is depleted, the SLO is breached.
The remaining budget is the percentage of the error budget that has not yet been consumed, that is, the share of allowable bad events still available.
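The arithmetic behind the error budget and the remaining budget can be sketched as follows. This is an illustrative calculation under stated assumptions, not the product's implementation: the function names are hypothetical, the budget is counted in events, and an overspent budget is clamped to 0%.

```python
def error_budget_fraction(target_percent):
    """Allowed fraction of bad events, e.g. a 99% target allows 0.01."""
    return (100.0 - target_percent) / 100.0


def remaining_budget_percent(target_percent, total_events, bad_events):
    """Percentage of the error budget that is still unspent."""
    allowed_bad = error_budget_fraction(target_percent) * total_events
    if allowed_bad == 0:
        # No budget at all: any bad event exhausts it (assumption).
        return 100.0 if bad_events == 0 else 0.0
    remaining = (1.0 - bad_events / allowed_bad) * 100.0
    return max(0.0, remaining)  # assumption: clamp an overspent budget to 0


# A 99% target over 100,000 events allows about 1,000 bad events;
# 250 bad events leave roughly three quarters of the budget.
remaining = remaining_budget_percent(99.0, 100_000, 250)
```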
An SLO time frame is the entire duration over which the SLO is defined and evaluated.
A rolling time window is a continuously updated time frame (e.g., past 1 hour) used for evaluating metrics like burn rate. It ensures responsiveness to recent changes.
SLO types
Event-based SLOs track the success rate of individual requests. An event-based SLO is based on an SLI defined as the ratio of successful (good) events to the total number of events. The SLO is met when this ratio meets or exceeds the target objective over a specified compliance period.
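As a rough sketch of the event-based calculation, the SLI is the good-to-total ratio and the SLO is met when that ratio reaches the target. The function names are hypothetical, and treating a period with no events as fully compliant is an assumption made here for illustration.

```python
def event_sli_percent(good_events, total_events):
    """SLI as the percentage of good events out of all events."""
    if total_events == 0:
        return 100.0  # assumption: no traffic counts as compliant
    return good_events / total_events * 100.0


def slo_met(sli_percent, target_percent):
    """The SLO is met when the SLI meets or exceeds the target."""
    return sli_percent >= target_percent


sli = event_sli_percent(good_events=9_995, total_events=10_000)
met = slo_met(sli, target_percent=99.9)
```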
Time window SLOs evaluate performance over fixed intervals. This SLO type is best suited for use cases that prioritize reliability across consistent, minute-by-minute time windows. It ensures that quality remains steady throughout the entire SLO period by evaluating performance in each discrete time window. Any time window that fails to meet the defined criteria is treated as a failure, thereby contributing to the depletion of the SLO error budget.
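The time window evaluation can be sketched in a similar way: each window is judged good or bad against a criterion, and the SLI is the share of good windows. The per-window success ratios, the 0.99 threshold, and the function name are hypothetical values chosen only to show the mechanics.

```python
def window_sli_percent(window_values, threshold):
    """Percentage of time windows whose value meets the threshold."""
    if not window_values:
        return 100.0  # assumption: no windows evaluated counts as compliant
    good_windows = sum(1 for value in window_values if value >= threshold)
    return good_windows / len(window_values) * 100.0


# Four one-minute windows, each summarized by its success ratio.
# The second window falls below 0.99, so it counts as a failed window.
per_minute_success = [0.999, 0.97, 1.0, 0.995]
sli = window_sli_percent(per_minute_success, threshold=0.99)
```

Note that one bad minute costs the same amount of budget whether it saw ten failed requests or ten thousand, which is the main difference from the event-based type.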
SLO alerts
Define notifications that help monitor and maintain a service's reliability by tracking whether the service is meeting its predefined SLOs. SLO alerts notify teams when service performance deviates from the agreed-upon targets for availability, response times, error rates, or other key metrics.
Error budget
Error budget alerts notify you when too much of the error budget has been consumed. You can configure alerts at specific thresholds, such as a warning at 50% consumed and a critical alert at 75%, to catch issues early and avoid breaching your SLO.
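The threshold logic described above can be sketched as a simple classification. The function name and the returned labels are hypothetical, and triggering when consumption is equal to the threshold (rather than strictly above it) is an assumption.

```python
def budget_alert_level(consumed_percent, warning=50.0, critical=75.0):
    """Map consumed error budget to an alert level, or None when below all thresholds."""
    if consumed_percent >= critical:  # assumption: inclusive comparison
        return "critical"
    if consumed_percent >= warning:
        return "warning"
    return None


levels = [budget_alert_level(p) for p in (20.0, 60.0, 90.0)]
```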
Burn rate
Burn rate alerts help detect when the error budget is being consumed too quickly. When a burn rate threshold is exceeded, an alert is triggered. These alerts are based on how fast errors are accumulating relative to the SLO's time window.
You can define a burn rate threshold that triggers an alert when the observed burn rate exceeds a specified multiple of the normal rate.
For example:
With a 7-day SLO and a 5% error budget, if the actual error rate during a given period reaches 10%, the burn rate is 10% / 5% = 2, meaning the budget is being consumed at twice the rate it allows. You can configure the threshold to suit your needs, for instance alerting if the burn rate exceeds 1.5x, 2x, or any other multiplier you choose.
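The burn rate in this example is the observed error rate divided by the error budget. A minimal sketch, with hypothetical function names and a strict "exceeds" comparison taken from the wording above:

```python
def burn_rate(observed_error_percent, error_budget_percent):
    """How many times faster than allowed the budget is being consumed."""
    if error_budget_percent <= 0:
        raise ValueError("error budget must be greater than 0")
    return observed_error_percent / error_budget_percent


def burn_rate_alert(observed_error_percent, error_budget_percent, threshold):
    """True when the burn rate exceeds the configured multiplier."""
    return burn_rate(observed_error_percent, error_budget_percent) > threshold


rate = burn_rate(10.0, 5.0)                         # 2.0, as in the example
fires_at_1_5x = burn_rate_alert(10.0, 5.0, 1.5)     # True
fires_at_2x = burn_rate_alert(10.0, 5.0, 2.0)       # False: 2.0 does not exceed 2.0
```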
Create SLOs
Create event-based and time window SLOs to track and maintain your service's reliability based on user-centric performance goals.
Manage SLOs
View and manage your SLOs in Coralogix's SLO Center to track SLO compliance and ensure object performance stays within the allocated error budget.
Monitor service performance with SLO alerts
Monitor and maintain a stable error budget by configuring SLO alerts that notify you when performance or error rates exceed defined thresholds.
Resources
This guide provides a comprehensive walkthrough of Coralogix SLOs.
Here's what you'll find:
- Create SLOs
- Safe Use of Recording Rule-Based Metrics
- View and Manage SLOs
- Configure SLO Alerts
- Permissions