
SLO Alerts

Note

SLOs and SLO alerts are in Beta mode. Features may change, and some functionality may be limited.

Service Level Objectives help teams define what reliability means for their services. But knowing whether you're meeting your SLOs isn't enough; you need to be alerted when you're at risk of breaching them. That's where SLO-based error budget and burn rate alerts come in.

Overview

SLO alerts provide actionable, reliability-centered monitoring, shifting focus away from individual incidents to the user experience as a whole. Instead of reacting to every spike or dip, you can prioritize based on real impact to service goals.

SLO alerts support event-based and time window SLOs.

Alerts API v2 and v3

SLO alerts require support from Alerts API v2 or v3. While both versions are supported, we recommend using API v3 to take advantage of enhanced alert management capabilities.

To create SLO alerts via API or Terraform, Alerts API v3 is required.

In addition, API v3 introduces significant improvements to alert management.

SLO alerts may be created in the Coralogix platform with support from Alerts API v2.

This version lacks programmatic support for SLO creation via API and Terraform.

It also lacks support for newer alert management features.

To migrate from Alerts API v2 to v3, follow this guide.

Track error budget usage

Your error budget represents how much unreliability you can tolerate before breaching your SLO. Monitoring it gives you a clear picture of how much room you have left to absorb incidents. This enables you to make smarter decisions about whether to ship a risky release, investigate performance regressions, or prioritize reliability improvements over new features.
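To make the idea concrete, here is a rough, non-Coralogix-specific sketch of how an error budget falls out of an SLO target; the 99.9% target, request counts, and variable names are assumptions chosen purely for illustration.

```python
# Illustrative only: deriving an error budget from an SLO target.
# A 99.9% target over 30 days leaves 0.1% of requests (or minutes) as budget.

slo_target = 0.999                      # 99.9% of requests must succeed
window_days = 30

error_budget_fraction = 1 - slo_target                         # 0.001
total_minutes = window_days * 24 * 60                          # 43,200 minutes
error_budget_minutes = total_minutes * error_budget_fraction   # ~43 minutes

# Remaining budget after observing some failures (made-up figures):
total_requests = 1_000_000
failed_requests = 450
allowed_failures = total_requests * error_budget_fraction      # 1,000 failures
remaining_budget_pct = 100 * (1 - failed_requests / allowed_failures)

print(f"{error_budget_minutes:.0f} minutes of downtime allowed per {window_days} days")
print(f"Remaining error budget: {remaining_budget_pct:.1f}%")  # 55.0%
```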

Detect and react to burn rate spikes

Burn rate shows how quickly you're using up your error budget. By tracking this rate, you can differentiate between high-impact, fast-burning incidents that require immediate attention and slow-burning degradations that may need longer-term investigation. This empowers you to respond with the appropriate urgency based on the actual risk to your SLO.
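Burn rate is conventionally defined (in the general SRE sense, not as a statement about the platform's internal formula) as the observed error rate divided by the error rate your SLO allows: a value of 1 means the budget lasts exactly the SLO window, while 14 means it is exhausted 14 times faster. A minimal sketch, with made-up traffic figures:

```python
# Illustrative burn rate calculation (general SRE convention).

slo_target = 0.999
allowed_error_rate = 1 - slo_target          # 0.1% of requests may fail

# Observed over the evaluation window (made-up figures):
requests_in_window = 50_000
errors_in_window = 700
observed_error_rate = errors_in_window / requests_in_window    # 1.4%

burn_rate = observed_error_rate / allowed_error_rate
print(f"Burn rate: {burn_rate:.1f}x the sustainable rate")     # 14.0x

# At 14x, a 30-day error budget lasts only ~2 days:
print(f"Budget exhausted in ~{30 / burn_rate:.1f} days at this rate")
```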

Reduce alert noise and fatigue

Traditional alerting systems often overwhelm teams with alerts that may not reflect real user impact. SLO-based alerting focuses on the experience that actually matters to users, drastically cutting down false positives and alert fatigue. You can prioritize fewer, higher-quality alerts that signal true reliability threats.

Getting started

Create SLO alerts using one of the following methods:

  • In the Coralogix UI, go to Alerts > Alert Management. Click NEW ALERT, and select SLO as your alert type.
  • When saving a newly created SLO, select Save & create alert.
  • In the SLO Hub, hover over an SLO in the SLO grid, click on the ellipsis (…) at the beginning of the row, and choose Create alert.

Select SLO and define alert details

  1. Use the SLO drop-down box to select an existing SLO or create a new one. If needed, click View SLO to navigate to the SLO details page in the SLO Hub.

  2. Define the alert details.

Labels help you filter alerts in Incidents and organize views. You can create a new label or select an existing one. To nest a label, use the key:value format (e.g., env:prod).

Error budget alert

Error budget alerts are triggered when the remaining error budget percentage is equal to or below a defined threshold.

These alerts ensure you are notified early when your service is at risk of exhausting its error budget, enabling you to take proactive action before breaching your reliability goals.

Add a budget threshold

Set when the alert should trigger based on error budget usage, and assign its priority (P1 to P5). For example, trigger a P1 alert when only 10% of the error budget remains (urgent action), or a P3 alert at 50% remaining (early warning); a sketch follows the notes below.

  • You can define up to five unique thresholds for each alert.
  • Avoid using the same percentage value or assigning the same priority level more than once within a single alert configuration.
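A minimal sketch of how such thresholds and priorities behave together; the specific percentages and priority labels are illustrative assumptions, not recommended defaults.

```python
# Hypothetical illustration of budget thresholds mapped to priorities.
# A threshold fires when the REMAINING budget is at or below its value.

thresholds = [
    (10.0, "P1"),   # only 10% of the budget left: urgent
    (25.0, "P2"),
    (50.0, "P3"),   # half the budget gone: early warning
]

def triggered_priorities(remaining_budget_pct):
    """Return the priorities whose threshold has been crossed."""
    return [prio for limit, prio in thresholds if remaining_budget_pct <= limit]

print(triggered_priorities(55.0))   # [] - still healthy
print(triggered_priorities(42.0))   # ['P3']
print(triggered_priorities(8.0))    # ['P1', 'P2', 'P3']
```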

Burn rate alert

Burn rate alerts are triggered when the SLO error budget is consumed faster than the expected burn rate.

There are two types of burn rate alerting strategies: single-window and dual-window. Each serves a different operational need for balancing fast incident detection and alert stability.

Single window

A single-window burn rate alert monitors the error budget consumption over one rolling time window of up to 48 hours (e.g., 1 hour, 6 hours, or 48 hours maximum).

Single-window alerts are ideal for detecting sustained issues over a fixed period. They allow rapid detection of critical failures when immediate responsiveness outweighs occasional noise.

Dual-window

A dual-window burn rate alert uses two time windows, short and long, to balance speed and stability in alerting.

The alert only triggers if the threshold for both the short and long windows is exceeded.

Dual-window alerts are ideal for production-critical services where alert noise must be minimized, or for environments that require early yet reliable detection of ongoing incidents.

Recovery behavior with dual-window burn rate windows

When an alert uses dual burn rate windows (e.g., 1h and 5m), the threshold must be breached in both time windows for the alert to trigger. This dual evaluation not only ensures better signal quality but also enables quick detection of recovery.

For example, if a burn rate violation occurred over the past hour but the last 5 minutes are now healthy, the alert condition is no longer met and will not trigger. This allows the system to recognize and respond to recovery events faster, helping reduce alert noise during transient failures.

Example: How dual-window burn rate alerting works

Let's assume a long window of 1 hour, a short window of 5 minutes, and a burn rate threshold of x14 (that is, the error budget is burning 14 times faster than the sustainable rate).

When the burn rate alert triggers at minute 5, it doesn’t mean we’re only evaluating that single minute. Instead, at minute 5, the system evaluates the previous 5 minutes (short window) and the previous 1 hour (long window) to determine if the burn rate exceeds the defined threshold.

An alert is triggered only if both the short and long windows exceed the threshold at the same evaluation point (minute 5).

The long window helps prevent false positives caused by short-lived spikes.

In the example above, although the errors stop at minute 10, the long window still includes the earlier error data. If only the long window were used, the alert would remain active for longer. That's where the short window adds value: it allows for quicker alert resolution. Using both ensures timely detection without remaining in an alert state longer than necessary.
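The evaluation logic described above can be sketched roughly as follows. The 1-hour/5-minute windows and the x14 threshold come from the example; the data structures and function names are illustrative assumptions, not the platform's implementation.

```python
from datetime import timedelta

# Rough sketch of dual-window evaluation: both windows must exceed the
# threshold for the alert to fire, which is also what lets it clear quickly
# once the short window recovers.

SLO_TARGET = 0.999
ALLOWED_ERROR_RATE = 1 - SLO_TARGET
THRESHOLD = 14.0                          # x14 normal budget consumption
LONG_WINDOW = timedelta(hours=1)
SHORT_WINDOW = LONG_WINDOW / 12           # 5 minutes (the 1/12 convention)

def burn_rate(samples):
    """samples: per-minute (requests, errors) tuples within a window."""
    requests = sum(r for r, _ in samples)
    errors = sum(e for _, e in samples)
    return 0.0 if requests == 0 else (errors / requests) / ALLOWED_ERROR_RATE

def should_alert(long_samples, short_samples):
    return (burn_rate(long_samples) >= THRESHOLD
            and burn_rate(short_samples) >= THRESHOLD)

# Incident ongoing: a sustained 1.4% error rate breaches both windows -> fires.
long_w = [(1000, 14)] * 60
short_w = [(1000, 14)] * 5
print(should_alert(long_w, short_w))      # True

# Recovery: the last 5 minutes are healthy -> the short window clears the
# alert, even though the long window still holds the earlier errors.
short_w = [(1000, 0)] * 5
print(should_alert(long_w, short_w))      # False
```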

Define the evaluation window

Define the burn rate alert type: single or dual window.

  • Single window – Evaluates the burn rate over a single, fixed time frame (e.g., the past 1 hour). This is useful for straightforward, time-bound alerting.
  • Dual window – Evaluates the burn rate across two time frames: a short-term window and a long-term window (e.g., past 5 minutes and past 1 hour).

    This approach allows for both fast-response alerts and confirmation over a longer period to reduce noise.

Note

In dual window mode, the short window is automatically calculated as 1/12 of the long window, following best practices outlined in the Google SRE Workbook.

Define burn rate thresholds

Define how many times faster than normal the error budget is burning and assign it a priority (P1 to P5). For example, send a P1 alert if the burn rate exceeds 1.5x normal usage.

Note

Avoid using the same priority level more than once within a single alert configuration.
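As a minimal sketch of how several burn rate multipliers could map to distinct priorities (the multiplier values here are illustrative, not recommended settings):

```python
from typing import Optional

# Hypothetical mapping of burn rate multipliers to alert priorities.
# Each priority is used only once, matching the note above.

burn_rate_thresholds = [
    (14.0, "P1"),   # budget gone in ~2 days on a 30-day SLO
    (6.0, "P2"),
    (1.5, "P3"),    # just faster than sustainable
]

def priority_for(burn_rate: float) -> Optional[str]:
    """Return the most severe priority whose multiplier is exceeded."""
    for multiplier, priority in burn_rate_thresholds:   # ordered high to low
        if burn_rate > multiplier:
            return priority
    return None

print(priority_for(20.0))   # P1
print(priority_for(2.0))    # P3
print(priority_for(1.0))    # None - burning within budget
```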

Notifications

As part of the notification settings setup, outbound webhooks are supported.

Supported fields for webhook configuration can be found here.

Schedule

Define an alert schedule by specifying the exact days and time ranges during which the alert should be active.

View triggered alert events in Incidents

Triggered SLO-based alert events are aggregated into Incidents, providing a centralized way to track, investigate, and resolve critical issues across your system. Each incident aggregates related alerts, giving you a high-level view of what went wrong, when, and where. From within the Incidents view, you can dive into the relevant SLO permutation to understand the root cause faster.

In this example, the pink line represents the P1 threshold, while the green line shows the actual burn rate.

Alert evaluation timing

Sampling for burn rate alerts begins only after the duration of the longer time window specified in the alert configuration. For example, if the long window is 6 hours, alert evaluation will start 6 hours after creation. This delay ensures meaningful evaluation of sustained burn conditions.
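As an illustration of that timing, assuming a 6-hour long window and an arbitrary creation timestamp:

```python
from datetime import datetime, timedelta, timezone

# Illustrative only: sampling starts one long-window after the alert is created.
long_window = timedelta(hours=6)
created_at = datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)   # example timestamp

evaluation_starts_at = created_at + long_window
print(evaluation_starts_at)   # 2024-05-01 15:00:00+00:00
```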