Create Event-Based SLOs
Event-based SLOs measure reliability as the ratio of successful (good) events to total events.
They suit high-traffic systems that track individual requests or transactions, where even a small error rate can significantly affect user experience.
Overview
An event-based SLO is based on an SLI defined as the ratio of successful (good) events to the total number of events. The SLO is met when this ratio meets or exceeds the target objective over a specified compliance period.
Note
The metrics used in the queries under good and total events can either be sent directly to Coralogix or derived from logs and spans using Event2Metrics.
SLO components
When defining an event-based SLO, you must specify:
- A good events query that defines what counts as a successful event
- A total events query representing the full population of events to measure
- A time frame against which the SLO compliance is evaluated (e.g., 7 days)
- A target percentage of successful events (e.g., 99.9%)
This structure ensures that your SLO tracks meaningful outcomes that are aligned with entity goals and user experience.
Good events
The good events query defines what constitutes a successful or healthy event. These events are a subset of total events that meet your success criteria (e.g., fast response time, no errors).
This query determines the numerator in your SLO calculation.
For example:
- HTTP responses with a status code of 200 or 204.
- Transactions completed in under 2 seconds.
- Requests not containing error messages or exceptions.
Total events
The total events query defines the full set of events that the SLO should measure. This represents the denominator in your SLO formula.
For example:
- All requests to a given service endpoint.
- Transactions excluding health checks or irrelevant noise.
- Log entries that represent meaningful user actions.
SLI calculation
To calculate the SLI, use two PromQL queries: one for counting good events and one for total events. Then use the following formula:
SLI = (Good events / Total events) × 100, measured over the SLO time frame
For example, if your system logs 9,800 successful responses out of 10,000 total requests:
SLI = (9,800 / 10,000) × 100 = 98%
This value is compared against your defined SLO threshold (e.g., 99%) to determine compliance.
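The comparison can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how Coralogix evaluates SLOs internally; the function names `sli_percent` and `meets_target` are hypothetical.

```python
def sli_percent(good_events: int, total_events: int) -> float:
    """Return the SLI as the percentage of good events out of total events."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if good_events > total_events:
        raise ValueError("good_events cannot exceed total_events")
    return 100.0 * good_events / total_events


def meets_target(good_events: int, total_events: int, target_percent: float) -> bool:
    """True when the SLI meets or exceeds the target percentage."""
    return sli_percent(good_events, total_events) >= target_percent


# Figures from the example: 9,800 good responses out of 10,000 requests.
sli = sli_percent(9_800, 10_000)
print(f"SLI: {sli:.2f}%")
print("Meets a 99% target:", meets_target(9_800, 10_000, 99.0))
```

With these figures the SLI falls below a 99% target, so the SLO would not be met for that period.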
SLO setup
To create a new Service Level Objective (SLO), go to APM > SLO Center and click Create SLO. Define the SLO details.
Field | Description |
---|---|
Name | The unique name identifying your SLO. |
Owner | Ownership defaults to the SLO creator. |
Entity labels | (Optional) Metadata used for filtering SLOs. |
Description | (Optional) Additional context or purpose of the SLO to clarify its scope and intent for other users. |
Select the SLO type
Select the event-based SLO.
Query configuration
Configure an event-based SLO by defining PromQL queries for good events and total events. These queries determine the percentage of successful events over time and are used to evaluate whether the SLO objective is being met.
Grouping SLOs
You can apply grouping to event-based SLOs by using group by in your queries. This creates a grouped SLO, where each unique value in the grouping field (e.g., service name) results in an individual SLO tracked under the same definition.
When grouping is applied, each combination becomes a distinct SLO instance with its own compliance evaluation, alerts, and burn rate tracking. For example, if you have 100 services and define an SLO grouped by service_name, the SLO Center will display a single high-level SLO definition. Clicking into it reveals a detailed drilldown with 100 individual SLOs, one per service. Each of these is monitored independently.
This approach is especially useful for managing large-scale systems with shared SLO logic but multiple units of accountability, such as services, teams, or environments.
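To make the idea of independent evaluation concrete, here is a minimal Python sketch that applies one target to several groups. The service names and counts are invented for illustration, and skipping groups with no events is an assumption rather than documented behavior.

```python
from typing import Dict, Tuple


def evaluate_grouped_slo(
    counts: Dict[str, Tuple[int, int]], target_percent: float
) -> Dict[str, bool]:
    """Evaluate one SLO definition separately for each group.

    counts maps a group value (e.g. a service name) to (good, total) events.
    """
    results: Dict[str, bool] = {}
    for group, (good, total) in counts.items():
        if total == 0:
            # Assumption: a group with no events has nothing to evaluate.
            continue
        results[group] = 100.0 * good / total >= target_percent
    return results


# Hypothetical per-service counts for a single grouped SLO definition.
counts = {
    "checkout": (9_995, 10_000),
    "search": (9_800, 10_000),
    "auth": (5_000, 5_000),
}
for service, compliant in evaluate_grouped_slo(counts, 99.9).items():
    print(service, "meets target" if compliant else "misses target")
```

Each group gets its own result, so one failing service does not hide behind the aggregate of the others.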
Query examples
The following examples show how to define good and total event queries for event-based SLOs. Each query uses group by to evaluate reliability across specific parts of your system, such as services, routes, or infrastructure instances. Grouping enables more targeted monitoring, alerting, and troubleshooting.
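One way to keep a good/total pair aligned is to generate both queries from the same inputs. The Python sketch below builds two PromQL strings that share one `by (...)` clause. The helper `build_query_pair`, the metric name `http_requests_total`, the label names, and the `increase(...[5m])` shape are all assumptions for illustration; the exact query form Coralogix expects is not shown in this section.

```python
from typing import Sequence, Tuple


def build_query_pair(
    metric: str,
    good_selector: str,
    group_by: Sequence[str],
    window: str = "5m",
) -> Tuple[str, str]:
    """Return (good_query, total_query) that share the same grouping labels."""
    by_clause = ", ".join(group_by)
    # The good query narrows the metric with a label selector.
    good = f"sum by ({by_clause}) (increase({metric}{{{good_selector}}}[{window}]))"
    # The total query counts every event for the same metric.
    total = f"sum by ({by_clause}) (increase({metric}[{window}]))"
    return good, total


# Hypothetical metric and labels: successful requests exclude 5xx responses.
good_query, total_query = build_query_pair(
    metric="http_requests_total",
    good_selector='status_code!~"5.."',
    group_by=["service_name"],
)
print(good_query)
print(total_query)
```

Because both strings come from the same `group_by` argument, the pair cannot drift apart in its grouping dimensions.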
Example: HTTP status codes grouped by service
This example measures request success rates by evaluating HTTP response codes for each service.
Good events
Total events
This query pair measures request success by filtering out error codes. Grouping by service_name creates separate SLOs for each service, allowing you to monitor performance across your architecture.
Example: HTTP requests grouped by service and method
This example tracks successful versus total HTTP requests for each unique combination of service and HTTP method.
Good events
Total events
Example: Redis commands grouped by host, db, and command
This example monitors Redis command reliability by grouping total and error-free command counts by instance, database, and command.
Good events
Total events
Example: Postgres commits and total transactions grouped by database
This example compares committed transactions (good events) against the total number of transaction attempts (commits + rollbacks) in each Postgres database. It demonstrates how to combine multiple metrics into a single total query using a consistent aggregation (group by).
Good events
Total events
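The arithmetic behind this example can be sketched in Python: commits are the good events, and commits plus rollbacks form the total, computed per database. The database names and counts below are made up, and the function name is hypothetical.

```python
from typing import Dict


def commit_sli_by_database(
    commits: Dict[str, int], rollbacks: Dict[str, int]
) -> Dict[str, float]:
    """Return the commit success percentage for each database.

    Good events are commits; total events are commits plus rollbacks.
    """
    slis: Dict[str, float] = {}
    for db in sorted(set(commits) | set(rollbacks)):
        good = commits.get(db, 0)
        total = good + rollbacks.get(db, 0)
        if total > 0:  # skip databases with no transactions at all
            slis[db] = 100.0 * good / total
    return slis


# Hypothetical transaction counts over the SLO time frame.
commits = {"orders": 9_990, "users": 4_950}
rollbacks = {"orders": 10, "users": 50}
for db, sli in commit_sli_by_database(commits, rollbacks).items():
    print(f"{db}: {sli:.2f}%")
```

Both inputs are keyed by the same dimension (database), which mirrors the requirement that the combined total query uses the same grouping as the good events query.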
Query validation
During SLO creation, the system performs built-in validations. If any validation fails, an error message is displayed, and you must adjust your queries before proceeding. When queries are valid, Good events and Total events are visualized in the Preview section.
Validation type | Description | Troubleshooting |
---|---|---|
Invalid query syntax | Query contains a syntax error that prevents it from running. | Check for typos, missing brackets, or invalid operators. Use autocomplete to correct issues. |
Good events exceed total events | Good events query returns more results than the total events query. | Make sure the good events query is a subset of the total events. Check for incorrect filters or overly broad logic. |
Excessive query cardinality | Query includes too many unique time series and exceeds system limits. | Add filters or reduce grouping dimensions. If needed, split the SLO into multiple smaller ones. |
Grouping mismatch | Good and total events use different grouping fields. | Ensure both queries use the same group_by dimensions. Compare fields side by side to align them. |
Recording rule detected | The query uses a metric that is based on a recording rule. | To ensure reliable SLO results, add a 2m offset so that the data has time to fully update. |
SLO group limitation | Users may create SLOs with a maximum of 10,000 groups. If you modify an SLO and it produces more than 10,000 groups, the system will stop tracking the SLO. | Reduce the number of group-by values or add filters to your SLO query to exclude groups. |
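Some of these checks can be approximated locally before you paste queries into the form. The Python sketch below covers three rows of the table: grouping mismatch, good events exceeding total events, and the 10,000-group limit. It is a rough pre-check under stated assumptions, not the validation Coralogix runs; the regular expression only looks at `by (...)` clauses, and the sample queries use hypothetical metric names.

```python
import re
from typing import Dict, List, Set

MAX_GROUPS = 10_000  # group limit stated in the validation table


def group_by_labels(query: str) -> Set[str]:
    """Collect label names from every `by (...)` clause in a PromQL string."""
    labels: Set[str] = set()
    for clause in re.findall(r"\bby\s*\(([^)]*)\)", query):
        labels.update(name.strip() for name in clause.split(",") if name.strip())
    return labels


def precheck(
    good_query: str,
    total_query: str,
    good_counts: Dict[str, int],
    total_counts: Dict[str, int],
) -> List[str]:
    """Return the names of validations that would likely fail."""
    problems: List[str] = []
    if group_by_labels(good_query) != group_by_labels(total_query):
        problems.append("Grouping mismatch")
    for group, good in good_counts.items():
        if good > total_counts.get(group, 0):
            problems.append(f"Good events exceed total events for {group}")
    if len(total_counts) > MAX_GROUPS:
        problems.append("SLO group limitation")
    return problems


# Hypothetical queries whose grouping labels differ.
good_q = 'sum by (service_name) (increase(http_requests_total{status_code!~"5.."}[5m]))'
total_q = "sum by (service_name, method) (increase(http_requests_total[5m]))"
print(precheck(good_q, total_q, {"checkout": 5}, {"checkout": 4}))
```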
Set the SLO target and time frame
After defining your queries, configure:
- Target success rate – The percentage of good events required to meet the SLO (e.g., 99.9% of requests should be successful)
- Time frame – The compliance period over which the SLI is evaluated (e.g., 7, 14, 21, 28, or 90 days)
These parameters define the bounds of your error budget and ensure reliability goals are aligned with operational requirements.
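As a rough illustration of how the target bounds the error budget, the Python sketch below treats the budget as the share of events allowed to fail, that is, 100% minus the target. The function name and the sample numbers are hypothetical, and this is the general SRE definition rather than a statement of how Coralogix reports the budget.

```python
def error_budget(total_events: int, bad_events: int, target_percent: float) -> dict:
    """Summarize the error budget for one compliance period.

    The budget is the number of events allowed to fail while still
    meeting the target: total_events * (100 - target) / 100.
    """
    if not 0 < target_percent < 100:
        raise ValueError("target_percent must be between 0 and 100 (exclusive)")
    allowed_bad = total_events * (100.0 - target_percent) / 100.0
    consumed = bad_events / allowed_bad if allowed_bad else float("inf")
    return {
        "allowed_bad_events": allowed_bad,
        "remaining_bad_events": allowed_bad - bad_events,
        "budget_consumed": consumed,
    }


# Hypothetical period: 1,000,000 requests, 400 failures, 99.9% target.
budget = error_budget(total_events=1_000_000, bad_events=400, target_percent=99.9)
print(f"Allowed failures: {budget['allowed_bad_events']:.0f}")
print(f"Remaining: {budget['remaining_bad_events']:.0f}")
print(f"Budget consumed: {budget['budget_consumed']:.0%}")
```

A stricter target or a shorter time frame shrinks the number of failures the budget can absorb.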
Limitations
When an SLO is first created, data collection for its associated metric begins at that point in time. As a result, initial calculations are based solely on data from the moment of creation onward. Once the age of the SLO exceeds its defined time frame, the system begins using the full SLO time frame as a rolling window—continuously evaluating data across the entire duration of the time frame.
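The window behavior described above can be sketched as follows: the evaluation starts at the later of the creation time and "now minus the time frame". This is a reading of the paragraph expressed in Python, with a hypothetical function name and fixed example dates.

```python
from datetime import datetime, timedelta
from typing import Tuple


def evaluation_window(
    created_at: datetime, now: datetime, time_frame: timedelta
) -> Tuple[datetime, datetime]:
    """Return the (start, end) of the data used to evaluate the SLO.

    Before the SLO is older than its time frame, only data since creation
    is available; afterwards the window rolls with the full time frame.
    """
    start = max(created_at, now - time_frame)
    return start, now


now = datetime(2024, 1, 31)
seven_days = timedelta(days=7)

# SLO created 2 days ago: only 2 days of data are evaluated.
print(evaluation_window(datetime(2024, 1, 29), now, seven_days))
# SLO created 30 days ago: the full 7-day rolling window is used.
print(evaluation_window(datetime(2024, 1, 1), now, seven_days))
```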
Next steps
Click Save to store the SLO and return to the SLO Center.
To configure an SLO-based alert, click Save & create alert.
Additional resources
Find out how to safely use recording rule–based metrics in your SLO creation with this guide.