
Create Time Window SLOs

Note

SLOs and SLO alerts are in Beta mode. Features may change, and some functionality may be limited.

Time window SLOs evaluate performance over fixed intervals. This SLO type is best suited for use cases that prioritize reliability across consistent, minute-by-minute time windows.

This method ensures that quality remains steady throughout the entire SLO period by evaluating performance in each discrete time window. Any time window that fails to meet the defined criteria is treated as a failure, thereby contributing to the depletion of the error budget.

Overview

Time window SLOs measure reliability by assessing the performance of a service or other entity in fixed windows of 1 or 5 minutes. Instead of accumulating requests over the entire SLO time frame (e.g., 7, 14, 21, or 28 days), the requests in each time window are evaluated as good or bad based on whether they meet a predefined success threshold (e.g., 99% of requests within a 1-minute window are successful).

The Service Level Indicator (SLI) is calculated as:

\[ \text{SLI} = 100 \times \frac{\text{Count of compliant time windows}}{\text{Total number of time windows}} \]

Time window SLOs are especially effective for latency because they evaluate performance over fixed intervals of 1 or 5 minutes, marking each window as good or bad based on whether it meets a defined threshold. This approach avoids the need to aggregate individual request data across long timeframes and eliminates reliance on histogram buckets like le, which require manual setup, introduce high cardinality, and limit flexibility. By using raw metrics directly, users can define latency thresholds more naturally, resulting in simpler configuration, more accurate evaluations, and lower storage and compute costs. It also better captures real-world behavior by reflecting the impact of sustained latency spikes rather than isolated slow requests.
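As a concrete illustration, the SLI formula above can be sketched in a few lines of Python (the window results are made-up values):

```python
def sli(window_results):
    """Compute the SLI: the percentage of compliant time windows.

    window_results: list of booleans, one per time window,
    where True means the window met its success threshold.
    """
    compliant = sum(window_results)
    return 100 * compliant / len(window_results)

# 10 windows, 9 of them compliant -> SLI of 90.0%
print(sli([True] * 9 + [False]))  # 90.0
```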

SLO components

When defining a time window SLO, you must specify:

  • Conditions of the time windows: a measurement interval (1 or 5 minutes), the metric query to evaluate within each window, an operator (e.g., <, >=, !=), and a success threshold (e.g., p95 latency ≤ 1.5s)
  • Time frame (e.g., 7 days) and a target percentage for the overall SLO (e.g., 95%)
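Putting the components together, a time window SLO definition could be modeled like this (a hypothetical sketch; the field names are illustrative and are not the Coralogix API):

```python
from dataclasses import dataclass

@dataclass
class TimeWindowSLO:
    """Illustrative container for the components listed above."""
    window_minutes: int   # measurement interval: 1 or 5
    query: str            # PromQL returning one value per window
    operator: str         # e.g. "<", ">=", "!="
    threshold: float      # e.g. 1.5 (seconds, for a p95 latency SLI)
    time_frame_days: int  # e.g. 7
    target_pct: float     # e.g. 95.0

slo = TimeWindowSLO(
    window_minutes=5,
    query="histogram_quantile(0.95, "
          "sum(rate(http_request_duration_seconds_bucket[5m])) by (le))",
    operator="<=",
    threshold=1.5,
    time_frame_days=7,
    target_pct=95.0,
)
print(slo.window_minutes, slo.target_pct)  # 5 95.0
```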

SLO setup

To create a new Service Level Objective (SLO), go to APM > SLO Center and click Create SLO. Define the SLO details.
  • Name – The unique name identifying your SLO.
  • Owner – The user responsible for maintaining and reviewing the SLO. Ownership defaults to the creator, but can be reassigned to any Coralogix team member.
  • Entity labels – Metadata used for filtering and grouping SLOs.
  • Description – (Optional) Additional context or purpose of the SLO to clarify its scope and intent for other users.

Select the SLO type

Choose Time window as the SLO type. This model evaluates how many time windows (e.g., 1 or 5 minutes) meet a defined success condition within the SLO time frame.

Define the SLI

To define your uptime condition, specify:

  • A time window (1m or 5m)
  • A PromQL query that returns a value to evaluate (e.g., p95 latency, error rate)
  • An operator (e.g., <, >=, !=)
  • A threshold the value must satisfy

Note

The metrics used in the query can either be sent directly to Coralogix or derived from logs and spans using Event2Metrics.

Set the operator and threshold

Use the dropdowns to define your operator (e.g., <, >=) and threshold value.

In this example:

  • Operator: >=
  • Threshold: 80

This means a window is considered good if average CPU usage meets or exceeds 80. Otherwise, it’s bad.
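The per-window check itself is just a comparison of the queried value against the threshold; a minimal sketch:

```python
import operator

# Comparison operators offered in the dropdown, mapped to Python functions
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}

def window_is_good(value, op, threshold):
    """Return True when the window's metric value satisfies the condition."""
    return OPS[op](value, threshold)

print(window_is_good(83.2, ">=", 80))  # True  - avg CPU meets the threshold
print(window_is_good(76.0, ">=", 80))  # False - this window is bad
```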

Set the SLO target and time frame

Define:

  • Time frame (e.g., last 7 days)
  • SLO target (e.g., 95% of time windows must be good)

In the example shown, the system tracks whether at least 95% of all 1-minute windows over the past 7 days were within the threshold.
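To see what this target implies in absolute terms, you can translate it into an error budget measured in windows (a quick sketch using the example values):

```python
# 7-day time frame evaluated in 1-minute windows
total_windows = 7 * 24 * 60               # 10,080 windows
target = 0.95                             # 95% of windows must be good
error_budget_windows = total_windows * (1 - target)
print(int(error_budget_windows))          # 504 windows may be bad before the SLO is breached
```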

Real-time preview

You’ll see:

  • State – Current percentage of good windows (e.g., 98%)
  • Status – Whether the target is met (e.g., OK)
  • Budget – Remaining percentage of allowed downtime (e.g., 82% error budget remains)
  • Visualization – A chart with the measured metric over the recent SLO time frame

This preview allows you to assess how the defined threshold performs against real metric data over the SLO time frame.
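One common way to read the Budget figure is as the share of the error budget not yet consumed; a sketch of that calculation (this definition is an assumption for illustration, not documented Coralogix internals):

```python
def remaining_budget(good_pct, target_pct):
    """Percentage of the error budget still unspent (assumed definition).

    good_pct: current percentage of good windows (the State value)
    target_pct: the SLO target percentage
    """
    allowed_bad = 100 - target_pct   # the full error budget
    observed_bad = 100 - good_pct    # budget consumed so far
    return 100 * (allowed_bad - observed_bad) / allowed_bad

# With a 95% target and 99.1% good windows, 82% of the budget remains
print(round(remaining_budget(99.1, 95.0), 1))  # 82.0
```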

Example: Time window SLO with p95 latency threshold

In the example below, an SLO is configured with 5-minute time windows. The uptime condition is defined as: p95 latency must be less than or equal to 1.5 seconds. Only one time window in the evaluation period exceeds this threshold.

The total monitored period spans 12 hours (720 minutes), resulting in 144 time windows (720 ÷ 5). With one window violating the latency condition, the service is considered up for 715 minutes and down for 5 minutes.

The resulting uptime is:

\[ \text{Uptime} = \frac{143}{144} \times 100 \approx 99.306\% \]

Underlying PromQL query

histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

This query calculates the p95 latency over 5-minute windows using Prometheus histograms. Each time window is evaluated against the threshold: if the p95 value is greater than 1.5 seconds, that time window is marked as downtime.
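The arithmetic of this example can be verified in a few lines:

```python
# 12-hour period evaluated in 5-minute windows, as in the example above
period_min, window_min = 12 * 60, 5
total_windows = period_min // window_min        # 144 windows
bad_windows = 1                                 # one window breached the threshold
uptime_pct = 100 * (total_windows - bad_windows) / total_windows
print(round(uptime_pct, 3))  # 99.306
```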

Additional query examples

You can define time window SLOs using any PromQL expression that returns a single numeric value per time window. Here are two additional examples based on average and total latency.

Average latency per time window

This query calculates the average latency by dividing the increase in total duration by the increase in request count (shown here with a 5-minute range selector; use a range matching your chosen time window). You might use this with a threshold such as < 300ms.

sum(increase(duration_ms_sum{service_name="my-service"}[5m]))
/
sum(increase(duration_ms_count{service_name="my-service"}[5m]))
  • Time window: 1m or 5m
  • Threshold: < 300 (milliseconds)
  • Use case: Tracks average response time of a specific service in each time window.
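Numerically, the query reduces to a ratio of two counter increases; an illustration with made-up values:

```python
# Hypothetical per-window increases of the two counters
duration_ms_sum_increase = 42_000.0    # total request duration in the window (ms)
duration_ms_count_increase = 150.0     # number of requests in the window

avg_latency_ms = duration_ms_sum_increase / duration_ms_count_increase
print(avg_latency_ms)          # 280.0 ms, below the 300 ms threshold
print(avg_latency_ms < 300)    # True: this window counts as good
```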

Total latency grouped by dimension

This query evaluates total latency in each time window and groups results by one or more labels (e.g., route, instance, service_name). As above, use a range selector matching your chosen time window:

sum(increase(duration_ms_sum{<some filter>}[5m])) by (<selected group labels>)
  • Time window: 1m or 5m
  • Threshold: e.g., < 1 minute, expressed in the metric's unit (milliseconds or seconds, depending on metric scale)
  • Use case: Ensures total latency remains within bounds across grouped entities such as routes or services.

Next steps

Once the configuration is complete:

  • Click Save to store the SLO.
  • Click Save & create alert to immediately configure an alert based on this SLO.

Additional resources

Find out how to safely use recording rule–based metrics in your SLO creation with this guide.