View and Manage Service Level Objectives

Overview

The SLO Center is the centralized dashboard for tracking and managing Service Level Objectives (SLOs) across your environment. It helps you stay on top of system reliability with clear visuals, real-time status, and actionable insights.

Use it to:

  • Create and update SLO group definitions and associated alerts all in one place.
  • Monitor SLO compliance, remaining error budgets, and performance trends at a glance.
  • Filter and search SLOs by entity labels, status, and remaining budget.
  • Drill down into individual SLOs to analyze grouped data, alerts, and triggered alert events.
  • Quickly understand what is actually burning the SLO error budget.

The SLO Center provides a high-level view of your system's health. Use the visualizations to explore your overall system compliance based on the statuses of all SLOs tracked over time.

Each SLO or SLO group is categorized based on its remaining error budget. These categories reflect system health and help teams prioritize response actions.
| Status | Remaining error budget (%) | Color | Description |
|---|---|---|---|
| OK | 75%–100% | 🟢 Green (healthy) | The SLO is within its defined target. The system is performing as expected. No action is required. |
| Warning | 25%–74% | 🟡 Yellow (monitor) | The SLO is approaching its threshold. There is a risk of breaching the target if performance does not improve. Users should monitor closely. |
| Critical | 1%–24% | 🟣 Purple (high risk) | The SLO is at high risk of breaching, with very little error budget remaining. Immediate attention is recommended. |
| Breached | 0% | 🔴 Red (failure) | The SLO has exceeded its defined target. This indicates failure to meet reliability expectations. Immediate action is required. |
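The thresholds in the table above amount to a simple lookup. The sketch below is illustrative only; the function name and the percentage input shape are assumptions, not part of the product:

```python
def slo_status(remaining_budget_pct: float) -> str:
    """Map remaining error budget (%) to an SLO status category.

    Thresholds mirror the status table above: OK (75-100),
    Warning (25-74), Critical (1-24), Breached (0).
    """
    if remaining_budget_pct >= 75:
        return "OK"
    if remaining_budget_pct >= 25:
        return "Warning"
    if remaining_budget_pct >= 1:
        return "Critical"
    return "Breached"
```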

Understand multiple statuses in a single row

If you see more than one status icon (for example, OK, Warning, Critical) in the SLO grid, it means the SLO was defined with a group by clause in the SLO setup. This creates a grouped SLO, where each unique value in the grouping field—such as service.name, region, or customer.id—becomes an individually tracked objective under the same SLO definition.

Each group is evaluated independently and may have a different health status, depending on its own compliance and remaining error budget. The SLO grid aggregates these statuses into a single row to give a high-level view.

Clicking into the row opens the SLO Details view, where you can drill down into specific permutations, view group-specific performance metrics, and monitor alerts per group.

Examine SLO groups

Scroll down to view the SLO grid, which lists all SLOs in your account.

Each column includes key indicators that help you evaluate performance at a glance.
| Parameter | Description |
|---|---|
| Name | The user-defined name of the SLO group. |
| Grouping key | The key (for example, service.name or customer.id) used to break the SLO group into multiple SLO permutations for individual tracking. |
| SLO status | Status of the permutation with the lowest remaining error budget. |
| Lowest remaining error budget | Percentage (%) of allowable errors left. For grouped SLOs, the information for the group with the lowest remaining budget is displayed. |
| Target | The SLO compliance target (for example, 99.9%). |
| Time frame | The SLO period (for example, 7 days, 28 days). |
| Alerts | Number of unique alert definitions triggered within the selected SLO time frame, including alerts of any status (triggered or acknowledged). Selecting the value redirects to Incidents, pre-filtered to show all incidents associated with those alert definitions for deeper investigation. |
| Creator | The user who created the SLO. |
| Entity labels | Custom labels assigned to the SLO at creation, used for filtering or grouping (for example, env:prod or team:payments). |
| Last updated | Timestamp of the most recent change to the SLO's configuration. |
| Permutations | Number of individual SLO permutations per SLO group. |

Note

Select any column header to sort the table—for example, sort by lowest remaining budget to quickly find the most at-risk SLOs.

SLO actions

You can take action on any SLO directly from the SLO Center.

  1. Hover over the SLO row.
  2. Select the ellipsis icon on the left.
  3. Proceed to an action:
| Action | Description |
|---|---|
| Edit | Update thresholds, queries, labels, and other configuration parameters. If any changes invalidate existing alerts, you'll be prompted to modify or remove those alerts before saving. Historical data for the previous SLO configuration will no longer be retained. |
| Create an alert | Define alerting conditions based on the SLO's logic to detect potential or actual violations. Alerts help teams respond proactively to reliability risks. |
| Delete | Remove obsolete SLOs to keep your view focused and accurate. When an SLO configuration is deleted, its historical data is no longer available and associated alerts are deleted. |

Drill down into a specific SLO

Each row in the SLO Center represents an SLO group—a single SLO definition that may produce multiple permutations if grouping labels were used (for example, by service, region, namespace).

Selecting an SLO group opens its permutations, allowing you to drill down into the list of SLO permutations and analyze each one.

SLO drilldown

View SLO definitions and statistics

| Parameter | Description |
|---|---|
| SLO name | The name assigned to the SLO. |
| Selected permutation | Each SLO is evaluated per permutation. If the SLO is grouped, the displayed charts and statistics reflect only the currently selected permutation. |
| SLO status | Status of the selected SLO permutation (OK, Warning, Critical, Breached). |
| SLO target | The defined performance goal for the SLO, for example, 99% over the past 14 days. |
| Current compliance | The latest SLI value of the selected SLO permutation. |
| Error budget remaining | Shown as a progress bar (for example, 94%). |
| Alerts for this permutation | Hovering over this element reveals the alert names associated with this permutation. Selecting an alert opens the SLO Alert Configurator, allowing you to view or edit the alert configuration. |

Selecting a permutation opens the SLO drawer, which is the primary workspace for analyzing a specific permutation. It contains two tabs:

  • Highlights: Use it to identify which labels contribute to bad events.
  • Performance: Use it to see the SLO performance trends over time.

The time picker is shared between both tabs.

Highlights

The Highlights tab is designed for quickly identifying what is driving SLO degradation by breaking down bad events into their top contributing dimensions. This tab allows you to:

  • See what is driving your bad events
  • Identify which labels or values contribute most
  • Understand what changed inside the selected time range that caused degradation

This tab consists of three components, described in the following sections.

All events

This is the starting point of the Highlights workflow. The chart shows the total good and bad events over the selected period of time per permutation.

All events

Inside the graph, you can select, drag, and resize a time window to focus the analysis, toggle between absolute counts and percentages of bad events, and view the query for the selected permutation.

Time window behavior

Selecting a time window defines the time range used for the Highlights analysis:

  • Only bad events that occurred within that portion of the timeline are evaluated.
  • The analysis focuses on what happened during that specific period.
  • Bad events can then be broken down into multiple time series based on the underlying metrics and their label values observed during that timeframe.
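The window-selection behavior above can be sketched as a simple filter over timestamped bad events. The event shape (`(timestamp, labels)` tuples with epoch-second timestamps) is an illustrative assumption, not the product's internal representation:

```python
def events_in_window(events, start, end):
    """Keep only bad events inside the selected time window.

    Mirrors the Highlights behavior: only events whose timestamps
    fall in [start, end) participate in the downstream breakdown.
    Each event is a (timestamp, labels_dict) tuple (assumed shape).
    """
    return [e for e in events if start <= e[0] < end]
```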

Investigation view

The Investigation view visualizes how bad events behave inside the selected window. In its default state, without grouping or filtering, the graph shows a single time series of bad events that reflects exactly the bad-event counts inside the selection window.

This view allows you to:

  • Quickly inspect the shape of bad events in the chosen range.
  • Verify whether failures are bursty or gradual.
  • Understand time-based context before grouping or filtering.

Investigation view

When you change the time window, the graph immediately updates.

Grouping and filtering

By default, Investigation view shows a single aggregated time series representing all bad events for the selected SLO and time window.

Grouping and filtering allow you to progressively refine this view and understand which dimensions and values are driving error budget burn.

Bad events filtered

Tip

You can sort the bad events breakdown by total (time series with the highest number of bad events across the selected time window) or by maximum (time series with the highest single peak of bad events within the selected time window), and limit the view to the Top 5, Top 10, or Top 20 contributors.
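The two sort modes in the tip above (by total versus by maximum) can be sketched as follows. The data shape, one list of per-interval bad-event counts per contributor, and the function name are illustrative assumptions:

```python
def top_contributors(series, key="total", n=5):
    """Rank bad-event time series by total or by peak value.

    series: dict mapping a contributor (label value) to its list of
    per-interval bad-event counts (assumed shape).
    key: "total" sums counts across the window; "max" takes the
    highest single peak within the window.
    Returns the top-n contributors, highest first.
    """
    metric = sum if key == "total" else max
    ranked = sorted(series.items(), key=lambda kv: metric(kv[1]), reverse=True)
    return ranked[:n]
```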

Group by – splitting the bad events

Adding a Group by label splits the single aggregated bad-events series into multiple series, one per combination of labels and values. Each series represents the contribution of one or more labels with their values to bad events over time.

Example 1:

If you group by k8s.namespace.name:

  • The original single Bad events graph is split into multiple series, such as:
    • namespace = checkout
    • namespace = payments
    • namespace = auth
  • Each line shows the bad events produced only by that namespace over the selected time window.

Example 2:

Adding an additional group-by label increases the granularity of the breakdown. Grouping by both k8s.namespace.name and status.code produces series such as:

  • checkout + 500
  • checkout + 503
  • payments + 500

This makes it easy to see which combinations of dimensions are responsible for most of the error budget burn.
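The group-by splitting in the two examples above amounts to counting bad events per label-value combination per time bucket. This sketch assumes a `(bucket, labels_dict)` event shape for illustration:

```python
from collections import defaultdict

def group_bad_events(events, group_by):
    """Split bad events into one time series per label combination.

    events: iterable of (time_bucket, labels_dict) tuples (assumed shape).
    group_by: tuple of label names, e.g. ("k8s.namespace.name", "status.code").
    Returns {label_values_tuple: {bucket: bad_event_count}}.
    """
    series = defaultdict(lambda: defaultdict(int))
    for bucket, labels in events:
        key = tuple(labels.get(k) for k in group_by)
        series[key][bucket] += 1
    return {k: dict(v) for k, v in series.items()}
```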

Filters – removing noise from the analysis

Adding Filter by does not split the data further. Instead, it removes label values that are not relevant to the investigation, allowing you to focus only on meaningful contributors.

Example 1:

You group by status.code but want to exclude client errors:

  • Add a filter: status.code != 4xx
  • The graph now shows only server-side failures (for example, 500, 503), removing noisy but expected traffic.

Example 2:

  • Group by k8s.namespace.name
  • Add a filter: namespace != load-generator
  • This removes synthetic or test traffic that would otherwise distort the analysis.
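In contrast to group-by, a filter only removes events from the analysis. A minimal sketch, using the same assumed `(timestamp, labels_dict)` event shape as above:

```python
def filter_events(events, label, excluded):
    """Drop bad events whose label value is in the excluded set.

    Filtering never splits the series; it removes noise (for example,
    namespace != load-generator) so only relevant contributors remain.
    Event shape (timestamp, labels_dict) is an assumption.
    """
    return [e for e in events if e[1].get(label) not in excluded]
```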

Time window behavior

The selected time window controls which bad events participate in the breakdown:

  • Only label values that produced bad events within the selected window appear.
  • Top contributors are recalculated when the window changes.
  • Label values may appear or disappear as the window shifts.

Together, grouping breaks bad events into meaningful dimensions, and filters remove noise, enabling precise, window-focused root cause analysis.

Label distribution

This panel shows which labels and values contributed most to bad events inside the selected window, using a Highlights-style analysis similar to what is available in Traces Explorer, applied specifically to the SLO’s underlying metric labels.

It can be used as a stand-alone aggregation view to see which labels contribute most to bad events, and it also helps you decide which labels or values to add to your bad events breakdown in the graph above.

Labels distribution

Tip

You can sort the labels by how evenly or unevenly values are distributed, or alphabetically by name (A–Z or Z–A).

Labels distribution allows you to:

  • See how each label and its values contribute to bad events within the selected time window.
  • Quickly identify the highest-impact contributors.
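The distribution panel's percentages can be sketched as a per-value share of bad events inside the window. The event shape and function name are illustrative assumptions:

```python
from collections import Counter

def label_distribution(events, label):
    """Share (%) of bad events per value of one label, in the window.

    events: iterable of (timestamp, labels_dict) tuples (assumed shape).
    Returns percentages ordered from highest to lowest contributor.
    """
    counts = Counter(e[1].get(label) for e in events)
    total = sum(counts.values()) or 1  # avoid division by zero
    return {value: round(100 * count / total, 1)
            for value, count in counts.most_common()}
```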

Time window behavior

When you change the selected time window:

  • The distribution recalculates.
  • Percentages change to reflect only the updated subset of bad events.
  • Some values may disappear (if they had no events in the window).
  • Highest contributors may change, depending on what occurred inside the new window.

Performance

The Performance tab visualizes SLO behavior using recording-rule based data, ensuring fast and scalable performance even over long time ranges.

Performance tab

Visualize SLO performance metrics

Performance charts show how your SLO is trending over time. These visualizations help you understand whether you're within budget, approaching a threshold, or burning error budget too quickly.
| Chart | Description |
|---|---|
| Remaining error budget over time | Tracks the available error budget over time, calculating the remaining budget based on the SLO time frame. For example, in a 14-day SLO, each point represents the remaining error budget over the past 14 days (from the selected point). |
| Compliance over time | SLI performance measured at each time step in the graph. Helps spot patterns without relying on a rolling average. |
| Burn rate over time | A multiplier that shows how fast the error budget is being consumed at each time step in the graph. High spikes indicate instability. |
| Bad and good windows | Counts of good vs. bad events at each time step in the graph. Useful for pinpointing failure surges. |
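The burn rate and remaining-budget charts follow the standard SRE formulas (observed error rate divided by the allowed error rate). The sketch below is illustrative of that math, not the product's exact implementation:

```python
def burn_rate(bad, total, target):
    """Burn rate: observed error rate over the allowed error rate.

    A value of 1.0 means the budget is consumed exactly at the rate
    the SLO permits over its time frame; higher values burn faster.
    """
    allowed = 1.0 - target  # error budget as a fraction of events
    return (bad / total) / allowed

def remaining_error_budget(bad, total, target):
    """Fraction of error budget left over the SLO time frame (0..1)."""
    allowed = 1.0 - target
    return max(0.0, 1.0 - (bad / total) / allowed)
```

For a 99% target, 1 bad event out of 1,000 uses one tenth of the budget, so the burn rate is 0.1 and 90% of the budget remains.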

The Performance tab allows you to:

  • Understand whether the service is trending toward an SLO breach.
  • Identify when failures or degradation started increasing.
  • Detect patterns of instability or recurring performance issues.

Monitor specific permutations

For grouped SLOs, all permutations are displayed. Triggered alert events per permutation appear in the Alerts column. Select the bell icon to see the alerts list and the latest triggered alert event within the Watch Data UI.

Troubleshooting missing or partial data in permutations

If a specific permutation of an SLO shows missing or partial data in the burn rate graph, it is often due to inconsistent data ingestion—that is, the underlying time series is not continuously reporting.

This situation invalidates the burn rate calculation for that permutation during the affected window.

How to investigate

  1. Identify the SLI query used in the SLO definition (good and bad event expressions).

  2. Locate the SLO time window (for example, 7 or 28 days).

  3. In Grafana, paste the SLI query and apply the same time range as defined in the SLO.

Inspect the graph

If you see gaps or intermittent data, the time series is inconsistent. This confirms the issue and invalidates burn rate metrics for that time period.
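The gap check described above can be sketched as a scan over consecutive sample timestamps. The tolerance multiplier and function name are illustrative assumptions:

```python
def find_gaps(timestamps, scrape_interval, tolerance=1.5):
    """Flag reporting gaps that invalidate burn-rate calculations.

    timestamps: sorted sample times (epoch seconds, assumed shape).
    Reports (previous, next) pairs of consecutive samples spaced more
    than `tolerance` scrape intervals apart.
    """
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > tolerance * scrape_interval:
            gaps.append((prev, cur))
    return gaps
```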

Resolution options

  • Fix instrumentation issues causing gaps in reporting to restore valid SLO evaluation.

  • Treat the permutation as intermittent (for example, a scheduled or bursty service):

    • Recognize that burn rate is not applicable for such patterns.

    • Avoid assigning burn rate alerts to intermittent permutations.

  • As a best practice, split SLOs between continuous and noncontinuous permutations to preserve alert accuracy and meaningful budget tracking.