View and Manage Service Level Objectives
Overview
The SLO Center is the centralized dashboard for tracking and managing Service Level Objectives (SLOs) across your environment. It helps you stay on top of system reliability with clear visuals, real-time status, and actionable insights.
Use it to:
- Create and update SLO group definitions and associated alerts all in one place.
- Monitor SLO compliance, remaining error budgets, and performance trends at a glance.
- Filter and search SLOs by entity labels, status, and remaining budget.
- Drill down into individual SLOs to analyze grouped data, alerts, and triggered alert events.
- Quickly understand what is actually burning the SLO error budget.
Visualize your SLO performance trends
The SLO Center provides a high-level view of your system's health. Use the visualizations to explore your overall system compliance based on the statuses of all SLOs tracked over time.
Each SLO or SLO group is categorized based on its remaining error budget. These categories reflect system health and help teams prioritize response actions.
| Status | Remaining error budget (%) | Color | Description |
|---|---|---|---|
| OK | 75%–100% | 🟢 Green (healthy) | The SLO is within its defined target. The system is performing as expected. No action is required. |
| Warning | 25%–74% | 🟡 Yellow (monitor) | The SLO is approaching its threshold. There is a risk of breaching the target if performance does not improve. Users should monitor closely. |
| Critical | 1%–24% | 🟣 Purple (high risk) | The SLO is at high risk of breaching, with very little error budget remaining. Immediate attention is recommended. |
| Breached | 0% | 🔴 Red (failure) | The SLO has exceeded its defined target. This indicates failure to meet reliability expectations. Immediate action is required. |
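The thresholds in the table above amount to a simple mapping from remaining error budget to status. A minimal sketch in Python (the function name and exact boundary handling are illustrative assumptions, not the product's implementation):

```python
def slo_status(remaining_budget_pct: float) -> str:
    """Map a remaining-error-budget percentage to an SLO status.

    Thresholds follow the table above: OK (75-100%), Warning (25-74%),
    Critical (1-24%), Breached (0% or less).
    """
    if remaining_budget_pct >= 75:
        return "OK"
    if remaining_budget_pct >= 25:
        return "Warning"
    if remaining_budget_pct >= 1:
        return "Critical"
    return "Breached"
```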
Understand multiple statuses in a single row
If you see more than one status icon (for example, OK, Warning, Critical) in the SLO grid, it means the SLO was defined with a group by clause in the SLO setup. This creates a grouped SLO, where each unique value in the grouping field—such as service.name, region, or customer.id—becomes an individually tracked objective under the same SLO definition.
Each group is evaluated independently and may have a different health status, depending on its own compliance and remaining error budget. The SLO grid aggregates these statuses into a single row to give a high-level view.
Clicking into the row opens the SLO Details view, where you can drill down into specific permutations, view group-specific performance metrics, and monitor alerts per group.
Examine SLO groups
Scroll down to view the SLO grid, which lists all SLOs in your account.
Each column includes key indicators that help you evaluate performance at a glance.
| Parameter | Description |
|---|---|
| Name | The user-defined name of the SLO group. |
| Grouping key | The key (for example, service.name, customer.id) used to break the SLO group into multiple SLO permutations for individual tracking. |
| SLO status | Status of the permutation with the lowest remaining error budget. |
| Lowest remaining error budget | Percentage (%) of allowable errors left. For grouped SLOs, the information for the group with the lowest remaining budget is displayed. |
| Target | The SLO compliance target (for example, 99.9%). |
| Time frame | The SLO period (for example, 7 days, 28 days). |
| Alerts | Number of unique alert definitions that were triggered within the selected SLO time frame. This includes alerts of any status—triggered or acknowledged. Selecting the value redirects to Incidents, pre-filtered to show all incidents associated with those alert definitions for deeper investigation. |
| Creator | The user who created the SLO. |
| Entity labels | Custom labels assigned to the SLO at creation, used for filtering or grouping (for example, env:prod, team:payments). |
| Last updated | Timestamp of the most recent change to the SLO's configuration. |
| Permutations | Number of individual SLO permutations per SLO group. |
Note
Select any column header to sort the table—for example, sort by lowest remaining budget to quickly find the most at-risk SLOs.
SLO actions
You can take action on any SLO directly from the SLO Center.
1. Hover over the SLO row.
2. Select the ellipsis icon on the left.
3. Select an action:
| Action | Description |
|---|---|
| Edit | Update thresholds, queries, labels, and other configuration parameters. If any changes invalidate existing alerts, you’ll be prompted to modify or remove those alerts before saving. Historical data for the previous SLO configuration will no longer be retained. |
| Create an alert | Define alerting conditions based on the SLO’s logic to detect potential or actual violations. Alerts help teams respond proactively to reliability risks. |
| Delete | Remove obsolete SLOs to keep your view focused and accurate. When an SLO configuration is deleted, its historical data will no longer be available and associated alerts are deleted. |
Drill down into a specific SLO
Each row in the SLO Center represents an SLO group—a single SLO definition that may produce multiple permutations if grouping labels were used (for example, by service, region, namespace).
Selecting an SLO group opens its list of permutations, allowing you to drill down into each permutation and analyze it individually.
View SLO definitions and statistics
| Parameter | Description |
|---|---|
| SLO name | The name assigned to the SLO. |
| Selected permutation | Each SLO is evaluated per permutation. If the SLO is grouped, the displayed charts and statistics reflect only the currently selected permutation. |
| SLO status | Status of the selected SLO permutation (Critical, Warning, Breached, OK). |
| SLO target | The defined performance goal for the SLO, for example, 99% over the past 14 days. |
| Current compliance | The latest SLI value of the selected SLO permutation. |
| Error budget remaining | Shown as a progress bar (for example, 94%). |
| Alerts for this permutation | Hovering over this element reveals the alert names associated with this permutation. Selecting opens the SLO Alert Configurator, allowing you to view or edit the alert configuration. |
Selecting a permutation opens the SLO drawer, which is the primary workspace for analyzing a specific permutation. It contains two tabs:
- Highlights: Use it to identify which labels contribute to bad events.
- Performance: Use it to see the SLO performance trends over time.
The time picker is shared between both tabs.
Highlights
The Highlights tab is designed for quickly identifying what is driving SLO degradation by breaking down bad events into their top contributing dimensions. This tab allows you to:
- See what is driving your bad events
- Identify which labels or values contribute most
- Understand what changed inside the selected time range that caused degradation
This tab consists of three components, described in the following sections.
All events
This is the starting point of the Highlights workflow. The chart shows the total good and bad events over the selected period of time per permutation.
Inside the graph, you can select, drag, and resize a time window to focus the analysis, toggle between absolute counts and percentages of bad events, and view the query for the selected permutation.
Time window behavior
Selecting a time window defines the time range used for the Highlights analysis:
- Only bad events that occurred within that portion of the timeline are evaluated.
- The analysis focuses on what happened during that specific period.
- Bad events can then be broken down into multiple time series based on the underlying metrics and their label values observed during that timeframe.
Investigation view
The Investigation view visualizes how bad events behave inside the selected window. In its default state, without grouping or filtering, the graph shows a single time series of bad events and reflects exactly the bad-event counts inside the selection window.
This view allows you to:
- Quickly inspect the shape of bad events in the chosen range.
- Verify whether failures are bursty or gradual.
- Understand time-based context before grouping or filtering.
When you change the time window, the graph immediately updates.
Grouping and filtering
By default, Investigation view shows a single aggregated time series representing all bad events for the selected SLO and time window.
Grouping and filtering allow you to progressively refine this view and understand which dimensions and values are driving error budget burn.
Tip
You can sort the bad events breakdown by total (time series with the highest number of bad events across the selected time window) or by maximum (time series with the highest single peak of bad events within the selected time window), and limit the view to the Top 5, Top 10, or Top 20 contributors.
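The two sort modes differ only in the aggregation applied to each series before ranking. A sketch, with each series represented as a list of per-bucket bad-event counts (the series names and values are illustrative):

```python
# Per-bucket bad-event counts for two hypothetical series.
series = {
    "checkout": [1, 9, 1],   # one high peak, low otherwise
    "payments": [4, 4, 4],   # steady contributor
}

# Sort by total: rank by the sum of bad events across the window.
by_total = sorted(series, key=lambda s: sum(series[s]), reverse=True)

# Sort by maximum: rank by the highest single peak within the window.
by_maximum = sorted(series, key=lambda s: max(series[s]), reverse=True)
```

Note how the two orders can disagree: a steady contributor wins on total, while a bursty one wins on maximum.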
Group by – splitting the bad events
Adding a Group by label splits the single aggregated bad-events series into multiple series, one per combination of labels and values. Each series represents the contribution of one or more labels with their values to bad events over time.
Example 1:
If you group by k8s.namespace.name:
- The original single Bad events graph is split into multiple series, such as namespace = checkout, namespace = payments, and namespace = auth.
- Each line shows the bad events produced only by that namespace over the selected time window.
Example 2:
Adding an additional group-by label increases the granularity of the breakdown. Grouping by both k8s.namespace.name and status.code produces series such as checkout + 500, checkout + 503, and payments + 500.
This makes it easy to see which combinations of dimensions are responsible for most of the error budget burn.
Filters – removing noise from the analysis
Adding Filter by does not split the data further. Instead, it removes label values that are not relevant to the investigation, allowing you to focus only on meaningful contributors.
Example 1:
You group by status.code but want to exclude client errors:
- Add a filter: status.code != 4xx
- The graph now shows only server-side failures (for example, 500, 503), removing noisy but expected traffic.
Example 2:
- Group by k8s.namespace.name.
- Add a filter: namespace != load-generator.
- This removes synthetic or test traffic that would otherwise distort the analysis.
Time window behavior
The selected time window controls which bad events participate in the breakdown:
- Only label values that produced bad events within the selected window appear.
- Top contributors are recalculated when the window changes.
- Label values may appear or disappear as the window shifts.
Together, grouping breaks bad events into meaningful dimensions, and filters remove noise, enabling precise, window-focused root cause analysis.
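Conceptually, this workflow is a prune-then-split over the bad-event stream: filters discard irrelevant label values, and group-by buckets what remains into one series per label combination. A hedged Python sketch (the event shape and label names are illustrative, not the product's data model):

```python
from collections import defaultdict

# Illustrative bad events; each carries the labels of its source series.
bad_events = [
    {"namespace": "checkout", "status_code": "500"},
    {"namespace": "checkout", "status_code": "503"},
    {"namespace": "payments", "status_code": "500"},
    {"namespace": "load-generator", "status_code": "500"},
]

# Filter: drop label values that are noise for this investigation.
filtered = [e for e in bad_events if e["namespace"] != "load-generator"]

# Group by: split the remaining bad events into one count per label combination.
series = defaultdict(int)
for e in filtered:
    series[(e["namespace"], e["status_code"])] += 1
```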
Label distribution
This panel shows which labels and values contributed most to bad events inside the selected window, using a Highlights-style analysis similar to what is available in Traces Explorer, applied specifically to the SLO’s underlying metric labels.
It can be used as a stand-alone aggregation view to see which labels contribute most to bad events, and it also helps you decide which labels or values to add to your bad events breakdown in the graph above.
Tip
You can sort the labels by how evenly or unevenly values are distributed, or alphabetically by name (A–Z or Z–A).
The Label distribution panel allows you to:
- See how each label and its values contribute to bad events within the selected time window.
- Quickly identify the highest-impact contributors.
Time window behavior
When you change the selected time window:
- The distribution recalculates.
- Percentages change to reflect only the updated subset of bad events.
- Some values may disappear (if they had no events in the window).
- Highest contributors may change, depending on what occurred inside the new window.
Performance
The Performance tab visualizes SLO behavior using recording-rule based data, ensuring fast and scalable performance even over long time ranges.
Visualize SLO performance metrics
Performance charts show how your SLO is trending over time. These visualizations help you understand whether you're within budget, approaching a threshold, or burning error budget too quickly.
| Chart | Description |
|---|---|
| Remaining error budget over time | Tracks the available error budget over time, calculating the remaining budget based on the SLO time frame. For example, in a 14-day SLO, each point represents the remaining error budget over the past 14 days (from the selected point). |
| Compliance over time | SLI performance measured at each time step in the graph. Helps spot patterns without relying on a rolling average. |
| Burn rate over time | A multiplier that shows how fast the error budget is being consumed at each time step in the graph. High spikes indicate instability. |
| Bad and good windows | Counts of good vs. bad events at each time step in the graph. Useful for pinpointing failure surges. |
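The quantities behind these charts follow standard SLO arithmetic: the error budget is the allowed fraction of bad events (1 − target), and the burn rate is the observed error rate divided by that budget. A sketch, assuming an event-rate SLI (the helper names are illustrative):

```python
def burn_rate(bad: int, good: int, target: float) -> float:
    """How many times faster than 'sustainable' the error budget is burning.

    A burn rate of 1.0 consumes exactly the full budget over the SLO window;
    values above 1.0 mean the budget will be exhausted before the window ends.
    """
    error_budget = 1.0 - target              # allowed fraction of bad events
    observed_error_rate = bad / (good + bad)
    return observed_error_rate / error_budget

def remaining_error_budget(bad: int, good: int, target: float) -> float:
    """Percentage of the error budget still unconsumed over the window."""
    consumed = (bad / (good + bad)) / (1.0 - target)
    return max(0.0, 1.0 - consumed) * 100
```

For example, with a 99.9% target, 2 bad events out of 1,000 gives a burn rate of 2.0: the service is consuming its budget twice as fast as it can sustain.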
The Performance tab allows you to:
- Understand whether the service is trending toward an SLO breach.
- Identify when failures or degradation started increasing.
- Detect patterns of instability or recurring performance issues.
Monitor specific permutations
For grouped SLOs, all permutations are displayed. Triggered alert events per permutation appear in the Alerts column. Select the bell icon to see the alerts list and the latest triggered alert event within the Watch Data UI.
Troubleshooting missing or partial data in permutations
If a specific permutation of an SLO shows missing or partial data in the burn rate graph, it is often due to inconsistent data ingestion—that is, the underlying time series is not continuously reporting.
This situation invalidates the burn rate calculation for that permutation during the affected window.
How to investigate
1. Identify the SLI query used in the SLO definition (good and bad event expressions).
2. Locate the SLO time window (for example, 7 or 28 days).
3. In Grafana, paste the SLI query and apply the same time range as defined in the SLO.
4. Inspect the graph. If you see gaps or intermittent data, the time series is inconsistent. This confirms the issue and invalidates burn rate metrics for that time period.
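If you prefer to check programmatically, gaps can be detected by comparing consecutive sample timestamps against the expected reporting interval. A minimal sketch (the 60-second default interval is an assumption; use your actual scrape or reporting interval):

```python
def find_gaps(timestamps: list[int], max_interval: int = 60) -> list[tuple[int, int]]:
    """Return (start, end) timestamp pairs where consecutive samples are
    farther apart than max_interval seconds, indicating missing data."""
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev > max_interval:
            gaps.append((prev, curr))
    return gaps
```

A non-empty result over the SLO window suggests the permutation's series is intermittent, which invalidates burn rate for the affected period.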
Resolution options
- Fix instrumentation issues causing gaps in reporting to restore valid SLO evaluation.
- Treat the permutation as intermittent (for example, a scheduled or bursty service): recognize that burn rate is not applicable for such patterns, and avoid assigning burn rate alerts to intermittent permutations.

As a best practice, split SLOs between continuous and noncontinuous permutations to preserve alert accuracy and meaningful budget tracking.






