Using the Underlying SLO Metrics
When the SLO center begins calculating your SLO statuses, it stores the results of those calculations using Recording Rules. This speeds up UI loading and minimises the volume of queries issued to customer-backed cloud object storage. It also has the added advantage of making the values that power the SLO center available anywhere in the platform.
The metrics written by the SLO center
The SLO center writes a set of metrics via Recording Rules. These metrics appear as Prometheus time series that can be used in custom dashboards, alerts, the visual explorer, and more.
Metric Name | Description | Valid Values | SLO Types |
---|---|---|---|
`cx:slo:request:total_events` | A count of the total events (the result of the "all events" query in the SLO center UI) | 0 or greater | Request-based |
`cx:slo:request:good_events` | A count of the good events (the result of the "good events" query in the SLO center UI) | 0 or greater | Request-based |
`cx:slo:target` | The target compliance for the SLO. For example, if the target is set to 95%, the value is 95. | 0 - 100 | Request-based, Window-based |
`cx:slo:statuses:hourly` | A numerical representation of the SLO status, calculated on an hourly aggregation (see the mapping of values to statuses below) | 0 - 4 | Request-based, Window-based |
`cx:slo:window` | The result of the evaluation for each time window (whether or not the window was successful) | 0 - 1 | Window-based |
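For a request-based SLO, the first three metrics can be combined to check compliance against the target. The sketch below assumes that compliance is the percentage of good events among all events, on the same 0-100 scale as `cx:slo:target`; the SLO center's exact formula is not stated here, and the sample numbers are invented for illustration.

```python
# Minimal sketch, assuming compliance = good_events / total_events * 100,
# on the same 0-100 scale as cx:slo:target. Sample values are hypothetical.


def compliance_percent(good_events: float, total_events: float) -> float:
    """Return the share of good events as a percentage (0-100)."""
    if total_events <= 0:
        # Assumption: no traffic counts as fully compliant instead of dividing by zero.
        return 100.0
    return good_events / total_events * 100.0


def meets_target(good_events: float, total_events: float, target: float) -> bool:
    """True if compliance is at or above the target (e.g. 95 for a 95% target)."""
    return compliance_percent(good_events, total_events) >= target


if __name__ == "__main__":
    # Hypothetical readings of cx:slo:request:good_events,
    # cx:slo:request:total_events and cx:slo:target for one SLO.
    good, total, target = 9_620, 10_000, 95
    print(f"compliance={compliance_percent(good, total):.2f}% target={target}%")
    print("meets target:", meets_target(good, total, target))
```

How a zero-traffic period should count is a policy choice; the sketch picks "compliant" only so the function never divides by zero.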
Are total and good events counter or gauge metrics?
A counter metric only ever increases (apart from resets to zero), whereas a gauge can go up or down. For `cx:slo:request:total_events` and `cx:slo:request:good_events`, the type depends on the underlying query. For example, if `cx:slo:request:total_events` is defined as `sum(calls_total) by (country)` and `calls_total` is a counter, the resulting metric is also a counter. However, if the query applies `increase()`, for example `sum(increase(calls_total[5m])) by (country)`, the result is the change over each window rather than a running total, so it behaves as a gauge.
As a result, the metric type can differ from one SLO to another, depending on the user-defined queries that power the SLI calculations.
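The difference can be illustrated with plain numbers. This is a simplified sketch: the sample readings are hypothetical, and `per_step_increase` only takes differences between consecutive samples, whereas PromQL's `increase()` also extrapolates to the window boundaries and compensates for counter resets.

```python
# Simplified sketch: a cumulative counter never decreases, but the per-window
# increase derived from it can go up and down, so it behaves like a gauge.
# per_step_increase is a hypothetical helper, not PromQL's increase().


def is_monotonic_non_decreasing(values: list[float]) -> bool:
    """True if no sample is lower than the one before it (counter-like)."""
    return all(b >= a for a, b in zip(values, values[1:]))


def per_step_increase(values: list[float]) -> list[float]:
    """Difference between consecutive samples, a rough stand-in for increase()."""
    return [b - a for a, b in zip(values, values[1:])]


if __name__ == "__main__":
    calls_total = [100, 130, 190, 200, 260]  # hypothetical cumulative counter samples
    deltas = per_step_increase(calls_total)
    print("counter monotonic:", is_monotonic_non_decreasing(calls_total))
    print("deltas:", deltas, "monotonic:", is_monotonic_non_decreasing(deltas))
```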
How to figure out which metric belongs to which SLO
Every metric is labelled with `slo_id`, whose value is the ID of the underlying SLO. You can take this ID from the URL when viewing the SLO in the user interface.
When working with the metrics for a given SLO, filter the PromQL query on the relevant `slo_id`, for example `cx:slo:target{slo_id="<your SLO ID>"}`. This captures only the values that belong to that SLO, so almost every query will need this filter.
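When generating dashboards or alerts programmatically, the filtered selectors might be built as in the sketch below. The helper names and the `my-slo-id` value are hypothetical placeholders; the alert expression relies only on the status values listed in the status table of this article, where 0 is Breached and 1 is Critical.

```python
# Minimal sketch: build PromQL selectors filtered by slo_id.
# Helper names and the slo_id value are hypothetical placeholders.


def escape_label_value(value: str) -> str:
    """Escape backslashes and double quotes for a double-quoted PromQL string."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def slo_selector(metric: str, slo_id: str) -> str:
    """Return a selector such as cx:slo:target{slo_id="abc"}."""
    return f'{metric}{{slo_id="{escape_label_value(slo_id)}"}}'


def at_risk_alert_expr(slo_id: str) -> str:
    """PromQL that returns a series only while the hourly status is Critical (1) or Breached (0)."""
    return slo_selector("cx:slo:statuses:hourly", slo_id) + " <= 1"


if __name__ == "__main__":
    slo_id = "my-slo-id"  # hypothetical; copy the real ID from the SLO's URL
    print(slo_selector("cx:slo:target", slo_id))
    print(at_risk_alert_expr(slo_id))
```

A comparison without the `bool` modifier filters series rather than returning 0/1, so the alert expression produces results only while the SLO is in one of the two worst states.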
Where to find the underlying SLO metrics
SLO metrics are defined using Recording Rules, which you can access under Data Flow in the menu. Each SLO defines its own Recording Rule group, which bundles together all of the metrics for that SLO.
Note that, as indicated in the table above, the metrics available depend on the SLO type.
Mapping SLO status values to descriptions
The following table explains the SLO statuses and their numerical representations:
Value | Name | Description |
---|---|---|
0 | Breached | There is no error budget remaining, and the overall SLO has been breached. |
1 | Critical | There is less than 25% of the error budget remaining. |
2 & 3 | Warning | There is between 25% and 75% of the error budget remaining. |
4 | OK | There is more than 75% of the error budget remaining. |
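The sketch below decodes `cx:slo:statuses:hourly` values using the table above and classifies a remaining error budget with the same thresholds. Two parts are assumptions: how the exact boundaries (0%, 25%, 75%) are treated, and `remaining_budget_pct`, which uses a common error-budget formula (the share of allowed bad events not yet consumed) rather than a formula confirmed for the SLO center.

```python
# Minimal sketch. STATUS_NAMES follows the status table; boundary handling in
# status_from_budget and the remaining_budget_pct formula are assumptions.

STATUS_NAMES = {0: "Breached", 1: "Critical", 2: "Warning", 3: "Warning", 4: "OK"}


def status_name(value: float) -> str:
    """Translate a status value (int or float, e.g. 4.0) into its name."""
    try:
        return STATUS_NAMES[value]  # 4.0 hashes and compares equal to 4
    except KeyError:
        raise ValueError(f"unknown SLO status value: {value!r}") from None


def status_from_budget(remaining_pct: float) -> str:
    """Classify the remaining error budget (0-100) into a status name."""
    if remaining_pct <= 0:
        return "Breached"
    if remaining_pct < 25:
        return "Critical"
    if remaining_pct <= 75:
        return "Warning"
    return "OK"


def remaining_budget_pct(good: float, total: float, target: float) -> float:
    """Assumed formula: percentage of allowed bad events not yet consumed."""
    allowed_bad = (100 - target) / 100 * total
    actual_bad = total - good
    if allowed_bad <= 0:
        # A 100% target (or no traffic) allows no bad events at all.
        return 100.0 if actual_bad <= 0 else 0.0
    return max(0.0, (1 - actual_bad / allowed_bad) * 100)


if __name__ == "__main__":
    print(status_name(4.0))
    remaining = remaining_budget_pct(good=9_850, total=10_000, target=95)
    print(f"{remaining:.1f}% budget left -> {status_from_budget(remaining)}")
```

Because Prometheus sample values are floating point, `status_name` accepts values such as `4.0` as well as plain integers.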