Recommended Configurations
Use this guide to configure Span Metrics for production environments, optimize performance, and set up features such as sampling, latency buckets, metric expiration, and Serverless environments.
Use span sampling
Span Metrics works best with a minimal level of span sampling (head or tail), which keeps enough trace examples available to power trace-based features in the UI.
Why sampling matters
While Span Metrics does not require trace sampling to be configured in any specific way, we recommend using a minimal sampling rate, either head or tail sampling, to ensure that enough trace examples remain available for UI features that rely on actual traces. These include service maps, drilldowns, Coralogix traces explorer, Spans Highlights, and more. Span Metrics will still provide complete RED metrics even with sampling, but maintaining a small set of representative traces significantly improves the troubleshooting experience.
By default, head sampling is enabled in our latest Helm chart. This setting can be changed either directly in the values.yaml or from the UI.
To learn more, refer to the head sampling and tail sampling documentation.
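One way to keep RED metrics complete while still sampling traces is to compute Span Metrics before the sampler runs. The following is a minimal sketch for a plain OpenTelemetry Collector, not the configuration generated by the Helm chart. It assumes that the upstream spanmetrics and forward connectors and the probabilistic_sampler processor are available in your Collector distribution. The 10% rate and the debug exporter are placeholders.

```yaml
receivers:
  otlp:
    protocols:
      grpc:

connectors:
  spanmetrics:        # generates RED metrics from every received span
  forward/sampled:    # hands the same spans to the sampled pipeline

processors:
  probabilistic_sampler:
    sampling_percentage: 10   # illustrative rate, tune for your traffic

exporters:
  debug:              # placeholder, replace with your real exporter

service:
  pipelines:
    # Unsampled spans feed Span Metrics and the forward connector.
    traces:
      receivers: [otlp]
      exporters: [spanmetrics, forward/sampled]
    # Only a sample of the spans is exported as trace examples.
    traces/sampled:
      receivers: [forward/sampled]
      processors: [probabilistic_sampler]
      exporters: [debug]
    # Metrics produced by the spanmetrics connector.
    metrics:
      receivers: [spanmetrics]
      exporters: [debug]
```

With this layout the sampler affects only the traces that are exported, while the metrics are still derived from all spans.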
Manage cardinality
Use the following methods to reduce or control cardinality:
- Normalize span names and URLs.
- Sanitize database statements.
- Remove high-cardinality resource attributes.
- Set aggregationCardinalityLimit and metricsExpiration to protect the pipeline.
For more details, see Span Metrics Cardinality Limiting.
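The methods above can be sketched as Collector configuration. This example assumes the upstream transform processor and spanmetrics connector. The regular expression, the k8s.pod.uid attribute, and the limit of 10000 are illustrative choices, and the Helm value aggregationCardinalityLimit is assumed to correspond to the connector's aggregation_cardinality_limit field.

```yaml
processors:
  transform/normalize:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # Collapse numeric path segments so /users/123 and /users/456 share one span name.
          - 'replace_pattern(name, "/[0-9]+", "/{id}")'
      - context: resource
        statements:
          # Drop a per-pod identifier (hypothetical choice) before it can become a metric label.
          - 'delete_key(attributes, "k8s.pod.uid")'

connectors:
  spanmetrics:
    # Cap the number of unique dimension combinations that are tracked (illustrative value).
    aggregation_cardinality_limit: 10000
```

For the normalization to take effect, the transform processor must be part of the traces pipeline that exports to the spanmetrics connector.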
Manage histogram metric buckets
Histogram buckets define how latency distributions are computed for percentiles, SLOs, and Apdex scoring. Span Metrics does not include default buckets, so you must configure them manually.
Configure collector buckets
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [100us, 1ms, 2ms, 4ms, 6ms, 10ms, 100ms, 250ms]
Select thresholds that reflect the latency behavior of your workloads.
Apdex settings
For Apdex scoring, your bucket list must include:
- the Apdex threshold (T)
- the tolerated threshold (4T)
In the example above, the Apdex threshold can be set to 1ms because both T (1ms) and 4T (4ms) are specified.
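The reason both boundaries are needed becomes clear from the standard Apdex definition, shown here as a sketch. It assumes cumulative histogram counts, where c(x) is the number of spans with a duration of at most x and N is the total number of spans. How the score is computed internally by the platform is not stated in this guide.

```latex
\[
\mathrm{Apdex}_T
  = \frac{N_{\text{satisfied}} + \tfrac{1}{2}\, N_{\text{tolerating}}}{N}
  = \frac{c(T) + \tfrac{1}{2}\bigl(c(4T) - c(T)\bigr)}{N}
\]

% Worked example with T = 1ms, N = 1000, c(1ms) = 700, c(4ms) = 900:
\[
\mathrm{Apdex}_{1\,\mathrm{ms}}
  = \frac{700 + \tfrac{1}{2}(900 - 700)}{1000}
  = 0.8
\]
```

Both c(T) and c(4T) can be read directly from the histogram only if T and 4T are bucket boundaries.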
Modifying buckets used by an existing SLO or Apdex score
If you modify bucket configuration for metrics already used in SLOs or Apdex scoring:
- The active calculation stops.
- The UI shows a bucket mismatch error.
- You must update the SLO/Apdex definition to match the new buckets.
- The calculation restarts as a new SLO.
Selecting buckets
Best practices:
- Include both T and 4T.
- Analyze your latency distribution (minimum, median, maximum).
- Add intermediate buckets for meaningful granularity.
- Do not add buckets for outlier latencies unless they occur frequently.
- Keep bucket lists consistent across all collectors and clusters.
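As an illustration of these practices, the following sketch shows a bucket list for a hypothetical service with a median latency of about 50ms and an Apdex threshold of 100ms. All values are assumptions chosen for the example, not recommendations.

```yaml
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets:
          - 10ms    # fast requests, well below the median
          - 25ms
          - 50ms    # around the median
          - 100ms   # T: Apdex threshold
          - 200ms   # intermediate granularity between T and 4T
          - 400ms   # 4T: tolerated threshold
          - 1s      # slow requests that occur regularly
          - 2s
```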
Configure different buckets per application (Kubernetes only)
Use the spanMetricsMulti preset to define bucket sets per service.
presets:
  spanMetricsMulti:
    enabled: false  # set to true to activate the preset
    defaultHistogramBuckets: [1ms, 4ms, 10ms, 20ms, 50ms, 100ms, 200ms, 500ms, 1s, 2s, 5s]
    configs:
      - selector: route() where attributes["service.name"] == "one"
        histogramBuckets: [1s, 2s]
      - selector: route() where attributes["service.name"] == "two"
        histogramBuckets: [5s, 10s]
Each selector requires an OTTL (OpenTelemetry Transformation Language) statement. For details, see the OTTL documentation.
Enable spanMetricsMulti in Helm
Use this mode to define histogram buckets per service. Keep a broad default bucket definition (defaultHistogramBuckets) that covers the whole system, and use the per-service entries under configs where more detailed metrics are needed.
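A minimal sketch of an enabled preset in values.yaml might look like the following. The service names checkout and batch-export and their bucket values are hypothetical, and the exact nesting of the presets key depends on the chart you deploy.

```yaml
presets:
  spanMetricsMulti:
    enabled: true
    # Broad default that covers services without a dedicated entry.
    defaultHistogramBuckets: [1ms, 4ms, 10ms, 20ms, 50ms, 100ms, 200ms, 500ms, 1s, 2s, 5s]
    configs:
      # Latency-sensitive service: T = 20ms, 4T = 80ms.
      - selector: route() where attributes["service.name"] == "checkout"
        histogramBuckets: [5ms, 20ms, 80ms, 250ms, 1s]
      # Long-running service: T = 1s, 4T = 4s.
      - selector: route() where attributes["service.name"] == "batch-export"
        histogramBuckets: [500ms, 1s, 2s, 4s, 10s, 30s]
```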
Note
If you use multiple collectors, ensure that the bucket configuration is consistent across all of them.
Set metric expiration
metricsExpiration removes inactive metric series after a defined window to control memory consumption.
Retention behavior
- Metric series are stored only in memory.
- All series reset on Collector or pod restart.
- If a service stops sending spans for the duration of the expiration period, its series are removed automatically.
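At the Collector level, this behavior can be sketched as follows. The example assumes that the metricsExpiration setting corresponds to the metrics_expiration field of the upstream spanmetrics connector, and the 5m window is an illustrative value.

```yaml
connectors:
  spanmetrics:
    # Remove a series once no matching spans have arrived for 5 minutes.
    # In the upstream connector, a value of 0 disables expiration.
    metrics_expiration: 5m
```

A shorter window frees memory sooner, but series for services with infrequent traffic are removed and recreated more often.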
Use resource key attributes
Use resource_metrics_key_attributes to group metric series at the resource level.
This stabilizes metric grouping across services, environments, clusters, and tenants.
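A minimal sketch for the upstream spanmetrics connector is shown below. The chosen attributes are an assumption. Pick the ones that identify a service consistently in your environment.

```yaml
connectors:
  spanmetrics:
    # Only these resource attributes are used to group metric series,
    # so changes to other resource attributes do not create new series.
    resource_metrics_key_attributes:
      - service.name
      - telemetry.sdk.language
      - telemetry.sdk.name
```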
Use exemplars
Exemplars link metrics to the traces that generated them, enabling seamless navigation from metric panels to the corresponding span data. This improves issue investigation and helps correlate latency spikes or error rates with specific requests.
Exemplars are enabled by default in most configurations, but you can disable them if your environment requires it.
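For a Collector that uses the upstream spanmetrics connector, disabling exemplars might look like this sketch. The equivalent Helm chart value may be named differently, so treat the key placement as an assumption.

```yaml
connectors:
  spanmetrics:
    exemplars:
      # Stop attaching trace references to the generated metrics.
      enabled: false
```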
Configure Serverless environments
The Service Catalog works with both Span Metrics and Events2Metrics, provided that the setup instructions in the documentation are followed.
To display AWS Lambda-based services in the Service Catalog, your organization must send spans or Span Metrics. With Events2Metrics (E2M), this is done automatically. For Span Metrics, ensure that Lambda-generated spans are routed through the collector.
Note
This ensures that Lambda-based services, along with their metrics, are displayed in the Service Catalog. The Serverless catalog is supported only with Events2Metrics.
Next steps
Explore the Span Metrics Cardinality Limiting guide to learn about the three-layered approach to manage and mitigate cardinality issues.