Recommended Configurations

Use this guide to configure Span Metrics for production environments, optimize performance, and set up features such as sampling, latency buckets, metric expiration, Serverless environments, and more.

Use span sampling

Span Metrics works best with a minimal level of span sampling (head or tail), which keeps enough trace examples available to power trace-based features in the UI.

Why sampling matters

While Span Metrics does not require trace sampling to be configured in any specific way, we recommend using a minimal sampling rate, either head or tail sampling, to ensure that enough trace examples remain available for UI features that rely on actual traces. These include service maps, drilldowns, Coralogix traces explorer, Spans Highlights, and more. Span Metrics will still provide complete RED metrics even with sampling, but maintaining a small set of representative traces significantly improves the troubleshooting experience.

By default, head sampling is enabled in our latest Helm chart. This setting can be changed either directly in the values.yaml or from the UI.
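
To illustrate how complete RED metrics and sampled traces can coexist, here is a minimal sketch in plain upstream OpenTelemetry Collector configuration, not the Helm chart's own settings. It assumes an otlp receiver, the probabilistic_sampler processor, and a debug exporter standing in for your real exporter. The pipeline names and the 10% rate are hypothetical. The spanmetrics connector is fed by an unsampled pipeline, and only the trace export path is sampled.

```yaml
receivers:
  otlp:
    protocols:
      grpc:

processors:
  # Head sampling applied only to the trace export path (rate is illustrative).
  probabilistic_sampler:
    sampling_percentage: 10

connectors:
  spanmetrics: {}

exporters:
  # Placeholder exporter; replace with the exporter you actually use.
  debug:

service:
  pipelines:
    # All spans reach the connector, so the generated metrics stay complete.
    traces/unsampled:
      receivers: [otlp]
      exporters: [spanmetrics]
    # Only a sample of spans is exported as traces.
    traces/sampled:
      receivers: [otlp]
      processors: [probabilistic_sampler]
      exporters: [debug]
    # Metrics produced by the connector.
    metrics:
      receivers: [spanmetrics]
      exporters: [debug]
```

If the sampler ran in the same pipeline that feeds the connector, the metrics would be computed from sampled spans only, which is why the sketch separates the two paths.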

To learn more, refer to the head sampling and tail sampling documentation.

Manage cardinality

Use the following methods to reduce or control cardinality:

Normalize span names and URLs.
Sanitize database statements.
Remove high-cardinality resource attributes.
Set aggregationCardinalityLimit and metricsExpiration to protect the pipeline.

For more details, see Span Metrics Cardinality Limiting.
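
As a rough sketch of the methods listed above, the fragment below uses upstream OpenTelemetry Collector components. It assumes the transform processor is available in your Collector build, and that the upstream snake_case options aggregation_cardinality_limit and metrics_expiration correspond to the aggregationCardinalityLimit and metricsExpiration settings named above. That mapping is an assumption, and the URL pattern and the limit of 1000 are hypothetical values.

```yaml
processors:
  # Normalize span names so that IDs in paths do not create new series.
  transform/normalize:
    trace_statements:
      - context: span
        statements:
          # Hypothetical pattern: /users/123 becomes /users/{id}
          - replace_pattern(name, "/users/[0-9]+", "/users/{id}")

connectors:
  spanmetrics:
    # Cap the number of unique dimension combinations tracked (illustrative value).
    aggregation_cardinality_limit: 1000
    # Drop series that have received no spans for this long.
    metrics_expiration: 5m
```

The processor only takes effect if it is listed in the traces pipeline that feeds the spanmetrics connector.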

Manage histogram metric buckets

Histogram buckets define how latency distributions are computed for percentiles, SLOs, and Apdex scoring. Span Metrics does not include default buckets, so you must configure them manually.

Configure collector buckets

connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [100us, 1ms, 2ms, 4ms, 6ms, 10ms, 100ms, 250ms]

Select thresholds that reflect the latency behavior of your workloads.

Apdex settings

For Apdex scoring, your bucket list must include:

the Apdex threshold (T)
the tolerated threshold (4T)

In the example above, the Apdex threshold can be set to 1ms because both T (1ms) and 4T (4ms) are specified.
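
As a second worked example, assume a hypothetical service with an Apdex threshold of T = 100ms. The tolerated threshold is then 4T = 400ms, so both 100ms and 400ms must appear as bucket boundaries. The remaining boundaries are illustrative only.

```yaml
connectors:
  spanmetrics:
    histogram:
      explicit:
        # T = 100ms and 4T = 400ms are both present, so Apdex can use T = 100ms.
        buckets: [10ms, 50ms, 100ms, 200ms, 400ms, 1s, 5s]
```

With this list, a threshold of 50ms would not work because 4T (200ms) is present but a threshold of 200ms would fail: its 4T value (800ms) is not a bucket boundary.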

Modifying buckets used by existing SLOs or Apdex

If you modify bucket configuration for metrics already used in SLOs or Apdex scoring:

The active calculation stops.
The UI shows a bucket mismatch error.
You must update the SLO/Apdex definition to match the new buckets.
The calculation restarts as a new SLO.

Selecting buckets

Best practices:

Include both T and 4T.
Analyze your latency distribution (minimum, median, maximum).
Add intermediate buckets for meaningful granularity.
Do not add buckets for outlier latencies unless those latencies occur frequently.
Keep bucket lists consistent across all collectors and clusters.

Configure different buckets per application (Kubernetes only)

Use the spanMetricsMulti preset to define bucket sets per service.

presets:
  spanMetricsMulti:
    enabled: false  # set to true to activate this preset
    defaultHistogramBuckets: [1ms, 4ms, 10ms, 20ms, 50ms, 100ms, 200ms, 500ms, 1s, 2s, 5s]
    configs:
      - selector: route() where attributes["service.name"] == "one"
        histogramBuckets: [1s, 2s]
      - selector: route() where attributes["service.name"] == "two"
        histogramBuckets: [5s, 10s]

Each selector requires an OTTL (OpenTelemetry Transformation Language) statement. See the OTTL documentation for details.

Enable spanMetricsMulti in Helm

Use this mode to define histogram buckets for each service. We recommend keeping a broad default bucket definition that covers the whole system and using spanMetricsMulti to add more detailed per-service buckets. Disable spanMetrics when you enable spanMetricsMulti:

spanMetrics:
  enabled: false

spanMetricsMulti:
  enabled: true

Note

If you use multiple collectors, ensure that the bucket configuration is consistent across all of them.

Set metric expiration

metricsExpiration removes inactive metric series after a defined window to control memory consumption.

metricsExpiration: 5m

Retention behavior

Metric series are stored only in memory.
All series reset on Collector or pod restart.
If a service stops sending spans for the duration of the expiration period, its series are removed automatically.

Use resource key attributes

Use resource_metrics_key_attributes to group metric series at the resource level.

This stabilizes metric grouping across services, environments, clusters, and tenants.

resource_metrics_key_attributes:
  - service.name
  - service.namespace
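
For context, in the upstream OpenTelemetry spanmetrics connector this option sits directly under the connector definition. The sketch below assumes that upstream layout; where the setting goes in your Helm values may differ. Only the listed attributes are used to build the resource-level grouping key, so changes to other resource attributes (a process ID, for example) do not start a new set of series.

```yaml
connectors:
  spanmetrics:
    # Only these resource attributes contribute to the resource grouping key.
    resource_metrics_key_attributes:
      - service.name
      - service.namespace
```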

Use exemplars

Exemplars link metrics to the traces that generated them, enabling seamless navigation from metric panels to the corresponding span data. This improves issue investigation and helps correlate latency spikes or error rates with specific requests.

Exemplars are enabled by default in most configurations. The following block shows the setting; change enabled to false to disable exemplars in your environment:

opentelemetry-agent:
  extraConfig:
    connectors:
      spanmetrics:
        exemplars:
          enabled: true

Configure Serverless environments

The Service Catalog works with both Span Metrics and Events2Metrics (E2M), as long as the setup instructions in the documentation are followed.

To display services based on AWS Lambda in the Service Catalog, your organization must send spans or Span Metrics. With Events2Metrics, this is done automatically. For Span Metrics, ensure that Lambda-generated spans are routed through the collector.

Note

This ensures that Lambda-based services, along with their metrics, are displayed in the Service Catalog. The Serverless catalog is supported only with Events2Metrics.

Next steps

Explore the Span Metrics Cardinality Limiting guide to learn about the three-layered approach to manage and mitigate cardinality issues.
