OpenTelemetry Span Metrics
What is Span Metrics?
Span Metrics offers an automated method of transforming and aggregating trace data into metrics outside Coralogix using the OpenTelemetry Span Metrics Connector. By sending the metrics to Coralogix, you can utilize our cutting-edge APM features, save on costs, and gain comprehensive insights into your data.
Span Metrics is currently available in beta for early adopters.
This document provides a step-by-step guide to setting up Span Metrics for your APM:
- Configuring the collector for Span Metrics
- Enabling tail sampling for optimized traces
- Defining buckets
- Switching APM UI from Events2Metrics to Span Metrics (only after configuring the collector)
- Troubleshooting
Benefits
Use Span Metrics for any of the following:
- Cost savings. Sending Span Metrics reduces the volume and frequency of the data sent, helping you cut costs dramatically when compared to sending us 100% of your spans. Compare and contrast our data pipelines, as detailed here.
- Easy startup. Span Metrics is particularly valuable when your system lacks traditional metrics but implements distributed tracing. It allows you to obtain metrics from your tracing pipeline without additional setup.
- Comprehensive insights. Even if your system is already equipped with metrics, leveraging Span Metrics can offer a deeper level of monitoring. The generated metrics provide insights at the application level, showing how tracing information propagates throughout your applications.
- Secure migration. Easily migrate from the Events2Metrics to Span Metrics data pipeline, while retaining E2M data during a defined retention period.
Span Metrics Generation
For customers transitioning from Events2Metrics (E2M) to Span Metrics, the default method is maintaining both Span Metrics and E2M. This allows E2M-based metrics to be generated alongside Span Metrics, enabling a fallback if necessary. Costs will be incurred for both pipelines. The user can update the default to a single method at any stage.
Collector configuration
Before configuring Span Metrics, consider the following key points.
- Data visibility: You cannot view Events2Metrics and Span Metrics data simultaneously in the UI. If both Span Metrics and E2M data are sent, use the API commands provided in this document to toggle between the two methods.
- Metrics dimensions: Using Span Metrics allows integration of more dimensions directly from the collector. Remember that each dimension counts toward your quota. We recommend incorporating only essential dimensions into the collector configuration. For optimal performance, limit the total number of permutations to 300,000 per metric during any selected time frame.
- SLO and Apdex: These settings are not automatically migrated when transitioning from Events2Metrics to Span Metrics. Define the buckets that represent your latency thresholds during the Span Metrics setup. Then, create the actual SLO and Apdex per service within the Service Catalog UI.
- Adjustments in Grafana, Alerts, and Custom Dashboards: After migrating from Events2Metrics (E2M) to Span Metrics, if E2M metrics are no longer being generated, any custom dashboards, Grafana alerts, or other configurations that previously relied on Service Catalog E2M metrics must be updated to use Span Metrics instead.
Updating the collector
Update the collector either manually (using your own OpenTelemetry) or via Helm (using the Kubernetes extension). See the relevant sections below.
Creating Span Metrics with the Kubernetes extension for OTel
Enabling Span Metrics
- If you have not yet done so, deploy the Coralogix Kubernetes extension package. Navigate to Data Flow > Extensions > Kubernetes from your Coralogix toolbar.
- Manually upgrade the Helm chart used with your Kubernetes integration to its latest version to enable the creation of Span Metrics. Span Metrics is disabled by default and can be enabled by setting `spanMetrics.enabled` to `true` in the values.yaml file:
```yaml
spanMetrics:
  enabled: true
  collectionInterval: "{{.Values.global.collectionInterval}}"
  metricsExpiration: 5m
  histogramBuckets: [1ms, 4ms, 10ms, 20ms, 50ms, 100ms, 200ms, 500ms, 1s, 2s, 5s]
  extraDimensions:
    - name: http.method
    - name: cgx.transaction
    - name: cgx.transaction.root
```
Note
Enabling the feature will create additional metrics, whose volume may increase significantly depending on how you instrument your applications. This is especially true when span names include specific values, such as user IDs or UUIDs; this instrumentation practice is strongly discouraged. In such cases, we recommend correcting your instrumentation or using the `spanMetrics.spanNameReplacePattern` parameter to replace the problematic values with a generic placeholder.
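As a rough, hypothetical sketch only (the exact field names should be confirmed against your chart version's values.yaml and the linked span-name replacement guide), such a pattern might look like:

```yaml
spanMetrics:
  enabled: true
  # Hypothetical example: replace numeric user IDs in span names with a placeholder.
  # Verify the exact spanNameReplacePattern schema for your chart version.
  spanNameReplacePattern:
    - regex: "user-[0-9]+"
      replacement: "user-{id}"
```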
Enabling tail sampling
We recommend complementing your setup with tail sampling. Enabling this feature grants you additional APM capabilities while optimizing costs. Tail sampling lets you view traces, service connections, and maps in the Coralogix platform. Find out more here.
The following example demonstrates how to employ tail sampling for trace reduction using the tail sampling processor. Install the `otel-integration` with the `tail-sampling-values.yaml` configuration. For instance:

```bash
helm repo add coralogix-charts-virtual https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
  --render-subchart-notes -f tail-sampling-values.yaml
```
This adjustment sets up the `otel-agent` pods to transmit span data to the `coralogix-opentelemetry-gateway` deployment through the load balancing exporter. Ensure adequate replica configuration and resource allocation to handle the anticipated load. You must then configure the tail sampling processor policies according to your specific tail sampling requirements.
When operating in an OpenShift environment, ensure the `distribution: "openshift"` parameter is set in your `values.yaml`. In Windows environments, use the `values-windows-tailsampling.yaml` configuration file. Find out more here.
Creating Span Metrics using your own OpenTelemetry
Enabling Span Metrics
When using your own OpenTelemetry or Prometheus, add the following to your configuration file:
```yaml
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [100us, 1ms, 2ms, 4ms, 6ms, 10ms, 100ms, 250ms]
    dimensions:
      - name: http.method
      - name: cgx.transaction
      - name: cgx.transaction.root
    exemplars:
      enabled: true
    dimensions_cache_size: 1000
    aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"
    metrics_flush_interval: 15s
    metrics_expiration: 5m
    events:
      enabled: true
      dimensions:
        - name: exception.type
        - name: exception.message
```
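For the connector to generate metrics, it must also be wired into your pipelines: the traces pipeline exports to `spanmetrics`, and a metrics pipeline receives from it. A minimal sketch, assuming an `otlp` receiver and the `coralogix` exporter used elsewhere in this guide (names are illustrative; adapt them to your configuration):

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      # Spans are exported both to Coralogix and into the spanmetrics connector.
      exporters: [coralogix, spanmetrics]
    metrics:
      # The spanmetrics connector feeds the generated metrics into this pipeline.
      receivers: [otlp, spanmetrics]
      processors: [batch]
      exporters: [coralogix]
```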
Note
- `dimensions` may be modified, but should not be removed.
- Adjust `buckets` to best fit your usage. Read more below.
- Enabling this feature may create additional metrics, which can increase significantly depending on how you instrument your applications. This is especially true when span names include specific values, such as user IDs or UUIDs. Such instrumentation practice is strongly discouraged. In such cases, we recommend correcting your instrumentation or normalizing span names before they reach the spanmetrics connector, for example as sketched below.
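One way to normalize span names in your own collector is the transform processor; a minimal sketch, assuming span names such as `user-1234` (the regex and placeholder are illustrative), placed in the traces pipeline before the spanmetrics connector:

```yaml
processors:
  transform/span_names:
    trace_statements:
      - context: span
        statements:
          # Replace numeric user IDs in span names with a generic placeholder.
          - replace_pattern(name, "user-[0-9]+", "user-{id}")
```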
Enabling tail sampling
We recommend configuring tail sampling alongside your setup. Enabling this feature enhances APM capabilities while optimizing costs. Tail sampling allows you to view traces, service dependencies, and maps within the Coralogix platform.
This section demonstrates how to send traces with errors using tail sampling. We recommend creating multiple tracing pipelines for each type of filtering.
- Add the `tail_sampling` processor definition under `processors`. In this example, it is named `errors`, but you can choose any name.
- Include this processor in the pipeline, ensuring it follows state-using processors like `k8sattributes`.
```yaml
processors:
  tail_sampling/errors:
    decision_wait: 10s
    num_traces: 100
    expected_new_traces_per_sec: 10
    policies:
      - name: only-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
service:
  pipelines:
    traces/errors:
      exporters:
        - coralogix
      processors:
        # - state-based processors like k8sattributes
        - tail_sampling/errors
        - batch
      receivers:
        - otlp
```
Validating the metrics
Validate your metrics to ensure that all of them are sent. The following metrics are generated by OpenTelemetry and, by default, are sent to enable Span Metrics. The metrics and their labels should not be removed.
Service Catalog (mandatory)
Metric | Label |
---|---|
duration_ms_sum | span_name, service_name, span_kind, status_code, http_method |
duration_ms_bucket | span_name, service_name, span_kind, status_code, http_method, le |
calls_total | span_name, service_name, span_kind, status_code, http_method |
duration_ms_count | span_name, service_name, span_kind, status_code, http_method |
Databases Catalog (mandatory)
The following labels are used to enable the Databases Catalog using Span Metrics. Currently, this is only supported for Kubernetes users who are using the Coralogix Helm chart and the `values.yaml` file.
Metric | Label |
---|---|
db_calls_total | status_code, db_system, db_namespace, span_name, db_operation_name, db_collection_name, service_name |
db_duration_ms_sum | status_code, db_system, db_namespace, span_name, db_operation_name, db_collection_name, service_name |
db_duration_ms_count | status_code, db_system, db_namespace, span_name, db_operation_name, db_collection_name, service_name |
db_duration_ms_bucket | status_code, db_system, db_namespace, span_name, db_operation_name, db_collection_name, service_name, le |
Service flows (optional)
A service flow denotes a singular logical unit of work in a software application. More precisely, it encompasses the function and method calls constituting that unit of work. Each flow consists of a root span, an operation that serves as its entry point and triggers all other related operations. Using custom instrumentation, the service flow tags are added to spans and should accordingly be converted into metric labels.
Metric | Label |
---|---|
duration_ms_sum | cgx_transaction, cgx_transaction_root |
duration_ms_bucket | cgx_transaction, cgx_transaction_root |
calls_total | cgx_transaction, cgx_transaction_root |
duration_ms_count | cgx_transaction, cgx_transaction_root |
API error tracking (optional)
To enable API error tracking for span metrics, the following labels are added by default:
Metric | Label |
---|---|
duration_ms_sum | rpc.grpc.status_code, http.response.status_code |
duration_ms_bucket | rpc.grpc.status_code, http.response.status_code |
calls_total | rpc.grpc.status_code, http.response.status_code |
duration_ms_count | rpc.grpc.status_code, http.response.status_code |
Span Metrics buckets for percentiles, SLO and Apdex
Configuring collector buckets
To ensure accurate calculation of SLOs, Apdex, latency, and latency percentiles, you must manually define the appropriate bucket thresholds in the collector YAML file, as they are not configured by default. Choose buckets that best align with your data and verify their correct configuration.
Apdex settings
The Span Metrics connector must explicitly include buckets for both 'T' and '4T' to ensure correct Apdex threshold calculation. In the example below, the Apdex threshold can be set to 1ms because both 'T' (1ms) and '4T' (4ms) are specified.
```yaml
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [100us, 1ms, 2ms, 4ms, 6ms, 10ms, 100ms, 250ms]
```
Modifying buckets used for active SLOs or Apdex calculations
Modifying buckets that are actively used in existing SLOs or Apdex calculations will immediately halt the current processing of those SLOs or Apdex scores. This will trigger an error in the UI, indicating the disruption. To restore functionality, the affected SLOs or Apdex calculations must be reconfigured with the updated threshold (according to the current bucket settings). Note that this will restart the SLO calculations, treating them as new SLOs.
Selecting buckets
Span Metrics do not have default buckets. You must define the buckets based on your environment and the distribution of service request durations over time.
As a best practice, start by analyzing your most critical services:
- What is the acceptable latency threshold ('T') in terms of duration? Add this as a bucket or a series of buckets.
- Consider adding another bucket for '4T' to account for Apdex calculations.
- Review service requests. What are the maximum and minimum durations? Are these durations anomalies or recurring patterns? If they are recurring, define additional buckets to capture these cases.
- Add intermediate buckets to cover a broader range of durations for more granular insights. An example is sketched below.
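As an illustrative sketch (the threshold values below are assumptions, not recommendations): if the acceptable latency 'T' for a service is 200ms, include both 200ms and 800ms ('4T') buckets, plus intermediate and boundary buckets that match your observed durations:

```yaml
connectors:
  spanmetrics:
    histogram:
      explicit:
        # T = 200ms and 4T = 800ms are included explicitly for Apdex;
        # the remaining buckets add granularity around observed durations.
        buckets: [50ms, 100ms, 200ms, 400ms, 800ms, 1s, 2s, 5s]
```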
Configure different buckets per application
To use a Span Metrics connector with different buckets for each application in a Kubernetes environment, you must use the `spanMetricsMulti` preset. For example:
```yaml
presets:
  spanMetricsMulti:
    enabled: true
    defaultHistogramBuckets: [1ms, 4ms, 10ms, 20ms, 50ms, 100ms, 200ms, 500ms, 1s, 2s, 5s]
    configs:
      - selector: route() where attributes["service.name"] == "one"
        histogramBuckets: [1s, 2s]
      - selector: route() where attributes["service.name"] == "two"
        histogramBuckets: [5s, 10s]
```
For every selector, you must write an OTTL statement. Find out more here.
Enable `spanMetricsMulti` if you want to define metrics for each service. Generally, it's better to have a broad bucket definition that covers the whole system; `spanMetricsMulti` allows for more detailed per-service metrics.
- This feature is available only for Kubernetes extension users.
- Update your Helm chart values by setting `spanMetricsMulti.enabled` to `true` instead of `spanMetrics.enabled`.
Note
If you're using multiple collectors, ensure that the bucket configuration is consistent across all of them.
Exemplars
Exemplars are an OpenTelemetry feature that enhances issue investigation by allowing you to navigate seamlessly from metrics to the spans dashboard. This integration enriches the metrics with span and trace correlation, providing better visibility and insights.
For APM with Span Metrics, exemplars are enabled in the connector configuration above; you can set `exemplars.enabled` to `false` if needed, as shown below.
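For example, to turn exemplars off in the connector configuration shown earlier:

```yaml
connectors:
  spanmetrics:
    exemplars:
      enabled: false
```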
Using multiple OTel agents
When using multiple OpenTelemetry (OTel) collector agents, each performs span metrics aggregation separately. Without a unique label value, Coralogix receives the metrics individually and cannot effectively aggregate them. For example, Kubernetes users who run a collector on each node may see metrics from the same service on different nodes overwriting each other. Adding `k8s.pod.name` as a label resolves this issue by providing a unique identifier, which differentiates the metrics and enables accurate querying and aggregation, as sketched below.
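A minimal sketch, assuming the raw spanmetrics connector configuration shown earlier (with the Kubernetes extension, the equivalent would be an entry under `spanMetrics.extraDimensions`):

```yaml
connectors:
  spanmetrics:
    dimensions:
      - name: http.method
      - name: cgx.transaction
      - name: cgx.transaction.root
      # Unique per-collector label so metrics from different nodes do not collide.
      # k8s.pod.name must be present on spans (e.g., added by the k8sattributes processor).
      - name: k8s.pod.name
```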
Full configuration with Database Catalog
Span Metrics
- Currently, the Database Catalog integration with Span Metrics is available for Kubernetes only.
- Read the following instructions and use the provided values.yaml file.
Events2Metric
If you want to use Span Metrics for the Service Catalog while continuing to use E2M for the Database Catalog:
- Define the filter for the Database Catalog under `processors`.
- Add the following processor to the pipeline to filter the spans.
```yaml
processors:
  filter/dbcatalog:
    error_mode: ignore
    traces:
      span:
        - 'attributes["db.system"] == nil'
service:
  pipelines:
    traces/dbcatalog:
      exporters:
        - coralogix
      processors:
        - filter/dbcatalog
      receivers:
        - otlp
```
Enabling API error tracking
Service error data is extracted from span metrics within the time interval selected in the time picker, based on HTTP or gRPC status codes. To enable API error tracking using span metrics, make sure these attributes are included within your error spans, and follow the instructions below.
Note
`errorTracking` works only with OpenTelemetry SDKs that support OpenTelemetry semantic conventions above v1.21.0. If you're using an older version, you may need to modify certain attributes. Read more here.
Using OTel Kubernetes extension
- Ensure that the Coralogix Helm repository is up to date with the latest version.
- Obtain the latest values.yaml from the Coralogix OpenTelemetry Integration repository.
- Once you've updated the config in the `values.yaml` file, apply the changes using `helm upgrade`. Be sure to replace `<namespace>` with the correct Kubernetes namespace where the extension is deployed.
- After the upgrade, verify that all pods are running correctly. If the collector does not roll out after the change, initiate a manual rollout. These steps are sketched below.
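A sketch of these steps, assuming the release and chart names used earlier in this guide and that the agent runs as a DaemonSet named `coralogix-opentelemetry-agent` (an assumption; adjust names and `<namespace>` to your deployment):

```bash
# Apply the updated values (replace <namespace> with the extension's namespace).
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
  --render-subchart-notes -f values.yaml -n <namespace>

# Verify that all pods are running correctly.
kubectl get pods -n <namespace>

# If the collector does not roll out after the change, trigger a manual rollout.
kubectl rollout restart daemonset/coralogix-opentelemetry-agent -n <namespace>
```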
Switching APM UI from Events2Metrics to Span Metrics
After setting up the span metrics collector, you can update the APM UI using the following API command to utilize the metrics collected via span metrics. Note that Events2Metrics and Span Metrics data cannot be displayed simultaneously in the UI.
Switch from Events2Metrics (E2M) to Span Metrics via the API using the following command:
```bash
grpcurl -H "Authorization: Bearer <token>" -d @ ng-api-grpc.<env url> com.coralogixapis.service_catalog.v1.ApmSettingsService/ReplaceApmSettings <<EOF
{
  "apm_settings": {
    "catalog_settings": [
      {
        "source": "APM_SOURCE_SPAN_METRICS",
        "catalog": "SERVICE_CATALOG"
      },
      {
        "source": "APM_SOURCE_SPAN_METRICS",
        "catalog": "DATABASE_CATALOG"
      }
    ]
  }
}
EOF
```
Make sure you're using the correct gRPC endpoint (`<env url>`).
Going forward, both E2M and Span Metrics will be collected (with data ingestion charges applied accordingly), but the UI will display data based on Span Metrics. Historical Events2Metrics metric data can still be accessed via Custom Dashboards and Grafana, subject to its retention period.
Reverting to Events2Metrics collection
If needed, you can switch back to E2M, as detailed below.
```bash
grpcurl -H "Authorization: Bearer <token>" -d @ ng-api-grpc.<env url> com.coralogixapis.service_catalog.v1.ApmSettingsService/ReplaceApmSettings <<EOF
{
  "apm_settings": {
    "catalog_settings": [
      {
        "source": "APM_SOURCE_E2M",
        "catalog": "SERVICE_CATALOG"
      },
      {
        "source": "APM_SOURCE_E2M",
        "catalog": "DATABASE_CATALOG"
      }
    ]
  }
}
EOF
```
Disabling Events2Metrics
To stop collecting data from the Events2Metrics pipeline and rely solely on Span Metrics (or vice versa), run the following command. This command also specifies the date on which you want Events2Metrics to stop metrics generation.
```bash
grpcurl -H "Authorization: Bearer <token>" -d @ ng-api-grpc.<env url> com.coralogixapis.service_catalog.v1.ApmSettingsService/ReplaceApmSettings <<EOF
{
  "apm_settings": {
    "catalog_settings": [
      {
        "source": "APM_SOURCE_SPAN_METRICS",
        "catalog": "DATABASE_CATALOG",
        "migration_period_end_date": {
          "nanos": 0,
          "seconds": "1731109760"
        }
      },
      {
        "source": "APM_SOURCE_SPAN_METRICS",
        "catalog": "SERVICE_CATALOG",
        "migration_period_end_date": {
          "nanos": 0,
          "seconds": "1731109760"
        }
      }
    ]
  }
}
EOF
```
- nanos: Always set to zero.
- seconds: Paste the Epoch timestamp representing the chosen date, as in the example below.
- The `migration_period_end_date` also allows you to define a specific period during which both Events2Metrics and Span Metrics data are generated and retained. After this period, only Span Metrics remain. Events2Metrics-based metrics will continue to be generated as long as spans are sent to Coralogix during the defined period, enabling you to revert if needed. Once the retention period ends, Events2Metrics-based metrics will no longer be created.
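For example, on Linux with GNU date (an assumption; BSD/macOS date uses different flags), you can produce the epoch value for a chosen end date like this:

```bash
# Epoch seconds for the end of the migration period (UTC), e.g. 2024-11-08 23:59:59.
date -d "2024-11-08 23:59:59 UTC" +%s
```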
Note
If you decide to migrate back to Events2Metrics in the future:
- Contact our support team to re-create APM Events2Metrics rules.
- Redefine your SLO and Apdex settings from scratch, as they are not automatically restored.
- Once Events2Metrics is disabled, it will no longer be possible to view Events2Metrics-based data for the period after its deactivation, except for historical data prior to disabling, which will still be available according to its retention period.
Validating your data source
Validate your data source using the following command:
```bash
grpcurl -H "Authorization: Bearer <token>" -d @ ng-api-grpc.<env url> com.coralogixapis.service_catalog.v1.ApmSettingsService/GetApmSettings <<EOF
{
  "catalog": "SERVICE_CATALOG"
}
EOF
```
Present Lambda functions with Span Metrics for Service Catalog
The Service Catalog will function seamlessly with both Span Metrics and Events2Metric, as long as all instructions in the documentation are followed correctly.
To display services based on AWS Lambda in the Service Catalog, your organization must send spans or Span Metrics. With Events2Metric (E2M), this is done automatically. For Span Metrics, the only requirement is to ensure that Lambda-generated spans are routed through the collector.
Note
This ensures that Lambda-based services, along with their metrics, are displayed in the Service Catalog. The Serverless catalog is supported only when using Events2Metrics.
Troubleshooting
- High cardinality (over 300K). High cardinality occurs when metrics or spans contain labels with numerous unique values, such as user IDs, UUIDs, or session-specific data. This creates a large number of metric combinations, often exceeding practical limits. For example, using user-specific values in span names or labels can lead to exponentially growing cardinality, complicating metric analysis and visualization. In cases of high cardinality caused by overly unique span names, we recommend adjusting your instrumentation or using the `spanMetrics.spanNameReplacePattern` parameter to replace the problematic values with a generic placeholder. For example, if your span names follow a template such as `user-1234`, you can use a pattern that replaces the user ID with a generic placeholder, resulting in the generalized name `user-{id}`. Learn how to replace a specific `span.name` with a generic one as detailed here.
- Metrics expiration. Use `metrics_expiration` when you want to control how long unexported metrics are kept in memory, as sketched below. See here.
- Reduce data volume. Remove the Events2Metrics rules and stop generating Events2Metrics metrics when you have fully transitioned to Span Metrics and no longer require dual-pipeline data collection. This step is suitable for reducing metric ingestion and storage costs, as well as simplifying system configurations. However, note that any data generated during the period when Events2Metrics rules are removed will not be accessible if you later decide to revert to Events2Metrics. This action helps decrease the overall data volume but does not directly address or affect cardinality issues.
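For instance, a minimal sketch of setting the expiration in the spanmetrics connector configuration used earlier:

```yaml
connectors:
  spanmetrics:
    # Drop metric series that have not been updated for 5 minutes.
    metrics_expiration: 5m
```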
Permissions
In your Coralogix application, go to Settings > Roles > Compare Roles > APM - Manage Service Catalog Services and verify that the following permissions exist.
Permission Group | Resource | Action | Permission Name |
---|---|---|---|
APM | Manage Service Catalog Services | UpdateConfig | SERVICE-CATALOG:UPDATE |