APM tab
The APM tab introduces a per-service view of APM ingestion. Per-metric measurements remain available in the All metrics tab. The APM tab adds the service dimension so you can see and limit APM usage on a per-service basis.
Previously, a rogue label on a single service could exhaust the account's trace billing units, cause an APM outage for the affected metric across the account, and take down other services that shared the metric (the noisy-neighbor problem).
Per-service APM spanmetrics splits APM usage and limits by service:
- Per-service attribution. Coralogix tracks each service's share of APM volume, time series, and units independently, so the source of any spike is obvious.
- Per-service limits. Each service has its own APM usage budget across all 16 APM spanmetrics. When one service hits its limit, Coralogix limits only that service's data. Every other service keeps reporting at full resolution.
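The isolation described above can be illustrated with a minimal sketch. It is not Coralogix's enforcement logic: the service names, the counts, the limit value, and the `limited_services` helper are all hypothetical, and the sketch assumes the limit is expressed as a time series count.

```python
from dataclasses import dataclass


@dataclass
class ServiceUsage:
    name: str
    time_series: int  # unique time series emitted by this service (hypothetical)


def limited_services(services, per_service_limit):
    """Return the names of services whose time series count exceeds the limit.

    Each service is checked against its own budget, so one runaway service
    never causes another service to be limited.
    """
    return [s.name for s in services if s.time_series > per_service_limit]


services = [
    ServiceUsage("checkout", 1_200),
    ServiceUsage("search", 950_000),  # runaway label on this service only
    ServiceUsage("auth", 3_400),
]

# Hypothetical limit; real limits are configured under Fair usage limits.
print(limited_services(services, per_service_limit=100_000))
```

In this example only `search` is returned. `checkout` and `auth` stay under their own budgets and are unaffected.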
Use the APM tab to see how each service is tracking against its limit. The APM tab helps observability teams:
- Track APM volume and cost per service.
- Find services approaching their per-service limit and act before they reach it.
- Compare services to prioritize cleanup.
What you need
- The METRICS.DATA-ANALYTICS#HIGH:READ permission.
Open the APM tab
- Go to Settings, then select Metric data.
- Select the APM tab.
- Use the date picker to select a day. The current day is the default.
- Select Share to copy a permalink to the current view.
Review top-level usage
Three summary charts aggregate ingestion across every APM service on the selected day, so a spike there signals account-wide pressure even before you know which service is responsible. Use the charts to understand how APM volume and time series count change over time:
Total samples: Metric data points ingested on the selected day.
Use this chart to spot ingestion spikes or drops that might indicate deployment issues, noisy services, or unexpected increases in span volume.
Total time series: Unique time series on the selected day.
Use this chart to detect sudden growth in time series count, often caused by new services, unsanitized labels (for example, raw URL paths in span_name), or dimensions that produce many time series.
Total units: Billing units generated on the selected day.
Use this chart to see how ingestion or time series changes influence your cost. A spike in units without a matching spike in samples often points to a small number of services producing excessive time series.
Hover any bar to see the exact value for that day.
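The unsanitized-label case mentioned above (raw URL paths in span_name) is usually fixed at the instrumentation layer. As a rough sketch of the idea, and not a Coralogix API, the hypothetical `sanitize_span_name` helper below replaces high-cardinality path segments with fixed placeholders. The patterns and placeholder names are assumptions chosen for illustration.

```python
import re

# Hypothetical patterns for path segments that tend to be unique per request.
_UUID = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)
_NUMERIC = re.compile(r"^\d+$")
_HEX = re.compile(r"^[0-9a-fA-F]{16,}$")


def sanitize_span_name(span_name: str) -> str:
    """Replace high-cardinality path segments with fixed placeholders."""
    path = span_name.split("?", 1)[0]  # drop the query string, if any
    segments = []
    for seg in path.split("/"):
        if _UUID.match(seg):
            segments.append("{uuid}")
        elif _NUMERIC.match(seg):
            segments.append("{id}")
        elif _HEX.match(seg):
            segments.append("{token}")
        else:
            segments.append(seg)
    return "/".join(segments)


print(sanitize_span_name("GET /users/12345/orders/987?page=2"))
```

With names normalized this way, every request to the same route maps to one span_name value instead of one value per user or order.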
Explore the Services table
The Services table attributes APM volume, time series, and units to each service that emitted APM data on the selected day, so the source of any account-level spike is one row away. A single service that starts producing a runaway label appears as an outsized share of the account total before it eats into the account's trace budget or trips its per-service limit.
The table includes the following fields:
- Name: The service_name label value.
- Status: The service's current limit state. Use this to spot services that are approaching or have already exceeded their service-level limit.
Usage: Aggregate data volume (in GB or TB) and billing units across all 16 APM spanmetrics for this service.
Use this column to identify services that consume the most storage or cost.
% Usage: This service's share of total APM usage across all services.
Use this to compare the relative cost impact of each service.
Samples: Total sample count for this service across all 16 APM spanmetrics.
Use this to understand ingestion volume and confirm whether high-unit services truly have high sample counts.
% Samples: This service's share of total APM samples across all services.
Time series: Total unique time series across all 16 APM spanmetrics for this service.
Use this to locate services that contribute most to your APM time series footprint. This is the key metric for service-level time series limits, which are configured under Fair usage limits, in the APM service level limits section.
% Time series: This service's share of total APM time series across all services.
Use this to rank services by their impact on time series count.
Use the Search field to filter services by name. Sort any column by selecting its header. The default sort is by Usage, descending, so the highest-usage services appear first.
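If you copy the per-service totals out of the table, the percentage columns can be reproduced with a short sketch like the one below. The row shape and the `add_shares` helper are hypothetical. The sketch assumes each share is the service's total divided by the sum across all listed services, and that usage is measured in units.

```python
def add_shares(rows):
    """Add percentage-share fields to per-service rows.

    Each row is a dict with 'name', 'units', 'samples', and 'time_series'.
    Returns new dicts sorted by units, highest first (mirroring the
    table's default sort by Usage descending).
    """
    keys = ("units", "samples", "time_series")
    totals = {k: sum(r[k] for r in rows) for k in keys}
    out = []
    for r in rows:
        row = dict(r)
        for k in keys:
            # Guard against an empty day where a total is zero.
            row[f"pct_{k}"] = 100.0 * r[k] / totals[k] if totals[k] else 0.0
        out.append(row)
    return sorted(out, key=lambda r: r["units"], reverse=True)


# Hypothetical per-service totals for one day.
rows = [
    {"name": "auth", "units": 10, "samples": 500, "time_series": 20},
    {"name": "search", "units": 60, "samples": 300, "time_series": 70},
    {"name": "checkout", "units": 30, "samples": 200, "time_series": 10},
]

for row in add_shares(rows):
    print(row["name"], row["pct_units"], row["pct_samples"], row["pct_time_series"])
```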
Investigate and optimize APM ingestion
Use the Services table to:
- Rank services by cost. Sort by Usage or % Usage to find the highest-cost services.
- Identify high-time-series services. Sort by Time series or % Time series to see which services create the most time series.
- Spot outliers. A small number of services often account for a disproportionate share of time series. Look for services whose share of samples is in the single digits but whose share of time series is in the double digits.
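The outlier rule above (single-digit share of samples, double-digit share of time series) can be written as a simple filter. This is a minimal sketch over hypothetical rows; the `find_outliers` helper and the 10% thresholds are assumptions taken from the wording of the rule, not a product feature.

```python
def find_outliers(rows, max_pct_samples=10.0, min_pct_time_series=10.0):
    """Return names of services with a small sample share but a large
    time series share.

    Each row is a dict with 'name', 'pct_samples', and 'pct_time_series',
    where the percentages are on a 0-100 scale.
    """
    return [
        r["name"]
        for r in rows
        if r["pct_samples"] < max_pct_samples
        and r["pct_time_series"] >= min_pct_time_series
    ]


# Hypothetical values as they might be read from the Services table.
rows = [
    {"name": "checkout", "pct_samples": 55.0, "pct_time_series": 40.0},
    {"name": "search", "pct_samples": 4.0, "pct_time_series": 35.0},
    {"name": "auth", "pct_samples": 41.0, "pct_time_series": 25.0},
]

print(find_outliers(rows))
```

Here `search` is flagged: it contributes 4% of samples but 35% of time series, which suggests a label with many distinct values rather than high traffic.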
Drill into a specific service
Once a service stands out in the Services table, the drilldown isolates its contribution so you can find the specific metric driving the spike. Select any row in the Services table to open the service drilldown. The drilldown header shows the service name, total Time series, total Usage (GB and units), and total Samples on the selected day, along with a date picker.
The Overview tab visualizes daily ingestion trends for the service.
Use the Show toggle to switch the unit displayed across all charts:
- Unit usage: Billing units per day.
- Data volume: Bytes ingested per day.
- Sample count: Data points per day.
- Time series: Unique series per day.
The drilldown includes a Metric [mode] per day panel: a bar chart showing the selected mode aggregated across all 16 APM spanmetrics for this service, broken down by day.
Use the drilldown to:
- Detect ingestion spikes or drops on a specific day.
- Compare trends across modes. For example, a time series climb without a matching sample-count climb often indicates a label-value explosion.
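The mode comparison in the last item can be sketched as a check on relative growth. The daily values, the thresholds, and the helper names below are hypothetical. The sketch only compares the first and last day of the window, which is a deliberate simplification.

```python
def growth(series):
    """Relative change from the first to the last value.

    Returns None when the first value is zero, because relative growth
    is undefined in that case.
    """
    first, last = series[0], series[-1]
    if first == 0:
        return None
    return (last - first) / first


def looks_like_label_explosion(time_series_per_day, samples_per_day,
                               ts_threshold=0.5, sample_threshold=0.1):
    """True when time series grew sharply while sample count stayed roughly flat."""
    ts_growth = growth(time_series_per_day)
    sample_growth = growth(samples_per_day)
    if ts_growth is None or sample_growth is None:
        return False
    return ts_growth >= ts_threshold and sample_growth < sample_threshold


# Hypothetical daily values read from the drilldown in two different modes.
time_series = [1_000, 1_100, 5_000]
samples = [1_000_000, 1_020_000, 1_050_000]

print(looks_like_label_explosion(time_series, samples))
```

In this example time series grew fourfold while samples grew by 5%, so the check returns True. If both had doubled, the growth would more likely reflect real traffic, and the check would return False.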
Limitations
- The APM tab covers only the 16 APM spanmetrics. For other metrics, use the All metrics tab.
Limit enforcement
Service-level APM limits are enforced automatically. When a service exceeds its limit, that service's APM data is written straight to the archive. Use the limita.labsViolation dataset to continue the workflow.