How to Use Explore Tracing
Overview
Explore tracing is designed around how you investigate real issues.
You rarely start with a complete hypothesis. Instead, you start with a signal:
- An alert fires
- An SLO is breached
- A bug is reported
- An issue is escalated to you
Explore tracing helps you narrow the scope, identify abnormal behavior, and determine which part of the system is causing the issue.
Prerequisites
Explore tracing relies on correct trace context propagation across services. Most investigation issues occur when trace context is missing or not configured consistently, resulting in fragmented traces and disconnected spans.
Coralogix tracing follows OpenTelemetry (OTel) requirements. For an overview of required tracing concepts and context propagation, see the OpenTelemetry documentation.
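Because fragmented traces almost always come down to a malformed or missing `traceparent` header, it can help to see what correct W3C Trace Context propagation looks like. The sketch below validates a `traceparent` header using the field layout defined by the W3C specification; the `validate_traceparent` helper itself is illustrative and not part of any Coralogix or OpenTelemetry API.

```python
import re

# W3C traceparent layout: version-traceid-spanid-flags (all lowercase hex).
TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-"
    r"(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<span_id>[0-9a-f]{16})-"
    r"(?P<flags>[0-9a-f]{2})$"
)

def validate_traceparent(header):
    """Return the parsed fields if the header is well-formed, else None."""
    match = TRACEPARENT_RE.match(header)
    if not match:
        return None
    fields = match.groupdict()
    # All-zero trace IDs or span IDs are invalid per the W3C spec.
    if fields["trace_id"] == "0" * 32 or fields["span_id"] == "0" * 16:
        return None
    return fields

parsed = validate_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
```

If a service in the call chain drops or rewrites this header, downstream spans start a new trace and the request appears as several disconnected traces in Explore tracing.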
How you arrive at Explore tracing
You typically enter Explore tracing from one of several paths:
- Alerts and SLOs: An error‑rate or latency SLO is breached. From the SLO drilldown, you can drill down directly into traces or spans related to the affected service or action.
- Investigate traces: You open Explore, filter by service, flow, or time range, and use Highlights to identify spikes in errors or latency.
- Cases and escalations: An issue is escalated by another team, prompting deeper analysis of production behavior.
Investigation journey in Explore tracing
Explore tracing is an investigation page. On this page, you work with the same underlying data and switch your focus between traces and spans as your investigation evolves.
The investigation usually starts by identifying problematic traces and then narrowing down to the spans that explain the behavior.
Investigation flow: From traces to spans
Explore tracing supports a single, continuous investigation flow. You begin by understanding request-level behavior and then narrow your focus to the operations that explain it.
Start with traces to identify requests that behave abnormally. At this stage, you compare end-to-end executions to answer a simple question: Which trace best represents the problem?
From the trace perspective, you can review many traces for a selected time range, service, action, or flow, compare durations and error status, and identify traces that breached latency thresholds or returned errors. The goal is to select one or more representative traces.
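The trace-selection step above amounts to a simple filter-then-rank decision, which can be sketched as follows. The trace records and the 2-second latency threshold are hypothetical values for illustration; in practice this filtering happens interactively in the Explore tracing UI.

```python
# Hypothetical trace query results for a selected service and time range.
traces = [
    {"trace_id": "a1", "duration_ms": 120,  "has_error": False},
    {"trace_id": "b2", "duration_ms": 2450, "has_error": False},
    {"trace_id": "c3", "duration_ms": 310,  "has_error": True},
]

LATENCY_THRESHOLD_MS = 2000  # assumed SLO latency threshold

# Keep traces that breached the latency threshold or returned errors.
candidates = [
    t for t in traces
    if t["has_error"] or t["duration_ms"] > LATENCY_THRESHOLD_MS
]

# Pick the slowest candidate as the representative trace to drill into.
representative = max(candidates, key=lambda t: t["duration_ms"])
```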
Shift to spans once you have a trace of interest. Switching perspective lets you analyze individual operations across traces to answer a deeper question: Which span explains the behavior?
From the span perspective, you analyze RED signals (rate, errors, duration), filter and query by service, operation, status code, or attributes, and compare similar spans across traces. Spans view helps you understand whether the issue is isolated to a single request or affects the system more broadly.
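The RED signals mentioned above are per-operation aggregates over a set of spans. A minimal stdlib sketch of how they are derived is shown below; the span fields and the 60-second window are assumptions for illustration, and Explore tracing computes these aggregates for you.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical span records returned by a span query.
spans = [
    {"operation": "GET /checkout", "duration_ms": 80,  "error": False},
    {"operation": "GET /checkout", "duration_ms": 950, "error": True},
    {"operation": "db.query",      "duration_ms": 15,  "error": False},
]

WINDOW_SECONDS = 60  # assumed query window

# Group spans by operation, then compute rate, errors, and duration.
by_op = defaultdict(list)
for span in spans:
    by_op[span["operation"]].append(span)

red = {}
for op, group in by_op.items():
    red[op] = {
        "rate_per_s": len(group) / WINDOW_SECONDS,
        "error_ratio": sum(s["error"] for s in group) / len(group),
        "avg_duration_ms": mean(s["duration_ms"] for s in group),
    }
```

A high error ratio concentrated in one operation suggests an isolated issue; elevated duration across many operations points to a broader regression.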
At any point, you can select a trace or span to open the drilldown view and inspect structure, timing, and related context in detail.
Drilldown: Visualize and analyze traces
Selecting a trace or span opens the trace drilldown view, where you validate findings and gather all relevant context without leaving Explore tracing.
The drilldown brings together structure, timing, and metadata so you can answer why a request behaved the way it did.
Visualize and validate
The trace drilldown view provides multiple ways to visualize a single trace, so you can validate assumptions and understand behavior from different angles.
Rather than learning each visualization in isolation, use them as part of the investigation flow:
- Switch visualizations to understand structure, dependencies, and timing.
- Confirm where latency accumulates or where execution diverges.
- Correlate spans with events, logs, and other telemetry.
For detailed explanations of each visualization mode, see Visualize Spans and Traces.
Info panel: Your source of truth
The Info panel appears alongside the selected trace or span and acts as a structured, searchable source of truth during an investigation. When you select a trace, the panel displays details for the root span.
From the Info panel, you can also take quick actions such as copying values, pinning important fields, or opening related views to continue the investigation seamlessly.
For a detailed breakdown of fields, tags, and visualization behavior, see Info Panel.
Related Data: Correlated signals in one place
The drilldown includes a Related Data section that brings together telemetry connected to the selected span, trace, or service—such as logs (including errors), events, profiling, infrastructure context, and AI insights. This helps you investigate faster by keeping everything in the same view and reducing context switching.
For setup instructions and a deeper walkthrough of each tab, see Span Related Data.
Headers and quick actions: investigate faster
Trace and span headers surface key context (service, operation, status, duration, and time) so you can assess issues without opening the Info panel. Use header menus to copy values, include or exclude attributes in the query, open Logs Explore, or pivot to APM service context and Profiles. For more details, see Headers and Quick Actions.
Example investigation paths
SLO-driven investigation with span distribution
You notice that the error-rate SLO for a frontend service is breached and want to understand whether the issue is isolated or systemic.
- Open the SLO drilldown and confirm an increase in HTTP 500 errors.
- Navigate into Explore tracing, already filtered to the affected service and time range.
- Start from the trace perspective to identify traces that returned errors or show abnormal latency.
- Select a representative trace and open it in the drilldown view.
- Shift focus to the span perspective to identify spans returning HTTP 500 or showing increased duration.
- Select a suspect span to open the Info Panel.
- Use the span duration distribution heatmap to compare the selected span against recent similar spans (same service and operation).
- Confirm whether the span falls outside the normal latency distribution.
- Determine whether the slowdown is an outlier or part of a broader regression.
- Open Highlights to understand what changed in the specific time bucket or cell you came from.
- Inspect field distributions for that slice of time.
- Identify which attributes or values explain the spike or anomaly you observed.
- From the Info Panel or Highlights, navigate to related logs or databases to validate the cause of the error or latency.
This flow helps you move from an SLO breach to concrete evidence of abnormal span behavior, and then to the supporting context needed to confirm root cause.
Exploratory analysis
You want to understand how a request flows through your system and whether recent behavior looks different from normal.
- Open Explore tracing and filter by your service of interest and time range.
- Review traces to get a high-level sense of latency and error behavior.
- Select a trace of interest and open it in the drilldown view.
- Switch to Service view to visualize how services communicate within the trace.
- Understand the order of activities across services.
- See how requests move through microservices and message brokers such as Kafka, RabbitMQ, or SQS.
- Use this view to identify unexpected dependencies or slow service interactions.
- Switch back to Spans view to analyze the specific operations responsible for the behavior.
Explore tracing investigation principles
Use Explore tracing as an investigation workflow, not just a visualization:
- Start broad, then narrow your focus.
- Compare behavior before drilling into details.
- Let traces guide you to the spans that matter.
- Use related data to confirm or challenge your assumptions.
By following this approach, you can move from symptoms to root cause with confidence, even in complex distributed systems.