Aggregating data
Goal
By the end of this guide you should be able to use `aggregate`, `groupby`, `avg`, `count_if`, and other functions to calculate metrics such as averages, totals, and conditional counts from your logs or traces.
Why it matters
Understanding what your systems are doing at scale often requires more than viewing raw logs; it requires summarization. Aggregations let you answer questions like “Which endpoints are the slowest?”, “How many error traces exceeded 1s?”, or “What is the average request size per service?” without manually inspecting individual events.
Count all logs or traces
Description
Use `count()` to return the total number of documents in the dataset or current time range. This is the simplest and fastest way to verify volume before applying filters or grouping.
Syntax
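The syntax snippet is missing here; a minimal sketch, assuming the pipeline syntax used in the examples later in this guide:

```
source <dataset>
| aggregate count() as <alias>
```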
Example: Count total number of trace spans
Sample data
{ "trace_id": "t1", "duration": 500 }
{ "trace_id": "t2", "duration": 620 }
{ "trace_id": "t3", "duration": 80 }
Query
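The query is not shown in this section; a sketch consistent with the result below, following the same pattern as the later examples:

```
source spans
| aggregate count() as total_spans
```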
Result
total_spans |
---|
3 |
Group by a field and calculate an average
Description
Use `groupby` with `avg()` to measure how a numerical value (e.g., duration, size, latency) varies by category.
Syntax
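The syntax snippet is missing here; a minimal sketch, assuming the same pipeline syntax as the worked examples in this guide:

```
source <dataset>
| groupby <field> aggregate avg(<numeric_field>) as <alias>
```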
Example: Compute average span duration per region
Sample data
{ "cloud_region": "eu-north-1", "duration": 500 }
{ "cloud_region": "eu-north-1", "duration": 654 }
{ "cloud_region": "eu-west-1", "duration": 2100 }
{ "cloud_region": "us-east-2", "duration": 150 }
Query
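The query is not shown in this section; a sketch consistent with the sample data and result below, using the same syntax as the later examples:

```
source spans
| groupby cloud_region aggregate avg(duration) as avg_duration
```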
Result
cloud_region | avg_duration |
---|---|
eu-north-1 | 577.0 |
eu-west-1 | 2100.0 |
us-east-2 | 150.0 |
This highlights regional performance differences. You can use this pattern to monitor latency by path, status code, or service.
Add multiple aggregations at once
Description
You can combine multiple aggregation functions in a single query to analyze different metrics simultaneously.
Example: Measure average, max, and count of spans per region
Sample data
{ "cloud_region": "eu-north-1", "duration": 500 }
{ "cloud_region": "eu-north-1", "duration": 900 }
{ "cloud_region": "eu-west-1", "duration": 2100 }
{ "cloud_region": "us-east-2", "duration": 150 }
Query
```
source spans
| groupby cloud_region aggregate
    avg(duration) as avg_duration,
    max(duration) as max_duration,
    count() as request_count
```
Result
cloud_region | avg_duration | max_duration | request_count |
---|---|---|---|
eu-north-1 | 700.0 | 900 | 2 |
eu-west-1 | 2100.0 | 2100 | 1 |
us-east-2 | 150.0 | 150 | 1 |
This approach is useful for analyzing scale and performance in the same query.
Filter before aggregating
Description
Use `filter` to narrow down the dataset before aggregating. This improves performance and ensures your calculations reflect only relevant data.
Example: Count only slow requests per region
Sample data
{ "cloud_region": "eu-north-1", "duration": 900 }
{ "cloud_region": "eu-north-1", "duration": 300 }
{ "cloud_region": "us-east-2", "duration": 800 }
Query
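The query is not shown in this section; a sketch consistent with the result below, assuming the `filter` and `groupby` syntax used elsewhere in this guide:

```
source spans
| filter duration > 500
| groupby cloud_region aggregate count() as slow_request_count
```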
Result
cloud_region | slow_request_count |
---|---|
eu-north-1 | 1 |
us-east-2 | 1 |
The filter ensures that only spans above 500ms are included in the aggregation.
Use count_if for conditional metrics
Description
Use `count_if()` to count how many documents meet a specific condition within a group.
Syntax
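The syntax snippet is missing here; a minimal sketch, assuming the same pipeline syntax as the example below:

```
source <dataset>
| groupby <field> aggregate count_if(<condition>) as <alias>
```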
Example: Compare total and slow spans per region
Sample data
{ "cloud_region": "eu-west-1", "duration": 700 }
{ "cloud_region": "eu-west-1", "duration": 400 }
{ "cloud_region": "us-east-2", "duration": 300 }
{ "cloud_region": "us-east-2", "duration": 1200 }
Query
```
source spans
| groupby cloud_region aggregate
    count() as total_requests,
    count_if(duration > 500) as slow_requests
```
Result
cloud_region | total_requests | slow_requests |
---|---|---|
eu-west-1 | 2 | 1 |
us-east-2 | 2 | 1 |
This lets you track the proportion of problematic spans per group.
Group by calculated severity level
Description
You can dynamically group documents using conditional logic via a `case` expression. This is useful for creating buckets like "fast", "slow", and "critical".
Syntax
```
groupby case {
    <condition1> -> <label1>,
    <condition2> -> <label2>,
    _ -> <default_label>
} as <alias>
aggregate count() as <alias>
```
Example: Group spans into latency tiers
Sample data
Query
```
source spans
| groupby case {
    duration > 1000 -> 'super slow',
    duration > 500 -> 'slow',
    _ -> 'ok'
  } as latency_level
| aggregate count() as request_count
```
Result
latency_level | request_count |
---|---|
super slow | 1 |
slow | 2 |
ok | 8 |
Aggregate over spans (traces)
Description
You can aggregate directly on spans using basic metrics like count and max. This is often used to get an overview of tracing volume and performance.
Example: Count spans and find longest one
Sample data
Query
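The query is not shown in this section; a sketch consistent with the result columns below, assuming the `duration` field used in earlier examples:

```
source spans
| aggregate count() as trace_count, max(duration) as max_trace_duration
```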
Result
trace_count | max_trace_duration |
---|---|
3 | 20000 |
This is helpful for validating trace ingestion and spotting extreme latency outliers.
Common pitfalls or gotchas
- `groupby` removes all fields not explicitly included. Add fields as grouping keys or aggregations if you want to preserve them.
- Aggregations like `avg`, `count_if`, and `max` ignore `null` values. Make sure the fields you're aggregating on are populated.
- You can only group by scalar values; arrays and objects are not supported unless flattened or transformed first.