Aggregating data

Goal

By the end of this guide you should be able to use aggregate, groupby, avg, count_if, and other functions to calculate metrics such as averages, totals, and conditional counts from your logs or traces.

Why it matters

Understanding what your systems are doing at scale often requires more than viewing raw logs; it requires summarization. Aggregations let you answer questions like “Which endpoints are the slowest?”, “How many error traces exceeded 1s?”, or “What is the average request size per service?” without manually inspecting individual events.

Count all logs or traces

Description

Use count() to return the total number of documents in the dataset or current time range. This is the simplest and fastest way to verify volume before applying filters or grouping.

Syntax

count(): number

Example: Count total number of trace spans

Sample data

{ "trace_id": "t1", "duration": 500 }
{ "trace_id": "t2", "duration": 620 }
{ "trace_id": "t3", "duration": 80 }

Query

source spans
| aggregate count() as total_spans

Result

total_spans
3

Group by a field and calculate an average

Description

Use groupby with avg() to measure how a numerical value (e.g., duration, size, latency) varies by category.

Syntax

groupby <key> aggregate avg(<number_field>) as <alias>

Example: Compute average span duration per region

Sample data

{ "cloud_region": "eu-north-1", "duration": 500 }
{ "cloud_region": "eu-north-1", "duration": 654 }
{ "cloud_region": "eu-west-1", "duration": 2100 }
{ "cloud_region": "us-east-2", "duration": 150 }

Query

source spans
| groupby cloud_region aggregate avg(duration) as avg_duration

Result

cloud_region    avg_duration
eu-north-1      577.0
eu-west-1       2100.0
us-east-2       150.0

This highlights regional performance differences. You can use this pattern to monitor latency by path, status code, or service.
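For instance, a minimal sketch of the same pattern keyed on a request path (http_path is an assumed field name here; substitute whichever attribute your spans actually carry):

source spans
| groupby http_path aggregate avg(duration) as avg_latency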


Add multiple aggregations at once

Description

You can combine multiple aggregation functions in a single query to analyze different metrics simultaneously.

Example: Measure average, max, and count of spans per region

Sample data

{ "cloud_region": "eu-north-1", "duration": 500 }
{ "cloud_region": "eu-north-1", "duration": 900 }
{ "cloud_region": "eu-west-1", "duration": 2100 }
{ "cloud_region": "us-east-2", "duration": 150 }

Query

source spans
| groupby cloud_region aggregate
    avg(duration) as avg_duration,
    max(duration) as max_duration,
    count() as request_count

Result

cloud_region    avg_duration    max_duration    request_count
eu-north-1      700.0           900             2
eu-west-1       2100.0          2100            1
us-east-2       150.0           150             1

This approach is useful for analyzing scale and performance in the same query.
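As a sketch, the same query shape can also report total time spent per region, assuming a sum() aggregation is available alongside avg() and max():

source spans
| groupby cloud_region aggregate
    sum(duration) as total_duration,
    count() as request_count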


Filter before aggregating

Description

Use filter to narrow down the dataset before aggregating. This improves performance and ensures your calculations reflect only relevant data.

Example: Count only slow requests per region

Sample data

{ "cloud_region": "eu-north-1", "duration": 900 }
{ "cloud_region": "eu-north-1", "duration": 300 }
{ "cloud_region": "us-east-2", "duration": 800 }

Query

source spans
| filter duration > 500
| groupby cloud_region aggregate count() as slow_request_count

Result

cloud_region    slow_request_count
eu-north-1      1
us-east-2       1

The filter ensures that only spans above 500ms are included in the aggregation.
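The predicate can be anything your data supports. For example, a sketch counting server errors per region, assuming spans carry a status_code field:

source spans
| filter status_code >= 500
| groupby cloud_region aggregate count() as error_count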


Use count_if for conditional metrics

Description

Use count_if() to count how many documents meet a specific condition within a group.

Syntax

count_if(condition: bool): number

Example: Compare total and slow spans per region

Sample data

{ "cloud_region": "eu-west-1", "duration": 700 }
{ "cloud_region": "eu-west-1", "duration": 400 }
{ "cloud_region": "us-east-2", "duration": 300 }
{ "cloud_region": "us-east-2", "duration": 1200 }

Query

source spans
| groupby cloud_region aggregate
    count() as total_requests,
    count_if(duration > 500) as slow_requests

Result

cloud_region    total_requests    slow_requests
eu-west-1       2                 1
us-east-2       2                 1

This lets you track the proportion of problematic spans per group.
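If your query language supports deriving new fields from aggregated results (the create step below is an assumption, not confirmed syntax from this guide), a sketch of turning these counts into a ratio might look like:

source spans
| groupby cloud_region aggregate
    count() as total_requests,
    count_if(duration > 500) as slow_requests
| create slow_ratio from slow_requests / total_requests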


Group by calculated severity level

Description

You can dynamically group documents using logic via a case expression. This is useful for creating buckets like "fast", "slow", and "critical".

Syntax

groupby case {
  <condition1> -> <label1>,
  <condition2> -> <label2>,
  _ -> <default_label>
} as <group_alias>
aggregate count() as <count_alias>

Example: Group spans into latency tiers

Sample data

{ "duration": 1200 }
{ "duration": 800 }
{ "duration": 300 }

Query

source spans
| groupby case {
    duration > 1000 -> 'super slow',
    duration > 500 -> 'slow',
    _ -> 'ok'
  } as latency_level
  aggregate count() as request_count

Result

latency_level    request_count
super slow       1
slow             1
ok               1

Aggregate over spans (traces)

Description

You can aggregate directly over spans using basic metrics such as count() and max(). This is often used to get an overview of tracing volume and performance.

Example: Count spans and find the longest one

Sample data

{ "duration": 8000 }
{ "duration": 20000 }
{ "duration": 1000 }

Query

source spans
| aggregate
    count() as trace_count,
    max(duration) as max_trace_duration

Result

trace_count    max_trace_duration
3              20000

This is helpful for validating trace ingestion and spotting extreme latency outliers.
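A similar sketch can bracket both extremes, assuming a min() aggregation exists alongside max():

source spans
| aggregate
    min(duration) as min_trace_duration,
    max(duration) as max_trace_duration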


Common pitfalls or gotchas

  • groupby removes all fields not explicitly included. Add fields as grouping keys or aggregations if you want to preserve them (see the sketch after this list).
  • Aggregations like avg, count_if, and max ignore null values. Make sure fields you're aggregating on are populated.
  • You can only group by scalar values—arrays and objects are not supported unless flattened or transformed first.
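For the first pitfall, a minimal sketch of preserving an extra field by adding it as a second grouping key (service_name is an assumed field name):

source spans
| groupby cloud_region, service_name aggregate count() as request_count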