Skip to content

groupby

Description

The groupby command aggregates documents that share one or more common values or calculated expressions, allowing you to compute metrics such as sum, avg, max, min, and count. It is the cornerstone of DataPrime’s analytical capabilities, enabling powerful summarization and insight generation from raw event data. Each unique combination of grouping expressions produces a single output document. The aggregate or agg keyword specifies which aggregation functions to apply within each group.

Note

You can group by both keypaths and calculated expressions. When grouping by an expression, DataPrime evaluates it dynamically for each document before grouping.

Syntax

groupby <grouping_expression> [as <alias>] [, <grouping_expression_2> [as
<alias_2>], ...] [aggregate|agg] <aggregate_function> [as <result_keypath>] [,
<aggregate_function_2> [as <result_keypath_2], ...]

Example 1

Use case: Calculate average request duration per HTTP path

The groupby command is most commonly used to aggregate numerical metrics by a categorical field. In this example, we group by path and calculate the average request duration, producing one result per endpoint.

Example data

{ "path": "/home", "duration": 400 },
{ "path": "/home", "duration": 600 },
{ "path": "/about", "duration": 512 },
{ "path": "/store", "duration": 33 }

Example query

groupby path agg avg(duration) as average_duration_for_path

Example output

pathaverage_duration_for_path
/home500
/about512
/store33

The groupby command collapses all documents with the same path value into a single record, applying the avg() aggregation function to compute the mean duration. Only the grouping fields and calculated aggregates are retained in the output.

Example 2

Group by a transformed key:

Example query

groupby username.toLowerCase() agg count() as activity_count

Example 3

Cluster IPs into subnets:

Example query

groupby ipPrefix(process.tags['k8s.pod.ip'], 24) as subnet_range agg count() as activity_count

Example 4

Compute 99th percentile from traces:

Example query

filter $l.operationName == 'Lambda.Invoke'
| groupby $l.operationName agg percentile(0.99, $m.duration)