groupby
Description
The groupby
command aggregates documents that share one or more common values or calculated expressions, allowing you to compute metrics such as sum
, avg
, max
, min
, and count
. It is the cornerstone of DataPrime’s analytical capabilities, enabling powerful summarization and insight generation from raw event data. Each unique combination of grouping expressions produces a single output document. The aggregate
or agg
keyword specifies which aggregation functions to apply within each group.
Note
You can group by both keypaths and calculated expressions. When grouping by an expression, DataPrime evaluates it dynamically for each document before grouping.
Syntax
groupby <grouping_expression> [as <alias>] [, <grouping_expression_2> [as
<alias_2>], ...] [aggregate|agg] <aggregate_function> [as <result_keypath>] [,
<aggregate_function_2> [as <result_keypath_2], ...]
Example 1
Use case: Calculate average request duration per HTTP path
The groupby
command is most commonly used to aggregate numerical metrics by a categorical field. In this example, we group by path
and calculate the average request duration, producing one result per endpoint.
Example data
{ "path": "/home", "duration": 400 },
{ "path": "/home", "duration": 600 },
{ "path": "/about", "duration": 512 },
{ "path": "/store", "duration": 33 }
Example query
Example output
path | average_duration_for_path |
---|---|
/home | 500 |
/about | 512 |
/store | 33 |
The groupby
command collapses all documents with the same path
value into a single record, applying the avg()
aggregation function to compute the mean duration. Only the grouping fields and calculated aggregates are retained in the output.
Example 2
Group by a transformed key:
Example query
Example 3
Cluster IPs into subnets:
Example query
Example 4
Compute 99th percentile from traces:
Example query
filter $l.operationName == 'Lambda.Invoke'
| groupby $l.operationName agg percentile(0.99, $m.duration)