# `groupby`

## Description

The `groupby` command aggregates documents that share one or more common values or calculated expressions, allowing you to compute metrics such as `sum`, `avg`, `max`, `min`, and `count`. It is the cornerstone of DataPrime’s analytical capabilities, enabling powerful summarization and insight generation from raw event data. Each unique combination of grouping expressions produces a single output document. The `aggregate` or `agg` keyword specifies which aggregation functions to apply within each group.

Note

You can group by both keypaths and calculated expressions. When grouping by an expression, DataPrime evaluates it dynamically for each document before grouping.

## Syntax

```dataprime
groupby <grouping_expression> [as <alias>] [, <grouping_expression_2> [as
<alias_2>], ...] [aggregate|agg] <aggregate_function> [as <result_keypath>] [,
<aggregate_function_2> [as <result_keypath_2], ...]
```

## Example 1

**Use case: Calculate average request duration per HTTP path**

The `groupby` command is most commonly used to aggregate numerical metrics by a categorical field. In this example, we group by `path` and calculate the average request duration, producing one result per endpoint.

### Example data

```json
{ "path": "/home", "duration": 400 },
{ "path": "/home", "duration": 600 },
{ "path": "/about", "duration": 512 },
{ "path": "/store", "duration": 33 }
```

### Example query

```dataprime
groupby path agg avg(duration) as average_duration_for_path
```

### Example output

| path   | average_duration_for_path |
| ------ | ------------------------- |
| /home  | 500                       |
| /about | 512                       |
| /store | 33                        |

The `groupby` command collapses all documents with the same `path` value into a single record, applying the `avg()` aggregation function to compute the mean duration. Only the grouping fields and calculated aggregates are retained in the output.

## Example 2

Group by a transformed key:

### Example query

```dataprime
groupby username.toLowerCase() agg count() as activity_count
```

## Example 3

Cluster IPs into subnets:

### Example query

```dataprime
groupby ipPrefix(process.tags['k8s.pod.ip'], 24) as subnet_range agg count() as activity_count
```

## Example 4

Compute 99th percentile from traces:

### Example query

```dataprime
filter $l.operationName == 'Lambda.Invoke'
| groupby $l.operationName agg percentile(0.99, $m.duration)
```
