dedupeby

Description

The dedupeby command removes duplicate documents based on one or more expressions, keeping at most N documents for each unique combination of the expression values. This is especially useful for sampling representative data from large datasets without aggregation.

Conceptually, it functions like a filter: it doesn’t modify document content or compute summaries. It only trims redundancy by retaining a limited number of examples per group.

Note

The content of each retained document remains unchanged. dedupeby only limits how many documents are kept for each unique grouping.

Syntax

dedupeby <expression1> [, <expression2> ...] keep N
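
When several expressions are given, the grouping key is the combination of their values. The following Python sketch illustrates that behavior only; it is not how the command is implemented. The helper name dedupe_by and the status field are hypothetical, expressions are modeled as plain field names, and keeping the first N documents in input order is an assumption.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Iterator, Sequence


def dedupe_by(
    docs: Iterable[Dict[str, Any]],
    fields: Sequence[str],
    keep: int,
) -> Iterator[Dict[str, Any]]:
    """Yield at most `keep` documents per unique combination of `fields`.

    Hypothetical helper: keeps the first documents seen (an assumption)
    and yields each retained document unmodified.
    """
    seen: Counter = Counter()
    for doc in docs:
        # The group key is the tuple of values for all listed fields;
        # values are assumed to be hashable (strings, numbers, None).
        key = tuple(doc.get(field) for field in fields)
        if seen[key] < keep:
            seen[key] += 1
            yield doc


# "status" is a made-up field used only to show a two-field key.
docs = [
    {"operationName": "index", "status": 200},
    {"operationName": "index", "status": 200},
    {"operationName": "index", "status": 500},
    {"operationName": "healthcheck", "status": 200},
]

# Roughly analogous to: dedupeby operationName, status keep 1
result = list(dedupe_by(docs, ["operationName", "status"], keep=1))
```

Here the second index/200 document is dropped because its (operationName, status) combination has already been seen once, while index/500 is kept as a separate group.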

Example

Use case: Sample unique requests per operation name

Suppose your application receives many repeated requests across endpoints, such as /index and /healthcheck. You want to inspect only a few examples of each to spot anomalies or patterns without processing every event. dedupeby can keep just a fixed number of samples for each unique operation.

Example data

{ "operationName": "index", "latency": 120 },
{ "operationName": "index", "latency": 98 },
{ "operationName": "index", "latency": 110 },
{ "operationName": "healthcheck", "latency": 4000 },
{ "operationName": "healthcheck", "latency": 200 },
{ "operationName": "healthcheck", "latency": 350 },
{ "operationName": "index", "latency": 125 },
{ "operationName": "index", "latency": 135 },
{ "operationName": "healthcheck", "latency": 109 },
{ "operationName": "healthcheck", "latency": 4150 }

Example query

dedupeby operationName keep 2

Example output

{ "operationName": "index", "latency": 120 },
{ "operationName": "index", "latency": 98 },
{ "operationName": "healthcheck", "latency": 4000 },
{ "operationName": "healthcheck", "latency": 200 }

The dedupeby command keeps two events for each unique operationName, dropping the rest while leaving the content of each kept event unchanged. In this example, the retained events are the first two of each group in input order. The result is a small sample for quick inspection or debugging.
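
To make the example concrete, here is a minimal Python sketch that reproduces the output above from the example data. It is an illustration, not the command's implementation: the function name keep_first_n is hypothetical, and the rule of keeping the first N events in input order is inferred from this example rather than documented.

```python
from collections import defaultdict

# The example data from above, as Python dictionaries.
events = [
    {"operationName": "index", "latency": 120},
    {"operationName": "index", "latency": 98},
    {"operationName": "index", "latency": 110},
    {"operationName": "healthcheck", "latency": 4000},
    {"operationName": "healthcheck", "latency": 200},
    {"operationName": "healthcheck", "latency": 350},
    {"operationName": "index", "latency": 125},
    {"operationName": "index", "latency": 135},
    {"operationName": "healthcheck", "latency": 109},
    {"operationName": "healthcheck", "latency": 4150},
]


def keep_first_n(events, field, n):
    """Keep the first n events for each distinct value of `field`.

    Hypothetical helper; the first-in-input-order rule is an assumption
    inferred from the documented example output.
    """
    counts = defaultdict(int)
    kept = []
    for event in events:
        group = event[field]
        if counts[group] < n:
            counts[group] += 1
            kept.append(event)  # event content is left untouched
    return kept


# Mirrors: dedupeby operationName keep 2
sample = keep_first_n(events, "operationName", 2)
for event in sample:
    print(event)
```

Under these assumptions, sample holds the same four events as the example output: the two earliest index events followed by the two earliest healthcheck events.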