dedupeby
Description
The dedupeby command removes duplicate documents based on one or more expressions, keeping at most N documents for each unique combination of the expression values. This is useful for pulling a small sample from a large dataset without aggregating it.
Conceptually, it acts as a filter: it does not modify document content or compute summaries. It only trims redundancy by retaining a limited number of examples per group.
Note
The content of each retained document remains unchanged. dedupeby only limits how many documents are kept for each unique grouping.
Syntax
Example
Use case: Sample unique requests per operation name
Suppose your application receives many repeated requests across endpoints, such as /index and /healthcheck. You want to inspect only a few examples of each to spot anomalies or patterns without processing every event. dedupeby can keep just a fixed number of samples for each unique operation.
Example data
{ "operationName": "index", "latency": 120 },
{ "operationName": "index", "latency": 98 },
{ "operationName": "index", "latency": 110 },
{ "operationName": "healthcheck", "latency": 4000 },
{ "operationName": "healthcheck", "latency": 200 },
{ "operationName": "healthcheck", "latency": 350 },
{ "operationName": "index", "latency": 125 },
{ "operationName": "index", "latency": 135 },
{ "operationName": "healthcheck", "latency": 109 },
{ "operationName": "healthcheck", "latency": 4150 }
Example query
Example output
{ "operationName": "index", "latency": 120 },
{ "operationName": "index", "latency": 98 },
{ "operationName": "healthcheck", "latency": 4000 },
{ "operationName": "healthcheck", "latency": 200 }
The dedupeby command keeps two events for each unique operationName and drops the rest, leaving the content of the retained events unchanged. In this output, the retained events are the first two of each group in input order. The result is a small sample of each operation for inspection or debugging.
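To make the behavior concrete, the following Python sketch emulates the example outside the query engine. It is an illustration only, not the command's implementation: the function name dedupe_by and its parameters are hypothetical, and it assumes the first N events of each group in input order are the ones kept, which matches the example output above.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, Sequence

Event = Dict[str, Any]


def dedupe_by(events: Iterable[Event], fields: Sequence[str], keep: int) -> Iterator[Event]:
    """Yield at most `keep` events per unique combination of `fields`.

    Hypothetical helper: assumes the first `keep` events seen for each
    group are retained, in input order.
    """
    seen = defaultdict(int)  # group key -> number of events already kept
    for event in events:
        # A tuple of field values identifies the group, so several fields can be combined.
        key = tuple(event.get(field) for field in fields)
        if seen[key] < keep:
            seen[key] += 1
            yield event  # the event itself is passed through unchanged


events = [
    {"operationName": "index", "latency": 120},
    {"operationName": "index", "latency": 98},
    {"operationName": "index", "latency": 110},
    {"operationName": "healthcheck", "latency": 4000},
    {"operationName": "healthcheck", "latency": 200},
    {"operationName": "healthcheck", "latency": 350},
    {"operationName": "index", "latency": 125},
    {"operationName": "index", "latency": 135},
    {"operationName": "healthcheck", "latency": 109},
    {"operationName": "healthcheck", "latency": 4150},
]

sample = list(dedupe_by(events, ["operationName"], keep=2))
for event in sample:
    print(event)
```

Grouping on a tuple of values means a call such as dedupe_by(events, ["operationName", "latency"], keep=1) would treat each distinct pair as its own group, mirroring the "unique combination" wording in the description.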