Understanding commands

Goal

By the end of this guide, you should be able to:

  • Recognize the core DataPrime commands and when to use them.
  • Combine multiple commands to perform real-time filtering, transformations, and aggregations.
  • Understand the structural difference between commands and functions in a query pipeline.

Why It Matters

Commands are the engine behind most DataPrime queries. Raw logs are rarely in the shape you need. Whether you're triaging an incident, generating a report, or building a dashboard, you’ll need to transform your data quickly and safely. Commands in DataPrime provide the building blocks to:

  • filter out noise
  • extract structure
  • enrich and join values
  • compute statistics
  • reshape your dataset in real time

Mastering these tools is what turns a basic query into a flexible investigation or automation pipeline.

Commands vs. Functions

In DataPrime, commands are top-level operations that act on rows, fields, and entire document sets, reshaping the dataset as it flows through the pipeline. Functions, by contrast, transform individual values inside an expression.
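
For example, length is a function: it computes a value inside an expression. filter is a command: it uses that value to decide which rows survive. A minimal sketch combining the two (assuming your logs carry a string message field):

source logs
| filter length(message) > 100

Here length(message) transforms one value per row, while filter reshapes the dataset by dropping rows.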

Common Patterns and Syntax

Most commands appear at the start of a line and accept one or more arguments or expressions.

They’re usually chained with the pipe (|) operator, as with filter, groupby, and top in the following example:

source logs
| filter status_code >= 500
| groupby path aggregate count() as error_count
| top 5 path by error_count

Data flows from left to right and top to bottom. Each command transforms the dataset further.


Core Command Categories and Examples

Sources, Limits & Ordering

These commands define where your data comes from, how many rows flow through, and in what order.

  • source – Explicitly define your data source (logs, spans, metrics, or enrichment tables).

    source logs
    
  • limit – Cap the number of rows returned.

    limit 100
    
  • orderby / sortby – Sort by an expression.

    sortby duration desc
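
Putting these together, a sketch that pulls the slowest requests (assuming a numeric duration field on your logs):

source logs
| sortby duration desc
| limit 100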
    

Selection & Filtering

These commands reduce the dataset by applying filters or keeping only relevant fields.

  • filter – Keep rows where a condition is true.

    filter status_code >= 500
    
  • block – The opposite of filter. Remove rows that match a condition.

    block method == 'OPTIONS'
    
  • choose / select – Keep only the specified fields.

    choose path, status_code
    
  • distinct – Return one row per unique value.

    distinct user_id
    
  • find / text – Free-text search within a field or across all data.

    find 'timeout' in message
    text '503'
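
Chained together, these commands narrow a dataset step by step. A sketch that keeps server errors, drops CORS preflight noise, and trims to the two fields of interest (field names assumed):

source logs
| filter status_code >= 500
| block method == 'OPTIONS'
| choose path, status_code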
    

Data Creation & Mutation

Commands that let you generate new fields or modify existing ones.

  • create / add / c – Define a new field from an expression; the result is available to every later command in the pipeline, much like a variable.

    create is_error from status_code >= 500
    
  • replace – Overwrite a field with a new value.

    replace duration_ms with duration / 1_000_000
    
  • remove – Remove fields from the document.

    remove user_agent
    
  • convert – Explicitly convert a field’s datatype.

    convert datatypes status_code:number
    
  • redact – Mask sensitive data using regex.

    redact message matching /[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/ to '[EMAIL]'
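
These commands compose naturally into a cleanup pipeline. A sketch that derives an error flag, converts a nanosecond duration to milliseconds (the same conversion as above), and strips a noisy field:

source logs
| create is_error from status_code >= 500
| replace duration_ms with duration / 1_000_000
| remove user_agent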
    

Aggregation & Grouping

These commands reduce many rows into a summary using groupings or statistics.

  • aggregate – Run one or more aggregation functions on the entire dataset.

    aggregate count() as total_logs
    
  • groupby – Group by a field or expression, and aggregate within those groups.

    groupby path aggregate avg(duration) as avg_duration
    
  • countby – Shorthand to group and count.

    countby status_code into error_counts
    
  • top / bottom – Get top/bottom N records by a sort metric.

    top 5 path by count()
    
  • multigroupby – Nested groupings (use sparingly for performance).
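
A typical investigation chains grouping with ranking. A sketch that surfaces the five slowest paths by average duration (assuming a numeric duration field):

source logs
| groupby path aggregate avg(duration) as avg_duration
| top 5 path by avg_duration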


Parsing & Extraction

Commands that turn unstructured data into usable fields.

  • extract – Use regex or key-value logic to pull fields out of strings.

    extract message into fields using regexp(e=/(?<user>\w+) did (?<action>\w+)/)
    
  • explode – Split an array into multiple rows.

    explode scopes into scope original preserve
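
Extraction usually feeds aggregation. A sketch that pulls a user and an action out of free text, then counts actions (the regex is illustrative, and the extracted keys are assumed to land under fields):

source logs
| extract message into fields using regexp(e=/(?<user>\w+) did (?<action>\w+)/)
| groupby fields.action aggregate count() as action_count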
    

Joins & Enrichment

Commands that combine data from other sources or augment with external context.

  • enrich – Join with a lookup table (e.g., employee info).

    enrich user_id into user_info using employees
    
  • join – Combine two queries based on a condition.

    source users | join (source logs | countby userid) on id == userid into logins
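
Enriched values land under the target field, so later commands can reference them. A sketch, assuming an employees lookup table that carries a department column:

source logs
| enrich user_id into user_info using employees
| filter user_info.department == 'engineering'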
    

Deduplication

Reduce redundancy or volume.

  • dedupeby – Keep up to N rows per unique combination of the given expression(s).

    dedupeby operationName keep 5
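
A common use is sampling: keep one representative row per operation to cap volume without losing coverage. A sketch (field names assumed):

source logs
| dedupeby operationName keep 1
| choose operationName, duration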
    

When to Use a Command vs. a Function

  • Use a function when working with individual field values (ipInSubnet, length, urlDecode, etc.).
  • Use a command when transforming the shape, size, or structure of your dataset.

Gotchas

  • Commands must appear in the correct order. For example, create a derived field before you filter on it (see the sketch after this list).
  • Type mismatches can break filters or aggregations. Use convert or casts if necessary.
  • Chaining too many heavy operations on large datasets may exceed limits. Break into smaller queries if needed.
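
For the ordering gotcha above, a sketch of the correct sequence: derive the field first, then filter on it.

source logs
| create is_error from status_code >= 500
| filter is_error == true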