How to use DataPrime to detect patterns and anomalies in your data
Goal
By the end of this guide you should be able to use DataPrime to filter, group, and statistically analyze your data to uncover frequent patterns and rare anomalies. You'll learn how to apply operators like `matches`, `countby`, `percentile`, and `bottom` to identify expected and unexpected behavior in your logs.
Why it matters
Every system has a baseline of "normal" behavior—and detecting when things deviate is critical. Whether you're tracking user behavior, spotting error spikes, or monitoring rare edge cases, DataPrime gives you powerful tools to surface these patterns in real time or retroactively.
`matches` – Filter logs that match a regular expression
Description
Use `matches` to test if a string field matches a given regular expression. This is useful for detecting structured patterns in log messages.
Syntax
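As a sketch of the shape (the keypath-first call style and the regex-literal delimiter shown here are assumptions; confirm them against the DataPrime reference for your version):

```
filter <string_keypath>.matches(<regular_expression>)
```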
Example: Detect well-formed user purchase logs
Description: Filter logs where the `msg` field matches the pattern "User [Name] bought [Number] [Item]".
Sample data
{ "msg": "User Chris bought 10 sunglasses" }
{ "msg": "User James bought 1 bed" }
{ "msg": "User X bo" }
Query
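A sketch of the filter, reusing the assumed `keypath.matches(regex)` form from the syntax section, with a slash-delimited regex as an assumption:

```
filter msg.matches(/User \w+ bought \d+ \w+/)
```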
Result
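With the sample data above, only the first two messages ("User Chris bought 10 sunglasses" and "User James bought 1 bed") fit the pattern; the truncated "User X bo" line is filtered out.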
Example: Detect malformed or truncated messages
Description: Flip the logic using `!matches` to find logs that deviate from the expected pattern.
Query
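Under the same syntax assumptions, negating the match flips the filter:

```
filter !msg.matches(/User \w+ bought \d+ \w+/)
```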
Result
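Only the truncated "User X bo" message should remain.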
`countby` – Count grouped values
Description
Use `countby` to group documents by a key and count how many times each value appears. Ideal for identifying common or rare categories.
Syntax
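The shape is the one used by the status-bucket example later in this section:

```
countby <keypath> into <result_keypath>
```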
Example: Count logs by HTTP status code
Sample data
{ "status_code": 200 }
{ "status_code": 404 }
{ "status_code": 500 }
{ "status_code": 200 }
{ "status_code": 403 }
Query
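Reusing the `countby ... into ...` shape, a query along these lines should produce the table below; `status_code` is simply the field name from the sample data:

```
countby status_code into status_count
```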
Result
| status_code | status_count |
|---|---|
| 200 | 2 |
| 404 | 1 |
| 500 | 1 |
| 403 | 1 |
Example: Group status codes into buckets
Query
filter responseStatus.code != null
| create status_range from
case_greaterthan {
responseStatus.code,
499 -> '5xx',
399 -> '4xx',
299 -> '3xx',
199 -> '2xx',
99 -> '1xx',
_ -> 'other'
}
| countby status_range into status_count
Result
| status_range | status_count |
|---|---|
| 2xx | 6619 |
| 4xx | 172 |
| 5xx | 101 |
`percentile` – Find statistical outliers
Description
Use `percentile` to calculate the Nth percentile of a numeric field within a group. Useful for identifying slowest responses or largest payloads.
Syntax
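`percentile` is an aggregation, so it normally sits inside a grouping step. A rough sketch, where the `groupby ... aggregate ... as ...` form and the argument order of `percentile` (and whether it expects a fraction or a percentage) are assumptions to verify against the DataPrime reference:

```
groupby <keypath> aggregate percentile(<numeric_keypath>, <fraction>) as <result_keypath>
```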
Example: Find 99th percentile latency by path
Sample data
{ "operationName": "SELECT", "duration": 120 }
{ "operationName": "SELECT", "duration": 350 }
{ "operationName": "DELETE", "duration": 90 }
{ "operationName": "SELECT", "duration": 900 }
Query
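Under the same assumptions, grouping by `path` to match the sample data might look like:

```
filter duration != null
| groupby path aggregate percentile(duration, 0.99) as latency_p99
```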
Result
| path | latency_p99 |
|---|---|
| /home | 900 |
| /about | 90 |
`bottom` – Return the least frequent items
Description
Use `bottom` to return the bottom N values based on count or another metric. This is ideal for spotting rarely used accounts, endpoints, or hosts.
Syntax
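A sketch of the shape, assuming `bottom` takes a row count, the grouping keypath, and a `by` clause naming the ranking aggregation:

```
bottom <N> <keypath> by <aggregation>
```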
Example: Least active users by count
Sample data
Query
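Assuming event records like those in the "by total time" example below (two entries for Alice, one each for Bob and Charlie), a sketch of the query, with the `by count()` ranking clause as an assumption:

```
bottom 2 user by count()
```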
Result
| user | count |
|---|---|
| Bob | 1 |
| Charlie | 1 |
Example: Least active users by total time
Sample data
{ "user": "Alice", "duration_ms": 500 }
{ "user": "Alice", "duration_ms": 800 }
{ "user": "Bob", "duration_ms": 100 }
{ "user": "Charlie", "duration_ms": 200 }
Query
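One way to sketch this is to rank on the summed duration. Producing the extra `activity_count` column would require an additional `count()` aggregation, whose exact placement depends on the `bottom` syntax your DataPrime version supports:

```
bottom 2 user by sum(duration_ms)
```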
Result
| user | activity_count | total_time |
|---|---|---|
| Bob | 1 | 100 |
| Charlie | 1 | 200 |
Visualize result
Each of the above commands transforms your dataset and gives you focused insight:
- `matches` filters for structural conformity.
- `countby` helps visualize dominant categories.
- `percentile` highlights statistical outliers.
- `bottom` finds rare or underused values.
You should see logs, tables, or metrics showing these patterns clearly—ready for dashboards or alerts.
Common pitfalls
- Too narrow filters: If your regular expressions or filters are too strict, you might exclude valuable data. Try starting broad, then refining.
- Misusing `bottom`: Without an explicit `by` clause, `bottom` may return confusing results. Always pair it with `count()` or another aggregation.