How to use DataPrime to detect patterns and anomalies in your data
Goal
By the end of this guide you should be able to use DataPrime to filter, group, and statistically analyze your data to uncover frequent patterns and rare anomalies. You'll learn how to apply operators like `matches`, `countby`, `percentile`, and `bottom` to identify expected and unexpected behavior in your logs.
Why it matters
Every system has a baseline of "normal" behavior—and detecting when things deviate is critical. Whether you're tracking user behavior, spotting error spikes, or monitoring rare edge cases, DataPrime gives you powerful tools to surface these patterns in real time or retroactively.
`matches` – Filter logs that match a regular expression
Description
Use `matches` to test if a string field matches a given regular expression. This is useful for detecting structured patterns in log messages.
Syntax
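As a sketch of the shape (the keypath-first call style and the regex-literal delimiter shown here are assumptions; confirm them against the DataPrime reference for your version):

```
filter <string_keypath>.matches(<regular_expression>)
```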
Example: Detect well-formed user purchase logs
Description: Filter logs where the `msg` field matches the pattern "User [Name] bought [Number] [Item]".
Sample data
{ "msg": "User Chris bought 10 sunglasses" }
{ "msg": "User James bought 1 bed" }
{ "msg": "User X bo" }
Query
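A sketch of the filter, reusing the assumed `keypath.matches(regex)` form from the syntax section, with a slash-delimited regex as an assumption:

```
filter msg.matches(/User \w+ bought \d+ \w+/)
```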
Result
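With the sample data above, only the first two messages ("User Chris bought 10 sunglasses" and "User James bought 1 bed") fit the pattern; the truncated "User X bo" line is filtered out.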
Example: Detect malformed or truncated messages
Description: Flip the logic using `!matches` to find logs that deviate from the expected pattern.
Query
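Under the same syntax assumptions, negating the match flips the filter:

```
filter !msg.matches(/User \w+ bought \d+ \w+/)
```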
Result
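Only the truncated "User X bo" message should remain.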
`countby` – Count grouped values
Description
Use `countby` to group documents by a key and count how many times each value appears. Ideal for identifying common or rare categories.
Syntax
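The shape is the one used by the status-bucket example later in this section:

```
countby <keypath> into <result_keypath>
```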
Example: Count logs by HTTP status code
Sample data
{ "status_code": 200 }
{ "status_code": 404 }
{ "status_code": 500 }
{ "status_code": 200 }
{ "status_code": 403 }
Query
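Reusing the `countby ... into ...` shape, a query along these lines should produce the table below; `status_code` is simply the field name from the sample data:

```
countby status_code into status_count
```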
Result
| status_code | status_count |
|---|---|
| 200 | 2 |
| 404 | 1 |
| 500 | 1 |
| 403 | 1 |
Example: Group status codes into buckets
Query
filter responseStatus.code != null
| create status_range from
case_greaterthan {
responseStatus.code,
499 -> '5xx',
399 -> '4xx',
299 -> '3xx',
199 -> '2xx',
99 -> '1xx',
_ -> 'other'
}
| countby status_range into status_count
Result
| status_range | status_count |
|---|---|
| 2xx | 6619 |
| 4xx | 172 |
| 5xx | 101 |
`percentile` – Find statistical outliers
Description
Use `percentile` to calculate the Nth percentile of a numeric field within a group. Useful for identifying slowest responses or largest payloads.
Syntax
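`percentile` is an aggregation, so it normally sits inside a grouping step. A rough sketch, where the `groupby ... aggregate ... as ...` form and the argument order of `percentile` (and whether it expects a fraction or a percentage) are assumptions to verify against the DataPrime reference:

```
groupby <keypath> aggregate percentile(<numeric_keypath>, <fraction>) as <result_keypath>
```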
Example: Find 99th percentile latency by path
Sample data
{ "operationName": "SELECT", "duration": 120 }
{ "operationName": "SELECT", "duration": 350 }
{ "operationName": "DELETE", "duration": 90 }
{ "operationName": "SELECT", "duration": 900 }
Query
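Under the same assumptions, grouping by `path` to match the sample data might look like:

```
filter duration != null
| groupby path aggregate percentile(duration, 0.99) as latency_p99
```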
Result
| path | latency_p99 |
|---|---|
| /home | 900 |
| /about | 90 |
`bottom` – Return the least frequent items
Description
Use `bottom` to return the bottom N values based on count or another metric. This is ideal for spotting rarely used accounts, endpoints, or hosts.
Syntax
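A sketch of the shape, assuming `bottom` takes a row count, the grouping keypath, and a `by` clause naming the ranking aggregation:

```
bottom <N> <keypath> by <aggregation>
```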
Example: Least active users by count
Sample data
Query
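Assuming event records like those in the "by total time" example below (two entries for Alice, one each for Bob and Charlie), a sketch of the query, with the `by count()` ranking clause as an assumption:

```
bottom 2 user by count()
```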
Result
| user | count |
|---|---|
| Bob | 1 |
| Charlie | 1 |
Example: Least active users by total time
Sample data
{ "user": "Alice", "duration_ms": 500 }
{ "user": "Alice", "duration_ms": 800 }
{ "user": "Bob", "duration_ms": 100 }
{ "user": "Charlie", "duration_ms": 200 }
Query
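One way to sketch this is to rank on the summed duration. Producing the extra `activity_count` column would require an additional `count()` aggregation, whose exact placement depends on the `bottom` syntax your DataPrime version supports:

```
bottom 2 user by sum(duration_ms)
```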
Result
| user | activity_count | total_time |
|---|---|---|
| Bob | 1 | 100 |
| Charlie | 1 | 200 |
Visualize result
Each of the above commands transforms your dataset and gives you focused insight:
- `matches` filters for structural conformity.
- `countby` helps visualize dominant categories.
- `percentile` highlights statistical outliers.
- `bottom` finds rare or underused values.
You should see logs, tables, or metrics showing these patterns clearly—ready for dashboards or alerts.
Common pitfalls
- Too narrow filters: If your regular expressions or filters are too strict, you might exclude valuable data. Try starting broad, then refining.
- Misusing `bottom`: Without an explicit `by` clause, `bottom` may return confusing results. Always pair it with `count()` or another aggregation.