# How to use DataPrime to detect patterns and anomalies in your data

## Goal

By the end of this guide you should be able to use DataPrime to filter, group, and statistically analyze your data to uncover frequent patterns and rare anomalies. You'll learn how to apply operators like [`matches`](https://coralogix.com/docs/dataprime/language-reference/functions-reference/string/matches/index.md), [`countby`](https://coralogix.com/docs/dataprime/language-reference/commands-reference/countby/index.md), [`percentile`](https://coralogix.com/docs/dataprime/language-reference/functions-reference/aggregation/percentile/index.md), and [`bottom`](https://coralogix.com/docs/dataprime/language-reference/commands-reference/bottom/index.md) to identify expected and unexpected behavior in your logs.

## Why it matters

Every system has a baseline of "normal" behavior—and detecting when things deviate is critical. Whether you're tracking user behavior, spotting error spikes, or monitoring rare edge cases, DataPrime gives you powerful tools to surface these patterns in real time or retroactively.

______________________________________________________________________

## `matches` – Filter logs that match a regular expression

### Description

Use [`matches`](https://coralogix.com/docs/dataprime/language-reference/functions-reference/string/matches/index.md) to test if a string field matches a given regular expression. This is useful for detecting structured patterns in log messages.

### Syntax

```dataprime
matches(string: string, pattern: regexp): bool
```

### Example: Detect well-formed user purchase logs

**Description**: Filter logs where the `msg` field matches the pattern `"User [Name] bought [Number] [Item]"`.

**Sample data**

```json
{ "msg": "User Chris bought 10 sunglasses" }
{ "msg": "User James bought 1 bed" }
{ "msg": "User X bo" }
```

**Query**

```dataprime
filter matches(msg, /User [A-Za-z]+ bought \d+ [A-Za-z]+/)
```

**Result**

```json
{ "msg": "User Chris bought 10 sunglasses" }
{ "msg": "User James bought 1 bed" }
```

______________________________________________________________________

### Example: Detect malformed or truncated messages

**Description**: Flip the logic using `!matches` to find logs that deviate from the expected pattern.

**Query**

```dataprime
filter !matches(msg, /User [A-Za-z]+ bought \d+ [A-Za-z]+/)
```

**Result**

```json
{ "msg": "User X bo" }
```

______________________________________________________________________

## **countby** – Count grouped values

### Description

Use [`countby`](https://coralogix.com/docs/dataprime/language-reference/commands-reference/countby/index.md) to group documents by a key and count how many times each value appears. Ideal for identifying common or rare categories.

### Syntax

```dataprime
countby <expression> [as <alias>] [into <keypath>]
```

### Example: Count logs by HTTP status code

**Sample data**

```json
{ "status_code": 200 }
{ "status_code": 404 }
{ "status_code": 500 }
{ "status_code": 200 }
{ "status_code": 403 }
```

**Query**

```dataprime
countby status_code into status_count
```

**Result**

| status_code | status_count |
| ----------- | ------------ |
| 200         | 2            |
| 404         | 1            |
| 500         | 1            |
| 403         | 1            |

______________________________________________________________________

### Example: Group status codes into buckets

**Query**

```dataprime
filter responseStatus.code != null
| create status_range from 
    case_greaterthan {
      responseStatus.code,
      499 -> '5xx',
      399 -> '4xx',
      299 -> '3xx',
      199 -> '2xx',
      99  -> '1xx',
      _   -> 'other'
    }
| countby status_range into status_count
```

**Result**

| status_range | status_count |
| ------------ | ------------ |
| 2xx          | 6619         |
| 4xx          | 172          |
| 5xx          | 101          |

______________________________________________________________________

## **percentile** – Find statistical outliers

### Description

Use [`percentile`](https://coralogix.com/docs/dataprime/language-reference/functions-reference/aggregation/percentile/index.md) to calculate the Nth percentile of a numeric field within a group. Useful for identifying slowest responses or largest payloads.

### Syntax

```dataprime
percentile(percent: number, value: number): number
```

### Example: Find 99th percentile latency by path

**Sample data**

```json
{ "operationName": "SELECT", "duration": 120 }
{ "operationName": "SELECT", "duration": 350 }
{ "operationName": "DELETE", "duration": 90 }
{ "operationName": "SELECT", "duration": 900 }
```

**Query**

```dataprime
source spans
| groupby path agg percentile(0.99, latency) as latency_p99
```

**Result**

| path   | latency_p99 |
| ------ | ----------- |
| /home  | 900         |
| /about | 90          |

______________________________________________________________________

## **bottom** – Return the least frequent items

### Description

Use [`bottom`](https://coralogix.com/docs/dataprime/language-reference/commands-reference/bottom/index.md) to return the bottom N values based on count or another metric. This is ideal for spotting rarely used accounts, endpoints, or hosts.

### Syntax

```dataprime
bottom <limit> <expression> by <aggregation>
```

______________________________________________________________________

### Example: Least active users by count

**Sample data**

```json
{ "user": "Alice" }
{ "user": "Bob" }
{ "user": "Alice" }
{ "user": "Charlie" }
```

**Query**

```dataprime
bottom 2 user by count()
```

**Result**

| user    | count |
| ------- | ----- |
| Bob     | 1     |
| Charlie | 1     |

______________________________________________________________________

### Example: Least active users by total time

**Sample data**

```json
{ "user": "Alice", "duration_ms": 500 }
{ "user": "Alice", "duration_ms": 800 }
{ "user": "Bob", "duration_ms": 100 }
{ "user": "Charlie", "duration_ms": 200 }
```

**Query**

```dataprime
bottom 2 user, count() as activity_count by sum(duration_ms) as total_time
```

**Result**

| user    | activity_count | total_time |
| ------- | -------------- | ---------- |
| Bob     | 1              | 100        |
| Charlie | 1              | 200        |

______________________________________________________________________

## Visualize result

Each of the above commands transforms your dataset and gives you focused insight:

- `matches` filters for structural conformity.
- `countby` helps visualize dominant categories.
- `percentile` highlights statistical outliers.
- `bottom` finds rare or underused values.

You should see logs, tables, or metrics showing these patterns clearly—ready for dashboards or alerts.

______________________________________________________________________

## Common pitfalls

- **Too narrow filters**: If your regular expressions or filters are too strict, you might exclude valuable data. Try starting broad, then refining.
- **Misusing `bottom`**: Without an explicit `by` clause, `bottom` may return confusing results. Always pair it with `count()` or another aggregation.
