# Using DataPrime to isolate and shape logs for deeper analysis

## Goal

By the end of this guide you should be able to use [`filter`](https://coralogix.com/docs/dataprime/language-reference/commands-reference/filter/index.md), [`block`](https://coralogix.com/docs/dataprime/language-reference/commands-reference/block/index.md), [`choose`](https://coralogix.com/docs/dataprime/language-reference/commands-reference/choose/index.md), and [`create`](https://coralogix.com/docs/dataprime/language-reference/commands-reference/create/index.md) to isolate relevant log data, transform it, and prepare it for further analysis.

## Why it matters

When debugging issues or investigating anomalies, you’ll rarely get what you need from a single `filter`. Real investigations require peeling back layers: filtering what matters, cutting what doesn’t, shaping the remaining data, and adding context for further questions. This guide shows you how to combine multiple DataPrime commands into a focused, intermediate-level workflow.

______________________________________________________________________

## Filter logs based on key conditions

### Description

Use [`filter`](https://coralogix.com/docs/dataprime/language-reference/commands-reference/filter/index.md) to keep only the logs that meet specific conditions. This is usually the first step in narrowing a dataset down to the relevant events. A condition can be a simple comparison, a function call such as [`ipInSubnet`](https://coralogix.com/docs/dataprime/language-reference/functions-reference/ip/ipinsubnet/index.md), or several conditions combined with `&&` and `||`.

### Syntax

```dataprime
filter <boolean_expression>
```

### Example: Keep only internal IP traffic

#### Sample data

```json
{ "ip_address": "10.8.0.45", "status_code": 200 }
{ "ip_address": "192.168.1.10", "status_code": 500 }
```

#### Query

```dataprime
filter ipInSubnet(ip_address, '10.8.0.0/16')
```

#### Result

```json
{ "ip_address": "10.8.0.45", "status_code": 200 }
```

Only the logs where `ip_address` falls inside the `10.8.0.0/16` subnet are kept.
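
Conditions can also be combined, as the description mentions. The sketch below reuses the sample data and field names from this example. The `>= 500` threshold is only an illustrative choice, and the comparison assumes `status_code` is numeric, as it is in this sample.

```dataprime
filter ipInSubnet(ip_address, '10.8.0.0/16') && status_code >= 500
```

With the two sample logs, this keeps nothing: the log inside the subnet has status `200`, and the log with status `500` is outside the subnet. Replacing `&&` with `||` keeps both logs, because each one satisfies at least one of the two conditions:

```dataprime
filter ipInSubnet(ip_address, '10.8.0.0/16') || status_code >= 500
```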

______________________________________________________________________

## Remove unwanted noise with `block`

### Description

Use [`block`](https://coralogix.com/docs/dataprime/language-reference/commands-reference/block/index.md) to explicitly exclude logs that match a condition. It works as the inverse of [`filter`](https://coralogix.com/docs/dataprime/language-reference/commands-reference/filter/index.md). If the condition is true, the log is discarded. This is helpful for trimming down “happy path” events such as successful HTTP requests.

### Syntax

```dataprime
block <boolean_expression>
```

### Example: Drop successful requests

#### Sample data

```json
{ "status_code": "200", "path": "/health" }
{ "status_code": "500", "path": "/login" }
```

#### Query

```dataprime
block status_code.startsWith('2')
```

#### Result

```json
{ "status_code": "500", "path": "/login" }
```

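All logs whose `status_code` starts with `2` (the `2xx` range) are removed, leaving errors and other non-successful responses. The prefix check works here because `status_code` is stored as a string in this sample.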
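
If `status_code` is stored as a number instead, as in the other samples in this guide, a string prefix check does not apply directly. One possible alternative, sketched here under that assumption, is a numeric range comparison:

```dataprime
block status_code >= 200 && status_code < 300
```

Applied to logs with numeric codes `200` and `500`, this would drop the `200` log and keep the `500` log.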

______________________________________________________________________

## Reduce and standardize the shape with `choose`

### Description

Logs often contain many fields, and different sources may name the same field inconsistently. Use [`choose`](https://coralogix.com/docs/dataprime/language-reference/commands-reference/choose/index.md) to project only the fields you care about, while renaming or unifying them. Combine it with functions like [`firstNonNull`](https://coralogix.com/docs/dataprime/language-reference/functions-reference/general/firstnonnull/index.md) to standardize schema across sources.

### Syntax

```dataprime
choose <field1> [as alias], <field2>, ...
```

### Example: Unify user ID field names

#### Sample data

```json
{ "user_id": "123", "path": "/checkout", "status_code": 200 }
{ "userId": "456", "path": "/home", "status_code": 500 }
```

#### Query

```dataprime
choose firstNonNull(user_id, userId) as canonical_user_id, path, status_code
```

#### Result

```json
{ "canonical_user_id": "123", "path": "/checkout", "status_code": 200 }
{ "canonical_user_id": "456", "path": "/home", "status_code": 500 }
```

Now all records share the same standardized `canonical_user_id` field.
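
`choose` keeps only the fields it lists, so anything left out disappears from the output. As a minimal sketch using the same sample data, the query below projects just two fields:

```dataprime
choose path, status_code
```

Both sample logs would come out with only `path` and `status_code`. Neither `user_id` nor `userId` is carried forward, so any field needed by a later step has to be listed explicitly.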

______________________________________________________________________

## Add computed or contextual fields with `create`

### Description

Use [`create`](https://coralogix.com/docs/dataprime/language-reference/commands-reference/create/index.md) to enrich your logs with new fields. These can be derived from existing values, lookups, or constants. This helps prepare logs for deeper analysis, without repeatedly recalculating the same expressions.

### Syntax

```dataprime
create <new_field> from <expression>
```

### Example 1: Tag internal IP traffic

#### Sample data

```json
{ "ip_address": "10.1.2.3" }
{ "ip_address": "203.0.113.8" }
```

#### Query

```dataprime
create is_internal from ipInSubnet(ip_address, '10.0.0.0/8')
```

#### Result

```json
{ "ip_address": "10.1.2.3", "is_internal": true }
{ "ip_address": "203.0.113.8", "is_internal": false }
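```

Each log now carries a boolean `is_internal` field that later steps can reference instead of repeating the subnet check.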
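
A created field can be used by later commands in the same query. The sketch below chains two commands with `|` and reuses the field names from this example to keep only external traffic:

```dataprime
create is_internal from ipInSubnet(ip_address, '10.0.0.0/8')
| filter is_internal == false
```

With the sample data above, only the `203.0.113.8` log remains, since its `is_internal` value is `false`.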

______________________________________________________________________

### Example 2: Generate batch IDs

#### Sample data

```json
{ "event": "login", "user": "alice" }
{ "event": "purchase", "user": "bob" }
```

#### Query

```dataprime
create analysis_batch_id from randomUuid()
```

#### Result

```json
{ "event": "login", "user": "alice", "analysis_batch_id": "a17f8f0c-5b2c-4c9f-a96a-2d4e93c5e678" }
{ "event": "purchase", "user": "bob", "analysis_batch_id": "e39b6a90-0b71-4427-8f53-1a2c5fa47de0" }
```

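Each log gets its own unique identifier, because `randomUuid()` produces a new value for every event. This is useful for tagging individual records in an export or investigation, but the value is not shared across the logs in a batch.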
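
The commands covered in this guide can be chained with `|` into a single query. The following is a sketch rather than a canonical pipeline. It reuses the expressions and field names from the earlier examples, assumes the data is read with `source logs`, and assumes `status_code` is stored as a string so that `startsWith` applies.

```dataprime
source logs
| filter ipInSubnet(ip_address, '10.8.0.0/16')
| block status_code.startsWith('2')
| choose firstNonNull(user_id, userId) as canonical_user_id, path, status_code
| create analysis_batch_id from randomUuid()
```

Order matters here: `choose` drops every field it does not list, so the subnet check on `ip_address` has to run before it.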

______________________________________________________________________

## Common pitfalls

When shaping and isolating logs with `filter`, `block`, `choose`, and `create`, a few issues come up often:

- **Mixing up `filter` and `block`:** `filter` *keeps* matching events, while `block` *removes* them.
- **Null values in conditions:** Comparisons against `null` work only on scalar values (strings, numbers, timestamps), so do not rely on them to test nested objects.
- **Overwriting fields with `create`:** `create` overwrites existing fields if the key already exists.
- **Performance trade-offs:** Expensive checks inside a `filter` (e.g., regex matching or `ipInSubnet`) can slow queries on large datasets. Where possible, apply a simpler condition first so the expensive check runs on fewer events.
- **Ambiguous field names:** Inconsistent field naming (like `user_id` vs. `userId`) can cause incomplete results. Use helpers such as `firstNonNull` to standardize schema.
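
One way to act on the performance advice above is to split the conditions across two `filter` steps, with the cheap comparison first. This is a sketch under assumptions: it reads from `source logs`, treats `status_code` as numeric, and uses `>= 500` only as an illustrative threshold. Whether the ordering changes query time in practice depends on how the engine plans the query.

```dataprime
source logs
| filter status_code >= 500
| filter ipInSubnet(ip_address, '10.8.0.0/16')
```

The result is the same set of logs as a single `filter` joining both conditions with `&&`; only the order in which the conditions are written differs.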
