# Using DataPrime to clean and normalize data with functions

## Goal

By the end of this guide you should be able to use the [`firstNonNull`](https://coralogix.com/docs/dataprime/language-reference/functions-reference/general/firstnonnull/index.md), [`ipInSubnet`](https://coralogix.com/docs/dataprime/language-reference/functions-reference/ip/ipinsubnet/index.md), [`parseInterval`](https://coralogix.com/docs/dataprime/language-reference/functions-reference/time/parseinterval/index.md), and [`urlDecode`](https://coralogix.com/docs/dataprime/language-reference/functions-reference/url/urldecode/index.md) functions to clean, transform, and standardize inconsistent or encoded data in your logs.

## Why it matters

Raw logs are rarely consistent. Fields can be missing, renamed, or encoded—making it hard to search, group, or build alerts. These functions let you create structure out of inconsistency, enriching your logs without requiring upstream changes.

______________________________________________________________________

## `firstNonNull` – Use the first available non-null value

### Description

Use [`firstNonNull`](https://coralogix.com/docs/dataprime/language-reference/functions-reference/general/firstnonnull/index.md) to return the first non-null value from a list of scalar fields. Ideal for normalizing inconsistent key names or fallback values.

### Syntax

```dataprime
firstNonNull(value: any, ...values: any): any
```

### Example – Normalize user identifiers across schemas

#### Sample data

```json
{ "userId": "123", "user_id": null, "user_identifier": null }
{ "userId": null, "user_id": "456", "user_identifier": null }
{ "userId": null, "user_id": null, "user_identifier": "789" }
```

#### Query

```dataprime
choose canonical_user_id from firstNonNull(userId, user_id, user_identifier)
```

#### Result

```json
{ "canonical_user_id": "123" }
{ "canonical_user_id": "456" }
{ "canonical_user_id": "789" }
```

Use this when multiple versions of a field may exist, and you want to unify them under a single key.

______________________________________________________________________

## `ipInSubnet` – Check if an IP address belongs to a given subnet

### Description

[`ipInSubnet`](https://coralogix.com/docs/dataprime/language-reference/functions-reference/ip/ipinsubnet/index.md) tests whether an IP address is within a given CIDR block. Useful for filtering internal vs external traffic, identifying environments, or tagging traffic by region.

### Syntax

```dataprime
ipInSubnet(ip: string, ipPrefix: string): bool
```

### Example – Filter logs by internal IP range

#### Sample data

```json
{ "ip": "10.8.0.8" }
{ "ip": "192.168.1.45" }
{ "ip": "8.8.8.8" }
```

#### Query

```dataprime
filter ipInSubnet(ip, '10.0.0.0/8')
```

#### Result

```json
{ "ip": "10.8.0.8" }
```

Only logs where the IP is within `10.0.0.0/8` are retained. Useful for segmenting or excluding traffic.

______________________________________________________________________

## `parseInterval` – Convert a string duration to an interval type

### Description

With [`parseInterval`](https://coralogix.com/docs/dataprime/language-reference/functions-reference/time/parseinterval/index.md) you can parse duration strings like `"2d3h"` into an `interval` so you can perform time-based arithmetic (e.g., add to a timestamp). Enables accurate time math for scheduling, tracking, and alerts.

### Syntax

```dataprime
parseInterval(string: string): interval
```

### Example – Add a duration to a start time

#### Sample data

```json
{
  "timestamp": 1728763337,
  "job_name": "BUILD_VIDEO",
  "completed_in": "35m10s"
}
```

#### Query

```dataprime
create completed_time from addTime(timestamp, parseInterval(completed_in))
```

#### Result

```json
{
  "timestamp": 1728763337,
  "completed_in": "35m10s",
  "completed_time": 1728782447  // +2110 seconds
}
```

Now the log includes a concrete end time, useful for SLA tracking or time-based filtering.

______________________________________________________________________

## `urlDecode` – Decode URL-encoded values

### Description

Decode percent-encoded characters (e.g., `%20`) back into human-readable form with [`urlDecode`](https://coralogix.com/docs/dataprime/language-reference/functions-reference/url/urldecode/index.md). Use when logs contain encoded query strings or URLs.

### Syntax

```dataprime
urlDecode(string: string): string
```

### Example – Decode a URL-encoded name field

#### Sample data

```json
{
  "query_string": "name=Tom%20Kosman&b=c&c=d"
}
```

#### Query

```dataprime
extract query_string into query_params using kv(pair_delimiter='&', key_delimiter='=')
| replace query_params.name with urlDecode(query_params.name)
```

#### Result

```json
{
  "query_params": {
    "name": "Tom Kosman",
    "b": "c",
    "c": "d"
  }
}
```

Now your logs contain readable names and values, improving searchability and visualization.

______________________________________________________________________

## Common pitfalls

- `firstNonNull` only works on scalar values (not arrays or objects).
- `ipInSubnet` requires valid CIDR strings like `'192.168.0.0/24'`.
- `parseInterval` fails silently if the input format is invalid (`1s1d` is not allowed).
- `urlDecode` works only on strings—decoding must be done field by field.
