Using DataPrime to clean and normalize data with functions

Goal

By the end of this guide you should be able to use the firstNonNull, ipInSubnet, parseInterval, and urlDecode functions to clean, transform, and standardize inconsistent or encoded data in your logs.

Why it matters

Raw logs are rarely consistent. Fields can be missing, renamed, or encoded—making it hard to search, group, or build alerts. These functions let you create structure out of inconsistency, enriching your logs without requiring upstream changes.


firstNonNull – Use the first available non-null value

Description

Use firstNonNull to return the first non-null value from a list of scalar fields. Ideal for normalizing inconsistent key names or providing fallback values.

Syntax

firstNonNull(value: any, ...values: any): any

Example – Normalize user identifiers across schemas

Sample data

{ "userId": "123", "user_id": null, "user_identifier": null }
{ "userId": null, "user_id": "456", "user_identifier": null }
{ "userId": null, "user_id": null, "user_identifier": "789" }

Query

choose canonical_user_id from firstNonNull(userId, user_id, user_identifier)

Result

{ "canonical_user_id": "123" }
{ "canonical_user_id": "456" }
{ "canonical_user_id": "789" }

Use this when multiple versions of a field may exist, and you want to unify them under a single key.
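
Because every argument can be an expression, a string literal can serve as a final default when none of the fields is present. A variation on the query above (assuming literals are accepted as trailing arguments, which the any-typed signature suggests):

choose canonical_user_id from firstNonNull(userId, user_id, user_identifier, 'unknown')

Logs where all three fields are null would then yield { "canonical_user_id": "unknown" } instead of a null.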


ipInSubnet – Check if an IP address belongs to a given subnet

Description

ipInSubnet tests whether an IP address is within a given CIDR block. Useful for filtering internal vs external traffic, identifying environments, or tagging traffic by region.

Syntax

ipInSubnet(ip: string, ipPrefix: string): bool

Example – Filter logs by internal IP range

Sample data

{ "ip": "10.8.0.8" }
{ "ip": "192.168.1.45" }
{ "ip": "8.8.8.8" }

Query

filter ipInSubnet(ip, '10.0.0.0/8')

Result

{ "ip": "10.8.0.8" }

Only logs where the IP is within 10.0.0.0/8 are retained. Useful for segmenting or excluding traffic.
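
Instead of discarding external traffic, you can tag each log and keep everything. A sketch using the create ... from pattern shown later in this guide (the field name is_internal is illustrative):

create is_internal from ipInSubnet(ip, '10.0.0.0/8')

Each log then carries a boolean is_internal field that downstream filters, groupings, or alerts can use.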


parseInterval – Convert a string duration to an interval type

Description

With parseInterval you can parse duration strings like "2d3h" into an interval, enabling time-based arithmetic such as adding a duration to a timestamp. This supports accurate time math for scheduling, tracking, and alerting.

Syntax

parseInterval(string: string): interval

Example – Add a duration to a start time

Sample data

{
  "timestamp": 1728763337,
  "job_name": "BUILD_VIDEO",
  "completed_in": "35m10s"
}

Query

create completed_time from addTime(timestamp, parseInterval(completed_in))

Result

{
  "timestamp": 1728763337,
  "job_name": "BUILD_VIDEO",
  "completed_in": "35m10s",
  "completed_time": 1728765447  // +2110 seconds
}

Now the log includes a concrete end time, useful for SLA tracking or time-based filtering.
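
The parsed interval can also be used directly in a condition, for example to surface slow jobs. A sketch, assuming interval values can be compared with the usual operators (the 30-minute threshold is illustrative):

filter parseInterval(completed_in) > parseInterval('30m')

Only jobs that took longer than 30 minutes would remain.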


urlDecode – Decode URL-encoded values

Description

Decode percent-encoded characters (e.g., %20) back into human-readable form with urlDecode. Use when logs contain encoded query strings or URLs.

Syntax

urlDecode(string: string): string

Example – Decode a URL-encoded name field

Sample data

{
  "query_string": "name=Tom%20Kosman&b=c&c=d"
}

Query

extract query_string into query_params using kv(pair_delimiter='&', key_delimiter='=')
| replace query_params.name with urlDecode(query_params.name)

Result

{
  "query_params": {
    "name": "Tom Kosman",
    "b": "c",
    "c": "d"
  }
}

Now your logs contain readable names and values, improving searchability and visualization.
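
Note the order of operations: extraction happens before decoding. Decoding the whole query_string first would be risky, because percent-encoded delimiters (%26 for &, %3D for =) would turn back into literal & and = characters and be misread as pair boundaries by kv. If you only need a readable copy of the raw string, a minimal sketch (the field name decoded_query is illustrative):

create decoded_query from urlDecode(query_string)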


Common pitfalls

  • firstNonNull only works on scalar values (not arrays or objects).
  • ipInSubnet requires valid CIDR strings like '192.168.0.0/24'.
  • parseInterval fails silently if the input format is invalid (for example, out-of-order units such as 1s1d are not allowed).
  • urlDecode works only on strings—decoding must be done field by field.