Using DataPrime to clean and normalize data with functions
Goal
By the end of this guide you should be able to use the firstNonNull
, ipInSubnet
, parseInterval
, and urlDecode
functions to clean, transform, and standardize inconsistent or encoded data in your logs.
Why it matters
Raw logs are rarely consistent. Fields can be missing, renamed, or encoded—making it hard to search, group, or build alerts. These functions let you create structure out of inconsistency, enriching your logs without requiring upstream changes.
firstNonNull
– Use the first available non-null value
Description
Use firstNonNull
to return the first non-null value from a list of scalar fields. Ideal for normalizing inconsistent key names or fallback values.
Syntax
Example – Normalize user identifiers across schemas
Sample data
{ "userId": "123", "user_id": null, "user_identifier": null }
{ "userId": null, "user_id": "456", "user_identifier": null }
{ "userId": null, "user_id": null, "user_identifier": "789" }
Query
Result
Use this when multiple versions of a field may exist, and you want to unify them under a single key.
ipInSubnet
– Check if an IP address belongs to a given subnet
Description
ipInSubnet
tests whether an IP address is within a given CIDR block. Useful for filtering internal vs external traffic, identifying environments, or tagging traffic by region.
Syntax
Example – Filter logs by internal IP range
Sample data
Query
Result
Only logs where the IP is within 10.0.0.0/8
are retained. Useful for segmenting or excluding traffic.
parseInterval
– Convert a string duration to an interval type
Description
With parseInterval
you can parse duration strings like "2d3h"
into an interval
so you can perform time-based arithmetic (e.g., add to a timestamp). Enables accurate time math for scheduling, tracking, and alerts.
Syntax
Example – Add a duration to a start time
Sample data
Query
Result
{
"timestamp": 1728763337,
"completed_in": "35m10s",
"completed_time": 1728782447 // +2110 seconds
}
Now the log includes a concrete end time, useful for SLA tracking or time-based filtering.
urlDecode
– Decode URL-encoded values
Description
Decode percent-encoded characters (e.g., %20
) back into human-readable form with urlDecode
. Use when logs contain encoded query strings or URLs.
Syntax
Example – Decode a URL-encoded name field
Sample data
Query
extract query_string into query_params using kv(pair_delimiter='&', key_delimiter='=')
| replace query_params.name with urlDecode(query_params.name)
Result
Now your logs contain readable names and values, improving searchability and visualization.
Common pitfalls
firstNonNull
only works on scalar values (not arrays or objects).ipInSubnet
requires valid CIDR strings like'192.168.0.0/24'
.parseInterval
fails silently if the input format is invalid (1s1d
is not allowed).urlDecode
works only on strings—decoding must be done field by field.