Our next-gen architecture is built to help you make sense of your ever-growing data Watch a 4-min demo video!

Back to All Docs

Glossary: DataPrime Operators & Expressions

Last Updated: May. 11, 2023

This guide provides a glossary of all available DataPrime operators and expressions.

Operators

source

Tap into <store_id> as a source. Currently, the only supported store is logs. In the future, other stores such as metrics, traces and spans would be available, and also ad-hoc stores that can be defined by the user.

source <store-id>

Examples:

source logs

filter

Filter events, leaving only events for which the condition evaluates to true

(f|filter) <condition-expression>

Examples:

f $d.radius > 10
filter $m.severity.toUpperCase() == 'INFO'
filter $l.applicationname == 'recommender'
filter $l.applicationname == 'myapp' && $d.msg.contains('failure')

NOTE: comparison with null currently works only for scalar values and will always return null on JSON subtrees.

move

Move a key (including its subkeys, if any) to a new location.

(m|move) <source-keypath> to <target-keypath>

Examples:

move $d.my_data.hostname to $d.my_new_data.host
m $d.kubernetes.labels to $d.my_labels

remove

Remove a keypath from the object.

r|remove <keypath1> [ "," <keypath2> ]...

Examples:

r $d.mydata.unneeded_key
remove $d.mysuperkey.service_name, $d.mysuperkey.unneeded_key

replace

Replace the value of some key with a new value.

By stating on datatype change you can decide what should happen if the replacement value changes the datatype of the keypath:

  • overwrite – The new value will overwrite the previous one, changing the datatype of the keypath. This is the default value
  • skip – The replacement will be ignored
  • fail – The query will fail
replace <keypath> with <expression> [on datatype changed skip/fail/overwrite]

Examples:

replace $d.message with null
replace $d.some_superkey.log_length_plus_10 with $d.original_log.length()+10 on datatype changed skip

replace

Replace all substrings matching a regexp pattern from some keypath value, effectively hiding the original content.

The matching keyword is optional and can be used to increase readability.

redact <keypath> [matching] /<regular-expression>/ to '<redacted_str>'
redact <keypath> [matching] <string> to '<redacted_str>'

Examples:

redact $d.mykey /[0-9]+/ to 'SOME_INTEGER'
redact $d.mysuperkey.user_id 'root' to 'UNKNOWN_USER'
redact $d.mysuperkey.user_id matchingn 'root' to 'UNKNOWN_USER'

choose

Construct a new object based on expressions from the original object.

If the expression is a keypath, then by default the output_keypath will have a similar name. If it isn’t, then output_keypath needs to be provided.

choose <expression1> [as <output_keypath1>],<expression2> [as <output_keypath2>],...

Examples:

choose $d.mysuperkey.myfield
choose $d.my_superkey.mykey as $d.important_value, 10 as $d.the_value_ten

create

Create a new key and set its value to the result of the expression. Key creation is granular, meaning that superkeys in the path are not overwritten.

The creation can be controlled by providing one or more of several clauses. See below for details.

  (a|add|c|create) <keypath> from <expression> [on keypath exists (fail|skip|overwrite)] [on keypath missing (fail|create|skip)] [on datatype change (skip|fail|overwrite)

Examples:

create $d.radius from 100+23
c $d.log_data.truncated_message from $d.message.substring(1,50)
c $data.trimmed_name from $data.username.trim()

create $d.temperature from 100*23 on datatype changed skip

Controlling what happens when a key exists, is missing, or when the data type of the key has changed:

  • Adding on keypath exists allows to choose what to do when the keypath already exists.
    • overwrite – Overwrites the old value. This is the default value
    • fail – Fails the query
    • skip – Skips the creation of the key
  • Adding on keypath missing allows to choose what to do when the new keypath does not exist.
    • create – Creates the key. This is the default value
    • fail – Fails the query
    • skip – Skips the creation of the new key
  • Adding on datatype changed allows to choose what to do if the key already exists and the new data changes the datatype of the value
    • overwrite – Overwrites the value anyway. This is the default value
    • fail – Fails the query
    • skip – Leaves the key with the original value (and type)

convert

Convert the data type of one or more keys.

The datatypes keyword is optional and can be used for readability.

(conv|convert) [datatypes] <keypath1>:<datatype1>,<keypath2>:<datatype2>,...

Examples:

convert $d.level:number
conv datatypes $d.long:number,$d.lat:number
convert $d.data.color:number,$d.item:string

NOTE: The extract operator contains a special clause called datatypes which provides on-the-fly conversion of datatypes during extraction.

extract

Extract data from some string value into a new object. Multiple extraction methods are supported.

(e|extract) <expression> into <keypath> using <extraction-type>(<extraction-params>) [datatypes keypath1:datatype1,keypath2:datatype2,...]

Here are the currently supported extraction methods, and their parameters:

  • regexp – Create a new object based on regexp capture-groups
    • e – A regular expression with names capture-groups.

Example:

extract $d.my_text into $d.my_data using regexp(e=/user (?<user>.*) has logged in/)

# Here's an example result event:
{
  "my_text": "user A1000 has logged in",
  ...
  "my_data": {
    "user": "A1000"
  }
}

  • kv – Extract a new object from a string that contains key=value key=value... pairs
    • pair_delimiter – The delimiter to expect between pairs. Default is “ (a space)
    • key_delimiter – The delimiter to expect separating between a key and a value. Default is =.

Examples:

extract $d.text into $d.my_kvs using kv()

# Here's an example result event:
{
  "text": "query_id=100 query_source=UI query_duration_ms=233",
  ...
  "my_kvs": {
    "query_id": "100",
    "query_source": "UI",
    "query_duration_ms": "233"
  }
}

  • jsonobject – Extract a new object from a string that contains an encoded JSON object, potentially attempting to unescape the string before decoding it into a JSON
    • unescape_if_needed – A boolean stating whether to unescape the JSON if needed. When set to true, the engine will detect whether or not the value contains an escaped JSON string and handle it accordingly

Example:

extract $d.json_message_as_str into $d.json_message using jsonobject()

# Here's an example result event:
{
  "json_message_as_str": "{\\\\"longdate\\\\":\\\\"2022-07-31 11:38:57.6482\\\\",\\\\"duration\\\\":\\\\"1939\\\\",\\\\"message\\\\":\\\\"my message\\\\"}"},
  "json_message": {
    "longdate": "2022-07-31 11:38:57.6482",
    "duration": "1939",
    "message": "my message"
  }
}

Additional extraction methods will be supported soon.

It is possible to provide data type information as part of the extraction, by using the datatypes clause. For example, adding datatypes my_field:number to an extraction would cause the extract my_field keypath to be a number instead of a string. For example:

extract $d.my_msg into $d.data using kv() datatypes my_field:number

Extracted data always goes into a new keypath as an object, allowing further processing of the new keys inside that new object. For example:

# Assuming a dataset which look like that:
{ "msg": "query_type=fetch query_id=100 query_results_duration_ms=232" }
{ "msg": "query_type=fetch query_id=200 query_results_duration_ms=1001" }

# And the following DataPrime query:
source logs
  | extract $d.msg into $d.query_data using kv() datatypes query_results_duration_ms:number
  | filter $d.query_data.query_results_duration_ms > 500

# The resulting dataset will be as follows (only the second object, in which the duration is >500ms will remain)
{
  "msg": "query_type=fetch query_id=200 query_results_duration_ms=1001",
  "query_data": {
    "query_type": "fetch",
    "query_id": "200",
    "query_results_duration_ms": 1001
  }
}

limit

Limits the output to the first <event-count> events.

limit <event-count>

Examples

limit 100

orderby

Sort the data by ascending/descending order of the expression value. Ordering by multiple expressions is supported.

(orderby|sortby|order by|sort by) <expression> [(asc|desc)] , ...

Examples:

orderby $d.kubernetes.myfield
orderby $d.duration:number desc
sortby $d.myfield desc

NOTE: Sorting numeric values can be done by casting expression to the type: e.g.<expression>:number. In some cases, this will be inferred automatically by the engine.

block

The negation of filter. Filters-out all events where the condition is true. The same effect can be achieved by using filter with !(condition).

block $d.status_code == 200        # Block events which have a status code of 200

timeround

Rounds the time of the event into some time interval, possibly creating a new key for the result.

timeround [source-timestamp] to <time-interval> [into <target-keypath>]

If source-timestamp is not provided, then $m.timestamp is used as the source timestamp.

By default, the rounded result is written back to the original event time $m.timestamp. If into <target-keypath> is provided, then $m.timestamp is not modified, and the result is written to a new target-keypath.

Supported time intervals are:

  • Xs – X seconds
  • Xm – X minutes
  • Xh – X hours
  • Xd – X days

Examples:

timeround to 1h
timeround $d.another_event_timestamp to 60m
timeround to 60s into $d.rounded_ts_to_the_minute

groupby

Groups the results of the preceding operators by the specified grouping expressions and calculates aggregate functions for every group created.

groupby <grouping_expression> [as <alias>] [, <grouping_expression_2> [as <alias_2>], ...] [calculate]
  <aggregate_function> [as <result_keypath>]
  [, <aggregate_function_2> [as <result_keypath_2], ...]

Examples:

The following query:

groupby $m.severity calculate sum($d.duration)

Will result in logs of the following form:

{ “severity”: “Warning”, “_sum”: 17045 }

The keypaths for the grouping expressions will always be under $d. Using the as keyword, we can rename the keypath for the grouping expressions and aggregation functions. The following query:

groupby $l.applicationname as $d.app calculate sum($d.duration) as $d.sum_duration

Will result in logs of the following form:

{ “app”: “web-api”, “sum_duration”: 17045 }

All supported aggregation functions are listed in the “Aggregation Functions” section below.

Text Search Operators

find / text

Search for the string in a certain keypath.

(find|text) <free-text-string> in <keypath>

Examples:

find 'host1000' in $d.kubernetes.hostname
text 'us-east-1' in $d.msg

wildfind / wildtext

Search for the string in the entire user data. This can be used when the keypath in which the text resides is unknown.

NOTE: The performance of this operator is worse than when using the find/text operator. Prefer using those operators when you know the keypath to search for.

(wildfind/wildtext) <string>

Examples:

wildfind 'my-region'
wildtext ':9092'

lucene

A generic lucene-compatible operator, allowing both free and wild text searches, and more complex search queries.

Field names inside the lucene query are relative to $d (the root level of user-data).

lucene <lucene-query-as-a-string>

Examples:

lucene 'pod:recommender AND (is_error:true or status_code:404)'

Expressions

Case expressions are special constructs in the language that allow choosing between multiple options in an easy manner and in a readable way. They can be wherever an expression is expected.

case

Choose between multiple values based on several generic conditions. Resort to a default-value if no condition is met.

case {
  condition1 -> value1,
  condition2 -> value2,
  ...
  conditionN -> valueN,
  _          -> <default-value>
}

Example:

case {
  $d.status_code == 200 -> 'success'
  $d.status_code == 201 -> 'created'
  $d.status_code == 404 -> 'not-found'
  _ -> 'other'
}

# Here's the same example inside the context of a query. A new field is created with the `case` result,
# and then a filter will be applied, leaving only non-successful responses.

source logs | ... | create $d.http_response_outcome from case {
  $d.status_code == 200 -> 'success'
  $d.status_code == 201 -> 'created'
  $d.status_code == 404 -> 'not-found'
  _                     -> 'other'
} | filter $d.http_response_outcome != 'success'

case_contains

A shorthand for case which allowing checking if a string s contains one of several substrings without repeating the expression leading to s. The chosen value is the first which matches s.contains(substring).

case_contains {
  s: string,
  substring1 -> result1,
  substring2 -> result2,
  ...
  substring3 -> resultN
}

Example:

case_contains {
  $l.subsystemname,
  '-prod-' -> 'production',
  '-dev-'  -> 'development',
  '-stg-'  -> 'staging',
  _        -> 'test'
}

case_equals

A shorthand for case which allowing comparing some expression e to several results without repeating the expression. The chosen value is the first which matches s == value

case_equals {
  e: any,
  value1 -> result1,
  value2 -> result2,
  ...
  valueN -> resultN
}

Example:

case_equals {
  $m.severity,
  'info'   -> true,
  'warning -> true,
  _        -> false
}

case_greaterthan

A shorthand for case which allows comparing n to multiple values without repeating the expression leading to n. The chosen value is the first which matches expression > value.

case_greaterthan {
  n: number,
  value1: number -> result1,
  value2: number -> result2
  ...
  valueN: number -> resultN
  _              -> <default-value>
}

Example:

case_greaterthan {
  $d.status_code,
  500 -> 'server-error',
  400 -> 'client-error',
  300 -> 'redirection',
  200 -> 'success',
  100 -> 'information'
  _   -> 'other'
}

case_lessthan

A shorthand for case which allows comparing a number n to multiple values without repeating the expression leading to n. The chosen value is the first which matches expression < value.

case_lessthan {
  n: number,
  value1: number -> result1,
  value2: number -> result2
  ...
  valueN: number -> resultN
  _              -> <default-value>
}

Example:

case_lessthan {
  $d.temperature_celsius,
  10 -> 'freezing',
  20 -> 'cold',
  30 -> 'fun',
  45 -> 'hot',
  _  -> 'burning'
}

Aggregation Functions

avg

Calculates the average value of a numerical expression in the group.

avg(expression: number)

Example:

groupby $m.severity calculate avg($d.duration) as average_duration

count_if

Counts non-null expression values on rows which satisfy condition. If expression is not defined, all rows that satisfy condition will be counted.

count_if(condition: boolean, expression: any)

Example:

groupby $m.severity calculate count_if($d.duration > 500) as $d.high_duration_logs
groupby $m.severity calculate count_if($d.duration > 500, $d.company_id) as $d.high_duration_logs

distinct_count

Counts non-null distinct expression values.

distinct_count(expression: any)

Example:

groupby $l.applicationname calculate distinct_count($d.username) as active_users

distinct_count_if

Counts non-null distinct expression values on rows which satisfy condition.

distinct_count_if(condition: boolean, expression: any)

Example:

groupby $l.applicationname calculate distinct_count_if($m.severity == 'Error', $d.username) as users_with_errors

max

Calculates the maximum value of a numerical expression in the group.

max(expression: number)

Example:

groupby $m.severity calculate max($d.duration)

min

Calculates the minimum value of a numerical expression in the group.

min(expression: number)

Example:

groupby $m.severity calculate min($d.duration)

percentile

Calculates the approximate n-th percentile value of a numerical expression in the group.

percentile(n: number, expression: number, accuracy: number = 0.01)

Since the percentile calculation is approximate, the accuracy may be controlled with the accuracy parameter which ranges from 0 to 1. A lower value will result in better accuracy at the cost of longer query times.

Example:

groupby $m.severity calculate percentile(0.99, $d.duration) as p99_latency

sample_stddev

Computes the sample standard deviation of a numerical expression in the group.

sample_stddev(expression: number)

Example:

groupby $m.severity calculate sample_stddev($d.duration)

sample_variance

Computes the variance of a numerical expression in the group.

sample_variance(expression: number)

Example:

groupby $m.severity calculate sample_variance($d.duration)

stddev

Computes the standard deviation of a numerical expression in the group.

stddev(expression: number)

Example:

groupby $m.severity calculate stddev($d.duration)

sum

Calculates the sum of a numerical expression in the group.

sum(expression: number)

Example:

groupby $m.severity calculate sum($d.duration) as total_duration

variance

Computes the variance of a numerical expression in the group.

variance(expression: number)

Example:

groupby $m.severity calculate variance($d.duration)

On this page