Our next-gen architecture is built to help you make sense of your ever-growing data Watch a 4-min demo video!

Back to All Docs

DataPrime Cheat Sheet DataPrime Cheat Sheet

Last Updated: Sep. 20, 2022

Concept

DataPrime is a language that allows one to query data by describing a series of “Operators” on it. The operators are separated by pipes (|), in a way that resembles Linux shells.

This cheat sheet will help you get started with the basics and more advanced operators and expressions that you can leverage with the DataPrime language.

Query Format

Query format is as follows:

source logs | operator ... | operator ... | operator | ...

Any whitespace between operators is ignored, allowing writing queries as multi-line queries that are more readable in many cases. For example:

source logs
  | operator1 ....
  | operator2 ....
  | ...

Data Types

These are the data types currently supported:

  • string
  • number/num – A number (double or integer)
  • boolean – A boolean type, with true or false values
  • null – A null value
  • timestamp – A UTC timestamp in milliseconds

Operators

source

Tap into <store_id> as a source. Currently, the only supported store is logs. In the future, other stores such as metrics, traces and spans would be available, and also ad-hoc stores that can be defined by the user.

source <store-id>

Examples:

source logs

filter

Filter events, leaving only events for which the condition evaluates to true

(f|filter) <condition-expression>

Examples:

f $d.radius > 10
filter $m.severity.toUpperCase() == 'INFO'
filter $l.applicationname == 'recommender'
filter $l.applicationname == 'myapp' && $d.msg.contains('failure')

NOTE: comparison with null currently works only for scalar values and will always return null on JSON subtrees.

move

Move a key (including its subkeys, if any) to a new location.

(m|move) <source-keypath> to <target-keypath>

Examples:

move $d.my_data.hostname to $d.my_new_data.host
m $d.kubernetes.labels to $d.my_labels

remove

Remove a keypath from the object.

r|remove <keypath1> [ "," <keypath2> ]...

Examples:

r $d.mydata.unneeded_key
remove $d.mysuperkey.service_name, $d.mysuperkey.unneeded_key

replace

Replace the value of some key with a new value.

By stating on datatype change you can decide what should happen if the replacement value changes the datatype of the keypath:

  • overwrite – The new value will overwrite the previous one, changing the datatype of the keypath. This is the default value
  • skip – The replacement will be ignored
  • fail – The query will fail
replace <keypath> with <expression> [on datatype changed skip/fail/overwrite]

Examples:

replace $d.message with null
replace $d.some_superkey.log_length_plus_10 with $d.original_log.length()+10 on datatype changed skip

redact

Replace all substrings matching a regexp pattern from some keypath value, effectively hiding the original content.

The matching keyword is optional and can be used to increase readability.

redact <keypath> [matching] /<regular-expression>/ to '<redacted_str>'
redact <keypath> [matching] <string> to '<redacted_str>'

Examples:

redact $d.mykey /[0-9]+/ to 'SOME_INTEGER'
redact $d.mysuperkey.user_id 'root' to 'UNKNOWN_USER'
redact $d.mysuperkey.user_id matchingn 'root' to 'UNKNOWN_USER'

choose

Construct a new object based on expressions from the original object.

If the expression is a keypath, then by default the output_keypath will have a similar name. If it isn’t, then output_keypath needs to be provided.

choose <expression1> [as <output_keypath1>],<expression2> [as <output_keypath2>],...

Examples:

choose $d.mysuperkey.myfield
choose $d.my_superkey.mykey as $d.important_value, 10 as $d.the_value_ten

create

Create a new key and set its value to the result of the expression. Key creation is granular, meaning that superkeys in the path are not overwritten.

The creation can be controlled by providing one or more of several clauses. See below for details.

  (a|add|c|create) <keypath> from <expression> [on keypath exists (fail|skip|overwrite)] [on keypath missing (fail|create|skip)] [on datatype change (skip|fail|overwrite)

Examples:

create $d.radius from 100+23
c $d.log_data.truncated_message from $d.message.substring(1,50)
c $data.trimmed_name from $data.username.trim()

create $d.temperature from 100*23 on datatype changed skip

Controlling what happens when a key exists, is missing, or when the data type of the key has changed:

  • Adding on keypath exists allows to choose what to do when the keypath already exists.
    • overwrite – Overwrites the old value. This is the default value
    • fail – Fails the query
    • skip – Skips the creation of the key
  • Adding on keypath missing allows to choose what to do when the new keypath does not exist.
    • create – Creates the key. This is the default value
    • fail – Fails the query
    • skip – Skips the creation of the new key
  • Adding on datatype changed allows to choose what to do if the key already exists and the new data changes the datatype of the value
    • overwrite – Overwrites the value anyway. This is the default value
    • fail – Fails the query
    • skip – Leaves the key with the original value (and type)

convert

Convert the data type of one or more keys.

The datatypes keyword is optional and can be used for readability.

(conv|convert) [datatypes] <keypath1>:<datatype1>,<keypath2>:<datatype2>,...

Examples:

convert $d.level:number
conv datatypes $d.long:number,$d.lat:number
convert $d.data.color:number,$d.item:string

NOTE: The extract operator contains a special clause called datatypes which provides on-the-fly conversion of datatypes during extraction.

extract

Extract data from some string value into a new object. Multiple extraction methods are supported.

(e|extract) <expression> into <keypath> using <extraction-type>(<extraction-params>) [datatypes keypath1:datatype1,keypath2:datatype2,...]

Here are the currently supported extraction methods, and their parameters:

  • regexp – Create a new object based on regexp capture-groups
    • e – A regular expression with names capture-groups.

Example:

extract $d.my_text into $d.my_data using regexp(e=/user (?<user>.*) has logged in/)

# Here's an example result event:
{
  "my_text": "user A1000 has logged in",
  ...
  "my_data": {
    "user": "A1000"
  }
}

  • kv – Extract a new object from a string that contains key=value key=value... pairs
    • pair_delimiter – The delimiter to expect between pairs. Default is “ (a space)
    • key_delimiter – The delimiter to expect separating between a key and a value. Default is =.

Examples:

extract $d.text into $d.my_kvs using kv()

# Here's an example result event:
{
  "text": "query_id=100 query_source=UI query_duration_ms=233",
  ...
  "my_kvs": {
    "query_id": "100",
    "query_source": "UI",
    "query_duration_ms": "233"
  }
}

  • jsonobject – Extract a new object from a string that contains an encoded JSON object, potentially attempting to unescape the string before decoding it into a JSON
    • unescape_if_needed – A boolean stating whether to unescape the JSON if needed. When set to true, the engine will detect whether or not the value contains an escaped JSON string and handle it accordingly

Example:

extract $d.json_message_as_str into $d.json_message using jsonobject()

# Here's an example result event:
{
  "json_message_as_str": "{\\\\"longdate\\\\":\\\\"2022-07-31 11:38:57.6482\\\\",\\\\"duration\\\\":\\\\"1939\\\\",\\\\"message\\\\":\\\\"my message\\\\"}"},
  "json_message": {
    "longdate": "2022-07-31 11:38:57.6482",
    "duration": "1939",
    "message": "my message"
  }
}

Additional extraction methods will be supported soon.

It is possible to provide data type information as part of the extraction, by using the datatypes clause. For example, adding datatypes my_field:number to an extraction would cause the extract my_field keypath to be a number instead of a string. For example:

extract $d.my_msg into $d.data using kv() datatypes my_field:number

Extracted data always goes into a new keypath as an object, allowing further processing of the new keys inside that new object. For example:

# Assuming a dataset which look like that:
{ "msg": "query_type=fetch query_id=100 query_results_duration_ms=232" }
{ "msg": "query_type=fetch query_id=200 query_results_duration_ms=1001" }

# And the following DataPrime query:
source logs
  | extract $d.msg into $d.query_data using kv() datatypes query_results_duration_ms:number
  | filter $d.query_data.query_results_duration_ms > 500

# The resulting dataset will be as follows (only the second object, in which the duration is >500ms will remain)
{
  "msg": "query_type=fetch query_id=200 query_results_duration_ms=1001",
  "query_data": {
    "query_type": "fetch",
    "query_id": "200",
    "query_results_duration_ms": 1001
  }
}

limit

Limits the output to the first <event-count> events.

limit <event-count>

Examples

limit 100

orderby

Sort the data by ascending/descending order of the expression value. Ordering by multiple expressions is supported.

(orderby|sortby|order by|sort by) <expression> [(asc|desc)] , ...

Examples:

orderby $d.kubernetes.myfield
orderby $d.duration:number desc
sortby $d.myfield desc

NOTE: Sorting numeric values can be done by casting expression to the type: e.g.<expression>:number. In some cases, this will be inferred automatically by the engine.

block

The negation of filter. Filters-out all events where the condition is true. The same effect can be achieved by using filter with !(condition).

block $d.status_code == 200        # Block events which have a status code of 200

timeround

Rounds the time of the event into some time interval, possibly creating a new key for the result.

timeround [source-timestamp] to <time-interval> [into <target-keypath>]

If source-timestamp is not provided, then $m.timestamp is used as the source timestamp.

By default, the rounded result is written back to the original event time $m.timestamp. If into <target-keypath> is provided, then $m.timestamp is not modified, and the result is written to a new target-keypath.

Supported time intervals are:

  • Xs – X seconds
  • Xm – X minutes
  • Xh – X hours
  • Xd – X days

Examples:

timeround to 1h
timeround $d.another_event_timestamp to 60m
timeround to 60s into $d.rounded_ts_to_the_minute

groupby

Groups the results of the preceding operators by the specified grouping expressions and calculates aggregate functions for every group created.

groupby <grouping_expression> [as <alias>] [, <grouping_expression_2> [as <alias_2>], ...] [calculate]
  <aggregate_function> [as <result_keypath>]
  [, <aggregate_function_2> [as <result_keypath_2], ...]

Examples:

The following query:

groupby $m.severity calculate sum($d.duration)

Will result in logs of the following form:

{ “severity”: “Warning”, “_sum”: 17045 }

The keypaths for the grouping expressions will always be under $d. Using the as keyword, we can rename the keypath for the grouping expressions and aggregation functions. The following query:

groupby $m.applicationname as $d.app calculate sum($d.duration) as $d.sum_duration

Will result in logs of the following form:

{ “app”: “web-api”, “sum_duration”: 17045 }

All supported aggregation functions are listed in the “Aggregation Functions” section below.

Text Search Operators

find/text

Search for the string in a certain keypath.

(find|text) <free-text-string> in <keypath>

Examples:

find 'host1000' in $d.kubernetes.hostname
text 'us-east-1' in $d.msg

wildfind/wildtext

Search for the string in the entire user data. This can be used when the keypath in which the text resides is unknown.

NOTE: The performance of this operator is worse than when using the find/text operator. Prefer using those operators when you know the keypath to search for.

(wildfind/wildtext) <string>

Examples:

wildfind 'my-region'
wildtext ':9092'

lucene

A generic lucene-compatible operator, allowing both free and wild text searches, and more complex search queries.

Field names inside the lucene query are relative to $d (the root level of user-data).

lucene <lucene-query-as-a-string>

Examples:

lucene 'pod:recommender AND (is_error:true or status_code:404)'

Expressions

DataPrime supports a limited set of javascript constructs that can be used in expressions.

The data is exposed using the following top-level fields:

  • $m – Event metadata
    • timestamp
    • severity – Possible values are Verbose, Debug, Info, Warning, Error, Critical
    • priorityclass – Possible values are high, medium, low
    • logid
  • $l – Event labels
    • applicationname
    • subsystemname
    • category
    • classname
    • computername
    • methodname
    • threadid
    • ipaddress
  • $d – The user’s data

Field Access

Accessing nested data is done by using a keypath, similar to any programming language or json tool. Keys with special characters can be accessed using a map-like syntax, with the key string as the map index, e.g. $d.my_superkey['my_field_with_a_special/character'].

Examples:

$m.timestamp
$d.my_superkey.myfield
$d.my_superkey['my_field_with_a_special/character']
$l.applicationname

Language constructs

All standard language constructs are supported:

  • Constants
  • Nested field access, as mentioned above
  • Basic math operations between numbers, including modulo (%)
  • Boolean operations &&, ||, !
  • Comparisons
  • String concatenations through concat (native string interpolation will be supported soon)
  • Casting – A simple notation for casting data types: e.g. $d.temperature:number. Type inference is automatically applied when possible. We’ll support full type-inference soon, reducing the need for casting.

Functions

Various functions can be used in expressions. All functions can be called as methods as well, e.g. $d.msg.contains('x') is equivalent to contains($d.msg,'x').

String functions

  • trim(s: string): string – Removes whitespace from the edges of a string s
  • ltrim(s: string): string – Removes whitespace to the left of the string s
  • rtrim(s: string): string – Removes whitespace to the right of the string s
  • concat(s1: string, ... ,sN: string): string – Concatenates multiple strings into one. Native string interpolation will be supported soon, which will allow easier and more readable string concatenation
  • contains(s: string, substring: string): boolean – Returns true if substring is contained in s
  • substr(s: string, from: int, length: int=<end-of-string>): string – Returns the substring in s, from position from and up to length length
  • length(s: string): int – Returns the length of s
  • indexOf(s: string, substring: string): int – Returns the position of substring in s, or 1 if not found.
  • toUpperCase(s: string): string – Converts s to uppercase
  • toLowerCase(s: string): string – Converts s to lowercase
  • firstNonNull(e1: any,e2: any,...): any – Returns the first non-null value from the parameters. Works only on scalars for now
  • startsWith(s: string, prefix: string): boolean – Returns true if s starts with prefix
  • endsWith(s: string, suffix: string): boolean – Returns true if s ends with suffix

General Functions

  • if(condition: boolean,then: any,else: any): any – return either the then or else according to the result of condition

NOTE: This function is currently limited work only on scalar values, when called on json subtrees they will always return null. The behaviour will be expanded to json subtrees in a later version.

Number functions

  • round(n: number, digits: int=0) – Round n to digits decimal places
  • abs(n: number): number – Returns the absolute value of n
  • min(v1: number, ..., vN: number): number – Returns the smallest number of all the numbers passed to the function
  • max(v1: number, ..., vN: number): number – Returns the largest number of all the numbers passed to the function
  • floor(n: number) – Rounds the value down to the nearest integer
  • ceil(n: number) – Rounds the value up to the nearest integer
  • power(n: number, exponent: number) – Returns n^exponent
  • log(b: number, n: number) – Returns the log of n in base b
  • log2(n: number) – Returns the log of n in base 2. Equivalent to log(2,n)
  • ln(n: number) – Returns the natural log of n

Date/Time functions

  • timeRound(ts_milliseconds: number,date:timestamp? = fromUnixTime($m.timestamp / pow(10, 9))):timestamp
  • toIso8601DateTime(ts:timestamp): string – Converts ts to an ISO8601 string. Output precision is nanoseconds
  • parseToTimestamp(s: string, pattern: string?):timestamp – Parses s to a timestamp using the provided optional pattern. If pattern is not provided, the pattern is autodetected
  • fromUnixTime(n: number):timestamp millisecond precision – Converts the number n into a timestamp. Expects a timestamp in milliseconds format (13 digits, for example 1658958157515)

Encoding/Decoding Functions

  • decodeBase64(s: string): string – Decode a base-64 encoded string
  • encodeBase64(s: string): string – Encode a string into base-64

Case expressions

Case expressions are special constructs in the language that allow choosing between multiple options in an easy manner and in a readable way. They can be used wherever an expression is expected.

case

Choose between multiple values based on several generic conditions. Resort to a default-value if no condition is met.

case {
  condition1 -> value1,
  condition2 -> value2,
  ...
  conditionN -> valueN,
  _          -> <default-value>
}

Example:

case {
  $d.status_code == 200 -> 'success',
  $d.status_code == 201 -> 'created',
  $d.status_code == 404 -> 'not-found',
  _ -> 'other'
}

# Here's the same example inside the context of a query. A new field is created with the `case` result,
# and then a filter will be applied, leaving only non-successful responses.

source logs | ... | create $d.http_response_outcome from case {
  $d.status_code == 200 -> 'success',
  $d.status_code == 201 -> 'created',
  $d.status_code == 404 -> 'not-found',
  _                     -> 'other'
} | filter $d.http_response_outcome != 'success'

case_largerthan

A shorthand for case which allows comparing n to multiple values without repeating the expression leading to n. The chosen value is the first which matches expression > value.

case_largerthan {
  n: number,
  value1: number -> result1,
  value2: number -> result2,
  ...
  valueN: number -> resultN
  _              -> <default-value>
}

Example:

case_largerthan {
  $d.status_code,
  500 -> 'server-error',
  400 -> 'client-error',
  300 -> 'redirection',
  200 -> 'success',
  100 -> 'information'
  _   -> 'other'
}

case_smallerthan

A shorthand for case which allows comparing a number n to multiple values without repeating the expression leading to n. The chosen value is the first which matches expression < value.

case_smallerthan {
  n: number,
  value1: number -> result1,
  value2: number -> result2,
  ...
  valueN: number -> resultN
  _              -> <default-value>
}

Example:

case_smallerthan {
  $d.temperature_celsius,
  10 -> 'freezing',
  20 -> 'cold',
  30 -> 'fun',
  45 -> 'hot',
  _  -> 'burning'
}

case_contains

A shorthand for case which allowing checking if a string s contains one of several substrings without repeating the expression leading to s. The chosen value is the first which matches s.contains(substring).

case_contains {
  s: string,
  substring1 -> result1,
  substring2 -> result2,
  ...
  substring3 -> resultN,
  _          -> <default-value>
}

Example:

case_contains {
  $l.subsystemname,
  '-prod-' -> 'production',
  '-dev-'  -> 'development',
  '-stg-'  -> 'staging',
  _        -> 'test'
}

case_equals

A shorthand for case which allowing comparing some expression e to several results without repeating the expression. The chosen value is the first which matches s == value

case_equals {
  e: any,
  value1 -> result1,
  value2 -> result2,
  ...
  valueN -> resultN
}

Example:

case_equals {
  $m.severity,
  'info'   -> true,
  'warning -> true,
  _        -> false
}

Aggregation Functions:

count
Counts the number of results in a group. Takes no parameters.

groupby $m.severity calculate count()

count_if
Counts rows that satisfy the provided condition.

count_if(condition: boolean)

Example:

groupby $m.severity calculate count_if($d.duration > 500) as $d.high_duration_logs
min

Calculates the minimum value of a numerical expression in the group.

min(expr: number)

Example

groupby $m.severity calculate min($d.duration)
max

Calculates the maximum value of a numerical expression in the group.

max(expr: number)

Example:

groupby $m.severity calculate max($d.duration)
sum

Calculates the sum of a numerical expression in the group.

sum(expr: number)

Example:

groupby $m.severity calculate sum($d.duration) as total_duration
avg

Calculates the average value of a numerical expression in the group.

avg(expr: number)

Example:

groupby $m.severity calculate avg($d.duration) as average_duration
percentile

Calculates the approximate n-th percentile value of a numerical expression in the group.

percentile(expr: number, n: number, accuracy: number = 0.01)

Since the percentile calculation is approximate, the accuracy may be controlled with the accuracy parameter which ranges from 0 to 1. A lower value will result in better accuracy at the cost of longer query times.

Example:

groupby $m.severity calculate percentile($d.duration, 0.99) as p99_latency
stddev

Computes the sample standard deviation of a numerical expression in the group.

stddev(expr: number)

Example:

groupby $m.severity calculate stddev($d.duration)
variance

Computes the variance of a numerical expression in the group.

variance(expr: number)

Example:

groupby $m.severity calculate variance($d.duration)
distinct_count

Counts distinct values of an expression in the group.

distinct_count(expr: any)

Example:

groupby $l.applicationname calculate distinct_count($d.username) as active_users

On this page