Use our innovative DataPrime language not only to query your data, but also to transform it through a series of operations in a manner that is meaningful for you.
This Cheat Sheet will enable you to hit the ground running using DataPrime. A complete glossary of all DataPrime operators and expressions can be found here.
DataPrime supports data and helper types, expressions, and operators.
DataPrime supports the following data types:
- boolean: true or false (for example, the result of comparisons such as >, <, or ==)
- number / num: 42
- string: "foo"
- timestamp: 10-10-2021:11:11

It also supports several helper types, used as arguments for functions or operators, including:
- Interval: a duration of time
- Regexp: a standard regular expression

Expressions consist of literals, functions/methods, operators, casts, and/or groupings that return data:
- literals: 42
- functions: length($d.foo)
- methods: $d.foo.length() // just alternative syntax for functions
- operator expressions: 1 + $d.foo
- cast: $d.foo:string
- groupings: (…)

Use operators as commands that transform unstructured streams of JSON data and create a new stream for a specified time period. Examples include: filter, extract, sortby, groupby, orderby, find, and choose.
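These building blocks can be combined inside an operator. A minimal sketch (using fields from the example data that appears later on this page) that mixes a grouping, an arithmetic operator expression, a method call, and literals:

source logs | filter (duration_ms / 1000) > 2 && region.startsWith('eu-')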
Find out more about the language building blocks here.
Query format is as follows:
source logs | operator1 ... | operator2 ... | operator3 | ...
Any whitespace between operators is ignored, allowing you to write queries across multiple lines:
source logs | operator1 .... | operator2 .... | operator3 ....
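For example, the following query (which also appears in the filter examples below) behaves the same whether written on one line or split across several:

source logs
| filter result == 'success'
| orderby duration_ms desc
| limit 10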
The following section provides use-cases of most frequently used operators, so you can hit the ground running. View a full glossary of all operators and expressions here.
Example input data:
# Input Examples
{ region: "us-east-1", az: "us-east-1a", duration_ms: 231, result: 'success' }
{ region: "us-east-1", az: "us-east-1b", duration_ms: 2222, result: 'failure' }
{ region: "eu-west-1", az: "eu-west-1a", duration_ms: 501, result: 'success' }
{ region: "eu-west-2", az: "eu-west-2a", duration_ms: 23, result: 'success' }
filter
Get only logs which signify success:
source logs | filter result == 'success'
# Result - Full raw logs
Get only logs which signify failure in the eu-west-1 region:
# Option 1
source logs | filter result == 'failure' && region == 'eu-west-1'
# Option 2
source logs | filter result == 'failure' | filter region == 'eu-west-1'
# Result - Full raw logs
Get only logs which have a region that starts with eu-:
source logs | filter region.startsWith('eu-')
# Result - Full raw logs
Get only logs which have a duration larger than 2 seconds:
source logs | filter duration_ms / 1000 > 2
# Result - Full raw logs
Get all the logs except the ones which have an ap-southeast-1 region (3 options):
# Option 1
source logs | filter region != 'ap-southeast-1'
# Option 2
source logs | filter !(region == 'ap-southeast-1')
# Option 3
source logs | block region == 'ap-southeast-1'
Get 10 success logs which have a duration larger than 2 seconds:
source logs | filter result == 'success' && duration_ms / 1000 > 2 | limit 10
# Result - Full raw logs
Order the success logs by descending duration fetching the top-most 10 logs:
source logs | filter result == 'success' | orderby duration_ms desc | limit 10
# Result - Full raw logs
Do a free-text search on the msg field, returning only logs which have the word "started" in them. Combine the free-text search with another filter (the stream field has the value stdout):
# Query 1 - Using the find operator for finding the text in msg, and then filtering using the filter operator
source logs | find 'started' in msg | filter stream == 'stdout'
# Results 1 - Full raw logs
# Query 2 - Using the ~ predicate and combining the free-text search and the filter into one expression that is passed to filter
source logs | filter msg ~ 'started' && stream == 'stdout'
# Results 2 - Full raw logs
Perform a wild-search of text inside the entire log:
# Option 1 - Using the wildfind operator
source logs | wildfind 'coralogix'
# Option 2 - Inside an expression
source logs | filter $d ~~ 'coralogix'
# Result - Full raw logs
Use a full Lucene query to filter results:
source logs | lucene 'region:"us-east-1" AND "coralogix"'
Convert the data type of a key: Get the logs whose version field contains a value greater than 32.
# Input Data Example:
{ "version": "12", ... }
{ "version": "17", ... }
{ "version": "65", ... }
# Option 1 - By casting
source logs | filter version:number > 32
# Option 2 - Using the convert operator
source logs | convert version:number | filter version > 32
Get success logs, but choose only the result and duration_ms fields for the output:
# Option 1 - Using the choose operator
source logs | filter result == 'success' | choose result, duration_ms
# Option 2 - Using the select operator, which is just an alias for choose
source logs | filter result == 'success' | select result, duration_ms
# Result - Only the result and duration_ms keys will remain in each event
choose
Get success logs, but choose only the result and duration_ms fields for the output.
# Option 1 - Using the choose operator
source logs | filter result == 'success' | choose result, duration_ms
# Result - Only the result and duration_ms keys will remain in each event
Construct a new object using choose. The output fields will be as follows:

- outcome, which will contain the value of the result field from the original log
- duration_seconds, which will contain the original duration_ms divided by 1000 in order to convert it to seconds
- meaning_of_life, which will contain the value 42

source logs | choose result as outcome, duration_ms / 1000 as duration_seconds, 42 as meaning_of_life
# Result - Notice the key names have been changed according to the "as X" parts
{ "outcome": "success", "duration_seconds": 2.54, "meaning_of_life": 42 }
{ "outcome": "failure", "duration_seconds": 0.233, "meaning_of_life": 42 }
count / countby
Count all the success logs:
source logs | filter result == 'success' | count
# Result - Total number of logs after filtering
Count logs, grouped by success/failure:
source logs | countby result
# Result - Number of logs per result value
{ "result": "success", "_count": 847 }
{ "result": "failure", "_count": 22 }
Count logs, grouped by success/failure per region, with the results in a new field named request_count:
source logs | countby region,result into request_count
# Result - Notice that the count keyname is set to request_count because of "into request_count"
{ "region": "eu-west-1", "result": "success", "request_count": 287 }
{ "region": "eu-west-1", "result": "failure", "request_count": 2 }
{ "region": "eu-west-2", "result": "success", "request_count": 2000 }
{ "region": "eu-west-3", "result": "success", "request_count": 54 }
{ "region": "eu-west-3", "result": "failure", "request_count": 2 }
Count events in each region, and return the top 3 regions:
source logs | top 3 region by count()
# Result - 3 rows, each containing a region and a count of logs
Average the duration in seconds for each region, and return the lowest (bottom) 3 regions:
source logs | bottom 3 region by avg(duration_ms)
# Result - 3 rows, each containing a region and an average duration
groupby
Get the average and maximum durations for successes/failures:
# Option 1 - Output keypaths are named automatically
source logs | groupby result calc avg(duration_ms),max(duration_ms)
# Result 1
{ "result": "success", "_avg": 23.4, "_max": 287 }
{ "result": "failure", "_avg": 980.1, "_max": 1000.2 }
# Option 2 - Using "as X" to name the output keypaths
source logs | groupby result calc avg(duration_ms) as avg_duration,max(duration_ms) as max_duration
# Result 2
{ "result": "success", "avg_duration": 23.4, "max_duration": 287 }
{ "result": "failure", "avg_duration": 980.1, "max_duration": 1000.2 }
When querying with the groupby operator, you can now apply an aggregation function (such as avg, max, or sum) to the bucket of results. This feature gives you the power to manipulate an aggregation expression inside the expression itself, allowing you to calculate and manipulate your data simultaneously. Examples of DataPrime expressions in aggregations can be found here.
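For instance, arithmetic can be wrapped around the aggregation function itself. A minimal sketch (assuming avg(duration_ms) / 1000 is accepted as an aggregation expression, consistent with the expression syntax described above):

source logs | groupby region calc avg(duration_ms) / 1000 as avg_duration_seconds
# Result - one row per region, with the average duration converted to seconds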
distinct
Get distinct regions from the data, grouping logs by region name without any aggregations.
# Input Examples: { "region": "us-east-1", ... } { "region": "us-east-1", ... } { "region": "eu-west-1", ... } # Query 1 - Get distinct regions from the data source logs | distinct region # Results 1 - distinct region names { "region": "us-east-1" } { "region": "eu-west-1" }
enrich
Enrich and filter your logs using additional context from a lookup table. For example, enrich user activity logs with the user’s department and then retrieve logs of all users in the Finance department.
First, upload the lookup table:
Go to Data Flow > Data Enrichment page > Custom Enrichment section, and add Custom Enrichment. For more details see the Custom Enrichment documentation.
There are two possible ways to enrich your logs. One is to use the enrich operator in a query, which looks up a value in this table and enriches the log dynamically, for the purposes of that query only.
The syntax:
enrich <value_to_lookup> into <enriched_key> using <lookup_table>
The <value_to_lookup> (name of a log key or the actual value) will be looked up in the Custom Enrichment <lookup_table>, and a key called <enriched_key> will be added to the log, containing all table columns as sub-keys. If the <value_to_lookup> is not found in the <lookup_table>, the <enriched_key> will still be added, but with "null" values, in order to preserve the same structure for all result logs. You can then filter the results using DataPrime capabilities, such as filtering logs by a specific value in the enriched field.
Example
The original log:
{ "userid": "111", ... }
The Custom Enrichment lookup table called “my_users”:
| ID  | Name  | Department |
|-----|-------|------------|
| 111 | John  | Finance    |
| 222 | Emily | IT         |
Running the following query:
enrich $d.userid into $d.user_enriched using my_users
Gives the following enriched log:
{ "userid": "111", "user_enriched": { "ID: "111", "Name": "John", "Department": "Finance" }, ... }
Notes:

- Use source <lookup_table> to view the enrichment table.
- If the enriched key already exists in the log and the <value_to_lookup> exists in the <lookup_table>, the sub-keys will be updated with the new value. If the <value_to_lookup> does not exist, their current value will remain.
- Keys which are not part of the <lookup_table> will remain with their existing values.
- All values in the <lookup_table> are considered to be strings. This means that the <value_to_lookup> must be in a string format.
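Since table values are treated as strings, a literal lookup value should also be written as a string. A minimal sketch looking up a fixed value rather than a log key (the syntax above allows either):

enrich '111' into $d.user_enriched using my_users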
extract
extract allows you to take some semi-structured text and extract meaningful data out of it. There are multiple methods to extract this data:

- regexp – Extract using regular-expression capture groups
- jsonobject – Take a stringified JSON and extract it into a real object
- kv – Extract key=value pairs from a string

The extracted values can also be converted to their real data types as part of the extraction. This is done by adding a datatypes clause that contains the required conversions (same syntax as the convert operator).
Examples
Extract information from a text field using a regular expression:
# Input Data Examples: { "msg": "... Query_time: 2.32 Lock_time: 0.05487 ..." } { "msg": "... Query_time: 0.1222 Lock_time: 0.0002 ..." } ... # Example 1 # Query 1 source logs // Filter the relevant logs using lucene | lucene '"Query_time:"' // Extract duration and lock_time strings from the msg field | extract msg into stats using regexp( /# Query_time: (?<duration>.*?) Lock_time: (?<lock_time>.*?) /) // Choose to leave only the stats object that the extraction has created | choose stats # Results 1 - Output contains strings { "stats": { "duration": "0.08273" , "lock_time": "0.00121" } } { "stats": { "duration": "0.12" , "lock_time": "0.001" } } { "stats": { "duration": "3.121" , "lock_time": "0.83322" } } ... # Query 2 - Added datatypes clause, so the extracted values will be numbers instead of strings source logs | lucene '"Query_time:"' | extract msg into stats using regexp( /# Query_time: (?<duration>.*?) Lock_time: (?<lock_time>.*?) /) datatypes duration:number,lock_time:number | choose stats # Results 1 - Output contains real numbers and not strings (see above example) { "stats": { "duration": 0.08273 , "lock_time": 0.00121 } } { "stats": { "duration": 0.12 , "lock_time": 0.001 } } { "stats": { "duration": 3.121 , "lock_time": 0.83322 } } ... # Query 3 - Use the extracted values in a later operator, in this case a filter source logs | lucene '"Query_time:"' | extract msg into stats using regexp( /# Query_time: (?<duration>.*?) Lock_time: (?<lock_time>.*?) /) datatypes duration:number,lock_time:number | choose stats // Filter for only the logs which contain a lock_time which is above 0.5 | filter stats.lock_time > 0.5 # Results 1 - Output contains real numbers { "stats": { "duration": 3.121 , "lock_time": 0.83322 } } ...
Extract a JSON object stored in a string:
Input Data Examples: {"my_json": "{\\"x\\": 100, \\"y\\": 200, \\"z\\": {\\"a\\": 300}}" , "some_value": 1} {"my_json": "{\\"x\\": 400, \\"y\\": 500, \\"z\\": {\\"a\\": 600}}" , "some_value": 2} ... # Query 1 source logs | **extract my_json into my_data using jsonobject()** # Results 1 { "my_json": "..." "my_data": { "x": 100, "y": 200, "z": 300 } "some_value": 1 } { "my_json": "..." "my_data": { "x": 400, "y": 500, "z": 600 } "some_value": 2 } # Query 2 - Additional filtering on the resulting object source logs | extract my_json into my_data using jsonobject() | **filter my_data.x = 100** # Results 2 - Only the object containing x=100 is returned
Extract key=value data from a string. Notice that the kv extraction honors quoted values.
# Example Data for Query 1
{ "log": "country=Japan city=\"Tokyo\"", ... }
{ "log": "country=Israel city=\"Tel Aviv\"", ... }
...
# Query 1
source logs | extract log into my_data using kv()
# Results 1
{ "log": "...", "my_data": { "country": "Japan", "city": "Tokyo" }, ... }
{ "log": "...", "my_data": { "country": "Israel", "city": "Tel Aviv" }, ... }
# Example Data for Query 2 - Key/Value delimiter is ":" and not "="
{ "log": "country:Japan city:\"Tokyo\"", ... }
{ "log": "country:Israel city:\"Tel Aviv\"", ... }
...
# Query 2
source logs | extract log into my_data using kv(':')
# Results 2 - Same results as Query 1
orderby / sortby
Order the successes by descending duration fetching the top-most 10 logs:
source logs | filter result == 'success' | orderby duration_ms desc | limit 10
# Result - Full raw logs
Numerically order a string field which effectively contains numbers:
# Input Data Examples: { "error_code": "23" } { "error_code": "12" } { "error_code": "4" } { "error_code": "1" } # Query 1 source logs | orderby error_code:number # Results 1 - Ordered by numeric value { "error_code": "1" } { "error_code": "4" } { "error_code": "12" } { "error_code": "23" } # Query 2 - By using the convert operator source logs | convert error_code:number | orderby error_code # Results 2 - Same results
Order by multiple fields alphabetically. [Note: Ordering is case-sensitive; A-Z will be ordered before a-z.]
# Example Input Data:
{ "last_name": "musk", "first_name": "elon" }
{ "last_name": "jobs", "first_name": "steve" }
...
# Query
source logs | orderby last_name,first_name
Same example but with case-insensitive ordering:
source logs | orderby toLowerCase(last_name),toLowerCase(first_name)
create
Create a new keypath value:
# Example Input Data
{ "country": "Japan", "city": "Tokyo" }
{ "country": "Israel", "city": "Jerusalem" }
...
# Query 1
source logs | create default_temperature from 32.5
# Results 1 - Each log contains the new field, with the same value
{ "country": "Japan", "city": "Tokyo", "default_temperature": 32.5 }
{ "country": "Israel", "city": "Jerusalem", "default_temperature": 32.5 }
...
# Query 2 - Create a new field which contains the first three letters of the country, converted to uppercase
source logs | create country_initials from toUpperCase(substr(country,1,3))
# Results 2
{ "country": "Japan", "city": "Tokyo", "country_initials": "JAP" }
{ "country": "Israel", "city": "Jerusalem", "country_initials": "ISR" }
# Input Examples for Query 3
{ ..., "temp_in_fahrenheit": 87.2 }
{ ..., "temp_in_fahrenheit": 32 }
...
# Query 3
source logs | create temp_in_celsius from (temp_in_fahrenheit - 32) * 5 / 9
# Results 3
{ ..., "temp_in_fahrenheit": 87.2, "temp_in_celsius": 30.66666 }
{ ..., "temp_in_fahrenheit": 32, "temp_in_celsius": 0.0 }
...
# Query 4 - Create a new field containing <country>/<city> as a string. Uses string-interpolation syntax
source logs | create country_and_city from `{country}/{city}`
# Results 4
{ "country": "Japan", "city": "Tokyo", "country_and_city": "Japan/Tokyo" }
{ "country": "Israel", "city": "Jerusalem", "country_and_city": "Israel/Jerusalem" }
move
This operator can be used to move a source keypath to a target keypath, including entire subtrees. It can also be used for renaming a keypath.
For nested source keypaths, only the actual key is moved, merging the target keypath with any other objects or keys which already exist in the data.
When moving an entire subtree, the target keypath will serve as the root of the new subtree.
Examples
Move a key to a target location:
# Input Data Example:
{ "query_id": "AAAA", "stats": { "total_duration": 23.3, "total_rows": 500 } }
...
# Query 1 - Rename a keypath
source logs | move query_id to query_identifier
# Results 1 - Keypath renamed
{ "query_identifier": "AAAA", "stats": { "total_duration": 23.3, "total_rows": 500 } }
...
# Query 2 - Move a key to an existing subtree
source logs | move query_id to stats.query_id
# Results 2 - query_id moved below "stats"
{ "stats": { "total_duration": 23.3, "total_rows": 500, "query_id": "AAAA" } }
...
Move subtree to another location:
# Query 1 - Rename a subtree
source logs | move stats to execution_data
# Results 1 - Subtree renamed
{ "query_id": "AAAA", "execution_data": { "total_duration": 23.3, "total_rows": 500 } }
...
# Query 2 - Move subtree to root
source logs | move stats to $d
# Results 2
{ "query_id": "AAAA", "total_duration": 23.3, "total_rows": 500 }
...
# Input Data Examples for Query 3
{ "request": { "id": "1000" }, "user": { "name": "james", "id": 50 } }
...
# Query 3 - Move subtree to another subtree
source logs | move user to request.user_info
# Results 3 - Entire user subtree moved below request.user_info
{ "request": { "id": "1000", "user_info": { "name": "james", "id": 50 } } }
...
replace
Replace the value in an existing keypath:
# Input Data Examples: { "user": { "id": "1000" , "name": "James", "email": "[email protected]" } } { "user": { "id": "2000" , "name": "John", "email": "[email protected]" } } ... # Example 1 replace user.name with 'anyone' # Results 1 { "user": { "id": "1000" , "name": "anyone", "email": "[email protected]" } } { "user": { "id": "2000" , "name": "anyone", "email": "[email protected]" } } ... # Example 2 replace user.name with user.email # Results 2 { "user": { "id": "1000" , "name": "[email protected]", "email": "[email protected]" } } { "user": { "id": "2000" , "name": "[email protected]", "email": "[email protected]" } } # Example 3 replace user.name with `UserName={user.id}` # Results 3 { "user": { "id": "1000" , "name": "UserName=1000", "email": "[email protected]" } } { "user": { "id": "2000" , "name": "UserName=2000", "email": "[email protected]" } } ...
remove
Examples
# Input Data Examples: { "stats": { "duration_ms": 2.34, "rows_scanned": 501, "message": "Operation has taken 2.34 seconds" }, "some_value": 1000 } ... # Query 1 - Remove the message keypath source logs | remove stats.message # Results 1 { "stats": { "duration_ms": 2.34, "rows_scanned": 501 }, "some_value": 1000 } ... # Query 2 - Remove the entire stats subtree source logs | remove stats # Results 2 { "some_value": 1000 } ...
redact
Examples
Input Examples: { "serverIp": "ip-172-30-20-12.eu-west-1.compute.internal", ... } { "serverIp": "ip-172-82-121-1.eu-west-2.compute.internal", ... } { "serverIp": "ip-172-99-72-187.us-east-1.compute.internal", ... } ... # Query 1 - Redact all parts containing the string '.computer.internal' source logs | redact serverIp matching 'compute.internal' to '' # Results 1 { "serverIp": "ip-172-30-20-12.eu-west-1", ... } { "serverIp": "ip-172-82-121-1.eu-west-2", ... } { "serverIp": "ip-172-99-72-187.us-east-1", ... } # Query 2 - Redact all digits before aggregation using regexp source logs | redact serverIp matching /[0-9]+/ to 'X' | countby serverIp # Results 2 { "serverIp": "ip-X-X-X-X.eu-west-X.compute.internal", "_count": 2323 } { "serverIp": "ip-X-X-X-X.us-east-X.compute.internal", "_count": 827 } ...
source
Set the data source that your DataPrime query is based on.
Syntax
source <data_store>
Where <data_store> can be either:

- logs
- spans (supported only in the API)

Example
source logs