extract - Parse strings into objects

The extract keyword takes values out of a string and turns them into an object. It can use regular expressions with named capture groups, key-value parsing, or JSON parsing to turn embedded strings into indexed, queryable data.

Syntax

(e|extract) <expression> into <keypath> using <extraction-type>(<extraction-params>) [datatypes keypath:datatype,keypath:datatype,...]

Using regular expressions

A regular expression in DataPrime is defined using the regexp type. It accepts one argument, e (short for expression), which contains the named capture groups needed to parse the string.

For example, the following regular expression parses the username out of the string "user Chris has logged in".

regexp(e=/user (?<user>.*) has logged in/)

Example - Extracting a username from a login message, using Regex

Consider the following document:

{
  "message": "user Chris has logged in"
}

We want to extract the username from this string, so that we can search and query it directly. This can be done using the extract keyword with a regular expression.

extract message into my_data using regexp(e=/user (?<user>.*) has logged in/)

This results in this log object:

{
  "message": "user Chris has logged in",
  "my_data": {
    "user": "Chris"
  }
}
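
To make the behaviour concrete, here is a minimal Python sketch of what a named-capture-group extraction does conceptually. It is an illustration only, not DataPrime's implementation: the helper extract_regexp is hypothetical, Python spells named groups (?P<name>...) rather than (?<name>...), and leaving the document unchanged when nothing matches is an assumption.

```python
import re

# Python's re module writes named groups as (?P<name>...).
LOGIN_PATTERN = re.compile(r"user (?P<user>.*) has logged in")


def extract_regexp(document, source_key, target_key, pattern):
    """Return a copy of document with the named groups stored under target_key.

    Hypothetical helper: if the pattern does not match, the copy is returned
    unchanged (an assumption, not documented DataPrime behaviour).
    """
    result = dict(document)
    match = pattern.search(str(document.get(source_key, "")))
    if match:
        # groupdict() maps each named group to the text it captured.
        result[target_key] = match.groupdict()
    return result


doc = {"message": "user Chris has logged in"}
print(extract_regexp(doc, "message", "my_data", LOGIN_PATTERN))
# {'message': 'user Chris has logged in', 'my_data': {'user': 'Chris'}}
```

Each named group becomes one key of the new object, which is why the group name (user) ends up as the field name under my_data.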

Using Key-Value parsing

Some strings follow a consistent key-value format and only need to be converted into a JSON structure. Doing this with a regular expression would be difficult, which is why the kv parsing option exists.

The kv function takes two arguments.

  • pair_delimiter - The delimiter to expect between pairs. Default is ' ' (a single space).

  • key_delimiter - The delimiter to expect between a key and its value. Default is =.

Example - Converting query string parameters into a JSON object using Key-Value parsing

Consider the following document:

{
  "domain": "https://www.coralogix.com",
  "path": "/home",
  "query_string": "a=b&b=c&c=d"
}

We want to extract each query string parameter into its own object field, which will give us the ability to query on individual query string fields. We can do this using kv.

extract query_string into query_string_parameters using kv(pair_delimiter='&',key_delimiter='=')

The resulting document looks like this:

{
  "domain":"https://www.coralogix.com",
  "query_string":"a=b&b=c&c=d",
  "path": "/home",
  "query_string_parameters":{
    "a":"b",
    "b":"c",
    "c":"d"
  }
}
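
The two delimiters can be illustrated with a short Python sketch. This is a conceptual approximation under stated assumptions, not the engine's actual parser: parse_kv is a hypothetical helper, and skipping pairs that lack a key delimiter is an assumption.

```python
def parse_kv(text, pair_delimiter=" ", key_delimiter="="):
    """Split text into pairs, then split each pair into a key and a value."""
    parsed = {}
    for pair in text.split(pair_delimiter):
        # partition splits on the first key delimiter only, so values may
        # themselves contain the delimiter character.
        key, sep, value = pair.partition(key_delimiter)
        if sep and key:  # assumption: ignore fragments without a key delimiter
            parsed[key] = value
    return parsed


doc = {
    "domain": "https://www.coralogix.com",
    "path": "/home",
    "query_string": "a=b&b=c&c=d",
}
doc["query_string_parameters"] = parse_kv(
    doc["query_string"], pair_delimiter="&", key_delimiter="="
)
print(doc["query_string_parameters"])
# {'a': 'b', 'b': 'c', 'c': 'd'}
```

With the defaults (a space and =), the same helper would handle a string such as "level=info code=200".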

Parse escaped JSON object

JSON objects that have been converted to strings are commonly escaped so that they can exist as string fields inside a larger JSON object. The jsonobject function unescapes the stringified JSON and converts it into a fully parsed object.

The jsonobject function takes one argument:

  • max_unescape_count - Maximum number of escaping levels to unescape before parsing the JSON. Default is 1. When set to 1 or more, the engine detects whether the value contains an escaped JSON string and unescapes it until it is parsable or the maximum unescape count is exceeded.

Example - Parsing an escaped JSON object back into an object.

Consider the following document:

{
  "nested_json": "{\"key\": \"value\"}"
}

Using the jsonobject function in conjunction with extract allows us to convert this into a queryable object.

extract nested_json into parsed_json using jsonobject()

The resulting document looks like this:

{
  "nested_json": "{\"key\": \"value\"}",
  "parsed_json": {
    "key": "value"
  }
}
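
The role of max_unescape_count can be sketched in Python as a loop that tries to parse, and removes one level of escaping whenever parsing does not yet yield an object. This is only one reading of the description above: parse_json_object is a hypothetical helper, and returning None when the limit is exceeded is an assumption.

```python
import json


def parse_json_object(value, max_unescape_count=1):
    """Parse a stringified JSON object, unescaping at most max_unescape_count times."""
    current = value
    for attempt in range(max_unescape_count + 1):
        try:
            parsed = json.loads(current)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, dict):
            return parsed
        if attempt == max_unescape_count:
            break  # unescape budget used up
        if isinstance(parsed, str):
            # The value was a quoted JSON string; decoding removed one level.
            current = parsed
        else:
            try:
                # Treat the text as the body of a JSON string literal to
                # resolve one level of backslash escapes.
                current = json.loads('"' + current + '"')
            except json.JSONDecodeError:
                break
    return None  # assumption: signal failure with None


doc = {"nested_json": '{"key": "value"}'}
doc["parsed_json"] = parse_json_object(doc["nested_json"])
print(doc["parsed_json"])
# {'key': 'value'}
```

A value that still carries literal backslashes, such as {\"key\": \"value\"}, needs one unescape pass before it parses, so in this sketch it succeeds with the default of 1 and fails with max_unescape_count=0.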

Providing type information

You can provide datatype information as part of the extraction by using the datatypes clause. Adding datatypes my_field:number to an extraction causes the extracted my_field keypath to be a number instead of a string.

For example, the following query tells DataPrime to interpret my_field as type number.

extract $d.my_msg into $d.data using kv() datatypes my_field:number

NOTE: You don't need to specify the parent object. The extraction writes into the field data, but the keypath in datatypes is only the name of the extracted field.

Extracted data always goes into a new keypath as an object, allowing further processing of the new keys inside that new object.

Example - Ensuring duration is parsed as a number for further processing.

Consider the following documents:

{ 
  "msg": "query_type=fetch query_id=100 query_results_duration_ms=232" 
},
{ 
  "msg": "query_type=fetch query_id=200 query_results_duration_ms=1001" 
}

We want to extract the results duration field as a number, so that we can perform a numerical filter. We can do this using the datatypes clause:

extract msg into query_data using kv() datatypes query_results_duration_ms:number

This will result in the following documents:

{
  "msg": "query_type=fetch query_id=100 query_results_duration_ms=232",
  "query_data": {
    "query_type": "fetch",
    "query_id": "100", // Note the string type!
    "query_results_duration_ms": 232
  }
},
{
  "msg": "query_type=fetch query_id=200 query_results_duration_ms=1001",
  "query_data": {
    "query_type": "fetch",
    "query_id": "200", // Note the string type!
    "query_results_duration_ms": 1001
  }
}

We can now perform filters, aggregations and more, because we provided type information.
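
The effect of the datatypes clause on later filtering can be sketched in Python. The helpers to_number and extract_kv are hypothetical, the cast rules (int first, then float, otherwise keep the string) are an assumption, and the sketch stores the extracted fields under query_data as the query specifies.

```python
def to_number(text):
    """Cast to int when possible, otherwise float; keep the string if neither works."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def extract_kv(msg, datatypes=None):
    """Parse space-separated key=value pairs, casting keys listed in datatypes."""
    datatypes = datatypes or {}
    data = {}
    for pair in msg.split(" "):
        key, sep, value = pair.partition("=")
        if not (sep and key):
            continue
        # Only keys declared as number are cast; everything else stays a string.
        data[key] = to_number(value) if datatypes.get(key) == "number" else value
    return data


docs = [
    {"msg": "query_type=fetch query_id=100 query_results_duration_ms=232"},
    {"msg": "query_type=fetch query_id=200 query_results_duration_ms=1001"},
]
for doc in docs:
    doc["query_data"] = extract_kv(doc["msg"], {"query_results_duration_ms": "number"})

# A numeric comparison now works because the duration is a number, not a string.
slow_ids = [
    d["query_data"]["query_id"]
    for d in docs
    if d["query_data"]["query_results_duration_ms"] > 1000
]
print(slow_ids)
# ['200']
```

Without the cast, the durations would remain the strings "232" and "1001", and a numeric comparison against 1000 would not be meaningful.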