How to use DataPrime to enrich and reshape data on the fly

Goal

By the end of this guide you should be able to enrich documents using lookup tables, extract structured values from strings, parse key-value pairs, and explode arrays into separate documents.

Why it matters

Data in logs and traces is often messy, inconsistent, or incomplete. You may need to add metadata, normalize fields across schemas, parse semi-structured text, or reshape arrays into flat rows for easier analysis. DataPrime lets you do all of this at query time, without needing to preprocess or re-index.

These transformations are essential for debugging, auditing, and building clean, meaningful dashboards—even when your logs aren’t clean.


Enrich documents with lookup metadata (enrich)

The enrich command allows you to add contextual data from an external table—such as team assignments, department names, or locations—based on a key in your document. This is perfect for enriching logs with human-readable or operational metadata that isn't present in the original log stream. A custom enrichment table is required to use the enrich command.

enrich userid into user_data using user_lookup_table

For example, if your log contains a userid, you can enrich it with fields like name and department from the lookup table. The enriched data is attached as a nested object under user_data.
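
To see how this fits into a full query, here is a minimal sketch reusing the userid, user_data, and user_lookup_table names from above; the 'Finance' department value is hypothetical, chosen only for illustration. The pipeline enriches each log, then keeps only documents whose enriched department matches:

source logs
| enrich userid into user_data using user_lookup_table
| filter user_data.department == 'Finance'

Because the enriched fields live under user_data, any downstream command can reference them just like fields that were in the original log.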


Extract structured data from a string (extract + regexp)

Logs often contain structured information hidden inside a string message. Use the extract command with a regular expression to pull out useful fields like usernames, error codes, or transaction IDs. This makes them accessible for filtering, grouping, and display.

extract message into parsed_fields using regexp(e=/user (?<username>.*) logged in/)

The named capture group (?<username>) means that when the pattern matches, the captured value is stored as a new field, parsed_fields.username, which you can reference directly.
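
As a rough end-to-end sketch, the same extraction can feed a filter on the new field. The message pattern is the one above; 'alice' is a hypothetical username used only for illustration:

source logs
| extract message into parsed_fields using regexp(e=/user (?<username>.*) logged in/)
| filter parsed_fields.username == 'alice'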


Parse key-value strings into objects (extract + kv)

If your data includes log lines, query strings, or parameters encoded as key-value pairs, the kv extraction strategy is a fast way to parse them into structured fields. It works great for payloads formatted like a=b&c=d.

extract query_string into query_params using kv(pair_delimiter='&', key_delimiter='=')

This creates an object under query_params, allowing you to reference values like query_params.user or query_params.env in downstream filters or transforms. You can also pair this with urlDecode() to clean encoded values.
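
Putting those two ideas together, here is a hedged sketch that decodes the raw string before parsing it. It assumes query_string holds a URL-encoded payload such as user=jane%40example.com&env=prod; the intermediate decoded_query field is a name invented for this example, created here so the parsed values come out clean:

source logs
| create decoded_query from urlDecode(query_string)
| extract decoded_query into query_params using kv(pair_delimiter='&', key_delimiter='=')
| filter query_params.env == 'prod'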


Explode arrays into multiple documents (explode)

When a log contains an array—like user roles, IP addresses, or scope tags—you can use explode to split it into separate documents, one per element. This makes the data much easier to analyze, aggregate, or filter.

explode scopes into scope original preserve

With original preserve, all other fields from the original log are kept. Each resulting document contains one value from the array assigned to the key scope, which you can then group or filter on independently.
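
For example, a sketch that counts how often each scope value appears across all logs; the groupby ... calculate aggregation syntax and the scope_count alias are assumptions for illustration, not a definitive recipe:

source logs
| explode scopes into scope original preserve
| groupby scope calculate count() as scope_count

Because each exploded document carries exactly one scope value, the aggregation treats every array element as its own row.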


Expected output

After applying these transformations, your documents will be cleaner and more consistent. You’ll be able to:

  • Join metadata into your logs based on user IDs or other keys.
  • Pull meaningful fields out of messages for easier filtering and alerting.
  • Convert encoded or blob-style strings into structured JSON.
  • Flatten arrays into single-value documents for counting, grouping, and dashboards.

Common pitfalls or gotchas

  • enrich only works if your lookup key is a string; cast it if needed (see the sketch after this list).
  • extract using regexp will return null if the pattern doesn't match.
  • kv extraction assumes consistent formatting—watch for missing delimiters or malformed strings.
  • explode overwrites destination fields if names collide—rename carefully.
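
For the first pitfall, a minimal sketch of casting a numeric userid before enrichment; the toString() conversion and the intermediate userid_str field are assumptions made for illustration:

source logs
| create userid_str from userid.toString()
| enrich userid_str into user_data using user_lookup_table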