How to use DataPrime to combine datasets and correlate logs

Goal

By the end of this guide you should be able to:

Use join to combine logs and traces by shared fields
Use union to merge datasets with compatible schemas
Use identifier-based filtering to correlate logs without a formal join

Why it matters

Real-world debugging rarely involves a single service. To understand the full picture, you often need to combine data from multiple sources—logs, traces, or metrics—based on shared identifiers like request_id, trace_id, or user_id. This guide helps you unify fragmented data into a cohesive timeline for triage, monitoring, and root cause analysis.

Combining datasets using `join`

Description

The join command combines two datasets by matching a common field (e.g., trace_id, request_id). It's useful for enriching logs with related events from another source.

Note

Joins can be resource intensive. Try to filter as much as possible before joining.

Syntax

<query1>
| join (
  <query2>
) on <join_condition>
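
To make the matching behaviour concrete, here is a minimal Python sketch of what a join on a shared field does conceptually. It is not DataPrime syntax. The record fields (severity, message, service, duration_ms), the helper name join_on, and the "joined" output key are all hypothetical, and the sketch assumes only matching pairs are kept, which the text above does not specify for the actual command. Note that the error filter runs before the join, as the note recommends.

```python
from collections import defaultdict


def join_on(left, right, key):
    """Pair each left record with every right record that has the same key value.

    Assumption: unmatched records are dropped (inner-join behaviour).
    """
    # Index the right-hand dataset once so each lookup is O(1).
    index = defaultdict(list)
    for rec in right:
        if key in rec:
            index[rec[key]].append(rec)

    joined = []
    for rec in left:
        if key not in rec:
            continue  # no correlation key, nothing to match on
        for match in index.get(rec[key], []):
            # Nest the matching record under a hypothetical "joined" key.
            joined.append({**rec, "joined": match})
    return joined


# Hypothetical sample data standing in for <query1> and <query2>.
logs = [
    {"trace_id": "t1", "severity": "ERROR", "message": "payment failed"},
    {"trace_id": "t2", "severity": "INFO", "message": "ok"},
    {"trace_id": "t3", "severity": "ERROR", "message": "timeout"},
]
spans = [
    {"trace_id": "t1", "service": "checkout", "duration_ms": 840},
    {"trace_id": "t2", "service": "cart", "duration_ms": 12},
]

# Filter first so the join only processes the records of interest.
errors = [rec for rec in logs if rec["severity"] == "ERROR"]
result = join_on(errors, spans, "trace_id")
```

In this sketch the t3 error has no span with the same trace_id, so it does not appear in result.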

Merging datasets using `union`

Description

The union command merges two datasets into a single stream. Both sources should have compatible schemas or be normalized with choose.

Syntax

<query1>
| union (
  <query2>
)
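
The idea of normalizing two sources to a common schema and then merging them can be sketched in Python as follows. This is an illustration of the concept, not DataPrime syntax. The helper choose_fields, the field names, and the sample records are hypothetical, and the final sort is an extra step added here to read the result as a timeline, not something the union command is stated to do.

```python
def choose_fields(records, mapping):
    """Project each record onto a common schema.

    mapping is {output_field: source_field}; missing source fields become None.
    """
    return [
        {out: rec.get(src) for out, src in mapping.items()}
        for rec in records
    ]


# Two hypothetical sources that name the same information differently.
app_logs = [{"ts": 2, "msg": "user login", "svc": "auth"}]
gateway_logs = [{"timestamp": 1, "message": "GET /login", "service": "gateway"}]

# Normalize both sides to the same three fields, then concatenate.
merged = choose_fields(
    app_logs, {"timestamp": "ts", "message": "msg", "service": "svc"}
) + choose_fields(
    gateway_logs,
    {"timestamp": "timestamp", "message": "message", "service": "service"},
)

# Optional: order by time so the merged stream reads as a single timeline.
merged.sort(key=lambda rec: rec["timestamp"])
```

After normalization every record carries the same field names, so downstream steps can treat the merged stream uniformly.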

Common pitfalls

Unfiltered joins: Always apply filter before join to avoid performance issues.
Mismatched schemas: Use choose to normalize fields before union.
Missing correlation keys: Correlation depends on a shared ID such as request_id. Records that lack one cannot be matched to records from another source.
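
The goal of correlating logs by identifier without a formal join, together with the missing-key pitfall, can be sketched in Python as below. This is a conceptual illustration rather than DataPrime syntax. The function correlate_by_id, the source names, and the record fields are hypothetical. The sketch filters every source on one known request_id, tags each hit with its source, and counts records that have no correlation key at all.

```python
def correlate_by_id(sources, key, value):
    """Collect records from all sources whose `key` equals `value`.

    Returns (timeline, skipped), where skipped counts records lacking `key`.
    """
    timeline = []
    skipped = 0
    for name, records in sources.items():
        for rec in records:
            if key not in rec:
                skipped += 1  # cannot be correlated without the shared ID
                continue
            if rec[key] == value:
                timeline.append({"source": name, **rec})
    # Order by time so events from different sources interleave.
    timeline.sort(key=lambda rec: rec["timestamp"])
    return timeline, skipped


# Hypothetical sample data from two services.
sources = {
    "api": [
        {"timestamp": 1, "request_id": "r-42", "message": "request received"},
        {"timestamp": 2, "request_id": "r-43", "message": "request received"},
    ],
    "db": [
        {"timestamp": 3, "request_id": "r-42", "message": "query slow"},
        {"timestamp": 4, "message": "vacuum finished"},
    ],
}

timeline, skipped = correlate_by_id(sources, "request_id", "r-42")
```

A non-zero skipped count is a hint that some records were emitted without the shared identifier and will be invisible to any ID-based correlation.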
