How to use DataPrime to combine datasets and correlate logs

Goal

By the end of this guide you should be able to:

  • Use join to combine logs and traces by shared fields
  • Use union to merge datasets with compatible schemas
  • Use identifier-based filtering to correlate logs without a formal join

Why it matters

Real-world debugging rarely involves a single service. To understand the full picture, you often need to combine data from multiple sources (logs, traces, or metrics) based on shared identifiers such as request_id, trace_id, or user_id. This guide helps you unify fragmented data into a cohesive timeline for triage, monitoring, and root cause analysis.


Combining datasets using join

Description

The join command combines two datasets by matching a common field (e.g., trace_id, request_id). It's useful for enriching logs with related events from another source.

Warning

Joins are expensive. Filter both queries down to the records you actually need before joining them.

Syntax

<query1>
| join (
  <query2>
) on <join_condition>
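
To make the filter-then-join pattern concrete, here is a minimal Python sketch of the idea. It is not DataPrime and not how the engine is implemented. The helper `join_on`, the sample records, and the `into` field name are all hypothetical. The sketch also assumes that left-side records with no match are dropped, which the syntax above does not specify.

```python
from collections import defaultdict

def join_on(left, right, key, into):
    """Attach matching right-side records to each left-side record under `into`.

    Hypothetical helper: left records with no match are dropped (an assumption
    made for this sketch only).
    """
    # Index the right side once so each left record is matched by lookup.
    index = defaultdict(list)
    for rec in right:
        if rec.get(key) is not None:
            index[rec[key]].append(rec)

    joined = []
    for rec in left:
        matches = index.get(rec.get(key))
        if matches:
            joined.append({**rec, into: matches})
    return joined

# Illustrative data, not real output from any system.
logs = [
    {"trace_id": "t1", "severity": "ERROR", "msg": "payment failed"},
    {"trace_id": "t2", "severity": "INFO", "msg": "ok"},
    {"trace_id": "t3", "severity": "ERROR", "msg": "timeout"},
]
spans = [
    {"trace_id": "t1", "service": "checkout", "duration_ms": 840},
    {"trace_id": "t2", "service": "checkout", "duration_ms": 12},
]

# Filter first so the join only touches the records of interest.
error_logs = [r for r in logs if r["severity"] == "ERROR"]
result = join_on(error_logs, spans, key="trace_id", into="spans")
```

Filtering to error logs before the join means only two left-side records are matched against the index instead of three. On real data volumes that difference is what keeps a join affordable.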

Merging datasets using union

Description

The union command merges two datasets into a single stream. Both sources should have compatible schemas; if they do not, normalize each one with choose before merging.

Syntax

<query1>
| union (
  <query2>
)
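
The normalize-then-merge idea can be sketched in Python as follows. This is a conceptual illustration under stated assumptions, not DataPrime code. The helper `normalize` stands in for the role a choose step plays, and the field names and sample records are hypothetical.

```python
def normalize(records, mapping):
    """Keep only the mapped fields, renamed to a shared schema.

    `mapping` is {target_field: source_field}. Hypothetical helper that plays
    the role of a field-selection step before merging.
    """
    return [
        {target: rec[source] for target, source in mapping.items() if source in rec}
        for rec in records
    ]

def union(first, second):
    """Merge two record lists into one stream."""
    return list(first) + list(second)

# Two sources with different field names (illustrative data).
app_logs = [{"ts": 2, "request_id": "r1", "message": "charge declined", "host": "a"}]
gateway_logs = [{"time": 1, "req": "r1", "text": "POST /pay", "region": "eu"}]

common_app = normalize(
    app_logs, {"timestamp": "ts", "request_id": "request_id", "message": "message"}
)
common_gateway = normalize(
    gateway_logs, {"timestamp": "time", "request_id": "req", "message": "text"}
)

# Merge, then order by timestamp to get a single timeline.
merged = sorted(union(common_app, common_gateway), key=lambda r: r["timestamp"])
```

After normalization every record carries the same three fields, so the merged stream can be sorted, filtered, or grouped without per-source special cases.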

Common pitfalls

  • Unfiltered joins: Always apply filter before join to avoid performance issues.
  • Mismatched schemas: Use choose to normalize fields before union.
  • Missing correlation keys: Without a shared ID like request_id, correlation is not possible.
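
The last pitfall is worth checking before you attempt any correlation. The Python sketch below is conceptual rather than DataPrime, and its helper name and sample records are hypothetical. It separates records that carry a usable request_id from those that do not, then filters the usable ones by a single identifier, which is the simplest form of correlation without a join.

```python
def split_by_key(records, key):
    """Return (records with a non-empty `key`, records without one)."""
    with_key = [r for r in records if r.get(key)]
    without_key = [r for r in records if not r.get(key)]
    return with_key, without_key

# Illustrative records; some lack the correlation key or have it empty.
records = [
    {"request_id": "r1", "msg": "start"},
    {"msg": "no id here"},
    {"request_id": "", "msg": "empty id"},
    {"request_id": "r1", "msg": "done"},
]

usable, unusable = split_by_key(records, "request_id")

# Identifier-based filtering: keep every usable record for one request.
r1_events = [r for r in usable if r["request_id"] == "r1"]
```

If a large share of records lands in `unusable`, a join or identifier filter on that key will silently miss them, so it is usually better to fix the instrumentation first.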