How to use DataPrime to combine datasets and correlate logs
Goal
By the end of this guide you should be able to:
- Use join to combine logs and traces by shared fields
- Use union to merge datasets with compatible schemas
- Use identifier-based filtering to correlate logs without a formal join
Why it matters
Real-world debugging rarely involves a single service. To understand the full picture, you often need to combine data from multiple sources (logs, traces, or metrics) based on shared identifiers like request_id, trace_id, or user_id. This guide helps you unify fragmented data into a cohesive timeline for triage, monitoring, and root cause analysis.
Combining datasets using join
Description
The join command combines two datasets by matching records on a common field (for example, trace_id or request_id). It's useful for enriching logs with related events from another source.
Warning
Joins are expensive. Filter both datasets as aggressively as possible before joining, so that fewer records have to be matched.
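As a rough mental model of what a join does, and why filtering first pays off, here is a plain-Python sketch. It is not DataPrime syntax, and the helper names (filter_records, inner_join, nest_under) and sample fields are hypothetical. It indexes one side by the shared key and attaches each match to the corresponding record on the other side, so the work grows with the size of both inputs.

```python
# Conceptual model of "filter, then join" on a shared key.
# Plain Python, NOT DataPrime syntax; all names here are hypothetical.
from collections import defaultdict
from typing import Any, Callable, Iterable

Record = dict[str, Any]


def filter_records(records: Iterable[Record], keep: Callable[[Record], bool]) -> list[Record]:
    """Drop irrelevant records first so the join has less to match."""
    return [r for r in records if keep(r)]


def inner_join(left: list[Record], right: list[Record], key: str, nest_under: str) -> list[Record]:
    """Pair each left record with every right record that has the same `key` value.

    The right side is indexed by `key` first, so the work grows with the sizes of
    both inputs plus the number of matches. Matched right records are nested under
    `nest_under` to avoid field-name collisions. Left records with no match are dropped.
    """
    index: dict[Any, list[Record]] = defaultdict(list)
    for rec in right:
        if key in rec:
            index[rec[key]].append(rec)

    joined: list[Record] = []
    for rec in left:
        if key not in rec:
            continue
        for match in index.get(rec[key], []):
            joined.append({**rec, nest_under: match})
    return joined


# Hypothetical sample data: two logs and two spans sharing trace_id values.
logs = [
    {"trace_id": "t1", "severity": "ERROR", "msg": "upstream timeout"},
    {"trace_id": "t2", "severity": "INFO", "msg": "request ok"},
]
spans = [
    {"trace_id": "t1", "service": "checkout", "duration_ms": 5000},
    {"trace_id": "t2", "service": "cart", "duration_ms": 12},
]

# Filter before joining: only the error log takes part in the join.
errors = filter_records(logs, lambda r: r["severity"] == "ERROR")
result = inner_join(errors, spans, key="trace_id", nest_under="span")
print(result)
```

Nesting the matched record under its own field is one design choice among several; merging fields flat is also possible but risks silent overwrites when both sides use the same field name.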
Syntax
Merging datasets using union
Description
The union command merges two datasets into a single stream. Both sources should have compatible schemas; if they don't, normalize their fields with choose first.
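The idea of normalizing two schemas before merging them can be sketched in plain Python. This is not DataPrime syntax: normalize stands in, loosely, for the role the guide assigns to choose, and the helper names and field names (req, level, text) are hypothetical. Each source is projected onto the same set of output fields, then the results are concatenated.

```python
# Conceptual model of "normalize, then union".
# Plain Python, NOT DataPrime syntax; field names are hypothetical.
from typing import Any

Record = dict[str, Any]


def normalize(records: list[Record], mapping: dict[str, str]) -> list[Record]:
    """Project records onto a shared schema.

    `mapping` maps each output field to the source field it comes from.
    Source fields that are absent become None.
    """
    return [{out: rec.get(src) for out, src in mapping.items()} for rec in records]


def union(*datasets: list[Record]) -> list[Record]:
    """Concatenate datasets into a single stream (this sketch keeps duplicates)."""
    merged: list[Record] = []
    for dataset in datasets:
        merged.extend(dataset)
    return merged


# Two hypothetical sources describing the same request with different field names.
app_logs = [{"req": "r1", "level": "ERROR", "text": "db timeout"}]
gateway_logs = [{"request_id": "r1", "severity": "WARN", "message": "slow upstream"}]

stream = union(
    normalize(app_logs, {"request_id": "req", "severity": "level", "message": "text"}),
    normalize(gateway_logs, {"request_id": "request_id", "severity": "severity", "message": "message"}),
)
for rec in stream:
    print(rec)
```

Without the normalize step, the merged stream would contain records with two different sets of field names, and any later filter on request_id would silently miss the app_logs entry.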
Syntax
Common pitfalls
- Unfiltered joins: Always apply filter before join to avoid performance issues.
- Mismatched schemas: Use choose to normalize fields before union.
- Missing correlation keys: Without a shared identifier such as request_id, records from different sources cannot be reliably correlated.
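Correlating by identifier without a formal join, as listed among this guide's goals, amounts to applying the same ID filter to every source and ordering the hits by time. The plain-Python sketch below models that idea; it is not DataPrime syntax, and the correlate helper, the source names, and the "ts" and "origin" fields are hypothetical. It also shows the last pitfall in action: a record with no request_id can never be pulled into the timeline.

```python
# Conceptual model of identifier-based correlation without a join:
# filter every source on the same ID, then merge the hits into one timeline.
# Plain Python, NOT DataPrime syntax; source names and fields are hypothetical.
from typing import Any

Record = dict[str, Any]


def correlate(sources: dict[str, list[Record]], field: str, value: Any, ts_field: str = "ts") -> list[Record]:
    """Return every record whose `field` equals `value`, ordered by timestamp.

    Each hit is tagged with a hypothetical "origin" field naming its source.
    Records lacking `field` never match a non-None `value`.
    """
    hits: list[Record] = []
    for name, records in sources.items():
        for rec in records:
            if rec.get(field) == value:
                hits.append({**rec, "origin": name})
    return sorted(hits, key=lambda rec: rec[ts_field])


sources = {
    "gateway": [
        {"ts": 3, "request_id": "r42", "msg": "502 returned to client"},
        {"ts": 1, "request_id": "r7", "msg": "200 OK"},
    ],
    "payments": [
        {"ts": 2, "request_id": "r42", "msg": "card service timeout"},
        {"ts": 4, "msg": "heartbeat"},  # no request_id: cannot be correlated
    ],
}

timeline = correlate(sources, field="request_id", value="r42")
for rec in timeline:
    print(rec["ts"], rec["origin"], rec["msg"])
```

Because nothing is matched pairwise, this approach stays cheap when you already know the identifier you are investigating; a join is the better fit when you need to pair records across sources for many identifiers at once.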