# How to use DataPrime to combine datasets and correlate logs

## Goal

By the end of this guide you should be able to:

- Use `join` to combine logs and traces by shared fields
- Use `union` to merge datasets with compatible schemas
- Use identifier-based filtering to correlate logs without a formal join

## Why it matters

Real-world debugging rarely involves a single service. To understand the full picture, you often need to combine data from multiple sources—logs, traces, or metrics—based on shared identifiers like `request_id`, `trace_id`, or `user_id`. This guide helps you unify fragmented data into a cohesive timeline for triage, monitoring, and root cause analysis.

______________________________________________________________________

## Combining datasets using `join`

### Description

The `join` command combines two datasets by matching a common field (e.g., `trace_id`, `request_id`). It's useful for enriching logs with related events from another source.

**Note**: Joins can be resource intensive. Try to filter as much as possible before joining.

### Syntax

```dataprime
<query1>
| join (
  <query2>
) on <join_condition>
```
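As a rough sketch of the pattern, the query below enriches checkout logs with matching spans. It assumes that both datasets carry a field named `trace_id` and that the spans expose a `duration` field; these names, the `checkout` application, and the `span_info` target are placeholders for illustration. The `left=>` / `right=>` prefixes and the trailing `into` clause (which names the keypath where the right-side match is placed) follow the DataPrime `join` reference as understood here, so verify them against the reference for your environment.

```dataprime
source logs
| filter $l.applicationname == 'checkout'
| join (
    source spans
    | filter $l.applicationname == 'checkout'
    | choose trace_id, duration
  ) on left=>trace_id == right=>trace_id into span_info
```

Both sides are filtered before the join, and the right side is trimmed with `choose` so that only the fields needed for enrichment are carried into the result.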

______________________________________________________________________

## Merging datasets using `union`

### Description

The `union` command merges two datasets into a single stream. Both sources should have compatible schemas; if their field names differ, normalize them with `choose` first.

### Syntax

```dataprime
<query1>
| union (
  <query2>
)
```
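A minimal sketch of normalizing before a union, assuming two hypothetical subsystems (`api-gateway` and `payments`) that log the same information under different field names (`request_id` / `message` versus `req_id` / `msg`). Each side uses `choose` with `as` to project its fields onto a shared set of keys, so the merged stream has one consistent shape.

```dataprime
source logs
| filter $l.subsystemname == 'api-gateway'
| choose $m.timestamp as ts, request_id, message as msg
| union (
    source logs
    | filter $l.subsystemname == 'payments'
    | choose $m.timestamp as ts, req_id as request_id, msg
  )
```

Projecting both sides onto `ts`, `request_id`, and `msg` means later stages can treat the merged stream as a single dataset.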

______________________________________________________________________

## Common pitfalls

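- **Unfiltered joins**: Apply `filter` to both sides before `join`; joining unfiltered datasets is slow and resource intensive.
- **Mismatched schemas**: Use `choose` to normalize field names before `union` so that equivalent values end up under the same key.
- **Missing correlation keys**: Correlation depends on a shared ID such as `request_id`. If no field is common to both datasets, there is nothing to match on.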
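
When a shared ID is present, the identifier-based filtering mentioned in the goals can often stand in for a formal join. The sketch below assumes every participating service logs a `request_id` field and a `message` field; those names and the value `req-12345` are hypothetical. It pulls all logs for one request into a single time-ordered view.

```dataprime
source logs
| filter request_id == 'req-12345'
| orderby $m.timestamp
| choose $m.timestamp as ts, $l.subsystemname as service, message
```

Because this is a single filtered scan rather than a join, it is usually the cheaper option when you already know the identifier you are investigating.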
