stitch
Description
The stitch
command performs a horizontal union of two datasets, combining them side-by-side. It aligns rows from one dataset with rows from another and merges their columns into a single unified dataset.
This is particularly useful for joining datasets that share a logical order but lack a join key.
Key behaviors:
- Rows are combined in order (row 1 with row 1, row 2 with row 2, etc.).
- If one dataset contains more rows than the other, unmatched rows include
null
values for missing columns. - The resulting dataset contains all columns from both sides.
Difference from union
:
stitch
merges datasets horizontally (adding columns).union
merges datasets vertically (adding rows).
Syntax
Example 1
Use case: Merge sales and revenue data side-by-side
Suppose you have two datasets—sales
and revenue
—that contain related information about products. You can use stitch
to combine them horizontally into a single document per product, provided both datasets are ordered consistently.
Example data
\\ sales
{ "product": "Widget", "sales": 100 },
{ "product": "Gadget", "sales": 200 },
{ "product": "Dashboard", "sales": 150 }
\\ revenue
{ "product": "Widget", "revenue": 5000 },
{ "product": "Gadget", "revenue": 8000 },
{ "product": "Dashboard", "revenue": 6000 }
Example query
Example output
{ "product": "Widget", "sales": 100, "combined_data": { "product": "Widget", "revenue": 5000 } },
{ "product": "Gadget", "sales": 200, "combined_data": { "product": "Gadget", "revenue": 8000 } },
{ "product": "Dashboard", "sales": 150, "combined_data": { "product": "Dashboard", "revenue": 6000 } }
The stitch
command aligns rows based on their order after sorting and attaches the right dataset (revenue
) into the target keypath (combined_data
) within the left dataset (sales
).
Note
If the datasets have differing row counts, missing values from the shorter dataset are replaced with null
:
Example 2
If the datasets have unequal rows, the stitch command fills missing values with null.
Consider the following datasets:
sales dataset: (3 rows)
{ "product": "Widget", "sales": 100 }
{ "product": "Gadget", "sales": 200 }
{ "product": "Dashboard", "sales": 150 }
revenue dataset: (2 rows)
Example query
Example output
{ "product": "Widget", "sales": 100, "combined_data": { "product": "Widget", "revenue": 5000 } }
{ "product": "Gadget", "sales": 200, "combined_data": { "product": "Gadget", "revenue": 8000 } }
{ "product": "Dashboard", "sales": 150, "combined_data": { "product": "Dashboard", "revenue": null } }
Tip
Rows must represent the same entities and appear in the same order for meaningful stitching. If row alignment differs, results may not match as expected.