Skip to content

stitch

Description

The stitch command performs a horizontal union of two datasets, combining them side-by-side. It aligns rows from one dataset with rows from another and concatenates their columns, creating a single, unified dataset.

When using the stitch command:

  • Datasets must be ordered, as rows are combined in sequence (e.g., row 1 of dataset A is stitched with row 1 of dataset B).
  • If one dataset has more rows than the other, unmatched rows will have null values in the stitched columns.
  • The resulting dataset will contain all columns from both datasets.

How stitch differs from union

  • stitch combines datasets horizontally by appending columns row-by-row.
  • union appends rows vertically, stacking datasets on top of each other.

Syntax

... | stitch (<subquery>) into <target-keypath>

Usage

Consider the following datasets created using custom enrichments:

sales dataset:

{ "product": "Widget", "sales": 100 }
{ "product": "Gadget", "sales": 200 }
{ "product": "Dashboard", "sales": 150 }

revenue dataset:

{ "product": "Widget", "revenue": 5000 }
{ "product": "Gadget", "revenue": 8000 }
{ "product": "Dashboard", "revenue": 6000 }

The goal is to combine these datasets side-by-side, ensuring that each row from one dataset aligns with the corresponding row from the other.

Query

source sales | orderby product
| stitch (source revenue | orderby product) into combined_data

Query breakdown

  1. source sales

    Fetches all rows from the sales dataset, which contains products and their corresponding sales figures.

  2. orderby product

    Sorts the sales dataset by the product field, ensuring a consistent order for row alignment.

  3. stitch (source revenue | orderby product)

    • Fetches rows from the revenue dataset and sorts them by the product field.
    • Combines the sales and revenue datasets horizontally, aligning rows based on their order after sorting.
  4. into combined_data

    Stores the stitched dataset in a variable called combined_data. The resulting dataset includes columns from both the sales and revenue datasets.

Result

The resulting dataset aligns rows from both datasets horizontally:

{ "product": "Widget", "sales": 100, "combined_data": { "product": "Widget", "revenue": 5000 } }
{ "product": "Gadget", "sales": 200, "combined_data": { "product": "Gadget", "revenue": 8000 } }
{ "product": "Dashboard", "sales": 150, "combined_data": { "product": "Dashboard", "revenue": 6000 } }

Handling unequal row counts

If the datasets have unequal rows, the stitch command fills missing values with null.

Consider the following datasets:

sales dataset: (3 rows)

{ "product": "Widget", "sales": 100 }
{ "product": "Gadget", "sales": 200 }
{ "product": "Dashboard", "sales": 150 }

revenue dataset: (2 rows)

{ "product": "Widget", "revenue": 5000 }
{ "product": "Gadget", "revenue": 8000 }

Query

source sales | orderby product
| stitch (source revenue | orderby product) into combined_data

Result

{ "product": "Widget", "sales": 100, "combined_data": { "product": "Widget", "revenue": 5000 } }
{ "product": "Gadget", "sales": 200, "combined_data": { "product": "Gadget", "revenue": 8000 } }
{ "product": "Dashboard", "sales": 150, "combined_data": { "product": "Dashboard", "revenue": null } }

Guiding notes

  • Rows must correlate logically for stitching to produce meaningful results. Ensure that rows from both datasets represent the same entities and are in the same order. For example, if the product field in the sales dataset does not match the product field in the revenue dataset for corresponding rows, stitching will not work as expected.
  • If datasets differ in row count, the result will include null values for missing data in the shorter dataset.