Skip to content

window

Description

Applies a window function over a set of related rows and adds the result as a new column to every row in the result set.

  • A window operation does not collapse rows (unlike groupby). It enriches each row with a computed value.
  • You can optionally:
  • Partition the data into independent groups using by
  • Order rows inside each partition using orderby
  • Define a frame to limit which rows are considered for each output row (e.g., “previous 10 rows”)

Note

A single window operator supports one window function call inside calc. To add multiple window-based columns, chain multiple window operators.

Warning

If you use an orderby clause, the window operator will re-order the result set by the orderby expressions.

Syntax

window
  [ by <expr> [, <expr> ...] ]
  [ orderby <expr> [asc|desc] [, <expr> [asc|desc] ...] ]
  [ <frame-type> <frame-function> ]
  calc <window-function-expression> as <alias>

Arguments

NameTypeRequiredDescription
bylist[scalar-expression]falsePartitions the current result set into independent groups (“buckets”).

The window calculation is performed separately within each partition. If omitted, the window is computed over the entire result set (no partitioning).
orderbylist[scalar-expression (asc|desc)]falseDefines the row order within each partition.
frame-typeenumfalseDefines how the window frame is interpreted.
frame-functionframe-functionfalseDefines which rows are included in the calculation **relative to the current row.

frame-type

  • rows — frame boundaries are based on row counts (e.g., previous 10 rows)
  • range — frame boundaries are based on the ordered value (commonly time intervals when ordering by timestamp)
  • groups — frame boundaries are based on peer groups according to the orderby clause

If omitted, a default frame is applied (see frame-function).

frame-function

Supported frame functions:

  • between(from, to) — explicit frame bounds (both from and to must be literals)
  • all() — equivalent to between(null, null)
  • running() — cumulative window up to the current row (equivalent to between(to=0))
  • previous(n) — previous n rows / groups / interval, excluding the current row (between(-n, -1))
  • next(n) — next n rows / groups / interval, excluding the current row (between(1, n))
  • lookbehind(n) — includes current row and n rows / groups / interval behind (between(-n, 0))
  • lookahead(n) — includes current row and n rows / groups / interval ahead (between(0, n))
  • lookaround(n) — includes current row and n rows / groups / interval on both sides (between(-n, n))

Mapping of between(from, to) to SQL-style semantics:

  • from = null → unbounded preceding
  • to = null → unbounded following
  • from/to = 0 → current row
  • from/to = -1 → 1 preceding
  • from/to = 1 → 1 following

Defaults (Postgres-like semantics):

  • If orderby is omitted → default frame is rows all()
  • If orderby is present → default frame is groups running()

Validations:

  • lookahead, lookbehind, lookaround, next, previous: n must be a non-negative literal
  • between(from, to): from and to must be literals

Rules:

  • Must contain exactly one window function call.
  • Can include normal scalar operations around the window function (e.g., value / avg(value)).
  • The result can be named using as <alias>.

Supported window functions include:

Ranking

  • dense_rank() — Returns an integer rank (starting at 1) within the partition, without gaps. Peer rows (rows with equal orderby values) share the same rank, and the next distinct orderby value increases the rank by 1.
  • percent_rank() — Returns the relative rank as a fraction in [0, 1], computed as (rank() - 1) / (count(*) - 1) within the partition. Returns 0 when the partition contains a single row. Because it’s based on rank(), it may not reach 1 if the last rows are peers (ties).
  • rank() — Returns an integer rank (starting at 1) within the partition, with gaps. Peer rows share the same rank, and the next rank jumps by the number of peer rows (e.g., ordered values [10, 10, 20] → ranks [1, 1, 3]).
  • row_number() — Returns a unique sequential integer for each row (starting at 1) in the partition, ordered by orderby. Rows that compare equal in orderby still get different row numbers; add additional orderby expressions to make tie-breaking deterministic.
  • cume_dist() — Returns the cumulative distribution as a fraction in (0, 1], computed as count(rows with orderby value <= current row’s value) / count(*) within the partition. All peer rows return the same value.
  • ntile(n: number) — Splits the ordered partition into n buckets as evenly as possible and returns the 1-based bucket number for the current row. Bucket sizes differ by at most 1; if the rows don’t divide evenly, earlier buckets contain one extra row. n must be a positive integer.

Value

  • lag(expr, offset?: number = 1, default?: any = null)
  • lead(expr, offset?: number = 1, default?: any = null)
  • first_value(expr)
  • nth_value(expr, n: number)
  • last_value(expr)

Aggregations

  • Aggregation functions can also be used as window functions.
  • All aggregation functions work for cumulative frames (e.g., all(), running()).
  • For sliding frames (where rows can leave the window), only these aggregates are supported: avg(), count(), max(), min(), sum(), stddev(), variance(), sample_stddev(), sample_variance()

Example 1

Use case: Keep one DataPrime query per user based on bytes scanned

Partition queries by user, rank each user’s queries using the orderby clause, and then keep only the row with rank == 1.

Example data

{ "clientInfo": { "userEmail": "[email protected]" }, "queryInfo": { "queryId": "q1", "interfaceType": "dataprime", "queryOutcome": { "storage": { "stats": { "bytesScanned": 1200 } } } } },
{ "clientInfo": { "userEmail": "[email protected]" }, "queryInfo": { "queryId": "q2", "interfaceType": "dataprime", "queryOutcome": { "storage": { "stats": { "bytesScanned": 900 } } } } },
{ "clientInfo": { "userEmail": "[email protected]" }, "queryInfo": { "queryId": "q3", "interfaceType": "dataprime", "queryOutcome": { "storage": { "stats": { "bytesScanned": 5000 } } } } },
{ "clientInfo": { "userEmail": "[email protected]" }, "queryInfo": { "queryId": "q4", "interfaceType": "dataprime", "queryOutcome": { "storage": { "stats": { "bytesScanned": 3000 } } } } }

Example query

source system/engine.queries
| filter queryInfo.interfaceType == 'dataprime'
| window by clientInfo.userEmail
    orderby queryInfo.queryOutcome.storage.stats.bytesScanned
    calc rank() as rank
| filter rank == 1
| choose clientInfo.userEmail as user, queryInfo.queryId,queryInfo.queryOutcome.storage.stats.bytesScanned

Example output

userqueryIdbytesScanned
[email protected]q2900
[email protected]q43000

Example 2

Use case: Keep the top 3 most recent log entries per application

Partition logs by applicationname, rank each entry from newest to oldest using dense_rank(), then keep only the first 3 ranks per partition. This pattern returns top N rows per group — distinct from the top command, which returns the top N groups overall.

Example data

{ "timestamp": "2026-05-11T12:40:23Z", "applicationname": "cx-metrics" },
{ "timestamp": "2026-05-11T12:39:51Z", "applicationname": "cx-metrics" },
{ "timestamp": "2026-05-11T12:39:31Z", "applicationname": "cx-metrics" },
{ "timestamp": "2026-05-11T12:38:00Z", "applicationname": "cx-metrics" },
{ "timestamp": "2026-05-11T12:39:09Z", "applicationname": "claude-code" },
{ "timestamp": "2026-05-11T12:39:08Z", "applicationname": "claude-code" },
{ "timestamp": "2026-05-11T12:30:00Z", "applicationname": "claude-code" }

Example query

source application_logs
| window by applicationname
    orderby timestamp desc
    calc dense_rank() as rank
| filter rank <= 3
| choose timestamp, applicationname

Example output

timestampapplicationname
2026-05-11T12:40:23Zcx-metrics
2026-05-11T12:39:51Zcx-metrics
2026-05-11T12:39:31Zcx-metrics
2026-05-11T12:39:09Zclaude-code
2026-05-11T12:39:08Zclaude-code
2026-05-11T12:30:00Zclaude-code

Example 3

Use case: Detect slow HTTP requests compared to the previous 10 requests (per method)

Order requests by time, then compute the average duration over the previous 10 rows in the same HTTP method partition. Finally, filter requests that are more than 3× slower than that baseline.

Example data

{ "method": "GET", "timestamp": "2026-02-21T10:00:00Z", "duration_ms": 120 },
{ "method": "GET", "timestamp": "2026-02-21T10:00:01Z", "duration_ms": 110 },
{ "method": "GET", "timestamp": "2026-02-21T10:00:02Z", "duration_ms": 115 },
{ "method": "GET", "timestamp": "2026-02-21T10:00:03Z", "duration_ms": 900 },
{ "method": "POST", "timestamp": "2026-02-21T10:00:00Z", "duration_ms": 200 },
{ "method": "POST", "timestamp": "2026-02-21T10:00:01Z", "duration_ms": 210 },
{ "method": "POST", "timestamp": "2026-02-21T10:00:02Z", "duration_ms": 205 },
{ "method": "POST", "timestamp": "2026-02-21T10:00:03Z", "duration_ms": 800 }

Example query

source http_logs
| window by method
    orderby timestamp asc
    rows previous(10)
    calc avg(duration_ms) as avg_prev_10_ms
| filter duration_ms > avg_prev_10_ms * 3
| choose method, timestamp, duration_ms, avg_prev_10_ms

Example output

methodtimestampduration_msavg_prev_10_ms
GET2026-02-21T10:00:03Z900115
POST2026-02-21T10:00:03Z800205