Skip to content

Find heavy DataPrime queries

TL;DR

This recipe helps you find and rank queries that scan large volumes of data in the system dataset, aiding in query optimization and cost analysis.

Problem / Use case

You want to monitor DataPrime usage and detect which queries consume the most storage, to identify costly or inefficient patterns.

Query

source system/engine.queries
| filter queryInfo.queryOutcome.status == 'Completed'
| filter queryInfo.queryOutcome.storage.bytesScanned:number > 1000000
| groupby queryInfo.queryOutcome.storage.bytesScanned:number as avg_bytes_scanned
| sortby avg_bytes_scanned desc

Explanation

  1. source system/engine.queries — Reads from the system dataset containing execution metadata for all DataPrime queries.
  2. filter queryInfo.queryOutcome.status == 'Completed' — Keeps only successfully completed queries.
  3. filter queryInfo.queryOutcome.storage.bytesScanned:number > 1000000 — Filters queries that scanned more than 1 MB of data.
  4. groupby queryInfo.queryOutcome.storage.bytesScanned:number as avg_bytes_scanned — Groups by the amount of data scanned and calculates the average.
  5. sortby avg_bytes_scanned desc — Lists results from the most data-intensive queries to the least.

Expected output

avg_bytes_scanned
1666196703
1647101739
1249189292
1071874573
1046265388

Variations

  • Replace the threshold (> 1000000) with another value to catch lighter or heavier scans.
  • Add groupby queryInfo.queryText to identify specific query patterns responsible for large scans.
  • Use count() to see how frequently large-scan queries occur per user or time window.