Find heavy DataPrime queries

TL;DR

This recipe helps you find and rank queries that scan large volumes of data in the system dataset, aiding in query optimization and cost analysis.

Problem / Use case

You want to monitor DataPrime usage and detect which queries consume the most storage, to identify costly or inefficient patterns.

Query

source system/engine.queries
| filter queryInfo.queryOutcome.status == 'Completed'
| filter queryInfo.queryOutcome.storage.bytesScanned:number > 1000000
| groupby queryInfo.queryOutcome.storage.bytesScanned:number as avg_bytes_scanned
| sortby avg_bytes_scanned desc

Explanation

source system/engine.queries — Reads from the system dataset containing execution metadata for all DataPrime queries.
filter queryInfo.queryOutcome.status == 'Completed' — Keeps only successfully completed queries.
filter queryInfo.queryOutcome.storage.bytesScanned:number > 1000000 — Filters queries that scanned more than 1 MB of data.
groupby queryInfo.queryOutcome.storage.bytesScanned:number as avg_bytes_scanned — Groups by the amount of data scanned and calculates the average.
sortby avg_bytes_scanned desc — Lists results from the most data-intensive queries to the least.

Expected output

avg_bytes_scanned
1666196703
1647101739
1249189292
1071874573
1046265388

Variations

Replace the threshold (> 1000000) with another value to catch lighter or heavier scans.
Add groupby queryInfo.queryText to identify specific query patterns responsible for large scans.
Use count() to see how frequently large-scan queries occur per user or time window.

Need help? Contact Support.

What's new? Find out here.

LLM? Read llms.txt.

Previous Group failed queries by failure type

Next API queries