Find heavy DataPrime queries
TL;DR
This recipe helps you find and rank queries that scan large volumes of data in the system dataset, aiding in query optimization and cost analysis.
Problem / Use case
You want to monitor DataPrime usage and detect which queries consume the most storage, to identify costly or inefficient patterns.
Query
source system/engine.queries
| filter queryInfo.queryOutcome.status == 'Completed'
| filter queryInfo.queryOutcome.storage.bytesScanned:number > 1000000
| groupby queryInfo.queryOutcome.storage.bytesScanned:number as avg_bytes_scanned
| sortby avg_bytes_scanned desc
Explanation
source system/engine.queries
— Reads from the system dataset containing execution metadata for all DataPrime queries.filter queryInfo.queryOutcome.status == 'Completed'
— Keeps only successfully completed queries.filter queryInfo.queryOutcome.storage.bytesScanned:number > 1000000
— Filters queries that scanned more than 1 MB of data.groupby queryInfo.queryOutcome.storage.bytesScanned:number as avg_bytes_scanned
— Groups by the amount of data scanned and calculates the average.sortby avg_bytes_scanned desc
— Lists results from the most data-intensive queries to the least.
Expected output
avg_bytes_scanned |
---|
1666196703 |
1647101739 |
1249189292 |
1071874573 |
1046265388 |
Variations
- Replace the threshold (
> 1000000
) with another value to catch lighter or heavier scans. - Add
groupby queryInfo.queryText
to identify specific query patterns responsible for large scans. - Use
count()
to see how frequently large-scan queries occur per user or time window.
Theme
Light