### TL;DR

This recipe helps you **find and rank queries that scan large volumes of data** in the system dataset, aiding in query optimization and cost analysis.

### Problem / Use case

You want to monitor DataPrime usage and detect which queries consume the most storage, to identify costly or inefficient patterns.

### Query

```dataprime
source system/engine.queries
| filter queryInfo.queryOutcome.status == 'Completed'
| filter queryInfo.queryOutcome.storage.bytesScanned:number > 1000000
| groupby queryInfo.queryOutcome.storage.bytesScanned:number as avg_bytes_scanned
| sortby avg_bytes_scanned desc
```

### Explanation

1. `source system/engine.queries` — Reads from the system dataset containing execution metadata for all DataPrime queries.
1. `filter queryInfo.queryOutcome.status == 'Completed'` — Keeps only successfully completed queries.
1. `filter queryInfo.queryOutcome.storage.bytesScanned:number > 1000000` — Filters queries that scanned more than 1 MB of data.
1. `groupby queryInfo.queryOutcome.storage.bytesScanned:number as avg_bytes_scanned` — Groups by the amount of data scanned and calculates the average.
1. `sortby avg_bytes_scanned desc` — Lists results from the most data-intensive queries to the least.

### Expected output

| avg_bytes_scanned |
| ----------------- |
| 1666196703        |
| 1647101739        |
| 1249189292        |
| 1071874573        |
| 1046265388        |

### Variations

- Replace the threshold (`> 1000000`) with another value to catch lighter or heavier scans.
- Add `groupby queryInfo.queryText` to identify specific query patterns responsible for large scans.
- Use `count()` to see how frequently large-scan queries occur per user or time window.
