Audit schema field types and values by dataset

TL;DR

Group system/engine.schema_fields by dataset, count distinct types with distinct_count(type), and rank datasets by type_count to reveal which have the most diverse schemas.

Problem / Use case

You want to understand how many unique data types exist across your datasets and see which ones contain the greatest type variety. This helps identify inconsistent schemas or areas where normalization is needed.

Query

source system/engine.schema_fields
| groupby dataset
    aggregate
        distinct_count(type) as type_count,
        any_value(type) as sample_type
| orderby type_count desc

Expected output

dataset	type_count	sample_type
logs	13	object
aaa.audit_events	10	string
engine.schema_fields	8	null
engine.queries	7	number
spans	6	string
labs.limitViolations	5	number

Variations

Count schema fields by source system instead of dataset.
Include field count per type to measure schema complexity within datasets.

Previous Count schema changes over time

Next Monitor dataset schema changes for compliance