Audit schema field types and values by dataset
TL;DR
Group `system/engine.schema_fields` by dataset, count distinct types with `distinct_count(type)`, and rank datasets by `type_count` to reveal which have the most diverse schemas.
Problem / Use case
You want to understand how many unique data types exist across your datasets and see which ones contain the greatest type variety. This helps identify inconsistent schemas or areas where normalization is needed.
Query
source system/engine.schema_fields
| groupby dataset aggregate
    distinct_count(type) as type_count,
    any_value(type) as sample_type
| orderby type_count desc
Expected output
| dataset | type_count | sample_type |
|---|---|---|
| logs | 13 | object |
| aaa.audit_events | 10 | string |
| engine.schema_fields | 8 | null |
| engine.queries | 7 | number |
| spans | 6 | string |
| labs.limitViolations | 5 | number |
Variations
- Count schema fields by source system instead of by dataset (first sketch below).
- Include the field count per type to measure schema complexity within each dataset (second sketch below).
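A minimal sketch of the first variation, assuming each row in system/engine.schema_fields carries a source-system key (named source here purely for illustration; the actual key name may differ in your environment) and that a count() aggregation is available:

source system/engine.schema_fields
| groupby source aggregate
    count() as field_count
| orderby field_count desc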
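A sketch of the second variation, grouping by both dataset and type so each row reports how many fields of a given type a dataset contains (again assuming count() and multi-key groupby are supported):

source system/engine.schema_fields
| groupby dataset, type aggregate
    count() as fields_per_type
| orderby fields_per_type desc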