Skip to content

Audit schema field types and values by dataset

TL;DR

Group system/engine.schema_fields by dataset, count distinct types with distinct_count(type), and rank datasets by type_count to reveal which have the most diverse schemas.

Problem / Use case

You want to understand how many unique data types exist across your datasets and see which ones contain the greatest type variety. This helps identify inconsistent schemas or areas where normalization is needed.

Query

source system/engine.schema_fields
| groupby dataset
    aggregate
        distinct_count(type) as type_count,
        any_value(type) as sample_type
| orderby type_count desc

Expected output

datasettype_countsample_type
logs13object
aaa.audit_events10string
engine.schema_fields8null
engine.queries7number
spans6string
labs.limitViolations5number

Variations

  • Count schema fields by source system instead of dataset.
  • Include field count per type to measure schema complexity within datasets.