
Monitor schema health with engine.schema_fields: Structure, Drift, and Volatility

If you’ve worked with an observability pipeline, you’ve probably experienced schema problems: a field disappears, a type shifts from string to number, or a new label quietly appears. The causes are everywhere. Different teams adopt different naming conventions. A dependency upgrade changes the shape of a library’s log output. Over time, these small, reasonable decisions compound into schema sprawl: dashboards break, alerts misfire, and teams scramble to find out what happened. The problem isn’t so much that something changes; it’s that nobody notices until it’s too late.

Meet the engine.schema_fields Dataset

Coralogix’s engine.schema_fields Dataset exists to solve exactly these problems. It’s part of Coralogix’s System Dataspace, a collection of internal Datasets that perform observability on your observability: they expose metadata about your system and usage, helping you optimize analytics operations. Within that Dataspace, engine.schema_fields captures metadata and historical snapshots of every Dataset’s structural evolution, giving you a living record of how your data’s shape changes over time. Think of it as version control for your schemas, except it runs automatically and requires no commits.

In this post, we’ll walk through what engine.schema_fields contains, why it matters, and how to use it to track three dimensions of schema health: Structure, Drift, and Volatility.

What Lives Inside engine.schema_fields

Each record in the dataset represents a point-in-time snapshot of a field within a dataset. The schema is rich enough to answer most questions about a field’s identity, lineage, and behavior. Key attributes include:

  • dataset and dataspace: which dataset you’re looking at and where it lives.
  • type: the field’s data type (string, number, etc.).
  • snapshotId and snapshotStartTime: unique identifiers and epoch-nanosecond timestamps that pin each observation to a specific moment.
  • distinctValueCount: how many unique values have been observed for the field, useful for spotting cardinality explosions.
  • labels and metadata: contextual information like severity level, priority class, pillar, and optional categorization labels, including class names and method names.
  • partitioningScheme, pathParts, and flatPath: structural details about how the field is organized and referenced.

Because every snapshot is timestamped and uniquely identified, you can reconstruct the full history of any field across any dataset. That’s the foundation for everything that follows.
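As a sketch of what that looks like in practice, a query along these lines replays one field’s history in chronological order (the flatPath value here is hypothetical; substitute a field from your own environment):

source system/engine.schema_fields
| filter flatPath == 'resource.attributes.service_name'
| sortby snapshotStartTime asc
| choose snapshotId, snapshotStartTime, type, distinctValueCount

Each row is one observation of the field, so scanning down the result shows you exactly when its type or cardinality shifted.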

Monitoring Structure: Know What You Have

Before you can detect changes, you need a baseline. The first use case for engine.schema_fields is simply auditing the current structure of your datasets, understanding which fields exist, what types they carry, and how they’re labeled.

A straightforward DataPrime query against source system/engine.schema_fields lets you group by dataset and inspect field-level details:

source system/engine.schema_fields

  // Focus on healthy, successful snapshots
  | filter $m.severity == INFO

  // Group by dataset and field type to see the structural makeup
  | groupby dataset, type
      aggregate

        // How many distinct fields exist in this dataset with this type?
        distinct_count(flatPath) as field_count,

        // What's the average cardinality across those fields?
        avg(distinctValueCount) as avg_cardinality

  // Surface the largest, most complex datasets first
  | sortby field_count desc

This gives you a per-dataset, per-type breakdown: how many string fields versus number fields each dataset carries, and how diverse the values are within each group. If you’re onboarding a new engineer or validating an integration, this kind of structural audit is invaluable.

But structure isn’t static, which brings us to the more interesting problem.

Tracking Drift: Catch What Changed

Schema drift is what happens when a dataset’s structure evolves without coordination. Maybe a microservice team added a new field to their log output. Maybe an upstream provider changed a field’s type from integer to string. Maybe a deprecated field reappeared after a rollback.

The snapshotId field is your primary tool here. Each distinct snapshot ID represents a version of the schema at a point in time. By counting distinct snapshot IDs per dataset, you can surface which datasets are experiencing the most structural churn with a query like this:

source system/engine.schema_fields
| filter $m.severity == INFO
| groupby dataset
    aggregate distinct_count(snapshotId) as schema_change_count
| sortby schema_change_count desc

This immediately ranks your datasets by how frequently their schemas are changing. A dataset with a high schema_change_count isn’t necessarily broken, but it does deserve attention. Is the drift intentional? Is it coordinated? Or is something upstream producing inconsistent output?

You can also group by snapshotStartTime to see the cadence of changes. If a dataset that normally produces one or two schema snapshots a week suddenly generates ten in a day, that’s worth investigating. This time-series view of snapshot counts turns a static audit into a living monitor.
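One way to sketch that cadence view, combining the per-dataset grouping above with daily time buckets:

source system/engine.schema_fields
| filter $m.severity == INFO
| groupby dataset, formatTimestamp(snapshotStartTime:timestamp, '%Y-%m-%d') as day
    aggregate distinct_count(snapshotId) as daily_snapshots
| sortby day desc, daily_snapshots desc

A dataset whose daily_snapshots suddenly jumps above its usual baseline is the one to investigate first.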

Measuring Volatility: Quantify the Risk

Drift tells you that something changed. Volatility tells you how much you should worry about it.

A dataset with steady, predictable schema evolution (say, one new field added per release cycle) is healthy. A dataset whose field count or type composition swings erratically is a liability. The distinctValueCount field helps here: tracking its changes over time for a given field can reveal cardinality creep, where a field that once held a handful of values starts producing thousands. That kind of silent expansion can degrade query performance and inflate storage costs before anyone raises a flag.
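A sketch of a cardinality-creep check might look like this (the dataset name is hypothetical; scope the filter to whichever dataset you care about):

source system/engine.schema_fields
| filter dataset == 'logs'
| groupby flatPath, formatTimestamp(snapshotStartTime:timestamp, '%Y-%m-%d') as day
    aggregate max(distinctValueCount) as peak_cardinality
| sortby peak_cardinality desc

Fields whose peak_cardinality grows steadily from day to day are your cardinality-creep candidates.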

You can quantify this directly:

source system/engine.schema_fields

  // Group snapshots into daily buckets for a time-series view
  | groupby formatTimestamp(snapshotStartTime:timestamp, '%Y-%m-%d') as day
      aggregate

        // Total schema events - measures raw volume
        count() as total_events,

        // Distinct snapshots - measures actual structural versions
        distinct_count(snapshotId) as unique_snapshots

  // Walk through the timeline chronologically
  | sortby day asc

When total_events climbs but unique_snapshots stays flat, your schemas are stable; you’re just seeing repeated observations of the same structure. When both spike together, something is actively changing and deserves a closer look.

Consider pairing queries like this with Coralogix alerting: trigger a notification when a dataset’s schema change count exceeds a threshold within a rolling window, or when a field’s distinct value count jumps by more than a configured percentage.

These queries are easy to adapt. You might filter for ERROR-level severity events to isolate failed or invalid schema updates. You might add max($m.timestamp) to surface the most recent change per dataset, keeping your compliance dashboard current. Or you might combine count() with distinct_count(snapshotId) to measure total schema events alongside distinct snapshots, giving you both volume and uniqueness in a single view.
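Putting a few of those variations together might look like the following sketch:

source system/engine.schema_fields
| filter $m.severity == ERROR
| groupby dataset
    aggregate
      count() as total_events,
      distinct_count(snapshotId) as unique_snapshots,
      max($m.timestamp) as last_change
| sortby last_change desc

This gives you a compact triage table: which datasets are producing problematic schema events, how many distinct schema versions are involved, and how recently each one last changed.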

Putting It Into Practice

The real power of engine.schema_fields emerges when you move from ad-hoc queries to systematic monitoring. Here’s a practical approach:

  1. Baseline your schemas by auditing field types and values across your most critical datasets. Store or document the expected structure.
  2. Set up drift detection using snapshot-count queries on a scheduled cadence. Feed the results into a Coralogix custom dashboard. Try using dynamic widgets, which will automatically choose the right visualization for your query results, so you can get a single pane of glass for schema health without hand-picking chart types.
  3. Define volatility thresholds based on your organization’s tolerance. A dataset backing a compliance report has a very different threshold than an experimental feature flag store.
  4. Alert on anomalies, not just on the existence of change, but on the rate and magnitude of change. Schema governance isn’t about preventing evolution; it’s about ensuring evolution is visible, intentional, and reviewed.

Why This Matters

Schema issues are among the most insidious problems in observability. They don’t crash your system; they corrupt your insights. A silently changed field type can make a dashboard render zeros instead of errors. A label that disappears can cause an alert rule to match nothing.

The engine.schema_fields dataset gives you the raw material to catch these problems early or prevent them entirely. By treating schema metadata as a first-class observable signal, you shift from reactive firefighting to proactive governance. Your data pipeline becomes something you can trust, not just something you hope is working.

Start with a query. Build a dashboard. Set an alert. Your future self, the one who isn’t debugging a phantom data outage at 2 AM, will thank you.
