
engine.schema_fields

Purpose

The engine.schema_fields dataset captures metadata and historical snapshots of dataset schemas within Coralogix's system dataspace. Each snapshot records a dataset's fields, their types, and related attributes at a point in time, so teams can track how a dataset's structure has evolved, monitor schema changes, and check that field definitions stay consistent from one snapshot to the next.

The schema metadata includes field types, partitioning schemes, and labels, along with contextual data such as dataset names and snapshot timestamps. This makes the dataset useful for debugging, auditing schema changes, and enforcing data governance across datasets.

Schema description

| Full JSON path | Field data type | Field data example | Description |
| --- | --- | --- | --- |
| dataprimePath | String | "$d.dataset" | Path used for Dataprime processing. |
| dataset | String | "engine.schema_fields" | The dataset name. |
| dataspace | String | "system" | Dataspace this dataset belongs to. |
| distinctValueCount | Number | 1 | Number of distinct values observed for this field. |
| examples | Array | ["rumEventsPoc"] | Example values seen in the data. |
| flatPath | String | "$d.dataset" | Flattened path representation used for referencing contexts. |
| labels | Object | { "category":"io", "className":"com.acme.Handler", "methodName":"handle", "thread":"pool-1-7" } | Optional labels associated with the field. |
| labels.category | String | "io" | Label describing a category for the field. |
| labels.className | String | "com.acme.Handler" | Fully qualified class name (if applicable). |
| labels.methodName | String | "handle" | Method name (if applicable). |
| labels.thread | String | "pool-1-7" | Thread identifier/name (if applicable). |
| metadata | Object | { "entityType":"log", "pillar":"observability", "priorityClass":"Medium", "severity":"3" } | Additional metadata about the field. |
| metadata.entityType | String | "log" | Logical entity type for this field. |
| metadata.pillar | String | "observability" | Product/solution pillar associated with the field. |
| metadata.priorityClass | String | "Medium" | Priority classification. |
| metadata.severity | String | "3" | Severity level. |
| partitioningScheme | String | "dt/hr" | Partitioning scheme of the dataset. |
| pathParts | Array | ["$d","dataset"] | Path breakdown used for data reference resolution. |
| snapshotDuration | String (ns) | "3600000000000" | Duration of the snapshot in nanoseconds (string-encoded). |
| snapshotId | String (UUID) | "043c5e02-0924-45f0-83dd-2aa5659e323e" | Unique identifier of the dataset snapshot. |
| snapshotStartTime | Number (ns since epoch) | 1750611600000000000 | Start time of the snapshot in epoch nanoseconds. |
| type | String | "string" | Field’s logical/type classification. |
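
The two snapshot time fields use different encodings: snapshotStartTime is a number of nanoseconds since the epoch, while snapshotDuration is a string holding a nanosecond count. As a minimal sketch, assuming rows have been exported from the dataset as Python dictionaries (the `snapshot_window` helper is hypothetical and not part of any Coralogix API), the snapshot window could be decoded like this:

```python
from datetime import datetime, timedelta, timezone

NS_PER_SECOND = 1_000_000_000


def snapshot_window(record):
    """Return (start, end) of a snapshot as timezone-aware UTC datetimes.

    Hypothetical helper: assumes `record` is one exported row with the
    snapshotStartTime and snapshotDuration fields described in the schema.
    """
    start_ns = int(record["snapshotStartTime"])    # number, ns since epoch
    duration_ns = int(record["snapshotDuration"])  # string-encoded ns count

    # Split into whole seconds and leftover nanoseconds to avoid float rounding.
    start_s, start_rem_ns = divmod(start_ns, NS_PER_SECOND)
    start = datetime.fromtimestamp(start_s, tz=timezone.utc)
    start += timedelta(microseconds=start_rem_ns // 1000)

    # datetime has microsecond resolution, so sub-microsecond precision is dropped.
    end = start + timedelta(microseconds=duration_ns // 1000)
    return start, end


# Values taken from the example column of the schema description.
record = {
    "snapshotStartTime": 1750611600000000000,
    "snapshotDuration": "3600000000000",
}
start, end = snapshot_window(record)
print(start.isoformat(), end.isoformat())
```

With the example values from the schema description, this yields a one-hour window starting at 2025-06-22 17:00 UTC.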

How the data in this dataset can be used

Tracking schema changes over time

By querying the snapshotId and snapshotStartTime fields, users can track how a dataset's schema has evolved over time. For example, you could query for all snapshots of a specific dataset and see which fields were added or removed.

Example query:

source system/engine.schema_fields
| groupby snapshotStartTime
    aggregate count() as field_rows
| sortby snapshotStartTime asc
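
To see which fields were added or removed between two snapshots, the field sets of both snapshots can be compared. The following is a rough sketch, assuming the query results were exported as a list of dictionaries and that each row describes one field (flatPath) of one dataset in one snapshot. The `diff_snapshots` helper, the dataset name "logs", and the snapshot IDs are hypothetical values for illustration:

```python
def diff_snapshots(rows, dataset, old_id, new_id):
    """Return (added, removed) field paths between two snapshots of a dataset.

    Hypothetical helper: `rows` is assumed to be exported
    engine.schema_fields rows, one per field per snapshot.
    """
    def fields(snapshot_id):
        return {
            r["flatPath"]
            for r in rows
            if r["dataset"] == dataset and r["snapshotId"] == snapshot_id
        }

    old, new = fields(old_id), fields(new_id)
    return sorted(new - old), sorted(old - new)


# Made-up sample rows; real snapshot IDs are UUIDs.
rows = [
    {"dataset": "logs", "snapshotId": "snap-a", "flatPath": "$d.msg"},
    {"dataset": "logs", "snapshotId": "snap-a", "flatPath": "$d.user"},
    {"dataset": "logs", "snapshotId": "snap-b", "flatPath": "$d.msg"},
    {"dataset": "logs", "snapshotId": "snap-b", "flatPath": "$d.region"},
]
added, removed = diff_snapshots(rows, "logs", "snap-a", "snap-b")
print("added:", added)
print("removed:", removed)
```

Using set difference keeps the comparison independent of row order and of how many rows each snapshot contains.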

Auditing schema field types and values

The type and examples fields can be queried to audit field types and understand the possible values of specific fields. This is helpful for ensuring consistency in the dataset's structure and for preparing for schema migrations.

Example query:

source system/engine.schema_fields
| groupby dataset, flatPath
    aggregate
        distinct_count(type) as type_count,
        any_value(type) as sample_type,
        any_value(examples) as sample_example
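
The same audit can be run offline on exported rows. This sketch assumes that a field is identified by its dataset and flatPath, as listed in the schema description, and reports every field that was observed with more than one type. The `fields_with_type_conflicts` helper and the sample rows are hypothetical:

```python
from collections import defaultdict


def fields_with_type_conflicts(rows):
    """Map (dataset, flatPath) to its sorted types, for fields seen with >1 type.

    Hypothetical helper over exported engine.schema_fields rows.
    """
    types = defaultdict(set)
    for r in rows:
        types[(r["dataset"], r["flatPath"])].add(r["type"])
    return {key: sorted(seen) for key, seen in types.items() if len(seen) > 1}


# Made-up sample rows for illustration.
rows = [
    {"dataset": "logs", "flatPath": "$d.msg", "type": "string"},
    {"dataset": "logs", "flatPath": "$d.msg", "type": "string"},
    {"dataset": "logs", "flatPath": "$d.status", "type": "string"},
    {"dataset": "logs", "flatPath": "$d.status", "type": "number"},
]
conflicts = fields_with_type_conflicts(rows)
print(conflicts)
```

A field that appears in this result changed type between snapshots and is a candidate for review before a schema migration.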

Monitoring dataset changes for compliance

The metadata.priorityClass and metadata.severity fields allow users to monitor and categorize changes in datasets according to their priority and severity. This can help in compliance auditing, ensuring that critical datasets are tracked closely.

Example query:

source system/engine.schema_fields
| filter metadata.severity == '3'
| groupby dataset
    aggregate distinct_count(snapshotId) as high_severity_changes
| sortby high_severity_changes desc
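
Because the schema description lists metadata.severity as a string, comparisons done outside the query engine should also use the string form. As an illustrative sketch over exported rows (the `snapshots_by_severity` helper and the sample data are hypothetical), the count of distinct snapshots per dataset could be reproduced like this:

```python
from collections import defaultdict


def snapshots_by_severity(rows, severity="3"):
    """Count distinct snapshotIds per dataset for rows with the given severity.

    Hypothetical helper: severity is compared as a string, matching the
    schema description, so a numeric 3 would not match.
    """
    seen = defaultdict(set)
    for r in rows:
        if r.get("metadata", {}).get("severity") == severity:
            seen[r["dataset"]].add(r["snapshotId"])
    # Highest count first, dataset name as a tie-breaker.
    return sorted(
        ((dataset, len(ids)) for dataset, ids in seen.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Made-up sample rows; the last one uses a numeric severity on purpose.
rows = [
    {"dataset": "logs", "snapshotId": "snap-a", "metadata": {"severity": "3"}},
    {"dataset": "logs", "snapshotId": "snap-b", "metadata": {"severity": "3"}},
    {"dataset": "logs", "snapshotId": "snap-b", "metadata": {"severity": "3"}},
    {"dataset": "spans", "snapshotId": "snap-a", "metadata": {"severity": "3"}},
    {"dataset": "spans", "snapshotId": "snap-b", "metadata": {"severity": "1"}},
    {"dataset": "metrics", "snapshotId": "snap-a", "metadata": {"severity": 3}},
]
counts = snapshots_by_severity(rows)
print(counts)
```

The "metrics" row is skipped because its severity is the number 3 and not the string "3", which is the kind of mismatch a strict comparison exposes.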

engine.schema_fields schema

{ engine.schema_fields
Schema field metadata and characteristics for a dataset in the 'system' dataspace.
dataspace

Indicates the dataspace this dataset belongs to. Example: "system".

dataset

The dataset name. Example: "engine.schema_fields".

flatPath

Flattened path representation used for referencing contexts. Example: "$d.dataset".

dataprimePath

Path used for Dataprime processing. Example: "$d.dataset".

[ pathParts
Path breakdown used for data reference resolution.
pathParts[0]

The first element of the path. Example: "$d".

pathParts[1]

The second element of the path. Example: "dataset".

]
snapshotId

Unique identifier of the dataset snapshot. Example: "043c5e02-0924-45f0-83dd-2aa5659e323e".

snapshotStartTime

Start time of the snapshot in epoch nanoseconds. Example: 1750611600000000000.

snapshotDuration

Duration of the snapshot in nanoseconds (string-encoded). Example: "3600000000000".

partitioningScheme

Partitioning scheme of the dataset. Example: "dt/hr".

type

Type of the field. Example: "string".

[ examples

Example values seen in the data. Example: ["rumEventsPoc"].

examples[0]

The first example value. Example: "rumEventsPoc".

]
distinctValueCount

Number of distinct values. Example: 1.

labels

Optional labels associated with the field.

highCardinalityLabels

High-cardinality labels, if any.

highCardinalityLabels[0]
{ metadata
Additional metadata.
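entityType

Logical entity type for this field. Example: "log".

pillar

Product/solution pillar associated with the field. Example: "observability".

priorityClass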

Priority classification. Example: "Medium".

severity

Severity level. Example: "3".

}
}