engine.schema_fields
Purpose
The engine.schema_fields dataset captures metadata and historical snapshots of dataset schemas within Coralogix's system dataspace. It records how each dataset's structure evolves over time, so teams can track changes to fields, their types, and other attributes, monitor schema changes, and check that datasets stay consistent across versions.
The schema metadata includes information such as the field types, partitioning schemes, and labels, along with additional contextual data, such as dataset names and snapshot timestamps. This dataset is crucial for debugging, auditing schema changes, and ensuring data governance across datasets.
Schema description
Full JSON path | Field data type | Field data example | Description |
---|---|---|---|
dataprimePath | String | "$d.dataset" | Path used for Dataprime processing. |
dataset | String | "engine.schema_fields" | The dataset name. |
dataspace | String | "system" | Dataspace this dataset belongs to. |
distinctValueCount | Number | 1 | Number of distinct values observed for this field. |
examples | Array | ["rumEventsPoc"] | Example values seen in the data. |
flatPath | String | "$d.dataset" | Flattened path representation used for referencing contexts. |
labels | Object | { "category":"io", "className":"com.acme.Handler", "methodName":"handle", "thread":"pool-1-7" } | Optional labels associated with the field. |
labels.category | String | "io" | Label describing a category for the field. |
labels.className | String | "com.acme.Handler" | Fully qualified class name (if applicable). |
labels.methodName | String | "handle" | Method name (if applicable). |
labels.thread | String | "pool-1-7" | Thread identifier/name (if applicable). |
metadata | Object | { "entityType":"log", "pillar":"observability", "priorityClass":"Medium", "severity":"3" } | Additional metadata about the field. |
metadata.entityType | String | "log" | Logical entity type for this field. |
metadata.pillar | String | "observability" | Product/solution pillar associated with the field. |
metadata.priorityClass | String | "Medium" | Priority classification. |
metadata.severity | String | "3" | Severity level. |
partitioningScheme | String | "dt/hr" | Partitioning scheme of the dataset. |
pathParts | Array | ["$d","dataset"] | Path breakdown used for data reference resolution. |
snapshotDuration | String (ns) | "3600000000000" | Duration of the snapshot in nanoseconds (string-encoded). |
snapshotId | String (UUID) | "043c5e02-0924-45f0-83dd-2aa5659e323e" | Unique identifier of the dataset snapshot. |
snapshotStartTime | Number (ns since epoch) | 1750611600000000000 | Start time of the snapshot in epoch nanoseconds. |
type | String | "string" | Logical type of the field. |
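
The two time fields in the table use different encodings: snapshotStartTime is a number of nanoseconds since the epoch, while snapshotDuration is a string-encoded nanosecond count. The following is a minimal Python sketch of decoding both, assuming rows have been exported as dictionaries keyed by the field names above. The helper name `snapshot_window` is hypothetical, and sub-microsecond precision is dropped because Python datetimes stop at microseconds.

```python
from datetime import datetime, timedelta, timezone

NS_PER_SECOND = 1_000_000_000


def snapshot_window(row):
    """Return (start, end) of a snapshot as timezone-aware UTC datetimes."""
    start_ns = int(row["snapshotStartTime"])    # number, ns since epoch
    duration_ns = int(row["snapshotDuration"])  # string-encoded ns
    # Split into whole seconds and leftover nanoseconds to avoid float rounding.
    seconds, remainder_ns = divmod(start_ns, NS_PER_SECOND)
    start = datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(
        microseconds=remainder_ns // 1000
    )
    end = start + timedelta(microseconds=duration_ns // 1000)
    return start, end


# Values taken from the example column of the table above.
row = {
    "snapshotStartTime": 1750611600000000000,
    "snapshotDuration": "3600000000000",
}
start, end = snapshot_window(row)
print(start.isoformat(), end.isoformat())
```

For the example values, the window starts at 17:00 UTC on 22 June 2025 and lasts one hour.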
How the data in this dataset can be used
Tracking schema changes over time
Query the snapshotId and snapshotStartTime fields to track how a dataset's schema has evolved. For example, you can list all snapshots of a specific dataset and compare them to see which fields were added or removed.
Example query:
source system/engine.schema_fields
| groupby snapshotStartTime
    aggregate count() as field_count
| sortby snapshotStartTime asc
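
Seeing which fields were added or removed requires comparing the field paths of two snapshots. One way to do that outside the query engine is a set difference over exported rows. This is a sketch only: the function name `diff_snapshots`, the snapshot IDs, and the sample paths other than "$d.dataset" are hypothetical, and it assumes flatPath identifies a field within a snapshot.

```python
def diff_snapshots(rows, old_snapshot_id, new_snapshot_id):
    """Compare the field paths present in two snapshots."""
    old_fields = {r["flatPath"] for r in rows if r["snapshotId"] == old_snapshot_id}
    new_fields = {r["flatPath"] for r in rows if r["snapshotId"] == new_snapshot_id}
    return {
        "added": sorted(new_fields - old_fields),
        "removed": sorted(old_fields - new_fields),
    }


# Hypothetical exported rows; real rows carry every column of the schema table.
rows = [
    {"snapshotId": "snap-1", "flatPath": "$d.dataset"},
    {"snapshotId": "snap-1", "flatPath": "$d.legacy"},
    {"snapshotId": "snap-2", "flatPath": "$d.dataset"},
    {"snapshotId": "snap-2", "flatPath": "$d.region"},
]

changes = diff_snapshots(rows, "snap-1", "snap-2")
print(changes)
```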
Auditing schema field types and values
The type and examples fields can be queried to audit field types and understand the possible values of specific fields. This is helpful for ensuring consistency in the dataset's structure and for preparing for schema migrations.
Example query:
source system/engine.schema_fields
| groupby dataset, flatPath
    aggregate
        distinct_count(type) as type_count,
        any_value(type) as sample_type,
        any_value(examples) as sample_example
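
A type count above 1 for a field means its type changed between snapshots. The same check can be sketched in Python over exported rows, which also shows which types were observed. The function name `find_type_conflicts` and the sample dataset and path values are hypothetical; the sketch assumes each row describes one field in one snapshot.

```python
from collections import defaultdict


def find_type_conflicts(rows):
    """Return fields that were observed with more than one type."""
    types_by_field = defaultdict(set)
    for r in rows:
        types_by_field[(r["dataset"], r["flatPath"])].add(r["type"])
    # Keep only fields whose type differs across rows.
    return {
        field: sorted(types)
        for field, types in types_by_field.items()
        if len(types) > 1
    }


# Hypothetical rows: $d.status changes from string to number between snapshots.
rows = [
    {"dataset": "logs.app", "flatPath": "$d.status", "type": "string"},
    {"dataset": "logs.app", "flatPath": "$d.status", "type": "number"},
    {"dataset": "logs.app", "flatPath": "$d.message", "type": "string"},
]

conflicts = find_type_conflicts(rows)
print(conflicts)
```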
Monitoring dataset changes for compliance
The metadata.priorityClass and metadata.severity fields allow users to monitor and categorize changes in datasets according to their priority and severity. This can help in compliance auditing, ensuring that critical datasets are tracked closely.
Example query:
source system/engine.schema_fields
| filter metadata.severity == '3'
| groupby dataset
    aggregate distinct_count(snapshotId) as high_severity_changes
| sortby high_severity_changes desc
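
The schema table lists metadata.severity as a string ("3"), so any comparison done on exported rows needs a string rather than a number. The sketch below counts distinct snapshots per dataset for one severity value. The function name `snapshots_by_dataset` and the sample datasets and snapshot IDs are hypothetical.

```python
from collections import defaultdict


def snapshots_by_dataset(rows, severity="3"):
    """Count distinct snapshot IDs per dataset for one severity value."""
    snapshot_ids = defaultdict(set)
    for r in rows:
        # severity is string-encoded, so compare against a string.
        if r.get("metadata", {}).get("severity") == severity:
            snapshot_ids[r["dataset"]].add(r["snapshotId"])
    counts = {dataset: len(ids) for dataset, ids in snapshot_ids.items()}
    # Highest count first, like the descending sort in the query.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rows; the duplicate snap-1 row is counted once.
rows = [
    {"dataset": "logs.app", "snapshotId": "snap-1", "metadata": {"severity": "3"}},
    {"dataset": "logs.app", "snapshotId": "snap-1", "metadata": {"severity": "3"}},
    {"dataset": "logs.app", "snapshotId": "snap-2", "metadata": {"severity": "3"}},
    {"dataset": "metrics.cpu", "snapshotId": "snap-1", "metadata": {"severity": "3"}},
    {"dataset": "metrics.cpu", "snapshotId": "snap-2", "metadata": {"severity": "1"}},
]

ranking = snapshots_by_dataset(rows)
print(ranking)
```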
engine.schema_fields schema
Indicates the dataspace this dataset belongs to. Example: "system".
The dataset name. Example: "engine.schema_fields".
Flattened path representation used for referencing contexts. Example: "$d.dataset".
Path used for Dataprime processing. Example: "$d.dataset".
The first element of the path. Example: "$d".
The second element of the path. Example: "dataset".
Unique identifier of the dataset snapshot. Example: "043c5e02-0924-45f0-83dd-2aa5659e323e".
Start time of the snapshot in epoch nanoseconds. Example: 1750611600000000000.
Duration of the snapshot in nanoseconds (string-encoded). Example: "3600000000000".
Partitioning scheme of the dataset. Example: "dt/hr".
Type of the field. Example: "string".
Example values seen in the data. Example: ["rumEventsPoc"].
Number of distinct values. Example: 1.
Optional labels associated with the field.
High-cardinality labels, if any.
Priority classification. Example: "Medium".
Severity level. Example: "3".