Datasets
What is a dataset?
Datasets are the fundamental building blocks of data organization within a dataspace. They allow you to logically segment your logs, spans, and other entity types, thereby improving query performance, access control, and storage clarity. For instance, datasets can be created to segment along team lines, like `frontend`, `backend`, or `security`.
Note
Only system datasets are currently supported. User-defined datasets are in development for a future release.
A dataset is a scoped collection of related data within a dataspace. Each dataset contains a specific stream of observability data (e.g., logs, traces, alerts) that inherits configuration from its parent dataspace.
Datasets are created:

- Automatically, via routing logic
- Manually, through the UI or `writeTo` queries (see the sketch below)
- Dynamically, based on values like `$d.region`, `$l.applicationname`, etc.
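For instance, a single `writeTo` query is enough to create and populate a dataset by hand. A minimal sketch, reusing the `frontend` team example from above and only the operators documented on this page:

```
// Route one team's logs into a dedicated dataset
source logs
| filter $l.applicationname == 'frontend'
| writeTo default/frontend
```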
Each dataset lives in a dataspace and is queried using:
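```
// General form; see Query syntax below for concrete examples
source <dataspace>/<dataset>
```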
Note
Datasets currently work only with archived data.
Why use datasets?
Datasets are designed to solve long-standing limitations in the traditional `logs` and `spans` model. They provide a way to logically isolate data by structure, purpose, and access level, improving control, schema hygiene, and performance.

Limitations of traditional `logs`/`spans`:
- Hardcoded pipelines: All data flows into fixed logs/spans buckets with no user control over routing.
- Schema pollution: Different sources (e.g., alerts, RUM, enrichments) get dumped into the same dataset.
- Fixed `$l` labels: System-defined labels like `application` or `subsystem` create naming conflicts and ambiguous semantics.
- Immutable archive paths: Changing storage configuration renders old data inaccessible unless manually copied.
- Cross-contaminated data: Mixed data types (e.g., metrics vs. alerts vs. enrichments) degrade schema clarity and performance.
Benefits of datasets:
- No pollution: Each dataset tracks its own schema, reducing collisions and ambiguity.
- Dynamic labels (`$l`): Labeling is now fully user-defined, e.g., `$l.env`, `$l.cluster`, `$l.region` (see the sketch after this list).
- Flexible storage: Datasets can route to different buckets or prefixes, and their location can change without needing to copy data.
- Improved performance: Queries run faster because datasets only contain semantically related data.
- Reusable outputs: You can write query results into new datasets, supporting summary tables, derived views, or archival logic.
- Future-proof: Logs and spans will eventually migrate to dynamic datasets with full support for `$l`-based routing and labeling.
In short: Datasets offer structure, separation, and flexibility.
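User-defined labels are easiest to see in a query. A minimal sketch, assuming the `default/frontend` dataset from the earlier example and that its producers attach an `$l.env` label:

```
// Filter on a fully user-defined label
source default/frontend
| filter $l.env == 'production'
```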
Key capabilities of datasets
| Capability | Description |
|---|---|
| Dynamic creation | Datasets are created on the fly based on routing rules or labels like `$l.applicationname`. No manual setup required. |
| Scoped performance | Segmented datasets reduce schema collisions and improve query speed by narrowing the search space. |
| Granular control | Apply retention, access, routing, and enrichment policies at the dataset level. |
| Reusability | You can write query results into datasets and retrieve them later for dashboards, joins, or long-term analytics. |
| Clarity and structure | Datasets make data easier to organize and reason about, by team, service, environment, or data type. |
Example: writing to and reading from a dataset
Note
Duplicated data created by queries will count towards your quota.
```
// Write query results to a dataset
source logs
| filter status_code >= 500
| writeTo default/high_errors
```
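Reading the results back is an ordinary `source` query against the new dataset. A minimal sketch; the dataset name and field carry over from the write above:

```
// Read the stored results back, narrowing further if needed
source default/high_errors
| filter status_code >= 503
```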
This workflow is especially helpful for recurring reports, dashboards, and trend analyses.
Query syntax

If you're in the `default` dataspace, you can omit the dataspace prefix: `source default/logs` and `source logs` are equivalent. When used alone, `source logs` is implicitly querying `default/logs`.
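The same rule as a runnable sketch, using only the operators shown above:

```
// These two queries read the same dataset from the default dataspace
source default/logs
source logs
```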
System datasets
Coralogix includes several system datasets in the `system` dataspace. These are read-only and auto-generated.
| Dataset | Description |
|---|---|
| `system/aaa.audit_events` | Stores audit logs for compliance and access monitoring. |
| `system/alerts.history` | Records alert evaluation and trigger metadata. |
| `system/cases` | Models each case from creation and acknowledgement through resolution. |
| `system/engine.queries` | Historical record of user queries for introspection and optimization. |
| `system/engine.schema_fields` | Tracks field-level schema evolution over time. |
| `system/labs.limit_violations` | Records each time a configured limit is exceeded. |
| `system/notification.deliveries` | Captures the lifecycle of outbound alert notifications. |
| `system/notification.requests` | Captures metadata for each incoming notification request. |
These datasets power features like schema visualization, alert performance tracking, and auditing. See System datasets for more information.
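Once enabled, a system dataset is queried like any other. A minimal sketch using a name from the table above:

```
// Inspect the historical record of user queries
source system/engine.queries
```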
Dataset schemas
Each dataset has an associated schema, influenced by its pillar (logs, spans, etc.) and entity type (e.g., `alerts`, `browserLogs`, `cpuProfiles`).
| Pillar | Entity type | Example schema |
|---|---|---|
| logs | alerts | `{ alert_name, severity, status, triggered_at }` |
| logs | browserLogs | `{ user_agent, page_url, timestamp }` |
| logs | text | `{ text: "..." }` |
| spans | spans | OpenTelemetry-formatted span objects |
| metrics | metrics | `{ __name__, value, labels... }` |
| binary | sessionRecordings | Metadata + link to binary |
| binary | files | File metadata (e.g., name, size, uploaded_by) |
Schema docs for common datasets:
Managing datasets
With Dataset Management, you can manage your datasets from the UI by navigating to:
Data Flow > Dataset Management
Here, you can:
- View all active datasets
- Enable/disable `system` datasets
- Apply configuration rules
- View schema definitions
- Inspect sample documents
Enabling and disabling datasets
Datasets, especially system datasets, must be manually enabled. Once enabled:
- All users can query them
- They count toward your daily quota
- Previously generated data remains accessible, even if the dataset is later disabled

Disabling a dataset stops its ingestion, not its storage.