Datasets

What is a dataset?

Datasets are the fundamental building blocks of data organization within a dataspace. They let you logically segment your logs, spans, and other entity types, which improves query performance, simplifies access control, and makes it clearer where data is stored. For instance, datasets can be created to segment data along team lines, such as frontend, backend, or security.

A dataset is a scoped collection of related data within a dataspace. Each dataset contains a specific stream of observability data (e.g., logs, traces, alerts) that inherits configuration from its parent dataspace.

Datasets are created:

  • Automatically — via routing logic
  • Manually — through the UI or writeto queries
  • Dynamically — based on values like $d.region, $l.applicationname, etc.
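To make dynamic creation concrete, here is a minimal Python sketch of the idea: every distinct value of a routing key becomes its own dataset. It is a conceptual model only, not Coralogix's routing engine. The `resolve` and `route` helpers and the record layout (a dict with `l` for labels and `d` for user data) are assumptions made for illustration.

```python
from collections import defaultdict

def resolve(record, key):
    """Look up a key such as "$l.applicationname" in a nested record.

    Assumed layout (hypothetical): {"l": {...labels...}, "d": {...user data...}}.
    """
    value = record
    for part in key.lstrip("$").split("."):
        value = value[part]
    return str(value)

def route(records, key):
    """Group records into datasets named after the value of `key`."""
    datasets = defaultdict(list)
    for record in records:
        datasets[resolve(record, key)].append(record)
    return dict(datasets)

records = [
    {"l": {"applicationname": "frontend"}, "d": {"region": "eu"}},
    {"l": {"applicationname": "backend"}, "d": {"region": "us"}},
    {"l": {"applicationname": "frontend"}, "d": {"region": "us"}},
]
by_app = route(records, "$l.applicationname")
by_region = route(records, "$d.region")
```

In this model no dataset has to be declared up front: `by_app` ends up with one entry per application name seen in the data, and `by_region` with one entry per region.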

Each dataset lives in a dataspace and is queried using:

source <dataspace>/<dataset> | ...

Key capabilities of datasets

| Capability | Description |
|---|---|
| Dynamic creation | Datasets are created on the fly based on routing rules or labels like $l.applicationname. No manual setup required. |
| Scoped performance | Segmented datasets reduce schema collisions and improve query speed by narrowing the search space. |
| Granular control | Apply retention, access, routing, and enrichment policies at the dataset level. |
| Reusability | You can write query results into datasets and retrieve them later for dashboards, joins, or long-term analytics. |
| Clarity and structure | Datasets make data easier to organize and reason about, for example by team, service, environment, or data type. |

Query syntax

source <dataspace>/<dataset>

If you're in the default dataspace, you can omit the prefix:

source logs

These are equivalent:

source logs
source default/logs

A query that starts with a pipe and has no source stage, such as:

| filter status_code >= 500

implicitly queries default/logs.
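The resolution rules above can be modeled in a few lines of Python. This is an illustrative sketch, not Coralogix's query parser: the function name `qualify_source` and the two constants are hypothetical, and the sketch assumes the dataset reference is the token that follows `source` in the first pipeline stage.

```python
DEFAULT_DATASPACE = "default"  # dataspace used when no prefix is given
DEFAULT_DATASET = "logs"       # dataset used when no source stage is given

def qualify_source(query):
    """Return the fully qualified "<dataspace>/<dataset>" a query reads from."""
    # The first pipeline stage is everything before the first "|".
    first_stage = query.strip().split("|", 1)[0]
    tokens = first_stage.split()

    # No source stage at all: fall back to default/logs.
    if not tokens or tokens[0] != "source":
        return f"{DEFAULT_DATASPACE}/{DEFAULT_DATASET}"
    if len(tokens) < 2:
        raise ValueError("source stage is missing a dataset reference")

    reference = tokens[1]
    # A bare dataset name lives in the default dataspace.
    if "/" not in reference:
        return f"{DEFAULT_DATASPACE}/{reference}"
    return reference
```

Under these assumptions, `qualify_source("source logs")`, `qualify_source("source default/logs")`, and `qualify_source("| filter status_code >= 500")` all return the same value, while a prefixed reference such as `system/engine.queries` is returned unchanged.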


System datasets

Coralogix includes several system datasets in the system dataspace. These are read-only and auto-generated.

| Dataset | Description |
|---|---|
| system/alerts.history | Records alert evaluation and trigger metadata. |
| system/engine.queries | Historical record of user queries for introspection and optimization. |
| system/engine.schema_fields | Tracks field-level schema evolution over time. |
| system/notification.deliveries | Captures the lifecycle of outbound alert notifications. |
| system/notification.requests | Captures each incoming notification request metadata. |

These datasets power features like schema visualization, alert performance tracking, and auditing. See System dataset for more information.


Dataset schemas

Each dataset has an associated schema, influenced by its pillar (logs, spans, etc.) and entity type (e.g., alerts, browserLogs, cpuProfiles).

| Pillar | Entity type | Example schema |
|---|---|---|
| logs | alerts | { alert_name, severity, status, triggered_at } |
| logs | browserLogs | { user_agent, page_url, timestamp } |
| logs | text | { text: "..." } |
| spans | spans | OpenTelemetry-formatted span objects |
| metrics | metrics | { __name__, value, labels... } |
| binary | sessionRecordings | Metadata + link to binary |
| binary | files | File metadata (e.g., name, size, uploaded_by) |


Managing datasets

With Dataset Management, you can manage your datasets from the UI by navigating to:

Data Flow > Dataset Management

Here, you can:

  • View all active datasets
  • Enable/disable system datasets
  • Apply configuration rules
  • View schema definitions
  • Inspect sample documents


Enabling and disabling datasets

Datasets, especially system datasets, must be manually enabled. Once enabled:

  • All users can query them
  • They count toward your daily quota
  • Previously generated data remains accessible, even if the dataset is later disabled

Disabling a dataset stops new data from being ingested into it. Data that is already stored is kept and remains queryable.
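The difference between ingestion and storage can be pictured with a small Python model. This is a conceptual sketch only: the `Dataset` class and its methods are hypothetical and do not correspond to any Coralogix API.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    """Toy model of a dataset that can be enabled or disabled."""
    name: str
    enabled: bool = False
    stored: list = field(default_factory=list)

    def ingest(self, record):
        """Store the record only while the dataset is enabled."""
        if not self.enabled:
            return False  # record is dropped, nothing is stored
        self.stored.append(record)
        return True

    def query(self):
        """Stored data stays readable whether or not the dataset is enabled."""
        return list(self.stored)

ds = Dataset("system/engine.queries")
ds.enabled = True
accepted = ds.ingest({"query_id": 1})
ds.enabled = False
rejected = ds.ingest({"query_id": 2})  # dropped: dataset is disabled
remaining = ds.query()
```

In this model, `remaining` still contains the first record after the dataset is disabled, while the second record is never stored.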

