Datasets
What is a dataset?
Datasets are the fundamental building blocks of data organization within a dataspace. They allow you to logically segment your logs, spans, and other entity types, thereby improving query performance, access control, and storage clarity. For instance, datasets can be created to segment along team lines, like frontend, backend, or security.
A dataset is a scoped collection of related data within a dataspace. Each dataset contains a specific stream of observability data (e.g., logs, traces, alerts) that inherits configuration from its parent dataspace.
Datasets are created:
- Automatically: via routing logic
- Manually: through the UI or writeto queries
- Dynamically: based on values like $d.region, $l.applicationname, etc.
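To make the idea of dynamic creation concrete, here is a minimal Python sketch of value-based routing. It is not Coralogix's implementation: the record layout, the `PREFIX_TO_SECTION` mapping (which treats `$l` as labels and `$d` as user data), and the `lookup` and `route_by_value` helpers are all hypothetical names introduced for illustration.

```python
from collections import defaultdict
from typing import Any

# Assumption for illustration only: "$l" selects a record's labels and
# "$d" selects its user data. The record layout is hypothetical.
PREFIX_TO_SECTION = {"$l": "labels", "$d": "data"}


def lookup(record: dict[str, Any], path: str) -> Any:
    """Resolve a path such as "$l.applicationname" against one record."""
    prefix, _, field = path.partition(".")
    return record[PREFIX_TO_SECTION[prefix]][field]


def route_by_value(records: list[dict[str, Any]], path: str) -> dict[str, list]:
    """Group records into datasets named after the value found at `path`.

    A dataset (here, a dict key) comes into existence the first time a
    new value is seen, so nothing has to be declared up front.
    """
    datasets: dict[str, list] = defaultdict(list)
    for record in records:
        datasets[str(lookup(record, path))].append(record)
    return dict(datasets)


# Hypothetical sample records.
records = [
    {"labels": {"applicationname": "frontend"}, "data": {"region": "eu-west-1"}},
    {"labels": {"applicationname": "backend"}, "data": {"region": "us-east-1"}},
    {"labels": {"applicationname": "frontend"}, "data": {"region": "us-east-1"}},
]

by_app = route_by_value(records, "$l.applicationname")
by_region = route_by_value(records, "$d.region")
```

Routing the same records by `$l.applicationname` and by `$d.region` yields two different partitions, which is the sense in which the dataset layout follows the chosen value rather than a fixed list.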
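Each dataset lives in a dataspace and is referenced in queries by its dataspace and dataset name, in the form dataspace/dataset (for example, default/logs).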
Key capabilities of datasets
Capability | Description |
---|---|
Dynamic creation | Datasets are created on the fly based on routing rules or labels like $l.applicationname. No manual setup required. |
Scoped performance | Segmented datasets reduce schema collisions and improve query speed by narrowing the search space. |
Granular control | Apply retention, access, routing, and enrichment policies at the dataset level. |
Reusability | You can write query results into datasets and retrieve them later for dashboards, joins, or long-term analytics. |
Clarity and structure | Datasets make data easier to organize and reason about — by team, service, environment, or data type. |
Query syntax
If you're in the default dataspace, you can omit the dataspace prefix. The references default/logs and logs are equivalent: when a dataset name such as logs is used alone, the query implicitly targets default/logs.
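The prefix rule described in this section can be modeled as a small resolver. The following Python sketch is illustrative only: `DatasetRef` and `resolve_dataset_ref` are hypothetical names, not part of any Coralogix API, and the only behavior taken from the text is that a bare dataset name falls back to the default dataspace.

```python
from typing import NamedTuple

# Taken from the text: a reference without a prefix targets "default".
DEFAULT_DATASPACE = "default"


class DatasetRef(NamedTuple):
    dataspace: str
    dataset: str


def resolve_dataset_ref(ref: str, default_dataspace: str = DEFAULT_DATASPACE) -> DatasetRef:
    """Expand a dataset reference into an explicit (dataspace, dataset) pair."""
    ref = ref.strip()
    if not ref:
        raise ValueError("empty dataset reference")
    # Split on the first slash only; dataset names such as
    # "engine.queries" contain dots but no slash.
    dataspace, sep, dataset = ref.partition("/")
    if not sep:
        # No prefix given: the whole reference is the dataset name.
        return DatasetRef(default_dataspace, dataspace)
    if not dataspace or not dataset:
        raise ValueError(f"malformed dataset reference: {ref!r}")
    return DatasetRef(dataspace, dataset)
```

Under this model, `resolve_dataset_ref("logs")` and `resolve_dataset_ref("default/logs")` produce the same pair, while `resolve_dataset_ref("system/engine.queries")` keeps its explicit system dataspace.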
System datasets
Coralogix includes several system datasets in the system dataspace. These are read-only and auto-generated.
Dataset | Description |
---|---|
system/alerts.history | Records alert evaluation and trigger metadata. |
system/engine.queries | Historical record of user queries for introspection and optimization. |
system/engine.schema_fields | Tracks field-level schema evolution over time. |
system/notification.deliveries | Captures the lifecycle of outbound alert notifications. |
system/notification.requests | Captures each incoming notification request metadata. |
These datasets power features like schema visualization, alert performance tracking, and auditing. See System dataset for more information.
Dataset schemas
Each dataset has an associated schema, influenced by its pillar (logs, spans, etc.) and entity type (e.g., alerts, browserLogs, cpuProfiles).
Pillar | Entity type | Example schema |
---|---|---|
logs | alerts | { alert_name, severity, status, triggered_at } |
logs | browserLogs | { user_agent, page_url, timestamp } |
logs | text | { text: "..." } |
spans | spans | OpenTelemetry-formatted span objects |
metrics | metrics | { __name__, value, labels... } |
binary | sessionRecordings | Metadata + link to binary |
binary | files | File metadata (e.g., name, size, uploaded_by) |
Managing datasets
With Dataset Management, you can manage your datasets from the UI by navigating to:
Data Flow > Dataset Management
Here, you can:
- View all active datasets
- Enable/disable system datasets
- Apply configuration rules
- View schema definitions
- Inspect sample documents
Enabling and disabling datasets
Datasets, especially system datasets, must be manually enabled. Once enabled:
- All users can query them
- They count toward your daily quota
- Previously generated data remains accessible, even if later disabled
Disabling a dataset stops its ingestion — not its storage.
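The distinction between ingestion and storage can be illustrated with a toy model. The Python sketch below is not how Coralogix implements datasets; `ToyDataset` and its methods are hypothetical, and the model captures only the behavior stated above: a dataset starts disabled, accepts documents only while enabled, and keeps previously stored documents queryable after it is disabled.

```python
class ToyDataset:
    """Toy model of the enable/disable behavior described above."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.enabled = False  # must be enabled manually, per the text
        self._stored: list[dict] = []

    def enable(self) -> None:
        self.enabled = True

    def disable(self) -> None:
        # Disabling stops ingestion only; stored documents are kept.
        self.enabled = False

    def ingest(self, document: dict) -> bool:
        """Store the document if the dataset is enabled; report whether it was stored."""
        if not self.enabled:
            return False
        self._stored.append(document)
        return True

    def query(self) -> list[dict]:
        """Return a copy of everything stored so far, enabled or not."""
        return list(self._stored)


ds = ToyDataset("system/engine.queries")
dropped_before_enable = ds.ingest({"id": 0})  # dataset not enabled yet
ds.enable()
stored_while_enabled = ds.ingest({"id": 1})
ds.disable()
dropped_after_disable = ds.ingest({"id": 2})
remaining = ds.query()
```

In this run only the document ingested while the dataset was enabled is stored, and it remains available from `query()` after the dataset is disabled.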