Overview of Dataspaces and Datasets
Note
This feature is currently limited to early-access customers. Broader availability is planned for a future release.
Dataspaces are the top-level organizational units in Coralogix. They provide a scalable, policy-driven structure for routing, storing, securing, and querying observability data.
Rather than managing all data as a flat list of datasets, dataspaces allow you to group related data by environment, team, or workflow. This unlocks a more flexible and future-proof model for data governance, routing, and analysis.
Why use dataspaces and datasets
As your observability footprint grows, structuring your data becomes essential. Without a clear model, you'll face unmanageable queries, schema conflicts, inconsistent retention, and overexposed data. Dataspaces and datasets solve this by introducing a clean, two-tiered separation of concerns:
- Dataspaces define organizational boundaries—such as environments, business units, teams, or regions.
- Datasets define logical groupings of content—such as logs, traces, metrics, or enrichment data.
This model replaces the legacy logs and spans flow with a dynamic, configurable system that gives you full control over how data is routed, stored, queried, and secured.
Note
The dataspaces and datasets highlighted in red in the image above are currently in development and are planned for a future release.
You must define both a dataspace and a dataset to ingest and organize data in DataPrime. This unlocks key advantages:
Without dataspaces and datasets
- All data is routed into hardcoded pipelines (logs, spans) with no semantic isolation.
- Fixed labels like $l.application may not match your actual structure.
- Changing storage configuration breaks backward compatibility.
- Schema collisions and noise are common due to unrelated data sharing the same destination.
With dataspaces and datasets
- Data is logically isolated into semantically consistent buckets.
- You define flexible labels like $l.env, $l.region, or $l.team.
- Storage and routing are fully configurable per dataset or dataspace.
- Access control policies can be applied at any level.
- Performance improves as data becomes easier to segment and query.
How it works
Dataspaces act like databases
Each dataspace groups datasets under a single namespace and enforces shared configuration, routing logic, and access policies. This includes:
- Routing rules that decide what data goes into what dataset.
- Configuration templates (like base S3 paths).
- Lifecycle policies and retention settings.
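To make the idea of routing rules concrete, here is a minimal Python sketch of rule-based routing inside one dataspace. It is a conceptual model only, not the Coralogix configuration format: the `RoutingRule` type, the `route` function, the first-match-wins ordering, the fallback dataset, and the dataset names are all assumptions made for illustration.

```python
from typing import Callable, NamedTuple


class RoutingRule(NamedTuple):
    """Hypothetical rule: a predicate over event labels plus a target dataset."""
    matches: Callable[[dict], bool]
    dataset: str


def route(event_labels: dict, rules: list[RoutingRule], fallback: str = "logs") -> str:
    # Assumption: the first matching rule wins; unmatched events go to the fallback.
    for rule in rules:
        if rule.matches(event_labels):
            return rule.dataset
    return fallback


# Illustrative rules keyed on the kinds of labels mentioned above (team, env).
rules = [
    RoutingRule(lambda labels: labels.get("team") == "security", "audit_events"),
    RoutingRule(lambda labels: labels.get("env") == "prod", "prod_logs"),
]

print(route({"team": "security", "env": "prod"}, rules))
print(route({"env": "staging"}, rules))
```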
For example, your organization might define dataspaces for frontend, backend, and security. Each can contain its own logs, traces, metrics, or other entity types.
Each of these datasets inherits configuration from its parent dataspace, reducing manual setup and ensuring consistency.
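The inheritance described here can be pictured with a short Python sketch. It assumes a simple "dataspace defaults, then dataset overrides" merge; the class names, the setting keys (`retention_days`, `s3_base_path`), and the example bucket path are hypothetical and do not reflect how Coralogix stores configuration.

```python
from dataclasses import dataclass, field


@dataclass
class Dataspace:
    """Hypothetical model of a dataspace holding shared defaults."""
    name: str
    defaults: dict = field(default_factory=dict)


@dataclass
class Dataset:
    """Hypothetical model of a dataset with optional per-dataset overrides."""
    name: str
    parent: Dataspace
    overrides: dict = field(default_factory=dict)

    def effective_config(self) -> dict:
        # Start from the parent's settings, then apply dataset-level overrides.
        return {**self.parent.defaults, **self.overrides}


# Illustrative values only.
backend = Dataspace(
    "backend",
    {"retention_days": 30, "s3_base_path": "s3://example-bucket/backend"},
)
logs = Dataset("logs", backend)
traces = Dataset("traces", backend, overrides={"retention_days": 7})

print(logs.effective_config())
print(traces.effective_config())
```

In this sketch, `logs` picks up every setting from `backend` untouched, while `traces` changes only its retention and still shares the base path.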
Datasets act like tables
Datasets are the logical containers for event data inside each dataspace. They can be created automatically (based on routing patterns) or manually (for write-to workflows). You can think of them like “tables” in an SQL schema.
Datasets support:
- Fine-grained query scopes.
- Individual retention and access policies.
- Modular, reusable outputs (e.g., writeTo results).
- Access control rules for teams or roles.
Because datasets are just identifiers, they can take any name, including dot notation like engine.queries. This does not imply a hierarchy: for example, engine.queries and engine.schema_fields are separate, unrelated datasets.
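One way to picture flat naming is a plain key-value registry, sketched below in Python. The registry, the `lookup` helper, and the retention values are hypothetical; the point is only that a dotted name is an opaque key, so the prefix before the dot does not resolve to anything by itself.

```python
# Hypothetical registry: dataset names are opaque keys, so dots carry no structure.
datasets = {
    "engine.queries": {"retention_days": 30},        # illustrative value
    "engine.schema_fields": {"retention_days": 90},  # illustrative value
}


def lookup(name: str) -> dict:
    # Exact-match lookup only; there is no parent/child traversal on ".".
    return datasets[name]


print(lookup("engine.queries"))
print("engine" in datasets)  # the prefix alone is not a dataset
```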
Querying across dataspaces and datasets
You can query any dataset with DataPrime using the source command, specifying the dataspace and dataset to read from. If no dataspace is provided, the default dataspace is assumed.
This means that if you're only using the default dataspace, your existing queries will continue to work.
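The fallback behavior can be modeled with a small Python sketch. This is not DataPrime syntax: the `resolve_source` helper and the `/` separator between dataspace and dataset are assumptions chosen for illustration, and the pairing of the system dataspace with engine.queries is only an example input.

```python
DEFAULT_DATASPACE = "default"


def resolve_source(reference: str, separator: str = "/") -> tuple[str, str]:
    """Split a source reference into (dataspace, dataset).

    Assumption: the dataspace and dataset are joined by `separator`.
    """
    dataspace, sep, dataset = reference.partition(separator)
    if not sep:
        # No dataspace given, so fall back to the default dataspace.
        return DEFAULT_DATASPACE, reference
    return dataspace, dataset


print(resolve_source("logs"))
print(resolve_source("system/engine.queries"))
```

Because only the separator is treated as structure, a dotted dataset name passes through unchanged.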
Default and system dataspaces
- Default dataspace: Every account has one. It includes all the standard sources like logs, spans, and enrichments.
- System dataspace: This is reserved for Coralogix-generated data such as alerts history, audit logs, and notification deliveries.
- User-defined dataspaces: (Coming soon) Organizations can create custom dataspaces to segment data by team, product, environment, or any meaningful boundary.