Overview of Dataspaces and Datasets

Note

This feature is currently limited to early-access customers. Broader availability is planned for a future release.

Dataspaces are the top-level organizational units in Coralogix. They provide a scalable, policy-driven structure for routing, storing, securing, and querying observability data.

Rather than managing all data as a flat list of datasets, dataspaces allow you to group related data by environment, team, or workflow. This unlocks a more flexible and future-proof model for data governance, routing, and analysis.

Why use dataspaces and datasets

As your observability footprint grows, structuring your data becomes essential. Without a clear model, you'll face unmanageable queries, schema conflicts, inconsistent retention, and overexposed data. Dataspaces and datasets solve this by introducing a clean, two-tiered separation of concerns:

Dataspaces define organizational boundaries, such as environments, business units, teams, or regions.
Datasets define logical groupings of content, such as logs, traces, metrics, or enrichment data.

This model replaces the legacy logs and spans flow with a dynamic, configurable system that gives you full control over how data is routed, stored, queried, and secured.

Note

The dataspaces and datasets highlighted in red in the image above are currently in development and are planned for a future release.

To ingest and organize data in DataPrime, you must define both a dataspace and a dataset. The comparison below shows what this model changes:

Without dataspaces and datasets

All data is routed into hardcoded pipelines (logs, spans) with no semantic isolation.
Fixed labels like $l.application may not match your actual structure.
Changing storage configuration breaks backward compatibility.
Schema collisions and noise are common due to unrelated data sharing the same destination.

With dataspaces and datasets

Data is logically isolated into semantically consistent buckets.
You define flexible labels like $l.env, $l.region, or $l.team.
Storage and routing are fully configurable per dataset or dataspace.
Access control policies can be applied at any level.
Performance improves as data becomes easier to segment and query.

How it works

Dataspaces act like databases

Each dataspace groups datasets under a single namespace and enforces shared configuration, routing logic, and access policies. This includes:

Routing rules that decide which data goes into which dataset.
Configuration templates (like base S3 paths).
Lifecycle policies and retention settings.
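
To make the idea of routing rules more concrete, here is a minimal Python sketch of first-match routing inside a single dataspace. It is not Coralogix's implementation or configuration syntax: the `RoutingRule` class, the `route` function, the label names, and the `logs` fallback are all hypothetical, and the dataset names simply echo the frontend example used on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingRule:
    """Hypothetical rule: events whose labels include `match` go to `dataset`."""
    match: dict
    dataset: str


def route(labels: dict, rules: list, fallback: str = "logs") -> str:
    """Return the dataset of the first rule whose key/value pairs all match.

    Falls back to `fallback` (an assumed catch-all dataset) when no rule matches.
    """
    for rule in rules:
        if all(labels.get(key) == value for key, value in rule.match.items()):
            return rule.dataset
    return fallback


# Illustrative rules for a "frontend" dataspace; more specific rules come first.
frontend_rules = [
    RoutingRule(match={"team": "web", "kind": "click"}, dataset="user.interactions"),
    RoutingRule(match={"team": "web"}, dataset="ui.events"),
]

print(route({"team": "web", "kind": "click"}, frontend_rules))   # user.interactions
print(route({"team": "web", "kind": "render"}, frontend_rules))  # ui.events
print(route({"team": "payments"}, frontend_rules))               # logs
```

Because the sketch evaluates rules in order, a broad rule placed first would shadow the narrower ones, which is why the more specific rule is listed first here.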

For example, your organization might define dataspaces for frontend, backend, and security. Each can contain its own logs, traces, metrics, or other entity types.

frontend/
  ├── ui.events
  └── user.interactions

backend/
  ├── service.requests
  └── system.traces

Each of these datasets inherits configuration from its parent dataspace, reducing manual setup and ensuring consistency.
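
The inheritance described above can be pictured with a small Python sketch. This is only a mental model under stated assumptions: the `Dataspace` and `Dataset` classes, the `retention_days` and `base_path` keys, and the example S3 path are hypothetical, and the sketch assumes that dataset-level settings take precedence over the dataspace defaults.

```python
from dataclasses import dataclass, field


@dataclass
class Dataspace:
    name: str
    # Shared settings; the keys used below are illustrative, not real option names.
    config: dict = field(default_factory=dict)


@dataclass
class Dataset:
    name: str
    dataspace: Dataspace
    # Dataset-specific settings (assumed to win over the dataspace defaults).
    overrides: dict = field(default_factory=dict)

    def effective_config(self) -> dict:
        """Start from the dataspace settings, then apply dataset overrides."""
        return {**self.dataspace.config, **self.overrides}


backend = Dataspace(
    "backend",
    {"retention_days": 30, "base_path": "s3://example-bucket/backend"},
)
service_requests = Dataset("service.requests", backend)
system_traces = Dataset("system.traces", backend, overrides={"retention_days": 7})

print(service_requests.effective_config())  # inherits everything from backend
print(system_traces.effective_config())     # same base_path, shorter retention
```

In this model a dataset only needs to declare what differs from its dataspace, which is the "reduced manual setup" the paragraph refers to.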

Datasets act like tables

Datasets are the logical containers for event data inside each dataspace. They can be created automatically (based on routing patterns) or manually (for write-to workflows). You can think of them like “tables” in an SQL schema.

Datasets support:

Fine-grained query scopes.
Individual retention and access policies.
Modular, reusable outputs (e.g., writeTo results).
Access control rules for teams or roles.

Because datasets are just identifiers, they can take any name, including dot notation like engine.queries. This does not imply a hierarchy: for example, engine.queries and engine.schema_fields are separate, unrelated datasets.

Querying across dataspaces and datasets

You can query any dataset with DataPrime using the source command:

source <dataspace>/<dataset>

Examples:

source default/logs
source system/engine.queries
source frontend/spans

If no dataspace is provided, the default dataspace is assumed:

source logs  // equivalent to source default/logs

This means that if you're only using the default dataspace, your existing queries will continue to work.
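
The resolution rule above can be sketched in a few lines of Python. This is an illustration of the documented behavior, not how DataPrime parses queries: the `resolve_source` function is hypothetical, and rejecting references with more than one slash is an assumption made for the sketch.

```python
DEFAULT_DATASPACE = "default"


def resolve_source(reference: str) -> tuple[str, str]:
    """Split a `<dataspace>/<dataset>` reference into (dataspace, dataset).

    A reference without a slash is placed in the default dataspace.
    Dots in the dataset name are kept untouched: they are part of the
    identifier, not a hierarchy separator.
    """
    ref = reference.strip()
    dataspace, slash, dataset = ref.partition("/")
    if not slash:
        # No dataspace given, e.g. "logs" -> ("default", "logs")
        dataspace, dataset = DEFAULT_DATASPACE, ref
    # Assumption: both parts are required and only one slash is allowed.
    if not dataspace or not dataset or "/" in dataset:
        raise ValueError(f"malformed source reference: {reference!r}")
    return dataspace, dataset


print(resolve_source("logs"))                   # ('default', 'logs')
print(resolve_source("default/logs"))           # ('default', 'logs')
print(resolve_source("system/engine.queries"))  # ('system', 'engine.queries')
```

Note that `engine.queries` comes back as a single dataset name, consistent with dataset names being flat identifiers.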

Default and system dataspaces

Default dataspace: Every account has one. It includes all the standard sources like logs, spans, and enrichments.
System dataspace: This is reserved for Coralogix-generated data such as alerts history, audit logs, and notification deliveries.
User-defined dataspaces: (Coming soon) Organizations can create custom dataspaces to segment data by team, product, environment, or any meaningful boundary.
