# Dataspaces and datasets

Dataspaces and datasets provide a two-tiered model for organizing, routing, and securing observability data in Coralogix.

- **Dataspaces** define organizational boundaries — such as environments, business units, teams, or regions.
- **Datasets** define logical groupings of content — such as logs, traces, metrics, or enrichment data.

Ready to get started?

Create a [user-defined dataset](https://coralogix.com/docs/user-guides/data-layer/default-dataspace/user-defined-datasets/index.md), view and manage [system datasets](https://coralogix.com/docs/user-guides/data-layer/dataset-management/dataset-management/index.md), or query existing datasets in [Explore](https://coralogix.com/docs/user-guides/data_exploration/datasets/index.md).

______________________________________________________________________

## Dataspaces

A dataspace is a logical container for one or more datasets. It acts as a control layer for:

- Routing logic
- Storage structure
- Retention policies
- Access control
- Schema enforcement

Think of dataspaces like databases. Each dataspace groups datasets under a single namespace and enforces shared configuration. For example, your organization might define dataspaces for `frontend`, `backend`, and `security`:

```text
frontend/
  └── ui.events
  └── user.interactions

backend/
  └── service.requests
  └── system.traces
```

### Configuration inheritance

When a dataset is created inside a dataspace, it automatically inherits the dataspace's configuration:

- S3 storage paths
- Retention rules
- Access policies
- Metadata enrichment

For example, if a dataspace defines the S3 path `s3://my-bucket/my_prefix`, new datasets inside that dataspace automatically write to:

```text
s3://my-bucket/my_prefix/dataset1
s3://my-bucket/my_prefix/dataset2
```

This inheritance is dynamic — no manual setup is needed when new datasets appear.

### Types of dataspaces

| Type                                                                                                                   | Description                                                                                                   |
| ---------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------- |
| [**default**](https://coralogix.com/docs/user-guides/data-layer/default-dataspace/index.md)                            | The main user-facing dataspace. Contains datasets like `logs`, `spans`, and any user-defined datasets.        |
| [**system**](https://coralogix.com/docs/user-guides/data-layer/system_dataspace/index.md)                              | A Coralogix-managed dataspace for internal datasets such as alert history, audit events, and schema metadata. |
| [**user-defined**](https://coralogix.com/docs/user-guides/data-layer/default-dataspace/user-defined-datasets/index.md) | Custom dataspaces created by users to segment data by team, region, environment, or use case. Coming soon.    |

______________________________________________________________________

## Datasets

A dataset is a scoped collection of related data within a dataspace. Think of datasets like tables in a database. Each dataset contains a specific stream of observability data (e.g., logs, traces, alerts) that inherits configuration from its parent dataspace.

Datasets are created:

- **Automatically** — via routing logic
- **Manually** — through the UI or `writeTo` queries
- **Dynamically** — based on values like `$d.region`, `$l.applicationname`, etc.

Note

Datasets currently work only with archived data.

Because datasets are just identifiers, they can take any name, including dot notation like `engine.queries`. This does **not** imply a hierarchy — `engine.queries` and `engine.schema_fields` are separate, unrelated datasets.

### Streaming vs summary datasets

User-defined datasets in the `default/` dataspace come in two flavors based on how they're populated:

- **Streaming datasets** — populated in real time by [TCO Optimizer](https://coralogix.com/docs/user-guides/account-management/tco-optimizer/index.md) policies routing ingested data.
- **Summary datasets** — populated by [Background queries v2](https://coralogix.com/docs/user-guides/dataengine/background_queries_v2/index.md) writing query results for downstream reuse.

For the full distinction, entity-type rules, and when to use each, see [Streaming vs summary datasets](https://coralogix.com/docs/user-guides/data-layer/default-dataspace/user-defined-datasets/#streaming-vs-summary-datasets).

### Key capabilities

| Capability             | Description                                                                                                           |
| ---------------------- | --------------------------------------------------------------------------------------------------------------------- |
| **Dynamic creation**   | Datasets are created on-the-fly based on routing rules or labels like `$l.applicationname`. No manual setup required. |
| **Scoped performance** | Segmented datasets reduce schema collisions and improve query speed by narrowing the search space.                    |
| **Granular control**   | Apply retention, access, routing, and enrichment policies at the dataset level.                                       |
| **Reusability**        | Write query results into datasets and retrieve them later for dashboards, joins, or long-term analytics.              |

### Writing to and reading from a dataset

```dataprime
// Write query results to a dataset
source logs
| filter status_code >= 500
| writeTo default/high_errors
```

```dataprime
// Reuse it later
source default/high_errors
| groupby path agg count()
```

Note

Duplicated data created by queries will count towards your quota.

### Dataset schemas

Each dataset has an associated schema, influenced by its **pillar** (logs, spans, etc.) and **entity type** (e.g., `alerts`, `browserLogs`, `cpuProfiles`).

| Pillar  | Entity type         | Example schema                                   |
| ------- | ------------------- | ------------------------------------------------ |
| logs    | `alerts`            | `{ alert_name, severity, status, triggered_at }` |
| logs    | `browserLogs`       | `{ user_agent, page_url, timestamp }`            |
| logs    | `text`              | `{ text: "..." }`                                |
| spans   | `spans`             | OpenTelemetry-formatted span objects             |
| metrics | `metrics`           | `{ __name__, value, labels... }`                 |
| binary  | `sessionRecordings` | Metadata + link to binary                        |
| binary  | `files`             | File metadata (e.g., name, size, uploaded_by)    |

Schema docs for common datasets:

- [Alert events](https://coralogix.com/docs/user-guides/alerting/configure-notifications/source-type-schema/index.md)
- [Audit events](https://coralogix.com/docs/user-guides/troubleshooting/coralogix-audit/index.md)
- [Notification deliveries](https://coralogix.com/docs/user-guides/data-layer/system_dataspace/notification_deliveries/index.md)

### Enabling and disabling datasets

Datasets, especially system datasets, must be manually enabled. Once enabled:

- All users can query them
- They count toward your daily quota
- Previously generated data remains accessible, even if later disabled

Disabling a dataset stops its ingestion — not its storage.

### Managing datasets

Manage your datasets from the UI by navigating to **Data Flow**, then **[Dataset Management](https://coralogix.com/docs/user-guides/data-layer/dataset-management/dataset-management/index.md)**. Here, you can view all active datasets, enable/disable system datasets, apply configuration rules, view schema definitions, and inspect sample documents.

______________________________________________________________________

## Query syntax

Query any dataset with DataPrime using the [source](https://coralogix.com/docs/dataprime/language-reference/commands-reference/sources/source/index.md) command:

```dataprime
source <dataspace>/<dataset>
```

Examples:

```dataprime
source default/logs
source system/engine.queries
source frontend/spans
```

If no dataspace is provided, the `default` dataspace is assumed:

```dataprime
source logs  // equivalent to source default/logs
```

If you're only using the [default dataspace](https://coralogix.com/docs/user-guides/data-layer/default-dataspace/index.md), your existing queries will continue to work.

______________________________________________________________________

## System datasets

Coralogix includes several read-only, auto-generated datasets in the `system` dataspace:

| Dataset                                                                                                                               | Description                                                                                                                          |
| ------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| [system/aaa.audit_events](https://coralogix.com/docs/user-guides/data-layer/system_dataspace/aaa_audit-events/index.md)               | Stores audit logs for compliance and access monitoring.                                                                              |
| [system/alerts.history](https://coralogix.com/docs/user-guides/data-layer/system_dataspace/alerts-history/index.md)                   | Records alert evaluation and trigger metadata.                                                                                       |
| [system/engine.queries](https://coralogix.com/docs/user-guides/data-layer/system_dataspace/engine_queries/index.md)                   | Historical record of user queries for introspection and optimization.                                                                |
| [system/engine.schema_fields](https://coralogix.com/docs/user-guides/data-layer/system_dataspace/engine-schema_fields/index.md)       | Tracks field-level schema evolution over time.                                                                                       |
| [system/labs.limit_violations](https://coralogix.com/docs/user-guides/data-layer/system_dataspace/labs_limit-violations/index.md)     | Records each time a configured limit is exceeded.                                                                                    |
| [system/notification.deliveries](https://coralogix.com/docs/user-guides/data-layer/system_dataspace/notification_deliveries/index.md) | Logs Notification Center delivery events. Alerts record delivery failures, while Cases record both successful and failed deliveries. |
| [system/notification.requests](https://coralogix.com/docs/user-guides/data-layer/system_dataspace/notification_requests/index.md)     | Captures each incoming notification request metadata.                                                                                |

See [System dataspace](https://coralogix.com/docs/user-guides/data-layer/system_dataspace/index.md) for more information.
