Data Processing, Transformation, and Routing
When data enters Coralogix, it doesn’t just get ingested; it goes through a structured lifecycle that ensures it’s shaped, routed, and stored in the right place. Data is first received from shippers or agents, then transformed with DataPrime rules to clean up fields or create new values. Next, routing logic is applied based on attributes like region, team, or environment, and the data is directed into the appropriate dataspace and dataset. If a dataset doesn’t already exist, it’s created automatically, and configuration details are inherited from the parent dataspace. This process keeps incoming data both flexible and consistent, ready to be queried, analyzed, and acted on.
High-level flow
Ingress
Data enters the platform through a shipper or agent. Customers can pre-define a `targetDatabase` (dataspace) and `targetDataset` via the shipper config.

Pre-processing
Coralogix applies DataPrime transformation rules. For example, fields can be removed or recalculated before finalizing the data structure:
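A rough sketch of the kind of transformation that might run at this stage (the `debug_payload`, `duration_seconds`, and `duration_ms` fields are purely illustrative; see the DataPrime reference for the exact rule syntax):

```
remove $d.debug_payload
| create $d.duration_ms from $d.duration_seconds * 1000
```

Here one field is dropped and another is recalculated into a new value before the record is finalized.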
Routing
A set of conditions (e.g., region, team, environment) determines where data goes. For example, different regions or teams in the data can determine the target dataspace or dataset:

```
<region == 'us2'>   -> [targetDataspace = bu1,    targetDataset = logs-us]
<team == 'neptune'> -> [targetDataspace = planet, targetDataset = gassy]
<team == 'venus'>   -> [targetDataspace = planet, targetDataset = rocky]
```
Routing is fully data-driven and can include dynamic elements:
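For instance, a rule might derive the dataset name from a field value rather than hardcoding it. The interpolation form below is only an illustration of the idea, not documented syntax:

```
<region != null> -> [targetDataspace = bu1, targetDataset = logs-{region}]
```

With a rule like this, events from a new region land in a new dataset automatically, which pairs with the automatic dataset creation described next.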
Dataset creation
If a dataset does not already exist, it will be created automatically under the target dataspace.
Configuration inheritance
The dataset inherits configuration from its dataspace, including:
- Storage prefix (e.g., `s3://bucket/my-dataspace/logs-regionX`)
- Retention and archive rules
- Access control policies
- Metadata enrichment
Final storage & query
Once routed and processed, the data is written to object storage and made available for querying.
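At that point the data can be queried from its dataset with DataPrime. A minimal sketch, assuming the dataspace-qualified `source` form and a hypothetical `status_code` field (check the DataPrime reference for the exact dataset addressing syntax):

```
source bu1/logs-us
| filter $d.status_code >= 500
| limit 100
```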
Example dataset structure
```
default/
├── logs
└── spans
business-unit1/
└── logs
business-unit2/
├── logs-cx510
├── logs-euprod2
├── logs-production
├── ...
└── <datasets created dynamically as data arrives>
security/
└── ...
```
Handling quota and duplication
- Duplicating data across datasets (e.g., routing the same event to multiple targets) will count against your quota.
- You can monitor dataset-level usage in Dataset Management.
- Dataset quotas can be enforced per team, space, or workload.
- The data usage page shows detailed breakdowns to help you understand where and how your data is being consumed.
Legacy vs new ingestion flow
| Step | Legacy pipeline | Dataset pipeline |
|---|---|---|
| Routing | Hardcoded to `logs`/`spans` | Data-driven routing via any field |
| Labels | Fixed `$l.application` | Dynamic, user-defined `$l.*` |
| Storage | Static location | Configurable per dataset/dataspace |
| Pre-processing | None or limited | Full DataPrime transform support |
| Categorization | One shared schema | Schema isolation per dataset |
| Dataset lifecycle | Manual setup or hardcoded | Auto-created on demand |
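As a sketch of the labels row: the legacy flow keys queries on the fixed `$l.application` label, while the dataset pipeline lets you define and filter on your own labels. The `$l.owning_team` label and the `planet/gassy` dataset address below are hypothetical:

```
source logs | filter $l.application == 'checkout'
source planet/gassy | filter $l.owning_team == 'neptune'
```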
Benefits of this model
- Cleaner data: Schema and context stay consistent within each dataset.
- Dynamic flexibility: Route and organize data based on any field.
- Performance gains: Query engines work faster when data is semantically aligned.
- Policy control: Permissions and retention are isolated per dataset.
- Zero-touch onboarding: No need to pre-declare datasets—they are created automatically.