Data Processing, Transformation, and Routing
When data enters Coralogix, it doesn’t just get ingested; it goes through a structured lifecycle that ensures it’s shaped, routed, and stored in the right place. Data is first received from shippers or agents, then transformed with DataPrime rules to clean up fields or create new values. Next, routing logic is applied based on attributes like region, team, or environment, and the data is directed into the appropriate dataspace and dataset. If a dataset doesn’t already exist, it’s created automatically, and configuration details are inherited from the parent dataspace. This process keeps incoming data both flexible and consistent, ready to be queried, analyzed, and acted on.
High-level flow
Ingress
Data enters the platform through a shipper or agent. Customers can pre-define a `targetDatabase` (dataspace) and `targetDataset` via the shipper config.

Pre-processing
Coralogix applies DataPrime transformation rules. For example, fields can be removed or recalculated before finalizing the data structure:
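A rough sketch of the kind of transformation that might run at this stage (the `debug_payload`, `duration_seconds`, and `duration_ms` fields are purely illustrative; see the DataPrime reference for the exact rule syntax):

```
remove $d.debug_payload
| create $d.duration_ms from $d.duration_seconds * 1000
```

Here one field is dropped and another is recalculated into a new value before the record is finalized.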
Routing
A set of conditions (e.g., region, team, environment) determines where data goes. For example, different regions or teams in the data can determine the target dataspace or dataset:

```
<region == 'us2'>   -> [targetDataspace = bu1,    targetDataset = logs-us]
<team == 'neptune'> -> [targetDataspace = planet, targetDataset = gassy]
<team == 'venus'>   -> [targetDataspace = planet, targetDataset = rocky]
```
Routing is fully data-driven and can include dynamic elements:
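For instance, a rule might derive the dataset name from a field value rather than hardcoding it. The interpolation form below is only an illustration of the idea, not documented syntax:

```
<region != null> -> [targetDataspace = bu1, targetDataset = logs-{region}]
```

With a rule like this, events from a new region land in a new dataset automatically, which pairs with the automatic dataset creation described next.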
Dataset creation
If a dataset does not already exist, it will be created automatically under the target dataspace.
Configuration inheritance
The dataset inherits configuration from its dataspace, including:
- Storage prefix (e.g., `s3://bucket/my-dataspace/logs-regionX`)
- Retention and archive rules
- Access control policies
- Metadata enrichment
Final storage & query
Once routed and processed, the data is written to object storage and made available for querying.
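At that point the data can be queried from its dataset with DataPrime. A minimal sketch, assuming the dataspace-qualified `source` form and a hypothetical `status_code` field (check the DataPrime reference for the exact dataset addressing syntax):

```
source bu1/logs-us
| filter $d.status_code >= 500
| limit 100
```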
Example dataset structure
```
default/
├── logs
└── spans
business-unit1/
└── logs
business-unit2/
├── logs-cx510
├── logs-euprod2
├── logs-production
├── ...
└── <datasets created dynamically as data arrives>
security/
└── ...
```
Handling quota and duplication
- Duplicating data across datasets (e.g., routing the same event to multiple targets) will count against your quota.
- You can monitor dataset-level usage in Dataset Management.
- Dataset quotas can be enforced per team, space, or workload.
- The data usage page shows detailed breakdowns to help you understand where and how your data is being consumed.
Legacy vs new ingestion flow
| Step | Legacy pipeline | Dataset pipeline |
|---|---|---|
| Routing | Hardcoded to `logs`/`spans` | Data-driven routing via any field |
| Labels | Fixed `$l.application` | Dynamic, user-defined `$l.*` |
| Storage | Static location | Configurable per dataset/dataspace |
| Pre-processing | None or limited | Full DataPrime transform support |
| Categorization | One shared schema | Schema isolation per dataset |
| Dataset lifecycle | Manual setup or hardcoded | Auto-created on demand |
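As a sketch of the labels row: the legacy flow keys queries on the fixed `$l.application` label, while the dataset pipeline lets you define and filter on your own labels. The `$l.owning_team` label and the `planet/gassy` dataset address below are hypothetical:

```
source logs | filter $l.application == 'checkout'
source planet/gassy | filter $l.owning_team == 'neptune'
```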
Benefits of this model
- Cleaner data: Schema and context stay consistent within each dataset.
- Dynamic flexibility: Route and organize data based on any field.
- Performance gains: Query engines work faster when data is semantically aligned.
- Policy control: Permissions and retention are isolated per dataset.
- Zero-touch onboarding: No need to pre-declare datasets—they are created automatically.