
Data Processing, Transformation, and Routing

When data enters Coralogix, it doesn’t just get ingested; it moves through a structured lifecycle that shapes it, routes it, and stores it in the right place. Data is first received from shippers or agents, then transformed with DataPrime rules to clean up fields or create new values. Next, routing logic is applied based on attributes like region, team, or environment, and the data is directed into the appropriate dataspace and dataset. If a dataset doesn’t already exist, it’s created automatically and inherits configuration from its parent dataspace. The result is incoming data that is both flexible and consistent, ready to be queried, analyzed, and acted on.

High-level flow

  1. Ingress: Data enters the platform through a shipper or agent. Customers can pre-define a targetDataspace and targetDataset via the shipper configuration.

  2. Pre-processing: Coralogix applies DataPrime transformation rules.

    For example, fields can be removed or recalculated before finalizing the data structure:

    remove derived_metric
    | replace raw_value with normalized_value
    | create derived_metric from quantity * 232
    
  3. Routing: A set of conditions (e.g., region, team, environment) determines where data goes (see the sketch after this list):

    For example, different regions or teams in the data can determine the target dataspace or dataset.

    <region == 'us2'>       ->      [targetDataspace = bu1, targetDataset = logs-us]
    <team == 'neptune'>     ->      [targetDataspace = planet, targetDataset = gassy]
    <team == 'venus'>       ->      [targetDataspace = planet, targetDataset = rocky]
    

    Routing is fully data-driven and can include dynamic elements:

    <region>                 ->     [targetDataspace = bu2, targetDataset = logs-{$l.applicationname}]
    
  4. Dataset creation: If a dataset does not already exist, it will be created automatically under the target dataspace.

  5. Configuration inheritance: The dataset inherits configuration from its dataspace, including:

    • Storage prefix (e.g., s3://bucket/my-dataspace/logs-regionX)
    • Retention and archive rules
    • Access control policies
    • Metadata enrichment
  6. Final storage & query: Once routed and processed, the data is written to object storage and made available for querying.
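
To make the flow concrete, here is a minimal Python sketch of steps 2–5: an event is pre-processed, routed on its attributes, and written into a dataset that is created on demand and inherits configuration from its dataspace. Everything in it (Dataspace, pre_process, ROUTES, the retention and storage values) is a hypothetical model of the behavior described above, not a Coralogix API.

# Hypothetical sketch of steps 2-5: pre-process, route, create dataset on
# demand, inherit configuration. Names and values are illustrative only.
from dataclasses import dataclass, field


@dataclass
class Dataspace:
    name: str
    storage_prefix: str                      # e.g. "s3://bucket/my-dataspace"
    retention_days: int = 30
    datasets: dict = field(default_factory=dict)

    def get_or_create_dataset(self, dataset_name: str) -> dict:
        # Steps 4 + 5: auto-create the dataset and inherit dataspace config.
        if dataset_name not in self.datasets:
            self.datasets[dataset_name] = {
                "storage": f"{self.storage_prefix}/{dataset_name}",
                "retention_days": self.retention_days,      # inherited
            }
        return self.datasets[dataset_name]


def pre_process(event: dict) -> dict:
    # Step 2: mirrors the DataPrime example (remove / replace / create).
    event.pop("derived_metric", None)
    event["raw_value"] = event.get("normalized_value")
    event["derived_metric"] = event.get("quantity", 0) * 232
    return event


# Step 3: ordered, data-driven routing rules. The last rule shows a dynamic
# dataset name built from a label on the event.
ROUTES = [
    (lambda e: e.get("region") == "us2",   lambda e: ("bu1",    "logs-us")),
    (lambda e: e.get("team") == "neptune", lambda e: ("planet", "gassy")),
    (lambda e: e.get("team") == "venus",   lambda e: ("planet", "rocky")),
    (lambda e: "region" in e,              lambda e: ("bu2",    f"logs-{e['applicationname']}")),
]


def route(event: dict, dataspaces: dict) -> dict:
    event = pre_process(event)
    for matches, target in ROUTES:
        if matches(event):
            dataspace_name, dataset_name = target(event)
            return dataspaces[dataspace_name].get_or_create_dataset(dataset_name)
    # No rule matched: fall back to the default dataspace and dataset.
    return dataspaces["default"].get_or_create_dataset("logs")


dataspaces = {
    "default": Dataspace("default", "s3://bucket/default"),
    "bu1":     Dataspace("bu1", "s3://bucket/bu1", retention_days=90),
    "bu2":     Dataspace("bu2", "s3://bucket/bu2"),
    "planet":  Dataspace("planet", "s3://bucket/planet"),
}

event = {"team": "venus", "quantity": 3, "normalized_value": 0.7}
print(route(event, dataspaces))
# -> {'storage': 's3://bucket/planet/rocky', 'retention_days': 30}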


Example dataset structure

default/
  ├── logs
  └── spans

business-unit1/
  └── logs

business-unit2/
  ├── logs-cx510
  ├── logs-euprod2
  ├── logs-production
  ├── ...
  └── <datasets created dynamically as data arrives>

security/
  └── ...

Handling quota and duplication

  • Duplicating data across datasets (e.g., routing the same event to multiple targets) counts against your quota; see the sketch after this list.
  • You can monitor dataset-level usage in Dataset Management.
  • Dataset quotas can be enforced per team, space, or workload.
  • The data usage page shows detailed breakdowns to help you understand where and how your data is being consumed.
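
To illustrate the first point above, the following hypothetical sketch meters an event once per target dataset, so a duplicated route is counted twice against quota; the dataset names and sizes are made up.

# Hypothetical accounting sketch: each copy of an event written to a dataset
# is metered, so routing one event to two datasets consumes quota twice.
from collections import Counter

usage_bytes = Counter()

def meter(event_size_bytes: int, target_datasets: list) -> None:
    for dataset in target_datasets:          # one charge per target dataset
        usage_bytes[dataset] += event_size_bytes

meter(1_000, ["bu1/logs-us", "security/audit"])   # duplicated route
meter(1_000, ["bu1/logs-us"])                     # single route

print(dict(usage_bytes))          # {'bu1/logs-us': 2000, 'security/audit': 1000}
print(sum(usage_bytes.values()))  # 3000 bytes of quota for 2000 bytes ingested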

Legacy vs new ingestion flow

Step              | Legacy pipeline            | Dataset pipeline
------------------|----------------------------|-----------------------------------
Routing           | Hardcoded to logs/spans    | Data-driven routing via any field
Labels            | Fixed $l.application       | Dynamic, user-defined $l.*
Storage           | Static location            | Configurable per dataset/dataspace
Pre-processing    | None or limited            | Full DataPrime transform support
Categorization    | One shared schema          | Schema isolation per dataset
Dataset lifecycle | Manual setup or hardcoded  | Auto-created on demand

Benefits of this model

  • Cleaner data: Schema and context stay consistent within each dataset.
  • Dynamic flexibility: Route and organize data based on any field.
  • Performance gains: Query engines work faster when data is semantically aligned.
  • Policy control: Permissions and retention are isolated per dataset.
  • Zero-touch onboarding: No need to pre-declare datasets—they are created automatically.