When data enters Coralogix, it goes through a structured lifecycle: it is received from shippers or agents, transformed with DataPrime rules, routed based on attributes such as region or team, and written to the appropriate dataspace and dataset. If a dataset doesn't already exist, it's created automatically and inherits its configuration from the parent dataspace.

## High-level flow

```mermaid
flowchart LR
    A["Ingress"] --> B["Pre-processing"]
    B --> C["Routing"]
    C --> D["Dataset creation"]
    D --> E["Configuration inheritance"]
    E --> F["Final storage & query"]

    class A entry
    class F success
```

1. **Ingress**

   Data enters the platform through a shipper or agent. You can predefine a `targetDataspace` and `targetDataset` in the shipper config.

1. **Pre-processing**

   Coralogix applies DataPrime transformation rules. For example, fields can be removed or recalculated before the data structure is finalized:

   ```dataprime
   remove derived_metric
   | replace raw_value with normalized_value
   | create derived_metric from quantity * 232
   ```

1. **Routing**

   A set of conditions (e.g., region, team, or environment) determines where data goes. For example, the region or team found in the data can select the target dataspace and dataset:

   ```text
   <region == 'us2'>       ->      [targetDataspace = bu1, targetDataset = logs-us]
   <team == 'neptune'>     ->      [targetDataspace = planet, targetDataset = gassy]
   <team == 'venus'>       ->      [targetDataspace = planet, targetDataset = rocky]
   ```

   Routing is fully **data-driven** and can include dynamic elements:

   ```text
   <region>                 ->     [targetDataspace = bu2, targetDataset = logs-{$l.applicationname}]
   ```

1. **Dataset creation**

   If the target dataset does not already exist, it is created automatically under the target dataspace.

1. **Configuration inheritance**

   The dataset inherits configuration from its dataspace, including:

   - Storage prefix (e.g., `s3://bucket/my-dataspace/logs-regionX`)
   - Retention and archive rules
   - Access control policies
   - Metadata enrichment

1. **Final storage & query**

   Once routed and processed, the data is written to object storage and made available for querying.
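
The routing step above can be sketched in Python as an ordered rule list. This is a minimal illustration, not the Coralogix implementation: the `Rule` class, the `route` function, the event shape (`data` and `labels` keys), first-match-wins ordering, and the fallback to `default/logs` are all assumptions made for the example. Only the rule conditions, the targets, and the `{$l.applicationname}` placeholder come from the routing rules shown earlier.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Matches placeholders such as {$l.applicationname} in a dataset name.
_PLACEHOLDER = re.compile(r"\{\$l\.(\w+)\}")


@dataclass(frozen=True)
class Rule:
    """Hypothetical routing rule: a condition on the event data plus a target."""
    condition: Callable[[dict], bool]
    target_dataspace: str
    target_dataset: str  # may contain {$l.<label>} placeholders


def resolve_template(template: str, labels: dict) -> str:
    """Substitute label values into the dataset name (KeyError if a label is missing)."""
    return _PLACEHOLDER.sub(lambda m: str(labels[m.group(1)]), template)


def route(event: dict, rules: list, default=("default", "logs")) -> tuple:
    """Return (dataspace, dataset) for the first matching rule (assumed semantics)."""
    data = event.get("data", {})
    labels = event.get("labels", {})
    for rule in rules:
        if rule.condition(data):
            return rule.target_dataspace, resolve_template(rule.target_dataset, labels)
    return default  # assumed fallback when nothing matches


# The four rules from the routing examples, in the same order.
RULES = [
    Rule(lambda d: d.get("region") == "us2", "bu1", "logs-us"),
    Rule(lambda d: d.get("team") == "neptune", "planet", "gassy"),
    Rule(lambda d: d.get("team") == "venus", "planet", "rocky"),
    Rule(lambda d: "region" in d, "bu2", "logs-{$l.applicationname}"),
]

event = {"data": {"region": "eu1"}, "labels": {"applicationname": "cx510"}}
print(route(event, RULES))  # ('bu2', 'logs-cx510')
```

Because the last rule matches any event that carries a `region` field, the more specific `us2` rule has to come first under the first-match assumption used here.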

______________________________________________________________________

## Example dataset structure

```text
default/
  ├── logs
  └── spans

business-unit1/
  └── logs

business-unit2/
  ├── logs-cx510
  ├── logs-euprod2
  ├── logs-production
  ├── ...
  └── <datasets created dynamically as data arrives>

security/
  └── ...
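
A structure like this follows from the dataset creation and configuration inheritance steps. The Python sketch below models that behavior under stated assumptions: the `Config` and `Dataspace` classes, the field names (`retention_days`, `access_policy`), the sample values, and the rule that a dataset's storage prefix is the dataspace prefix plus the dataset name are all hypothetical and chosen only to mirror the inherited settings listed earlier.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Config:
    """Hypothetical subset of the settings a dataset inherits."""
    storage_prefix: str
    retention_days: int
    access_policy: str


@dataclass
class Dataspace:
    name: str
    config: Config
    datasets: dict = field(default_factory=dict)

    def ensure_dataset(self, dataset: str) -> Config:
        """Create the dataset on first use, copying the dataspace configuration."""
        if dataset not in self.datasets:
            # Inherit everything, then extend the storage prefix (assumed layout).
            self.datasets[dataset] = replace(
                self.config,
                storage_prefix=f"{self.config.storage_prefix}/{dataset}",
            )
        return self.datasets[dataset]


bu2 = Dataspace(
    "business-unit2",
    Config("s3://bucket/business-unit2", retention_days=30, access_policy="team-bu2"),
)

# Datasets appear as data for each application arrives.
for app in ("cx510", "euprod2", "production", "cx510"):
    bu2.ensure_dataset(f"logs-{app}")

print(sorted(bu2.datasets))
# ['logs-cx510', 'logs-euprod2', 'logs-production']
print(bu2.datasets["logs-cx510"].storage_prefix)
# s3://bucket/business-unit2/logs-cx510
```

The repeated `cx510` event does not create a second dataset, which matches the "created only if it does not already exist" behavior described above.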

______________________________________________________________________

## Handling quota and duplication

- Duplicating data across datasets (e.g., routing the same event to multiple targets) **counts against your quota**, because each copy is counted separately.
- You can monitor dataset-level usage in **Dataset Management**.
- Dataset quotas can be enforced per team, dataspace, or workload.
- The [data usage](https://coralogix.com/docs/user-guides/account-management/payment-and-billing/data-usage/index.md) page shows detailed breakdowns to help you understand where and how your data is being consumed.
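
To make the effect of duplication concrete, the Python sketch below tallies usage per dataset. It is an illustration only: the `record_ingest` function, the byte-based accounting, and the `security/audit` dataset name are assumptions, and the actual quota units and measurement are defined by Coralogix, not by this example.

```python
from collections import Counter


def record_ingest(usage: Counter, event_bytes: int, targets: list) -> None:
    """Charge the full event size to every dataset that receives a copy (assumed model)."""
    for target in targets:
        usage[target] += event_bytes


usage = Counter()

# One event routed to a single dataset.
record_ingest(usage, 1_000, [("planet", "gassy")])

# One event of the same size duplicated to two datasets.
record_ingest(usage, 1_000, [("planet", "gassy"), ("security", "audit")])

total = sum(usage.values())
print(total)  # 3000
```

Two events of 1,000 bytes each produce 3,000 bytes of counted usage here, because the duplicated event is charged once per target.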
