# Background Queries v2

**Limited availability**

Background Queries v2 is in limited availability and is replacing [Background Queries v1](https://coralogix.com/docs/user-guides/dataengine/background_queries_v1/index.md), which is being sunset. While the rollout is in progress, v2 is labeled **Preview** in the sidebar and runs alongside v1.

Background Queries v2 executes long-running DataPrime or Lucene queries asynchronously, so you can submit a query, keep working in Coralogix, and come back when the results are ready. Use it for heavy, recurring, or long-term analysis.

## About Background Queries

### At a glance

Submit a long-running DataPrime query that runs in the background — for example:

```text
source logs
  | filter $d.severity == "ERROR"
  | groupby $d.service.name aggregate count() as error_count
```

Select **Temporary** results (30-day retention, good for one-off investigations) or **Persistent** results (written to a user-defined summary dataset for long-term reuse). Once the query completes, the results are reusable from Explore, Custom Dashboards, the API, and anywhere DataPrime runs.

```mermaid
flowchart LR
    A["Submit query"] --> B["<b>Temporary</b><br/>30 days"]
    A --> C["<b>Persistent</b>"]
    A -.-> L["<b>Ephemeral results</b><br/>Background Queries v1<br/>30 days"]
    C --> D["<b>Append</b><br/>Add to current version"]
    C --> E["<b>Overwrite</b><br/>New version,<br/>history retained"]
    B --> F["Query · Visualize · API"]
    D --> F
    E --> F
    L -.-> G["Download TSV only"]
    click L "../background_queries_v1/" "Open Background Queries v1"

    class A entry
    class L external
    class F success
```

### How it works

Three decisions shape every Background Query:

- **What you're querying** — the [entity type](https://coralogix.com/docs/user-guides/data-layer/entity-types/index.md) of the source data: `logs`, `spans`, or — when produced by DataPrime — derived types like `jsonData`. Lucene queries always produce `logs`; DataPrime can produce any of them depending on the source and query pipeline.
- **Where results go** — the **storage destination**:
  - **Temporary**: A system dataset kept for 30 days. Good for one-off investigations and ad-hoc sharing.
  - **Persistent**: A user-defined [summary dataset](https://coralogix.com/docs/user-guides/data-layer/default-dataspace/user-defined-datasets/#streaming-vs-summary-datasets) in your archive bucket, written under the [default dataspace](https://coralogix.com/docs/user-guides/data-layer/default-dataspace/index.md). Good for rolling aggregations, long-term analysis, and downstream queries.
- **How results are written** (Persistent only) — the **action**: **Append** adds rows to the existing dataset version; **Overwrite** creates a new version and retains historical versions.

Once a query finishes, the result set or dataset is reusable from [Explore](https://coralogix.com/docs/user-guides/data_exploration/index.md), [Custom Dashboards](https://coralogix.com/docs/user-guides/custom-dashboards/introduction/index.md), the [Background Queries API](https://coralogix.com/docs/dataprime/API/direct-archive-query-http/index.md), and anywhere [DataPrime](https://coralogix.com/docs/dataprime/introduction/welcome-to-the-dataprime-reference/index.md) runs.

### Compare Temporary and Persistent

The two destinations differ across the dimensions that shape the choice — retention, reusability, setup, cost, and what each is best for:

| Property                          | Temporary                                          | Persistent                                                                                                                                                  |
| --------------------------------- | -------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Where results are stored          | A Coralogix-managed bucket with no external access | Your archive bucket                                                                                                                                         |
| How results are referenced        | `system/engine.resultsets.<query_id>`              | `default/<dataset_name>`                                                                                                                                    |
| Retention                         | 30 days                                            | According to your archive bucket lifecycle policy                                                                                                           |
| Reusable in the platform and APIs | During the retention window                        | As long as data exists in your bucket                                                                                                                       |
| Append results over time          | No                                                 | Yes                                                                                                                                                         |
| Dataset setup required            | No                                                 | Yes; create it in [Dataspace Management](https://coralogix.com/docs/user-guides/data-layer/default-dataspace/user-defined-datasets/#create-a-dataset) first |
| Best for                          | One-off investigations or ad-hoc sharing           | Rolling aggregations, long-term analysis, downstream queries, custom timestamps                                                                             |
| Pricing model                     | Free of charge                                     | [Ingested results incur quota](https://coralogix.com/docs/user-guides/account-management/payment-and-billing/data-usage/#user-defined-datasets)             |

## Create a Background Query

### Step 1. Open Background Queries

In the Coralogix sidebar, go to **Data Flow → Background Queries v2**, then select **+ New background query**.

### Step 2. Add details

- **Name**: Required. Must start with a letter and contain only letters, numbers, underscores, or dots. Dots cannot appear consecutively or at the end. 2–255 characters.
- **Description**: Optional. Up to 2,048 characters. Use it to record what the dataset contains and its intended purpose; humans or AI agents may rely on this description to understand what the result set holds.
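
The naming rules above can be checked on the client side before you submit. The following is a minimal sketch, not the platform's own validation: the function name `is_valid_query_name` is hypothetical, and it assumes that "letters" means ASCII letters, which the rules above do not specify.

```python
import re

# Assumption: "letters" means ASCII A-Z / a-z; the platform may accept more.
# A dot is allowed only when it is immediately followed by a letter, digit, or
# underscore, which rules out consecutive dots and a trailing dot.
_NAME_PATTERN = re.compile(r"[A-Za-z](?:[A-Za-z0-9_]|\.(?=[A-Za-z0-9_]))*")


def is_valid_query_name(name: str) -> bool:
    """Return True if `name` satisfies the documented naming rules."""
    if not 2 <= len(name) <= 255:
        return False
    return _NAME_PATTERN.fullmatch(name) is not None
```

For example, `sales_by_hour` and `errors.by.service` pass this check, while `1errors`, `errors..daily`, and `errors.` do not.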

### Step 3. Define the query

1. Select **DataPrime** or **Lucene**.
1. Set the time range with the time range picker. The default is the last hour; the minimum range is 1 hour and the maximum is 90 days.
1. Enter your [DataPrime](https://coralogix.com/docs/dataprime/introduction/welcome-to-the-dataprime-reference/index.md) or Lucene query expression. Select **Cheat sheet** for syntax help.

**Note**

Lucene queries always run against logs and produce time-partitionable results. The dataset's entity type is `logs` and the timestamp field defaults to `$m.timestamp`. Partitioning follows the destination dataset's setting — when writing to an empty dataset or with **Overwrite**, you can select unpartitioned. To query spans or other entity types, use DataPrime.

### Step 4. Select a destination

#### Temporary

1. In **Destination**, keep **Save results** set to **Temporary** (the default).
1. Select **Submit**.

After submission, a success toast confirms the query was received and a new row appears in the list with status **Pending**, then **Running**, then **Completed** (or **Failed**).

#### Persistent

1. In **Destination**, set **Save results** to **Persistent**.

1. Configure the four destination fields:

   | Field                                             | What it does                                                                                                                                                                                                                                                                                                                                                                                                            |
   | ------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
   | **Dataspace**                                     | Locked to **Default**. Background Queries v2 writes only to the default dataspace.                                                                                                                                                                                                                                                                                                                                      |
   | **Dataset**                                       | Select a user-defined dataset. Datasets must be created in advance — if the one you need doesn't exist, create it in [Dataspace Management](https://coralogix.com/docs/user-guides/data-layer/default-dataspace/user-defined-datasets/#create-a-dataset) first. Read-only datasets appear greyed out at the bottom of the list with a **No permissions** badge; hover the badge to see which permissions are missing.   |
   | **Action**                                        | **Append** — add new rows to the current dataset version (good for long-term aggregations). Appending the same raw data more than once creates duplicate records — there's no automatic deduplication by log ID. **Overwrite** — start a new dataset version while keeping older versions. New versions accept any entity type and partitioning scheme, similar to empty datasets.                                      |
   | **Timestamp field used for dataset partitioning** | Required when appending to a partitioned dataset. The dropdown lists `$m.timestamp`, `$m.ingressTimestamp` (when the result entity type is `logs`), and any timestamp fields detected in your query results. With **Overwrite** or an empty dataset, you can also pick **None** to leave the destination unpartitioned. Defaults to `$m.timestamp` when available; the default re-evaluates as your query text changes. |

1. Select **Submit**. If you chose **Overwrite**, confirm in the dialog that prompts before submission. If you navigate away from the form with unsaved changes, the platform prompts you to discard or keep editing.

**Chain queries to cover larger windows**

Use **Append** to combine multiple narrower-window queries into the same dataset. If a single 24-hour scan would exceed the per-query limit, run four 6-hour queries that each append to the same destination — the rows accumulate as if you'd queried the whole window at once.
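
The window arithmetic behind chaining can be sketched as follows. This is an illustrative helper, not part of any Coralogix SDK: `split_time_range` is a hypothetical name, and the sketch assumes half-open windows (each window ends exactly where the next begins) so that no row is read twice. Whether the platform treats range boundaries as inclusive or exclusive is not stated here, so verify this against your own results, since Append does not deduplicate.

```python
from datetime import datetime, timedelta, timezone


def split_time_range(start: datetime, end: datetime, step: timedelta):
    """Split [start, end) into consecutive, non-overlapping windows of at most `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    if end <= start:
        raise ValueError("end must be after start")
    windows = []
    cursor = start
    while cursor < end:
        # The final window is shortened if the range is not a multiple of `step`.
        window_end = min(cursor + step, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


# Example: one 24-hour window split into four 6-hour windows.
day_start = datetime(2026, 1, 15, tzinfo=timezone.utc)
windows = split_time_range(day_start, day_start + timedelta(days=1), timedelta(hours=6))
for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Each resulting window would be submitted as its own query with **Append** to the same destination. If the range is not a multiple of the step, the last window is shorter, so check that it still meets the 1-hour minimum range.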

### Step 5. After completion

Both Temporary and Persistent queries support **View results**, **Download TSV**, **View query** (read-only), and **Clone query**. Two differences to know:

- **Temporary executions**: **View results in Explore** opens a new tab with a DataPrime query filtering this execution's results. After 30 days, the results expire — the row stays in the list, but **View results in Explore** and **Download TSV** become unavailable.
- **Persistent executions**: also offer **View dataset in Explore**, which omits the per-execution filter and opens the full dataset across executions.

### Query the saved summary dataset

From [Explore](https://coralogix.com/docs/user-guides/data_exploration/index.md), [Custom Dashboards](https://coralogix.com/docs/user-guides/custom-dashboards/introduction/index.md), or anywhere [DataPrime](https://coralogix.com/docs/dataprime/introduction/welcome-to-the-dataprime-reference/index.md) runs, query a dataset with:

```text
source <dataspace>/<dataset>
```

For example:

```text
source default/sales_by_hour
```

Datasets currently operate on archived data. To combine a dataset with a recent window, use [`join`](https://coralogix.com/docs/dataprime/language-reference/commands-reference/join/index.md).

## Manage Background Queries

The **Background Queries** grid shows all submitted queries. Use the search bar to filter by query name, description, or submitter. Select **Refresh** to reload, and use the time range picker to browse queries within a window. Columns can be reordered by drag-and-drop; the order persists per browser. Sorting beyond submission time is not supported yet.

### Columns

| Column                  | What it shows                                                                                                                                           |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Submission time**     | Date and time the query was submitted.                                                                                                                  |
| **Status**              | Current lifecycle state — see [Status](#status) below.                                                                                                  |
| **Query name**          | User-provided name; hover the cell for a tooltip with the full name, the first 500 characters of the description, and the query ID.                     |
| **Submitted by**        | Avatar and email of the user who submitted the query.                                                                                                   |
| **Action**              | `Append` or `Overwrite`. Always populated from the query definition; Temporary executions display `Append` by default.                                  |
| **Partitioning**        | `Partitioned` or `Unpartitioned`.                                                                                                                       |
| **Entity type**         | `logs`, `spans`, `jsonData`, or `empty` (for a dataset that hasn't been written to yet).                                                                |
| **Destination dataset** | Cell rendering depends on execution state — see [Destination dataset rendering](#destination-dataset-rendering) below.                                  |
| **Results**             | Row count for Completed queries (e.g., `256`, `1.34K`, `10K`).                                                                                          |
| **Data ingested**       | Raw uncompressed size for Completed queries (e.g., `236 kB`, `2.91 GB`). This is what counts toward your daily quota — not the compressed size on disk. |
| **Execution duration**  | Runtime for Completed and Failed queries (e.g., `15s`, `1m 50s`).                                                                                       |

### Destination dataset rendering

The Destination dataset cell shows different content depending on the execution's save mode and current state:

| Execution state                                       | Cell shows                                                           | Notes                                                                              |
| ----------------------------------------------------- | -------------------------------------------------------------------- | ---------------------------------------------------------------------------------- |
| **Persistent** — Completed or Running                 | `default/<dataset_name>` followed by a version badge (`v1`, `v2`, …) | New datasets show `empty` instead of a version.                                    |
| **Temporary**                                         | **Temporary dataset**                                                | Hover the cell to copy the underlying `system/engine.resultsets.<query_id>` path.  |
| **Legacy** (BGQ v1 or earlier API calls)              | **Ephemeral results**                                                | No badge, no copyable path.                                                        |
| **Failed** or **Cancelled** with a chosen destination | Dataset name only — no version suffix                                | —                                                                                  |
| **Deleted destination**                               | Last known dataset name, dimmed                                      | Hover shows a **Dataset deleted** tooltip. Result actions on the row are disabled. |
| **Pending** or no chosen destination                  | `None`                                                               | —                                                                                  |

### Status

| Status        | What it means                                                                                                                                                                                                                                   |
| ------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Pending**   | Queued and waiting to start. The info icon hover reads *"Query will be executed in the background. Refresh the table or return to this screen later for an updated status."* Result columns are empty.                                          |
| **Running**   | Execution in progress. The info icon hover reads *"Query is being executed in the background. Refresh the table or return to this screen later for an updated status."* **Cancel** is available from the row actions. Result columns are empty. |
| **Completed** | Query finished and results are available.                                                                                                                                                                                                       |
| **Failed**    | Query stopped before completion; results are not available. Hover the status info icon for the failure message, query ID, and a **Learn more** link.                                                                                            |
| **Cancelled** | Manually cancelled while Pending or Running; partial data is discarded.                                                                                                                                                                         |

When a **Temporary** execution passes its 30-day window, the row stays at **Completed** but greys out, and hovering the Status cell shows a **Query expired** tooltip. **View results in Explore** and **Download TSV** become unavailable on that row.
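
For a client that tracks executions, the lifecycle described above can be modeled as a small transition table. This is a sketch based only on the transitions this page states; the names `ALLOWED_TRANSITIONS`, `is_terminal`, and `can_cancel` are hypothetical, and other transitions (for example, Pending directly to Failed) may exist but are not documented here.

```python
# Transitions taken from the status descriptions above. Undocumented
# transitions are deliberately left out of this sketch.
ALLOWED_TRANSITIONS = {
    "Pending": {"Running", "Cancelled"},
    "Running": {"Completed", "Failed", "Cancelled"},
    "Completed": set(),
    "Failed": set(),
    "Cancelled": set(),
}


def is_terminal(status: str) -> bool:
    """A status is terminal when no further transition is listed for it."""
    return not ALLOWED_TRANSITIONS[status]


def can_cancel(status: str) -> bool:
    """Cancellation is described only for Pending and Running executions."""
    return "Cancelled" in ALLOWED_TRANSITIONS[status]
```

A polling loop could stop refreshing once `is_terminal` returns `True` for the execution's status.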

### Legacy executions

Submissions made from [Background Queries v1](https://coralogix.com/docs/user-guides/dataengine/background_queries_v1/index.md) — or from the public API before the destination fields were added — appear in the grid with **Ephemeral results** in the Destination dataset cell. They behave differently from v2 executions:

- Kept for 30 days, then dropped.
- **Download TSV** stays active on the row; **View results in Explore** is greyed out because the results were never written to a dataset.
- Cannot be referenced from Explore, Custom Dashboards, the API, or anywhere DataPrime runs.

To turn a legacy result set into something reusable, re-submit the same query in v2 with **Save results** set to **Persistent** and pick a destination dataset.

### Row actions

Hover a row to reveal four action icons, or open the read-only **View query** drawer (clicking the icon *or* clicking the row both open it) to get the same actions on a single screen. In the drawer, the current status appears as a badge next to the title; Pending and Running drawers add a **Refresh** button that re-fetches the execution and updates the badge in place, plus a **Cancel query** button.

| Action           | When it's active                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| ---------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **View results** | In the grid, this is one button with a submenu (▾); in the drawer, the two options surface as separate buttons. **View results in Explore** — Completed Temporary (within 30 days) and Completed Persistent. Disabled for Pending, Running, Failed, Cancelled, expired Temporary, deleted-destination Persistent, and legacy executions. **View dataset in Explore** — Completed Persistent only. Opens the full dataset across all executions, without the per-execution filter. Disabled when the destination dataset has been deleted. |
| **Download TSV** | Completed Temporary (within 30 days), Completed Persistent, and legacy executions (the TSV is still produced). Disabled for Pending, Running, Failed, Cancelled, expired Temporary, and deleted-destination Persistent.                                                                                                                                                                                                                                                                                                                   |
| **View query**   | All states. Opens the submission form as read-only at the dataset version that was executed (not the current latest version).                                                                                                                                                                                                                                                                                                                                                                                                             |
| **Clone query**  | All states. Pre-fills a new submission form with this query's name, text, time range, and destination.                                                                                                                                                                                                                                                                                                                                                                                                                                    |
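
The availability rules in the table above can be condensed into a single decision function. This is a sketch for illustration only: the `Execution` fields and the `available_actions` name are hypothetical, not platform identifiers, and the sketch assumes that a legacy execution must be Completed for **Download TSV** to be active.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Execution:
    status: str        # "Pending", "Running", "Completed", "Failed", or "Cancelled"
    destination: str   # "temporary", "persistent", or "legacy"
    expired: bool = False          # Temporary results past the 30-day window
    dataset_deleted: bool = False  # Persistent destination dataset was deleted


def available_actions(execution: Execution) -> set:
    """Return the row actions that are active for an execution."""
    # View query and Clone query are active in all states.
    actions = {"View query", "Clone query"}
    if execution.status != "Completed":
        return actions
    if execution.destination == "temporary" and not execution.expired:
        actions |= {"View results in Explore", "Download TSV"}
    elif execution.destination == "persistent" and not execution.dataset_deleted:
        actions |= {"View results in Explore", "View dataset in Explore", "Download TSV"}
    elif execution.destination == "legacy":
        # Legacy results were never written to a dataset, so only the TSV is offered.
        actions.add("Download TSV")
    return actions
```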

## Persistent dataset rules

The following rules apply when you save results to a user-defined summary dataset. See [Create user-defined datasets](https://coralogix.com/docs/user-guides/data-layer/default-dataspace/user-defined-datasets/#create-a-dataset) for dataset creation steps.

### Entity type

Query results have an entity type (for example, `logs`, `spans`, or `jsonData`) corresponding to a data pillar. A user-defined dataset's entity type is set on first use: the first Background Query (or TCO policy) that writes into it locks the dataset to that entity type. When appending, results must match the dataset's existing entity type. Switching to **Overwrite** bypasses entity-type matching and resets the dataset to the new type.

For the full list of entity types, their schema guarantees, and how each is produced, see [Entity types](https://coralogix.com/docs/user-guides/data-layer/entity-types/index.md).

Examples of result entity types:

| Query                                | Result entity type |
| ------------------------------------ | ------------------ |
| Any Lucene query                     | `logs`             |
| `source logs \| limit 10`            | `logs`             |
| `source spans \| limit 10`           | `spans`            |
| `source logs \| countby $m.severity` | `jsonData`         |
### Partitioning and timestamp field

Datasets are either **Partitioned** or **Unpartitioned**. Partitioning physically organizes storage into time-based buckets by day (`dt`) and hour (`hr`), following the layout `<dataspace>/<dataset>/v<version>/dt=<date>/hr=<hour>/` (for example, `default/sales_by_hour/v1/dt=2026-01-15/hr=14/`). Coralogix scans only the relevant partitions when querying with a time filter, which improves performance as the dataset grows.

When appending to a partitioned dataset, results must include a valid timestamp field. Coralogix uses it to assign each row to the correct `dt`/`hr` partition. Append is blocked when results don't contain a timestamp (for example, aggregations that drop the timestamp).
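
To make the layout concrete, the following sketch derives the partition prefix for a single row from its timestamp. It illustrates the path pattern above and is not how Coralogix writes data internally: `partition_prefix` is a hypothetical helper, and the sketch assumes partitions are keyed by UTC time with a zero-padded hour, neither of which is confirmed on this page.

```python
from datetime import datetime, timedelta, timezone


def partition_prefix(dataspace: str, dataset: str, version: int, timestamp: datetime) -> str:
    """Build the dt=/hr= partition prefix for a row with the given timestamp."""
    if timestamp.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Assumption: partitions follow UTC and the hour is zero-padded to two digits.
    ts = timestamp.astimezone(timezone.utc)
    return f"{dataspace}/{dataset}/v{version}/dt={ts:%Y-%m-%d}/hr={ts:%H}/"


# A row stamped 16:30 at UTC+2 lands in the 14:00 UTC partition.
row_time = datetime(2026, 1, 15, 16, 30, tzinfo=timezone(timedelta(hours=2)))
print(partition_prefix("default", "sales_by_hour", 1, row_time))
```

A row without a timestamp has no value to derive `dt` and `hr` from, which is why Append to a partitioned dataset is blocked in that case.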

### Validation errors

The submission form blocks **Submit** until both validations pass.

#### Missing timestamp for a partitioned dataset

When appending to a partitioned dataset and your results don't contain a timestamp field, the form shows two errors simultaneously:

- "Missing a timestamp field which is needed to Append to a partitioned dataset" below the query
- "A valid timestamp field is required to append to this dataset" next to the timestamp dropdown

**Resolve by** changing the query to preserve a timestamp, switching to an unpartitioned target, or selecting **Overwrite**.

#### Entity type mismatch

When appending and your results produce a different entity type than the dataset's existing type, the form shows the same message below the query and below the dataset dropdown:

```text
Appended results of entity type <RESULTS_ENTITY_TYPE> do not match the destination dataset's entity type <DATASET_ENTITY_TYPE>.
```

**Resolve by** adjusting the query, selecting a different dataset, or switching to **Overwrite** (which bypasses entity-type matching and starts a new dataset version with the new type).
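
The two validations can be expressed as a pre-submission check. This is a sketch of the rules as described on this page, not the form's actual implementation: `append_validation_errors` and its parameters are hypothetical names, and the sketch assumes that a dataset with entity type `empty` accepts any entity type and partitioning scheme.

```python
from typing import List


def append_validation_errors(
    action: str,
    result_entity_type: str,
    result_has_timestamp: bool,
    dataset_entity_type: str,
    dataset_partitioned: bool,
) -> List[str]:
    """Return the validation messages that would block Submit, if any."""
    errors: List[str] = []
    # Overwrite starts a new dataset version, so neither check applies.
    # Assumption: an empty dataset behaves the same way.
    if action != "Append" or dataset_entity_type == "empty":
        return errors
    if dataset_partitioned and not result_has_timestamp:
        errors.append(
            "Missing a timestamp field which is needed to Append to a partitioned dataset"
        )
    if result_entity_type != dataset_entity_type:
        errors.append(
            f"Appended results of entity type {result_entity_type} do not match "
            f"the destination dataset's entity type {dataset_entity_type}."
        )
    return errors
```

For example, appending an aggregation without a timestamp (`jsonData`) to a partitioned `logs` dataset returns both messages, while the same call with `action="Overwrite"` returns an empty list.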

## Limitations

- Maximum result size per query is 1 million rows.
- Query time range is 1 hour minimum, 90 days maximum.
- Temporary results expire after 30 days. After expiry, **View results in Explore** and **Download TSV** are no longer available, but the row stays in the list.
- Appending to a partitioned dataset requires query results to include a valid timestamp field. Aggregation queries that drop timestamps cannot be appended to partitioned datasets.
- The **Persistent** option is disabled when the account daily quota is reached. Temporary saves continue to work; Persistent saves resume when the quota resets at midnight UTC, or when a new daily quota is set.
- Each team can have a maximum of 15 user-defined datasets by default.

For complete query-level limits (scan size, byte limits, and so on), see [Background Query limitations](https://coralogix.com/docs/dataprime/API/limitations/#background-query-limitations).

## API

Coralogix offers [gRPC-style](https://coralogix.com/docs/dataprime/API/direct-archive-query-grpc/index.md) and [HTTP-style](https://coralogix.com/docs/dataprime/API/direct-archive-query-http/index.md) Background Queries APIs.

The destination fields added for Persistent results are optional. Existing API clients that don't set them keep the previous behavior (results are saved as Temporary) without code changes. To target a summary dataset, set the destination dataspace, dataset, action (Append or Overwrite), and (when appending to a partitioned dataset) the timestamp field.

## Permissions

To submit and run Background Queries, and to manage the user-defined datasets that summary results are written to, you need the following permissions:

| Preset       | Permission                       | Where it applies                                                                                                                    | Description                                                          |
| ------------ | -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------- |
| DataQuerying | `LEGACY-ARCHIVE-QUERIES:EXECUTE` | Account-wide                                                                                                                        | Submit and execute Background Queries                                |
|              | `Append Data`                    | Per target dataset                                                                                                                  | Save Persistent results with **Append**                              |
|              | `Overwrite Data`                 | Per target dataset                                                                                                                  | Save Persistent results with **Overwrite**                           |
|              | `Manage Datasets`                | [Dataspace Management](https://coralogix.com/docs/user-guides/data-layer/default-dataspace/user-defined-datasets/#create-a-dataset) | Create the user-defined datasets that summary results are written to |

If you have neither `Append Data` nor `Overwrite Data` on any dataset, the **Persistent** option is disabled.

Learn more about [roles and permissions](https://coralogix.com/docs/user-guides/aaa/access-control/permissions/index.md).