Conditional aggregations
In addition to basic aggregation, DataPrime offers more advanced techniques such as conditional aggregations. Conditional aggregations allow you to apply functions based on specific conditions within your dataset, making it easier to group and analyze your data flexibly.
What you’ll learn in this guide:
- What is conditional aggregation?
- Primary uses
- Conditional functions as the basis for conditional aggregations
- Supported conditional aggregations
What is conditional aggregation?
Conditional aggregation enables you to apply aggregation functions (e.g., `count`, `sum`, `avg`) selectively, based on specific conditions. Unlike standard aggregation, which processes all data without filtering, conditional aggregation evaluates subsets of data that meet defined criteria. This makes it ideal for complex scenarios requiring finer control over data summarization.
What can I use it for?
Conditional aggregation is essential when you need to:
- Analyze subsets of data without splitting the dataset into multiple queries.
- Simplify operations by aggregating based on custom conditions, such as `count_if`, in one step.
- Ensure data consistency when certain values are missing from the dataset. In datasets with missing or null values, conditional aggregation allows you to handle these gaps by applying default values or specific rules for missing data. This can help prevent inconsistencies when performing analyses, as you can control how missing values are treated.
- Generate compact, easy-to-read outputs that still provide detailed insights.
if conditional function
At the basis of conditional aggregations is the `if` function, the syntax for which is as follows:
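if(<condition>, <value1>, <value2>)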
If the `<condition>` expression evaluates to TRUE, the function returns `<value1>`; otherwise, it returns `<value2>`.
Basic examples
This example counts the number of error responses (status code 500) for each region. The `if` function checks whether the status code is equal to 500 and, if so, counts it as an error.
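The query might look like the following (the field paths `$d.region` and `$d.status_code` are illustrative; use the paths that match your log schema):
Query:
source logs
| groupby $d.region aggregate sum(if($d.status_code == 500, 1, 0)) as error_count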
The query groups the logs by region and uses the `if` function to check the `status_code` for each log entry. If the `status_code` is 500 (indicating an error), the function returns `1`; otherwise, it returns `0`. The `sum()` aggregation then adds up all the `1`s and `0`s, resulting in the total number of error responses per region, which is labeled as `error_count`.
Supported conditional aggregations
Conditional counting with count_if
This technique counts occurrences of specific conditions in the data. It’s ideal for scenarios where you want to track how often each condition is met within each group.
Syntax:
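count_if(<condition>) as <alias>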
Consider a situation where you are analyzing system logs and want to count how many times each severity level (such as `ERROR`, `WARNING`, `INFO`, etc.) appears within each subsystem. Instead of filtering the logs for each severity level individually, you can use `count_if` to count each severity condition in a single query.
Query:
source logs
| groupby $l.subsystemname aggregate
count_if($m.severity == ERROR) as error_count,
count_if($m.severity == WARNING) as warning_count,
count_if($m.severity == DEBUG) as debug_count,
count_if($m.severity == INFO) as info_count,
count_if($m.severity == VERBOSE) as verbose_count
In this query, the `count_if` function applies a condition (e.g., `$m.severity == ERROR`) and counts only the rows that satisfy it. Each `count_if` generates a separate column for its respective condition.
Output:
[
{
"subsystemname": "074157727657",
"debug_count": 47,
"error_count": 321,
"info_count": 1612,
"verbose_count": 22,
"warning_count": 12
},
{
"subsystemname": "aws-cloudtrail",
"debug_count": 0,
"error_count": 82,
"info_count": 11734,
"verbose_count": 0,
"warning_count": 0
},
{
"subsystemname": "prometheus",
"debug_count": 56,
"error_count": 0,
"info_count": 56,
"verbose_count": 0,
"warning_count": 0
}
]
The query's result will give you a table in which each row represents a subsystem and each column contains the count of logs for each severity level.
Now consider the following query, which builds upon the previous one. The user previously aggregated the logs by `subsystemname`, counting the total occurrences of different severity levels (such as `ERROR`, `DEBUG`, etc.) for each subsystem. However, the user may now want to determine whether a subsystem has an unusually high error rate. Specifically, the user wants to flag any subsystem where more than 10% of its log entries are marked as `ERROR`.
Query:
source logs
| groupby $l.subsystemname agg count() as total_count,
count_if($m.severity == ERROR) as error_count,
count_if($m.severity == DEBUG) as debug_count
| create analysis from if(error_count / total_count > 0.1, 'Problem', 'OK')
| filter analysis == 'Problem'
In the query, the `if` function plays a key role in evaluating whether the error rate in a subsystem is high. Specifically, it checks whether the ratio of `error_count` to `total_count` exceeds 0.1 (10%). If this condition is met, the function assigns the value `'Problem'` to the newly created field `analysis`; otherwise, it assigns `'OK'`. This enables the query to flag subsystems with a high error rate, effectively categorizing them as "Problem" or "OK." The subsequent `filter` then leaves a refined dataset that only includes subsystems with a problematic error rate, allowing for more targeted analysis.
Output:
[
{
"subsystemname": "074157727657",
"total_count": 2000,
"error_count": 321,
"debug_count": 47,
"analysis": "Problem"
},
{
"subsystemname": "aws-cloudtrail",
"total_count": 11800,
"error_count": 118,
"debug_count": 0,
"analysis": "Problem"
}
]
The output of this query is a document for each subsystem that includes `total_count`, `error_count`, `debug_count`, and `analysis`. For subsystems with an error rate greater than 10%, the `analysis` field will be labeled `'Problem'`.
Compact summarization with collect
`collect` collects the result of some expression and produces an array. Use it to aggregate multiple values into a compact format, combining related data into a single row for better readability.
Syntax:
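collect(<expression>) as <alias>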
Query:
source logs
| groupby $l.applicationname, $m.severity agg count() as cnt
| groupby applicationname aggregate
collect (`{severity}={cnt}`) as severities_with_counts
- The `collect` function groups related data into an array.
- Each entry in the array combines severity levels with their respective counts, reducing redundancy and enabling a quick overview.
Output:
[
{
"applicationname": "074157727657",
"severities_with_counts": [
"Debug=51",
"Verbose=24",
"Info=1709",
"Warning=31",
"Error=337"
]
},
{
"applicationname": "aws-cloudtrail",
"severities_with_counts": [
"Info=11066",
"Error=84"
]
},
{
"applicationname": "prometheus",
"severities_with_counts": [
"Debug=57",
"Info=57"
]
}
]
Notice that the array will contain only the severities that actually exist in the data. This means that if a particular severity level (like `ERROR` or `WARNING`) does not appear in the logs for a given `applicationname`, it will not appear in the aggregated results for that application, keeping the output compact and relevant.
Sorting results alphabetically with arraySort
Adding `arraySort()` to the previous example ensures consistent ordering of aggregated values, making results easier to interpret.
Syntax:
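<array>.arraySort()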
Query:
source logs
| groupby $l.applicationname, $m.severity agg count() as cnt
| groupby applicationname aggregate
collect (`{severity}={cnt}`).arraySort() as severities_with_counts
The `arraySort()` function sorts the collected array alphabetically, ensuring that severity levels always appear in the same order across rows. Because it returns a new, sorted array rather than modifying the input (see the note below), your data transformations remain predictable and safe.
Note
The `arraySort()` function does not modify the original array. Instead, it creates and returns a new array containing the sorted elements. The original array remains unchanged, which makes for predictability, immutability, and error prevention.
Output:
[
{
"applicationname": "074157727657",
"severities_with_counts": [
"Debug=51",
"Error=360",
"Info=1679",
"Verbose=21",
"Warning=30"
]
},
{
"applicationname": "aws-cloudtrail",
"severities_with_counts": [
"Error=86",
"Info=11944"
]
},
{
"applicationname": "prometheus",
"severities_with_counts": [
"Debug=58",
"Info=58"
]
}
]
Notice that the severity levels are consistently ordered alphabetically in the output of the `arraySort()` function. This ensures that, regardless of the order in which data is processed, the results will always display severity levels in the same sequence. For instance, for `applicationname = 074157727657`, the severity levels are listed as `Debug`, `Error`, `Info`, `Verbose`, and `Warning`, in that specific order.
By using `arraySort()`, we ensure uniformity in the output, making it easier to compare severity counts across applications, as the data is consistently arranged. For example, the severity levels `Error` and `Info` in `aws-cloudtrail` are sorted alphabetically, and no severity types that are not present in the data (like `Verbose`) are included.
Logical sorting using severity_value
When performing aggregations based on categories such as severity levels, the default order might not always reflect their inherent logical hierarchy. For example, you may want to sort severities from least to most critical, where a severity level like `Verbose` should appear before `Debug`, and `Error` should come after `Warning`. To achieve this logical sorting, we can assign each severity level a numeric value that represents its priority or urgency.
This is done by using `create` to add a new field, such as `severity_value`, in which each severity level is mapped to a corresponding numeric value. These numeric values allow you to sort the data logically, from least to most severe. Once this numeric field is created, you can use sorting functions to display the severity levels in the desired order.
This technique assigns numeric values to conditions for logical ordering (e.g., sorting severities from least to most critical). It does so by creating a new field called `severity_value` with the `case_equals` function, which maps each severity level to a corresponding numeric value. Find out more about `case_*` constructs here.
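The general shape of `case_equals`, as used in the query below, is:
case_equals { <expression>, <match_1> -> <result_1>, ..., <match_n> -> <result_n> }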
Query:
source logs
| groupby $l.applicationname, $m.severity agg count() as cnt
| create severity_value from case_equals { severity, VERBOSE -> 0, DEBUG -> 1, INFO -> 2, WARNING -> 3, ERROR -> 4, CRITICAL -> 5 }
| groupby applicationname agg
collect (`{severity_value}:{severity}={cnt}`).arraySort() as severities_with_counts
- `case_equals` assigns numerical values to severity levels.
- The combination of `severity_value` and `arraySort()` ensures that results are displayed in logical severity order, regardless of their natural order in the dataset. The numeric value added at the start of each string drives this sorted display, even if the severity levels in the original dataset aren't in the desired order.
Output:
[
{
"applicationname": "074157727657",
"severities_with_counts": [
"0:Verbose=23",
"1:Debug=55",
"2:Info=1692",
"3:Warning=26",
"4:Error=401"
]
},
{
"applicationname": "aws-cloudtrail",
"severities_with_counts": [
"2:Info=11933",
"4:Error=70"
]
},
{
"applicationname": "prometheus",
"severities_with_counts": [
"1:Debug=58",
"2:Info=59"
]
}
]
Notice that the `case_equals` function assigns a numeric value to each severity level, enabling logical sorting. By adding this numeric value at the start of the severity string, we ensure that the data is ordered in a way that reflects the intended hierarchy, from least severe to most critical. Without this numeric prefix, the severity levels would be sorted alphabetically, which doesn't align with their actual priority.
Also, the use of `arraySort()` plays a key role in maintaining consistency across the results. It ensures that regardless of how the data is processed, the severity levels will always appear in the same logical order. This approach guarantees that critical severity levels like `Error` and `Warning` are correctly placed after less severe levels like `Verbose` and `Debug`, providing a more meaningful and easily interpretable output.
Conclusion
These advanced aggregation techniques offer flexible ways to analyze and summarize log data. By understanding their strengths and limitations, you can select the best approach for your needs:
- `count_if` simplifies conditional counting but can lead to verbose queries.
- `collect` condenses results into a compact format, suitable for quick summaries.
- Sorting methods (`arraySort`, `severity_value`) add predictability and logical ordering to aggregated data.