Skip to content

approx_count_distinct - Count unique values approximately for performance

The approx_count_distinct function will count a given document if:

  • The document contains a unique, non-null value for a specified field.

This function is an approximation of the result from distinct_count and is designed for high performance scenarios where the exact figure isn't important.

NOTE: approx_count_distinct is an aggregation function, so should be used in conjunction with a grouping keyword, like groupby.

Syntax

approx_count_distinct(expression: any)

Arguments

Name Type Required Description
expression any true The non-null value to be counted distinctly

Example - Checking for high users

In this example, we wish to understand how many active users there have been in a given timeframe. We don't need the exact value, since we're primarily interested if it's over a certain figure. We can provide the username field to produce a count of the unique users.

groupby $l.applicationname aggregate approx_count_distinct($d.username) as active_users

We can then flag whether this is higher than our threshold of users:

create is_high_user_count from active_users > 10000

The field is_high_user_count is now true is the approximate total number of users for a given applicationname is over 10000, and false otherwise.