Mastering Null Semantics: Translating SQL Expressions to OpenSearch DSL

Working at Coralogix, a leading full-stack observability platform, I recently faced an interesting challenge.
The team I am part of is building the DataPrime query language and query engine, used to easily query logs and other observability data on the platform, usually in the form of Parquet files on AWS S3. Inside the engine, our DataPrime queries are transformed into query plans with SQL-like expressions, for example in filters. For backwards compatibility, the query engine needs to also be able to query OpenSearch instead of Parquet files. As such, we were met with the need to translate these expressions into OpenSearch query DSL.
While OpenSearch does have some support for SQL, experimenting with it quickly revealed that it’s not particularly thorough. So, in this article I want to explain one specific aspect about the translation I was facing that was particularly interesting: null semantics.
The Basics
In SQL, a (valid) filter expression, e.g. in a WHERE
clause, can have three results: TRUE
, FALSE
or NULL
. Rows where the result is NULL
will, of course, also be filtered out, just like FALSE
, but NULL
will behave very differently from both TRUE
and FALSE
within the expression.
Example:
CREATE TABLE test (
first_name varchar(255),
middle_name varchar(255),
last_name varchar(255)
);
INSERT INTO test (first_name, last_name)
VALUES ('Markus', 'Appel');
SELECT * FROM test WHERE first_name = 'Garry';
// 0 Result
SELECT * FROM test WHERE !(first_name = 'Garry');
// 1 Results
SELECT * FROM test WHERE middle_name = 'Garry';
// 0 Results
SELECT * FROM test WHERE !(midde_name = 'Garry');
// 0 Results !!!
The fourth SELECT
will not return any results because for the one row, middle_name
is NULL
, and the =
operator will return NULL
if either side is NULL
. And !NULL
is also NULL
.
This is called Three-valued logic, specifically the Kleene and Priest logics. The idea is quite simple: If you treat NULL
is undetermined, you will always return NULL
if the result cannot be determined either.
So for example, FALSE && NULL
is FALSE
because it doesn’t matter what NULL
is substituted for, the result will always be FALSE
.
In contrast, FALSE || NULL
will be NULL
, because when substituting FALSE
the result will be FALSE
, when substituting TRUE
the result will be TRUE
.
Meanwhile, the OpenSearch DSL has no concept of expressions at all. It only knows filters, in other words operations that take a set of documents and return those documents that match.
Theory
At first glance, reducing TRUE
, FALSE
, and NULL
to only TRUE
and FALSE
might sound like an impossible task.
However, the solution lies within the context in which an expression is used. Previously, I mentioned that a WHERE
will interpret an expression returning NULL
the same way as FALSE
—by filtering out the row. So WHERE
is actually also binary, it will either filter or not filter the row as well!
Formally, using pseudocode:
filter(expr) = expr == true
= !(expr == false || expr == null)
= !is_false_or_null(expr)
Now we’re getting somewhere. We just have to determine is_false_or_null
for all expressions we want to support. Let’s start with the basics:
is_false_or_null(null) = true
= { "match_all": {}}
is_false_or_null(true) = false
= { "match_none": {}}
is_false_or_null(false) = true
= { "match_all": {}}
is_false_or_null(field) = !matches(field, true) || !exists(field)
= {
"bool": {
"should": [
{
"bool": {
"must_not": {
"matches": {
"field": "true"
}
}
}
},
{
"bool": {
"must_not": {
"exists": "field"
}
}
}
]
}
}
Due to how verbose they are, while being relatively easy to come up with, I will stop writing out the OpenSearch DSL from here on out.1
Next, we should take a look at the boolean operators: OR
, AND
, and NOT
.
My approach there was to derive the logic from the truth tables on the Wikipedia article.

So, for example, take is_false_or_null(expr1 && expr2)
. I’ve highlighted all results for which it will be true:

Coincidentally, that just corresponds to the union of A
being FALSE
or NULL
and B
being FALSE
or NULL
.

So, the result is:
is_false_or_null(expr1 && expr2) = is_false_or_null(expr1) || is_false_or_null(expr2)
In case of OR
, the truth table looks a bit different:

Here, it’s not the union, but the intersection of A
being FALSE
or NULL
and B
being FALSE
or NULL
. So:
is_false_or_null(expr1 || expr2) = is_false_or_null(expr1) && is_false_or_null(expr2)
And last, we need to look at NOT
.

Here, it’s obvious that NOT(A)
is FALSE
or NULL
when A
is TRUE
or NULL
:
is_false_or_null(!expr)= is_true_or_null(expr)
So now we need is_true_or_null
too, but it’s very similar to implement:
is_true_or_null(null) = true
is_true_or_null(true) = true
is_true_or_null(false) = false
is_true_or_null(field) = matches(field, true) || !exists(field)
Even AND
, OR
and NOT
are easy. Looking at the truth tables it’s clear that this time AND
is the logical union, and OR
the logical intersection.
is_true_or_null(expr1 && expr2) = is_true_or_null(expr1) && is_true_or_null(expr2)
is_true_or_null(expr1 || expr2) = is_true_or_null(expr1) || is_true_or_null(expr2)
is_true_or_null(!expr) = is_false_or_null(expr)
Conclusion
With the above theory, the hardest part of the puzzle is solved.
The rest now consists of mere legwork of translating all concepts of the source language one wants to support. Of course, the capabilities of the ElasticSearch DSL will be exhausted rather quickly, as even simple numeric arithmetic doesn’t have any equivalent in it, meaning that rather quickly one will have to “fall back” to translating the expressions to script queries, which will have much worse performance but can cover virtually any use case. Furthermore, to avoid translating everything twice with just slight differences for is_true_or_null
and is_false_or_null
, it of course is best to have one is_value_or_null(value: bool, …)
instead.
To avoid queries getting too large, and especially too deeply nested, we also chose an intermediate representation of the ElasticSearch DSL before turning it into JSON.
This gives us the ability to comfortably perform a list of optimizations beforehand:
- Apply De Morgan’s laws
- Flatten nested bool clauses through associativity
- Invert negated
match_all
andmatch_none
- Remove
match_all
from inside ofbool
must
andmatch_none
from inside ofbool
withshould
- In
bool
, replacemust
withmatch_none
withmatch_none
- In
bool
, replaceshould
withmatch_all
withmatch_all
- Combine script clauses inside of
bool
withmust
andshould
appropriately
1Note: be careful to set minimum_should_match
to 1 when translating OR clauses to bool
with should
.