Hands-on Exercises: Mapping Exceptions with Elasticsearch

Mapping is an essential foundation of an index and can fairly be considered the heart of Elasticsearch, so a well-managed mapping matters. But as with many important things, mappings can sometimes go wrong. Let’s take a look at various issues that can arise with mappings and how to deal with them.

Before delving into the possible challenges with mappings, let’s quickly recap some key points about mappings. A mapping essentially consists of two parts:

  1. The Process: A process of defining how your JSON documents will be stored in an index
  2. The Result: The actual metadata structure resulting from the definition process

The Process

If we first consider the process aspect of the mapping definition, there are generally two ways this can happen:

  • An explicit mapping process, where you define which fields you want to store along with their types and any additional parameters.
  • A dynamic mapping process, where Elasticsearch automatically determines the appropriate datatype for new fields and updates the mapping accordingly (a quick illustration follows below).
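
For instance, here is a minimal sketch of the dynamic path (the index name demo-dynamic and its fields are made up for illustration): indexing a document into an index that doesn’t exist yet lets Elasticsearch create both the index and the field mappings on the fly, guessing the datatypes from the values it sees.

curl --request POST 'https://localhost:9200/demo-dynamic/_doc?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{"user": "jane", "visits": 3}'

# check what Elasticsearch decided for the field types
curl --request GET 'https://localhost:9200/demo-dynamic/_mapping?pretty'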

The Result

The result of the mapping process defines what we can “index” via individual fields and their datatypes, and also how the indexing happens via related parameters.

Consider this mapping example:

{
    "mappings": {
        "properties": {
            "timestamp": { "type": "date" },
            "service": { "type": "keyword" },
            "host_ip": { "type": "ip" },
            "port": { "type": "integer" },
            "message": { "type": "text" }
        }
    }
}

It’s a very simple mapping example for a basic logs collection microservice. The individual logs consist of the following fields and their associated datatypes:

  • Timestamp of the log mapped as a date
  • Service name which created the log mapped as a keyword
  • IP of the host on which the log was produced mapped as an ip datatype
  • Port number mapped as an integer
  • The actual log Message mapped as text to enable full-text searching
  • More… Since we have not disabled the default dynamic mapping process, we’ll also be able to introduce new fields arbitrarily and see them added to the mapping automatically.

The Challenges

So what could go wrong :)?

There are generally two potential issues that many will end up facing with Mappings:

  • If we create an explicit mapping and fields don’t match, we’ll get an exception if the mismatch falls beyond a certain “safety zone”. We’ll explain this in more detail later.
  • If we keep the default dynamic mapping and then introduce many more fields, we’re in for a “mapping explosion” which can take our entire cluster down.

Let’s continue with some interesting hands-on examples where we’ll simulate the issues and attempt to resolve them.

Hands-on Exercises

Field datatypes – mapper_parsing_exception

Let’s get back to the “safety zone” we mentioned before when there’s a mapping mismatch.

We’ll create our index and see it in action. We are using the exact same mapping that we saw earlier:

curl --request PUT 'https://localhost:9200/microservice-logs' \
--header 'Content-Type: application/json' \
--data-raw '{
   "mappings": {
       "properties": {
           "timestamp": { "type": "date"  },
           "service": { "type": "keyword" },
           "host_ip": { "type": "ip" },
           "port": { "type": "integer" },
           "message": { "type": "text" }
       }
   }
}'

A well-defined JSON log for our mapping would look something like this:

{"timestamp": "2020-04-11T12:34:56.789Z", "service": "ABC", "host_ip": "10.0.2.15", "port": 12345, "message": "Started!" }

But what if another service tries to log its port as a string and not a numeric value? (notice the double quotation marks). Let’s give that a try:

curl --request POST 'https://localhost:9200/microservice-logs/_doc?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{"timestamp": "2020-04-11T12:34:56.789Z", "service": "XYZ", "host_ip": "10.0.2.15", "port": "15000", "message": "Hello!" }'
>>>
{
...
 "result" : "created",
...
}

Great! It worked without throwing an exception, because Elasticsearch coerced the numeric string “15000” into the integer type. This is the “safety zone” I mentioned earlier.
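
If you don’t want that leniency, the numeric coerce mapping parameter (enabled by default) can be switched off per field. A sketch with a hypothetical stricter index, just to illustrate the parameter:

curl --request PUT 'https://localhost:9200/microservice-logs-strict' \
--header 'Content-Type: application/json' \
--data-raw '{
   "mappings": {
       "properties": {
           "port": { "type": "integer", "coerce": false }
       }
   }
}'
# indexing {"port": "15000"} into this index would now be rejected with a mapper_parsing_exception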

But what if that service logged a string that has no relation to numeric values at all into the Port field, which we earlier defined as an Integer? Let’s see what happens:

curl --request POST 'https://localhost:9200/microservice-logs/_doc?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{"timestamp": "2020-04-11T12:34:56.789Z", "service": "XYZ", "host_ip": "10.0.2.15", "port": "NONE", "message": "I am not well!" }'
>>>
{
 "error" : {
   "root_cause" : [
     {
       "type" : "mapper_parsing_exception",
       "reason" : "failed to parse field [port] of type [integer] in document with id 'J5Q2anEBPDqTc3yOdTqj'. Preview of field's value: 'NONE'"
     }
   ],
   "type" : "mapper_parsing_exception",
   "reason" : "failed to parse field [port] of type [integer] in document with id 'J5Q2anEBPDqTc3yOdTqj'. Preview of field's value: 'NONE'",
   "caused_by" : {
     "type" : "number_format_exception",
     "reason" : "For input string: "NONE""
   }
 },
 "status" : 400
}

We’re now entering the world of Elasticsearch mapping exceptions! We received a 400 status code along with a mapper_parsing_exception informing us about our datatype issue: specifically, that Elasticsearch failed to parse the provided value of “NONE” into the integer type.

So how do we solve this kind of issue? Unfortunately, there isn’t a one-size-fits-all solution. In this specific case we can “partially” resolve the issue by defining the ignore_malformed mapping parameter.

Keep in mind that this parameter is non-dynamic, so you either need to set it when creating your index or you need to: close the index → change the setting value → reopen the index. Something like this:

curl --request POST 'https://localhost:9200/microservice-logs/_close'
 
curl --location --request PUT 'https://localhost:9200/microservice-logs/_settings' \
--header 'Content-Type: application/json' \
--data-raw '{
   "index.mapping.ignore_malformed": true
}'
 
curl --request POST 'https://localhost:9200/microservice-logs/_open'
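
If you are building the index from scratch instead, the same setting can simply be part of the creation request; a sketch with a hypothetical index name:

curl --request PUT 'https://localhost:9200/microservice-logs-v2' \
--header 'Content-Type: application/json' \
--data-raw '{
   "settings": { "index.mapping.ignore_malformed": true },
   "mappings": {
       "properties": {
           "timestamp": { "type": "date" },
           "port": { "type": "integer" }
       }
   }
}'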

Now let’s try to index the same document:

curl --request POST 'https://localhost:9200/microservice-logs/_doc?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{"timestamp": "2020-04-11T12:34:56.789Z", "service": "XYZ", "host_ip": "10.0.2.15", "port": "NONE", "message": "I am not well!" }'

Checking the document by its ID will show us that the port field was omitted from indexing. We can see this in the _ignored section:

curl 'https://localhost:9200/microservice-logs/_doc/KZRKanEBPDqTc3yOdTpx?pretty'
{
...
 "_ignored" : [
   "port"
 ],
...
}
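
Since _ignored is a regular, queryable metadata field, you can also sweep the whole index for every document that had at least one field dropped; a small sketch:

curl --request GET 'https://localhost:9200/microservice-logs/_search?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{
   "query": { "exists": { "field": "_ignored" } }
}'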

The reason this is only a “partial” solution is that this setting has its limits, and they are quite considerable. Let’s reveal one in the next example.

A developer might decide that when a microservice receives some API request it should log the received JSON payload in the message field. We already mapped the message field as text and we still have the ignore_malformed parameter set. So what would happen? Let’s see:

curl --request POST 'https://localhost:9200/microservice-logs/_doc?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{"timestamp": "2020-04-11T12:34:56.789Z", "service": "ABC", "host_ip": "10.0.2.15", "port": 12345, "message": {"data": {"received":"here"}}}'
>>>
{
...
       "type" : "mapper_parsing_exception",
       "reason" : "failed to parse field [message] of type [text] in document with id 'LJRbanEBPDqTc3yOjTog'. Preview of field's value: '{data={received=here}}'"
...
}

We see our old friend, the mapper_parsing_exception! This is because ignore_malformed can’t handle JSON objects on input, which is a significant limitation to be aware of.

Now, when speaking of JSON objects, be aware that all the mapping ideas remain valid for their nested parts as well. Continuing our scenario: after losing some logs to mapping exceptions, we decide it’s time to introduce a new payload field of type object where we can store the JSON at will.

Remember we have dynamic mapping in place so you can index it without first creating its mapping:

curl --request POST 'https://localhost:9200/microservice-logs/_doc?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{"timestamp": "2020-04-11T12:34:56.789Z", "service": "ABC", "host_ip": "10.0.2.15", "port": 12345, "message": "Received...", "payload": {"data": {"received":"here"}}}'
>>>
{
...
 "result" : "created",
...
}

All good. Now we can check the mapping and focus on the payload field.

curl --request GET 'https://localhost:9200/microservice-logs/_mapping?pretty'
>>>
{
...
       "payload" : {
         "properties" : {
           "data" : {
             "properties" : {
               "received" : {
                 "type" : "text",
                 "fields" : {
                   "keyword" : {
                     "type" : "keyword",
                     "ignore_above" : 256
                   }
...
}

It was mapped as an object with (sub)properties defining the nested fields. So apparently the dynamic mapping works! But there is a trap. In a world of many producers and consumers, payloads (or any JSON objects, really) can consist of almost anything. So you can guess what will happen when a different JSON payload arrives that also contains a payload.data.received field, but with a different type of data:

curl --request POST 'https://localhost:9200/microservice-logs/_doc?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{"timestamp": "2020-04-11T12:34:56.789Z", "service": "ABC", "host_ip": "10.0.2.15", "port": 12345, "message": "Received...", "payload": {"data": {"received": {"even": "more"}}}}'

…again we get the mapper_parsing_exception!

So what else can we do?

  • Engineers on the team need to be made aware of these mapping mechanics. You can also establish shared guidelines for the log fields.
  • Secondly, you may consider what’s called the Dead Letter Queue pattern, which stores the failed documents in a separate queue. This either needs to be handled at the application level or by employing the Logstash DLQ, which allows us to still process the failed documents (a minimal sketch follows below).
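
For the Logstash route, a minimal sketch (the paths and the target index are illustrative; the DLQ captures events that the elasticsearch output failed to index, e.g. due to mapping errors):

# logstash.yml: enable the dead letter queue
dead_letter_queue.enable: true

# a follow-up pipeline that re-reads the failed events
input {
  dead_letter_queue {
    path => "/usr/share/logstash/data/dead_letter_queue"
    pipeline_id => "main"
  }
}
output {
  # route them to a separate index (or fix/transform them first)
  elasticsearch { index => "failed-microservice-logs" }
}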

Limits – illegal_argument_exception

Now for the second area of caution in relation to mappings: limits. Even from the super-simple examples with payloads, you can see that the number of nested fields can start accumulating pretty quickly. Where does this road end? At 1,000, which is the default limit on the number of fields in a mapping.

Let’s simulate this exception in our safe playground environment before you unwillingly meet it in your production environment.

We’ll start by creating a large dummy JSON document with 1001 fields, POST it and then see what happens.

To create the document, you can either use the example command below with the jq tool (apt-get install jq if you don’t already have it) or create the JSON manually if you prefer:

thousandone_fields_json=$(echo {1..1001..1} | jq -Rn '( input | split(" ") ) as $nums | $nums[] | . as $key | [{key:($key|tostring),value:($key|tonumber)}] | from_entries' | jq -cs 'add')
 
echo "$thousandone_fields_json"
{"1":1,"2":2,"3":3, ... "1001":1001}

We can now create a new plain index:

curl --location --request PUT 'https://localhost:9200/big-objects'

And if we then POST our generated JSON, can you guess what’ll happen?

curl --request POST 'https://localhost:9200/big-objects/_doc?pretty' \
--header 'Content-Type: application/json' \
--data-raw "$thousandone_fields_json"
>>>
{
 "error" : {
   "root_cause" : [
     {
       "type" : "illegal_argument_exception",
       "reason" : "Limit of total fields [1000] in index [big-objects] has been exceeded"
     }
...
 "status" : 400
}

… straight to the illegal_argument_exception! This informs us that the limit has been exceeded.

So how do we handle that? First, you should definitely think about what you are storing in your indices and for what purpose. Secondly, if you still need to, you can increase the 1,000-field limit. But be careful: with bigger complexity can come a much bigger price in potential performance degradation and high memory pressure (see the docs for more info).

Changing this limit can be performed with a simple dynamic setting change:

curl --location --request PUT 'https://localhost:9200/big-objects/_settings' \
--header 'Content-Type: application/json' \
--data-raw '{
 "index.mapping.total_fields.limit": 1001
}'
>>>
{"acknowledged":true}

Now that you’re more aware of the dangers lurking with Mappings, you’re much better prepared for the production battlefield 🙂

Learn More

The Complete Guide to Elasticsearch Mapping

As Elasticsearch gradually becomes the standard for textual data indexing (specifically log data), more companies struggle to scale their ELK stack. We decided to take up the challenge and create a series of posts to help you tackle the most common Elasticsearch performance and functional issues.

This post will help you in understanding and solving one of the most frustrating Elasticsearch issues – Mapping exceptions.

What is Elasticsearch Mapping?

Mapping in Elasticsearch is the process of defining how a document, and the fields it contains, are stored and indexed, each field with its own data type. Field data types can be, for example, simple types like text (string), long, boolean, or object/nested keys (which support the hierarchical nature of JSON).

Field mapping, by default, is set dynamically, meaning that when a previously unseen field is found in a document, Elasticsearch will add the new field to the type mapping and set its type according to its value.

For example, if this is our document: {"name": "John", "age": 40, "is_married": true}, the name field will be of type string, the age field will be of type long, and the is_married field will be of type boolean.
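
To see this for yourself, a small sketch (the index name people-demo is made up): index the document and fetch the generated mapping; name should come back as a string type (text with a keyword subfield), age as long, and is_married as boolean.

curl --request POST 'https://localhost:9200/people-demo/_doc?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{"name": "John", "age": 40, "is_married": true}'

curl --request GET 'https://localhost:9200/people-demo/_mapping?pretty'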

When the next log containing any of these fields arrives, our Elasticsearch index will expect those fields to be populated with values of the same types that were determined in the initial mapping of our index.

But what if they aren’t?

Elasticsearch Mapping Exceptions

Basically, mapping determination is a race condition: the first value type to arrive for a given key determines that key’s type until the next index is created. Let’s take this log for example: {"name": {"first": "John", "second": "Smith"}, "age": 40, "is_married": 1}. The field "name" is now a JSON object and the field "is_married" is now a number, so this one log will trigger two different mapping exceptions.

The log, in this case, will be discarded without being indexed in Elasticsearch at all. We can consider this data loss, with no indication of the incident besides the Elasticsearch log files, which will contain a mapping exception for each rejected log (also impacting cluster performance).
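
You can reproduce the rejection by hand against the hypothetical people-demo index from above; instead of "created" you get a 400 with a mapper_parsing_exception:

curl --request POST 'https://localhost:9200/people-demo/_doc?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{"name": {"first": "John", "second": "Smith"}, "age": 40, "is_married": 1}'
# returns status 400 with a mapper_parsing_exception (the document is not indexed)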

These scenarios happen more frequently when different programmers are working on the same projects and printing logs with the same field names but with different sets of values. So how do we handle these mapping exceptions when we encounter them?

For starters, what you do not want to do is manually change the type of the conflicting field (in our case 'name') to object: existing field mappings cannot be updated, and changing the mapping would mean invalidating already indexed documents.

Instead, you should create a new index and split the 'name' key into separate fields for strings and objects, e.g., assign objects to 'name_object' and strings to 'name'. The mapping of the new index will then have the two keys 'name' and 'name_object', which hold strings and objects respectively.
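
A sketch of what the new index’s mapping could look like (the index name and exact types are illustrative):

curl --request PUT 'https://localhost:9200/people-demo-v2' \
--header 'Content-Type: application/json' \
--data-raw '{
   "mappings": {
       "properties": {
           "name": { "type": "text" },
           "name_object": {
               "properties": {
                   "first": { "type": "text" },
                   "second": { "type": "text" }
               }
           }
       }
   }
}'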

How we handle Mapping exceptions in Coralogix

Coralogix handles numeric vs. string mapping exceptions by dynamically adding a .numeric field to keys containing numbers that were previously mapped as a string.

In case Coralogix encounters a mapping exception it can’t handle (e.g., object vs. string, number vs. boolean, etc.), it stringifies the JSON and puts it under a newly generated key named 'text'. In either case, you won’t lose your data as a result of an exception.

In addition to that, Coralogix’s ingest engines add a ‘coralogix.failed_reason‘ key to any log that triggered a mapping exception so that our users can search, create alerts, or build visualizations to spot mapping exceptions and their root cause.

Each Coralogix client gets a predefined mapping exceptions dashboard that presents the number of logs that triggered mapping exceptions, grouped by the 'coralogix.failed_reason' field, which describes the exception reason (find it under Kibana > Dashboards > Mapping Exceptions Dashboard). Here are some examples of mapping exception messages:

[Screenshots: examples of 'coralogix.failed_reason' values from the Coralogix Mapping Exceptions Dashboard]

These are the actual messages the 'coralogix.failed_reason' key holds when a log input causes a mapping exception. They tell us the story of what went wrong and, of course, help us fix it.

Let’s investigate the following exception message: “object mapping for [results.data] tried to parse field [data] as object, but found a concrete value”.

From this message we can infer that ‘data’, which is a nested key within ‘results’, was initially mapped as an object and later arrived, numerous times, with a value of type string.

How can we fix the issue so the next log that will arrive with a string as the value for ‘data‘ won’t trigger an exception?

Coralogix offers a parsing rule engine. Among other useful parsing options, you can find the 'Replace' rule, which enables the user to replace a given text, using regex, with a different one before the log gets mapped and indexed in Elasticsearch.

What we would like to do is use a regex to match any log entry that contains the 'results.data' field as an object (i.e., the value starts with an opening curly bracket) and replace 'data' with 'data_object'. Any similar occurrence from this point on will assign the object to the new key 'results.data_object'.

This new key is created and added to the mapping of our index on the arrival of the first matching entry after applying our rule. This is how our rule would look in Coralogix:

  • REGEX: "data"\s*:\s*{
  • Replace To: "data_object":{

If we are not sure which rule to configure, we can use the Coralogix 'Logs' screen to run some queries and see the valid logs alongside the logs which encountered a mapping exception.

In Coralogix:

[Screenshot: Coralogix search for results.data as an object, with no mapping conflict]

Run the following query: _exists_:results.data. You should retrieve logs with an existing (meaning correctly mapped) results.data field, and you can easily notice that all of these valid logs carry a JSON object within results.data.

[Screenshot: Coralogix search for the results.data string mapping exception]

The query _exists_:coralogix.failed_reason AND coralogix.failed_reason:"object mapping for [results.data] tried to parse field [data] as object, but found a concrete value" will show us the logs that encountered the specific mapping clash found in our Coralogix Mapping Exceptions Dashboard, confirming our analysis.

In Kibana:

[Screenshot: the same results.data exception search in Kibana]

Running the same query in Kibana will retrieve the logs that failed to be mapped by Elasticsearch. We can see that the entire JSON payload of each log was stringified and stored in the text column, confirming the fix we should apply.
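
For reference, the same Lucene query can also be run directly against the Elasticsearch search API via query_string; a small sketch searching across all indices:

curl --request GET 'https://localhost:9200/_search?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{
   "query": {
       "query_string": {
           "query": "_exists_:coralogix.failed_reason AND coralogix.failed_reason:\"object mapping for [results.data] tried to parse field [data] as object, but found a concrete value\""
       }
   }
}'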

Let’s take another example of an exception message: "failed to parse [response]". We’ll encounter such an exception in either of the following cases:

1. The "response" field is of type string and the logs that triggered the exception include a JSON object in it.
Our replace rule:

  • REGEX: "response"\s*:\s*{
  • Replace To: "response_object":{

2. The "response" field is of type boolean and the logs that triggered the exception included a numeric value (0/1) or "false"/"true" in double quotes (which Elasticsearch treats as a string type) instead of the terms false/true, which represent a boolean type.

This case requires two rules, one for each mapping exception type:
Our #1 replace rule:

  • REGEX: "response"\s*:\s*"false" or "response"\s*:\s*0
  • Replace To: "response": false

Our #2 replace rule:

  • REGEX: "response"\s*:\s*"true" or "response"\s*:\s*1
  • Replace To: "response": true

Create an alert for mapping exceptions

Mapping exceptions are reported in the field 'coralogix.failed_reason', so an alert aggregating on this field will provide the desired result. Since the mapping exception data is added at indexing time, the regular Coralogix Alerts mechanism does not have access to this information and therefore cannot be used to configure this type of alert. However, we can use Grafana alerts to achieve this goal.

I will show you how to configure your own Grafana Mapping Exceptions Dashboard so that you can use a Grafana alert for this. This tutorial assumes that Grafana has already been installed and that Coralogix is defined as a data source in Grafana. For instructions on configuring the Grafana plugin, please refer to: https://coralogixstg.wpengine.com/tutorials/grafana-plugin/

In order to install the Dashboard, please follow these steps:

1. Download and decompress the Grafana Dashboard file: “Grafana – Mapping Exceptions Alerts Dashboard.json.zip”.

2. Go to Grafana and click the plus sign (+) / Create / Import.

3. Click “Upload JSON file” and select the “Grafana – Mapping Exceptions Alerts Dashboard.json”, followed by “Open”.

4. Please make sure the correct Data Source has been selected, then click “Import” (“Elasticsearch” in this example).

5. Please click “Save dashboard” on the top-right of the Grafana UI.

6. Enter an optional note describing your changes, followed by “Save”.

7. At this point, the Grafana Dashboard will be saved, and the alerting mechanism activated.

8. In order to review the alert settings, please click the bell icon on the left side of the Grafana UI, followed by “Alerting” / “Alert rules”.

9. Click “Edit alert”.

10. Review the alert settings.

11. Click “Query” on the left to review the query settings.

That’s it! 🙂

At this point, the Admin must determine how to receive alert notifications when a Mapping Exception occurs. For this information, please refer to the Grafana alert notifications documentation. We sincerely hope that you found this document useful. Enjoy your Dashboard! 😎

To sum up

We should definitely take mapping exceptions into consideration as they can cause data loss and dramatically reduce the power of Kibana and the Elasticsearch APIs. It is recommended to develop best practices across different engineering teams and team members so that logs are written correctly and consistently, or at the very least, to choose the right logging solution that will give you tools to handle such issues and get your data on the right path.