8 Common Elasticsearch Configuration Mistakes That You Might Have Made

Elasticsearch was designed to allow its users to get up and running quickly, without having to understand all of its inner workings. However, more often than not, it’s only a matter of time before you run into configuration troubles.
Elasticsearch is an open-source, NoSQL-style distributed search and analytics engine built on the Lucene library, and it forms the core of the ELK Stack. Despite its increasing popularity, there are several common and critical mistakes that users tend to make while running it.
Below are the most common mistakes made when setting up and running an Elasticsearch instance, and how you can avoid them.
1. Elasticsearch bootstrap checks failed
Bootstrap checks inspect various settings and configurations before Elasticsearch starts to make sure it will operate safely. If bootstrap checks fail, they prevent Elasticsearch from starting in production mode, and merely issue warning logs in development mode. Familiarize yourself with the settings enforced by bootstrap checks, noting that they differ between development and production modes. Note that setting the system property ‘es.enforce.bootstrap.checks’ to true does not skip the checks; it forces them to run even in development mode, which is useful for catching configuration problems early.
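As a sketch, the property is passed to the JVM at startup, either in jvm.options or through the ES_JAVA_OPTS environment variable:

```
# jvm.options – run bootstrap checks even when Elasticsearch is in
# development mode (e.g. bound only to the loopback address)
-Des.enforce.bootstrap.checks=true
```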
2. Oversized templating
Large templates are directly related to large mappings: if you create a large mapping for Elasticsearch, you will have trouble keeping it in sync across the nodes in your cluster, even when you apply it as an index template.
The issues with big index templates are mainly practical. They tend to require a lot of manual maintenance, with a single developer often becoming a single point of failure, and every change to your data model means remembering to update the template as well.
Solution
A solution to consider is the use of dynamic templates. Dynamic templates can automatically add field mappings based on your predefined mappings for specific types and names. However, you should always try to keep your templates small in size.
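As a sketch of what this can look like, the hypothetical template below maps every dynamically added string field as a keyword rather than the default text-plus-keyword multi-field, which keeps the resulting mapping small (the index and template names are placeholders; the dynamic_templates syntax is standard Elasticsearch mapping syntax):

```
PUT my-index
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_keyword": {
          "match_mapping_type": "string",
          "mapping": { "type": "keyword" }
        }
      }
    ]
  }
}
```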
3. Elasticsearch configuration for capacity provisioning
Provisioning can help equip and optimize Elasticsearch for operational performance. Elasticsearch is designed to keep nodes up, stop memory from growing out of control, and prevent unexpected actions from shutting down nodes. However, no amount of optimization will save you if the underlying resources are inadequate.
Solution
Ask yourself: ‘How much space do I need?’ Start by simulating your use case: boot up your nodes, fill them with real documents, and push them until a shard breaks. You can then define a single shard’s capacity and apply it throughout your entire index.
It’s important to understand resource utilization during the testing process. This allows you to reserve the proper amount of RAM for nodes, configure your JVM heap space, configure your CPU capacity, provision through scaling larger instances with potentially more nodes, and optimize your overall testing process.
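Heap size, for example, is configured through JVM options. A common rule of thumb, shown here as an assumption rather than a universal setting, is to give the heap no more than about half of available RAM and to set the minimum and maximum to the same value:

```
# jvm.options – identical min and max heap avoids resize pauses;
# 4g assumes a node with roughly 8 GB of RAM
-Xms4g
-Xmx4g
```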
4. Not defining Elasticsearch configuration mappings
Elasticsearch relies on mappings, also known as schema definitions, to handle data properly according to its correct data type. In Elasticsearch, a mapping defines the fields in a document and specifies the data type of each one, such as date, long, or keyword.
In cases where an indexed document contains a new field without a defined data type, Elasticsearch uses dynamic mapping to guess the field’s type, coercing values from one type to another when necessary. While this may seem convenient, these guesses are not always accurate. If, for example, Elasticsearch picks the wrong field type, indexing errors will start to pop up.
Solution
To fix this issue, you should define mappings, especially in production-based environments. It’s a best practice to index several documents, let Elasticsearch guess the field, and then grab the mapping it creates. You can then make any appropriate changes that you see fit without leaving anything up to chance.
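A sketch of that workflow, using hypothetical index and field names: first inspect the mapping Elasticsearch guessed, then create a new index with a corrected, explicit mapping:

```
GET my-index/_mapping

PUT my-index-v2
{
  "mappings": {
    "properties": {
      "created_at": { "type": "date" },
      "user_id":    { "type": "long" },
      "message":    { "type": "text" }
    }
  }
}
```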
5. Combinatorial data ‘explosions’
Combinatorial explosions are computing problems that can cause exponential growth in bucket generation for certain aggregations and can lead to uncontrolled memory usage. Elasticsearch’s ‘terms’ aggregation builds buckets according to your data, but it cannot predict in advance how many buckets will be created. This can be problematic for parent aggregations that are made up of more than one child aggregation.
Solution
Collection modes can be used to control how child aggregations are computed. The default collection mode of an aggregation, ‘depth-first’, builds the full data tree and then trims the edges. For specific aggregations, Elasticsearch lets you switch to something more appropriate, such as ‘breadth-first’, which builds and trims the tree one level at a time, keeping combinatorial explosions under control.
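The collection mode is set per aggregation with the collect_mode parameter. A sketch with hypothetical field names, where the expensive inner aggregation is only computed for the top buckets of the outer one:

```
GET my-index/_search
{
  "size": 0,
  "aggs": {
    "top_actors": {
      "terms": {
        "field": "actor",
        "collect_mode": "breadth_first"
      },
      "aggs": {
        "costars": {
          "terms": { "field": "costar" }
        }
      }
    }
  }
}
```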
6. Search timeout errors
If you don’t receive an Elasticsearch response within the specified search period, the request fails and returns an error message. This is called a search timeout. Search timeouts are common and can occur for many reasons, such as large datasets or memory-intensive queries.
Solution
To eliminate search timeouts, you can increase the Elasticsearch Request Timeout configuration, reduce the number of documents returned per request, reduce the time range, tweak your memory settings, and optimize your query, indices, and shards. You can also enable slow search logs to monitor search run time and scan for heavy searches.
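Two of these knobs can be sketched directly in the search API and index settings (the index name and threshold values here are illustrative assumptions):

```
GET my-index/_search
{
  "timeout": "30s",
  "size": 100,
  "query": { "match": { "message": "error" } }
}

PUT my-index/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s"
}
```

The per-request timeout makes Elasticsearch return whatever partial results it has collected when the limit is reached, while the slow log thresholds record queries that exceed them so you can find the heavy searches.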
7. Process memory locking failure
When the operating system runs low on memory, it will begin swapping parts of the JVM heap out to disk. This has a devastating impact on the performance of your Elasticsearch cluster.
Solution
The simplest option is to disable swapping. You can do this by setting the bootstrap memory lock to true. You should also ensure that you’ve set up memory locking correctly by consulting the Elasticsearch configuration documentation.
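The relevant setting lives in elasticsearch.yml, and you can confirm that the lock actually succeeded through the nodes info API:

```
# elasticsearch.yml
bootstrap.memory_lock: true
```

```
GET _nodes?filter_path=**.mlockall
```

If mlockall comes back false, the process usually lacks the operating-system permission to lock memory (for example, the memlock ulimit is too low), which is covered in the configuration documentation.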
8. Shards are failing
When searching in Elasticsearch, you may encounter ‘shards failure’ error messages. This happens when a read request fails to get a response from a shard. It can occur if the data is not yet searchable because the cluster or node is still in its initial start-up process, or if the shard is missing or in recovery mode and the cluster is red.
Solution
To ensure better management of shards, especially when dealing with future growth, you are better off reindexing the data and specifying more primary shards in newly created indexes. To optimize your use case for indexing, make sure you designate enough primary shards so that you can spread the indexing load evenly across all of your nodes. You can also consider disabling merge throttling, increasing the size of the indexing buffer, and refreshing less frequently by increasing the refresh interval.
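A sketch of that reindexing approach with hypothetical index names and shard counts: create the new index with more primary shards and a longer refresh interval, then copy the data over with the _reindex API:

```
PUT my-index-v2
{
  "settings": {
    "number_of_shards": 6,
    "refresh_interval": "30s"
  }
}

POST _reindex
{
  "source": { "index": "my-index" },
  "dest":   { "index": "my-index-v2" }
}
```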
Summary
When set up, configured, and managed correctly, Elasticsearch is a powerful distributed full-text search and analytics engine. It enables multiple tenants to search through their entire data sets, regardless of size, at unprecedented speeds. Elasticsearch also doubles as an analytics system and distributed database. While these capabilities are impressive on their own, Elasticsearch combines all of them to form a real-time search and analytics application that can keep up with customer needs.
Errors, exceptions, and mistakes arise while operating Elasticsearch. To avoid them, pay close attention to initial setup and configuration, and be particularly mindful when indexing new information. You should also have strong monitoring and observability in place; it is the first basic component of getting quickly and efficiently to the root of complex problems like cluster slowness. Instead of fearing errors, exceptions, and mistakes, you can treat them as an opportunity to optimize your Elasticsearch infrastructure.