Elasticsearch Configurations and 9 Common Mistakes You Should Avoid

The default Elasticsearch setup is pretty good. You can get up and running from scratch and use it as a simple log storage solution. However, as soon as you begin to rely on your default cluster, problems will inevitably appear.

Configuring Elasticsearch is complex, so we’ve compiled the most useful configurations and the most common mistakes. Armed with these key tips, you’ll be able to take control of your Elasticsearch cluster and ship your logs with confidence.

Key Elasticsearch Configurations To Apply To Your Cluster

Get your naming right

Make sure your nodes are in the same cluster!

Nodes in the same cluster share the same cluster.name. Out of the box, the cluster name is “elasticsearch”. The cluster name is configured in the elasticsearch.yml file specific to each environment.

cluster.name: my-custom-cluster-name

TIP: Be careful with sharing cluster names across environments. You may end up with nodes in the wrong cluster!

Wait, which node is ‘ec2-100.64.0.0-xhdtd’ again? 

When you spin up a new EC2 instance with your cloud provider, it often comes with a long, unmemorable hostname. Naming your nodes allows you to give each one a meaningful identifier.

By default, the node name will be the hostname of the machine. You can set node.name in elasticsearch.yml, either to a fixed value or by referencing an environment variable.

node.name: my-node-123
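If you’d rather not hard-code the name, elasticsearch.yml supports environment variable substitution with the ${...} notation, so (assuming HOSTNAME is exported in the node’s environment) you can write:

node.name: ${HOSTNAME}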

Write to a safe location!

By default, Elasticsearch will write data to folders within $ES_HOME. The risk here is that these paths may be overwritten during an upgrade.

In a production environment, it is strongly recommended you set the path.data and path.logs in elasticsearch.yml to locations outside of $ES_HOME.
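For example, assuming /var/data/elasticsearch and /var/log/elasticsearch are dedicated directories owned by the elasticsearch user, the settings look like this:

path.data: /var/data/elasticsearch
path.logs: /var/log/elasticsearch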

TIP: The path.data setting can be set to multiple paths. 

Make sure your network configurations are rock solid

The network.host property sets both the bind host and the publish host. For those of you who aren’t familiar with these terms, they’re quite straightforward. The bind host is the address Elasticsearch listens on for incoming requests. The publish host is the address Elasticsearch advertises to the other nodes in the cluster for node-to-node communication.

TIP: You can use special values in this field, such as _local_ and _site_, as well as modifiers like :ip4.
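As a quick sketch, you can either set a single value for both, or split the bind and publish addresses explicitly (the IP addresses below are placeholders):

network.host: _site_

Or, when the two need to differ:

network.bind_host: 0.0.0.0
network.publish_host: 10.0.0.4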

Be prepared for some start up issues!

As soon as you set a value for network.host, you’re signaling to Elasticsearch that you’re now ready for a production setup. A number of system startup checks graduate from warnings to exceptions. This may introduce some teething issues as you get started.

Tell your nodes who is who with discovery settings

Elasticsearch nodes need to be able to find out about other nodes in the cluster. They also need to be able to determine who will be the master node. We can do this with a few discovery settings.

‘discovery.seed_hosts’ tells nodes where to find the master-eligible nodes

The discovery.seed_hosts setting provides a list of addresses for the master-eligible nodes in the cluster, which each node contacts to discover the rest of the cluster and find the elected master.

TIP: Each item should be formatted as host:port or host on its own. If you don’t specify a port, it will default to the value of transport.profiles.default.port
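A minimal sketch in elasticsearch.yml, using placeholder addresses, looks like this:

discovery.seed_hosts:
  - 192.168.1.10:9300
  - 192.168.1.11
  - seeds.mydomain.com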

What if you’re starting a brand new cluster?

When starting an Elasticsearch cluster for the very first time, use the cluster.initial_master_nodes setting. When starting a new cluster in production mode, you must explicitly list the nodes that can become master nodes.  Under the hood, you’re telling Elasticsearch that these nodes are permitted to vote for a leader.
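The values must match the node.name of each master-eligible node, and the setting should be removed from elasticsearch.yml once the cluster has formed for the first time. A sketch with placeholder node names:

cluster.initial_master_nodes:
  - master-node-a
  - master-node-b
  - master-node-c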

Check your JVM Configuration

Elasticsearch is underpinned by the Java Virtual Machine (JVM). It is easy to forget about this critical component, but when you’re running a production Elasticsearch cluster, you need to have a working understanding of the JVM.

Allocating Heap Space

The size of your JVM heap determines how much memory your JVM has to work with.

Give your JVM Heap some padding, beyond what you’re expecting to use, but not too much. A large JVM Heap can run into problems with long garbage collection pauses, which can dramatically impact your cluster performance.

At most, your Xms and Xmx values should be 50% of your total RAM. Elasticsearch requires memory for purposes other than the JVM heap, notably the filesystem cache that Lucene relies on, and it is important to leave space for this. On top of this, don’t set your heap above roughly 32 GB, otherwise you lose the benefit of compressed object pointers.
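Heap size is set through JVM options rather than elasticsearch.yml. As a sketch, assuming a machine with 32 GB of RAM, you might pin the heap to 16 GB by dropping a file into config/jvm.options.d/ (or editing jvm.options on older versions). Setting Xms and Xmx to the same value avoids resize pauses at runtime:

-Xms16g
-Xmx16g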

The 9 Most Common Elasticsearch Configuration Mistakes

Below are nine of the most common Elasticsearch configuration mistakes made when setting up and running an Elasticsearch cluster, and how you can avoid making them.

1. Elasticsearch Bootstrap Checks Preventing Startup

Bootstrap checks inspect various settings and configurations before Elasticsearch starts to make sure it will operate safely. If bootstrap checks fail, they can prevent Elasticsearch from starting in production mode or issue warning logs in development mode. It is recommended to familiarize yourself with the settings enforced by bootstrap checks, noting that they are different in development and production modes. 

Note that bootstrap checks cannot be bypassed in production mode. You can, however, go the other way: setting the system property es.enforce.bootstrap.checks to true forces the checks to run even in development mode, so you catch problems before they block a production rollout.
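For example, you could add the flag to a file under config/jvm.options.d/ (or pass it via the ES_JAVA_OPTS environment variable when starting the node):

-Des.enforce.bootstrap.checks=true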

2. Oversized Templating

Large templates are directly related to large mappings, and large mappings create syncing issues in your cluster: mappings live in the cluster state, so every mapping change has to be propagated to every node.

One solution is dynamic templates. Dynamic templates can automatically add field mappings based on your predefined mappings for specific types and names. However, you should always try to keep your templates small in size. 
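As an illustrative sketch (the template name, index pattern, and keyword choice are placeholders), a small composable index template that maps any new string field as a keyword might look like this on recent versions; older versions use the _template API instead:

PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "mappings": {
      "dynamic_templates": [
        {
          "strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": { "type": "keyword" }
          }
        }
      ]
    }
  }
}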

3. Elasticsearch Configuration for Capacity Provisioning

Provisioning can help to equip and optimize Elasticsearch for operational performance. The question that needs to be asked is ‘How much space do you need?’ You should first simulate your use-case. This can be done by booting up your nodes, filling them with real documents, and pushing them until a shard breaks down. You can then use that result as a benchmark for a single shard’s capacity and apply it throughout your entire index.

It’s important to understand resource utilization during the testing process. This allows you to reserve the proper amount of RAM for nodes, configure your JVM heap space, configure your CPU capacity, provision through scaling larger instances with potentially more nodes, and optimize your overall testing process. 
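While you run this kind of test, the _cat APIs give a quick, human-readable view of resource utilization and shard sizes (a sketch; run them from Kibana Dev Tools or curl):

GET _cat/nodes?v&h=name,heap.percent,ram.percent,cpu
GET _cat/shards?v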

4. Not Defining Elasticsearch Mappings

Elasticsearch relies on mappings, also known as schema definitions, to handle data properly according to its correct data type. In Elasticsearch, a mapping defines the fields in a document and specifies their corresponding data types, such as date, long, text, and keyword.

In cases where an indexed document contains a new field without a defined data type, Elasticsearch uses dynamic mapping to estimate the field’s type, converting it from one type to another when necessary. 

You should define mappings explicitly, especially in production-based environments. It’s a common practice to index several sample documents, let Elasticsearch guess the field types, and then retrieve the mapping it creates. You can then make any appropriate changes that you see fit without leaving anything up to chance.
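For example (index and field names here are placeholders), you might fetch the dynamic mapping from a test index and then create the real index with an explicit, corrected mapping:

GET logs-sample/_mapping

PUT logs-production
{
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "message":    { "type": "text" },
      "status":     { "type": "keyword" }
    }
  }
}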

5. Combinatorial Data ‘Explosions’

Combinatorial data explosions are computing problems that can cause exponential growth in bucket generation for certain aggregations and can lead to uncontrolled memory usage. Elasticsearch’s ‘terms’ aggregation builds buckets according to your data, but it cannot predict in advance how many buckets will be created. This can be problematic for parent aggregations that are made up of more than one child aggregation.

To overcome this, collection modes can be used to control how child aggregations are computed. By default, Elasticsearch uses the ‘depth-first’ collection mode; however, you can switch to ‘breadth-first’, which calculates the top-level parent buckets first and prunes the rest before descending into child aggregations.
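A sketch of a breadth-first parent aggregation, with placeholder index and field names:

GET logs-*/_search
{
  "size": 0,
  "aggs": {
    "top_services": {
      "terms": {
        "field": "service.keyword",
        "size": 10,
        "collect_mode": "breadth_first"
      },
      "aggs": {
        "top_hosts": {
          "terms": { "field": "host.keyword", "size": 5 }
        }
      }
    }
  }
}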

6. Search Timeout Errors

Search timeouts are common and can occur for many reasons, such as large datasets or memory-intensive queries.

To eliminate search timeouts, you can increase the Elasticsearch Request Timeout configuration, reduce the number of documents returned per request, reduce the time range, tweak your memory settings, and optimize your query, indices, and shards. You can also enable slow search logs to monitor search run time and scan for heavy searches.
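For example, search slow logs can be enabled per index with dynamic settings like these (the thresholds are placeholders you should tune to your own latency expectations):

PUT logs-*/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}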

7. Process Memory Locking Failure

To ensure nodes remain healthy in the cluster, you must ensure that none of the JVM memory is ever swapped out to disk. You can do this by setting bootstrap.memory_lock to true in elasticsearch.yml. You should also ensure that you’ve set up memory locking correctly by consulting the Elasticsearch configuration documentation.
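The setting itself, plus a quick way to verify after a restart that the memory lock actually succeeded (the filter_path parameter just trims the response):

bootstrap.memory_lock: true

GET _nodes?filter_path=**.mlockall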

8. Shards are Failing

When searching in Elasticsearch, you may encounter ‘shards failure’ error messages. This happens when a read request fails to get a response from a shard. It can occur when the data is not yet searchable because the cluster or node is still in its initial startup process, or when a shard is missing or still recovering and the cluster is red.

To ensure better management of shards, especially when dealing with future growth, you are better off reindexing the data and specifying more primary shards in newly created indexes. To optimize your use case for indexing, make sure you designate enough primary shards so that you can spread the indexing load evenly across all of your nodes. You can also consider disabling merge throttling, increasing the size of the indexing buffer, and refreshing less frequently by increasing the refresh interval.
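A sketch of the reindexing approach, with placeholder index names and a shard count you would size from your own capacity testing:

PUT logs-v2
{
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 1
  }
}

POST _reindex
{
  "source": { "index": "logs-v1" },
  "dest":   { "index": "logs-v2" }
}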

9. Production Fine Tuning

By default, an Elasticsearch cluster is named ‘elasticsearch’. If you are unsure about how to change a particular Elasticsearch configuration, it’s usually best to stick with the default. However, it’s good practice to rename your production cluster to prevent unwanted nodes from joining it.

Recovery settings affect how nodes recover when a cluster restarts. Elasticsearch allows nodes that belong to the same cluster to rejoin it automatically whenever a recovery occurs. Some nodes come back quickly after a restart, but others may take a bit longer.

It is therefore important to tell Elasticsearch how many nodes to expect in the cluster, and how long to wait for them, so that recovery doesn’t begin reallocating shards before the slower nodes have had a chance to rejoin.
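A hedged sketch of these gateway recovery settings for a cluster with three data nodes (the setting names shown are for recent versions; older releases use gateway.expected_nodes and gateway.recover_after_nodes instead):

gateway.expected_data_nodes: 3
gateway.recover_after_data_nodes: 2
gateway.recover_after_time: 5m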