
Quick Start Observability for Elasticsearch


Elasticsearch

Coralogix Extension For Elasticsearch Includes:

Dashboards - 1

Gain instantaneous visualization of all your Elasticsearch data.


Alerts - 11

Stay on top of Elasticsearch key performance metrics. Keep everyone in the know with integrations for Slack, PagerDuty, and more.

Elasticsearch Cluster Health Critical - Status Red

This alert is triggered when the Elasticsearch cluster enters a red health status, which indicates that some of the primary shards are not allocated. This can severely impact the availability of the Elasticsearch service. The alert monitors the overall health of the Elasticsearch cluster and is activated when the health status is red for more than 1 minute.

Customization Guidance:
- Threshold: The default is to alert when the cluster health is red for 1 minute. This can be adjusted based on the criticality of the cluster.
- Monitoring Period: This period can be increased for less critical systems where short-term unavailability might not be problematic.
- Notification Frequency: Consider the frequency of this alert to optimize the balance between responsiveness and noise. Adjust according to the criticality of the cluster's uninterrupted operation.

Action: Investigate shard allocation issues, node failures, or insufficient resources. Address these issues promptly to restore the cluster to a healthy state.
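To verify what this alert reports, you can query the cluster health API directly. The snippet below is a minimal sketch in Python that assumes Elasticsearch is reachable at http://localhost:9200 without authentication; adjust the URL and credentials for your environment.

    import requests

    # Fetch overall cluster health (status and shard counts) from the health API.
    resp = requests.get("http://localhost:9200/_cluster/health", timeout=10)
    health = resp.json()
    print("status:", health["status"])  # green / yellow / red
    print("unassigned shards:", health["unassigned_shards"])
    print("active shards %:", health["active_shards_percent_as_number"])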

Elasticsearch Cluster Health Critical - Status Yellow

This alert is triggered when the Elasticsearch cluster enters a yellow health status, which means that some replica shards are unallocated, but all primary shards are assigned. The system is operational, but the redundancy is compromised. The alert monitors the overall health of the Elasticsearch cluster and is activated when the health status is yellow for more than 1 minute.

Customization Guidance:
- Threshold: The default is to alert when the cluster health is yellow for 1 minute. You can adjust the threshold based on how critical it is to maintain full redundancy.
- Monitoring Period: This period can be increased for less critical systems where short-term unavailability might not be problematic.
- Notification Frequency: Consider the frequency of this alert to optimize the balance between responsiveness and noise. For less critical clusters, this period can be extended.

Action: Review the replica shard assignment or node resource availability to restore full redundancy.
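To see which shards are unassigned and why, the _cat/shards API lists each shard with its state and unassigned reason. A minimal sketch, again assuming an unauthenticated cluster at http://localhost:9200:

    import requests

    # List every shard; print the ones that are unassigned along with the reason.
    resp = requests.get(
        "http://localhost:9200/_cat/shards",
        params={"format": "json", "h": "index,shard,prirep,state,unassigned.reason"},
        timeout=10,
    )
    for shard in resp.json():
        if shard["state"] == "UNASSIGNED":
            kind = "replica" if shard["prirep"] == "r" else "primary"
            print(shard["index"], shard["shard"], kind, shard.get("unassigned.reason"))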

Elasticsearch Node High Disk Usage

This alert is triggered when an Elasticsearch node's disk usage exceeds 80%, which could lead to write failures and degraded performance. Monitoring high disk usage is essential to avoid unavailability or write issues caused by full disks.

Customization Guidance:
- Threshold: The default threshold is set to 80%. You can adjust the threshold based on how conservative you want to be with disk space.
- Monitoring Period: The default period is 10 minutes. For busy clusters, this could be reduced to detect issues sooner.
- Notification Frequency: Consider the frequency of this alert to optimize the balance between responsiveness and noise. For less critical clusters, this period can be extended.

Action: Investigate the nodes with high disk usage and consider expanding the disk capacity or rebalancing shards to other nodes.
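Per-node disk usage can be checked with the _cat/allocation API. A minimal sketch, assuming http://localhost:9200 with no authentication:

    import requests

    # Report disk usage per data node and flag anything above the 80% default threshold.
    resp = requests.get(
        "http://localhost:9200/_cat/allocation",
        params={"format": "json", "h": "node,disk.percent,disk.used,disk.avail"},
        timeout=10,
    )
    for node in resp.json():
        pct = node.get("disk.percent")
        flag = "  <-- above 80%" if pct and int(pct) > 80 else ""
        print(f"{node['node']}: disk {pct}% used, {node['disk.avail']} free{flag}")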

Elasticsearch Cluster Nodes Down

This alert triggers when the number of available nodes in the Elasticsearch cluster drops below the expected count, which can impact cluster performance and shard distribution. Monitoring node availability ensures the stability and redundancy of the Elasticsearch cluster.

Customization Guidance:
- Threshold: By default, the alert triggers when any node is down. Adjust the threshold based on your cluster's criticality.
- Monitoring Period: This period can be shortened to detect node loss sooner, or extended for less critical systems.
- Notification Frequency: Consider the frequency of this alert to optimize the balance between responsiveness and noise. For less critical clusters, this period can be extended.

Action: Investigate the reason for node failure (e.g., resource exhaustion, network issues) and take steps to restore node availability.
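A quick way to confirm how many nodes are actually in the cluster is the _cat/nodes API. A minimal sketch, assuming http://localhost:9200 with no authentication; the expected node count is a placeholder you would set for your own topology:

    import requests

    EXPECTED_NODES = 3  # placeholder: set to your cluster's normal node count

    # List the nodes currently in the cluster and compare against the expected count.
    resp = requests.get(
        "http://localhost:9200/_cat/nodes",
        params={"format": "json", "h": "name,node.role,master"},
        timeout=10,
    )
    nodes = resp.json()
    print(f"{len(nodes)} of {EXPECTED_NODES} expected nodes responding")
    for node in nodes:
        print(node["name"], node["node.role"], "(elected master)" if node["master"] == "*" else "")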

Elasticsearch Shard Allocation Failures

This alert triggers when Elasticsearch fails to allocate one or more shards, which could lead to degraded performance or unavailability of data. Monitoring shard allocation failures helps in detecting and mitigating issues with resource allocation or cluster configuration.

Customization Guidance:
- Threshold: By default, this alert triggers for any shard allocation failure. Adjust as needed based on the impact of missing shards on your system.
- Monitoring Period: This period can be shortened to detect allocation problems sooner, or extended for less critical systems.
- Notification Frequency: Consider the frequency of this alert to optimize the balance between responsiveness and noise. For less critical clusters, this period can be extended.

Action: Review the allocation rules and resources available to the cluster to resolve shard allocation failures.
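The cluster allocation explain API reports why a shard cannot be allocated, which is usually the fastest way to diagnose allocation failures. A minimal sketch, assuming http://localhost:9200 with no authentication:

    import requests

    # With no request body, the API explains the first unassigned shard it finds.
    resp = requests.get("http://localhost:9200/_cluster/allocation/explain", timeout=10)
    if resp.status_code == 400:
        print("No unassigned shards to explain.")
    else:
        explain = resp.json()
        print(explain["index"], "shard", explain["shard"], "primary:", explain["primary"])
        reason = explain.get("allocate_explanation") or explain.get("unassigned_info", {}).get("reason")
        print("explanation:", reason)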

Elasticsearch High JVM Heap Usage

This alert triggers when the JVM heap memory usage on an Elasticsearch node exceeds 80%, potentially leading to garbage collection overhead and degraded performance. Monitoring heap usage helps ensure that Elasticsearch nodes are not overloaded, which could lead to memory-related failures.

Customization Guidance:
- Threshold: The default threshold is set to 80%. For performance-critical clusters, you may want to set this lower.
- Monitoring Period: The default period is 10 minutes. For busy clusters, this could be reduced to detect issues sooner.
- Notification Frequency: Consider the frequency of this alert to optimize the balance between responsiveness and noise. For less critical clusters, this period can be extended.

Action: Investigate memory usage on the affected node. Consider increasing the JVM heap size or optimizing queries and shard allocation.
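JVM heap usage per node is exposed by the node stats API. A minimal sketch, assuming http://localhost:9200 with no authentication:

    import requests

    # Print heap usage for every node and flag anything above the 80% default threshold.
    resp = requests.get("http://localhost:9200/_nodes/stats/jvm", timeout=10)
    for node_id, node in resp.json()["nodes"].items():
        heap_pct = node["jvm"]["mem"]["heap_used_percent"]
        flag = "  <-- above 80%" if heap_pct > 80 else ""
        print(f"{node['name']}: heap {heap_pct}%{flag}")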

Elasticsearch High Pending Tasks

This alert is triggered when Elasticsearch has a backlog of pending tasks, which can indicate that the cluster is overloaded or under-resourced. Monitoring pending tasks is critical to understanding the workload and capacity limits of the Elasticsearch cluster.

Customization Guidance:
- Threshold: The default threshold is 10 pending tasks. Adjust this based on the expected load of your system.
- Monitoring Period: The default period is 10 minutes. For highly active clusters, shorter periods may be more appropriate.
- Notification Frequency: Consider the frequency of this alert to optimize the balance between responsiveness and noise. For less critical clusters, this period can be extended.

Action: Investigate the source of the backlog, optimize the cluster configuration, or add more resources.
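The pending tasks API shows exactly what is queued on the elected master. A minimal sketch, assuming http://localhost:9200 with no authentication:

    import requests

    # Inspect the cluster-state task queue on the elected master node.
    resp = requests.get("http://localhost:9200/_cluster/pending_tasks", timeout=10)
    tasks = resp.json()["tasks"]
    print(f"{len(tasks)} pending tasks")
    for task in tasks[:10]:  # show the first few entries
        print(task["priority"], task["time_in_queue"], task["source"])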

Elasticsearch Cluster Relocations

This alert triggers when Elasticsearch is relocating shards between nodes, which can be an indication of cluster rebalancing or resource constraints. Monitoring relocations helps ensure that nodes are balanced and operating efficiently.

Customization Guidance:
- Threshold: This alert triggers for any shard relocations. Adjust based on how much rebalancing your system normally experiences.
- Monitoring Period: This period can be shortened if relocations need to be investigated quickly, or extended for clusters where routine rebalancing is expected.
- Notification Frequency: Consider the frequency of this alert to optimize the balance between responsiveness and noise. For less critical clusters, this period can be extended.

Action: Review the shard allocation and cluster configuration to optimize for balance and performance.
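In-flight relocations show up in the _cat/recovery API when filtered to active recoveries. A minimal sketch, assuming http://localhost:9200 with no authentication:

    import requests

    # List shard recoveries that are currently running, including node-to-node relocations.
    resp = requests.get(
        "http://localhost:9200/_cat/recovery",
        params={"format": "json", "active_only": "true",
                "h": "index,shard,type,stage,source_node,target_node"},
        timeout=10,
    )
    for rec in resp.json():
        print(rec["index"], rec["shard"], rec["type"], rec["stage"],
              f"{rec['source_node']} -> {rec['target_node']}")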

Elasticsearch Unassigned Shards

This alert is triggered when Elasticsearch has unassigned shards, which could impact the availability of data and the performance of the cluster. Monitoring unassigned shards helps maintain the stability and reliability of the Elasticsearch cluster.

Customization Guidance:
- Threshold: This alert triggers for any unassigned shards. Adjust based on your tolerance for incomplete redundancy.
- Monitoring Period: This period can be shortened to detect unassigned shards sooner, or extended for less critical systems.
- Notification Frequency: Consider the frequency of this alert to optimize the balance between responsiveness and noise. For less critical clusters, this period can be extended.

Action: Investigate the cause of unassigned shards, such as node failures or resource shortages, and reallocate or adjust resources accordingly.
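To narrow unassigned shards down to specific indices, the cluster health API can be queried at index level. A minimal sketch, assuming http://localhost:9200 with no authentication:

    import requests

    # Report which indices currently have unassigned shards.
    resp = requests.get(
        "http://localhost:9200/_cluster/health",
        params={"level": "indices"},
        timeout=10,
    )
    for name, idx in resp.json()["indices"].items():
        if idx["unassigned_shards"] > 0:
            print(f"{name}: {idx['unassigned_shards']} unassigned ({idx['status']})")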

Elasticsearch Cluster High CPU Usage

This alert is triggered when the CPU usage on an Elasticsearch node exceeds 80%, which could lead to degraded query performance and slower indexing operations. Monitoring CPU usage is important to detect resource exhaustion and to ensure optimal performance.

Customization Guidance:
- Threshold: The default threshold is set to 80%. Adjust based on the workload and capacity of your cluster.
- Monitoring Period: The default period is 10 minutes. For busy clusters, this could be reduced to detect issues sooner.
- Notification Frequency: Consider the frequency of this alert to optimize the balance between responsiveness and noise. For less critical clusters, this period can be extended.

Action: Investigate the queries or tasks consuming excessive CPU and consider optimizing cluster performance or adding more nodes.
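The hot threads API is a quick way to see which threads are consuming CPU on each node. A minimal sketch, assuming http://localhost:9200 with no authentication; note that the response is plain text rather than JSON:

    import requests

    # Dump the busiest threads per node (plain-text response).
    resp = requests.get(
        "http://localhost:9200/_nodes/hot_threads",
        params={"threads": 3},
        timeout=10,
    )
    print(resp.text)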

Elasticsearch Cluster High Memory Usage

This alert is triggered when an Elasticsearch node's memory usage exceeds 80%, which could result in swapping or node failures. Monitoring memory usage ensures that the cluster is operating within its resource limits and avoids performance degradation.

Customization Guidance:
- Threshold: The default threshold is set to 80%. Adjust based on the memory demands of your cluster.
- Monitoring Period: The default period is 10 minutes. For busy clusters, this could be reduced to detect issues sooner.
- Notification Frequency: Consider the frequency of this alert to optimize the balance between responsiveness and noise. For less critical clusters, this period can be extended.

Action: Increase the memory allocation to nodes or rebalance the workload to prevent overuse.
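Per-node OS memory and heap usage can be read from the _cat/nodes API. A minimal sketch, assuming http://localhost:9200 with no authentication:

    import requests

    # Show OS RAM, JVM heap, and CPU usage for every node.
    resp = requests.get(
        "http://localhost:9200/_cat/nodes",
        params={"format": "json", "h": "name,ram.percent,heap.percent,cpu"},
        timeout=10,
    )
    for node in resp.json():
        print(f"{node['name']}: RAM {node['ram.percent']}%  heap {node['heap.percent']}%  CPU {node['cpu']}%")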

Integration

Learn more about Coralogix's out-of-the-box integration with Elasticsearch in our documentation.
