
Quick Start Observability for Amazon OpenSearch


Amazon OpenSearch

Coralogix Extension For Amazon OpenSearch Includes:

Dashboards - 1

Gain instantaneous visualization of all your Amazon OpenSearch data.

Amazon OpenSearch Service

Alerts - 10

Stay on top of key Amazon OpenSearch performance metrics. Keep everyone in the know with integrations for Slack, PagerDuty, and more.

OpenSearch Cluster Health Critical - Status Red

This alert is triggered when the OpenSearch cluster enters a red health status, which indicates that some primary shards are not allocated. This can severely impact the availability of the OpenSearch service. The alert monitors the overall health of the OpenSearch cluster and is activated when the health status is red for more than 1 minute.

Customization Guidance:
- Threshold: The default is to alert when the cluster health is red for 1 minute. This can be adjusted based on the criticality of the cluster.
- Monitoring Period: This period can be increased for less critical systems where short-term unavailability might not be problematic.
- Notification Frequency: Consider the frequency of this alert to balance responsiveness and noise. Adjust according to how critical the cluster's uninterrupted operation is.

Action: Investigate shard allocation issues, node failures, or insufficient resources, and address them promptly to restore the cluster to a healthy state.
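As a concrete illustration, the red-status condition this alert encodes can be evaluated against the `_cluster/health` API response. A minimal Python sketch, using an illustrative sample payload rather than a live call (the field names follow the OpenSearch cluster health API; the values are invented):

```python
import json

# Illustrative _cluster/health response; values are made up, not from a live cluster.
SAMPLE_HEALTH = json.loads("""
{
  "cluster_name": "demo-cluster",
  "status": "red",
  "number_of_nodes": 3,
  "active_primary_shards": 10,
  "unassigned_shards": 4
}
""")

def cluster_is_red(health: dict) -> bool:
    """A red status means at least one primary shard is unallocated."""
    return health.get("status") == "red"

print(cluster_is_red(SAMPLE_HEALTH))  # True for this sample
```

In a real monitor, the payload would come from `GET _cluster/health` and the check would be held for the configured 1-minute window before firing.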

OpenSearch Cluster Health Critical - Status Yellow

This alert is triggered when the OpenSearch cluster enters a yellow health status, which means that some replica shards are unallocated but all primary shards are assigned. The system is operational, but redundancy is compromised. The alert monitors the overall health of the OpenSearch cluster and is activated when the health status is yellow for more than 1 minute.

Customization Guidance:
- Threshold: The default is to alert when the cluster health is yellow for 1 minute. You can adjust the threshold based on how critical it is to maintain full redundancy.
- Monitoring Period: This period can be increased for less critical systems where short-term unavailability might not be problematic.
- Notification Frequency: Consider the frequency of this alert to balance responsiveness and noise. For less critical clusters, this period can be extended.

Action: Review the replica shard assignment or node resource availability to restore full redundancy.

OpenSearch Node High Disk Usage

This alert is triggered when an OpenSearch node's disk usage exceeds 80%, which can lead to write failures and degraded performance. Monitoring high disk usage is essential to avoid unavailability or write issues caused by full disks.

Customization Guidance:
- Threshold: The default threshold is set to 80%. You can adjust the threshold based on how conservative you want to be with disk space.
- Monitoring Period: The default period is 10 minutes. For busy clusters, this can be reduced to detect issues sooner.
- Notification Frequency: Consider the frequency of this alert to balance responsiveness and noise. For less critical clusters, this period can be extended.

Action: Investigate the nodes with high disk usage and consider expanding disk capacity or rebalancing shards to other nodes.
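The 80% condition can be derived from the per-node filesystem stats (`total_in_bytes` and `available_in_bytes` in the `_nodes/stats/fs` response). A small sketch with invented numbers:

```python
def disk_used_percent(total_bytes: int, available_bytes: int) -> float:
    """Percentage of disk consumed on a node, from the node stats fs totals."""
    return 100.0 * (total_bytes - available_bytes) / total_bytes

# Illustrative node: a 500 GB volume with 75 GB free is 85% used,
# which is above the 80% default threshold.
GB = 1024 ** 3
used = disk_used_percent(500 * GB, 75 * GB)
print(round(used, 1), used > 80.0)  # 85.0 True
```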

OpenSearch Cluster Nodes Down

This alert triggers when the number of available nodes in the OpenSearch cluster drops below the expected count, which can impact cluster performance and shard distribution. Monitoring node availability ensures the stability and redundancy of the OpenSearch cluster.

Customization Guidance:
- Threshold: By default, the alert triggers when any node is down. Adjust the threshold based on your cluster's criticality.
- Monitoring Period: This period can be extended for less critical systems where brief node outages are tolerable.
- Notification Frequency: Consider the frequency of this alert to balance responsiveness and noise. For less critical clusters, this period can be extended.

Action: Investigate the reason for node failure (e.g., resource exhaustion, network issues) and take steps to restore node availability.
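The condition reduces to comparing the `number_of_nodes` field from `_cluster/health` against the cluster size you expect. A hedged sketch (the expected count of 5 is an assumption for illustration):

```python
def nodes_down(expected_nodes: int, reported_nodes: int) -> int:
    """Missing-node count, given the expected cluster size and the
    number_of_nodes field reported by _cluster/health."""
    return max(expected_nodes - reported_nodes, 0)

# Example: a 5-node cluster where _cluster/health reports only 3 live nodes.
missing = nodes_down(expected_nodes=5, reported_nodes=3)
print(missing, missing > 0)  # 2 True -> with the default threshold, the alert fires
```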

OpenSearch Shard Allocation Failures

This alert triggers when OpenSearch fails to allocate one or more shards, which can lead to degraded performance or unavailability of data. Monitoring shard allocation failures helps in detecting and mitigating issues with resource allocation or cluster configuration.

Customization Guidance:
- Threshold: By default, this alert triggers for any shard allocation failure. Adjust as needed based on the impact of missing shards on your system.
- Monitoring Period: This period can be adjusted based on your cluster's criticality.
- Notification Frequency: Consider the frequency of this alert to balance responsiveness and noise. For less critical clusters, this period can be extended.

Action: Review the allocation rules and resources available to the cluster to resolve shard allocation failures.

OpenSearch High JVM Heap Usage

This alert triggers when the JVM heap memory usage on an OpenSearch node exceeds 80%, potentially leading to garbage collection overhead and degraded performance. Monitoring heap usage helps ensure that OpenSearch nodes are not overloaded, which could lead to memory-related failures.

Customization Guidance:
- Threshold: The default threshold is set to 80%. For performance-critical clusters, you may want to set this lower.
- Monitoring Period: The default period is 10 minutes. For busy clusters, this can be reduced to detect issues sooner.
- Notification Frequency: Consider the frequency of this alert to balance responsiveness and noise. For less critical clusters, this period can be extended.

Action: Investigate memory usage on the affected node. Consider increasing the JVM heap size or optimizing queries and shard allocation.
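Per-node heap usage is exposed as `jvm.mem.heap_used_percent` in the `_nodes/stats/jvm` response. A sketch that picks out the nodes above the default threshold, using an invented sample payload (node ids and values are illustrative):

```python
# Illustrative slice of a _nodes/stats/jvm response; values are invented.
SAMPLE_NODES = {
    "nodes": {
        "node-a": {"jvm": {"mem": {"heap_used_percent": 85}}},
        "node-b": {"jvm": {"mem": {"heap_used_percent": 42}}},
    }
}

def nodes_over_heap_threshold(stats: dict, threshold: int = 80) -> list:
    """Node ids whose JVM heap usage exceeds the threshold percentage."""
    return [
        node_id
        for node_id, node in stats["nodes"].items()
        if node["jvm"]["mem"]["heap_used_percent"] > threshold
    ]

print(nodes_over_heap_threshold(SAMPLE_NODES))  # ['node-a']
```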

OpenSearch Cluster Relocations

This alert triggers when OpenSearch is relocating shards between nodes, which can be an indication of cluster rebalancing or resource constraints. Monitoring relocations helps ensure that nodes are balanced and operating efficiently.

Customization Guidance:
- Threshold: This alert triggers for any shard relocation. Adjust based on how much rebalancing your system normally experiences.
- Monitoring Period: This period can be adjusted based on your cluster's criticality.
- Notification Frequency: Consider the frequency of this alert to balance responsiveness and noise. For less critical clusters, this period can be extended.

Action: Review the shard allocation and cluster configuration to optimize for balance and performance.

OpenSearch Unassigned Shards

This alert is triggered when OpenSearch has unassigned shards, which can impact the availability of data and the performance of the cluster. Monitoring unassigned shards helps maintain the stability and reliability of the OpenSearch cluster.

Customization Guidance:
- Threshold: This alert triggers for any unassigned shards. Adjust based on your tolerance for incomplete redundancy.
- Monitoring Period: This period can be adjusted based on your cluster's criticality.
- Notification Frequency: Consider the frequency of this alert to balance responsiveness and noise. For less critical clusters, this period can be extended.

Action: Investigate the cause of unassigned shards, such as node failures or resource shortages, and reallocate or adjust resources accordingly.
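The trigger maps directly onto the `unassigned_shards` counter in the `_cluster/health` response. A minimal sketch against sample payloads (values invented):

```python
def has_unassigned_shards(health: dict) -> bool:
    """True when a _cluster/health response reports any unassigned shards."""
    return health.get("unassigned_shards", 0) > 0

# Illustrative responses: a yellow cluster with 2 unassigned replicas
# should fire the alert; a fully allocated green cluster should not.
print(has_unassigned_shards({"status": "yellow", "unassigned_shards": 2}))  # True
print(has_unassigned_shards({"status": "green", "unassigned_shards": 0}))   # False
```

For root-cause analysis, the `_cluster/allocation/explain` API reports why a specific shard remains unassigned.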

OpenSearch Cluster High CPU Usage

This alert is triggered when the CPU usage on an OpenSearch node exceeds 80%, which can lead to degraded query performance and slower indexing operations. Monitoring CPU usage is important to detect resource exhaustion and to ensure optimal performance.

Customization Guidance:
- Threshold: The default threshold is set to 80%. Adjust based on the workload and capacity of your cluster.
- Monitoring Period: The default period is 10 minutes. For busy clusters, this can be reduced to detect issues sooner.
- Notification Frequency: Consider the frequency of this alert to balance responsiveness and noise. For less critical clusters, this period can be extended.

Action: Investigate the queries or tasks consuming excessive CPU and consider optimizing cluster performance or adding more nodes.

OpenSearch Cluster High Memory Usage

This alert is triggered when an OpenSearch node's memory usage exceeds 80%, which can result in swapping or node failures. Monitoring memory usage ensures that the cluster is operating within its resource limits and avoids performance degradation.

Customization Guidance:
- Threshold: The default threshold is set to 80%. Adjust based on the memory demands of your cluster.
- Monitoring Period: The default period is 10 minutes. For busy clusters, this can be reduced to detect issues sooner.
- Notification Frequency: Consider the frequency of this alert to balance responsiveness and noise. For less critical clusters, this period can be extended.

Action: Increase the memory allocation to nodes or rebalance the workload to prevent overuse.

Integration

Learn more about Coralogix's out-of-the-box integration with Amazon OpenSearch in our documentation.

Read More