Quick Start Observability for AWS EMR
Coralogix Extension For AWS EMR Includes:
Dashboards - 1
Visualize all your AWS EMR data instantly.
Alerts - 7
Stay on top of key AWS EMR performance metrics. Keep everyone informed through integrations with Slack, PagerDuty, and more.
HDFS utilization >80%
This alert triggers when the HDFS utilization in AWS EMR exceeds 80% for a continuous period of 10 minutes.
Customization Guidance:
Threshold: The default threshold is set at 80%. Adjust this threshold based on your application's tolerance for HDFS utilization and historical data.
Cluster Specificity: Customize alerts for different clusters based on their criticality in your infrastructure.
Notification Frequency: Optimize the frequency to balance responsiveness with the potential for alert fatigue.
Action: Investigate the cause of high HDFS utilization by reviewing recent jobs, checking for large file uploads, and optimizing HDFS storage usage.
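The "continuous period of 10 minutes" condition used by these alerts can be illustrated with a minimal sketch. This is not Coralogix's implementation; the function name `sustained_breach` and the sample format (a time-ordered list of timestamp and value pairs) are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

def sustained_breach(samples, threshold, window=timedelta(minutes=10)):
    """Return True if every sample in the trailing window is above threshold.

    samples: list of (timestamp, value) tuples sorted by timestamp (hypothetical format).
    """
    if not samples:
        return False
    end = samples[-1][0]
    start = end - window
    # Require history that covers the whole window before deciding.
    if samples[0][0] > start:
        return False
    recent = [value for ts, value in samples if ts >= start]
    return all(value > threshold for value in recent)

# Example: one HDFS utilization reading per minute for 15 minutes.
t0 = datetime(2024, 1, 1, 0, 0)
steady = [(t0 + timedelta(minutes=i), 85.0) for i in range(15)]

# Same series, but with a dip below 80% inside the last 10 minutes.
dipped = list(steady)
dipped[12] = (dipped[12][0], 70.0)

print(sustained_breach(steady, 80.0))  # all recent readings are above 80%
print(sustained_breach(dipped, 80.0))  # the dip breaks the continuous period
```

A single reading back under the threshold resets the condition, which is why short spikes do not fire an alert defined this way.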
Percentage of running nodes in the multi-master instance group <90%
This alert triggers when the percentage of running nodes in the multi-master instance group falls below 90% for a continuous period of 10 minutes.
Customization Guidance:
Threshold: The default threshold is set at 90%. Adjust this threshold based on your application's tolerance for unavailable nodes and historical data.
Instance Group Specificity: Customize alerts for different instance groups based on their roles in your infrastructure.
Notification Frequency: Optimize the frequency to balance responsiveness with the potential for alert fatigue.
Action: Investigate the cause of the low percentage of running nodes, check for instance failures, and ensure that the instance group is correctly configured and scaled.
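As a rough illustration of the arithmetic behind this alert, the sketch below computes the running-node percentage from node counts and compares it with the 90% default. The function names and the idea of deriving the value from running and requested counts are assumptions for illustration; the extension itself reads the percentage from the EMR metrics.

```python
def running_node_percentage(running: int, requested: int) -> float:
    """Percentage of requested nodes that are currently running."""
    if requested <= 0:
        raise ValueError("requested must be a positive node count")
    return 100.0 * running / requested

def below_running_threshold(running: int, requested: int, threshold: float = 90.0) -> bool:
    """True when the running-node percentage is under the alert threshold."""
    return running_node_percentage(running, requested) < threshold

# Hypothetical three-node primary group with one node down.
print(round(running_node_percentage(2, 3), 2))
print(below_running_threshold(2, 3))
print(below_running_threshold(3, 3))
```

With only three nodes in the group, losing a single node drops the percentage to roughly 67%, so the 90% default effectively means "any node is down".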
Remaining capacity falls below 10 GB
This alert triggers when the remaining capacity in AWS EMR falls below 10 GB for a continuous period of 10 minutes.
Customization Guidance:
Threshold: The default threshold is set at 10 GB. Adjust this threshold based on your application's tolerance for remaining capacity and historical data.
Cluster Specificity: Customize alerts for different clusters based on their criticality in your infrastructure.
Notification Frequency: Optimize the frequency to balance responsiveness with the potential for alert fatigue.
Action: Investigate the cause of low remaining capacity, check for large jobs or data transfers, and optimize resource usage in the cluster.
Available memory falls below 1000 MB
This alert triggers when the available memory in AWS EMR falls below 1000 MB for a continuous period of 10 minutes.
Customization Guidance:
Threshold: The default threshold is set at 1000 MB. Adjust this threshold based on your application's tolerance for available memory and historical data.
Cluster Specificity: Customize alerts for different clusters based on their roles in your infrastructure.
Notification Frequency: Optimize the frequency to balance responsiveness with the potential for alert fatigue.
Action: Investigate the cause of low available memory, check for memory-intensive jobs, and optimize memory usage in the cluster.
Number of missing blocks in the cluster
This alert triggers when the number of missing blocks in the AWS EMR cluster exceeds 0 for a continuous period of 10 minutes.
Customization Guidance:
Threshold: The default threshold is set at 0 missing blocks. Adjust this threshold based on your application's tolerance for missing blocks and historical data.
Cluster Specificity: Customize alerts for different clusters based on their roles in your infrastructure.
Notification Frequency: Optimize the frequency to balance responsiveness with the potential for alert fatigue.
Action: Investigate the cause of missing blocks, check for disk failures, and ensure that data replication is correctly configured.
Total load is high
This alert triggers when the total load in the AWS EMR cluster exceeds 75% for a continuous period of 10 minutes.
Customization Guidance:
Threshold: The default threshold is set at 75%. Adjust this threshold based on your application's tolerance for total load and historical data.
Cluster Specificity: Customize alerts for different clusters based on their roles in your infrastructure.
Notification Frequency: Optimize the frequency to balance responsiveness with the potential for alert fatigue.
Action: Investigate the cause of high total load, check for resource-intensive jobs, and optimize resource usage in the cluster.
Auto termination is cluster idle is high
This alert triggers when the auto-termination idle indicator for an AWS EMR cluster stays high, indicating that the cluster has been idle for a prolonged period.
Customization Guidance:
Threshold: Set the threshold according to your requirements for idle cluster termination, and adjust it based on your application's tolerance for idle clusters and your cost management strategy.
Cluster Specificity: Customize alerts for different clusters based on their roles in your infrastructure.
Notification Frequency: Optimize the frequency to balance responsiveness with the potential for alert fatigue.
Action: Investigate the cause of the idle cluster, check for scheduled tasks, and review the cluster's usage patterns to optimize cost management.
Integration
Learn more about Coralogix's out-of-the-box integration with AWS EMR in our documentation.