Quick Start Observability for Google GKE
Coralogix Extension For Google GKE Includes:
Dashboards - 1
Visualize all your Google GKE data instantly.
Alerts - 3
Stay on top of key Google GKE performance metrics. Keep everyone informed through integrations with Slack, PagerDuty, and more.
High CPU utilization >90%
This alert monitors the CPU limit utilization ratio for Kubernetes containers across your Google environments. It is triggered when the summed CPU limit utilization ratio exceeds 90% for a continuous period of 10 minutes, indicating potential over-utilization of CPU resources that may affect performance and availability of your applications.
Customization Guidance:
Threshold: The default threshold is set at 90%. Depending on the specific requirements of your workloads and historical performance data, you might need to adjust this threshold. For CPU-intensive applications that require more headroom, a lower threshold might be necessary to prevent resource contention and ensure smooth operation.
Scope Specificity: Customize the alert for different namespaces, clusters, projects, and locations based on their roles and importance within your infrastructure. Critical environments that support high-priority applications may require more stringent monitoring to avoid performance degradation.
Notification Frequency: Adjust the frequency of notifications to strike a balance between being responsive and avoiding alert fatigue. Consider the criticality of the services hosted in the affected namespaces, clusters, and projects. For high-importance environments, more frequent notifications may be justified to enable swift responses to potential issues.
Action: When this alert is triggered, immediate steps should include reviewing the CPU utilization metrics in the affected namespaces, clusters, and projects. Check for any ongoing CPU-intensive processes or applications that may be driving high utilization. Consider scaling up resources or redistributing workloads to mitigate high CPU usage. Review resource requests and limits to ensure they are appropriately set to prevent over-utilization.
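To make the "above 90% for a continuous period of 10 minutes" condition concrete, here is a minimal Python sketch of how such a sustained-threshold check could be evaluated over a series of samples. It is an illustration only and not Coralogix's alert engine: the function name `breached_continuously`, the `(timestamp, ratio)` sample format, and the choice to treat exactly 10 minutes as sufficient are all assumptions.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def breached_continuously(
    samples: Iterable[Tuple[datetime, float]],
    threshold: float = 0.90,                     # 90% CPU limit utilization
    duration: timedelta = timedelta(minutes=10),
) -> bool:
    """Return True if the ratio stays above `threshold` for at least `duration`.

    `samples` is assumed to be sorted by timestamp (hypothetical input format).
    """
    run_start: Optional[datetime] = None
    for ts, ratio in samples:
        if ratio > threshold:
            if run_start is None:
                run_start = ts                   # start of a new breach run
            if ts - run_start >= duration:
                return True                      # breach has lasted long enough
        else:
            run_start = None                     # any dip resets the run
    return False


if __name__ == "__main__":
    base = datetime(2024, 1, 1, 12, 0)
    # One sample per minute for 11 minutes, all at 95% utilization.
    steady = [(base + timedelta(minutes=i), 0.95) for i in range(11)]
    print(breached_continuously(steady))
```

Lowering `threshold` or shortening `duration` in this sketch corresponds to the more stringent settings suggested above for CPU-intensive or critical workloads.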
Low available ephemeral storage <10 GB
This alert monitors the available ephemeral storage for Kubernetes nodes across your Google environments. It is triggered when the available ephemeral storage falls below 10 GB for a continuous period of 10 minutes, indicating potential storage exhaustion that could disrupt node operations and affect application performance.
Customization Guidance:
Threshold: The default threshold is set to alert when the available ephemeral storage falls below 10 GB. Depending on the specific requirements of your workloads and historical usage data, you may need to adjust this threshold. For nodes handling storage-intensive applications, a higher threshold might be required to ensure they have sufficient headroom to prevent storage-related issues.
Scope Specificity: Customize the alert for different nodes, clusters, projects, and locations based on their roles and importance within your infrastructure. Critical nodes that support high-priority applications may require more stringent monitoring to ensure they maintain sufficient storage to avoid performance degradation.
Notification Frequency: Adjust the frequency of notifications to balance responsiveness and alert fatigue. Consider the criticality of the applications and services hosted on the affected nodes. For high-importance environments, more frequent notifications may be necessary to enable swift responses to potential storage issues.
Action: When this alert is triggered, immediate actions should include reviewing the ephemeral storage usage on the affected nodes. Identify and remove any unnecessary files or data to free up space. Consider increasing the ephemeral storage allocation or redistributing workloads to nodes with sufficient storage capacity. Regularly review storage policies and ensure that resource limits and requests are appropriately set to prevent storage over-utilization.
Pod restart frequency
This alert monitors the restart frequency of Kubernetes pods across your Google environments. It is triggered when a pod restarts more than 3 times within a 5-minute window, indicating potential instability or configuration issues that could affect the availability and reliability of your applications.
Customization Guidance:
Threshold: The default threshold is set to trigger the alert when a pod restarts more than 3 times in a 5-minute period. Depending on the tolerance levels of your specific applications and operational requirements, you may need to adjust this threshold. For critical applications that require high availability and minimal disruptions, a lower threshold might be necessary to promptly detect and address potential issues.
Scope Specificity: Customize the alert for different namespaces, clusters, projects, and locations based on their roles and criticality within your infrastructure. Pods supporting high-priority or mission-critical applications may require more stringent monitoring to avoid performance degradation and ensure consistent service availability.
Notification Frequency: Adjust the notification frequency to balance between responsiveness and avoiding alert fatigue. Consider the importance of the services hosted in the affected namespaces, clusters, and projects. For high-importance environments, more frequent notifications might be justified to enable quick responses to recurring pod restarts.
Action: When this alert is triggered, immediate actions should include investigating the affected pod's logs and events to identify the cause of the restarts. Check for any misconfigurations, resource constraints, or application errors that may be contributing to the frequent restarts. Consider scaling up resources, updating configurations, or deploying fixes to stabilize the pod. Regularly review and optimize your pod configurations to prevent future occurrences.
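The "more than 3 restarts within a 5-minute window" condition can be pictured as a sliding-window count over restart timestamps. The Python sketch below illustrates that idea under stated assumptions: the function name `restarts_exceed`, the list-of-timestamps input, and the inclusive window boundary are hypothetical choices for illustration, not a description of how Coralogix evaluates the alert.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, Iterable


def restarts_exceed(
    restart_times: Iterable[datetime],
    max_restarts: int = 3,                        # alert fires on MORE than this
    window: timedelta = timedelta(minutes=5),
) -> bool:
    """Return True if more than `max_restarts` restarts fall in any `window`."""
    recent: Deque[datetime] = deque()
    for ts in sorted(restart_times):
        recent.append(ts)
        # Drop restarts that are older than the window relative to `ts`.
        while ts - recent[0] > window:
            recent.popleft()
        if len(recent) > max_restarts:
            return True
    return False


if __name__ == "__main__":
    base = datetime(2024, 1, 1, 12, 0)
    # Four restarts, one per minute: exceeds 3 restarts within 5 minutes.
    crash_loop = [base + timedelta(minutes=m) for m in (0, 1, 2, 3)]
    print(restarts_exceed(crash_loop))
```

Passing a smaller `max_restarts` models the lower threshold suggested above for applications that need high availability.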
Integration
Learn more about Coralogix's out-of-the-box integration with Google GKE in our documentation.