Top 6 Kubernetes Metrics & How to Monitor Them

What Are Kubernetes Metrics?

Kubernetes metrics are data points that provide insights into the performance and health of Kubernetes clusters. They help administrators understand the availability, utilization, and overall operation of resources spanning nodes, pods, and containers managed within a Kubernetes environment. 

Analyzing these metrics allows for optimized resource allocation and improved system performance. Tools like Prometheus and Grafana are commonly used for monitoring these metrics in real time.

Metrics collected from Kubernetes deployments include CPU usage, memory consumption, network IO, and storage details. These metrics are useful for maintaining cluster health by identifying bottlenecks, ensuring efficient resource usage, and triggering scaling actions. 

This is part of a series of articles about Kubernetes Monitoring.

Why Is Monitoring Kubernetes So Important?

Monitoring Kubernetes ensures that clusters operate efficiently and remain healthy. Continuous monitoring helps detect and resolve issues before they impact service availability or performance. It also aids in capacity planning by providing visibility into when and where resources are maxed out, which supports proactive scaling and resource management.

Monitoring also supports security compliance and governance by logging and analyzing cluster activity. This is important for spotting unusual or unauthorized activities that could indicate security incidents. It helps ensure that Kubernetes deployments adhere to best practices for security, reducing the risk of vulnerabilities.

Types of Kubernetes Metrics to Monitor

Kubernetes monitoring should track metrics related to several components.

Cluster Metrics

Kubernetes cluster metrics provide a high-level overview of the cluster’s health and performance. These include cluster-wide resource usage like total CPU and memory consumption, which can help in understanding the workload distribution and identifying resource-intensive applications. 

Metrics such as node availability and cluster state changes are also monitored to ensure high availability and fault tolerance. Cluster metrics also include performance metrics like request rates, error rates, and latencies. These can be helpful in identifying performance trends over time and making data-driven decisions about scaling and resource allocation.

Control Plane Metrics

Control plane metrics focus on the components that manage the cluster itself, such as the API Server, etcd, the Scheduler, and the Controller Manager. Monitoring these metrics is important because these components manage the state and configuration data of all Kubernetes objects.

For example, metrics related to etcd database performance and API Server call latencies can impact the overall responsiveness and stability of the Kubernetes control plane. Monitoring these components helps in detecting and analyzing issues that affect the orchestration and operational aspects of Kubernetes.
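As an illustration, if Prometheus is scraping the control plane, a rule along these lines could alert on slow API Server calls. The group and alert names are illustrative; apiserver_request_duration_seconds_bucket is the API Server's request-latency histogram, and the 1-second threshold is an arbitrary example:

```yaml
groups:
  - name: control-plane.rules    # group name is arbitrary
    rules:
      - alert: ApiServerSlowRequests
        # 99th-percentile API Server request latency over 5 minutes, per verb
        expr: |
          histogram_quantile(0.99,
            sum by (le, verb) (rate(apiserver_request_duration_seconds_bucket[5m]))
          ) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "API server p99 latency above 1s for verb {{ $labels.verb }}"
```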

Node Metrics

Node metrics in Kubernetes provide information about the performance and status of the worker nodes. These include CPU, memory, disk, and network metrics for each node, useful for identifying underperforming nodes or potential failures. Monitoring node status, readiness, and condition can help ensure that nodes are functioning correctly and are healthy.

This type of monitoring helps in managing the node lifecycle, from scaling up operations to replacing nodes that are continuously problematic. This aids in maintaining the normal functioning of the cluster.

Pod Metrics

Pod metrics are data points related to the operational aspects of pods within Kubernetes. This includes resource usage metrics such as CPU and memory utilization per pod, restart counts, and up/down status. These metrics are useful for troubleshooting individual pods and understanding their performance within the cluster.

By monitoring pod metrics, administrators can ensure that applications run smoothly and adhere to the configured resource limits and requests, preventing resource contention and ensuring reliable application performance.
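For example, restart counts can be alerted on with a rule like the following sketch, assuming kube-state-metrics is deployed (it exposes kube_pod_container_status_restarts_total); the names and the threshold of 3 restarts are illustrative:

```yaml
groups:
  - name: pod.rules    # group name is arbitrary
    rules:
      - alert: PodRestartingFrequently
        # more than 3 container restarts in the last 15 minutes
        expr: |
          increase(kube_pod_container_status_restarts_total[15m]) > 3
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting frequently"
```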

Application Metrics

Application metrics focus on the performance of the applications running inside the pods. These could include transaction volumes, application response times, and custom application performance indicators. Application Performance Monitoring (APM) tools are used to gather these metrics, providing insights related to application health.

Monitoring application metrics allows developers and operations teams to fine-tune applications, improve response times, and enhance the overall customer experience. They help ensure applications are performing as expected and efficiently using underlying resources.

Top 6 Kubernetes Metrics to Monitor 

Here are some of the most important metrics to track in Kubernetes.

1. CPU / Memory Requests

CPU and memory requests are the guaranteed amounts of these resources that a container will receive. Monitoring these metrics helps ensure that containers get the resources they need to run properly. 

Why this metric is important: If requests are set too low, containers may not get enough resources, leading to poor performance. Setting them too high can lead to wasted resources. Tracking these metrics allows administrators to adjust resource requests dynamically, optimizing the performance and efficiency of the cluster.
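To make this concrete, requests are declared per container in the pod spec. A minimal sketch (the pod name, image, and values are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web           # example name
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: "250m"      # guaranteed quarter of a CPU core
          memory: "128Mi"  # guaranteed 128 MiB of memory
```

The scheduler only places this pod on a node with at least 250m CPU and 128Mi memory unreserved, which is why comparing requested totals against node capacity matters.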

2. CPU / Memory Limit

CPU and memory limits define the maximum resources a container can use. Monitoring these metrics helps prevent any single container from consuming all the resources on a node, which could affect other containers running on the same node. 

Why this metric is important: By monitoring CPU and memory limits, administrators can identify containers that frequently hit their resource limits. These limits may need to be scaled or optimized to ensure fair resource distribution and maintain cluster stability.
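Limits are declared alongside requests in the same resources block. A minimal container fragment as an illustration (values are hypothetical; a limit must be at least as large as the corresponding request):

```yaml
      resources:
        requests:
          cpu: "250m"
          memory: "128Mi"
        limits:
          cpu: "500m"      # CPU beyond this is throttled
          memory: "256Mi"  # exceeding this gets the container OOM-killed
```

A container that repeatedly approaches these ceilings is a candidate for raising the limits or optimizing the workload.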

3. Desired vs. Current Replicas

This metric compares the number of replicas specified in the deployment configuration (desired replicas) to the number of replicas currently running (current replicas). Monitoring this helps ensure that the application is running the intended number of instances, which is crucial for maintaining high availability and load distribution. 

Why this metric is important: Discrepancies between desired and current replicas can indicate issues with pod scheduling, resource shortages, or other operational problems that need immediate attention.
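Assuming kube-state-metrics is deployed, the desired and current counts are exposed as kube_deployment_spec_replicas and kube_deployment_status_replicas_available, and a mismatch can be alerted on with a rule like this sketch (names and the 10-minute window are illustrative):

```yaml
groups:
  - name: replicas.rules    # group name is arbitrary
    rules:
      - alert: DeploymentReplicasMismatch
        # desired replicas have not matched available replicas for 10 minutes
        expr: |
          kube_deployment_spec_replicas
            != kube_deployment_status_replicas_available
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} replica mismatch"
```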

4. Resource Requests vs. Limits

This metric provides an understanding of how resources are being utilized relative to what was allocated. Monitoring this helps identify resource contention and underutilization. 

Why this metric is important: By analyzing resource requests vs limits metrics, administrators can optimize resource allocation, ensuring that limits are set appropriately to maximize efficiency without overcommitting resources. This enables more balanced use of the cluster’s capabilities.
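One way to track this is with Prometheus recording rules that divide actual usage by what was requested or allowed. This sketch assumes cAdvisor metrics (container_cpu_usage_seconds_total, container_memory_working_set_bytes) and recent kube-state-metrics (which labels kube_pod_container_resource_requests/limits with resource="cpu" or resource="memory"; older versions used differently named metrics). The record names are hypothetical:

```yaml
groups:
  - name: utilization.rules    # group name is arbitrary
    rules:
      # CPU actually used vs. CPU requested, per pod
      - record: pod:cpu_request_utilization:ratio
        expr: |
          sum by (namespace, pod) (rate(container_cpu_usage_seconds_total[5m]))
          /
          sum by (namespace, pod) (kube_pod_container_resource_requests{resource="cpu"})
      # Memory working set vs. memory limit, per pod
      - record: pod:memory_limit_utilization:ratio
        expr: |
          sum by (namespace, pod) (container_memory_working_set_bytes)
          /
          sum by (namespace, pod) (kube_pod_container_resource_limits{resource="memory"})
```

A ratio well below 1 suggests over-provisioned requests; a ratio approaching 1 against limits warns of throttling or OOM kills.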

5. Network I/O

Network I/O metrics track the amount of data being transmitted and received by containers. Monitoring these metrics is essential for understanding the network load and identifying potential bottlenecks. 

Why this metric is important: High network I/O can indicate data-intensive applications that might need dedicated resources or optimization. Low network I/O can indicate underutilized resources. These metrics help ensure that network traffic is managed properly, supporting the overall performance and reliability of applications.
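Per-pod network throughput can be derived from the cAdvisor counters container_network_receive_bytes_total and container_network_transmit_bytes_total, for example with a recording rule like this sketch (the record name is hypothetical):

```yaml
groups:
  - name: network.rules    # group name is arbitrary
    rules:
      # combined receive + transmit bytes per second, per pod, over 5 minutes
      - record: pod:network_bytes:rate5m
        expr: |
          sum by (namespace, pod) (
            rate(container_network_receive_bytes_total[5m])
            + rate(container_network_transmit_bytes_total[5m])
          )
```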

6. Scheduling Attempts

The scheduling attempt metric reveals the number of times the Kubernetes scheduler tries to place a pod on a node. High numbers of scheduling attempts can indicate issues with resource availability or constraints that prevent pods from being scheduled successfully. 

Why this metric is important: Monitoring this metric helps identify problems in the scheduling process, such as insufficient resources, node taints, or affinity/anti-affinity rules. This allows administrators to take corrective actions to improve the efficiency and reliability of the scheduling process.
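If Prometheus scrapes the kube-scheduler, failed placements can be surfaced with a rule along these lines. This assumes the scheduler_schedule_attempts_total counter with its result label (scheduled, unschedulable, error), which is present in many scheduler versions but may differ in yours; the names are illustrative:

```yaml
groups:
  - name: scheduler.rules    # group name is arbitrary
    rules:
      - alert: PodsUnschedulable
        # any scheduling attempts ending as unschedulable in the last 5 minutes
        expr: |
          sum(rate(scheduler_schedule_attempts_total{result="unschedulable"}[5m])) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Scheduler is failing to place pods (unschedulable attempts detected)"
```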

How to Monitor Kubernetes Metrics 

Kubernetes metrics can be monitored natively in Kubernetes or via third-party tools.

Kubernetes Metrics Server

The Kubernetes Metrics Server is a lightweight, scalable source of container resource metrics for Kubernetes built-in autoscaling pipelines, such as the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler. It collects metrics from the Kubelets on each node and provides an aggregated view of resource usage across the cluster.
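For instance, once the Metrics Server is running, a Horizontal Pod Autoscaler can consume its CPU metrics. A minimal manifest using the stable autoscaling/v2 API (the target name and values are hypothetical):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web            # example name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web          # deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% of requested CPU
```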

To deploy the Metrics Server, teams can use the provided manifests or Helm charts. Once deployed, it periodically polls the Kubelets for CPU and memory usage statistics from each node and pod, storing this data in memory. This allows for real-time monitoring without significant overhead.

The Metrics Server can be used to query the current resource usage of pods and nodes with the kubectl top command. For example, kubectl top nodes displays the CPU and memory usage for each node, while kubectl top pods shows the same metrics for each pod.

Third-Party Kubernetes Monitoring Tools

While the Kubernetes Metrics Server is useful for basic monitoring, it has limitations that might require the use of more advanced third-party tools. The main limitation of the Metrics Server is its focus on real-time metrics with minimal historical data retention. This can be insufficient for in-depth analysis, trend forecasting, and long-term capacity planning.

Benefits of using third-party tools include:

  1. Comprehensive data retention: Third-party tools often provide extensive historical data retention, enabling long-term analysis and trend detection.
  2. Enhanced visualization: These tools offer advanced visualization capabilities, allowing for the creation of detailed, custom dashboards that provide a clear view of a cluster’s performance over time.
  3. Advanced alerting: They come with powerful alerting mechanisms that can notify admins of potential issues before they escalate, based on custom thresholds and complex conditions.
  4. Detailed metrics: Third-party solutions typically collect a broader range of metrics, including custom application metrics, which give a fuller view of the cluster’s health and performance.
  5. Scalability: These tools can handle the complexities of large, dynamic environments, making them suitable for monitoring extensive Kubernetes deployments with numerous nodes and pods.
  6. Integration and extensibility: They often integrate with various other tools and platforms, providing a more unified and extensible monitoring solution that can incorporate logs, traces, and other observability data.

Third-party monitoring tools help achieve a more scalable and actionable view of the Kubernetes environment, ensuring optimal performance and reliability.

Related content: Read our guide to Kubernetes Monitoring Tools

Coralogix for Kubernetes Observability

Coralogix sets itself apart in observability with its modern architecture, enabling real-time insights into logs, metrics, and traces with built-in cost optimization. Coralogix’s straightforward pricing covers all its platform offerings including APM, RUM, SIEM, infrastructure monitoring and much more. With unparalleled support that features less than 1 minute response times and 1 hour resolution times, Coralogix is a leading choice for thousands of organizations across the globe.

Learn more about Coralogix for Kubernetes
