
An Introduction to Kubernetes Observability

  • Coralogix
  • July 14, 2022

If your organization is embracing cloud-native practices, then breaking systems into smaller components or services and moving those services to containers is an essential step in that journey. 

Containers allow you to take advantage of cloud-hosted distributed infrastructure, move and replicate services as required to ensure your application can meet demand, and take instances offline when they’re no longer needed to save costs.

Once you’re dealing with more than a handful of containers in production, a container orchestration platform becomes practically essential. Kubernetes, or K8s for short, has become the de facto standard for container orchestration, with all major cloud providers offering K8s support and their own managed Kubernetes service.

With Kubernetes, you can automate your containers’ deployment, management, and scaling, making it possible to work with hundreds or thousands of containers and ensure reliable and resilient service. 

Fundamental to the design of Kubernetes is its declarative model: you define what you want the state of your system to be, and Kubernetes works to ensure that the cluster meets those requirements, automatically adding, removing, or replacing pods (the wrappers around one or more containers) as required.
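
To make the declarative model concrete, here is a minimal sketch of a reconciliation loop in Python. It is a toy illustration, not how Kubernetes is implemented: the `Cluster` class, the `reconcile` function, and the `web-N` pod names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for cluster state: a list of running pod names."""
    pods: list = field(default_factory=list)
    next_id: int = 1


def reconcile(cluster: Cluster, desired_replicas: int) -> list:
    """Compare actual state with desired state and return the actions taken."""
    actions = []
    # Too few pods: create new ones until the desired count is reached.
    while len(cluster.pods) < desired_replicas:
        name = f"web-{cluster.next_id}"
        cluster.next_id += 1
        cluster.pods.append(name)
        actions.append(f"create {name}")
    # Too many pods: remove the most recently added ones.
    while len(cluster.pods) > desired_replicas:
        actions.append(f"delete {cluster.pods.pop()}")
    return actions


if __name__ == "__main__":
    cluster = Cluster()
    print(reconcile(cluster, 3))    # initial rollout
    cluster.pods.remove("web-2")    # simulate a failed pod
    print(reconcile(cluster, 3))    # the loop replaces it
```

The caller only ever states the desired replica count. The loop works out which actions are needed, which is the essence of the declarative approach.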

The self-healing design can give the impression that observability and monitoring are all taken care of when you deploy with Kubernetes. Unfortunately, that’s not the case. While some things are handled automatically – like replacing failed pods or scaling services – Kubernetes observability still needs to be built in and used to ensure the health and performance of a K8s deployment.

Log data plays a central role in creating an observable system. By monitoring logs in real-time, you gain a better understanding of how your system is operating and can be proactive in addressing issues as they emerge, before they cause any real damage. This article will look at how Kubernetes observability can be built into your Kubernetes-managed cluster, starting at the bottom of the stack.

Observability for K8s infrastructure

As a container orchestration platform, Kubernetes handles the containers running your application workloads but doesn’t manage the underlying infrastructure that hosts those containers. 

A Kubernetes cluster consists of multiple physical and/or virtual machines (the cluster nodes) connected over a network. While Kubernetes will take care of deploying containers to the nodes (according to the declared configuration) and packing them efficiently, it cannot manage the nodes’ health.

In a public cloud context, your cloud provider is responsible for keeping servers online and providing computing resources on demand. However, to avoid the risk of a huge bill, you’ll want to keep an eye on your usage – and potentially set quotas – to prevent auto-scaling and elastic resources from running wild. If you’ve set quotas, you’ll need to monitor your usage and be ready to provision additional capacity as demand grows.

If you’re running Kubernetes on a private cloud or on-premises infrastructure, monitoring the health of your servers – including disk space, memory, and CPU – and keeping them patched and up-to-date is essential.

Although Kubernetes will take care of moving pods to healthy nodes if a machine fails, with a fixed set of resources, that approach can only stretch so far before running out of server nodes. To use Kubernetes’ self-healing and auto-scaling features to the best effect, you must ensure sufficient cluster nodes are online and available at all times.
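
One rough way to reason about this is to check whether the workload would still fit if the largest node went offline. The sketch below is illustrative only: the function name, the node names, and the CPU figures (in millicores) are hypothetical. It also compares totals and ignores how pods pack onto individual nodes, so a positive result is necessary but not sufficient.

```python
def headroom_after_node_loss(node_cpu_millicores: dict, requested_millicores: int) -> int:
    """Return the spare CPU (in millicores) left if the largest node fails.

    A negative result means the remaining nodes cannot hold the requested CPU.
    """
    if not node_cpu_millicores:
        raise ValueError("at least one node is required")
    total = sum(node_cpu_millicores.values())
    largest = max(node_cpu_millicores.values())
    return total - largest - requested_millicores


if __name__ == "__main__":
    nodes = {"node-a": 4000, "node-b": 4000, "node-c": 2000}  # made-up capacities
    spare = headroom_after_node_loss(nodes, requested_millicores=5500)
    status = "OK" if spare >= 0 else "at risk"
    print(f"spare capacity after losing the largest node: {spare}m ({status})")
```

Tracking a figure like this over time gives an early signal that more nodes need to be provisioned before a failure exposes the shortfall.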

Using Kubernetes’ metrics and logs

Once you’ve considered the observability of the servers hosting your Kubernetes cluster, the next layer to consider is the Kubernetes deployment itself. 

Although Kubernetes is self-healing, it is still dependent on the configuration you specify; by getting visibility into how your cluster is being used, you can identify misconfigurations, such as faulty replica sets, and spot opportunities to streamline your setup, like underused nodes.

As you might expect, the various components of Kubernetes each emit log messages so that the inner workings of the system can be observed. This includes:

  • kube-apiserver – This serves the REST API that allows you, as an end user, to communicate with the cluster via kubectl or a GUI application, and through which the other control plane components communicate with each other. The API server logs include details of error messages and requests. Monitoring these logs can alert you to early signs of the server needing to be scaled out to accommodate increased load or issues down the pipeline that are slowing down the processing of incoming requests.
  • kube-scheduler – The scheduler assigns pods to cluster nodes according to configuration rules and node availability. Unexpected changes in the number of pods assigned could signify a misconfiguration or issues with the infrastructure hosting the pods.
  • kube-controller-manager – This runs the controller processes. Controllers are responsible for monitoring the status of the different elements in a cluster, such as nodes or endpoints, and moving them to the desired state when needed. By monitoring the controller manager over time, you can determine a baseline for normal operations and use that information to spot increases in latency or retries, which may indicate that something is not working as expected.
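
The idea of comparing new measurements against a baseline can be sketched with a simple rolling check. This is one possible approach, not a recommendation from the Kubernetes project: the function name, window size, threshold, and sample latencies are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return the indexes of samples that sit more than `threshold` standard
    deviations above the mean of the `window` samples before them."""
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # A perfectly flat baseline is skipped to avoid dividing by zero.
        if sigma > 0 and (samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Made-up latencies in milliseconds, with one obvious spike.
    latencies_ms = [100, 102, 98, 101, 99, 100, 250, 101]
    print(find_anomalies(latencies_ms))
```

A production setup would usually rely on the alerting features of a monitoring platform, but the principle is the same: learn what normal looks like, then flag departures from it.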

The Kubernetes logging library, klog, generates log messages for these system components and others, such as kubelet. Configuring the log verbosity lets you control how much detail is logged: only the most important messages, or progressively more detailed informational and debug output as well.
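
Before analyzing these logs, it helps to split each line into fields. The sketch below assumes klog's default text format, where each line starts with a header of the form `Lmmdd hh:mm:ss.uuuuuu threadid file:line]` followed by the message. The function name and the sample line are made up for illustration, and components configured for a different output format would need a different parser.

```python
import re

# Header layout assumed: severity letter, month and day, time, thread id, file:line.
KLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)

SEVERITIES = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}


def parse_klog(line: str):
    """Parse one klog text line into a dict, or return None if it doesn't match."""
    match = KLOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    record = match.groupdict()
    record["severity"] = SEVERITIES[record["severity"]]
    record["line"] = int(record["line"])
    return record


if __name__ == "__main__":
    sample = "E0714 10:15:30.123456       1 reflector.go:138] Failed to watch pods"
    print(parse_klog(sample))
```

Once the severity is a separate field, it becomes straightforward to count errors per component or to filter out routine informational messages.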

While you can view log messages from the Kubernetes CLI, kubectl, forwarding logs to a central platform allows you to gain deeper insights. By building up a picture of the log data over time, you can identify trends and compare these to the latest data in real-time, using it to identify changes in cluster behavior.

Monitoring a Kubernetes-hosted application

In addition to the cluster-level logging, you need to generate logs at the application level for full observability of your system. Kubernetes ensures your services are available, but it has no visibility into or understanding of your application logic.

Instrumenting your code to generate logs at appropriate severity levels makes it possible to understand how your application is behaving at runtime and can provide essential clues when debugging failures or investigating security issues.
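
As an illustration, the following sketch uses Python's standard `logging` module to emit one JSON object per line on standard output, where a container runtime can capture it. The `JsonFormatter` class, the `build_logger` helper, the `orders` logger name, and the field names are assumptions for this example rather than a required schema.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(name: str, level: int = logging.INFO, stream=sys.stdout) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.handlers.clear()          # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("orders")
    log.debug("cache lookup for cart %s", "c-17")    # suppressed at INFO level
    log.info("order %s completed", "o-1001")
    log.error("payment provider timed out for order %s", "o-1002")
```

Writing structured records at consistent severity levels makes it much easier for a downstream platform to filter, aggregate, and alert on them.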

Once you’ve enabled logging in your application, the next step is to ensure those logs are stored and available for analysis. By their very nature, containers are ephemeral – spun up and taken offline as demand requires.

Kubernetes keeps the logs for a pod’s current containers on a given node and, if a container has restarted, for its previous instance. However, if a container restarts multiple times, or the pod is evicted from the node, the earlier log data is lost.

As log data is essential for determining what normal behavior looks like, investigating past incidents, and meeting audit requirements, it’s a good idea to consider shipping logs to a centralized platform for storage and analysis.

The two main patterns for shipping logs from Kubernetes are to use either a node logging agent or a sidecar logging agent:

  • With a node logging agent, the agent is installed on the cluster node (the physical or virtual server) and forwards the logs for all pods on that node.
  • With a sidecar logging agent, each pod holds the application container together with a sidecar container hosting the logging agent. The agent forwards all logs from the container.
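
Whichever pattern you choose, the agent has to read the raw container log lines before forwarding them. The sketch below assumes the CRI log format written by runtimes such as containerd, where each line holds a timestamp, the stream name, a tag (`P` for a partial line, `F` for the final part), and the message. The function name and sample lines are hypothetical, and a real agent such as Fluentd or Fluent Bit handles far more than this.

```python
def parse_cri_lines(lines):
    """Yield (timestamp, stream, message) tuples, joining partial lines.

    The timestamp reported is that of the final chunk of each message.
    """
    buffers = {}  # stream name -> message chunks collected so far
    for raw in lines:
        parts = raw.rstrip("\n").split(" ", 3)
        if len(parts) < 3:
            continue  # not a CRI-formatted line; skip it
        timestamp, stream, tag = parts[:3]
        message = parts[3] if len(parts) == 4 else ""
        buffers.setdefault(stream, []).append(message)
        if tag == "F":  # final chunk: emit the assembled message
            yield timestamp, stream, "".join(buffers.pop(stream))


if __name__ == "__main__":
    sample = [
        "2022-07-14T10:00:00.000000001Z stdout P first half, ",
        "2022-07-14T10:00:00.000000002Z stderr F boom",
        "2022-07-14T10:00:00.000000003Z stdout F second half",
    ]
    for record in parse_cri_lines(sample):
        print(record)
```

Buffering per stream keeps interleaved stdout and stderr output from being merged into the same message.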

Once you’ve forwarded your application logs to a log observability platform, you can start analyzing the log data in real-time. Tracking business metrics, such as completed transactions or order quantities, can help to spot unusual patterns as they begin to emerge. 
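
As a small example of deriving a business metric from logs, the sketch below counts completed orders per minute from JSON log lines. The `event` and `time` field names, the `order_completed` event, and the sample records are assumptions for illustration. Your application's log schema will differ.

```python
import json
from collections import Counter


def events_per_minute(log_lines, event="order_completed"):
    """Count matching events per minute, keyed by 'YYYY-MM-DDTHH:MM'."""
    counts = Counter()
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if not isinstance(record, dict) or record.get("event") != event:
            continue
        # Assumes ISO 8601 timestamps; the first 16 characters identify the minute.
        minute = str(record.get("time", ""))[:16]
        counts[minute] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        '{"time": "2022-07-14T10:15:05Z", "event": "order_completed"}',
        '{"time": "2022-07-14T10:15:40Z", "event": "order_completed"}',
        '{"time": "2022-07-14T10:16:01Z", "event": "order_completed"}',
        '{"time": "2022-07-14T10:16:02Z", "event": "login"}',
    ]
    print(events_per_minute(sample))
```

A sudden drop in a count like this can reveal a problem even when every pod reports as healthy.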

Monitoring these alongside lower-level application, cluster, and infrastructure health data makes it easier to correlate data and drill down into the root cause of issues.

Summary

While Kubernetes offers many benefits when running distributed, complex systems, it doesn’t remove the need to build observability into your application and monitor outputs from all levels of the stack to understand how your system is behaving.

With Coralogix, you can perform real-time analysis of log data from each part of the system to build a holistic view of your services. You can forward your logs using Fluentd, Fluent Bit, or Filebeat, and use the Coralogix Kubernetes operator to apply log parsing and alerting features to your Kubernetes deployment natively using Kubernetes custom resources.
