Kubernetes complete observability: Advanced configuration
Coralogix provides Kubernetes Observability using OpenTelemetry for comprehensive monitoring of your Kubernetes clusters and applications. This guide explains advanced configuration options for optimizing your Kubernetes observability setup.
For basic configuration instructions, see our basic configuration tutorial.
Prerequisites
- Kubernetes (version 1.24 or later) with kubectl command-line tool installed
- Helm (version 3.9 or later) installed and configured
Overview
The OpenTelemetry Integration Chart uses the values.yaml file as its default configuration. This configuration is based on the OpenTelemetry Collector Configuration for both the OpenTelemetry Agent Collector and OpenTelemetry Cluster Collector.
Default configuration
STEP 1. Create a new YAML-formatted override file that defines values for the OpenTelemetry Integration Chart.
The following global values are the minimum required configurations for a working chart:
Configure these values:
- domain: Specify the OpenTelemetry endpoint for the domain associated with your Coralogix account.
- clusterName: A required identifier for your cluster.
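A minimal override file might look like the following sketch (replace the placeholders with your own values):
global:
  domain: "<your-coralogix-domain>"
  clusterName: "<your-cluster-name>"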
You can also copy additional configurations from the repository values.yaml file.
!!! note
    If you want to override array values such as `extraEnvs`, `extraVolumes`, or `extraVolumeMounts`, note that Helm doesn't support array merging. Instead, arrays [are nulled out](https://github.com/helm/helm/issues/3486). If you need to customize these arrays, first copy the existing values from the provided [`values.yaml`](https://github.com/coralogix/telemetry-shippers/blob/master/otel-integration/k8s-helm/values.yaml) file.
STEP 2. Save this file as values.yaml
STEP 3. Install using the helm upgrade --install command:
helm upgrade --install otel-integration \
coralogix-charts-virtual/otel-integration \
-f values.yaml \
-n $NAMESPACE
Optional configurations
Enabling dependent charts
The OpenTelemetry Agent is primarily used for collecting application telemetry, while the OpenTelemetry Cluster Collector is primarily used to collect cluster-level data. Depending on your requirements, you can either use the default configuration that enables both components, or you can choose to disable either of them by modifying the enabled flag in the values.yaml file under the opentelemetry-agent or opentelemetry-cluster-collector section as shown below:
...
opentelemetry-agent:
  enabled: true
  mode: daemonset
...
opentelemetry-cluster-collector:
  enabled: true
  mode: deployment
Installing the chart on clusters with mixed operating systems (Linux and Windows)
Installing otel-integration is also possible on clusters that run Windows workloads on Windows nodes alongside Linux nodes (such as EKS, AKS, or GKE). The collector will be installed on Linux nodes, as these components are supported only on Linux operating systems. Conversely, the agent will be installed on both Linux and Windows nodes as a daemonset, in order to collect metrics for both operating systems. To do so, the chart needs to be installed with a few adjustments.
Depending on your Windows server version, you might need to adjust the image you are using with the Windows agent. The default image is coralogixrepo/opentelemetry-collector-contrib-windows:<semantic_version>. For Windows 2022 servers, use the coralogixrepo/opentelemetry-collector-contrib-windows:<semantic_version>-windows2022 version. You can do this by adjusting the opentelemetry-agent-windows.image.tag value in the values-windows.yaml file.
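For example, a sketch of the relevant override (key names taken from the text above; the tag shown is illustrative):
opentelemetry-agent-windows:
  image:
    tag: "<semantic_version>-windows2022"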
Add the Coralogix Helm charts repository to your local repository list by running:
To update your local Helm repository cache with the latest charts, run:
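A sketch of both commands, assuming the public Coralogix chart repository URL (verify it against the chart's documentation):
helm repo add coralogix-charts-virtual https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
helm repo update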
Install the chart using the values-windows.yaml values file. You can provide the global values (secret key and cluster name) in one of two ways:
- Edit the main values.yaml file and pass both files to the helm upgrade command:
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f values.yaml -f values-windows.yaml
- Provide the values directly on the command line by passing them with the --set flag:
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f values-windows.yaml --set global.clusterName=<cluster_name> --set global.domain=<domain>
OpenTelemetry Agent
The OpenTelemetry Agent is enabled and deployed as a daemonset by default. This creates an Agent pod on each node, allowing logs, metrics, and traces from application pods to be collected by the OpenTelemetry pod hosted on the same node and spreading the ingestion load across the cluster. Be aware that the OpenTelemetry Agent pod consumes resources (e.g., CPU and memory) on each node on which it runs.
!!! note
    If there are nodes without a running OpenTelemetry Agent pod, telemetry sent by application pods hosted on those nodes may be missing metadata attributes (e.g., node info and host name).
Agent presets
The multi-instanced OpenTelemetry Agent can be deployed across multiple nodes as a daemonset. It provides presets for collecting host metrics, Kubernetes attributes, and Kubelet metrics. When logs, metrics, and traces are generated from a pod, the collector enriches them with the metadata associated with the hosting machine. This metadata is very useful for linking infrastructure issues with performance degradation in services.
For more information on presets, refer to the documentation in values.yaml
# example
opentelemetry-agent:
  ...
  presets:
    # LogsCollection preset enables a configured filelog receiver to read all containers' logged console output (/var/log/pods/*/*/*.log).
    logsCollection:
      enabled: true
    # KubernetesAttributes preset collects Kubernetes metadata such as k8s.pod.name, k8s.namespace.name, and k8s.node.name.
    # It also adjusts the ClusterRole with appropriate RBAC roles to query the Kubernetes API.
    kubernetesAttributes:
      enabled: true
    # HostMetrics preset enables collection of host metrics, involving CPU, memory, disk and network.
    hostMetrics:
      enabled: true
      # Process adds collection of host processes.
      process:
        enabled: true
    # KubeletMetrics enables the kubeletstats receiver to collect node, pod and container metrics from the Kubernetes API.
    # It also adjusts the ClusterRole with appropriate RBAC roles.
    kubeletMetrics:
      enabled: true
For example, setting the kubeletMetrics preset to true configures the kubeletstats receiver to pull node, pod, container, and volume metrics from the kubelet running on each host. The metrics are sent to the metrics pipeline.
# example
receivers:
  kubeletstats:
    auth_type: serviceAccount
    collection_interval: 20s
    endpoint: ${K8S_NODE_NAME}:10250
    collect_all_network_interfaces:
      pod: true
      node: true
Receivers
Once configured, logs, metrics, and traces are collected by the OpenTelemetry Agent pods before being exported to Coralogix.
To achieve this, first instrument your application with the OpenTelemetry SDKs and point it at a corresponding receiver configured on the Collector. We recommend the OTLP receiver (OpenTelemetry Protocol) for transmission over gRPC or HTTP endpoints.
The daemonset deployment of the OpenTelemetry Agent also uses hostPort for the otlp port, allowing agent pod IPs to be reachable via node IPs, as follows:
# K8s daemonset otlp port config
ports:
  - containerPort: 4317
    hostPort: 4317
    name: otlp
    protocol: TCP
Configuring auto-instrumented JavaScript applications
The following examples demonstrate how to configure an auto-instrumented JavaScript application to send traces to the agent pod's gRPC receiver.
STEP 1. Set the Kubernetes environment variables of the JavaScript application's deployment/pod as in the example below. Define OTEL_EXPORTER_OTLP_ENDPOINT using the configured NODE_IP and OTLP_PORT, set OTEL_TRACES_EXPORTER to the otlp format, and set OTEL_EXPORTER_OTLP_PROTOCOL to grpc.
# kubernetes deployment manifest's env section
spec:
  containers:
    - # ... other container fields (name, image, etc.)
      env:
        - name: NODE_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        - name: OTLP_PORT
          value: "4317"
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://$(NODE_IP):$(OTLP_PORT)"
        - name: OTEL_TRACES_EXPORTER
          value: "otlp"
        - name: OTEL_EXPORTER_OTLP_PROTOCOL
          value: "grpc"
STEP 2. By default the agent has the otlp receiver configured as follows:
# collector config
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: ${MY_POD_IP}:4317
      http:
        endpoint: ${MY_POD_IP}:4318
!!! note
    - ${MY_POD_IP} is a container environment variable that is mapped to the pod's IP address.
    - The agent is also preconfigured to collect data from Jaeger.
Processors
Processors are generally used to process logs, metrics, and traces before the data is exported. This may include, for example, modifying or altering attributes or sampling traces.
In the example below, a k8sattributes processor is used to automatically discover k8s resources (pods), extract metadata from them, and add the extracted metadata to the relevant logs, metrics, and spans as resource attributes.
# default in values.yaml
processors:
  k8sattributes:
    filter:
      node_from_env_var: KUBE_NODE_NAME
    extract:
      metadata:
        - "k8s.namespace.name"
        - "k8s.deployment.name"
        - "k8s.statefulset.name"
        - "k8s.daemonset.name"
        - "k8s.cronjob.name"
        - "k8s.job.name"
        - "k8s.pod.name"
        - "k8s.node.name"
!!! note
    - The k8sattributes processor is enabled by default at the preset level as kubernetesAttributes and further extended in the default values.yaml.
    - More information can be found in the Kubernetes Attributes Processor README.
OpenTelemetry Cluster Collector
Enable the opentelemetry-cluster-collector by setting enabled to true.
!!! note
    The cluster collector operates as a `deployment` workload with a single replica by default, to avoid duplication of telemetry data.
Cluster collector presets
The cluster collector is best suited for cluster-scoped presets such as Kubernetes Events and Cluster Metrics, since a small number of deployment replicas is sufficient to query the Kubernetes API.
presets:
  clusterMetrics:
    enabled: true
  kubernetesEvents:
    enabled: true
  kubernetesExtraMetrics:
    enabled: true
For example, if you enable the kubernetesEvents preset, the Kubernetes objects receiver configuration will be added dynamically during the Helm installation. This configuration enables the collection of events.k8s.io objects from the Kubernetes API server.
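The generated configuration is roughly equivalent to the following sketch of the k8sobjects receiver (shown for illustration; the chart may generate additional options):
receivers:
  k8sobjects:
    objects:
      - name: events
        mode: watch
        group: events.k8s.io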
Kubernetes events: reducing the amount of collected data
When collecting Kubernetes events using the cluster collector, it is common for the number of events to reach millions, especially in large clusters with numerous nodes and constantly scaling applications. To collect only the relevant data, you can use the following settings.
Cleaning data
By default, a transform processor named transform/kube-events is configured to remove unneeded fields from the collected Kubernetes events. You may override this or alter the fields as desired.
processors:
  transform/kube-events:
    log_statements:
      - context: log
        statements:
          - keep_keys(body["object"], ["type", "eventTime", "reason", "regarding", "note", "metadata", "deprecatedFirstTimestamp", "deprecatedLastTimestamp"])
          - keep_keys(body["object"]["metadata"], ["creationTimestamp"])
          - keep_keys(body["object"]["regarding"], ["kind", "name", "namespace"])
Filtering Kubernetes events
In large-scale environments, where there are numerous events occurring per hour, it may not be necessary to process all of them. In such cases, you can use an additional OpenTelemetry processor to filter out the events that do not need to be sent to Coralogix.
Below is a sample configuration for reference. It filters out any event whose reason field matches one of the values BackoffLimitExceeded, FailedScheduling, or Unhealthy.
processors:
  filter/kube-events:
    logs:
      log_record:
        - 'IsMatch(body["reason"], "(BackoffLimitExceeded|FailedScheduling|Unhealthy)") == true'
Collecting warning events only
Currently, Kubernetes has two different types of events: Normal and Warning. As we have the ability to filter events according to their type, you may choose to collect only Warning events, as these events are key to troubleshooting. One example could be the use of a filter processor to drop all unwanted Normal-type events.
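A minimal sketch of such a filter processor (the exact field path for the event type may be body["type"] or body["object"]["type"], depending on how events are structured in your pipeline):
processors:
  filter/kube-events:
    logs:
      log_record:
        # Drop Normal-type events; adjust the field path to match your event body structure.
        - 'body["object"]["type"] == "Normal"'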
Resource Catalog
The Coralogix Resource Catalog can be used to monitor the various resource types within your Kubernetes clusters. It collects component details and lets you observe performance metrics and review logs of the associated components. Data for this feature comes from multiple sources. There are several presets that can be used to enable these features.
Kubernetes resources preset
This preset enables scraping of the Kubernetes API to populate your Kubernetes resource inventory. It uses the k8sobjects receiver to collect objects as defined in this configuration, a processor to enrich the collected objects, and a customized coralogix/resource_catalog exporter to export them.
This preset needs to be enabled only in the cluster-collector configuration.
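For illustration, a sketch of enabling it under the cluster collector, assuming the preset key is kubernetesResources (check the chart's values.yaml for the exact name):
opentelemetry-cluster-collector:
  presets:
    kubernetesResources:
      enabled: true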
Host details presets
The last two presets collect important host information to enrich the catalog. This data is collected by the agent on each node and consists of host entity events and process information collected by the hostmetrics receiver. While the hostEntityEvents preset is required, the hostMetrics.process preset is optional.
!!! note
    - The hostMetrics process preset is detailed in the Agent presets section above.
    - It is recommended to use the hostMetrics preset only on agent collectors. Applying this preset to other collector types may result in duplicate host metrics.
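A sketch of enabling these presets on the agent (preset names taken from the text above; verify them against the chart's values.yaml):
opentelemetry-agent:
  presets:
    hostEntityEvents:
      enabled: true
    hostMetrics:
      enabled: true
      process:
        enabled: true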
Kubernetes infrastructure monitoring
If you already have an existing log shipper (such as Fluentd or Filebeat) in place and your goal is to monitor all Kubernetes elements of your cluster, follow these steps to enable only the collection of metrics and Kubernetes events to be sent to Coralogix.
STEP 1. Copy the following into a YAML-formatted override file and save as values.yaml.
global:
  domain: "<coralogix-endpoint>"
  clusterName: "<k8s-cluster-name>"

opentelemetry-agent:
  presets:
    logsCollection:
      enabled: false
  config:
    exporters:
      logging: {}
    receivers:
      zipkin: null
      jaeger: null
    service:
      pipelines:
        traces:
          exporters:
            - logging
          receivers:
            - otlp
        logs:
          exporters:
            - logging
          receivers:
            - otlp
STEP 2. Install with the helm upgrade --install command.
helm upgrade --install otel-integration coralogix-charts-virtual/otel-integration -f values.yaml -n $NAMESPACE
Installing the chart on GKE Autopilot clusters
GKE Autopilot has limited access to host filesystems, host networking, and host ports. Because of this, some features of the OpenTelemetry Collector do not work. More information about these limitations is available in the GKE Autopilot security capabilities document.
Notable differences from the regular otel-integration are:
- Host metrics receiver is not available, though you still get some metrics about the host through the kubeletstats receiver.
- Kubernetes Dashboard does not work, due to the missing host metrics.
- Host networking and host ports are not available; users need to send tracing spans through a Kubernetes Service. The Service uses internalTrafficPolicy: Local to send traffic to locally running agents.
- Log collection works, but does not store checkpoints. Restarting the agent will collect logs from the beginning.
To install otel-integration on GKE Autopilot, follow these steps:
First make sure to add our Helm charts repository to the local repos list with the following command:
In order to get the updated Helm charts from the added repository, please run:
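For example (assuming the same public Coralogix chart repository as above):
helm repo add coralogix-charts-virtual https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
helm repo update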
Install the chart with the gke-autopilot-values.yaml values file. You can either provide the global values (secret key, cluster name) by adjusting the main values.yaml file and then passing both files to the helm upgrade command as follows:
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f values.yaml -f gke-autopilot-values.yaml
Or you can provide the values directly in the command line by passing them with the --set flag:
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f gke-autopilot-values.yaml --set global.clusterName=<cluster_name> --set global.domain=<domain>
Installing the chart on IPv6-only clusters
To run otel-integration inside an IPv6-only cluster, you need to install the chart using the ipv6-values.yaml file.
First, make sure to add our Helm charts repository to the local repo list using the following command:
To get the updated Helm charts from the added repository, run:
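For example (assuming the same public Coralogix chart repository as above):
helm repo add coralogix-charts-virtual https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
helm repo update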
Install the chart with the ipv6-values.yaml file. You can either provide the global values (secret key, cluster name) by adjusting the main values.yaml file and then passing the values.yaml file to the helm upgrade command as follows:
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f values.yaml -f ipv6-values.yaml
Installing the chart on EKS Fargate clusters
AWS EKS Fargate is a serverless compute engine for Kubernetes that removes the need to provision and manage EC2 instances. Since Fargate pods run in an isolated environment, some collector features require special configuration.
Prerequisites
Before installing the chart on EKS Fargate, ensure the following:
- CoreDNS addon: The EKS cluster must have the CoreDNS addon installed for DNS resolution to work. If your cluster doesn't have CoreDNS, install it using:
CLUSTER_VERSION=$(aws eks describe-cluster --name <cluster-name> --region <region> --query 'cluster.version' --output text)
COREDNS_VERSION=$(aws eks describe-addon-versions --addon-name coredns --kubernetes-version $CLUSTER_VERSION --region <region> --query 'addons[0].addonVersions[0].addonVersion' --output text)
aws eks create-addon --cluster-name <cluster-name> --addon-name coredns --addon-version $COREDNS_VERSION --region <region>
- Fargate Profile: A Fargate profile must be created for the namespace where you plan to deploy the collectors. If you're deploying to the default namespace, create a Fargate profile:
aws eks create-fargate-profile \
--cluster-name <cluster-name> \
--region <region> \
--fargate-profile-name default \
--pod-execution-role-arn <pod-execution-role-arn> \
--subnets <subnet-id-1> <subnet-id-2> <subnet-id-3> \
--selectors namespace=default
- VPC DNS Settings: Ensure DNS support and DNS hostnames are enabled for your VPC:
aws ec2 modify-vpc-attribute --vpc-id <vpc-id> --enable-dns-support
aws ec2 modify-vpc-attribute --vpc-id <vpc-id> --enable-dns-hostnames
Notable differences from the regular otel-integration are:
- Host metrics receiver is not available, though you still get some metrics about the host through the kubeletstats receiver.
- Host networking and host ports are not available; users need to send tracing spans through a Kubernetes Service.
- Log collection via hostPath mounts is not supported due to Fargate limitations.
- The collector requires the K8S_NODE_NAME environment variable to be set for proper node identification and kubelet stats collection.
Deployment Modes
There are two primary deployment patterns for EKS Fargate:
- Per-namespace collector (opentelemetry-agent-eks-fargate): Deploy the OpenTelemetry Collector as a StatefulSet in each Fargate namespace where your applications run. This collector collects your application's telemetry data (traces, metrics, and logs) and also gathers kubelet stats metrics from its own Fargate node. This is the recommended approach when you want to deploy the collector alongside your applications in Fargate.
- Centralized monitoring collector (opentelemetry-agent-eks-fargate-monitoring): Deploy a dedicated OpenTelemetry Collector as a Deployment that acts as a centralized infrastructure monitoring component. This collector automatically discovers all Fargate nodes in the cluster and collects kubelet stats metrics from each of them. It uses the receiver creator to dynamically discover kubelet endpoints and filters metrics to collect only from Fargate nodes. This pattern is useful when you want to monitor the infrastructure separately from application telemetry, or when you want a single collector to gather node-level metrics across all Fargate pods in the cluster.
Why is this needed? Due to Fargate networking restrictions, a pod cannot communicate with its own kubelet endpoint to collect its own metrics. The per-namespace collector uses an init container to label its node with OTEL-collector-node=true, and the centralized monitoring collector specifically targets nodes with this label to collect the missing kubelet stats metrics. This workaround ensures complete infrastructure monitoring coverage across all Fargate nodes.
Installation
First, make sure to add our Helm charts repository to the local repo list using the following command:
To get the updated Helm charts from the added repository, run:
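For example (assuming the same public Coralogix chart repository as above):
helm repo add coralogix-charts-virtual https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
helm repo update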
Install the chart with the values-eks-fargate.yaml file. You must provide the required global values (clusterName and domain). You can either adjust the main values.yaml file with these values and then pass it to the helm upgrade command:
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f values.yaml -f values-eks-fargate.yaml
Or you can provide the values directly in the command line by passing them with the --set flag:
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f values-eks-fargate.yaml \
--set global.clusterName=<cluster_name> \
--set global.domain=<coralogix-endpoint>
!!! note
    The global.domain value must be set to your Coralogix endpoint domain (e.g., coralogix.com, coralogix.us, coralogix.in, etc.). If you have the domain stored in the CORALOGIX_DOMAIN environment variable, you can use --set global.domain=$CORALOGIX_DOMAIN.
Configuration
The values-eks-fargate.yaml file enables both deployment modes by default. To use only one mode, you can disable the other:
- To use only the per-namespace collector, set opentelemetry-agent-eks-fargate-monitoring.enabled: false in your values file.
- To use only the centralized monitoring collector, set opentelemetry-agent-eks-fargate.enabled: false in your values file.
The EKS Fargate preset configuration is nested under each collector's configuration. For the per-namespace collector (opentelemetry-agent-eks-fargate):
opentelemetry-agent-eks-fargate:
  presets:
    eksFargate:
      # Set to false for per-namespace collectors
      monitoringCollector: false
      kubeletStats:
        # Collection interval for kubelet stats metrics
        collectionInterval: "30s"
      initContainer:
        enabled: true
        image:
          repository: "public.ecr.aws/aws-cli/aws-cli"
          tag: "2.28.17"
For the centralized monitoring collector (opentelemetry-agent-eks-fargate-monitoring):
opentelemetry-agent-eks-fargate-monitoring:
  presets:
    eksFargate:
      # Set to true for centralized monitoring collector
      monitoringCollector: true
      kubeletStats:
        # Collection interval for kubelet stats metrics
        collectionInterval: "30s"
Required Environment Variables
When using EKS Fargate, the K8S_NODE_NAME environment variable is automatically configured in each collector's configuration. For example, in opentelemetry-agent-eks-fargate:
opentelemetry-agent-eks-fargate:
  extraEnvs:
    - name: K8S_NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
The same configuration is also present in opentelemetry-agent-eks-fargate-monitoring. This variable is used by the resource detection processor to identify the node and by the receiver creator to collect kubelet stats.
!!! note
    Due to Fargate limitations, these options will not work:
    - presets.hostMetrics
    - presets.logsCollection (container log collection via hostPath mounts)
Next steps
Validation instructions can be found here.
Tail Sampling with OpenTelemetry using Kubernetes
This tutorial demonstrates how to configure a Kubernetes cluster, deploy OpenTelemetry to collect logs, metrics, and traces, and enable trace sampling. We cover an example of enabling tail sampling for the OpenTelemetry Demo Application, and a more focused example using a small trace-generating application.
Prerequisites
- A Kubernetes cluster
- Helm installed
- Coralogix Send-Your-Data API key
How it works
The Kubernetes OpenTelemetry Integration consists of the following components:
OpenTelemetry Agent. The Agent is deployed to each node within the Cluster and collects telemetry data from the applications running on that node. The agent is configured to send the telemetry data to the OpenTelemetry Gateway. The agent ensures that traces with the same ID are sent to the same gateway. This allows tail sampling to be performed on the traces correctly, even if they span multiple applications and nodes.
OpenTelemetry Gateway. The Gateway is responsible for receiving telemetry data from the agents and forwarding it to the Coralogix backend. The Gateway is also responsible for load balancing the telemetry data to the Coralogix backend.
Install the Coralogix OpenTelemetry Integration
This integration uses the Coralogix OpenTelemetry Helm Chart. While this document focuses on tail sampling for traces, deploying this chart also deploys the infrastructure to collect logs, metrics, and traces from your Kubernetes cluster and pods.
The following configuration enables OTel-agent pods to send span data to the coralogix-opentelemetry-gateway deployment using the loadbalancing exporter.
To ensure optimal performance:
- Configure an appropriate number of replicas based on your traffic volume
- Set resource requests and limits to handle the expected load
- Define custom tail sampling policies to control which spans are collected.
!!! note
    - When running in OpenShift environments, set distribution: "openshift" in your values.yaml.
    - When running in Windows environments, use the values-windows-tailsampling.yaml values file.
STEP 1. Add the Coralogix Helm repository.
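For example (assuming the public Coralogix chart repository URL):
helm repo add coralogix-charts-virtual https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
helm repo update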
STEP 2. Copy the tail-sampling-values.yaml file found here and update the relevant fields with your values.
global:
  domain: "<your-coralogix-domain>"
  clusterName: ""
  defaultApplicationName: "otel"
  defaultSubsystemName: "integration"
  logLevel: "warn"
  collectionInterval: "30s"

opentelemetry-agent:
  enabled: true
  mode: daemonset
  presets:
    loadBalancing:
      enabled: true
      routingKey: "traceID"
      hostname: coralogix-opentelemetry-gateway
  config:
    service:
      pipelines:
        traces:
          exporters:
            - loadbalancing

opentelemetry-gateway:
  enabled: true
  replicaCount: 3
  config:
    processors:
      tail_sampling:
        decision_wait: 10s
        num_traces: 100
        expected_new_traces_per_sec: 10
        policies:
          [
            {
              name: errors-policy,
              type: status_code,
              status_code: {status_codes: [ERROR]}
            },
            {
              name: randomized-policy,
              type: probabilistic,
              probabilistic: {sampling_percentage: 10}
            },
          ]

opentelemetry-collector:
  enabled: false
STEP 3. Add your Coralogix Send-Your-Data API key to the tail-sampling-values.yaml file.
STEP 4. Install the otel-integration.
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f tail-sampling-values.yaml
kubectl get pods
NAME READY STATUS RESTARTS AGE
coralogix-opentelemetry-agent-86qdb 1/1 Running 0 7h59m
coralogix-opentelemetry-gateway-65dfbb5567-6rk4j 1/1 Running 0 7h59m
coralogix-opentelemetry-gateway-65dfbb5567-g7m5l 1/1 Running 0 7h59m
coralogix-opentelemetry-gateway-65dfbb5567-zbprd 1/1 Running 0 7h59m
You should end up with as many opentelemetry-agent pods as you have nodes in your cluster, and 3 opentelemetry-gateway pods.
Install test application environment
In the next section, we describe the process for installing two application environments: the OpenTelemetry Demo Application and a small trace-generating application. You do not need to install both of these examples.
Install OpenTelemetry demo
STEP 1. Add the Helm chart for the OpenTelemetry Demo Application.
STEP 2. Create a values.yaml file and add the following:
default:
  env:
    - name: OTEL_SERVICE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: "metadata.labels['app.kubernetes.io/component']"
    - name: OTEL_COLLECTOR_NAME
      value: '{{ include "otel-demo.name" . }}-otelcol'
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: http://$(OTEL_COLLECTOR_NAME):4317
    - name: OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE
      value: cumulative
    - name: OTEL_RESOURCE_ATTRIBUTES
      value: service.name=$(OTEL_SERVICE_NAME),service.namespace=opentelemetry-demo
  envOverrides:
    - name: OTEL_COLLECTOR_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: http://$(OTEL_COLLECTOR_NAME):4317

serviceAccount:
  create: true
  annotations: {}
  name: ""

opentelemetry-collector:
  enabled: false

jaeger:
  enabled: false

prometheus:
  enabled: false

grafana:
  enabled: false
This will configure the OpenTelemetry Demo Application to send traces to the Coralogix OpenTelemetry Agent running on the node.
STEP 3. Install the Opentelemetry Demo Application.
helm install otel-demo open-telemetry/opentelemetry-demo -f values.yaml
NAME: my-otel-demo
LAST DEPLOYED: Mon Feb 19 23:29:16 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Install the small trace-generating application
This application is a small trace-generating application. We will demonstrate how to connect it to the Coralogix OpenTelemetry Agent to enable tail sampling.
STEP 1. Create a file go-traces-demo.yaml and add the following:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-otel-traces-demo
spec:
  selector:
    matchLabels:
      app: go-otel-traces-demo
  replicas: 1
  template:
    metadata:
      labels:
        app: go-otel-traces-demo
    spec:
      containers:
        - name: go-otel-traces-demo
          image: public.ecr.aws/c1s3k2h4/go-otel-traces-demo:latest
          imagePullPolicy: Always
          env:
            - name: NODE_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: CX_ENDPOINT
              value: $(NODE_IP):4317
STEP 2. Apply the Kubernetes deployment.
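For example:
kubectl apply -f go-traces-demo.yaml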
Validation
View your telemetry data in your Coralogix dashboard. Traces should arrive from the tail-sampling load balancer.
Configuring Head Sampling for Tracing
Head sampling is a feature that allows you to sample traces at the collection point. When enabled, it creates a separate pipeline for sampled traces using probabilistic sampling. This helps reduce the volume of traces while maintaining a representative sample.
When used in combination with tail sampling, head sampling is applied first at the agent level. The sampled traces are then forwarded to the tail sampling collectors, where additional sampling decisions can be made. This means that tail sampling will only see and process the traces that have already passed through head sampling.
The sampling configuration:
- Creates a new 'traces/sampled' pipeline in addition to the main traces pipeline
- Applies probabilistic sampling based on the configured percentage
- Supports different sampling modes:
    - "proportional": Maintains the relative proportion of traces across services
    - "equalizing": Attempts to sample equal numbers of traces from each service
    - "hash_seed": Uses consistent hashing to ensure the same traces are sampled
To enable head sampling, configure the following in your values.yaml:
presets:
  headSampling:
    enabled: true
    # Percentage of traces to sample (0-100)
    percentage: 10
    # Sampling mode - "proportional", "equalizing", "hash_seed"
    mode: "proportional"
Deploying Central Collector Cluster for Tail Sampling
To deploy OpenTelemetry Collector in a separate "central" Kubernetes cluster for telemetry collection and tail sampling using OpenTelemetry Protocol (OTLP) receivers, install otel-integration using the central-tail-sampling-values.yaml values file. Review the values file for detailed configuration options.
This deployment creates two key components:
- opentelemetry-receiver: Receives OTLP data and sends metrics and logs directly to Coralogix, while load balancing span data to the opentelemetry-gateway deployment.
- opentelemetry-gateway: Performs tail sampling decisions on the received span data before forwarding it to Coralogix.
To enable other Kubernetes clusters to send data to the opentelemetry-receiver, expose it using one of these methods:
- Service of type LoadBalancer
- Ingress object configuration
- Manual load balancer configuration
!!! important
    Ensure you configure sufficient replicas and appropriate resource requests/limits to handle the expected load. You'll also need to set up custom [tail sampling processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor) policies.
STEP 1. Run the following commands to deploy the Central Collector Cluster.
helm upgrade --install otel-coralogix-central-collector coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f central-tail-sampling-values.yaml
STEP 2. Validate the deployment by sending a sample of OTLP data to the opentelemetry-receiver Service and navigating to the Coralogix Explore Screen to view collected traces. This can be done via telemetrygen:
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: telemetrygen-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: telemetrygen
  template:
    metadata:
      labels:
        app: telemetrygen
    spec:
      containers:
        - name: telemetrygen
          image: ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest
          args:
            - "traces"
            - "--otlp-endpoint=coralogix-opentelemetry-receiver:4317"
            - "--otlp-insecure"
            - "--rate=10"
            - "--duration=120s"
EOF
STEP 3. Configure a regular otel-integration deployment to send data to the Central Collector Cluster:
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f central-agent-values.yaml
Troubleshooting
Why am I getting ResourceExhausted errors when using Tail Sampling?
Typically, the errors look like this:
not retryable error: Permanent error: rpc error: code = ResourceExhausted desc = grpc: received message after decompression larger than max (5554999 vs. 4194304)
By default, the OTLP Server has a 4MiB size limit for a single gRPC request. This limit may be exceeded when the opentelemetry-agent sends trace data to the gateway's OTLP Server using the load balancing exporter. To resolve this, increase the size limit by adjusting the configuration. For example:
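A minimal sketch that raises the limit on the gateway's OTLP receiver (max_recv_msg_size_mib is a standard gRPC server setting of the OTLP receiver; the exact nesting in your values file may differ):
opentelemetry-gateway:
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            # Raise the maximum accepted gRPC message size (in MiB) above the 4 MiB default.
            max_recv_msg_size_mib: 20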
Additional Resources
| Documentation | Introduction to Tail Sampling with Coralogix & OpenTelemetry |
| OTLP Configuration | OTLP Receiver Configuration |
Target Allocator and Prometheus Operator with OpenTelemetry
Overview
Targets are endpoints that supply metrics via the Prometheus data model. For the Prometheus Receiver to scrape them, they can be statically configured via the static_configs parameters or dynamically discovered using one of the supported service discovery mechanisms.
The OpenTelemetry Target Allocator for Kubernetes, an optional component of the OpenTelemetry Operator now included in Coralogix's OpenTelemetry Integration Helm Chart, facilitates service discovery and manages the configuration of targets into the different agent collector's Prometheus Receiver across nodes.
If you're using the Prometheus Operator custom resources (ServiceMonitor and PodMonitor) and want to continue using them with the OpenTelemetry collector, you can enable target scraping through the Target Allocator component. This optional feature is disabled by default but can be enabled by setting opentelemetry-agent.targetAllocator.enabled: true in your values.yaml file.
When enabled, the target allocator is deployed as a separate deployment in the same namespace as the collector. It allocates targets to the agent collector on each node, enabling scraping of targets that reside on that specific node - effectively implementing a simple sharding mechanism. For high availability, you can run multiple target allocator instances by setting opentelemetry-agent.targetAllocator.replicas to a value greater than 1.
You can customize the scrape interval for Prometheus Custom Resources by configuring opentelemetry-agent.targetAllocator.prometheusCR.scrapeInterval. If not specified, it defaults to 30s.
For more details on Prometheus custom resources and target allocator see the documentation here.
Discovery
The Target Allocator discovers Prometheus Operator Custom Resources, namely the ServiceMonitor and PodMonitor as Metrics Targets. These metrics targets detail the endpoints of exportable metrics available on the Kubernetes cluster as "jobs."
Then, the Target Allocator detects available OpenTelemetry Collectors and distributes the targets among known collectors. As a result, the collectors routinely query the Target Allocator for their assigned metric targets to add to the scrape configuration.
Allocation strategies
Upon query from collectors, the Target Allocator assigns metric endpoint targets according to a chosen allocation strategy. To align with our chart's OpenTelemetry agent in DaemonSet mode, the per-node allocation strategy is preconfigured. This assigns each target to the OpenTelemetry collector running on the same node as the metric endpoint.
Monitoring CRDs (ServiceMonitor & PodMonitor)
As part of the deployment model under the Prometheus Operator, concepts were introduced to simplify monitoring configuration and align it better with the capabilities of Kubernetes.
Specifying endpoints under the monitoring scope as CRD objects:
- Allows deployment in YAML files and packaging as Helm charts or custom resources.
- Decouples and decentralizes the monitoring configuration, making it more agile for software changes and progression.
- Reduces the impact of changes across monitored components, as there is no single standard file or resource to work with; other workloads continue to work.
Both ServiceMonitor and PodMonitor use selectors to detect pods or services to monitor with additional configurations on how to scrape them (e.g., port, interval, path).
ServiceMonitor
A ServiceMonitor provides metrics from the service itself and each of its endpoints. This means each pod implementing the service will be discovered and scraped.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    serviceMonitorSelector: prometheus
  name: prometheus
  namespace: prometheus
spec:
  endpoints:
    - interval: 30s
      targetPort: 9090
      path: /metrics
  namespaceSelector:
    matchNames:
      - prometheus
  selector:
    matchLabels:
      target-allocation: "true"
Details:
- endpoints: Defines an endpoint serving Prometheus metrics to be scraped by Prometheus. It specifies an interval, port, URL path, and scrape timeout duration. See the Endpoints spec.
- selector & namespaceSelector: Selectors for the labels and namespaces from which the Kubernetes Endpoints objects will be discovered.
More details on writing the ServiceMonitor can be found in the ServiceMonitor Spec.
PodMonitor
For workloads that cannot be exposed behind a service, a PodMonitor is used instead.
This includes:
- Services that are not HTTP-based, e.g., Kafka, SQS/SNS, JMS, etc.
- Components such as CronJobs, DaemonSets, etc. (e.g., using hostPort)
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: front-end
  labels:
    name: front-end
spec:
  namespaceSelector:
    matchNames:
      - prometheus
  selector:
    matchLabels:
      name: front-end
  podMetricsEndpoints:
    - targetPort: 8079
Details:
- podMetricsEndpoints: Similar to endpoints, this defines the pod endpoint serving Prometheus metrics. See the PodMetricsEndpoint spec.
Prerequisites
- Kubernetes (v1.24+)
- The command-line tool kubectl
- Helm (v3.9+) installed and configured
- CRDs for PodMonitors and ServiceMonitors installed
Check that Custom Resource Definitions for PodMonitors and ServiceMonitors exist in your cluster using this command:
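For example:
kubectl get crd podmonitors.monitoring.coreos.com servicemonitors.monitoring.coreos.com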
If not, you can install them with the following kubectl apply commands:
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml
Installation
The Target Allocator can be enabled by modifying the default values.yaml file in the OpenTelemetry Integration Chart. Once enabled, it is deployed to service the Prometheus Receivers of the OpenTelemetry Agent Collectors and allocate targets residing on the DaemonSet's nodes.
This guide assumes you have services exporting Prometheus metrics running in your Kubernetes cluster.
STEP 1. Follow the instructions for Kubernetes Observability with OpenTelemetry, specifically this Advanced Configuration guide, and in the otel-integration values.yaml file set opentelemetry-agent.targetAllocator.enabled to true:
opentelemetry-agent:
  targetAllocator:
    enabled: true # set to true
    replicas: 1
    allocationStrategy: "per-node"
    prometheusCR:
      enabled: true
As shown above, the default allocation strategy is per-node, aligning with the OpenTelemetry agent's DaemonSet mode.
STEP 2. Install the Helm chart with the changes made to the values.yaml and deploy the target allocator pod:
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration --render-subchart-notes -n <namespace> -f values.yaml
Troubleshooting
To check if the jobs and scrape configs generated by the Target Allocator are correct and ServiceMonitors and PodMonitors are successfully detected, port-forward to the Target Allocator's exposed service. The information will be available under the /jobs and /scrape_configs HTTP paths.
The Target Allocator’s service can be located with the following command: kubectl get svc -n <namespace>
Port forward to the target allocator pod with the following kubectl command:
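For example, assuming the service name found in the previous step (adjust the port to match the service):
kubectl port-forward -n <namespace> svc/<target-allocator-service> 8080:<service-port>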
You can browse or curl the /jobs and /scrape_configs endpoints for the detected PodMonitor & ServiceMonitor resources and the generated scrape configs.
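For example, with the port-forward above in place:
curl http://localhost:8080/jobs
curl http://localhost:8080/scrape_configs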
The generated kubernetes_sd_configs is a common configuration syntax for discovering and scraping Kubernetes targets in Prometheus.
OpenTelemetry eBPF Instrumentation
The OpenTelemetry eBPF Instrumentation is an OpenTelemetry component that uses eBPF to collect telemetry data from the Linux kernel, such as network metrics and spans, without requiring modifications to the application code. To enable it, set opentelemetry-ebpf-instrumentation.enabled to true in the values.yaml file.
For a full list of values for this chart, see values.yaml.
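For example:
opentelemetry-ebpf-instrumentation:
  enabled: true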
K8s Cache
The OpenTelemetry eBPF Instrumentation includes a K8s Cache component that collects Kubernetes metadata and enriches the telemetry data with Kubernetes labels. This allows you to correlate the telemetry data with Kubernetes resources, such as Pods, Nodes, and Namespaces. The K8s Cache component is critical for large-scale Kubernetes clusters, as it takes load off the K8s API Server by isolating those calls to the K8s Cache services. The K8s Cache is turned on by default, with 2 replicas for high availability. You can configure the number of replicas by setting opentelemetry-ebpf-instrumentation.k8sCache.replicas in the values.yaml file. To turn off the K8s Cache, set opentelemetry-ebpf-instrumentation.k8sCache.replicas to 0 in the values.yaml file. Turning off the K8s Cache will still enrich data with Kubernetes metadata, but it will do so by calling the K8s API Server directly from each replica of the OpenTelemetry eBPF Instrumentation.
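For example (key path taken from the text above):
opentelemetry-ebpf-instrumentation:
  k8sCache:
    replicas: 2 # set to 0 to disable the dedicated cache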
Coralogix Operator
The Coralogix Operator provides Kubernetes-native deployment and management for Coralogix, designed to simplify and automate the configuration of Coralogix APIs through Kubernetes custom resource definitions and controllers.
Enabling Coralogix Operator
To enable the Coralogix Operator, set coralogix-operator.enabled to true in the values.yaml file.
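For example:
coralogix-operator:
  enabled: true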




