Kubernetes complete observability: Advanced configuration
Coralogix provides Kubernetes Observability using OpenTelemetry for comprehensive monitoring of your Kubernetes clusters and applications. This guide explains advanced configuration options for optimizing your Kubernetes observability setup.
For basic configuration instructions, see our basic configuration tutorial.
Prerequisites
- Kubernetes (version 1.24 or later) with kubectl command-line tool installed
- Helm (version 3.9 or later) installed and configured
Overview
The OpenTelemetry Integration Chart uses the values.yaml file as its default configuration. This configuration is based on the OpenTelemetry Collector Configuration for both the OpenTelemetry Agent Collector and OpenTelemetry Cluster Collector.
Default configuration
STEP 1. Create a new YAML-formatted override file that defines values for the OpenTelemetry Integration Chart.
The following global values are the minimum required configurations for a working chart:
Configure these values:
- domain: Specify the OpenTelemetry endpoint for the domain associated with your Coralogix account.
- clusterName: A required identifier for your cluster.
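A minimal override file might look like the following sketch (replace the placeholders with your own values):
global:
  domain: "<your-coralogix-domain>"
  clusterName: "<your-cluster-name>"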
You can also copy additional configurations from the repository values.yaml file.
!!! note
    If you want to override array values such as `extraEnvs`, `extraVolumes`, or `extraVolumeMounts`, note that Helm doesn't support array merging. Instead, arrays [are nulled out](https://github.com/helm/helm/issues/3486). If you need to customize these arrays, first copy the existing values from the provided [`values.yaml`](https://github.com/coralogix/telemetry-shippers/blob/master/otel-integration/k8s-helm/values.yaml) file.
STEP 2. Save this file as values.yaml
STEP 3. Install using the helm upgrade --install command:
helm upgrade --install otel-integration \
coralogix-charts-virtual/otel-integration \
-f values.yaml \
-n $NAMESPACE
Optional configurations
Enabling dependent charts
The OpenTelemetry Agent is primarily used for collecting application telemetry, while the OpenTelemetry Cluster Collector is primarily used to collect cluster-level data. Depending on your requirements, you can either use the default configuration that enables both components, or you can choose to disable either of them by modifying the enabled flag in the values.yaml file under the opentelemetry-agent or opentelemetry-cluster-collector section as shown below:
...
opentelemetry-agent:
  enabled: true
  mode: daemonset
...
opentelemetry-cluster-collector:
  enabled: true
  mode: deployment
Installing the chart on clusters with mixed operating systems (Linux and Windows)
Installing otel-integration is also possible on clusters that run Windows workloads on Windows nodes alongside Linux nodes (such as EKS, AKS, or GKE). The collector will be installed on Linux nodes, as these components are supported only on Linux operating systems. Conversely, the agent will be installed on both Linux and Windows nodes as a daemonset, in order to collect metrics for both operating systems. To do so, the chart needs to be installed with a few adjustments.
Depending on your Windows server version, you might need to adjust the image you are using with the Windows agent. The default image is coralogixrepo/opentelemetry-collector-contrib-windows:<semantic_version>. For Windows 2022 servers, use the coralogixrepo/opentelemetry-collector-contrib-windows:<semantic_version>-windows2022 version. You can do this by adjusting the opentelemetry-agent-windows.image.tag value in the values-windows.yaml file.
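For example, a sketch of the relevant override (key names taken from the text above; the tag shown is illustrative):
opentelemetry-agent-windows:
  image:
    tag: "<semantic_version>-windows2022"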
Add the Coralogix Helm charts repository to your local repository list by running:
To update your local Helm repository cache with the latest charts, run:
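A sketch of both commands, assuming the public Coralogix chart repository URL (verify it against the chart's documentation):
helm repo add coralogix-charts-virtual https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
helm repo update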
Install the chart using the values-windows.yaml values file. You can provide the global values (secret key and cluster name) in one of two ways:
- Edit the main values.yaml file and pass both files to the helm upgrade command:
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f values.yaml -f values-windows.yaml
- Provide the values directly on the command line by passing them with the --set flag:
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f values-windows.yaml --set global.clusterName=<cluster_name> --set global.domain=<domain>
OpenTelemetry Agent
The OpenTelemetry Agent is enabled and deployed as a daemonset by default. This creates an Agent pod on each node, allowing logs, metrics, and traces from application pods to be collected by the OpenTelemetry pod hosted on the same node and spreading the ingestion load across the cluster. Be aware that the OpenTelemetry Agent pod consumes resources (e.g., CPU and memory) on each node on which it runs.
!!! note
    If there are nodes without a running OpenTelemetry Agent pod, telemetry sent by application pods hosted on those nodes may be missing metadata attributes (e.g., node info and host name).
Agent presets
The multi-instanced OpenTelemetry Agent can be deployed across multiple nodes as a daemonset. It provides presets for collecting host metrics, Kubernetes attributes, and Kubelet metrics. When logs, metrics, and traces are generated from a pod, the collector enriches them with the metadata associated with the hosting machine. This metadata is very useful for linking infrastructure issues with performance degradation in services.
For more information on presets, refer to the documentation in values.yaml
# example
opentelemetry-agent:
  ...
  presets:
    # LogsCollection preset enables a configured filelog receiver to read all containers' logged console output (/var/log/pods/*/*/*.log).
    logsCollection:
      enabled: true
    # KubernetesAttributes preset collects Kubernetes metadata such as k8s.pod.name, k8s.namespace.name, and k8s.node.name.
    # It also adjusts the ClusterRole with appropriate RBAC roles to query the Kubernetes API.
    kubernetesAttributes:
      enabled: true
    # HostMetrics preset enables collection of host metrics, involving CPU, memory, disk and network.
    hostMetrics:
      enabled: true
      # Process adds collection of host processes.
      process:
        enabled: true
    # KubeletMetrics enables the kubeletstats receiver to collect node, pod and container metrics from the Kubernetes API.
    # It also adjusts the ClusterRole with appropriate RBAC roles.
    kubeletMetrics:
      enabled: true
For example, setting the kubeletMetrics preset to true configures the kubeletstats receiver to pull node, pod, container, and volume metrics from the kubelet running on each host. The metrics are sent to the metrics pipeline.
# example
receivers:
  kubeletstats:
    auth_type: serviceAccount
    collection_interval: 20s
    endpoint: ${K8S_NODE_NAME}:10250
    collect_all_network_interfaces:
      pod: true
      node: true
Receivers
Once configured, logs, metrics, and traces are collected by the OpenTelemetry Agent pods before being exported to Coralogix.
To achieve this, first instrument your application with the OpenTelemetry SDKs and point it at a corresponding receiver configured on the Collector. We recommend the OTLP receiver (OpenTelemetry Protocol) for transmission over gRPC or HTTP endpoints.
The daemonset deployment of the OpenTelemetry Agent also uses hostPort for the otlp port, allowing agent pod IPs to be reachable via node IPs, as follows:
# K8s daemonset otlp port config
ports:
  - containerPort: 4317
    hostPort: 4317
    name: otlp
    protocol: TCP
Configuring auto-instrumented JavaScript applications
The following examples demonstrate how to configure an auto-instrumented JavaScript application to send traces to the agent pod's gRPC receiver.
STEP 1. Set the Kubernetes environment variables of the JavaScript application's deployment/pod as in the example below. Define OTEL_EXPORTER_OTLP_ENDPOINT using the configured NODE_IP and OTLP_PORT, set OTEL_TRACES_EXPORTER to the otlp format, and set OTEL_EXPORTER_OTLP_PROTOCOL to grpc.
# kubernetes deployment manifest's env section
spec:
  containers:
    - # ... other container fields (name, image, etc.)
      env:
        - name: NODE_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        - name: OTLP_PORT
          value: "4317"
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://$(NODE_IP):$(OTLP_PORT)"
        - name: OTEL_TRACES_EXPORTER
          value: "otlp"
        - name: OTEL_EXPORTER_OTLP_PROTOCOL
          value: "grpc"
STEP 2. By default the agent has the otlp receiver configured as follows:
# collector config
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: ${MY_POD_IP}:4317
      http:
        endpoint: ${MY_POD_IP}:4318
!!! note
    - ${MY_POD_IP} is a container environment variable that is mapped to the pod's IP address.
    - The agent is also preconfigured to collect data from Jaeger.
Processors
Processors are generally used to process logs, metrics, and traces before the data is exported. This may include, for example, modifying or altering attributes or sampling traces.
In the example below, a k8sattributes processor is used to automatically discover k8s resources (pods), extract metadata from them, and add the extracted metadata to the relevant logs, metrics, and spans as resource attributes.
# default in values.yaml
processors:
  k8sattributes:
    filter:
      node_from_env_var: KUBE_NODE_NAME
    extract:
      metadata:
        - "k8s.namespace.name"
        - "k8s.deployment.name"
        - "k8s.statefulset.name"
        - "k8s.daemonset.name"
        - "k8s.cronjob.name"
        - "k8s.job.name"
        - "k8s.pod.name"
        - "k8s.node.name"
!!! note
    - The k8sattributes processor is enabled by default at the preset level as kubernetesAttributes and further extended in the default values.yaml.
    - More information can be found in the Kubernetes Attributes Processor README.
OpenTelemetry Cluster Collector
Enable the opentelemetry-cluster-collector by setting enabled to true.
!!! note
    The cluster collector operates as a `deployment` workload with a single replica by default, to avoid duplication of telemetry data.
Cluster collector presets
The cluster collector is best suited for cluster-scoped presets such as Kubernetes Events and Cluster Metrics, since a small number of deployment replicas is sufficient to query the Kubernetes API.
presets:
  clusterMetrics:
    enabled: true
  kubernetesEvents:
    enabled: true
  kubernetesExtraMetrics:
    enabled: true
For example, if you enable the kubernetesEvents preset, the Kubernetes objects receiver configuration will be added dynamically during the Helm installation. This configuration enables the collection of events.k8s.io objects from the Kubernetes API server.
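The generated configuration is roughly equivalent to the following sketch of the k8sobjects receiver (shown for illustration; the chart may generate additional options):
receivers:
  k8sobjects:
    objects:
      - name: events
        mode: watch
        group: events.k8s.io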
Kubernetes events: reducing the amount of collected data
When collecting Kubernetes events using the cluster collector, it is common for the number of events to reach millions, especially in large clusters with numerous nodes and constantly scaling applications. To collect only the relevant data, you can use the following settings.
Cleaning data
By default, a transform processor named transform/kube-events is configured to remove unneeded fields from the collected Kubernetes events. You may override this or alter the fields as desired.
processors:
  transform/kube-events:
    log_statements:
      - context: log
        statements:
          - keep_keys(body["object"], ["type", "eventTime", "reason", "regarding", "note", "metadata", "deprecatedFirstTimestamp", "deprecatedLastTimestamp"])
          - keep_keys(body["object"]["metadata"], ["creationTimestamp"])
          - keep_keys(body["object"]["regarding"], ["kind", "name", "namespace"])
Filtering Kubernetes events
In large-scale environments, where there are numerous events occurring per hour, it may not be necessary to process all of them. In such cases, you can use an additional OpenTelemetry processor to filter out the events that do not need to be sent to Coralogix.
Below is a sample configuration for reference. It filters out any event whose reason field matches one of the values BackoffLimitExceeded, FailedScheduling, or Unhealthy.
processors:
  filter/kube-events:
    logs:
      log_record:
        - 'IsMatch(body["reason"], "(BackoffLimitExceeded|FailedScheduling|Unhealthy)") == true'
Collecting warning events only
Currently, Kubernetes has two different types of events: Normal and Warning. As we have the ability to filter events according to their type, you may choose to collect only Warning events, as these events are key to troubleshooting. One example could be the use of a filter processor to drop all unwanted Normal-type events.
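A minimal sketch of such a filter processor (the exact field path for the event type may be body["type"] or body["object"]["type"], depending on how events are structured in your pipeline):
processors:
  filter/kube-events:
    logs:
      log_record:
        # Drop Normal-type events; adjust the field path to match your event body structure.
        - 'body["object"]["type"] == "Normal"'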
Resource Catalog
The Coralogix Resource Catalog can be used to monitor the various resource types within your Kubernetes clusters. It collects component details and lets you observe performance metrics and review logs of the associated components. Data for this feature comes from multiple sources. There are several presets that can be used to enable these features.
Kubernetes resources preset
This preset enables scraping of the Kubernetes API to populate your Kubernetes resource inventory. It uses the k8sobjects receiver to collect objects as defined in this configuration, a processor to enrich the collected objects, and a customized coralogix/resource_catalog exporter to export them.
This preset needs to be enabled only in the cluster-collector configuration.
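For illustration, a sketch of enabling it under the cluster collector, assuming the preset key is kubernetesResources (check the chart's values.yaml for the exact name):
opentelemetry-cluster-collector:
  presets:
    kubernetesResources:
      enabled: true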
Host details presets
The last two presets collect important host information to enrich the catalog. This data is collected by the agent on each node and consists of host entity events and process information collected by the hostmetrics receiver. While the hostEntityEvents preset is required, the hostMetrics.process preset is optional.
!!! note
    - The hostMetrics process preset is detailed in the Agent presets section above.
    - It is recommended to use the hostMetrics preset only on agent collectors. Applying this preset to other collector types may result in duplicate host metrics.
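A sketch of enabling these presets on the agent (preset names taken from the text above; verify them against the chart's values.yaml):
opentelemetry-agent:
  presets:
    hostEntityEvents:
      enabled: true
    hostMetrics:
      enabled: true
      process:
        enabled: true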
Kubernetes infrastructure monitoring
If you already have an existing log shipper (such as Fluentd or Filebeat) in place and your goal is to monitor all Kubernetes elements of your cluster, follow these steps to enable only the collection of metrics and Kubernetes events to be sent to Coralogix.
STEP 1. Copy the following into a YAML-formatted override file and save as values.yaml.
global:
  domain: "<coralogix-endpoint>"
  clusterName: "<k8s-cluster-name>"

opentelemetry-agent:
  presets:
    logsCollection:
      enabled: false
  config:
    exporters:
      logging: {}
    receivers:
      zipkin: null
      jaeger: null
    service:
      pipelines:
        traces:
          exporters:
            - logging
          receivers:
            - otlp
        logs:
          exporters:
            - logging
          receivers:
            - otlp
STEP 2. Install with the helm upgrade --install command.
helm upgrade --install otel-integration coralogix-charts-virtual/otel-integration -f values.yaml -n $NAMESPACE
Installing the chart on GKE Autopilot clusters
GKE Autopilot has limited access to host filesystems, host networking, and host ports. Because of this, some features of the OpenTelemetry Collector do not work. More information about these limitations is available in the GKE Autopilot security capabilities document.
Notable differences from the regular otel-integration are:
- Host metrics receiver is not available, though you still get some metrics about the host through the kubeletstats receiver.
- Kubernetes Dashboard does not work, due to the missing host metrics.
- Host networking and host ports are not available; users need to send tracing spans through a Kubernetes Service. The Service uses internalTrafficPolicy: Local to send traffic to locally running agents.
- Log collection works, but does not store checkpoints. Restarting the agent will collect logs from the beginning.
To install otel-integration on GKE Autopilot, follow these steps:
First make sure to add our Helm charts repository to the local repos list with the following command:
In order to get the updated Helm charts from the added repository, please run:
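For example (assuming the same public Coralogix chart repository as above):
helm repo add coralogix-charts-virtual https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
helm repo update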
Install the chart with the gke-autopilot-values.yaml values file. You can either provide the global values (secret key, cluster name) by adjusting the main values.yaml file and then passing both files to the helm upgrade command as follows:
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f values.yaml -f gke-autopilot-values.yaml
Or you can provide the values directly in the command line by passing them with the --set flag:
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f gke-autopilot-values.yaml --set global.clusterName=<cluster_name> --set global.domain=<domain>
Installing the chart on IPv6-only clusters
To run otel-integration inside an IPv6-only cluster, you need to install the chart using the ipv6-values.yaml file.
First, make sure to add our Helm charts repository to the local repo list using the following command:
To get the updated Helm charts from the added repository, run:
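For example (assuming the same public Coralogix chart repository as above):
helm repo add coralogix-charts-virtual https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
helm repo update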
Install the chart with the ipv6-values.yaml file. You can either provide the global values (secret key, cluster name) by adjusting the main values.yaml file and then passing the values.yaml file to the helm upgrade command as follows:
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f values.yaml -f ipv6-values.yaml
Installing the chart on EKS Fargate clusters
AWS EKS Fargate is a serverless compute engine for Kubernetes that removes the need to provision and manage EC2 instances. Since Fargate pods run in an isolated environment, some collector features require special configuration.
Prerequisites
Before installing the chart on EKS Fargate, ensure the following:
- CoreDNS addon: The EKS cluster must have the CoreDNS addon installed for DNS resolution to work. If your cluster doesn't have CoreDNS, install it using:
CLUSTER_VERSION=$(aws eks describe-cluster --name <cluster-name> --region <region> --query 'cluster.version' --output text)
COREDNS_VERSION=$(aws eks describe-addon-versions --addon-name coredns --kubernetes-version $CLUSTER_VERSION --region <region> --query 'addons[0].addonVersions[0].addonVersion' --output text)
aws eks create-addon --cluster-name <cluster-name> --addon-name coredns --addon-version $COREDNS_VERSION --region <region>
- Fargate Profile: A Fargate profile must be created for the namespace where you plan to deploy the collectors. If you're deploying to the default namespace, create a Fargate profile:
aws eks create-fargate-profile \
--cluster-name <cluster-name> \
--region <region> \
--fargate-profile-name default \
--pod-execution-role-arn <pod-execution-role-arn> \
--subnets <subnet-id-1> <subnet-id-2> <subnet-id-3> \
--selectors namespace=default
- VPC DNS Settings: Ensure DNS support and DNS hostnames are enabled for your VPC:
aws ec2 modify-vpc-attribute --vpc-id <vpc-id> --enable-dns-support
aws ec2 modify-vpc-attribute --vpc-id <vpc-id> --enable-dns-hostnames
Notable differences from the regular otel-integration are:
- Host metrics receiver is not available, though you still get some metrics about the host through the kubeletstats receiver.
- Host networking and host ports are not available; users need to send tracing spans through a Kubernetes Service.
- Log collection via hostPath mounts is not supported due to Fargate limitations.
- The collector requires the K8S_NODE_NAME environment variable to be set for proper node identification and kubelet stats collection.
Deployment Modes
There are two primary deployment patterns for EKS Fargate:
- Per-namespace collector (opentelemetry-agent-eks-fargate): Deploy the OpenTelemetry Collector as a StatefulSet in each Fargate namespace where your applications run. This collector collects your application's telemetry data (traces, metrics, and logs) and also gathers kubelet stats metrics from its own Fargate node. This is the recommended approach when you want to deploy the collector alongside your applications in Fargate.
- Centralized monitoring collector (opentelemetry-agent-eks-fargate-monitoring): Deploy a dedicated OpenTelemetry Collector as a Deployment that acts as a centralized infrastructure monitoring component. This collector automatically discovers all Fargate nodes in the cluster and collects kubelet stats metrics from each of them. It uses the receiver creator to dynamically discover kubelet endpoints and filters metrics to collect only from Fargate nodes. This pattern is useful when you want to monitor the infrastructure separately from application telemetry, or when you want a single collector to gather node-level metrics across all Fargate pods in the cluster.
Why is this needed? Due to Fargate networking restrictions, a pod cannot communicate with its own kubelet endpoint to collect its own metrics. The per-namespace collector uses an init container to label its node with OTEL-collector-node=true, and the centralized monitoring collector specifically targets nodes with this label to collect the missing kubelet stats metrics. This workaround ensures complete infrastructure monitoring coverage across all Fargate nodes.
Installation
First, make sure to add our Helm charts repository to the local repo list using the following command:
To get the updated Helm charts from the added repository, run:
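For example (assuming the same public Coralogix chart repository as above):
helm repo add coralogix-charts-virtual https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
helm repo update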
Install the chart with the values-eks-fargate.yaml file. You must provide the required global values (clusterName and domain). You can either adjust the main values.yaml file with these values and then pass it to the helm upgrade command:
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f values.yaml -f values-eks-fargate.yaml
Or you can provide the values directly in the command line by passing them with the --set flag:
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f values-eks-fargate.yaml \
--set global.clusterName=<cluster_name> \
--set global.domain=<coralogix-endpoint>
!!! note
    The global.domain value must be set to your Coralogix endpoint domain (e.g., coralogix.com, coralogix.us, coralogix.in, etc.). If you have the domain stored in the CORALOGIX_DOMAIN environment variable, you can use --set global.domain=$CORALOGIX_DOMAIN.
Configuration
The values-eks-fargate.yaml file enables both deployment modes by default. To use only one mode, you can disable the other:
- To use only the per-namespace collector, set opentelemetry-agent-eks-fargate-monitoring.enabled: false in your values file.
- To use only the centralized monitoring collector, set opentelemetry-agent-eks-fargate.enabled: false in your values file.
The EKS Fargate preset configuration is nested under each collector's configuration. For the per-namespace collector (opentelemetry-agent-eks-fargate):
opentelemetry-agent-eks-fargate:
  presets:
    eksFargate:
      # Set to false for per-namespace collectors
      monitoringCollector: false
      kubeletStats:
        # Collection interval for kubelet stats metrics
        collectionInterval: "30s"
      initContainer:
        enabled: true
        image:
          repository: "public.ecr.aws/aws-cli/aws-cli"
          tag: "2.28.17"
For the centralized monitoring collector (opentelemetry-agent-eks-fargate-monitoring):
opentelemetry-agent-eks-fargate-monitoring:
  presets:
    eksFargate:
      # Set to true for centralized monitoring collector
      monitoringCollector: true
      kubeletStats:
        # Collection interval for kubelet stats metrics
        collectionInterval: "30s"
Required Environment Variables
When using EKS Fargate, the K8S_NODE_NAME environment variable is automatically configured in each collector's configuration. For example, in opentelemetry-agent-eks-fargate:
opentelemetry-agent-eks-fargate:
  extraEnvs:
    - name: K8S_NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
The same configuration is also present in opentelemetry-agent-eks-fargate-monitoring. This variable is used by the resource detection processor to identify the node and by the receiver creator to collect kubelet stats.
!!! note
    Due to Fargate limitations, these options will not work:
    - presets.hostMetrics
    - presets.logsCollection (container log collection via hostPath mounts)
Next steps
Validation instructions can be found here.
Tail Sampling with OpenTelemetry using Kubernetes
This tutorial demonstrates how to configure a Kubernetes cluster, deploy OpenTelemetry to collect logs, metrics, and traces, and enable trace sampling. We cover an example of enabling tail sampling for the OpenTelemetry Demo Application, and a more focused example using a small trace-generating application.
Prerequisites
- A Kubernetes cluster
- Helm installed
- Coralogix Send-Your-Data API key
How it works
The Kubernetes OpenTelemetry Integration consists of the following components:
OpenTelemetry Agent. The Agent is deployed to each node within the Cluster and collects telemetry data from the applications running on that node. The agent is configured to send the telemetry data to the OpenTelemetry Gateway. The agent ensures that traces with the same ID are sent to the same gateway. This allows tail sampling to be performed on the traces correctly, even if they span multiple applications and nodes.
OpenTelemetry Gateway. The Gateway is responsible for receiving telemetry data from the agents and forwarding it to the Coralogix backend. The Gateway is also responsible for load balancing the telemetry data to the Coralogix backend.
Install the Coralogix OpenTelemetry Integration
This integration uses the Coralogix OpenTelemetry Helm Chart. While this document focuses on tail sampling for traces, deploying this chart also deploys the infrastructure to collect logs, metrics, and traces from your Kubernetes cluster and pods.
The following configuration enables OTel-agent pods to send span data to the coralogix-opentelemetry-gateway deployment using the loadbalancing exporter.
To ensure optimal performance:
- Configure an appropriate number of replicas based on your traffic volume
- Set resource requests and limits to handle the expected load
- Define custom tail sampling policies to control which spans are collected.
!!! note
    - When running in OpenShift environments, set distribution: "openshift" in your values.yaml.
    - When running in Windows environments, use the values-windows-tailsampling.yaml values file.
STEP 1. Add the Coralogix Helm repository.
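For example (assuming the public Coralogix chart repository URL):
helm repo add coralogix-charts-virtual https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
helm repo update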
STEP 2. Copy the tail-sampling-values.yaml file found here and update the relevant fields with your values.
global:
  domain: "<your-coralogix-domain>"
  clusterName: ""
  defaultApplicationName: "otel"
  defaultSubsystemName: "integration"
  logLevel: "warn"
  collectionInterval: "30s"

opentelemetry-agent:
  enabled: true
  mode: daemonset
  presets:
    loadBalancing:
      enabled: true
      routingKey: "traceID"
      hostname: coralogix-opentelemetry-gateway
  config:
    service:
      pipelines:
        traces:
          exporters:
            - loadbalancing

opentelemetry-gateway:
  enabled: true
  replicaCount: 3
  config:
    processors:
      tail_sampling:
        decision_wait: 10s
        num_traces: 100
        expected_new_traces_per_sec: 10
        policies:
          [
            {
              name: errors-policy,
              type: status_code,
              status_code: {status_codes: [ERROR]}
            },
            {
              name: randomized-policy,
              type: probabilistic,
              probabilistic: {sampling_percentage: 10}
            },
          ]

opentelemetry-collector:
  enabled: false
STEP 3. Add your Coralogix Send-Your-Data API key to the tail-sampling-values.yaml file.
STEP 4. Install the otel-integration.
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f tail-sampling-values.yaml
kubectl get pods
NAME READY STATUS RESTARTS AGE
coralogix-opentelemetry-agent-86qdb 1/1 Running 0 7h59m
coralogix-opentelemetry-gateway-65dfbb5567-6rk4j 1/1 Running 0 7h59m
coralogix-opentelemetry-gateway-65dfbb5567-g7m5l 1/1 Running 0 7h59m
coralogix-opentelemetry-gateway-65dfbb5567-zbprd 1/1 Running 0 7h59m
You should end up with as many opentelemetry-agent pods as you have nodes in your cluster, and 3 opentelemetry-gateway pods.
Install test application environment
In the next section, we describe the process for installing two application environments: the OpenTelemetry Demo Application and a small trace-generating application. You do not need to install both of these examples.
Install OpenTelemetry demo
STEP 1. Add the Helm chart for the OpenTelemetry Demo Application.
STEP 2. Create a values.yaml file and add the following:
default:
  env:
    - name: OTEL_SERVICE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: "metadata.labels['app.kubernetes.io/component']"
    - name: OTEL_COLLECTOR_NAME
      value: '{{ include "otel-demo.name" . }}-otelcol'
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: http://$(OTEL_COLLECTOR_NAME):4317
    - name: OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE
      value: cumulative
    - name: OTEL_RESOURCE_ATTRIBUTES
      value: service.name=$(OTEL_SERVICE_NAME),service.namespace=opentelemetry-demo
  envOverrides:
    - name: OTEL_COLLECTOR_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: http://$(OTEL_COLLECTOR_NAME):4317

serviceAccount:
  create: true
  annotations: {}
  name: ""

opentelemetry-collector:
  enabled: false

jaeger:
  enabled: false

prometheus:
  enabled: false

grafana:
  enabled: false
This will configure the OpenTelemetry Demo Application to send traces to the Coralogix OpenTelemetry Agent running on the node.
STEP 3. Install the Opentelemetry Demo Application.
helm install otel-demo open-telemetry/opentelemetry-demo -f values.yaml
NAME: my-otel-demo
LAST DEPLOYED: Mon Feb 19 23:29:16 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Install the small trace-generating application
This application is a small trace-generating application. We will demonstrate how to connect it to the Coralogix OpenTelemetry Agent to enable tail sampling.
STEP 1. Create a file go-traces-demo.yaml and add the following:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-otel-traces-demo
spec:
  selector:
    matchLabels:
      app: go-otel-traces-demo
  replicas: 1
  template:
    metadata:
      labels:
        app: go-otel-traces-demo
    spec:
      containers:
        - name: go-otel-traces-demo
          image: public.ecr.aws/c1s3k2h4/go-otel-traces-demo:latest
          imagePullPolicy: Always
          env:
            - name: NODE_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: CX_ENDPOINT
              value: $(NODE_IP):4317
STEP 2. Apply the Kubernetes deployment.
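For example:
kubectl apply -f go-traces-demo.yaml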
Validation
View your telemetry data in your Coralogix dashboard. Traces should arrive from the tail-sampling load balancer.
Configuring Head Sampling for Tracing
Head sampling is a feature that allows you to sample traces at the collection point. When enabled, it creates a separate pipeline for sampled traces using probabilistic sampling. This helps reduce the volume of traces while maintaining a representative sample.
When used in combination with tail sampling, head sampling is applied first at the agent level. The sampled traces are then forwarded to the tail sampling collectors, where additional sampling decisions can be made. This means that tail sampling will only see and process the traces that have already passed through head sampling.
The sampling configuration:
- Creates a new 'traces/sampled' pipeline in addition to the main traces pipeline
- Applies probabilistic sampling based on the configured percentage
- Supports different sampling modes:
    - "proportional": Maintains the relative proportion of traces across services
    - "equalizing": Attempts to sample equal numbers of traces from each service
    - "hash_seed": Uses consistent hashing to ensure the same traces are sampled
To enable head sampling, configure the following in your values.yaml:
presets:
  headSampling:
    enabled: true
    # Percentage of traces to sample (0-100)
    percentage: 10
    # Sampling mode - "proportional", "equalizing", "hash_seed"
    mode: "proportional"
Deploying Central Collector Cluster for Tail Sampling
To deploy OpenTelemetry Collector in a separate "central" Kubernetes cluster for telemetry collection and tail sampling using OpenTelemetry Protocol (OTLP) receivers, install otel-integration using the central-tail-sampling-values.yaml values file. Review the values file for detailed configuration options.
This deployment creates two key components:
- opentelemetry-receiver: Receives OTLP data and sends metrics and logs directly to Coralogix, while load balancing span data to the opentelemetry-gateway deployment.
- opentelemetry-gateway: Performs tail sampling decisions on the received span data before forwarding it to Coralogix.
To enable other Kubernetes clusters to send data to the opentelemetry-receiver, expose it using one of these methods:
- Service of type LoadBalancer
- Ingress object configuration
- Manual load balancer configuration
!!! important
    Ensure you configure sufficient replicas and appropriate resource requests/limits to handle the expected load. You'll also need to set up custom [tail sampling processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor) policies.
STEP 1. Run the following commands to deploy the Central Collector Cluster.
helm upgrade --install otel-coralogix-central-collector coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f central-tail-sampling-values.yaml
STEP 2. Validate the deployment by sending a sample of OTLP data to the opentelemetry-receiver Service and navigating to the Coralogix Explore Screen to view collected traces. This can be done via telemetrygen:
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: telemetrygen-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: telemetrygen
  template:
    metadata:
      labels:
        app: telemetrygen
    spec:
      containers:
        - name: telemetrygen
          image: ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest
          args:
            - "traces"
            - "--otlp-endpoint=coralogix-opentelemetry-receiver:4317"
            - "--otlp-insecure"
            - "--rate=10"
            - "--duration=120s"
EOF
STEP 3. Configure a regular otel-integration deployment to send data to the Central Collector Cluster:
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
--render-subchart-notes -f central-agent-values.yaml
Troubleshooting
Why am I getting ResourceExhausted errors when using Tail Sampling?
Typically, the errors look like this:
not retryable error: Permanent error: rpc error: code = ResourceExhausted desc = grpc: received message after decompression larger than max (5554999 vs. 4194304)
By default, the OTLP Server has a 4MiB size limit for a single gRPC request. This limit may be exceeded when the opentelemetry-agent sends trace data to the gateway's OTLP Server using the load balancing exporter. To resolve this, increase the size limit by adjusting the configuration. For example:
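A minimal sketch that raises the limit on the gateway's OTLP receiver (max_recv_msg_size_mib is a standard gRPC server setting of the OTLP receiver; the exact nesting in your values file may differ):
opentelemetry-gateway:
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            # Raise the maximum accepted gRPC message size (in MiB) above the 4 MiB default.
            max_recv_msg_size_mib: 20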
Additional Resources
| Documentation | Introduction to Tail Sampling with Coralogix & OpenTelemetry |
| OTLP Configuration | OTLP Receiver Configuration |
Target Allocator and Prometheus Operator with OpenTelemetry
Overview
Targets are endpoints that supply metrics via the Prometheus data model. For the Prometheus Receiver to scrape them, they can be statically configured via the static_configs parameters or dynamically discovered using one of the supported service discovery mechanisms.
The OpenTelemetry Target Allocator for Kubernetes, an optional component of the OpenTelemetry Operator now included in Coralogix's OpenTelemetry Integration Helm Chart, facilitates service discovery and manages the configuration of targets into the different agent collector's Prometheus Receiver across nodes.
If you're using the Prometheus Operator custom resources (ServiceMonitor and PodMonitor) and want to continue using them with the OpenTelemetry collector, you can enable target scraping through the Target Allocator component. This optional feature is disabled by default but can be enabled by setting opentelemetry-agent.targetAllocator.enabled: true in your values.yaml file.
When enabled, the target allocator is deployed as a separate deployment in the same namespace as the collector. It allocates targets to the agent collector on each node, enabling scraping of targets that reside on that specific node - effectively implementing a simple sharding mechanism. For high availability, you can run multiple target allocator instances by setting opentelemetry-agent.targetAllocator.replicas to a value greater than 1.
You can customize the scrape interval for Prometheus Custom Resources by configuring opentelemetry-agent.targetAllocator.prometheusCR.scrapeInterval. If not specified, it defaults to 30s.
For more details on Prometheus custom resources and target allocator see the documentation here.
Discovery
The Target Allocator discovers Prometheus Operator Custom Resources, namely the ServiceMonitor and PodMonitor as Metrics Targets. These metrics targets detail the endpoints of exportable metrics available on the Kubernetes cluster as "jobs."
Then, the Target Allocator detects available OpenTelemetry Collectors and distributes the targets among known collectors. As a result, the collectors routinely query the Target Allocator for their assigned metric targets to add to the scrape configuration.
Allocation strategies
Upon query from collectors, the Target Allocator assigns metric endpoint targets according to a chosen allocation strategy. To align with our chart's OpenTelemetry agent in DaemonSet mode, the per-node allocation strategy is preconfigured. This assigns each target to the OpenTelemetry collector running on the same node as the metric endpoint.
Monitoring CRDs (ServiceMonitor & PodMonitor)
As part of the deployment model under the Prometheus Operator, concepts were introduced to simplify monitoring configuration and align it better with the capabilities of Kubernetes.
Specifying endpoints under the monitoring scope as CRD objects:
- Allows deployment in YAML files and packaging as Helm charts or custom resources.
- Decouples and decentralizes the monitoring configuration, making it more agile for software changes and progression.
- Reduces the impact of changes across monitored components, as there is no single standard file or resource to work with; other workloads continue to work.
Both ServiceMonitor and PodMonitor use selectors to detect pods or services to monitor with additional configurations on how to scrape them (e.g., port, interval, path).
ServiceMonitor
A ServiceMonitor provides metrics from the service itself and each of its endpoints. This means each pod implementing the service will be discovered and scraped.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    serviceMonitorSelector: prometheus
  name: prometheus
  namespace: prometheus
spec:
  endpoints:
    - interval: 30s
      targetPort: 9090
      path: /metrics
  namespaceSelector:
    matchNames:
      - prometheus
  selector:
    matchLabels:
      target-allocation: "true"
Details:
- endpoints: Defines an endpoint serving Prometheus metrics to be scraped by Prometheus. It specifies an interval, port, URL path, and scrape timeout duration. See the Endpoints spec.
- selector & namespaceSelector: Selectors for the labels and namespaces from which the Kubernetes Endpoints objects will be discovered.
More details on writing the ServiceMonitor can be found in the ServiceMonitor Spec.
PodMonitor
For workloads that cannot be exposed behind a service, a PodMonitor is used instead.
This includes:
- Services that are not HTTP-based, e.g., Kafka, SQS/SNS, JMS, etc.
- Components such as CronJobs, DaemonSets, etc. (e.g., using hostPort)
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: front-end
  labels:
    name: front-end
spec:
  namespaceSelector:
    matchNames:
      - prometheus
  selector:
    matchLabels:
      name: front-end
  podMetricsEndpoints:
    - targetPort: 8079
Details:
- podMetricsEndpoints: Similar to endpoints, this defines the pod endpoint serving Prometheus metrics. See the PodMetricsEndpoint spec.
Prerequisites
- Kubernetes (v1.24+)
- The command-line tool kubectl
- Helm (v3.9+) installed and configured
- CRDs for PodMonitors and ServiceMonitors installed
Check that Custom Resource Definitions for PodMonitors and ServiceMonitors exist in your cluster using this command:
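For example:
kubectl get crd podmonitors.monitoring.coreos.com servicemonitors.monitoring.coreos.com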
If not, you can install them with the following kubectl apply commands:
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml
Installation
The Target Allocator can be enabled by modifying the default values.yaml file in the OpenTelemetry Integration Chart. Once enabled, it is deployed to service the Prometheus Receivers of the OpenTelemetry Agent Collectors and allocate targets residing on the DaemonSet's nodes.
This guide assumes you have services exporting Prometheus metrics running in your Kubernetes cluster.
STEP 1. Follow the instructions for Kubernetes Observability with OpenTelemetry, specifically this Advanced Configuration guide, and in the otel-integration values.yaml file set opentelemetry-agent.targetAllocator.enabled to true:
opentelemetry-agent:
  targetAllocator:
    enabled: true # set to true
    replicas: 1
    allocationStrategy: "per-node"
    prometheusCR:
      enabled: true
As shown above, the default allocation strategy is per-node, aligning with the OpenTelemetry agent's DaemonSet mode.
STEP 2. Install the Helm chart with the changes made to the values.yaml and deploy the target allocator pod:
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration --render-subchart-notes -n <namespace> -f values.yaml
Troubleshooting
To check if the jobs and scrape configs generated by the Target Allocator are correct and ServiceMonitors and PodMonitors are successfully detected, port-forward to the Target Allocator's exposed service. The information will be available under the /jobs and /scrape_configs HTTP paths.
The Target Allocator’s service can be located with the following command: kubectl get svc -n <namespace>
Port forward to the target allocator pod with the following kubectl command:
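For example, assuming the service name found in the previous step (adjust the port to match the service):
kubectl port-forward -n <namespace> svc/<target-allocator-service> 8080:<service-port>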
You can browse or curl the /jobs and /scrape_configs endpoints for the detected PodMonitor & ServiceMonitor resources and the generated scrape configs.
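For example, with the port-forward above in place:
curl http://localhost:8080/jobs
curl http://localhost:8080/scrape_configs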
The generated kubernetes_sd_configs is a common configuration syntax for discovering and scraping Kubernetes targets in Prometheus.
OpenTelemetry eBPF Instrumentation
The OpenTelemetry eBPF Instrumentation is an OpenTelemetry component that uses eBPF to collect telemetry data from the Linux kernel, such as network metrics and spans, without requiring modifications to the application code. To enable it, set opentelemetry-ebpf-instrumentation.enabled to true in the values.yaml file.
For a full list of values for this chart, see values.yaml.
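For example:
opentelemetry-ebpf-instrumentation:
  enabled: true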
K8s Cache
The OpenTelemetry eBPF Instrumentation includes a K8s Cache component that collects Kubernetes metadata and enriches the telemetry data with Kubernetes labels. This allows you to correlate the telemetry data with Kubernetes resources, such as Pods, Nodes, and Namespaces. The K8s Cache component is critical for large-scale Kubernetes clusters, as it takes load off the K8s API Server by isolating those calls to the K8s Cache services. The K8s Cache is turned on by default, with 2 replicas for high availability. You can configure the number of replicas by setting opentelemetry-ebpf-instrumentation.k8sCache.replicas in the values.yaml file. To turn off the K8s Cache, set opentelemetry-ebpf-instrumentation.k8sCache.replicas to 0 in the values.yaml file. Turning off the K8s Cache will still enrich data with Kubernetes metadata, but it will do so by calling the K8s API Server directly from each replica of the OpenTelemetry eBPF Instrumentation.
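For example (key path taken from the text above):
opentelemetry-ebpf-instrumentation:
  k8sCache:
    replicas: 2 # set to 0 to disable the dedicated cache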
Coralogix Operator
The Coralogix Operator provides Kubernetes-native deployment and management for Coralogix, designed to simplify and automate the configuration of Coralogix APIs through Kubernetes custom resource definitions and controllers.
Enabling Coralogix Operator
To enable the Coralogix Operator, set coralogix-operator.enabled to true in the values.yaml file.
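For example:
coralogix-operator:
  enabled: true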




