Are you looking for a way to improve trace observability without breaking the bank? Look no further!
In our previous tutorial, we showed you how to set up the new OpenTelemetry Community Demo Application and send telemetry data to Coralogix, giving you the ability to understand the interactions between your services and to visualize, query, and alert on them from your Coralogix dashboard.
But what about cost? Ingesting all of your traces and spans can quickly add up, and much of that data is unnecessary for gaining visibility into the health of your applications. That’s where trace sampling comes in – in particular, tail sampling.
By sampling your traces, you can significantly reduce the amount of data ingested into Coralogix while maintaining full visibility into your services and avoiding heavy charges. Try it out with the OTel Demo App in this tutorial.
The following tutorial will demonstrate how to use the OTel Collector in a load balanced configuration with tail sampling enabled on the collector nodes, using the OTel Demo App.
The tail sampling processor and the probabilistic sampling processor allow you to sample traces based on a set of rules at the collector level. This lets you define more advanced rules so that you retain full visibility into error or high-latency traces while sampling away routine ones.
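As an illustration of what such rules can look like, here is a minimal tail_sampling sketch that keeps every trace containing an error plus any unusually slow trace. The policy names and the 2-second threshold are placeholders for illustration, not part of the demo configuration built later in this tutorial:

```yaml
processors:
  tail_sampling:
    # Wait this long after the first span of a trace before making a decision
    decision_wait: 10s
    policies:
      [
        # Keep any trace that contains a span with status ERROR
        { name: keep-errors, type: status_code, status_code: {status_codes: [ERROR]} },
        # Keep any trace whose overall duration exceeds 2 seconds (placeholder threshold)
        { name: keep-slow, type: latency, latency: {threshold_ms: 2000} },
      ]
```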
Note: To achieve this in your environment, your code should be instrumented with OpenTelemetry and emit telemetry data to the OTel Collector.
Tail sampling is a method of trace sampling in which the sampling decision is made at the end of the workflow, once all the spans of a trace are available, allowing for a more accurate decision. This is in contrast to head-based sampling, in which the sampling decision is made at the beginning of a request and usually at random. Tail sampling therefore gives you the option of filtering your traces based on specific criteria, a clear advantage over head-based sampling.
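For contrast, purely random sampling at the collector can be done with the probabilistic_sampler processor, which decides from a hash of the trace ID without ever looking at what the trace contains. This is a minimal sketch; the 10% rate is an arbitrary example, not a value used later in this tutorial:

```yaml
processors:
  # Random sampling: roughly 10% of traces are kept, decided purely from the trace ID,
  # so an error trace is just as likely to be dropped as a healthy one
  probabilistic_sampler:
    sampling_percentage: 10
```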
So why is tail sampling important, and why should you do it? Because it lets you keep the traces that matter most, such as errors and unusually slow requests, while dropping a large share of routine traffic, cutting ingestion costs without sacrificing visibility.
Follow the steps below to use the OTel Collector in a load balanced configuration with tail sampling enabled on the collector nodes, using the OTel Demo.
STEP 1 – Clone the opentelemetry-demo repository to your local machine:

```bash
git clone https://github.com/open-telemetry/opentelemetry-demo.git
```
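The remaining steps assume you are working from inside the cloned folder (the directory name comes from the repository URL):

```bash
cd opentelemetry-demo
```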
STEP 2 – Edit the .env file in the root of the opentelemetry-demo directory. Add the following lines to the bottom and update the values to reflect your Coralogix account, as in the example below:
```
# ********************
# Exporter Configuration
# ********************
CORALOGIX_DOMAIN=coralogix.com
CORALOGIX_APP_NAME=otel
CORALOGIX_SUBSYS_NAME=otel-demo
CORALOGIX_PRIVATE_KEY=b3887c90-5e67-4249-e81b-EXAMPLEKE
```
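Docker Compose reads this .env file automatically from the project root. As an optional sanity check that the lines were saved where Compose expects them, you can simply grep the file:

```bash
# Run from the root of the opentelemetry-demo folder
grep CORALOGIX .env
```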
STEP 3 – Configure the load balancing exporter
In the src/otelcollector/otelcol-config-extras.yml file, paste the following:

```yaml
exporters:
  loadbalancing:
    protocol:
      otlp:
        # TLS between the exporter LB and the exporters can be encrypted -
        # for this environment we are instructing the exporter LB to ignore TLS errors
        tls:
          insecure: true
        # all options from the OTLP exporter are supported
        # except the endpoint
        timeout: 1s
    resolver:
      static:
        # These would be the corresponding docker hosts running the exporters
        hostnames:
          - otel-col-shipper:4317
          - otel-col-shipper-2:4317

processors:
  batch/traces:
    timeout: 1s
    send_batch_size: 50
  resourcedetection:
    detectors: [system, env]
    timeout: 5s
    override: true

service:
  pipelines:
    traces:
      processors: [batch/traces, batch]
      exporters: [logging, otlp, loadbalancing]
```
Wondering what we just did?

- We defined a loadbalancing exporter, using the otlp protocol to communicate with the onward Collectors (we disable TLS for this connection)
- We pointed its static resolver at the two Docker hosts that will run the shipping collectors: otel-col-shipper & otel-col-shipper-2
- We defined batch settings for shipping the traces
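One detail worth calling out: for tail sampling to work downstream, every span of a given trace has to land on the same shipper. The loadbalancing exporter routes on the trace ID by default, and if you want to make that explicit it accepts a routing_key setting. This is an optional addition, not something the config above requires:

```yaml
exporters:
  loadbalancing:
    # traceID routing keeps all spans of a trace on the same backend collector,
    # which is what the tail_sampling processor on the shippers needs
    routing_key: "traceID"
```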
STEP 4 – Create a config file to be used by both of the load balanced OTel collectors
Create a file called otel-standalone.yml in the root of the opentelemetry-demo folder and paste the following configuration:

```yaml
extensions:
  health_check:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch/traces:
    timeout: 1s
    send_batch_size: 50
  resourcedetection:
    detectors: [system, env]
    timeout: 5s
    override: true
  tail_sampling:
    decision_wait: 10s
    num_traces: 100
    expected_new_traces_per_sec: 10
    policies:
      [
        {
          name: errors-policy,
          type: status_code,
          status_code: {status_codes: [ERROR]}
        },
        {
          name: randomized-policy,
          type: probabilistic,
          probabilistic: {sampling_percentage: 25}
        },
      ]
  attributes/shipper:
    actions:
      - key: shipper
        action: insert
        value: '${SHIPPER_NAME}'

exporters:
  logging:
  coralogix:
    # The Coralogix traces ingress endpoint
    traces:
      endpoint: "otel-traces.${CORALOGIX_DOMAIN}:443"
    metrics:
      endpoint: "otel-metrics.${CORALOGIX_DOMAIN}:443"
    logs:
      endpoint: "otel-logs.${CORALOGIX_DOMAIN}:443"
    # Your Coralogix private key is sensitive
    private_key: "${CORALOGIX_PRIVATE_KEY}"
    application_name: "${CORALOGIX_APP_NAME}"
    subsystem_name: "${CORALOGIX_SUBSYS_NAME}"
    # (Optional) Timeout is the timeout for every attempt to send data to the backend.
    timeout: 30s

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes/shipper, tail_sampling, batch/traces, resourcedetection]
      exporters: [coralogix, logging]
    logs:
      receivers: [otlp]
      exporters: [coralogix, logging]
```
Wondering what we just did?

- We defined batch settings for shipping the traces
- We defined a tail_sampling processor with the following 2 policies:
  - errors-policy: keeps every trace that contains a span with status ERROR
  - randomized-policy: probabilistically keeps 25% of traces
- We defined an attributes processor to inject an environment variable as a new field into each span. This allows us to identify which load balanced shipper sent a span to Coralogix.
- We defined the Coralogix exporter. We use environment variables from the .env file to inject the required values.
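Note that tail sampling policies are evaluated independently, and a trace is kept if any of them matches, so the two policies above translate to "all errors, plus roughly 25% of everything". If you also wanted to always keep traces from a particular service, a string_attribute policy could be appended to the same list; the service name here is just an example from the demo, not part of this tutorial's configuration:

```yaml
    policies:
      [
        { name: errors-policy, type: status_code, status_code: {status_codes: [ERROR]} },
        { name: randomized-policy, type: probabilistic, probabilistic: {sampling_percentage: 25} },
        # Additionally keep every trace that touches the checkout service (example only)
        { name: checkout-policy, type: string_attribute, string_attribute: {key: service.name, values: [checkoutservice]} },
      ]
```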
STEP 5 – Update the OTel Collector container definition
Open the docker-compose.yml file and update the otelcol container definition (line 603), adding two new shipper collector services with the environment variables that we defined earlier. Add the environment section to the file, as in the example below.

```yaml
  # OpenTelemetry Collector - Load Balancer
  otelcol:
    image: otel/opentelemetry-collector-contrib:0.68.0
    container_name: otel-col
    deploy:
      resources:
        limits:
          memory: 100M
    restart: unless-stopped
    command: [ "--config=/etc/otelcol-config.yml", "--config=/etc/otelcol-config-extras.yml" ]
    volumes:
      - ./src/otelcollector/otelcol-config.yml:/etc/otelcol-config.yml
      - ./src/otelcollector/otelcol-config-extras.yml:/etc/otelcol-config-extras.yml
    ports:
      - "4317"        # OTLP over gRPC receiver
      - "4318:4318"   # OTLP over HTTP receiver
      - "9464"        # Prometheus exporter
      - "8888"        # metrics endpoint
    depends_on:
      - jaeger
      - otelcol_shipper
      - otelcol_shipper_2
    logging: *logging

  # OpenTelemetry Collector - Shipper 1
  otelcol_shipper:
    image: otel/opentelemetry-collector-contrib:0.68.0
    container_name: otel-col-shipper
    deploy:
      resources:
        limits:
          memory: 100M
    restart: unless-stopped
    command: [ "--config=/etc/otelcol-config.yml" ]
    volumes:
      - ./otel-standalone.yml:/etc/otelcol-config.yml
    ports:
      - "4317"        # OTLP over gRPC receiver
      - "4318"        # OTLP over HTTP receiver
      - "9464"        # Prometheus exporter
      - "8888"        # metrics endpoint
    environment:
      - CORALOGIX_DOMAIN
      - CORALOGIX_APP_NAME
      - CORALOGIX_SUBSYS_NAME
      - CORALOGIX_PRIVATE_KEY
      - SHIPPER_NAME=shipper-1
    logging: *logging

  # OpenTelemetry Collector - Shipper 2
  otelcol_shipper_2:
    image: otel/opentelemetry-collector-contrib:0.68.0
    container_name: otel-col-shipper-2
    deploy:
      resources:
        limits:
          memory: 100M
    restart: unless-stopped
    command: [ "--config=/etc/otelcol-config.yml" ]
    volumes:
      - ./otel-standalone.yml:/etc/otelcol-config.yml
    ports:
      - "4317"        # OTLP over gRPC receiver
      - "4318"        # OTLP over HTTP receiver
      - "9464"        # Prometheus exporter
      - "8888"        # metrics endpoint
    environment:
      - CORALOGIX_DOMAIN
      - CORALOGIX_APP_NAME
      - CORALOGIX_SUBSYS_NAME
      - CORALOGIX_PRIVATE_KEY
      - SHIPPER_NAME=shipper-2
    logging: *logging
```
Wondering what we just did?

- We added a depends_on on our 2 new OTel Collector instances, so the load balancer starts after the shippers
- Both new collectors use the same configuration file (otel-standalone.yml), allowing them to both ship to Coralogix
- We passed the values we set in .env in STEP 2 into the containers as environment variables to be used in the OTel configuration file
- Each shipper gets a unique SHIPPER_NAME that will be injected into each span to identify which collector shipped the span to Coralogix
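If you want to confirm that each shipper actually received its variables once the stack is running (the collector image typically ships without a shell, so docker exec into it is not an option), inspecting the container metadata is a simple check:

```bash
# Print the environment of both running shipper containers
docker inspect -f '{{.Config.Env}}' otel-col-shipper otel-col-shipper-2
```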
STEP 6 – Build the Environment
From the root of the opentelemetry-demo folder, bring the environment up. To use the pre-built demo images, run:

```bash
docker compose up --no-build
```

Or, to build the images locally first:

```bash
docker compose up --build
```
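Once the stack is up, a quick way to confirm that the load balancer and both shippers started is to list the running services:

```bash
# otel-col, otel-col-shipper and otel-col-shipper-2 should all show as running
docker compose ps
```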
STEP 7 – Validation & Testing
Check the logs of each collector container to confirm that spans are flowing through the load balancer and both shippers:

```bash
docker logs -f otel-col
docker logs -f otel-col-shipper
docker logs -f otel-col-shipper-2
```

Exporter output in the last few lines of each log indicates that the collector is shipping spans.
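Since both shipper configurations include the logging exporter, exported batches also show up in their container logs. A rough way to spot them is to filter the output; the exact wording of the log lines depends on the collector version, so treat this pattern as an approximation:

```bash
docker logs otel-col-shipper 2>&1 | grep -i "tracesexporter"
```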
To generate error traces for the errors-policy to catch, enable the productCatalogFailure feature flag. This will instruct the environment to create errors, allowing our OTel collectors to ship them!

STEP 8 – Finally… view your traces on your Coralogix dashboard!
The OpenTelemetry Community Demo application is an awesome tool for getting to know OpenTelemetry and instrumentation best practices. In this tutorial, we showed you how to run the OTel Collector in a load-balanced configuration with tail sampling enabled on the collector nodes, using the OTel Demo. Use it as inspiration for implementing tail sampling on your own application traces.
Interested in learning more?