
What is Jaeger Distributed Tracing?

  • Coralogix
  • November 2, 2022

Distributed tracing is the ability to follow a request through a software system from beginning to end. While that may sound trivial, in modern distributed architectures a single request can easily spawn multiple child requests to different microservices. These, in turn, trigger further sub-requests, resulting in a complex web of transactions to service a single originating request.

While each microservice can generate logs for the specific transactions it handles, those logs don’t describe the entire flow of a request. Piecing transactions together manually is a labor-intensive process.

This is where distributed tracing comes in: by propagating identifiers to each child request (or “span”), tracing allows you to join the dots between transactions and map the entire chain of events. When you’re debugging a complex issue or looking for the source of a performance bottleneck in a distributed microservice-based architecture, distributed tracing provides the insights that logs and metrics on their own cannot.
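
To make the idea of propagated identifiers concrete, here is a minimal sketch using only the Python standard library. The `Span` class, `start_span`, `path_to_root`, and the service names are all hypothetical and chosen for illustration; real tracing libraries record far more (timestamps, tags, logs). The point is only that every child inherits the trace ID and remembers its parent, which is enough to rebuild the chain later.

```python
import secrets
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    trace_id: str             # shared by every span in one request
    span_id: str              # unique to this unit of work
    parent_id: Optional[str]  # None for the root span
    service: str


def start_span(service: str, parent: Optional[Span] = None) -> Span:
    """Create a root span, or a child that inherits the parent's trace ID."""
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    parent_id = parent.span_id if parent else None
    return Span(trace_id, secrets.token_hex(8), parent_id, service)


def path_to_root(span: Span, spans: List[Span]) -> List[str]:
    """Follow parent IDs upwards to recover the chain of services."""
    by_id = {s.span_id: s for s in spans}
    chain = [span.service]
    while span.parent_id is not None:
        span = by_id[span.parent_id]
        chain.append(span.service)
    return list(reversed(chain))


root = start_span("api-gateway")
orders = start_span("orders", parent=root)
payments = start_span("payments", parent=orders)

spans = [payments, root, orders]  # spans can arrive in any order
print(path_to_root(payments, spans))  # ['api-gateway', 'orders', 'payments']
```

Because the link between spans lives in the identifiers rather than in arrival order, each service can report its spans independently and the trace can still be reassembled afterwards.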

In response to the growth in popularity of microservice architectures, several distributed tracing tools have been developed, of which Jaeger is one. Jaeger is an open-source distributed tracing platform that allows you to collect, aggregate, and analyze trace data from software systems.

Initially developed in 2015 by ride-share giant Uber, Jaeger was adopted by the Cloud Native Computing Foundation (CNCF) in 2017. Two years later, the project was promoted from incubation to graduated status, reflecting its maturity as an established, widely used, and well-documented platform.

Jaeger Architecture

As you might expect from a CNCF project, Jaeger is designed for cloud-hosted, containerized, microservice-based systems. It consists of the following elements:

  • Instrumentation logic – To propagate identifiers and collect timestamps and other trace metadata, you first need to instrument your application code. Until recently, this was achieved using the Jaeger client libraries – language-specific implementations of the OpenTracing API. However, following the consolidation of OpenTracing and OpenCensus into OpenTelemetry, the Jaeger client libraries have been deprecated in favor of the OpenTelemetry APIs and SDKs.
  • Jaeger agent – The agent listens for the individual spans that make up a complete trace and forwards them to the collector. While you don’t have to include the Jaeger agent, it’s helpful for larger, more complex systems as it takes care of service discovery for the Jaeger collectors.
  • Jaeger collector – The collector is a key part of the Jaeger platform. It’s responsible for receiving and processing traces before forwarding them to storage and sending sampling instructions back to the instrumentation logic.
  • Database – When you implement Jaeger, you need to set up a database to store your traces for analysis. Jaeger supports both Elasticsearch and Cassandra, and provides an extensible plugin framework so that you can implement a different storage mechanism.
    You can send trace data from the collector to the database directly, or – for larger loads – use Kafka to stream the data. If you use Kafka, you’ll also need to deploy the Jaeger ingester to write traces from Kafka to the database.
  • Jaeger query and UI – The Jaeger query service exposes an API that allows you to query trace data and start making sense of how your system is behaving. It ships with a GUI to search for traces based on various parameters, including the services involved and the trace duration.
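
The flow between these components can be sketched as a toy, in-memory model. Everything here is an assumption made for illustration: the class names, the dictionary keys, and the batching rule are not Jaeger's data model or wire protocol, and a trace's duration is approximated as that of its longest span. It only shows who hands what to whom.

```python
from collections import defaultdict
from typing import Dict, List

# Each span is a plain dict; the keys are illustrative only.
Span = Dict[str, object]


class Storage:
    """Stand-in for the trace database: spans grouped by trace ID."""

    def __init__(self) -> None:
        self.traces: Dict[str, List[Span]] = defaultdict(list)

    def write(self, span: Span) -> None:
        self.traces[str(span["trace_id"])].append(span)


class Collector:
    """Receives batches of spans and writes them to storage."""

    def __init__(self, storage: Storage) -> None:
        self.storage = storage

    def receive(self, batch: List[Span]) -> None:
        for span in batch:
            self.storage.write(span)


class Agent:
    """Buffers spans from the application and forwards them in batches."""

    def __init__(self, collector: Collector, batch_size: int = 2) -> None:
        self.collector = collector
        self.batch_size = batch_size
        self.buffer: List[Span] = []

    def report(self, span: Span) -> None:
        self.buffer.append(span)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.collector.receive(self.buffer)
            self.buffer = []


def find_traces(storage: Storage, service: str, min_duration_ms: int) -> List[str]:
    """Query step: trace IDs that involve `service` and last at least `min_duration_ms`."""
    hits = []
    for trace_id, spans in storage.traces.items():
        services = {s["service"] for s in spans}
        duration = max(int(s["duration_ms"]) for s in spans)  # longest span
        if service in services and duration >= min_duration_ms:
            hits.append(trace_id)
    return sorted(hits)


storage = Storage()
agent = Agent(Collector(storage))
agent.report({"trace_id": "t1", "service": "gateway", "duration_ms": 120})
agent.report({"trace_id": "t1", "service": "orders", "duration_ms": 80})
agent.report({"trace_id": "t2", "service": "gateway", "duration_ms": 15})
agent.flush()  # push the last, partially filled batch
print(find_traces(storage, "gateway", 100))  # ['t1']
```

In a real deployment a Kafka topic would sit between `Collector` and `Storage` for larger loads, with the ingester playing the role of the final `write` call.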

Implementing tracing with Jaeger

When implementing Jaeger distributed tracing, there are various considerations to bear in mind.

Instrumenting your application code

The first step towards distributed tracing is to instrument your application code. While this involves some initial effort, it’s an investment that renders your system more observable. The result is that you can later answer questions that you didn’t know you would want to ask. To facilitate the adoption of distributed tracing and avoid vendor lock-in, the industry has centered on an open standard for tracing instrumentation: OpenTelemetry.

Jaeger added native support for OpenTelemetry in 2022, meaning that if you’ve instrumented your application code using the OpenTelemetry API and SDKs, you can now send traces directly to the Jaeger collector over the OpenTelemetry Protocol (OTLP). The Jaeger client libraries have been deprecated, so for new implementations, it’s best to use OpenTelemetry for instrumentation. Using this open standard also allows you to move to other tracing solutions without having to re-instrument your application code first.
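
In practice, the OpenTelemetry SDK takes care of passing trace context between services for you. To show what actually travels with each request, here is a standard-library sketch of the W3C Trace Context `traceparent` header, the format OpenTelemetry propagates by default. The `inject` and `extract` helpers below are local, hypothetical functions written for this example, not the OpenTelemetry API, and the sketch only handles version `00` of the header.

```python
import re
from typing import Dict, Optional, Tuple

# traceparent layout: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers: Dict[str, str], trace_id: str, span_id: str,
           sampled: bool = True) -> None:
    """Write the current trace context into outgoing request headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers: Dict[str, str]) -> Optional[Tuple[str, str, bool]]:
    """Read (trace_id, parent_span_id, sampled) from incoming headers, if valid."""
    match = TRACEPARENT.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_span_id, flags = match.groups()
    # All-zero IDs are reserved as invalid by the specification.
    if trace_id == "0" * 32 or parent_span_id == "0" * 16:
        return None
    return trace_id, parent_span_id, bool(int(flags, 16) & 0x01)


outgoing: Dict[str, str] = {}
inject(outgoing, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
print(outgoing["traceparent"])
print(extract(outgoing))
```

The receiving service starts its own span with the extracted trace ID and uses the extracted span ID as the parent, which is how the chain of spans stays connected across process boundaries.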

Distributed vs. all-in-one deployment

Jaeger ships with an all-in-one deployment option, with the agent, collector, and query service in a single container image. However, as this design offers no resilience in the event of the node failing, it’s only suitable for proof-of-concept and demo implementations.

You’ll need to implement multiple collectors to provide resilience and scale for production deployments. This is where it’s beneficial to use the agent for service discovery. You can then send data directly to the storage backend or stream it via Kafka.

Deploying Jaeger on Kubernetes

If you’re using Kubernetes to orchestrate a containerized deployment, it’s relatively straightforward to add distributed tracing to your K8s cluster using the Jaeger operator. The Jaeger agent is deployed as a sidecar in each pod. You can specify whether to write traces directly to the database from the collector (production strategy) or stream them via Kafka (streaming strategy).
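
As a rough illustration, the custom resource that tells the operator which strategy to use might be generated like this. The field names follow the Jaeger operator's custom resource as commonly documented, but treat them as assumptions and check them against the operator version you install; the resource name `tracing-prod` and the Elasticsearch URL are placeholders invented for this sketch.

```python
import json

# Placeholder values: "tracing-prod" and the Elasticsearch URL are hypothetical.
jaeger_cr = {
    "apiVersion": "jaegertracing.io/v1",
    "kind": "Jaeger",
    "metadata": {"name": "tracing-prod"},
    "spec": {
        # "production" writes from the collector straight to storage;
        # "streaming" places Kafka between the collector and storage.
        "strategy": "production",
        "storage": {
            "type": "elasticsearch",
            "options": {"es": {"server-urls": "http://elasticsearch:9200"}},
        },
    },
}


def render_manifest(resource: dict) -> str:
    """Serialize the resource as JSON, which kubectl accepts alongside YAML."""
    return json.dumps(resource, indent=2)


print(render_manifest(jaeger_cr))
```

Saving the output to a file and applying it with `kubectl apply -f` would hand the definition to the operator, which then creates the collector, query service, and related resources.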

Sampling rates

Distributed tracing can add considerable overhead to your application, as trace identifiers are propagated to each sub-request, and the data from each span is then processed and written to storage. Sampling reduces processing and storage costs while still collecting a representative subset of trace data.
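
One common way to sample is to decide from the trace ID itself, so that every service reaches the same keep-or-drop decision for a given trace. The following is a minimal sketch of that idea, not Jaeger's own sampler; the function name `should_sample` and the choice of the low 64 bits are assumptions made for this example.

```python
import secrets


def should_sample(trace_id: str, probability: float) -> bool:
    """Keep a trace if its ID falls in the lowest `probability` share of the ID space."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Treat the low 64 bits of the hex trace ID as a uniform random number.
    value = int(trace_id[-16:], 16)
    return value < probability * 2**64


# The decision is a pure function of the ID, so it is repeatable across services.
ids = [secrets.token_hex(16) for _ in range(10_000)]
kept = sum(should_sample(t, 0.1) for t in ids)
print(f"kept {kept} of {len(ids)} traces")
```

With a probability of 0.1, roughly one trace in ten is kept, and the spans that belong to a kept trace are kept everywhere, so sampled traces remain complete.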

With Jaeger, sampling can either be configured on the client as part of the instrumentation logic or defined centrally and propagated to clients via the agent. The advantage of remote sampling is that you can apply sampling rates consistently across the system and update them easily.

Jaeger supports two forms of remote sampling: file-based and adaptive. With the former, you define sampling rates for each service or operation explicitly using either probability or rate-limiting. With adaptive sampling, Jaeger adjusts the sampling rate dynamically to meet a pre-determined target tracing rate, meaning it can adjust to changes in traffic.
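
The two approaches can be sketched side by side. The `strategies` structure mirrors the layout of the strategies file described in Jaeger's sampling documentation, but treat the field names as an assumption to verify against your version; the service and operation names are invented. The `adjust_probability` function is a simple proportional rule written for this example and is not Jaeger's adaptive sampling algorithm.

```python
import json
from typing import Dict

# File-based: explicit strategies per service and operation, plus a default.
strategies = {
    "service_strategies": [
        {
            "service": "checkout",
            "type": "probabilistic",
            "param": 0.5,
            "operation_strategies": [
                {"operation": "GET /health", "type": "probabilistic", "param": 0.0},
            ],
        },
        {"service": "search", "type": "ratelimiting", "param": 5},
    ],
    "default_strategy": {"type": "probabilistic", "param": 0.1},
}


def strategy_for(service: str, operation: str) -> Dict[str, object]:
    """Resolve the most specific strategy: operation, then service, then default."""
    for svc in strategies["service_strategies"]:
        if svc["service"] != service:
            continue
        for op in svc.get("operation_strategies", []):
            if op["operation"] == operation:
                return {"type": op["type"], "param": op["param"]}
        return {"type": svc["type"], "param": svc["param"]}
    return dict(strategies["default_strategy"])


def adjust_probability(current: float, sampled_tps: float, target_tps: float) -> float:
    """Adaptive step: scale the probability so sampled traffic approaches the target."""
    if sampled_tps <= 0:
        return 1.0  # nothing sampled recently, so sample everything
    return max(0.001, min(1.0, current * target_tps / sampled_tps))


print(json.dumps(strategies, indent=2))       # contents of the strategies file
print(strategy_for("checkout", "POST /pay"))  # falls back to the service level
print(adjust_probability(0.5, sampled_tps=200.0, target_tps=100.0))  # 0.25
```

The file-based version is predictable but static, whereas the adaptive rule halves the probability when twice the target volume is being sampled and raises it again when traffic drops.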

Summary

Jaeger is a cloud-native distributed tracing platform designed to address the challenges of building observability into microservice-based systems. It offers native Kubernetes support via the Kubernetes operator, while support for OpenTelemetry ensures the flexibility to move to other tracing solutions without having to re-instrument your application code. 
