
OpenTelemetry Metrics: 3 Types of Metrics, Examples and Best Practices

Coralogix Team Apr 16, 2024


Understanding the OpenTelemetry Metrics Data Model

The OpenTelemetry metrics data model describes metric data at three related levels:

Events

Events in OpenTelemetry represent instances of measurements taken at specific points in time. The framework provides a flexible structure to capture not only the value but also the contextual information that accompanies the measurement. This concept is useful for pinpointing exact moments of interest within a system, such as spikes in load or errors during execution.

Using OpenTelemetry events, developers can track the occurrence and details of specific system events. This granularity is critical for in-depth analysis and troubleshooting. By aggregating these events over time, patterns can emerge, offering insights into system behavior and performance trends.

Data Streams

A data stream aggregates events over time, creating a continuous stream of measurement data. This model simplifies the tracking of metrics by providing a unified view of measurements that evolve over a period. It is particularly useful for monitoring metrics that represent ongoing activities, such as requests per second or CPU utilization. This helps in identifying long-term issues and assessing the impact of changes made to the system.

Time Series

OpenTelemetry expands on the concept of data streams by associating each data stream with a time series. This approach captures the evolution of metrics over time and preserves the sequential order of events. It is suitable for metrics that require detailed historical analysis and forecasting future trends.
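The relationship between the three levels can be sketched in plain Python, without the OpenTelemetry SDK. The Event tuple, the aggregate_sum helper, and the sample values are illustrative assumptions, not part of any OpenTelemetry API. Each event is a single measurement, and events that share a metric name and attribute set are summed into one stream, which a backend would store as one time series.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    """One measurement: metric name, attributes, timestamp (seconds), value."""
    name: str
    attributes: tuple
    timestamp: int
    value: float


def aggregate_sum(events):
    """Group events by (name, attributes) and keep a running total per group."""
    streams = defaultdict(list)
    totals = defaultdict(float)
    for event in sorted(events, key=lambda e: e.timestamp):
        key = (event.name, event.attributes)
        totals[key] += event.value
        streams[key].append((event.timestamp, totals[key]))
    return dict(streams)


events = [
    Event("http_requests", (("method", "GET"),), 1, 1),
    Event("http_requests", (("method", "POST"),), 2, 1),
    Event("http_requests", (("method", "GET"),), 3, 1),
]
series = aggregate_sum(events)
```

Three events produce two series here, one per distinct attribute set, each holding timestamped running totals in order.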

How Metrics Are Collected in the OpenTelemetry Metrics API

Here is an overview of the metric collection process in OpenTelemetry, represented by the components and the order in which they are used.

Meter Provider

The Meter Provider acts as the entry point for the OpenTelemetry Metrics API. It is responsible for creating and managing Meter instances, which are used to capture metrics. The provider ensures that all Meters operate within a consistent configuration, facilitating the standardized collection of metric data across the application.

Through the SDK's Meter Provider, developers can configure global settings for metric collection. This includes defining which metrics to collect and how they are aggregated (views), setting collection intervals (metric readers), and specifying the destination for exported metrics (exporters).

Meter

A Meter is the component of the OpenTelemetry Metrics API that creates instruments. It provides methods to create the different instrument types, such as counters, gauges, and histograms, which in turn record measurements. Each Meter is associated with a specific component or library, allowing metrics to be collected in a granular and organized manner.

Utilizing Meters enables developers to precisely define what to measure and how. This fine-grained control over metric collection allows for tailored monitoring solutions, focusing on critical components and disregarding irrelevant data.

Instruments

Instruments are the tools used within the OpenTelemetry Metrics API to record specific measurements. They support various metric types, ensuring that developers can capture the data most relevant to their monitoring objectives. Instruments provide a simple interface for recording measurements, abstracting away the complexities of metric aggregation and export.

By choosing among the different types of instruments, developers can capture a range of metrics, from basic counters and gauges to histograms that describe full distributions.


Types of Metrics in OpenTelemetry with Examples

Here are three of the instrument types OpenTelemetry provides for recording metrics.

1. Counters

Counters are a type of metric instrument used to capture a cumulative total that only increases, typically representing the number of occurrences of an event. They are best suited for tracking increments, such as the number of requests received or tasks completed. Counters provide a simple way to monitor system activity and workload.

Using counters, developers can gauge system throughput and detect anomalies in operational flow. Because a counter never decreases, the signal to watch is its rate of increase: for example, a sudden drop in the rate at which the processed-transactions counter grows may indicate a bottleneck or failure in the system.
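Since a counter's total only grows, a slowdown shows up as a falling rate of increase rather than a falling value. The plain-Python sketch below derives per-second rates from cumulative samples; the rate_per_second helper and the sample totals are made up for illustration, and backends typically offer an equivalent rate function.

```python
def rate_per_second(samples):
    """Return per-second rates between consecutive (timestamp, total) samples."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rates.append((v1 - v0) / (t1 - t0))
    return rates


# Hypothetical cumulative totals of processed transactions, sampled every 10 s.
samples = [(0, 0), (10, 500), (20, 1000), (30, 1020)]
rates = rate_per_second(samples)
```

The total keeps rising across all four samples, but the rate between the last two falls from 50 to 2 transactions per second, which is the anomaly worth alerting on.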

Example: Counting HTTP requests received by a server

To illustrate the use of counters in OpenTelemetry, consider a scenario where you want to track the number of HTTP requests received by a server.

Here’s a simple example in Python using the OpenTelemetry API. Before running this and the following examples, please first install the Python libraries opentelemetry-sdk and opentelemetry-api.

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider

# Set the MeterProvider
metrics.set_meter_provider(MeterProvider())

# Obtain a meter
meter = metrics.get_meter("http_requests_meter")

# Define a counter
request_counter = meter.create_counter(
    "http_requests_total",
    description="Total number of HTTP requests received",
)

# Function to track requests
def track_request():
    request_counter.add(1, {"method": "GET", "endpoint": "/api/data"})

# Simulate receiving a request
track_request()

In this example, a counter named http_requests_total is created to track the total number of HTTP requests received by the server. The track_request function increments the counter by 1 each time it is called, simulating a new request. Attributes such as method and endpoint provide additional context for each measurement. Note that this MeterProvider has no metric reader or exporter configured, so the measurements are not exported anywhere.

2. Gauges

Gauges measure the current value of a particular attribute, such as memory usage or queue depth. Unlike counters, gauges can increase or decrease, providing a snapshot of a system’s state at a given point in time. They are crucial for assessing the health and performance of resources.

Implementing gauges allows operators to monitor resource utilization and capacity. For example, tracking the gauge of available memory helps in preventing out-of-memory errors by facilitating timely scaling or optimization.

Example: Monitoring the size of a job queue

Here is how you could monitor the current size of a job queue in Python:

from opentelemetry import metrics
from opentelemetry.metrics import CallbackOptions, Observation
from opentelemetry.sdk.metrics import MeterProvider

# Set the MeterProvider
metrics.set_meter_provider(MeterProvider())

# Obtain a meter
meter = metrics.get_meter("job_queue_meter")

# Data structure whose size we will monitor
queue = []

def get_current_queue_size():
    return len(queue)

# Define the callback function that reports the current value of the gauge
def callback_function(options: CallbackOptions):
    print("Size of Queue Updated")
    # Yield one observation, tagged with the priority of the jobs in the queue
    yield Observation(get_current_queue_size(), {"priority": 2})

# Define a gauge
queue_size_gauge = meter.create_observable_gauge(
    "job_queue_size",
    description="Current size of the job queue",
    callbacks=[callback_function],
)

In this example, an observable gauge named job_queue_size is created to monitor the size of a job queue. The SDK invokes the callback each time metrics are collected, and the callback reports the current queue size as an Observation, with an attribute indicating the priority of jobs in the queue. Because the MeterProvider here has no metric reader configured, the callback does not run until a reader is added.

3. Histograms

Histograms in OpenTelemetry collect and categorize data points into distinct buckets, enabling the visualization of the distribution of measured values over a period. This type of metric is useful for understanding the variability and outliers in system performance metrics, such as request latency or size of payloads.

By analyzing histograms, developers can identify performance bottlenecks and optimize system response times. For example, a histogram of request latencies might reveal a long tail of slow requests, prompting investigation and remediation measures.
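How bucketing exposes a long tail can be sketched without the SDK. The boundaries and latency samples below are invented, and the bucket_counts helper is an illustration of the idea, assuming each bucket includes its upper bound.

```python
from bisect import bisect_left

# Upper bounds in milliseconds, chosen for illustration.
# Values above the last bound fall into a final open-ended bucket.
boundaries = [50, 100, 250, 500, 1000]


def bucket_counts(values, bounds):
    """Count values per bucket; bucket i holds values in (bounds[i-1], bounds[i]]."""
    counts = [0] * (len(bounds) + 1)
    for value in values:
        counts[bisect_left(bounds, value)] += 1
    return counts


latencies_ms = [12, 48, 50, 75, 120, 130, 480, 2200, 3100]
counts = bucket_counts(latencies_ms, boundaries)
```

Most requests land in the first few buckets, while the two requests above 1000 ms stand out in the last bucket, which an average alone would blur.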

Example: Using histograms to track latency

The following code shows how to use histograms to track request latencies:

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider

# Set the MeterProvider
metrics.set_meter_provider(MeterProvider())

# Obtain a meter
meter = metrics.get_meter("request_latency_meter")

# Define a histogram; the unit documents that values are in milliseconds
latency_histogram = meter.create_histogram(
    "request_latency",
    unit="ms",
    description="Distribution of request latencies",
)

# Function to track latency
def track_latency(latency):
    latency_histogram.record(latency, {"endpoint": "/api/data"})

# Simulate tracking a request latency
track_latency(123)  # latency in milliseconds

This example shows how to track the distribution of request latencies using a histogram named request_latency. The track_latency function records the latency of a request, with attributes for additional context, such as the endpoint.

Best Practices for Using OpenTelemetry Metrics

Here are some best practices for making the most out of metrics in OpenTelemetry.

Apply Labels and Attributes Thoughtfully

Labels and attributes enrich metric data with contextual details, making them more specific and informative. However, every distinct combination of attribute values produces its own time series, so attributes with many possible values (high cardinality) multiply the number of series, complicating analysis and increasing storage costs. To strike a balance, select attributes that provide meaningful differentiation without overwhelming the dataset.

Applying attributes thoughtfully enhances the utility of metrics. For example, adding a label for error codes to a counter of failed requests enables finer analysis of failure causes, facilitating targeted troubleshooting and resolution.
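A rough way to reason about the cost of an attribute is to multiply the number of distinct values each attribute can take. The sketch below does this in plain Python; the attribute keys, their values, and the max_series helper are hypothetical.

```python
from math import prod

# Hypothetical attribute keys and the distinct values each one can take.
attribute_values = {
    "method": ["GET", "POST", "PUT", "DELETE"],
    "status_code": ["200", "400", "404", "500"],
    "endpoint": ["/api/data", "/api/users", "/health"],
}


def max_series(attributes):
    """Upper bound on time series: one per combination of attribute values."""
    return prod(len(values) for values in attributes.values())


bounded = max_series(attribute_values)

# Adding an unbounded attribute such as a user ID multiplies the total again.
with_user_id = max_series({**attribute_values, "user_id": range(10_000)})
```

Three bounded attributes stay at 48 possible series, while adding a per-user attribute with 10,000 values raises the upper bound to 480,000 for a single metric.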

Establish a Naming Convention

Adopting a consistent naming convention for metrics is essential for avoiding confusion and ensuring easy identification and aggregation. Names should be descriptive, concise, and follow a predictable pattern across the application. This consistency aids in the discovery and analysis of metrics, enhancing the effectiveness of monitoring efforts.

A systematic approach to naming metrics simplifies their management, making it easier to correlate related metrics and interpret their significance. For example, using a standard prefix for all metrics related to database operations facilitates quick isolation of database-related performance issues.
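One way to enforce such a convention is a small check that runs in tests or code review tooling. The pattern below encodes a hypothetical team rule (lowercase, dot-separated words with a component prefix); it is not an OpenTelemetry requirement, and the metric names are examples only.

```python
import re

# Hypothetical team convention: lowercase dot-separated words,
# for example "db.query.duration".
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+$")


def follows_convention(name, prefix):
    """True if the name matches the pattern and starts with the component prefix."""
    return bool(NAME_PATTERN.match(name)) and name.startswith(prefix + ".")


db_metrics = ["db.query.duration", "db.connections.active", "db.errors"]
all_valid = all(follows_convention(name, "db") for name in db_metrics)

# A name that ignores the convention is rejected.
rejected = follows_convention("DBQueryTime", "db")
```

With a shared prefix in place, a query for names starting with "db." returns every database-related metric at once.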

Record Metrics with Precision and Purpose

When recording metrics, precision is important for capturing accurate and meaningful data. This involves selecting appropriate metric types and instruments, configuring suitable collection intervals, and ensuring reliable measurement methods. At the same time, reporting metrics should be focused and purposeful, prioritizing the most relevant and actionable information.

Precise recording and purposeful reporting maximize the value of collected metrics. For example, accurately tracking request latencies at a fine granularity supports detailed performance analysis. Focusing reports on key percentile values can highlight areas that require attention.

Monitor and Optimize the Telemetry Pipeline

Ensuring the efficiency and reliability of the telemetry pipeline is crucial for the effective use of OpenTelemetry metrics. This involves monitoring the pipeline for bottlenecks, data loss, or delays and optimizing its performance and scalability. Regularly reviewing and adjusting configuration, such as sampling rates and aggregation strategies, can enhance pipeline capabilities.

Active monitoring and continuous optimization of the telemetry pipeline ensure high-quality metric collection and reporting. This proactive approach helps in maintaining the responsiveness and accuracy of monitoring systems, enabling swift detection and resolution of issues.

Managed Application Observability with Coralogix

Coralogix sets itself apart in observability with its modern architecture, enabling real-time insights into logs, metrics, and traces with built-in cost optimization. Coralogix’s straightforward pricing covers all its platform offerings including APM, RUM, SIEM, infrastructure monitoring and much more. With unparalleled support that features less than 1 minute response times and 1 hour resolution times, Coralogix is a leading choice for thousands of organizations across the globe.

