What Is Prometheus Monitoring? Architecture, Use Cases, and Limitations
A single Prometheus server can scrape millions of active time series from thousands of targets, and the binary itself is small enough that you can have a working deployment scraping live metrics within an hour of deciding to try it. That combination of simplicity and serious scale is why Prometheus has become the default metrics layer for cloud-native engineering teams, and why almost every commercial observability platform today supports its query language.
This guide covers what Prometheus is, how it works, common production use cases, and the limitations that lead teams to pair Prometheus with managed observability platforms like Coralogix.
What Is Prometheus?
Prometheus is an open-source tool for collecting, storing, and querying time-series metrics from your infrastructure and applications. It was originally built at SoundCloud and is now a graduated project under the Cloud Native Computing Foundation (CNCF), alongside Kubernetes. Numerical samples land in a local time-series database (TSDB), and you query them with Prometheus Query Language (PromQL) to build dashboards or trigger alerts. Prometheus only handles metrics, so teams that also need to dig into logs and traces during an incident usually pair it with other tools.
Say you run a payments API handling thousands of requests per second. Prometheus periodically asks each instance, “How many requests have you served, how many failed, and how long did each one take?” The instances respond with numerical counts, Prometheus stores those as time series, and PromQL lets you query them later to ask things like “what was the 99th percentile latency in the last 10 minutes” or “show me the error rate per endpoint over the last hour.”
How Prometheus Works
Prometheus uses a pull-based model where the server initiates HTTP requests to instrumented targets and collects metrics from a /metrics endpoint. That single design choice shapes everything about how the system behaves, scales, and fails.
Scraping and Pull-Based Architecture
Every service you want to monitor exposes a /metrics endpoint, and Prometheus visits that endpoint on a regular interval to pull the latest numbers. You define which targets to scrape and how often inside the scrape_configs block of your Prometheus config. A failed scrape also signals that the target is unreachable, giving you basic health checking for free.
A minimal scrape configuration looks like this:
scrape_configs:
  - job_name: 'node'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9100']
The global scrape_interval defaults to one minute, but most production teams drop it to 15 seconds for services where they want finer resolution and faster alerting.
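If you want that tighter interval across every job instead of repeating it per target, the global block sets it once. A minimal sketch:

global:
  scrape_interval: 15s      # how often to scrape targets (default is 1m)
  evaluation_interval: 15s  # how often to evaluate recording and alerting rules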
Storage and the Time-Series Database
Every sample first lands in a write-ahead log (WAL), so nothing gets lost if the process crashes mid-write. Prometheus then groups those samples into two-hour blocks on disk and compacts older blocks over time, and the Prometheus storage documentation covers disk sizing based on ingestion rate and retention window. Local storage has no clustering or replication, so teams that need durability typically add a remote storage backend.
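Retention and data location are controlled by launch flags rather than the config file. A typical invocation, with illustrative paths, looks like:

prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --storage.tsdb.retention.time=30d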
Service Discovery and Target Relabeling
Static target lists break down in dynamic environments like Kubernetes, where pods come and go all day. Service discovery solves that by auto-registering new targets as they appear, and Prometheus ships with native support for Consul, Kubernetes, and file-based discovery. Relabeling lets you filter and reshape what gets scraped, with relabel_configs running before the scrape to decide which targets are in scope and metric_relabel_configs running after the scrape to control which series actually make it into storage.
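As a sketch of how the two relabeling phases fit together in a Kubernetes job (the annotation convention and dropped metric name here are illustrative, not requirements):

scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Before the scrape: keep only pods annotated prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
    metric_relabel_configs:
      # After the scrape: drop a noisy series family before it reaches storage
      - source_labels: [__name__]
        action: drop
        regex: 'go_gc_duration_seconds.*'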
Querying and the PromQL Language
PromQL is how you ask Prometheus questions, and it supports instant queries for “what’s the value right now,” range queries for “what did this look like over the last hour,” and label-based filtering with aggregation operators like sum, avg, topk, and count. The two functions you’ll use constantly are rate(), which turns a raw cumulative counter into a per-second rate while handling counter resets for you, and histogram_quantile(), which calculates percentiles like p99 latency from histogram bucket data at query time.
A typical request rate query looks like this:
sum(rate(http_requests_total[5m])) by (status_code)
Always apply rate() before any aggregation, never after. Aggregating raw counter values first hides the resets that rate() needs to detect, and the result is numbers that look reasonable but are quietly wrong.
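The same pattern extends to percentiles. Assuming a histogram named http_request_duration_seconds, a p99 latency query looks like:

histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

The le label has to survive the aggregation, because it carries the bucket boundaries histogram_quantile() needs to interpolate a percentile.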
Types of Prometheus Metrics
Prometheus defines four metric types that suit different measurement patterns, and choosing the wrong one produces data that behaves incorrectly under aggregation:
- Counters: Only increase or reset to zero on restart. You’ll use these for request throughput, error rates, and bytes transmitted. The rate() function converts raw cumulative values into per-second rates while handling resets automatically.
- Gauges: Move up or down at any time, making them right for memory usage, connection counts, and queue depth. Never apply rate() to a gauge. Use max_over_time(), min_over_time(), and avg_over_time() instead.
- Histograms: Sample observations into configurable cumulative buckets, exposing _bucket, _sum, and _count series. The histogram_quantile() function calculates percentiles at query time, and histograms are the right choice for any metric where you need cross-instance aggregation or SLO tracking.
- Summaries: Calculate quantiles on the client side, producing pre-computed values like the 95th percentile directly from the application process. The catch is that summary quantiles can’t be aggregated across replicas, so most teams prefer histograms unless they need exact quantiles from a single process.
Most production services end up using counters for throughput and errors, gauges for resource utilization, and histograms for latency. Summaries are the least common because the inability to aggregate across replicas limits their usefulness in any service that runs more than one instance.
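The type also determines which PromQL functions are safe to apply. A quick contrast, using illustrative metric names:

rate(http_requests_total[5m])       # counter: per-second rate, reset-aware
avg_over_time(queue_depth[10m])     # gauge: smoothed current value; never rate() a gauge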
Core Features of Prometheus
Three capabilities define Prometheus in production: its multi-dimensional data model, Alertmanager, and the ecosystem of exporters and client libraries. Most production teams pair Prometheus with Grafana as a separate visualization layer for dashboards and alerts, which is the standard open-source pattern. The tradeoff is that Grafana runs as an extra service to operate and a separate data source to configure, and logs and traces still live in other tools during an incident.
Modeling with Multi-Dimensional Data
Every Prometheus time series is identified by a metric name plus a set of key-value label pairs, which lets you slice the same metric across any dimension you’ve added. A series like api_http_requests_total{method="POST", handler="/messages"} packs the metric and its labels into one expression, so you can later filter or group by method, handler, or status code. The total count of those unique label combinations is what people mean by cardinality, and a high-cardinality label like user_id can balloon a single metric from a few hundred series into millions overnight, which is why cardinality control is one of the most important things you’ll watch in production.
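Labels are what turn one metric into many questions. With the series above, a single expression can filter on one dimension and group by another:

sum(rate(api_http_requests_total{method="POST"}[5m])) by (handler)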
Alerting with Alertmanager
Prometheus splits alerting into two pieces: the Prometheus server evaluates your alerting rules and fires alerts, while a separate component called Alertmanager handles what to do with them. Alertmanager takes care of deduplication so you don’t get the same alert from every replica, grouping so a hundred related failures collapse into one notification, routing to the right team, plus silencing and inhibition so a known root-cause alert can suppress downstream noise.
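The rules themselves live on the Prometheus side. A minimal sketch of an error-rate rule, with illustrative metric and label names:

groups:
  - name: api-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status_code=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "5xx error rate above 5% for 10 minutes"

Once this fires, Alertmanager’s routing tree decides who gets notified and whether related alerts collapse into the same message.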
Instrumenting with Exporters and Client Libraries
There are two ways to get data into Prometheus, depending on whether you control the source code. For third-party software, the Prometheus community maintains exporters that translate vendor-specific metrics into the Prometheus format, including node_exporter for Linux host metrics, blackbox_exporter for probing HTTP and TCP endpoints, and jmx_exporter for Java Virtual Machine (JVM) apps like Kafka and Cassandra. For your own services, official client libraries in Go, Java, Python, Ruby, and Rust let you define custom metrics in application code and expose them on a /metrics endpoint with a few lines of setup.
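A minimal sketch with the official Python client (metric names here are illustrative) shows how little setup the second path takes:

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter('payments_requests_total', 'Requests served', ['endpoint'])
LATENCY = Histogram('payments_request_duration_seconds', 'Request latency', ['endpoint'])

def handle_charge():
    with LATENCY.labels(endpoint='/charge').time():  # observe duration into histogram buckets
        REQUESTS.labels(endpoint='/charge').inc()    # count the request
        # ... application logic ...

start_http_server(8000)  # exposes /metrics on port 8000 for Prometheus to scrape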
Common Use Cases for Prometheus Monitoring
The two most common use cases are Kubernetes monitoring and the broader mix of infrastructure, application, and CI/CD observability that most engineering teams need.
Kubernetes, Cluster, and Container Monitoring
Kubernetes monitoring is the dominant use case, and the standard stack combines several components that cover different cluster layers:
- node_exporter: Collects host-level metrics like CPU utilization, memory availability, disk space, and network throughput from each node.
- kube-state-metrics: Exposes orchestration-layer data like pod status, deployment replica counts, and job completion states.
- cAdvisor: Reports container-level resource usage, including CPU and memory per container.
- Prometheus Operator: Automates deployment through Custom Resource Definitions (CRDs) like ServiceMonitor and PodMonitor, making scrape configurations declarative and GitOps-compatible.
For example, during a memory leak investigation, cAdvisor reports rising memory on a specific pod, kube-state-metrics shows the pod has restarted three times in the last hour, and node_exporter confirms the underlying node is fine, pointing the investigation straight at the application code.
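As a sketch of what the Operator’s declarative model looks like in practice, here is a ServiceMonitor targeting a hypothetical payments-api service:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: payments-api
spec:
  selector:
    matchLabels:
      app: payments-api
  endpoints:
    - port: metrics     # named port on the target Service
      interval: 15s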
Infrastructure, Application, and CI/CD Monitoring
For host-level signals, node_exporter is usually the first thing teams deploy, and it pairs well with the [predict_linear()](https://prometheus.io/docs/prometheus/latest/querying/functions/) PromQL function for capacity planning: it projects when a disk will fill based on current growth, so you can alert two weeks ahead instead of getting paged the night it runs out. For application monitoring, client libraries let you instrument your code directly, and a good baseline is to track query count, errors, and latency for every external dependency your service calls.
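As a sketch using node_exporter’s filesystem metric, an alert expression that fires when the root filesystem is projected to run out of space within two weeks:

predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[6h], 14 * 24 * 3600) < 0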
CI/CD and batch jobs need a different pattern because they exit before Prometheus can scrape them, which is what the Pushgateway is for. A nightly database backup, for example, can push its duration and success status to the Pushgateway so your team gets an alert whenever backups start failing or running long.
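With the official Python client, the backup script’s last step might look like this sketch (the gateway address and metric names are illustrative):

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
duration = Gauge('backup_duration_seconds', 'Duration of the nightly backup', registry=registry)
last_success = Gauge('backup_last_success_timestamp_seconds', 'Unix time of last successful backup', registry=registry)

duration.set(312.4)                 # measured by the script itself
last_success.set_to_current_time()

push_to_gateway('pushgateway.example.com:9091', job='nightly-backup', registry=registry)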
Where Prometheus Runs Into Limitations
Prometheus was built as a near-real-time metrics system, not a cross-stack observability backend, and that focus creates gaps that widen as your deployment grows:
- Long-term storage and retention: Prometheus stores data locally with a default retention of 15 days, and extending that window increases disk and memory requirements proportionally. Local storage has no clustering, replication, or downsampling, so teams needing months or years of retention typically send the data to a remote storage backend.
- Cross-stack correlation: Prometheus handles metrics only, so correlating a latency spike with the specific error log and distributed trace that caused it requires separate systems with their own query languages and retention policies. Your on-call engineer ends up pivoting between tools and mentally stitching together context.
- Horizontal scaling: A single Prometheus instance has no built-in horizontal scaling or cross-instance querying. Federation and Agent Mode with remote_write forward metrics to centralized backends, but both approaches add architectural complexity your team has to operate.
None of these are dealbreakers on their own, but they tend to multiply, which is why most teams running Prometheus past a certain scale eventually pair it with a managed backend.
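The forwarding half of that pairing is a small config block. A minimal remote_write sketch, pointing at a hypothetical backend endpoint:

remote_write:
  - url: "https://metrics.example.com/api/v1/write"
    basic_auth:
      username: "tenant-id"
      password_file: /etc/prometheus/remote-write-password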
How Coralogix Addresses Each Limitation
Coralogix maps one answer to each of the three gaps above, and every answer comes from the underlying architecture rather than a feature bolted on after the fact:
- Retention and long-term storage: Streama writes Prometheus-compatible metrics to your own Amazon Simple Storage Service (S3) or Google Cloud Storage bucket in open Parquet format, which decouples retention cost from indexing cost. Years of historical data stay queryable at object-storage prices without the rehydration fees most vendors charge for archived data.
- Cross-stack correlation: DataPrime queries logs, metrics, and traces in one language, with native PromQL support so Prometheus users keep the queries they already wrote. On-call engineers stop pivoting between tools during an incident because every telemetry type lands in the same query surface.
- Horizontal scaling: Streama processes telemetry in flight rather than waiting on a single-node indexing step, and pricing is based on ingestion volume without per-host or per-series charges. At production scale, the architecture processes 3 million events per second across 500,000 applications worldwide.
The common thread across all three is that Coralogix extends what Prometheus does rather than replacing it, which lets teams keep the PromQL queries, recording rules, and instrumentation they already have.
Best Practices for Running Prometheus at Scale
Most Prometheus pain at scale traces back to cardinality decisions that looked fine at baseline and broke under autoscaling. A counter labeled with pod_id works cleanly at 20 pods and a few hundred time series, but when Kubernetes autoscales to 200 pods during a traffic spike, that single label multiplies through every other label already in use (method, status_code, endpoint, region). Prometheus memory climbs, queries slow down, and on-call engineers see out-of-memory (OOM) crashes on a server that was healthy an hour earlier.
The emergency fix is a relabel rule that drops pod_id before it reaches storage, which restores service but destroys the per-pod visibility the team added for a real reason. The durable fix is aggregating at the pipeline layer and keeping only the top pods by latency or error rate, not every permutation.
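That emergency rule is a single metric_relabel_configs stanza, assuming the label is literally named pod_id:

metric_relabel_configs:
  - action: labeldrop
    regex: pod_id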
The three practices below skip the label-hygiene basics an experienced engineer already knows and focus on the decisions that actually fail under load:
- Plan for cardinality under autoscaling, not at baseline: A metric that looks healthy at 20 pods can produce millions of series at 200. Aggregate noisy high-cardinality labels at the collection or pipeline layer, keeping the top-N pods by a business-meaningful threshold rather than relabeling the dimension away entirely.
- Tie alerts to burn rates, not raw thresholds: Static thresholds drift out of date as traffic grows, so critical paths should page on service level objective (SLO) error-budget consumption instead. A fast-burn window catches sharp regressions in minutes, and a slow-burn window catches sustained degradation that static thresholds miss (see the sketch after this list).
- Instrument every external dependency from day one: Tracking query count, errors, and latency for each downstream service is the three-signal pattern that pays off in every future incident. It is cheap to add on day one and a structural project to retrofit on day 180.
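As a sketch of the fast-burn half of that alerting pattern, assuming a 99.9% availability SLO (error budget 0.001) and the conventional 14.4x fast-burn factor:

sum(rate(http_requests_total{status_code=~"5.."}[1h]))
  / sum(rate(http_requests_total[1h]))
> 14.4 * 0.001

The full multiwindow pattern pairs this with a shorter five-minute window so the page clears quickly once the burn stops.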
The cost of getting these wrong shows up months later as slow queries, runaway bills, and alert fatigue, and the cost of getting them right is upfront discipline that holds across the life of the deployment.
Where to Take Your Prometheus Setup From Here
Prometheus gives you a solid metrics foundation, and the natural next question is what happens when the three gaps above start costing real engineering time. Coralogix covers each of them on one platform: full-stack observability that puts metrics, logs, and traces in one query surface for cross-signal correlation, customer-owned object storage for unlimited retention without per-query fees, and ingestion-based pricing that scales without per-host or per-series charges.
The platform accepts Prometheus metrics natively, so PromQL queries, recording rules, alerts, and Grafana dashboards keep working through the move. If you want to see what your existing scrape targets look like alongside logs and traces, you can start a free 14-day Coralogix trial with full feature access and no credit card required.
Frequently Asked Questions About Prometheus Monitoring
What is the difference between Prometheus and Grafana?
Prometheus collects, stores, and queries metrics through PromQL, while Grafana is a visualization layer that connects to Prometheus as one of many data sources. Teams commonly pair them for open-source dashboards, but the pairing leaves logs and traces in separate tools, which is why many teams eventually consolidate everything onto one platform.
Can Prometheus monitor Kubernetes clusters?
Kubernetes is the most common production use case for Prometheus. The typical stack combines node_exporter, kube-state-metrics, cAdvisor, and the Prometheus Operator, which makes scrape configurations declarative through CRDs like ServiceMonitor and PodMonitor.
How long does Prometheus retain data by default?
The default retention period is 15 days, and extending it increases local disk and memory requirements proportionally. Teams needing longer retention for capacity planning or compliance typically add a remote storage backend, and platforms like Coralogix accept Prometheus metrics natively and store them in your own cloud storage with unlimited retention.
Can Prometheus handle logs and traces?
Prometheus handles metrics exclusively and doesn’t support log collection or distributed tracing. Teams that need to correlate metrics with logs and traces either operate separate systems for each signal type or consolidate into a full-stack observability platform that ingests all three under a single query interface.
How do you scale Prometheus beyond a single instance?
A single Prometheus server has no built-in horizontal scaling, so teams typically scale by sharding scrape targets across multiple Prometheus instances and using federation or remote_write to forward metrics to a centralized backend. Managed observability platforms like Coralogix handle the long-term storage, replication, and cross-instance querying without requiring you to operate that infrastructure yourself.