Runtime metrics


The Runtime metrics tab in Service Catalog displays JVM runtime metrics — heap memory, garbage collection, thread states, CPU usage, and class loading — alongside service latency. It correlates JVM internals with user-facing performance, so you can diagnose memory leaks, GC pressure, and thread contention without leaving APM.

Limited availability

This feature is in preview and is subject to change.

Why it matters

When a Java service slows down, throws errors, or pegs CPU, the cause is often invisible from traces and span metrics alone. The usual suspects — long GC pauses, heap exhaustion, thread contention, classloader leaks — happen inside the JVM, one layer below where APM normally looks.

The Runtime metrics tab pulls JVM internals onto the same screen as your traces, so you can:

Tie latency spikes to GC pauses. The Service latency vs pauses widget overlays GC events on request latency. If a spike lines up with a pause marker, you have your answer.
Tell a memory leak apart from normal allocation pressure. Three separate memory widgets (Heap used vs memory, Live set - memory after GC, Heap by pool) let you read the heap properly instead of guessing from a single line.
Distinguish CPU saturation from lock contention. Looking at CPU usage and thread states side-by-side makes it obvious whether the service is busy doing work (CPU-bound) or stuck waiting (locks).
Isolate one bad pod. The JVM instances filter and the per-instance heatmap let you find the single misbehaving instance dragging down the cluster average.

What you need

OpenTelemetry semantic conventions only

The Runtime metrics tab reads only JVM metrics that follow the OpenTelemetry JVM semantic conventions, for example jvm.memory.used, jvm.gc.duration, and jvm.thread.count. Metrics that use other naming conventions (such as the Micrometer/Prometheus form jvm_memory_used_bytes) are not recognized, and the tab does not show them. If your services emit JVM metrics under a different naming convention, migrate the instrumentation to the OpenTelemetry conventions so the tab can render data for them.
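As a rough illustration of the naming rule, the sketch below separates metric names that follow the OpenTelemetry JVM conventions from names in other forms. The `RECOGNIZED` set contains only the metrics named on this page, and `is_recognized` is a hypothetical helper, not a Coralogix API.

```python
# Metric names taken from this page; the set the tab actually reads may be longer.
RECOGNIZED = {
    "jvm.memory.used", "jvm.memory.committed", "jvm.memory.limit",
    "jvm.memory.used_after_last_gc", "jvm.gc.duration", "jvm.thread.count",
    "jvm.cpu.time", "jvm.cpu.count", "jvm.cpu.recent_utilization",
    "jvm.class.loaded", "jvm.class.unloaded", "jvm.class.count",
}


def is_recognized(metric_name: str) -> bool:
    """Return True if the name is one of the OpenTelemetry JVM metrics above."""
    return metric_name in RECOGNIZED


emitted = ["jvm.memory.used", "jvm_memory_used_bytes", "jvm.gc.duration"]
needs_migration = [name for name in emitted if not is_recognized(name)]
print(needs_migration)  # ['jvm_memory_used_bytes']
```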

To make a Java, Scala, or Kotlin service report JVM metrics in the supported format:

Attach the OpenTelemetry Java agent (opentelemetry-javaagent.jar) to the service process. The agent collects all stable jvm.* metrics from the OpenTelemetry semantic conventions automatically. See Java OpenTelemetry instrumentation for installation steps, or follow Send JVM metrics for the end-to-end setup.
Required JDK: Java 8 or later for the stable metric set.
The metrics must include the service.name resource attribute so they correlate with the service in the Service Catalog.

For services that are not JVM-based (Python, Go, Node.js, and so on — identified by the telemetry.sdk.language resource attribute), the Runtime metrics tab is hidden entirely. For JVM-based services that have not started reporting jvm.* metrics yet, the tab is shown with an empty state (see When no JVM metrics are detected) that points you to instructions for sending the metrics.

How JVM metrics reach the tab

flowchart LR
    JVM["Java / Scala / Kotlin<br/>service (JVM process)"]
    Agent["OpenTelemetry Java agent<br/>opentelemetry-javaagent.jar"]
    Coralogix["Coralogix"]
    Tab["Runtime metrics tab<br/>filtered by service.name<br/>+ instance attribute"]

    JVM -->|JMX MBeans| Agent
    Agent -->|jvm.memory.* · jvm.gc.duration<br/>jvm.thread.count · jvm.cpu.*<br/>jvm.class.* + service.name| Coralogix
    Coralogix --> Tab

    class JVM entry
    class Tab success

The OpenTelemetry Java agent collects JVM metrics from your service and sends them to Coralogix tagged with service.name. The tab uses that tag, plus the instance attribute it detects (k8s.pod.name, host.name, and so on), to render the right panels.
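A minimal sketch of how an instance key could be derived from resource attributes. Only the two attributes named above are listed; the full detection list and its priority order are not documented here, so the order below is an assumption, and `instance_key` is a hypothetical helper.

```python
from typing import Optional, Tuple

# Assumed priority order; only the attributes named on this page are included.
INSTANCE_ATTRIBUTES = ("k8s.pod.name", "host.name")


def instance_key(resource: dict) -> Optional[Tuple[str, str]]:
    """Return (attribute, value) for the first instance attribute present."""
    for attr in INSTANCE_ATTRIBUTES:
        value = resource.get(attr)
        if value:
            return attr, value
    return None


pod = {"service.name": "checkout", "k8s.pod.name": "checkout-7d9f", "host.name": "node-1"}
vm = {"service.name": "checkout", "host.name": "vm-42"}
print(instance_key(pod))  # ('k8s.pod.name', 'checkout-7d9f')
print(instance_key(vm))   # ('host.name', 'vm-42')
```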

Access the Runtime metrics tab

In your Coralogix toolbar, navigate to APM, then Service Catalog.
Select a JVM-based service to open the service drilldown.
Select the Runtime metrics tab.

The tab loads with the default time range and the JVM Metrics layout.

When no JVM metrics are detected

If you open the Runtime metrics tab for a JVM service that has not yet started sending JVM metrics in the OpenTelemetry format, the tab loads with an empty state instead of the widget grid. From the empty state, you can either start the JVM Observability extension flow directly in the tab, or open the Send JVM metrics guide to set up instrumentation yourself.

The tab continues to show the empty state until at least one jvm.* metric is observed for the service in the current time window. Once metrics start flowing, the widgets render automatically — no configuration change is needed inside the tab itself.

Note

Non-JVM services (such as Python or Go) do not show the Runtime metrics tab at all. The empty state is reserved for JVM-based services that are eligible to report jvm.* metrics but have not yet done so.
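The three display states described above can be summarized in a small decision function. This is an illustrative sketch only: `runtime_tab_state` is a hypothetical name, and it assumes JVM-based services report `java` in telemetry.sdk.language.

```python
def runtime_tab_state(sdk_language: str, jvm_metric_count: int) -> str:
    """Sketch of how the tab is displayed for a service in the current time window."""
    if sdk_language != "java":   # assumption: JVM services report "java"
        return "hidden"
    if jvm_metric_count == 0:    # no jvm.* metric observed yet
        return "empty_state"
    return "widgets"


print(runtime_tab_state("python", 0))  # hidden
print(runtime_tab_state("java", 0))    # empty_state
print(runtime_tab_state("java", 12))   # widgets
```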

Layout

The JVM Metrics view is organized top to bottom as:

JVM instances: selector that scopes every panel below it to a subset of running JVM instances.
Instance heatmap: a row that shows the health of each instance at a glance.
JVM summary: four stat cards spanning full width.
Memory: Heap used vs memory, Live set - memory after GC, Heap by pool.
CPU: JVM CPU utilization, Thread count by state, Class loading.
GC: GC pause duration, GC event count, Service latency vs pauses.

The five content sections (Instance heatmap, JVM summary, Memory, CPU, GC) are collapsible. The JVM instances selector is a control bar and is always visible.

When JVM summary is collapsed, each card collapses into a chip that still shows its current value, trend arrow, and severity color, so the at-a-glance signal is preserved. When Memory, CPU, or GC is collapsed, the section header shows the widget titles as compact chips.

Hovering any chart syncs the crosshair across the other widgets in the tab, so you can correlate the same point in time across memory, CPU, and GC at once.

JVM instances selector

The instance selector sits at the top of the view and scopes every widget to a subset of the running JVM instances. The default is All instances (aggregated), which sums or averages across every JVM reporting metrics for the service.

Why per-instance filtering matters

JVM metrics are emitted per JVM process. In a horizontally scaled service, every pod runs its own JVM, and aggregate values hide the most common failure mode — one bad instance. A memory leak on a single pod, GC pauses isolated to one instance, or a thread leak on one node is invisible in the aggregate view until the pod fails. The selector lets you compare one suspect instance against the rest of the fleet.

When you select a single instance, every widget below filters to that instance only. When you select multiple instances, every widget shows one series per instance, color-coded by instance.
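A short worked example, with made-up numbers, of how an aggregate can hide one bad instance: nine pods at 40% heap usage and one at 95% average out to a value that looks unremarkable.

```python
# Hypothetical fleet: nine healthy pods and one leaking pod.
heap_used_pct = {f"pod-{i}": 40.0 for i in range(9)}
heap_used_pct["pod-9"] = 95.0

aggregate = sum(heap_used_pct.values()) / len(heap_used_pct)
worst_pod, worst_value = max(heap_used_pct.items(), key=lambda kv: kv[1])

print(f"aggregate: {aggregate:.1f}%")            # aggregate: 45.5%
print(f"worst: {worst_pod} at {worst_value}%")   # worst: pod-9 at 95.0%
```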

Instance heatmap

Below the instance selector is a collapsible per-instance overview row. It is collapsed by default; expand it during incident triage.

When expanded, the row renders a compact grid where every running instance has one row and four columns — Heap used %, GC used time %, GC P99, CPU %. Each cell is color-coded by severity (green, amber, red), so an instance that is misbehaving on any single dimension is visually obvious without selecting each instance one by one.

Column	Source metric	Severity rule
Heap used %	`jvm.memory.used` ÷ `jvm.memory.limit`	Green at low utilization, amber as it climbs, red at sustained high utilization
GC used time %	`jvm.gc.duration` summed as a fraction of wall time	Green when the JVM spends little wall time paused for GC; red when pauses dominate
GC P99	`jvm.gc.duration` 99th percentile	Green for short pauses, red for long pauses
CPU %	`rate(jvm.cpu.time)` ÷ `jvm.cpu.count` (or `jvm.cpu.recent_utilization` × 100 as fallback)	Green at low utilization, amber as it climbs, red near saturation

Selecting any row filters every widget below it to that instance. Large fleets are paginated — use the paginator below the heatmap to step through additional instances.
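The heap and CPU columns of the heatmap can be sketched as follows. The formulas follow the source-metric descriptions for this row; the severity thresholds (70 and 90) are placeholders chosen for illustration, because the actual cut-offs are not documented here.

```python
from typing import Optional


def heap_used_pct(used_bytes: float, limit_bytes: float) -> float:
    """Heap used %: jvm.memory.used divided by jvm.memory.limit."""
    return 100.0 * used_bytes / limit_bytes


def cpu_pct(cpu_time_delta_s: Optional[float] = None,
            interval_s: Optional[float] = None,
            cpu_count: Optional[int] = None,
            recent_utilization: Optional[float] = None) -> Optional[float]:
    """CPU %: rate(jvm.cpu.time) / jvm.cpu.count, else jvm.cpu.recent_utilization x 100."""
    if cpu_time_delta_s is not None and interval_s and cpu_count:
        cores_used = cpu_time_delta_s / interval_s  # CPU seconds per wall second
        return 100.0 * cores_used / cpu_count
    if recent_utilization is not None:
        return 100.0 * recent_utilization
    return None


def severity(value_pct: float, amber_at: float = 70.0, red_at: float = 90.0) -> str:
    """Map a percentage to a cell color; thresholds are assumptions."""
    if value_pct >= red_at:
        return "red"
    if value_pct >= amber_at:
        return "amber"
    return "green"


heap = heap_used_pct(3 * 2**30, 4 * 2**30)                     # 75.0
cpu = cpu_pct(cpu_time_delta_s=90, interval_s=60, cpu_count=2)  # 75.0
print(heap, severity(heap), cpu, severity(cpu))
```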

JVM summary

The summary strip displays four headline numbers an on-call engineer checks first. Each card shows an aggregated current value with a short context line and a directional delta versus the previous equivalent window (↗ red = worsening, ↘ green = improving, — neutral).

Card	Source metric	Aggregation (card subtitle)
Heap used	`jvm.memory.used` filtered to `jvm.memory.type=heap`, summed across pools	Avg across instances, in GB
GC overhead	`jvm.gc.duration` as a fraction of wall time	Avg of wall time spent in GC pauses
GC pause P99	`jvm.gc.duration`	P99, averaged across instances, last 30 minutes
Thread count	`jvm.thread.count`	Avg platform threads across instances

What to look for:

A red ↗ trend arrow on any card means the metric got worse since the previous window — cross-check with the detailed widgets below to see what is driving it.
Heap used climbing toward the heap limit is a pre-OOM warning.
GC overhead above a few percent is unhealthy — the JVM is losing meaningful wall time to GC. Drill into GC pause duration and GC event count to see whether long pauses or frequent collections are responsible.
GC pause P99 rising means tail pauses are stretching, which directly hurts user-facing latency. Confirm against Service latency vs pauses.
Thread count drifting up without matching traffic growth is a thread leak. A sudden drop usually means a thread pool was resized at deployment.

Two cards surface contextual badges when a specific signal appears:

GC overhead shows a Driven by pod badge when one instance is responsible for most of the GC overhead. The badge flags a single bad pod without the need to expand the heatmap.
Thread count shows a Stable badge when the thread count is steady across the window. The badge confirms that there is no thread leak or runaway pool growth.

What GC overhead measures

The GC overhead card reports the percentage of wall time the JVM was paused for garbage collection. It is not the same as CPU consumed by GC. Concurrent collectors such as ZGC and Shenandoah can spend significant CPU on garbage collection while showing low values here, because they do most of their work without stopping application threads.
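As a worked example of the definition above, GC overhead is the summed pause time divided by the length of the window. The pause values are made up, and `gc_overhead_pct` is a hypothetical helper.

```python
def gc_overhead_pct(pause_durations_s, window_s):
    """Percentage of wall time spent in GC pauses during the window."""
    return 100.0 * sum(pause_durations_s) / window_s


# Three pauses (in seconds) observed in a 60-second window.
pauses = [0.120, 0.080, 0.300]
overhead = gc_overhead_pct(pauses, window_s=60)
print(round(overhead, 2))  # 0.83
```

Only pause time enters the calculation, so CPU spent by a concurrent collector while application threads keep running does not show up in this number.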

Memory

The Memory row answers three distinct questions, each on its own widget: how much memory is the JVM using, how much is retained after each garbage collection, and how is that usage distributed across heap pools.

Heap used vs memory

Style: area + lines. Y-axis: bytes (auto-scaled GB/MB), single axis. Underlying data: jvm.memory.*, split by jvm.memory.type.

A toggle switches between Heap and Non-heap (Metaspace). The Y-axis automatically rescales on toggle — heap is typically in the GB range, non-heap (Metaspace) in the hundreds of MB — so the chart stays readable in both modes.

Series	Source metric	Style	What it shows
used	`jvm.memory.used`	Filled area (blue)	Currently allocated memory
committed	`jvm.memory.committed`	Dashed line (green)	Memory the OS has reserved for the JVM
limit	`jvm.memory.limit`	Dashed reference line (red)	Memory ceiling (`-Xmx` for heap, `MaxMetaspaceSize` for non-heap)

What to look for:

The gap between used and limit is your headroom before an out-of-memory error.
The gap between used and committed is memory the OS has reserved but the JVM is not using yet. A shrinking gap under load is an early warning that the JVM is running out of slack.
Switch to non-heap to surface Metaspace, a common source of OOM errors in services that load a lot of classes (plugin-heavy apps, frameworks that use a lot of reflection).
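The two gaps described above are simple differences between the three series. A small sketch with illustrative values (`heap_gaps` is a hypothetical helper):

```python
GIB = 1024 ** 3


def heap_gaps(used: float, committed: float, limit: float) -> dict:
    """Return headroom before the limit and unused slack inside committed memory."""
    return {
        "headroom_to_limit": limit - used,
        "slack_in_committed": committed - used,
    }


gaps = heap_gaps(used=3 * GIB, committed=3.5 * GIB, limit=4 * GIB)
print({name: value / GIB for name, value in gaps.items()})
# {'headroom_to_limit': 1.0, 'slack_in_committed': 0.5}
```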

For leak detection, use Live set - memory after GC instead. The used line bounces with every GC cycle, which makes trends hard to read.

Live set - memory after GC

Style: stepped line. Y-axis: bytes, auto-ranged to the data (not zero-based). Underlying data: jvm.memory.used_after_last_gc summed across heap pools. Each horizontal segment represents the memory retained between two GC events; each vertical step marks one GC event.

Series	Source	What it shows
live set size	`jvm.memory.used_after_last_gc`	Memory retained after the most recent GC
GC event	Derived from `jvm.gc.duration`	Vertical markers where each GC event fired

Why this Y-axis is not zero-based

A zero-based Y-axis compresses every step into a narrow band at the top of the chart, which hides the rising-baseline pattern — the primary signal for leak detection. The axis automatically ranges to the data, so each step's change is visible, not just its absolute value.

This widget is the cleanest signal of a memory leak.

What to look for:

A roughly flat baseline means the service is healthy under steady load — the GC is reclaiming whatever is not needed.
A rising staircase, where each step starts higher than the previous, means more memory is being retained after every GC cycle. The chart highlights this pattern with a "rising baseline = leak pattern" label when it detects sustained growth.
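One way to turn the rising-staircase pattern into a number is to fit a line through the post-GC values and look at its slope. This is a sketch under assumptions: the widget's own detection logic is not documented here, and the 5 MiB per cycle threshold is arbitrary.

```python
def baseline_slope(live_set_bytes):
    """Least-squares slope of the post-GC heap size, in bytes per GC cycle."""
    n = len(live_set_bytes)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(live_set_bytes) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(live_set_bytes))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_like_leak(live_set_bytes, min_growth_per_cycle):
    """True when the baseline grows faster than the given threshold."""
    return baseline_slope(live_set_bytes) > min_growth_per_cycle


MIB = 1024 ** 2
healthy = [512 * MIB, 520 * MIB, 508 * MIB, 515 * MIB, 510 * MIB]
leaking = [512 * MIB, 540 * MIB, 575 * MIB, 601 * MIB, 640 * MIB]

print(looks_like_leak(healthy, 5 * MIB))  # False
print(looks_like_leak(leaking, 5 * MIB))  # True
```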

Heap by pool

Style: stacked area. Y-axis: bytes (GB), single axis. Underlying data: jvm.memory.used split by jvm.memory.pool.name, filtered to jvm.memory.type=heap.

Pool names depend on the active GC algorithm — G1, ZGC, Shenandoah, and Parallel GC each report different pool sets. The widget shows whatever pools the JVM reports.

Series	Style	What it shows
old gen	Stacked area (purple)	Long-lived objects
survivor	Stacked area (green)	Objects that survived at least one minor GC
eden	Stacked area (blue)	Short-lived allocations

What to look for:

Eden is where new objects are allocated. It should rise and fall quickly with each young-generation collection — that is normal.
Old Gen holds long-lived objects. If it climbs steadily and never drops back down, the service is heading toward an out-of-memory error.
Survivor holds objects that survived at least one collection. If it stays unusually large, the JVM is keeping objects around longer than expected before promoting them to Old Gen.
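The stacked series can be thought of as jvm.memory.used grouped by pool name after filtering to heap. The sketch below uses G1 pool names as sample data; the substring mapping to eden, survivor, and old gen is an assumption made for illustration, and pools that do not match are kept under the name the JVM reports.

```python
from collections import defaultdict


def generation_of(pool_name: str) -> str:
    """Map a pool name to a generation label where the name makes that obvious."""
    name = pool_name.lower()
    if "eden" in name:
        return "eden"
    if "survivor" in name:
        return "survivor"
    if "old" in name or "tenured" in name:
        return "old gen"
    return pool_name  # non-generational pools are shown as reported


def heap_by_pool(samples):
    """Sum jvm.memory.used per pool, keeping only heap pools."""
    totals = defaultdict(float)
    for sample in samples:
        if sample["jvm.memory.type"] == "heap":
            totals[generation_of(sample["jvm.memory.pool.name"])] += sample["value"]
    return dict(totals)


MIB = 1024 ** 2
samples = [
    {"jvm.memory.type": "heap", "jvm.memory.pool.name": "G1 Eden Space", "value": 200 * MIB},
    {"jvm.memory.type": "heap", "jvm.memory.pool.name": "G1 Survivor Space", "value": 16 * MIB},
    {"jvm.memory.type": "heap", "jvm.memory.pool.name": "G1 Old Gen", "value": 900 * MIB},
    {"jvm.memory.type": "non_heap", "jvm.memory.pool.name": "Metaspace", "value": 120 * MIB},
]
pools = heap_by_pool(samples)
print(sorted(pools))  # ['eden', 'old gen', 'survivor']
```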

CPU

The CPU row separates JVM CPU consumption, thread state composition, and class-loading activity into three widgets so each signal is readable on its own axis.

JVM CPU utilization

Style: line with reference line. Y-axis: cores (CPU usage expressed in core-equivalents). A 100% ceiling reference line marks the total available cores, so the saturation point is always in view.

Series	Style	What it shows
jvm.cpu.used	Solid line (purple)	CPU consumed by the JVM process, in core-equivalents
100% ceiling	Reference line	Total available cores (the saturation ceiling)

What to look for:

Usage hitting the 100% ceiling line combined with mostly runnable threads in the next widget means the service is CPU-bound — typically a hot loop or heavy compute.
Usage well below the ceiling with a high blocked thread share means the service is stuck waiting on locks, not doing work.
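The two readings above can be combined into a rough triage rule. This is a hedged sketch: `diagnose` is a hypothetical helper, and every threshold in it (90% utilization, 50% runnable share, 30% blocked share) is an assumption chosen only to make the example concrete.

```python
def diagnose(cpu_cores_used, cpu_count, threads_by_state,
             high_cpu=0.9, high_runnable=0.5, high_blocked=0.3):
    """Classify a sample as CPU-bound, lock contention, or inconclusive."""
    total = sum(threads_by_state.values())
    if total == 0 or cpu_count == 0:
        return "inconclusive"
    utilization = cpu_cores_used / cpu_count
    runnable_share = threads_by_state.get("runnable", 0) / total
    blocked_share = threads_by_state.get("blocked", 0) / total
    if utilization >= high_cpu and runnable_share >= high_runnable:
        return "cpu-bound"
    if utilization < high_cpu and blocked_share >= high_blocked:
        return "lock contention"
    return "inconclusive"


busy = diagnose(3.9, 4, {"runnable": 40, "waiting": 10, "blocked": 2})
stuck = diagnose(0.8, 4, {"runnable": 5, "waiting": 15, "blocked": 30})
print(busy, "/", stuck)  # cpu-bound / lock contention
```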

Thread count by state

Style: stacked area. Y-axis: thread count (integer), single axis. Underlying data: jvm.thread.count grouped by jvm.thread.state.

Series	Source	Style	What it shows
runnable	`jvm.thread.count` filtered to `jvm.thread.state=runnable`	Stacked area (blue)	Threads currently running or ready to run
waiting	`jvm.thread.count` filtered to `jvm.thread.state=waiting`	Stacked area (amber)	Threads waiting on another thread or condition
blocked	`jvm.thread.count` filtered to `jvm.thread.state=blocked`	Stacked area (coral)	Threads waiting to acquire a monitor lock

The stacked composition matters as much as the total height.

What to look for:

A healthy service usually shows most threads in runnable (doing work) or waiting (idle between requests).
A spike in blocked threads means lock contention — threads are queued up waiting on a monitor.
A growing total stack height over time without matching traffic growth is a thread leak.

Class loading

Style: bars + line. Y-axis: classes / window (left), total (right). Underlying data: jvm.class.loaded, jvm.class.unloaded, jvm.class.count.

Series	Source	Style	What it shows
load rate	`rate(jvm.class.loaded)`	Green bars	Classes loaded per time bucket
unload rate	`rate(jvm.class.unloaded)`	Coral bars	Classes unloaded per time bucket
total loaded	`jvm.class.count`	Green line	Currently loaded classes

What to look for:

The total class count should level off after the application warms up. If it keeps growing while the load rate stays steady and little or nothing is unloaded, suspect a classloader leak. This pattern is common in OSGi containers, plugin-heavy applications, and services that hot-reload code in production.
Unload activity is normally near zero in a healthy JVM.
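The load and unload bars are per-bucket increases of cumulative counters. A minimal sketch with made-up samples; the leak check at the end is a simplification for illustration, not the widget's logic.

```python
def per_bucket_deltas(cumulative):
    """Turn a cumulative counter series into per-bucket increases.

    A negative step (for example a counter reset after a JVM restart)
    is clamped to zero.
    """
    return [max(b - a, 0) for a, b in zip(cumulative, cumulative[1:])]


loaded = [12000, 12050, 12100, 12150]   # jvm.class.loaded samples after warm-up
unloaded = [10, 10, 10, 10]             # jvm.class.unloaded samples

load_rate = per_bucket_deltas(loaded)       # [50, 50, 50]
unload_rate = per_bucket_deltas(unloaded)   # [0, 0, 0]
net_growth = [l - u for l, u in zip(load_rate, unload_rate)]

# Steady loading with no unloading in every bucket is worth investigating.
suspect_leak = all(growth > 0 for growth in net_growth)
print(load_rate, unload_rate, suspect_leak)  # [50, 50, 50] [0, 0, 0] True
```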

GC

The GC row separates pause duration from event frequency — they answer different questions and combining them on one chart obscures both signals — and pairs them with a latency overlay so GC pauses can be aligned with end-user impact.

GC pause duration

Style: multi-line. Y-axis: ms, single axis. Underlying data: jvm.gc.duration histogram. Filterable by jvm.gc.name (collector name).

Series	Style	What it shows
p50	Green line	Median pause time
p95	Blue line	95th percentile pause time
p99	Red line	99th percentile pause time

Lines are grouped by GC collector — for example, G1 Young Generation or ZGC. If the JVM reports a more specific action (such as "end of minor GC"), the widget can break the data down further, but it does not force a minor-vs-major split: modern collectors like ZGC and Shenandoah do not separate collections that way, and the widget reflects whatever the JVM actually emits.

What to look for:

A widening gap between p50 and p99 means pauses are becoming unpredictable — most are short, but some run long. This usually points to a fragmented heap or a GC that is struggling to keep up with allocation.
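Because jvm.gc.duration is a histogram, percentiles are estimated from bucket counts rather than read from individual pauses. The sketch below uses linear interpolation inside the matching bucket, which is one common estimation approach and not necessarily what the widget does. The bucket bounds and counts are illustrative.

```python
def histogram_quantile(q, upper_bounds, counts):
    """Estimate the q-quantile from explicit-bucket histogram counts.

    upper_bounds: ascending finite bucket bounds, in seconds.
    counts: one count per bound plus a final overflow bucket.
    """
    total = sum(counts)
    if total == 0:
        return None
    rank = q * total
    cumulative = 0
    lower = 0.0
    for i, count in enumerate(counts):
        if count > 0 and cumulative + count >= rank:
            if i >= len(upper_bounds):
                return upper_bounds[-1]  # overflow bucket: report the highest finite bound
            upper = upper_bounds[i]
            return lower + (upper - lower) * (rank - cumulative) / count
        cumulative += count
        if i < len(upper_bounds):
            lower = upper_bounds[i]
    return upper_bounds[-1]


bounds = [0.01, 0.1, 1.0, 10.0]   # seconds
counts = [80, 15, 5, 0, 0]        # 100 pauses in total

p50 = histogram_quantile(0.50, bounds, counts)
p99 = histogram_quantile(0.99, bounds, counts)
print(f"p50={p50 * 1000:.2f} ms, p99={p99 * 1000:.0f} ms")  # p50=6.25 ms, p99=820 ms
```

In this sample most pauses are a few milliseconds while the tail reaches hundreds of milliseconds, which is the widening p50 to p99 gap described above.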

GC event count

Style: bars. Y-axis: events / window, single axis. Underlying data: jvm.gc.duration event count, derived as rate(jvm.gc.duration_count).

Series	Source	Style	What it shows
minor / major (G1GC)	`rate(jvm.gc.duration_count)`, grouped by `jvm.gc.name` and `jvm.gc.action`	Bars per collector	Collections per time bucket, split by collector and action when reported

When the JVM reports jvm.gc.action, the widget shows it as a secondary breakdown so you can see which kind of collection — young-gen vs full — is driving the rate. As with GC pause duration, the widget does not force a minor-vs-major split across collectors that do not have one.

What to look for:

A high event rate with low pause durations (in GC pause duration) is healthy GC — collections fire often but finish quickly.
A low event rate with high pause durations is the dangerous pattern: infrequent but expensive full GCs.
A step increase in event rate at a deployment timestamp suggests the new code allocates more memory per request than the previous version.
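The grouping behind the bars can be sketched as a count keyed by collector name and, when the JVM reports it, action. The event records below are illustrative sample data for a G1 service in one time bucket.

```python
from collections import Counter


def events_by_collector(gc_events):
    """Count GC events per (jvm.gc.name, jvm.gc.action); action may be absent."""
    counts = Counter()
    for event in gc_events:
        counts[(event["jvm.gc.name"], event.get("jvm.gc.action"))] += 1
    return counts


events = [
    {"jvm.gc.name": "G1 Young Generation", "jvm.gc.action": "end of minor GC"},
    {"jvm.gc.name": "G1 Young Generation", "jvm.gc.action": "end of minor GC"},
    {"jvm.gc.name": "G1 Young Generation", "jvm.gc.action": "end of minor GC"},
    {"jvm.gc.name": "G1 Old Generation", "jvm.gc.action": "end of major GC"},
]
counts = events_by_collector(events)
for (name, action), n in counts.most_common():
    print(name, "|", action, "|", n)
```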

Service latency vs pauses

Style: line + vertical markers overlay (one of three widgets in the GC row, same width as the others). Y-axis: P99 latency (ms). Underlying data: P99 request latency from span metrics, with jvm.gc.duration events annotated as vertical markers. Both series are scoped to the same service and time window as the rest of the JVM Metrics view.

Series	Source	Style	What it shows
p99 latency	Span metrics for the service	Purple line with shaded area	End-user request latency over time
GC pause event	`jvm.gc.duration` observations above the configured threshold (default 50 ms)	Vertical dashed red line	Each GC pause that exceeded the threshold

Span metrics and JVM metrics live on the same platform, so no cross-system correlation is needed.

What to look for:

A latency spike that lines up with a GC pause marker is GC-caused. The pause stopped all application threads, so any in-flight request piled up wait time during that window.
A latency spike with no nearby pause marker is not GC-related. Look at downstream calls in Dependencies or lock contention in Thread count by state instead.
A pause marker with no matching latency spike means requests were short enough, or concurrency low enough, that no request happened to span the pause.
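The visual check above, whether a spike lines up with a pause marker, can be approximated in code. This is a sketch: the 50 ms threshold matches the documented default, while the one-second matching tolerance and the `classify_spikes` helper are assumptions.

```python
def classify_spikes(spike_times_s, pauses, threshold_s=0.050, tolerance_s=1.0):
    """Label each latency spike by whether a GC pause marker is nearby.

    pauses: list of (timestamp_s, duration_s) tuples.
    Only pauses longer than threshold_s become markers.
    """
    markers = [ts for ts, duration in pauses if duration > threshold_s]
    labels = {}
    for spike in spike_times_s:
        near_marker = any(abs(spike - marker) <= tolerance_s for marker in markers)
        labels[spike] = "gc-related" if near_marker else "not gc-related"
    return labels


spikes = [100.0, 250.0]
pauses = [(99.6, 0.180), (180.0, 0.020), (400.0, 0.090)]
labels = classify_spikes(spikes, pauses)
print(labels)  # {100.0: 'gc-related', 250.0: 'not gc-related'}
```

The pause at 400.0 exceeds the threshold but has no matching spike, which corresponds to the third case in the list above.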

Common use cases

Symptom	Where to look first
Intermittent latency spikes, traces look fine	Service latency vs pauses: align spikes with GC pause markers
Service throws an out-of-memory (OOM) error intermittently	Heap used vs memory: check `used` approaching `limit`, then confirm in Live set - memory after GC whether the live set is rising
Memory never returns to baseline after deployment	Live set - memory after GC: a rising staircase after the deployment timestamp indicates a leak introduced in the new version
Service is slow but spans show low self-time	Thread count by state: high `blocked` share with low CPU in JVM CPU utilization is lock contention
CPU pegged at the ceiling but throughput is low	JVM CPU utilization combined with Thread count by state: runnable threads dominating with high CPU is a hot loop or a GC pressure spiral; cross-reference with GC pause duration and GC event count
Metaspace or class count keeps growing	Class loading: total count rising after warm-up indicates a classloader leak; switch the Heap used vs memory widget to non-heap to confirm Metaspace pressure
GC overhead jumped after a code change	GC event count: step change in bar height at deployment time means the new code allocates more per request

Limitations

JVM metrics are emitted at the JVM process level. Multiple deployed applications inside a single JVM (Tomcat, JBoss, WebLogic) cannot be visualized separately — heap usage, GC behavior, and thread counts reflect the entire JVM process.
On hosts running more than one JVM outside Kubernetes, instances are distinguished by the combination of host.name and service.name.
Instances that have stopped reporting (terminated pods) are excluded from the filter selector after a short grace period.

Next steps

If your service is not yet sending JVM metrics, follow Send JVM metrics to enable the OpenTelemetry agent's metrics exporter.


