Prometheus Alertmanager is the component of the Prometheus ecosystem responsible for handling alerts. It manages alerts sent by client applications such as the Prometheus server, deduplicating and grouping them, and routing notifications to the correct receiver integrations. Alertmanager's aim is to reduce alert fatigue through configurable routing, inhibition, and silencing of alerts.
Alertmanager also provides redundancy and high availability: multiple instances can run as a cluster so that alert handling continues even if one instance fails. It supports a variety of notification methods, including email, PagerDuty, and Slack, to deliver alerts to the right people.
This is part of a series of articles about Prometheus monitoring.
Grouping in Prometheus Alertmanager consolidates similar alerts into a single notification. This feature is critical during large-scale incidents when multiple systems may fail, triggering a high volume of alerts.
Instead of receiving hundreds of separate alerts, users receive a single, aggregated notification that still provides detailed information about which specific services are affected. Grouping is controlled through a routing tree in the configuration file, which defines how alerts are categorized, when the grouped notifications are sent, and to whom.
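As a minimal sketch (the receiver name default-team and the cluster label are illustrative, not part of any specific setup), a route that groups alerts might look like this:

route:
  receiver: 'default-team'
  group_by: ['cluster', 'alertname']   # alerts sharing these labels are batched into one notification
  group_wait: 30s                      # how long to wait before sending the first notification for a new group
  group_interval: 5m                   # how long to wait before notifying about new alerts added to an existing group

With grouping like this, a flood of InstanceDown alerts from a single cluster arrives as one notification listing every affected instance rather than hundreds of separate messages.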
Inhibition prevents unnecessary notifications by muting certain alerts if others, indicating a broader issue, are already active. For example, if an entire cluster becomes unreachable, Alertmanager can suppress alerts from individual systems within that cluster to avoid overwhelming the user with redundant notifications.
Inhibitions are set up through the configuration file, allowing users to define which alerts should be suppressed based on the presence of other active alerts.
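As a rough sketch of the cluster scenario above (the alert names ClusterUnreachable and InstanceDown and the cluster label are assumptions, not fixed names), an inhibition rule could look like this:

inhibit_rules:
  - source_matchers:
      - alertname = "ClusterUnreachable"
    target_matchers:
      - alertname = "InstanceDown"
    equal: ['cluster']   # only mute instance alerts from the same cluster as the firing source alert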
Silences allow users to temporarily mute specific alerts for a defined period. This is useful during planned maintenance or when addressing known issues that don’t require immediate attention.
Silences are configured using matchers, similar to routing rules, which determine which alerts are affected. If an incoming alert matches the conditions of an active silence, no notification will be sent. Users can set up and manage silences through Alertmanager’s web interface.
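Silences can also be managed from the command line with amtool, the CLI shipped with Alertmanager. A sketch, assuming Alertmanager is reachable at localhost:9093 and using illustrative label values:

# Silence DiskSpaceLow alerts on one instance for two hours of planned maintenance
amtool silence add alertname="DiskSpaceLow" instance="db-01:9100" \
  --duration="2h" --comment="planned maintenance" \
  --alertmanager.url=http://localhost:9093

# List active silences, then expire one early by its ID
amtool silence query --alertmanager.url=http://localhost:9093
amtool silence expire <silence-id> --alertmanager.url=http://localhost:9093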
Prometheus Alertmanager supports high availability by allowing multiple instances to run in a cluster configuration. This setup ensures continuous alert management, even if one instance fails.
Prometheus is configured to send alerts to all Alertmanager instances rather than load balancing across them; the instances then deduplicate notifications among themselves. Clustering itself is configured with command-line flags (such as --cluster.listen-address and --cluster.peer) rather than in the configuration file.
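As a minimal sketch of a two-node cluster (the hostnames alertmanager-1 and alertmanager-2 are assumptions; the default ports are 9093 for the API and 9094 for cluster gossip):

# On alertmanager-1
./alertmanager --config.file=alertmanager.yml \
  --cluster.listen-address="0.0.0.0:9094"

# On alertmanager-2, joining the cluster via the first node
./alertmanager --config.file=alertmanager.yml \
  --cluster.listen-address="0.0.0.0:9094" \
  --cluster.peer="alertmanager-1:9094"

Prometheus is then pointed at every instance in prometheus.yml:

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager-1:9093', 'alertmanager-2:9093']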
Instructions in this tutorial are adapted from the Prometheus documentation.
The Prometheus Alertmanager configuration is primarily handled through a YAML file, which defines routing rules, receiver integrations, and inhibition logic, among other settings. To load a configuration file, the --config.file flag is used. For example, running Alertmanager with the following command will load alertmanager.yml:
./alertmanager --config.file=alertmanager.yml
This configuration can be dynamically reloaded without restarting the service. A reload is triggered by sending a SIGHUP signal or an HTTP POST request to /-/reload. If the file contains invalid syntax, changes won’t be applied, and the error will be logged.
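For example, assuming Alertmanager is running locally on its default port, a reload might be triggered like this (amtool check-config can validate the file first):

# Validate the configuration before applying it
amtool check-config alertmanager.yml

# Reload via the HTTP endpoint...
curl -X POST http://localhost:9093/-/reload

# ...or by sending SIGHUP to the process
kill -HUP $(pidof alertmanager)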
The configuration starts with global settings that apply across the board. Here’s an example of some global parameters:
global:
  smtp_from: 'alertmanager@example.org'
  smtp_smarthost: 'smtp.example.org:587'
  smtp_auth_username: 'user'
  smtp_auth_password: 'password'
  resolve_timeout: '5m'
  slack_api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'
In this block, smtp_from sets the default sender address for email notifications, smtp_smarthost is the SMTP server (with port) used to send them, and smtp_auth_username and smtp_auth_password supply the SMTP credentials. resolve_timeout controls how long Alertmanager waits before declaring an alert resolved when the alert does not carry its own end time, and slack_api_url is the default Slack webhook used by Slack receivers.
Routing rules determine how alerts are grouped, throttled, and delivered to receivers. Each alert passes through a routing tree, starting at the root. Here’s an example of a route configuration:
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 10m
  repeat_interval: 3h
  receiver: 'email-team'
  routes:
    - matchers:
        - severity = "critical"
      receiver: 'pagerduty-team'
In this configuration, alerts are grouped by alertname, the first notification for a new group is delayed by 30 seconds (group_wait), notifications about new alerts joining an existing group are sent at most every 10 minutes (group_interval), and still-firing alerts are re-sent every 3 hours (repeat_interval). The default receiver is email-team, but any alert whose severity label equals critical is routed to pagerduty-team by the child route.
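The routing behavior can be checked with amtool; assuming the configuration above is saved as alertmanager.yml, the following shows which receiver a given label set would reach:

# Print the routing tree
amtool config routes show --config.file=alertmanager.yml

# Test where a critical alert would be routed (expected: pagerduty-team)
amtool config routes test --config.file=alertmanager.yml severity=critical alertname=InstanceDown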
Inhibition rules allow suppressing certain alerts when others are already active. Here’s an example of an inhibition rule:
inhibit_rules:
  - source_matchers:
      - alertname = "InstanceDown"
    target_matchers:
      - alertname = "DiskSpaceLow"
    equal: ['instance']
In this case, whenever an InstanceDown alert is firing, any DiskSpaceLow alert that carries the same instance label is suppressed, since a down instance will fail its disk-space checks anyway.
Matchers define conditions for alerts to match routes or inhibitions. They support the operators = for equality, != for inequality, and =~ / !~ for regular-expression matching. Here’s an example:
matchers:
  - alertname = "Watchdog"
  - severity =~ "critical|warning"
In this example, an alert matches only if its alertname label is exactly Watchdog and its severity label matches the regular expression critical|warning, i.e. is either critical or warning.
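The negative operators work the same way; a brief sketch with hypothetical env and service labels:

matchers:
  - env != "production"      # match anything except production
  - service !~ "test-.*"     # exclude services whose names start with test-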
Receivers define where notifications are sent. A configuration might look like this:
receivers:
  - name: 'email-team'
    email_configs:
      - to: 'team@example.org'
        from: 'alertmanager@example.org'
        smarthost: 'smtp.example.org:587'
This configuration sends alert notifications via email to the specified address. Other integrations, such as Slack or PagerDuty, can also be configured under receivers.
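For instance, a PagerDuty receiver mainly needs an integration key; the routing_key value below is a placeholder, not a real key:

receivers:
  - name: 'pagerduty-team'
    pagerduty_configs:
      - routing_key: '<your-pagerduty-integration-key>'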
Each receiver can have its own integration settings. For example, here’s how you might configure Slack:
receivers:
  - name: 'slack-team'
    slack_configs:
      - channel: '#alerts'
        send_resolved: true
        http_config:
          proxy_url: 'http://proxy.example.com'
In this setup, notifications go to the #alerts Slack channel, send_resolved: true means a follow-up message is sent when an alert resolves, and http_config.proxy_url routes the outgoing webhook requests through the specified HTTP proxy.
Here are several examples of using Prometheus Alertmanager. These examples are adapted from the Prometheus documentation.
In Prometheus Alertmanager, Slack notifications can be customized to include additional information, such as links to internal documentation for resolving alerts. This can be done using Prometheus’ Go templating system. For example, the following configuration adds a URL that links to the organization’s wiki page for alerts based on their labels.
global:
  slack_api_url: '<slack_webhook_url>'
route:
  receiver: 'slack-alerts'
  group_by: [alertname, datacenter, app]
receivers:
  - name: 'slack-alerts'
    slack_configs:
      - channel: '#alerts'
        text: 'https://internal.exampleco.net/wiki/notifications/{{ .GroupLabels.app }}/{{ .GroupLabels.alertname }}'
In this configuration, alerts are grouped by alertname, datacenter, and app, and the Slack message text is built from the group labels: the {{ .GroupLabels.app }} and {{ .GroupLabels.alertname }} expressions expand to the group's label values, producing a wiki URL specific to the firing alert.
Alertmanager annotations, such as summary and description, can be included in Slack messages by accessing the CommonAnnotations field. This allows the notification to provide more detailed information about the alert, such as what has occurred and where.
groups:
  - name: Instances
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 10m
        labels:
          severity: page
        annotations:
          description: 'Instance {{ $labels.instance }} of job {{ $labels.job }} has been down for over 10 minutes.'
          summary: 'Instance down'
The Alertmanager receiver then pulls these annotations into the Slack message:
receivers:
  - name: 'my-team'
    slack_configs:
      - channel: '#alerts'
        text: "<!channel> \nSummary: {{ .CommonAnnotations.summary }} \nDescription: {{ .CommonAnnotations.description }}"
In this example, the alerting rule (defined in a Prometheus rules file) attaches summary and description annotations to the InstanceDown alert, and the Slack receiver's text field reads them back through .CommonAnnotations, so the notification shows what happened and on which instance; <!channel> additionally pings everyone in the channel.
To simplify complex notifications, Alertmanager supports defining reusable templates. These templates can be stored in external files and loaded as needed, making the configuration cleaner and easier to maintain.
Here’s how you can define a reusable template for Slack notifications. Instead of repeating the wiki URL (https://internal.exampleco.net/wiki/notifications) in every receiver, it is defined once in a template file and referenced from the main configuration:
global:
  slack_api_url: '<slack_webhook_url>'
route:
  receiver: 'slack-alerts'
  group_by: [alertname, datacenter, app]
receivers:
  - name: 'slack-alerts'
    slack_configs:
      - channel: '#alerts'
        text: '<REFERENCE THE NOTIFICATION TEMPLATE>'
templates:
  - '/etc/alertmanager/templates/exampleco.tmpl'
In this setup, the templates section tells Alertmanager to load template definitions from /etc/alertmanager/templates/exampleco.tmpl, and the text field's placeholder is replaced by a reference to a template defined in that file, so the notification body lives in one place and can be reused by any receiver.
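As a sketch of how the pieces fit together (the template name slack.exampleco.text is an assumption, as is the exact wiki path), the template file /etc/alertmanager/templates/exampleco.tmpl could contain:

{{ define "slack.exampleco.text" }}https://internal.exampleco.net/wiki/notifications/{{ .GroupLabels.app }}/{{ .GroupLabels.alertname }}{{ end }}

and the receiver's text field would then reference it by name:

text: '{{ template "slack.exampleco.text" . }}'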
Coralogix sets itself apart in observability with its modern architecture, enabling real-time insights into logs, metrics, and traces with built-in cost optimization. Coralogix’s straightforward pricing covers all its platform offerings including APM, RUM, SIEM, infrastructure monitoring and much more. With unparalleled support that features less than 1 minute response times and 1 hour resolution times, Coralogix is a leading choice for thousands of organizations across the globe.