Prometheus Alertmanager: The Basics and a Quick Tutorial

The Core Concepts of Prometheus Alertmanager
Grouping
Grouping in Prometheus Alertmanager consolidates similar alerts into a single notification. This feature is critical during large-scale incidents when multiple systems may fail, triggering a high volume of alerts.
Instead of receiving hundreds of separate alerts, users receive a single, aggregated notification that still provides detailed information about which specific services are affected. Grouping is controlled through a routing tree in the configuration file, which defines how alerts are categorized, when the grouped notifications are sent, and to whom.
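For illustration, grouping might be configured in the route block roughly like this; the receiver name, labels, and timing values below are placeholders rather than recommendations:
route:
  receiver: 'default-team'            # placeholder receiver name
  group_by: ['cluster', 'alertname']  # alerts sharing these labels are combined into one notification
  group_wait: 30s                     # wait before sending the first notification for a new group
  group_interval: 5m                  # wait before notifying about alerts added to an existing group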
Inhibition
Inhibition prevents unnecessary notifications by muting certain alerts if others, indicating a broader issue, are already active. For example, if an entire cluster becomes unreachable, Alertmanager can suppress alerts from individual systems within that cluster to avoid overwhelming the user with redundant notifications.
Inhibitions are set up through the configuration file, allowing users to define which alerts should be suppressed based on the presence of other active alerts.
Silences
Silences allow users to temporarily mute specific alerts for a defined period. This is useful during planned maintenance or when addressing known issues that don’t require immediate attention.
Silences are configured using matchers, similar to routing rules, which determine which alerts are affected. If an incoming alert matches the conditions of an active silence, no notification will be sent. Users can set up and manage silences through Alertmanager’s web interface.
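Silences are typically managed in the web UI, but they can also be created from the command line with amtool, which ships with Alertmanager. A minimal sketch, assuming Alertmanager listens on its default port and using made-up matcher values:
amtool silence add alertname="DiskSpaceLow" instance="db-01" \
  --author="ops" \
  --comment="Planned database maintenance" \
  --duration="2h" \
  --alertmanager.url="http://localhost:9093"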
High Availability
Prometheus Alertmanager supports high availability by allowing multiple instances to run in a cluster configuration. This setup ensures continuous alert management, even if one instance fails.
Prometheus is configured to send alerts to all Alertmanager instances, ensuring that alerts are handled reliably without the need for load balancing. Clustering itself is enabled through Alertmanager's --cluster command-line flags rather than the configuration file.
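A sketch of a two-node setup might look like the following, with hypothetical hostnames and default ports; the matching prometheus.yml then lists every Alertmanager instance explicitly:
# Start one Alertmanager instance and point it at its peer (hypothetical hostnames)
./alertmanager --config.file=alertmanager.yml \
  --cluster.listen-address="0.0.0.0:9094" \
  --cluster.peer="alertmanager-1.example.com:9094"

# prometheus.yml: send alerts to all Alertmanager instances
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'alertmanager-0.example.com:9093'
            - 'alertmanager-1.example.com:9093'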
Tutorial: Configuring Prometheus Alertmanager
Instructions in this tutorial are adapted from the Prometheus documentation.
The Prometheus Alertmanager configuration is primarily handled through a YAML file, which defines routing rules, receiver integrations, and inhibition logic, among other settings. To load a configuration file, the --config.file flag is used. For example, running Alertmanager with the following command will load alertmanager.yml:
./alertmanager --config.file=alertmanager.yml
This configuration can be dynamically reloaded without restarting the service. A reload is triggered by sending a SIGHUP signal or an HTTP POST request to /-/reload. If the file contains invalid syntax, changes won’t be applied, and the error will be logged.
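For example, assuming the default listen address, a reload can be triggered in either of these ways, and amtool's check-config subcommand can validate the file beforehand:
# Validate the configuration before reloading
amtool check-config alertmanager.yml

# Reload via the HTTP endpoint (must be a POST)
curl -X POST http://localhost:9093/-/reload

# Or send SIGHUP to the running process
kill -HUP $(pidof alertmanager)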
File Layout and Global Settings
The configuration starts with global settings that apply across the board. Here’s an example of some global parameters:
global:
  smtp_from: 'alerts@example.com'
  smtp_smarthost: 'smtp.example.org:587'
  smtp_auth_username: 'user'
  smtp_auth_password: 'password'
  resolve_timeout: '5m'
  slack_api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'
In this block:
- smtp_from and smtp_smarthost define the email settings.
- smtp_auth_username and smtp_auth_password are the credentials used for email authentication.
- resolve_timeout sets how long Alertmanager waits before declaring an alert resolved when the alert does not carry its own end time.
- slack_api_url configures the webhook URL used for Slack notifications.
Route-Related Settings
Routing rules determine how alerts are grouped, throttled, and delivered to receivers. Each alert passes through a routing tree, starting at the root. Here’s an example of a route configuration:
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 10m
  repeat_interval: 3h
  receiver: 'email-team'
  routes:
    - matchers:
        - severity = "critical"
      receiver: 'pagerduty-team'
In this configuration:
- group_by aggregates alerts with the same alertname label.
- group_wait defines how long to wait before sending the initial notification for a new group of alerts.
- group_interval specifies how long to wait before sending a notification for newly added alerts to an existing group.
- repeat_interval is the time between repeated notifications.
- receiver sets the default destination for alerts (email-team here); the nested route overrides it and sends critical alerts to pagerduty-team instead.
Inhibition-Related Settings
Inhibition rules allow suppressing certain alerts when others are already active. Here’s an example of an inhibition rule:
inhibit_rules:
  - source_matchers:
      - alertname = "InstanceDown"
    target_matchers:
      - alertname = "DiskSpaceLow"
    equal: ['instance']
In this case:
- If an InstanceDown alert is active for an instance, DiskSpaceLow alerts for the same instance will be suppressed.
- The equal field ensures that the inhibition only applies when the instance label is the same for both alerts.
Label Matchers
Matchers define conditions for alerts to match routes or inhibitions. They support operators such as = for equality, != for inequality, and =~ / !~ for matching or excluding regular expressions. Here’s an example:
matchers:
  - alertname = "Watchdog"
  - severity =~ "critical|warning"
In this example:
- The first matcher selects alerts where alertname is Watchdog.
- The second matcher uses a regular expression to match severity values that are either critical or warning.
General Receiver-Related Settings
Receivers define where notifications are sent. A configuration might look like this:
receivers:
  - name: 'email-team'
    email_configs:
      - to: 'team@example.com'
        from: 'alerts@example.com'
        smarthost: 'smtp.example.org:587'
This configuration sends alert notifications via email to the specified address. Other integrations, such as Slack or PagerDuty, can also be configured under receivers.
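For instance, the pagerduty-team receiver referenced in the earlier route could be declared roughly as follows; the routing key shown is a placeholder for a real PagerDuty integration key:
receivers:
  - name: 'pagerduty-team'
    pagerduty_configs:
      - routing_key: '<pagerduty-integration-key>'  # placeholder Events API v2 integration key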
Receiver Integration Settings
Each receiver can have its own integration settings. For example, here’s how you might configure Slack:
receivers:
  - name: 'slack-team'
    slack_configs:
      - channel: '#alerts'
        send_resolved: true
        http_config:
          proxy_url: 'http://proxy.example.com'
In this setup:
- Alerts are sent to the #alerts channel on Slack.
- The send_resolved flag ensures notifications are also sent when alerts are resolved.
- The http_config allows the use of a proxy for outgoing Slack requests.
Prometheus Alertmanager Examples
Here are several examples of using Prometheus Alertmanager. These examples are adapted from the Prometheus documentation.
Customizing Slack Notifications
In Prometheus Alertmanager, Slack notifications can be customized to include additional information, such as links to internal documentation for resolving alerts. This can be done using Prometheus’ Go templating system. For example, the following configuration adds a URL that links to the organization’s wiki page for alerts based on their labels.
global:
  slack_api_url: '<slack_webhook_url>'
route:
  receiver: 'slack-alerts'
  group_by: [alertname, datacenter, app]
receivers:
  - name: 'slack-alerts'
    slack_configs:
      - channel: '#alerts'
        text: 'https://internal.exampleco.net/wiki/notifications/{{ .GroupLabels.app }}/{{ .GroupLabels.alertname }}'
In this configuration:
- The slack_api_url specifies the webhook URL for Slack.
- Alerts are routed to the slack-alerts receiver, which sends notifications to the #alerts Slack channel.
- The text field uses Go templating to generate a dynamic URL pointing to the organization’s internal documentation. The placeholders .GroupLabels.app and .GroupLabels.alertname are replaced with actual values from the alert data, ensuring the link is context-specific.
Accessing Annotations in CommonAnnotations
Alertmanager annotations, such as summary and description, can be included in Slack messages by accessing the CommonAnnotations field. This allows the notification to provide more detailed information about the alert, such as what has occurred and where.
# Prometheus alerting rule (defined in a rule file loaded by Prometheus)
groups:
  - name: Instances
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 10m
        labels:
          severity: page
        annotations:
          description: 'The {{ $labels.job }} job''s {{ $labels.instance }} has been down for over 10 minutes.'
          summary: 'Instance down'

# Alertmanager receiver (defined in alertmanager.yml)
receivers:
  - name: 'my-team'
    slack_configs:
      - channel: '#alerts'
        text: "<!channel> \nSummary: {{ .CommonAnnotations.summary }} \nDescription: {{ .CommonAnnotations.description }}"
In this example:
- The alert InstanceDown triggers when an instance has been unreachable for more than 10 minutes.
- The annotations field defines a summary and description for the alert, which can be dynamically filled using alert labels like instance and job.
- The text field in the Slack configuration formats a message that includes the summary and description from CommonAnnotations.
Defining Reusable Templates
To simplify complex notifications, Alertmanager supports defining reusable templates. These templates can be stored in external files and loaded as needed, making the configuration cleaner and easier to maintain.
Here’s how you can define a reusable template for Slack notifications:
- First, create a file at /etc/alertmanager/templates/exampleco.tmpl and define a custom template:
{{ define "slack.exampleco.text" }}https://internal.exampleco.net/wiki/notifications/{{ .GroupLabels.app }}/{{ .GroupLabels.alertname }}{{ end }}
- Update the Alertmanager configuration to load this template:
global:
  slack_api_url: '<slack_webhook_url>'
route:
  receiver: 'slack-alerts'
  group_by: [alertname, datacenter, app]
receivers:
  - name: 'slack-alerts'
    slack_configs:
      - channel: '#alerts'
        text: '{{ template "slack.exampleco.text" . }}'
templates:
  - '/etc/alertmanager/templates/exampleco.tmpl'
In this setup:
- The custom template slack.exampleco.text is defined in a separate file and constructs a URL based on alert labels.
- The text field in the receiver configuration references this template using the template keyword.
- The templates field specifies the path to the template file, allowing Alertmanager to load and use it for notifications.
Managed Prometheus with Coralogix
Coralogix sets itself apart in observability with its modern architecture, enabling real-time insights into logs, metrics, and traces with built-in cost optimization. Coralogix’s straightforward pricing covers all its platform offerings including APM, RUM, SIEM, infrastructure monitoring and much more. With unparalleled support that features less than 1 minute response times and 1 hour resolution times, Coralogix is a leading choice for thousands of organizations across the globe.