Prometheus Alertmanager is the component of the Prometheus ecosystem responsible for handling alerts. It manages alerts sent by client applications such as the Prometheus server, deduplicating and grouping them, and routing notifications to the correct receiver integrations. Alertmanager's aim is to reduce alert fatigue through configurable routing, inhibition, and silencing of alerts.
Alertmanager also provides redundancy and high availability: multiple instances can run as a cluster so that alert handling continues even if one instance fails. It supports a variety of notification methods, including email, PagerDuty, and Slack, to deliver alerts to the right people.
This is part of a series of articles about Prometheus monitoring.
Grouping in Prometheus Alertmanager consolidates similar alerts into a single notification. This feature is critical during large-scale incidents when multiple systems may fail, triggering a high volume of alerts.
Instead of receiving hundreds of separate alerts, users receive a single, aggregated notification that still provides detailed information about which specific services are affected. Grouping is controlled through a routing tree in the configuration file, which defines how alerts are categorized, when the grouped notifications are sent, and to whom.
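As a minimal sketch (the receiver name default-team and the cluster label are illustrative, not part of any specific setup), a route that groups alerts might look like this:

route:
  receiver: 'default-team'
  group_by: ['cluster', 'alertname']   # alerts sharing these labels are batched into one notification
  group_wait: 30s                      # how long to wait before sending the first notification for a new group
  group_interval: 5m                   # how long to wait before notifying about new alerts added to an existing group

With grouping like this, a flood of InstanceDown alerts from a single cluster arrives as one notification listing every affected instance rather than hundreds of separate messages.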
Inhibition prevents unnecessary notifications by muting certain alerts if others, indicating a broader issue, are already active. For example, if an entire cluster becomes unreachable, Alertmanager can suppress alerts from individual systems within that cluster to avoid overwhelming the user with redundant notifications.
Inhibitions are set up through the configuration file, allowing users to define which alerts should be suppressed based on the presence of other active alerts.
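As a rough sketch of the cluster scenario above (the alert names ClusterUnreachable and InstanceDown and the cluster label are assumptions, not fixed names), an inhibition rule could look like this:

inhibit_rules:
  - source_matchers:
      - alertname = "ClusterUnreachable"
    target_matchers:
      - alertname = "InstanceDown"
    equal: ['cluster']   # only mute instance alerts from the same cluster as the firing source alert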
Silences allow users to temporarily mute specific alerts for a defined period. This is useful during planned maintenance or when addressing known issues that don’t require immediate attention.
Silences are configured using matchers, similar to routing rules, which determine which alerts are affected. If an incoming alert matches the conditions of an active silence, no notification will be sent. Users can set up and manage silences through Alertmanager’s web interface.
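Silences can also be managed from the command line with amtool, the CLI shipped with Alertmanager. A sketch, assuming Alertmanager is reachable at localhost:9093 and using illustrative label values:

# Silence DiskSpaceLow alerts on one instance for two hours of planned maintenance
amtool silence add alertname="DiskSpaceLow" instance="db-01:9100" \
  --duration="2h" --comment="planned maintenance" \
  --alertmanager.url=http://localhost:9093

# List active silences, then expire one early by its ID
amtool silence query --alertmanager.url=http://localhost:9093
amtool silence expire <silence-id> --alertmanager.url=http://localhost:9093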
Prometheus Alertmanager supports high availability by allowing multiple instances to run in a cluster configuration. This setup ensures continuous alert management, even if one instance fails.
Prometheus is configured to send alerts to all Alertmanager instances rather than load balancing across them; the instances then deduplicate notifications among themselves. Clustering itself is configured with command-line flags (such as --cluster.listen-address and --cluster.peer) rather than in the configuration file.
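As a minimal sketch of a two-node cluster (the hostnames alertmanager-1 and alertmanager-2 are assumptions; the default ports are 9093 for the API and 9094 for cluster gossip):

# On alertmanager-1
./alertmanager --config.file=alertmanager.yml \
  --cluster.listen-address="0.0.0.0:9094"

# On alertmanager-2, joining the cluster via the first node
./alertmanager --config.file=alertmanager.yml \
  --cluster.listen-address="0.0.0.0:9094" \
  --cluster.peer="alertmanager-1:9094"

Prometheus is then pointed at every instance in prometheus.yml:

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager-1:9093', 'alertmanager-2:9093']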
Instructions in this tutorial are adapted from the Prometheus documentation.
The Prometheus Alertmanager configuration is primarily handled through a YAML file, which defines routing rules, receiver integrations, and inhibition logic, among other settings. To load a configuration file, the --config.file flag is used. For example, running Alertmanager with the following command will load alertmanager.yml:
./alertmanager --config.file=alertmanager.yml
This configuration can be dynamically reloaded without restarting the service. A reload is triggered by sending a SIGHUP signal or an HTTP POST request to /-/reload. If the file contains invalid syntax, changes won’t be applied, and the error will be logged.
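For example, assuming Alertmanager is running locally on its default port, a reload might be triggered like this (amtool check-config can validate the file first):

# Validate the configuration before applying it
amtool check-config alertmanager.yml

# Reload via the HTTP endpoint...
curl -X POST http://localhost:9093/-/reload

# ...or by sending SIGHUP to the process
kill -HUP $(pidof alertmanager)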
The configuration starts with global settings that apply across the board. Here’s an example of some global parameters:
global:
  smtp_from: 'alertmanager@example.org'
  smtp_smarthost: 'smtp.example.org:587'
  smtp_auth_username: 'user'
  smtp_auth_password: 'password'
  resolve_timeout: '5m'
  slack_api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'
In this block, smtp_from sets the default sender address for email notifications, smtp_smarthost is the SMTP server (with port) used to send them, and smtp_auth_username and smtp_auth_password supply the SMTP credentials. resolve_timeout controls how long Alertmanager waits before declaring an alert resolved when the alert does not carry its own end time, and slack_api_url is the default Slack webhook used by Slack receivers.
Routing rules determine how alerts are grouped, throttled, and delivered to receivers. Each alert passes through a routing tree, starting at the root. Here’s an example of a route configuration:
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 10m
  repeat_interval: 3h
  receiver: 'email-team'
  routes:
    - matchers:
        - severity = "critical"
      receiver: 'pagerduty-team'
In this configuration, alerts are grouped by alertname, the first notification for a new group is delayed by 30 seconds (group_wait), notifications about new alerts joining an existing group are sent at most every 10 minutes (group_interval), and still-firing alerts are re-sent every 3 hours (repeat_interval). The default receiver is email-team, but any alert whose severity label equals critical is routed to pagerduty-team by the child route.
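The routing behavior can be checked with amtool; assuming the configuration above is saved as alertmanager.yml, the following shows which receiver a given label set would reach:

# Print the routing tree
amtool config routes show --config.file=alertmanager.yml

# Test where a critical alert would be routed (expected: pagerduty-team)
amtool config routes test --config.file=alertmanager.yml severity=critical alertname=InstanceDown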
Inhibition rules allow suppressing certain alerts when others are already active. Here’s an example of an inhibition rule:
inhibit_rules:
  - source_matchers:
      - alertname = "InstanceDown"
    target_matchers:
      - alertname = "DiskSpaceLow"
    equal: ['instance']
In this case, whenever an InstanceDown alert is firing, any DiskSpaceLow alert that carries the same instance label is suppressed, since a down instance will fail its disk-space checks anyway.
Matchers define conditions for alerts to match routes or inhibitions. They support the operators = for equality, != for inequality, and =~ / !~ for regular-expression matching. Here’s an example:
matchers:
  - alertname = "Watchdog"
  - severity =~ "critical|warning"
In this example, an alert matches only if its alertname label is exactly Watchdog and its severity label matches the regular expression critical|warning, i.e. is either critical or warning.
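The negative operators work the same way; a brief sketch with hypothetical env and service labels:

matchers:
  - env != "production"      # match anything except production
  - service !~ "test-.*"     # exclude services whose names start with test-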
Receivers define where notifications are sent. A configuration might look like this:
receivers:
  - name: 'email-team'
    email_configs:
      - to: 'team@example.org'
        from: 'alertmanager@example.org'
        smarthost: 'smtp.example.org:587'
This configuration sends alert notifications via email to the specified address. Other integrations, such as Slack or PagerDuty, can also be configured under receivers.
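For instance, a PagerDuty receiver mainly needs an integration key; the routing_key value below is a placeholder, not a real key:

receivers:
  - name: 'pagerduty-team'
    pagerduty_configs:
      - routing_key: '<your-pagerduty-integration-key>'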
Each receiver can have its own integration settings. For example, here’s how you might configure Slack:
receivers:
  - name: 'slack-team'
    slack_configs:
      - channel: '#alerts'
        send_resolved: true
        http_config:
          proxy_url: 'http://proxy.example.com'
In this setup, notifications go to the #alerts Slack channel, send_resolved: true means a follow-up message is sent when an alert resolves, and http_config.proxy_url routes the outgoing webhook requests through the specified HTTP proxy.
Here are several examples of using Prometheus Alertmanager. These examples are adapted from the Prometheus documentation.
In Prometheus Alertmanager, Slack notifications can be customized to include additional information, such as links to internal documentation for resolving alerts. This can be done using Prometheus’ Go templating system. For example, the following configuration adds a URL that links to the organization’s wiki page for alerts based on their labels.
global:
  slack_api_url: '<slack_webhook_url>'
route:
  receiver: 'slack-alerts'
  group_by: [alertname, datacenter, app]
receivers:
  - name: 'slack-alerts'
    slack_configs:
      - channel: '#alerts'
        text: 'https://internal.exampleco.net/wiki/notifications/{{ .GroupLabels.app }}/{{ .GroupLabels.alertname }}'
In this configuration, alerts are grouped by alertname, datacenter, and app, and the Slack message text is built from the group labels: the {{ .GroupLabels.app }} and {{ .GroupLabels.alertname }} expressions expand to the group's label values, producing a wiki URL specific to the firing alert.
Alertmanager annotations, such as summary and description, can be included in Slack messages by accessing the CommonAnnotations field. This allows the notification to provide more detailed information about the alert, such as what has occurred and where.
groups:
  - name: Instances
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 10m
        labels:
          severity: page
        annotations:
          description: 'Instance {{ $labels.instance }} of job {{ $labels.job }} has been down for over 10 minutes.'
          summary: 'Instance down'
The Alertmanager receiver then pulls these annotations into the Slack message:
receivers:
  - name: 'my-team'
    slack_configs:
      - channel: '#alerts'
        text: "<!channel> \nSummary: {{ .CommonAnnotations.summary }} \nDescription: {{ .CommonAnnotations.description }}"
In this example, the alerting rule (defined in a Prometheus rules file) attaches summary and description annotations to the InstanceDown alert, and the Slack receiver's text field reads them back through .CommonAnnotations, so the notification shows what happened and on which instance; <!channel> additionally pings everyone in the channel.
To simplify complex notifications, Alertmanager supports defining reusable templates. These templates can be stored in external files and loaded as needed, making the configuration cleaner and easier to maintain.
Here’s how you can define a reusable template for Slack notifications. Instead of repeating the wiki URL (https://internal.exampleco.net/wiki/notifications) in every receiver, it is defined once in a template file and referenced from the main configuration:
global:
  slack_api_url: '<slack_webhook_url>'
route:
  receiver: 'slack-alerts'
  group_by: [alertname, datacenter, app]
receivers:
  - name: 'slack-alerts'
    slack_configs:
      - channel: '#alerts'
        text: '<REFERENCE THE NOTIFICATION TEMPLATE>'
templates:
  - '/etc/alertmanager/templates/exampleco.tmpl'
In this setup, the templates section tells Alertmanager to load template definitions from /etc/alertmanager/templates/exampleco.tmpl, and the text field's placeholder is replaced by a reference to a template defined in that file, so the notification body lives in one place and can be reused by any receiver.
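As a sketch of how the pieces fit together (the template name slack.exampleco.text is an assumption, as is the exact wiki path), the template file /etc/alertmanager/templates/exampleco.tmpl could contain:

{{ define "slack.exampleco.text" }}https://internal.exampleco.net/wiki/notifications/{{ .GroupLabels.app }}/{{ .GroupLabels.alertname }}{{ end }}

and the receiver's text field would then reference it by name:

text: '{{ template "slack.exampleco.text" . }}'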
Coralogix sets itself apart in observability with its modern architecture, enabling real-time insights into logs, metrics, and traces with built-in cost optimization. Coralogix’s straightforward pricing covers all its platform offerings including APM, RUM, SIEM, infrastructure monitoring and much more. With unparalleled support that features less than 1 minute response times and 1 hour resolution times, Coralogix is a leading choice for thousands of organizations across the globe.