Alerting Overview

DC/OS Prometheus Alerting Overview

Alerting with Prometheus :

Alerting with Prometheus is divided into the following parts:

  1. Set up and configure the Alertmanager

  2. Configure Prometheus to talk to the Alertmanager

  3. Create alerting rules in Prometheus

  4. Send notification to Slack, PagerDuty, or email

Set up and configure the Alertmanager

Prometheus fires an alert to the Alertmanager, the Alertmanager then manages those alerts, then sends out notifications via methods such as Slack, email and PagerDuty.

The template for Alertmanager configuration is below:

  # ResolveTimeout is the time after which an alert is declared resolved
  # if it has not been updated.
  [ resolve_timeout: <duration> | default = 5m ]

  # The default SMTP From header field.
  [ smtp_from: <tmpl_string> ]
  # The default SMTP smarthost used for sending emails, including port number.
  # Port number usually is 25,
  # Example:
  [ smtp_smarthost: <string> ]
  # The default hostname to identify to the SMTP server.
  [ smtp_hello: <string> | default = "localhost" ]
  [ smtp_auth_username: <string> ]
  # SMTP Auth using LOGIN and PLAIN.
  [ smtp_auth_password: <secret> ]
  # SMTP Auth using PLAIN.
  [ smtp_auth_identity: <string> ]
  # SMTP Auth using CRAM-MD5.
  [ smtp_auth_secret: <secret> ]
  # The default SMTP TLS requirement.
  [ smtp_require_tls: <bool> | default = true ]

  # The API URL to use for Slack notifications.
  [ slack_api_url: <string> ]

  [ pagerduty_url: <string> | default = "" ]
  [ opsgenie_api_url: <string> | default = "" ]

  # The default HTTP client configuration
  [ http_config: <http_config> ]

# Files from which custom notification template definitions are read.
# The last component may use a wildcard matcher, such as 'templates/*.tmpl'.
  [ - <filepath> ... ]

# The root node of the routing tree.
route: <route>

# A list of notification receivers.
  - <receiver> ...

# A list of inhibition rules.
  [ - <inhibit_rule> ... ]-

Configure Prometheus to talk to the Alertmanager:

To initiate communication between Prometheus and Alertmanager, Alertmanager end points are added as targets to the Prometheus yml under alerting at the end of the following default configuration :

 scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
 evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
 # scrape_timeout is set to the global default (10s).# A scrape configuration containing exactly one endpoint to scrape:
# Here it is Prometheus itself.
 - job_name: 'dcos-metrics'    # All master nodes are available at master.mesos via their A record
     - names: ['master.mesos']
       type: 'A'
       port: 61091    # All agent nodes are written regularly to discovery/agents.json
     - files: ['discovery/agents.json']

   # set of rule files to read alerting rules from
   -  'rules.yml'      

   - static_configs:
     - targets: ['']

Create alerting rules in Prometheus

Alerting rules allow you to define alert conditions based on Prometheus language expressions and to send notifications about firing alerts to an external service like Slack, PagerDuty, and email.

The default alert rules are below. Rules files are accessed in the Prometheus configuration .yml.

- name: cpurule
 - alert: highcpu
   expr: cpu_total > 2
     DESCRIPTION: 'High CPU Utilization'
     SUMMARY: 'This is to notify for high cpu utilization'

The following example checks which instance is down :

alert: InstanceDown
expr: up == 0
for: 5m
  severity: page
  summary: "Instance {{$labels.instance}} down"
  description: "{{$labels.instance}} of job {{$labels.job}} has been down for more than 5 minutes."

Send notification to Slack, PagerDuty, and email

Alertmanager manages sending alerts to Slack, PagerDuty, and email.


To configure Slack with Alertmanager, the Alertmanager uses the Incoming Webhooks feature of Slack.

The default configuration below sends alerts to Slack to be configured under the Alertmanager configuration yml.

group_by: [cluster]
receiver: webh
group_interval: 1mreceivers:
- name: webh
 - url:


To configure PagerDuty with Alertmanager :

  1. Create a service in PagerDuty, and obtain an integration key.

  2. Go to the “Services” page in PagerDuty:

  3. Click “+ Add New Service”:

  4. Note down the Integration Key.

Here is a sample configuration for adding a PagerDuty setup to the Alertmanager configuration yml.

- name: team-pager
  - service_key: $INTEGRATION_KEY


Here is a sample configuration for adding an email alert setup to the Alertmanager configuration yml. # Substitute your full gmail address here.
GMAIL_AUTH_TOKEN=XXXX        # Substitute your app password

- name: email-me
    from: $GMAIL_ACCOUNT
    auth_username: "$GMAIL_ACCOUNT"
    auth_identity: "$GMAIL_ACCOUNT"
    auth_password: "$GMAIL_AUTH_TOKEN"