Configure advanced SNMP alerting with Prometheus Alertmanager for network monitoring

Advanced 45 min May 28, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up SNMP monitoring with the Prometheus SNMP exporter, define alerting rules in Prometheus, and route the resulting notifications through Alertmanager to proactively monitor network devices, interfaces, and performance metrics.

Prerequisites

  • Existing Prometheus installation
  • Network devices with SNMP enabled
  • Basic understanding of SNMP protocol
  • Email server for notifications

What this solves

SNMP monitoring provides deep visibility into network devices like switches, routers, and firewalls, but raw SNMP data becomes useful only when paired with alerting. This tutorial shows you how to configure the Prometheus SNMP exporter to collect metrics from network devices, write Prometheus alerting rules that fire on specific conditions like interface downtime, high bandwidth utilization, or device failures, and configure Alertmanager to group and deliver those alerts.

Step-by-step configuration

Install Prometheus SNMP exporter

Download and install the SNMP exporter that translates SNMP queries into Prometheus metrics.

Ubuntu / Debian:

sudo apt update
wget https://github.com/prometheus/snmp_exporter/releases/download/v0.25.0/snmp_exporter-0.25.0.linux-amd64.tar.gz
tar xzf snmp_exporter-0.25.0.linux-amd64.tar.gz
sudo mv snmp_exporter-0.25.0.linux-amd64/snmp_exporter /usr/local/bin/
sudo chmod +x /usr/local/bin/snmp_exporter

AlmaLinux / Rocky Linux:

sudo dnf update -y
wget https://github.com/prometheus/snmp_exporter/releases/download/v0.25.0/snmp_exporter-0.25.0.linux-amd64.tar.gz
tar xzf snmp_exporter-0.25.0.linux-amd64.tar.gz
sudo mv snmp_exporter-0.25.0.linux-amd64/snmp_exporter /usr/local/bin/
sudo chmod +x /usr/local/bin/snmp_exporter

Create SNMP exporter user and directories

Set up a dedicated user and directory structure for the SNMP exporter with proper security isolation.

sudo useradd --system --no-create-home --shell /bin/false snmp_exporter
sudo mkdir -p /etc/snmp_exporter /var/lib/snmp_exporter
sudo chown snmp_exporter:snmp_exporter /var/lib/snmp_exporter

Download SNMP exporter configuration

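Get the pre-generated SNMP configuration file, which includes modules such as if_mib for common network devices. Fetch the copy tagged for the exporter version you installed, because the file format has changed between releases and the main branch may not match your binary.

sudo wget https://raw.githubusercontent.com/prometheus/snmp_exporter/v0.25.0/snmp.yml -O /etc/snmp_exporter/snmp.yml
sudo chown snmp_exporter:snmp_exporter /etc/snmp_exporter/snmp.yml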

Create SNMP exporter systemd service

Configure the SNMP exporter to run as a system service that restarts automatically on failure. Create /etc/systemd/system/snmp_exporter.service with the following content.

[Unit]
Description=Prometheus SNMP Exporter
Documentation=https://github.com/prometheus/snmp_exporter
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=snmp_exporter
Group=snmp_exporter
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/snmp_exporter \
  --config.file=/etc/snmp_exporter/snmp.yml \
  --web.listen-address=0.0.0.0:9116

SyslogIdentifier=snmp_exporter
Restart=always
RestartSec=1
StartLimitInterval=0

[Install]
WantedBy=multi-user.target

Enable and start SNMP exporter

Start the service and enable it to run at boot time.

sudo systemctl daemon-reload
sudo systemctl enable --now snmp_exporter
sudo systemctl status snmp_exporter

Configure Prometheus to scrape SNMP metrics

Add SNMP scraping configuration to your existing Prometheus setup. This example targets a Cisco switch and monitors critical interfaces.

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "snmp_alerts.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

scrape_configs:
  - job_name: 'snmp'
    static_configs:
      - targets:
        - 192.168.1.10  # Cisco switch IP
        - 192.168.1.11  # Router IP
    metrics_path: /snmp
    params:
      module: [if_mib]  # Use interface MIB module
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9116
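
The three relabel steps are easier to follow when traced for a single target. The sketch below is a simplified Python model of what Prometheus does with the labels, not Prometheus code; the function names are hypothetical, and it assumes the exporter address and module shown in the configuration above.

```python
from urllib.parse import urlencode


def relabel_snmp_target(address, exporter="localhost:9116", module="if_mib"):
    """Mimic the three relabel_configs steps for one static target."""
    # Prometheus exposes each entry under params as a __param_<name> label.
    labels = {"__address__": address, "__param_module": module}
    # Step 1: copy the device address into the ?target= URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Step 2: keep the device address as the instance label on stored series.
    labels["instance"] = labels["__param_target"]
    # Step 3: send the HTTP request to the exporter instead of the device.
    labels["__address__"] = exporter
    return labels


def scrape_url(labels, metrics_path="/snmp"):
    """Build the URL that would be requested after relabeling."""
    prefix = "__param_"
    params = sorted(
        (name[len(prefix):], value)
        for name, value in labels.items()
        if name.startswith(prefix)
    )
    return f"http://{labels['__address__']}{metrics_path}?{urlencode(params)}"


labels = relabel_snmp_target("192.168.1.10")
print(scrape_url(labels))
# http://localhost:9116/snmp?module=if_mib&target=192.168.1.10
```

The request goes to the exporter, the device IP travels as a query parameter, and the stored series still carry the device IP as instance, which is why the alert annotations can refer to the device by that label.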

Create advanced SNMP alerting rules

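Define alerting rules that monitor interface status, bandwidth utilization, and device availability. Save them as /etc/prometheus/snmp_alerts.yml so that the rule_files entry in the Prometheus configuration picks them up.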

groups:
  - name: snmp_network_alerts
    rules:
      - alert: NetworkInterfaceDown
        # ifOperStatus: 1 = up, 2 = down
        expr: ifOperStatus{job="snmp"} == 2
        for: 2m
        labels:
          severity: critical
          service: network
        annotations:
          summary: "Network interface {{ $labels.ifDescr }} is down on {{ $labels.instance }}"
          description: "Interface {{ $labels.ifDescr }} on device {{ $labels.instance }} has been down for more than 2 minutes. Current status: {{ $value }}"

      - alert: HighBandwidthUtilization
        # ifHighSpeed is reported in Mbit/s, so convert it to bit/s before dividing.
        expr: |
          (
            rate(ifHCInOctets{job="snmp"}[5m]) * 8 +
            rate(ifHCOutOctets{job="snmp"}[5m]) * 8
          ) / (ifHighSpeed{job="snmp"} * 1000000) * 100 > 80
        for: 5m
        labels:
          severity: warning
          service: network
        annotations:
          summary: "High bandwidth utilization on {{ $labels.ifDescr }} ({{ $labels.instance }})"
          description: "Interface {{ $labels.ifDescr }} on {{ $labels.instance }} is experiencing {{ $value | humanize }}% bandwidth utilization for more than 5 minutes."

      - alert: SNMPDeviceUnreachable
        expr: up{job="snmp"} == 0
        for: 3m
        labels:
          severity: critical
          service: network
        annotations:
          summary: "SNMP device {{ $labels.instance }} is unreachable"
          description: "Cannot reach SNMP device {{ $labels.instance }} for more than 3 minutes. Check network connectivity and device status."

      - alert: InterfaceErrorRateHigh
        expr: |
          (
            rate(ifInErrors{job="snmp"}[5m]) +
            rate(ifOutErrors{job="snmp"}[5m])
          ) / (
            rate(ifInUcastPkts{job="snmp"}[5m]) +
            rate(ifOutUcastPkts{job="snmp"}[5m])
          ) * 100 > 1
        for: 10m
        labels:
          severity: warning
          service: network
        annotations:
          summary: "High error rate on interface {{ $labels.ifDescr }} ({{ $labels.instance }})"
          description: "Interface {{ $labels.ifDescr }} on {{ $labels.instance }} has an error rate of {{ $value | humanize }}% for more than 10 minutes."

      # The two rules below rely on Cisco-specific metrics that the if_mib module
      # does not collect. Scrape a Cisco module as well and adjust the metric
      # names to whatever that module exports.
      - alert: DeviceMemoryUtilizationHigh
        expr: |
          (ciscoMemoryPoolUsed{job="snmp"} / (ciscoMemoryPoolUsed{job="snmp"} + ciscoMemoryPoolFree{job="snmp"})) * 100 > 85
        for: 15m
        labels:
          severity: warning
          service: network
        annotations:
          summary: "High memory utilization on {{ $labels.instance }}"
          description: "Device {{ $labels.instance }} memory pool {{ $labels.ciscoMemoryPoolName }} is at {{ $value | humanize }}% utilization."

      - alert: DeviceCPUUtilizationHigh
        expr: ciscoCPUBusyPercentage{job="snmp"} > 80
        for: 10m
        labels:
          severity: warning
          service: network
        annotations:
          summary: "High CPU utilization on {{ $labels.instance }}"
          description: "CPU utilization on {{ $labels.instance }} has been above 80% for more than 10 minutes. Current value: {{ $value }}%"
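
The bandwidth arithmetic is worth checking by hand, because IF-MIB reports ifHighSpeed in Mbit/s while the octet counters count bytes. The following is a minimal sketch of the calculation for one direction, assuming two counter samples with no counter wrap or reset in between; the function name and the sample numbers are illustrative, not measured data.

```python
def utilization_percent(octets_start, octets_end, seconds, if_high_speed_mbps):
    """Percent of link capacity used in one direction over a sampling window."""
    if seconds <= 0 or if_high_speed_mbps <= 0:
        raise ValueError("window and link speed must be positive")
    # Octets are bytes, so multiply by 8 to get bits per second.
    bits_per_second = (octets_end - octets_start) * 8 / seconds
    # ifHighSpeed is in units of 1,000,000 bit/s.
    capacity_bps = if_high_speed_mbps * 1_000_000
    return bits_per_second / capacity_bps * 100


# Hypothetical 1 Gbit/s port that moved 30 GB in a 5 minute window:
# 30e9 bytes * 8 / 300 s = 800 Mbit/s, which is about 80% of capacity.
print(round(utilization_percent(0, 30_000_000_000, 300, 1000), 2))
```

Dividing by the raw ifHighSpeed value without the factor of 1,000,000 would inflate the result a million-fold, so the threshold would be exceeded by almost any traffic.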

Install and configure Alertmanager

Set up Alertmanager to handle alert routing, grouping, and notifications for SNMP alerts.

wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
tar xzf alertmanager-0.27.0.linux-amd64.tar.gz
sudo mv alertmanager-0.27.0.linux-amd64/alertmanager /usr/local/bin/
sudo mv alertmanager-0.27.0.linux-amd64/amtool /usr/local/bin/
sudo chmod +x /usr/local/bin/alertmanager /usr/local/bin/amtool

Create Alertmanager user and directories

Set up the necessary user account and directory structure for Alertmanager.

sudo useradd --system --no-create-home --shell /bin/false alertmanager
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
sudo chown alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager

Configure Alertmanager for SNMP alerts

Create /etc/alertmanager/alertmanager.yml with a configuration that routes SNMP alerts by severity and service type, groups related alerts, and sends them to the matching notification channel.

global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alertmanager@example.com'
  smtp_auth_username: 'alertmanager@example.com'
  smtp_auth_password: 'your_email_password'

route:
  group_by: ['alertname', 'instance', 'severity']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'network-team'
  routes:
  - match:
      service: network
      severity: critical
    receiver: 'critical-network-alerts'
    group_wait: 5s
    repeat_interval: 15m
  - match:
      service: network
      severity: warning
    receiver: 'network-warnings'
    group_interval: 5m
    repeat_interval: 2h

receivers:
  - name: 'network-team'
    email_configs:
      - to: 'network-team@example.com'
        headers:
          Subject: 'Network Alert: {{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Device: {{ .Labels.instance }}
          Severity: {{ .Labels.severity }}
          Started: {{ .StartsAt }}
          {{ end }}

  - name: 'critical-network-alerts'
    email_configs:
      - to: 'network-oncall@example.com'
        headers:
          Subject: 'CRITICAL Network Alert: {{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
        text: |
          CRITICAL NETWORK ALERT
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Device: {{ .Labels.instance }}
          Interface: {{ .Labels.ifDescr }}
          Started: {{ .StartsAt }}
          This requires immediate attention.
          {{ end }}
    # Slack incoming webhooks need slack_configs; webhook_configs has no title or text fields.
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
        send_resolved: true
        title: 'Critical Network Alert'
        text: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ .Annotations.description }}{{ end }}"

  - name: 'network-warnings'
    email_configs:
      - to: 'network-team@example.com'
        headers:
          Subject: 'Network Warning: {{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
        text: |
          Network Warning Alert
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Device: {{ .Labels.instance }}
          Severity: {{ .Labels.severity }}
          Started: {{ .StartsAt }}
          {{ end }}

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['instance', 'service']
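
To reason about which receiver an alert lands on, and when a warning is suppressed, it can help to model the route tree and the inhibit rule in a few lines. This is a deliberately simplified Python sketch, not Alertmanager's implementation: it assumes exact label matching, that the first matching child route wins (no continue: true), and it ignores grouping and timing. All function names are hypothetical.

```python
# Child routes in the order they appear in the configuration.
ROUTES = [
    ({"service": "network", "severity": "critical"}, "critical-network-alerts"),
    ({"service": "network", "severity": "warning"}, "network-warnings"),
]
DEFAULT_RECEIVER = "network-team"


def pick_receiver(labels):
    """Return the receiver of the first child route whose labels all match."""
    for match, receiver in ROUTES:
        if all(labels.get(name) == value for name, value in match.items()):
            return receiver
    return DEFAULT_RECEIVER


def is_inhibited(target, firing, equal=("instance", "service")):
    """A warning is muted while a critical alert with equal labels is firing."""
    if target.get("severity") != "warning":
        return False
    return any(
        source.get("severity") == "critical"
        and all(source.get(name) == target.get(name) for name in equal)
        for source in firing
    )


down = {"alertname": "SNMPDeviceUnreachable", "instance": "192.168.1.10",
        "service": "network", "severity": "critical"}
busy = {"alertname": "HighBandwidthUtilization", "instance": "192.168.1.10",
        "service": "network", "severity": "warning"}

print(pick_receiver(down))          # critical-network-alerts
print(pick_receiver(busy))          # network-warnings
print(is_inhibited(busy, [down]))   # True
```

In this model, a bandwidth warning for a device that is already reported unreachable is suppressed, which is the point of the inhibit rule: the on-call engineer sees one critical notification rather than a critical one plus its side effects.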

Create Alertmanager systemd service

Configure Alertmanager to run as a system service under its dedicated user. Create /etc/systemd/system/alertmanager.service with the following content.

[Unit]
Description=Alertmanager
Documentation=https://prometheus.io/docs/alerting/alertmanager/
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=alertmanager
Group=alertmanager
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/var/lib/alertmanager/ \
  --web.listen-address=0.0.0.0:9093 \
  --web.external-url=http://localhost:9093 \
  --cluster.listen-address=0.0.0.0:9094

SyslogIdentifier=alertmanager
Restart=always
RestartSec=1
StartLimitInterval=0

[Install]
WantedBy=multi-user.target

Configure firewall rules for SNMP monitoring

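Open the SNMP exporter and Alertmanager ports only if other hosts need to reach them. When Prometheus runs on the same host and talks to localhost:9116 and localhost:9093, as in the scrape configuration in this tutorial, both ports can stay closed. The exporter sends its SNMP queries outbound, so the UDP 161 rule is needed only if this host also runs an SNMP agent that is polled from 192.168.1.0/24.

Ubuntu / Debian (ufw):

sudo ufw allow 9116/tcp comment 'SNMP Exporter'
sudo ufw allow 9093/tcp comment 'Alertmanager'
sudo ufw allow from 192.168.1.0/24 to any port 161 proto udp comment 'SNMP queries'

AlmaLinux / Rocky Linux (firewalld):

sudo firewall-cmd --permanent --add-port=9116/tcp
sudo firewall-cmd --permanent --add-port=9093/tcp
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port protocol="udp" port="161" accept'
sudo firewall-cmd --reload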

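Set up SNMPv3 authentication

Use SNMPv3 with authentication and privacy where possible, and fall back to SNMPv2c only if your devices don't support v3. In snmp_exporter 0.23.0 and later, credentials live in a top-level auths section of /etc/snmp_exporter/snmp.yml, separate from modules, and are selected per scrape with the auth URL parameter. Merge the entries below into the file's existing auths and modules keys instead of adding those keys a second time.

auths:
  # Example name; select it with auth: [snmpv3_auth] under params in the Prometheus scrape job.
  snmpv3_auth:
    version: 3
    security_level: authPriv
    username: snmpuser
    password: YourAuthPassword123!
    auth_protocol: SHA
    priv_protocol: AES
    priv_password: YourPrivPassword456!

modules:
  if_mib_v3:
    # A working module also needs a metrics section describing each OID.
    # Produce it with the snmp_exporter generator rather than writing it by hand.
    walk:
    - 1.3.6.1.2.1.2.2.1.1     # ifIndex
    - 1.3.6.1.2.1.2.2.1.2     # ifDescr
    - 1.3.6.1.2.1.2.2.1.8     # ifOperStatus
    - 1.3.6.1.2.1.2.2.1.10    # ifInOctets
    - 1.3.6.1.2.1.2.2.1.16    # ifOutOctets
    - 1.3.6.1.2.1.31.1.1.1.6  # ifHCInOctets
    - 1.3.6.1.2.1.31.1.1.1.10 # ifHCOutOctets
    - 1.3.6.1.2.1.31.1.1.1.15 # ifHighSpeed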

Start and enable all services

Start Alertmanager and restart Prometheus to load the new configuration.

sudo systemctl enable --now alertmanager
sudo systemctl restart prometheus
sudo systemctl status alertmanager prometheus snmp_exporter

Verify your setup

Test that SNMP metrics are being collected and alerts can be triggered properly.

# Check SNMP exporter metrics
curl -s "http://localhost:9116/snmp?module=if_mib&target=192.168.1.10" | grep ifOperStatus

# Verify Alertmanager is receiving alerts (Alertmanager 0.27 serves only API v2)
curl -s http://localhost:9093/api/v2/alerts | jq '.[] | {"alertname": .labels.alertname, "status": .status.state}'

# Test alert rule syntax
sudo promtool check rules /etc/prometheus/snmp_alerts.yml

# Check if Prometheus can reach targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {"job": .labels.job, "health": .health, "target": .scrapeUrl}'

# Test Alertmanager configuration
sudo amtool check-config /etc/alertmanager/alertmanager.yml
sudo amtool config show --alertmanager.url=http://localhost:9093

Note: Replace 192.168.1.10 with your actual network device IP. If using SNMPv3, update the Prometheus scraping configuration to include auth parameters.
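
The grep check above can be turned into a small script that lists interfaces reporting down. The sketch below parses exporter output that has already been fetched (for example, the body returned by the curl command); the sample text is made up for illustration, and the parser is a simplification that assumes no timestamps and no closing brace inside label values.

```python
import re

# Hypothetical exporter output; real output contains many more series.
SAMPLE = """\
# HELP ifOperStatus The current operational state of the interface
# TYPE ifOperStatus gauge
ifOperStatus{ifDescr="GigabitEthernet0/1",ifIndex="1"} 1
ifOperStatus{ifDescr="GigabitEthernet0/2",ifIndex="2"} 2
"""

SERIES = re.compile(r"^ifOperStatus\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$")
LABEL = re.compile(r'(\w+)="([^"]*)"')


def down_interfaces(text):
    """Return the ifDescr (or ifIndex) of every interface with ifOperStatus 2."""
    down = []
    for line in text.splitlines():
        found = SERIES.match(line)
        if found and float(found.group("value")) == 2:
            labels = dict(LABEL.findall(found.group("labels")))
            down.append(labels.get("ifDescr", labels.get("ifIndex", "?")))
    return down


print(down_interfaces(SAMPLE))
# ['GigabitEthernet0/2']
```

An empty list for a device that has a known-down port suggests the exporter is not returning ifOperStatus at all, which points at the module or credentials rather than at the alert rule.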

Common issues

Symptom | Cause | Fix
No SNMP metrics appearing | SNMP community string mismatch | Verify snmp.yml community settings match device configuration
Alerts not triggering | Prometheus not loading rules | Run sudo promtool check rules /etc/prometheus/snmp_alerts.yml
Devices showing as down | Firewall blocking SNMP traffic | Ensure UDP port 161 is accessible between Prometheus and devices
Email notifications not sent | SMTP configuration incorrect | Test with sudo amtool config show and verify SMTP settings
High bandwidth alerts false positive | Interface speed not detected properly | Check ifHighSpeed OID is supported and returning correct values
SNMPv3 authentication fails | User not configured on device | Configure SNMPv3 user on network device with matching credentials
