Set up MySQL backup monitoring with Prometheus alerts and Grafana dashboards

Intermediate 45 min May 14, 2026 35 views
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Monitor MySQL backup health with automated mysqldump, Prometheus metrics collection, and Grafana dashboards. Get instant alerts when backups fail or take too long.

Prerequisites

  • MySQL server running
  • Prometheus and Grafana installed
  • Root or sudo access
  • Basic understanding of systemd timers

What this solves

Automated MySQL backups are critical, but they fail silently. This tutorial sets up comprehensive backup monitoring using mysqldump with systemd timers, Prometheus MySQL Exporter for metrics collection, and Grafana dashboards for visualization. You'll get instant alerts when backups fail, take too long, or when backup files are missing.

Step-by-step configuration

Install required packages

Install MySQL client tools, systemd timer utilities, and monitoring components.

sudo apt update
sudo apt install -y mysql-client prometheus-mysqld-exporter gzip curl
sudo dnf update -y
sudo dnf install -y mysql gzip curl
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.15.1/mysqld_exporter-0.15.1.linux-amd64.tar.gz
tar xzf mysqld_exporter-0.15.1.linux-amd64.tar.gz
sudo mv mysqld_exporter-0.15.1.linux-amd64/mysqld_exporter /usr/local/bin/

Create backup directory structure

Set up directories for backup storage with proper permissions.

sudo mkdir -p /var/backups/mysql/{daily,logs}
sudo mkdir -p /var/lib/prometheus/mysql-backup-metrics
sudo chown mysql:mysql /var/backups/mysql
sudo chmod 755 /var/backups/mysql

Create MySQL backup user

Create a dedicated MySQL user with minimal required privileges for backups.

mysql -u root -p -e "CREATE USER 'backup_user'@'localhost' IDENTIFIED BY 'SecureBackupPass123!';"
mysql -u root -p -e "GRANT SELECT, SHOW DATABASES, LOCK TABLES, SHOW VIEW, EVENT, TRIGGER ON . TO 'backup_user'@'localhost';"
mysql -u root -p -e "FLUSH PRIVILEGES;"

Configure MySQL credentials

Store MySQL credentials securely for automated backup access.

[client]
user=backup_user
password=SecureBackupPass123!
host=localhost
port=3306
sudo chmod 600 /etc/mysql/backup.cnf
sudo chown mysql:mysql /etc/mysql/backup.cnf

Create backup script with metrics

Build a comprehensive backup script that generates Prometheus metrics.

#!/bin/bash

Configuration

BACKUP_DIR="/var/backups/mysql/daily" LOG_FILE="/var/backups/mysql/logs/backup-$(date +%Y%m%d-%H%M%S).log" METRICS_FILE="/var/lib/prometheus/mysql-backup-metrics/mysql_backup.prom" RETENTION_DAYS=7 MAX_BACKUP_TIME_SECONDS=3600

Start metrics

START_TIME=$(date +%s) echo "mysql_backup_start_timestamp $START_TIME" > $METRICS_FILE echo "mysql_backup_running 1" >> $METRICS_FILE

Function to update metrics on exit

cleanup() { END_TIME=$(date +%s) DURATION=$((END_TIME - START_TIME)) echo "mysql_backup_running 0" >> $METRICS_FILE echo "mysql_backup_end_timestamp $END_TIME" >> $METRICS_FILE echo "mysql_backup_duration_seconds $DURATION" >> $METRICS_FILE echo "mysql_backup_last_run_timestamp $END_TIME" >> $METRICS_FILE } trap cleanup EXIT

Log function

log() { echo "$(date '+%Y-%m-%d %H:%M:%S') $1" | tee -a $LOG_FILE } log "Starting MySQL backup process"

Get database list

DATABASES=$(mysql --defaults-file=/etc/mysql/backup.cnf -e "SHOW DATABASES;" | grep -Ev "(Database|information_schema|performance_schema|mysql|sys)") if [ -z "$DATABASES" ]; then log "ERROR: No databases found to backup" echo "mysql_backup_success 0" >> $METRICS_FILE echo "mysql_backup_databases_total 0" >> $METRICS_FILE exit 1 fi

Count databases

DB_COUNT=$(echo "$DATABASES" | wc -l) echo "mysql_backup_databases_total $DB_COUNT" >> $METRICS_FILE log "Found $DB_COUNT databases to backup"

Create timestamp for backup files

TIMESTAMP=$(date +%Y%m%d_%H%M%S)

Backup each database

SUCCESS_COUNT=0 TOTAL_SIZE=0 for DB in $DATABASES; do log "Backing up database: $DB" BACKUP_FILE="$BACKUP_DIR/${DB}_${TIMESTAMP}.sql.gz" # Perform backup with compression if mysqldump --defaults-file=/etc/mysql/backup.cnf \ --single-transaction \ --routines \ --triggers \ --events \ --hex-blob \ --opt \ "$DB" | gzip > "$BACKUP_FILE"; then # Verify backup file exists and has content if [ -s "$BACKUP_FILE" ]; then FILE_SIZE=$(stat -f%z "$BACKUP_FILE" 2>/dev/null || stat -c%s "$BACKUP_FILE") TOTAL_SIZE=$((TOTAL_SIZE + FILE_SIZE)) SUCCESS_COUNT=$((SUCCESS_COUNT + 1)) log "SUCCESS: $DB backed up successfully (${FILE_SIZE} bytes)" else log "ERROR: Backup file for $DB is empty" rm -f "$BACKUP_FILE" fi else log "ERROR: Failed to backup database $DB" fi done

Update success metrics

echo "mysql_backup_databases_success $SUCCESS_COUNT" >> $METRICS_FILE echo "mysql_backup_total_size_bytes $TOTAL_SIZE" >> $METRICS_FILE

Cleanup old backups

log "Cleaning up backups older than $RETENTION_DAYS days" find $BACKUP_DIR -name "*.sql.gz" -mtime +$RETENTION_DAYS -delete find /var/backups/mysql/logs -name "*.log" -mtime +$RETENTION_DAYS -delete

Final status

if [ $SUCCESS_COUNT -eq $DB_COUNT ]; then echo "mysql_backup_success 1" >> $METRICS_FILE log "SUCCESS: All $DB_COUNT databases backed up successfully" exit 0 else echo "mysql_backup_success 0" >> $METRICS_FILE log "ERROR: Only $SUCCESS_COUNT of $DB_COUNT databases backed up successfully" exit 1 fi
sudo chmod +x /usr/local/bin/mysql-backup.sh
sudo chown mysql:mysql /usr/local/bin/mysql-backup.sh

Create systemd service and timer

Set up systemd timer for automated daily backups with logging.

[Unit]
Description=MySQL Database Backup
Wants=mysql.service
After=mysql.service

[Service]
Type=oneshot
User=mysql
Group=mysql
ExecStart=/usr/local/bin/mysql-backup.sh
StandardOutput=journal
StandardError=journal
TimeoutSec=3600
[Unit]
Description=Run MySQL backup daily at 2 AM
Requires=mysql-backup.service

[Timer]
OnCalendar=--* 02:00:00
RandomizedDelaySec=300
Persistent=true

[Install]
WantedBy=timers.target
sudo systemctl daemon-reload
sudo systemctl enable mysql-backup.timer
sudo systemctl start mysql-backup.timer

Configure MySQL Exporter for monitoring

Set up Prometheus MySQL Exporter to collect database metrics.

[Unit]
Description=Prometheus MySQL Exporter
Wants=mysql.service
After=mysql.service

[Service]
Type=simple
Restart=always
User=prometheus
Group=prometheus
ExecStart=/usr/bin/mysqld_exporter \
  --config.my-cnf=/etc/mysql/backup.cnf \
  --web.listen-address=127.0.0.1:9104 \
  --collect.info_schema.tables \
  --collect.info_schema.innodb_metrics \
  --collect.global_status \
  --collect.binlog_size \
  --collect.info_schema.processlist

[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable mysql-exporter.service
sudo systemctl start mysql-exporter.service
sudo useradd --no-create-home --shell /bin/false prometheus
sudo systemctl daemon-reload
sudo systemctl enable mysql-exporter.service
sudo systemctl start mysql-exporter.service

Configure Node Exporter for backup metrics

Set up textfile collector to expose backup metrics to Prometheus.

sudo apt install -y prometheus-node-exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xzf node_exporter-1.7.0.linux-amd64.tar.gz
sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
Restart=always
User=prometheus
Group=prometheus
ExecStart=/usr/local/bin/node_exporter \
  --web.listen-address=127.0.0.1:9100 \
  --collector.textfile.directory=/var/lib/prometheus/mysql-backup-metrics

[Install]
WantedBy=multi-user.target
sudo mkdir -p /var/lib/prometheus/mysql-backup-metrics
sudo chown prometheus:prometheus /var/lib/prometheus/mysql-backup-metrics
sudo chmod 755 /var/lib/prometheus/mysql-backup-metrics
sudo systemctl daemon-reload
sudo systemctl enable node-exporter.service
sudo systemctl start node-exporter.service

Configure Prometheus scraping

Add MySQL and backup metrics to Prometheus configuration.

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "/etc/prometheus/rules/mysql-backup.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

scrape_configs:
  - job_name: 'mysql-exporter'
    static_configs:
      - targets: ['127.0.0.1:9104']
    scrape_interval: 30s
    metrics_path: /metrics

  - job_name: 'node-exporter-mysql-backup'
    static_configs:
      - targets: ['127.0.0.1:9100']
    scrape_interval: 60s
    metrics_path: /metrics

Create Prometheus alerting rules

Set up alert rules for backup failures and performance issues.

groups:
  - name: mysql_backup_alerts
    rules:
      - alert: MySQLBackupFailed
        expr: mysql_backup_success == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "MySQL backup failed"
          description: "MySQL backup job failed. Last successful backup was {{ $value }} seconds ago."

      - alert: MySQLBackupTooLong
        expr: mysql_backup_duration_seconds > 3600
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "MySQL backup taking too long"
          description: "MySQL backup has been running for {{ $value }} seconds, exceeding the 1-hour threshold."

      - alert: MySQLBackupNotRunning
        expr: time() - mysql_backup_last_run_timestamp > 86400 * 2
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "MySQL backup hasn't run recently"
          description: "MySQL backup hasn't run for {{ $value | humanizeDuration }}. Check the systemd timer."

      - alert: MySQLBackupPartialFailure
        expr: mysql_backup_databases_success < mysql_backup_databases_total
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Some MySQL databases failed to backup"
          description: "Only {{ $labels.mysql_backup_databases_success }} out of {{ $labels.mysql_backup_databases_total }} databases were backed up successfully."

      - alert: MySQLBackupSizeUnusuallySmall
        expr: mysql_backup_total_size_bytes < 1000000  # Less than 1MB
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MySQL backup size suspiciously small"
          description: "MySQL backup size is {{ $value }} bytes, which seems unusually small. Verify backup integrity."

  - name: mysql_performance_alerts
    rules:
      - alert: MySQLHighConnections
        expr: mysql_global_status_threads_connected / mysql_global_variables_max_connections > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MySQL connection usage high"
          description: "MySQL is using {{ $value | humanizePercentage }} of available connections."

      - alert: MySQLSlowQueries
        expr: rate(mysql_global_status_slow_queries[5m]) > 5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "MySQL slow queries increasing"
          description: "MySQL slow queries rate is {{ $value }} queries/second over the last 5 minutes."
sudo mkdir -p /etc/prometheus/rules
sudo chown prometheus:prometheus /etc/prometheus/rules/mysql-backup.yml
sudo systemctl restart prometheus

Configure Alertmanager

Set up Alertmanager for notification routing and email alerts.

global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'your-email-password'

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'mysql-backup-team'
  routes:
    - match:
        severity: critical
      receiver: 'mysql-backup-critical'

receivers:
  - name: 'mysql-backup-team'
    email_configs:
      - to: 'ops-team@example.com'
        subject: 'MySQL Backup Alert: {{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
        body: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Instance: {{ .Labels.instance }}
          Severity: {{ .Labels.severity }}
          {{ end }}

  - name: 'mysql-backup-critical'
    email_configs:
      - to: 'ops-team@example.com,dba-team@example.com'
        subject: 'CRITICAL MySQL Backup Issue: {{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
        body: |
          CRITICAL ALERT - Immediate action required
          
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Instance: {{ .Labels.instance }}
          Time: {{ .StartsAt }}
          {{ end }}

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
sudo apt install -y prometheus-alertmanager
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar xzf alertmanager-0.26.0.linux-amd64.tar.gz
sudo mv alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
sudo systemctl enable alertmanager
sudo systemctl start alertmanager

Create Grafana dashboard

Import a comprehensive MySQL backup monitoring dashboard into Grafana.

{
  "dashboard": {
    "id": null,
    "title": "MySQL Backup Monitoring",
    "tags": ["mysql", "backup", "monitoring"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Backup Success Rate",
        "type": "stat",
        "targets": [
          {
            "expr": "mysql_backup_success",
            "legendFormat": "Success",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "color": {
              "mode": "thresholds"
            },
            "thresholds": {
              "steps": [
                {"color": "red", "value": 0},
                {"color": "green", "value": 1}
              ]
            }
          }
        },
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
      },
      {
        "id": 2,
        "title": "Backup Duration",
        "type": "graph",
        "targets": [
          {
            "expr": "mysql_backup_duration_seconds",
            "legendFormat": "Duration (seconds)",
            "refId": "A"
          }
        ],
        "yAxes": [
          {
            "label": "Seconds",
            "min": 0
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
      },
      {
        "id": 3,
        "title": "Database Backup Status",
        "type": "table",
        "targets": [
          {
            "expr": "mysql_backup_databases_total",
            "legendFormat": "Total DBs",
            "refId": "A"
          },
          {
            "expr": "mysql_backup_databases_success",
            "legendFormat": "Successful DBs",
            "refId": "B"
          }
        ],
        "gridPos": {"h": 8, "w": 24, "x": 0, "y": 8}
      },
      {
        "id": 4,
        "title": "Backup Size Trend",
        "type": "graph",
        "targets": [
          {
            "expr": "mysql_backup_total_size_bytes",
            "legendFormat": "Total Size (bytes)",
            "refId": "A"
          }
        ],
        "yAxes": [
          {
            "label": "Bytes",
            "min": 0,
            "logBase": 2
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 16}
      },
      {
        "id": 5,
        "title": "MySQL Performance",
        "type": "graph",
        "targets": [
          {
            "expr": "mysql_global_status_threads_connected",
            "legendFormat": "Connected Threads",
            "refId": "A"
          },
          {
            "expr": "rate(mysql_global_status_slow_queries[5m])",
            "legendFormat": "Slow Queries/sec",
            "refId": "B"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 16}
      }
    ],
    "time": {
      "from": "now-24h",
      "to": "now"
    },
    "refresh": "30s"
  }
}
curl -X POST \
  http://admin:admin@localhost:3000/api/dashboards/db \
  -H 'Content-Type: application/json' \
  -d @/tmp/mysql-backup-dashboard.json

Verify your setup

Test the complete monitoring pipeline and verify all components are working.

# Check systemd timer status
sudo systemctl status mysql-backup.timer

Run backup manually to test

sudo systemctl start mysql-backup.service sudo systemctl status mysql-backup.service

Verify backup files were created

ls -la /var/backups/mysql/daily/

Check metrics are being generated

cat /var/lib/prometheus/mysql-backup-metrics/mysql_backup.prom

Verify exporters are running

curl http://127.0.0.1:9104/metrics | grep mysql_ curl http://127.0.0.1:9100/metrics | grep mysql_backup

Check Prometheus targets

curl http://localhost:9090/api/v1/targets

Test alert rules

curl http://localhost:9090/api/v1/rules

View recent backup logs

tail -20 /var/backups/mysql/logs/backup-*.log
Note: If you don't see backup metrics immediately, wait for the next scheduled backup or run the service manually. The MySQL exporter should show metrics immediately after starting.

Common issues

Symptom Cause Fix
Backup script fails with permission denied Incorrect file ownership or permissions sudo chown mysql:mysql /usr/local/bin/mysql-backup.sh && sudo chmod +x /usr/local/bin/mysql-backup.sh
MySQL exporter shows no metrics Credentials file not readable sudo chmod 600 /etc/mysql/backup.cnf && sudo chown mysql:mysql /etc/mysql/backup.cnf
Backup timer not triggering Timer not enabled or systemd not reloaded sudo systemctl daemon-reload && sudo systemctl enable mysql-backup.timer
Grafana dashboard shows no data Prometheus not scraping backup metrics Check Prometheus config and restart: sudo systemctl restart prometheus
Alerts not firing Alert rules not loaded or Alertmanager not configured curl http://localhost:9090/api/v1/rules and check rule syntax
Backup files are empty MySQL user lacks required privileges Grant additional privileges: GRANT SELECT, SHOW DATABASES, LOCK TABLES, SHOW VIEW, EVENT, TRIGGER ON . TO 'backup_user'@'localhost';
Node exporter metrics missing Textfile directory not configured Verify --collector.textfile.directory parameter and directory permissions

Next steps

Running this in production?

Want this handled for you? Setting this up once is straightforward. Keeping it patched, monitored, backed up and performant across environments is the harder part. See how we run infrastructure like this for European teams.

Automated install script

Run this to automate the entire setup

Need help?

Don't want to manage this yourself?

We handle high availability infrastructure for businesses that depend on uptime. From initial setup to ongoing operations.