Monitor MariaDB Galera cluster with Prometheus and Grafana for high availability insights

Advanced 45 min Apr 14, 2026 215 views
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Configure comprehensive monitoring for MariaDB Galera clusters using Prometheus exporters and Grafana dashboards to track cluster health, replication status, and performance metrics with automated alerting for production environments.

Prerequisites

  • MariaDB Galera cluster already configured
  • Root or sudo access
  • Basic knowledge of SQL and system administration
  • Network connectivity between cluster nodes

What this solves

MariaDB Galera clusters require specialized monitoring to ensure high availability and detect split-brain scenarios, node failures, and replication lag before they impact your applications. This tutorial configures Prometheus with MariaDB exporters and Grafana dashboards to provide real-time visibility into cluster health, node synchronization status, and performance metrics with automated alerting for critical conditions.

Step-by-step installation

Update system packages

Start by updating your package manager to ensure you get the latest versions of monitoring tools.

# Ubuntu / Debian
sudo apt update && sudo apt upgrade -y
sudo apt install -y wget curl

# AlmaLinux / Rocky Linux
sudo dnf update -y
sudo dnf install -y wget curl

Install Prometheus server

Download and install Prometheus to collect metrics from MariaDB Galera cluster nodes.

cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.48.0/prometheus-2.48.0.linux-amd64.tar.gz
tar xzf prometheus-2.48.0.linux-amd64.tar.gz
sudo mv prometheus-2.48.0.linux-amd64 /opt/prometheus
sudo useradd --no-create-home --shell /bin/false prometheus
sudo chown -R prometheus:prometheus /opt/prometheus
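
Release tarballs fetched over the network are worth verifying before they are unpacked. The following is a minimal sketch, assuming the expected SHA-256 digest has been copied by hand from the sha256sums.txt file published with the Prometheus release. The helper name verify_sha256 is hypothetical and not part of any Prometheus tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# Returns 0 on a match, 1 on a mismatch or if the file is missing.
verify_sha256() {
  local file="$1" expected="$2" actual
  [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
  actual="$(sha256sum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}
```

A call would look like verify_sha256 /tmp/prometheus-2.48.0.linux-amd64.tar.gz followed by the digest taken from sha256sums.txt. The same check applies to the exporter and Alertmanager tarballs downloaded later.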

Create Prometheus directories and configuration

Set up the directory structure and main configuration file for Prometheus with proper permissions.

sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus /var/lib/prometheus
sudo ln -s /opt/prometheus/prometheus /usr/local/bin/
sudo ln -s /opt/prometheus/promtool /usr/local/bin/

Configure Prometheus for MariaDB monitoring

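Create the main Prometheus configuration file at /etc/prometheus/prometheus.yml with scrape targets for each MariaDB Galera cluster node. The 203.0.113.x addresses are placeholders; replace them with the addresses of your own nodes.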

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "galera_alerts.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'mariadb-galera'
    static_configs:
      - targets: 
          - '203.0.113.10:9104'  # MariaDB node 1
          - '203.0.113.11:9104'  # MariaDB node 2
          - '203.0.113.12:9104'  # MariaDB node 3
    scrape_interval: 5s
    metrics_path: /metrics

  - job_name: 'node-exporter'
    static_configs:
      - targets:
          - '203.0.113.10:9100'
          - '203.0.113.11:9100'
          - '203.0.113.12:9100'
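
Before starting Prometheus it can help to confirm that every endpoint named in the configuration actually answers. This is a minimal sketch, assuming targets are written as single-quoted host:port strings as in the file above. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every single-quoted host:port target found in a
# Prometheus config file, one per line, de-duplicated.
list_scrape_targets() {
  grep -oE "'[^']+:[0-9]+'" "$1" | tr -d "'" | sort -u
}

# Hypothetical helper: request /metrics from each target and report whether
# it answered within three seconds.
probe_scrape_targets() {
  local target
  list_scrape_targets "$1" | while read -r target; do
    if curl -fsS --max-time 3 -o /dev/null "http://${target}/metrics"; then
      echo "up   ${target}"
    else
      echo "down ${target}"
    fi
  done
}
```

Running probe_scrape_targets /etc/prometheus/prometheus.yml from the Prometheus host exercises the same network path the scraper will use, so a "down" line usually points at a firewall rule or an exporter that is not yet running.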

Install MariaDB exporter on cluster nodes

Download and configure the MariaDB exporter on each Galera cluster node to expose database metrics.

wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.15.0/mysqld_exporter-0.15.0.linux-amd64.tar.gz
tar xzf mysqld_exporter-0.15.0.linux-amd64.tar.gz
sudo mv mysqld_exporter-0.15.0.linux-amd64/mysqld_exporter /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false mysqld_exporter

Create MariaDB monitoring user

Create a dedicated database user with minimal privileges for metrics collection on each cluster node.

mysql -u root -p
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'StrongPassword123!';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
GRANT SELECT ON performance_schema.* TO 'exporter'@'localhost';
FLUSH PRIVILEGES;
EXIT;
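
The literal password in the statements above is a placeholder and should not be reused. As a minimal sketch, a random value can be generated on the node and substituted into both the CREATE USER statement and the exporter credentials file. The helper name gen_password is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a random alphanumeric password of the requested
# length (default 24, practical maximum about 55) drawn from /dev/urandom.
gen_password() {
  local len="${1:-24}"
  head -c 48 /dev/urandom | base64 | tr -dc 'A-Za-z0-9' | cut -c "1-${len}"
}
```

For example, EXPORTER_PW="$(gen_password)" stores a 24-character value in a shell variable. Restricting the output to letters and digits avoids quoting problems in SQL statements and my.cnf files.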

Configure MariaDB exporter credentials

Create a configuration file at /etc/mysql/.mysqld_exporter.cnf that holds the database connection details for the MariaDB exporter.

[client]
user=exporter
password=StrongPassword123!
host=localhost
port=3306

Then restrict the file so that only the exporter's service account can read it.

sudo chown mysqld_exporter:mysqld_exporter /etc/mysql/.mysqld_exporter.cnf
sudo chmod 600 /etc/mysql/.mysqld_exporter.cnf
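
If the file is created with an editor it may be world-readable for a moment before chmod runs. The following is a minimal sketch of writing it with restrictive permissions from the start. The helper name write_exporter_cnf is hypothetical, and the host and port values simply mirror the file shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a my.cnf-style credentials file that is private
# (mode 600) from the moment it exists.
# Arguments: destination path, database user, password.
write_exporter_cnf() {
  local dest="$1" user="$2" password="$3"
  (
    umask 077   # a newly created file gets no group or other permissions
    printf '[client]\nuser=%s\npassword=%s\nhost=localhost\nport=3306\n' \
      "$user" "$password" > "$dest"
  )
  chmod 600 "$dest"   # also covers the case where the file already existed
}
```

Run as root, a call such as write_exporter_cnf /etc/mysql/.mysqld_exporter.cnf exporter "$EXPORTER_PW" would replace the manual edit, where EXPORTER_PW is an assumed variable holding the password. The chown to mysqld_exporter is still required afterwards.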

Create systemd service for MariaDB exporter

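Save the following unit as /etc/systemd/system/mysqld_exporter.service to run the MariaDB exporter as a system service. The Galera wsrep_* status variables are exposed through the global_status collector.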

[Unit]
Description=MariaDB Exporter for Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=mysqld_exporter
Group=mysqld_exporter
Type=simple
Restart=always
ExecStart=/usr/local/bin/mysqld_exporter \
  --config.my-cnf=/etc/mysql/.mysqld_exporter.cnf \
  --collect.info_schema.innodb_metrics \
  --collect.info_schema.innodb_tablespaces \
  --collect.info_schema.innodb_cmp \
  --collect.info_schema.innodb_cmpmem \
  --collect.info_schema.processlist \
  --collect.info_schema.query_response_time \
  --collect.global_status \
  --collect.global_variables \
  --collect.slave_status \
  --collect.info_schema.tables \
  --web.listen-address=0.0.0.0:9104

[Install]
WantedBy=multi-user.target

Install Node Exporter for system metrics

Install Node Exporter on each cluster node to monitor system-level metrics like CPU, memory, and disk usage.

wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xzf node_exporter-1.7.0.linux-amd64.tar.gz
sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false node_exporter

Create Node Exporter systemd service

Save the following unit as /etc/systemd/system/node_exporter.service to run Node Exporter as a system service that collects hardware and operating system metrics.

[Unit]
Description=Node Exporter for Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
Restart=always
ExecStart=/usr/local/bin/node_exporter \
  --web.listen-address=0.0.0.0:9100 \
  --collector.systemd \
  --collector.processes

[Install]
WantedBy=multi-user.target

Create Prometheus systemd service

Save the following unit as /etc/systemd/system/prometheus.service to run Prometheus as a system service with its storage path and a 15-day retention period.

[Unit]
Description=Prometheus Server
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
Restart=always
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.console.templates=/opt/prometheus/consoles \
  --web.console.libraries=/opt/prometheus/console_libraries \
  --web.listen-address=0.0.0.0:9090 \
  --web.external-url=http://localhost:9090 \
  --storage.tsdb.retention.time=15d

[Install]
WantedBy=multi-user.target

Create Galera-specific alert rules

Create /etc/prometheus/galera_alerts.yml, the rule file referenced from prometheus.yml, with alerting rules that detect Galera cluster issues such as node failures and split-brain scenarios.

groups:
  - name: mariadb-galera
    rules:
      - alert: GaleraNodeDown
        expr: up{job="mariadb-galera"} == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "MariaDB Galera node is down"
          description: "MariaDB Galera node {{ $labels.instance }} has been down for more than 30 seconds."

      - alert: GaleraClusterSizeReduced
        expr: mysql_global_status_wsrep_cluster_size < 3
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Galera cluster size reduced"
          description: "Galera cluster size is {{ $value }}, expected 3 nodes."

      - alert: GaleraNodeNotReady
        expr: mysql_global_status_wsrep_ready == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "Galera node not ready"
          description: "Galera node {{ $labels.instance }} is not ready to accept connections."

      - alert: GaleraReplicationLag
        expr: mysql_global_status_wsrep_local_recv_queue > 100
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High Galera replication lag"
          description: "Galera node {{ $labels.instance }} has high replication lag: {{ $value }} queued writes."

      - alert: GaleraFlowControl
        expr: rate(mysql_global_status_wsrep_flow_control_paused[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Galera flow control activated"
          description: "Galera flow control is active on {{ $labels.instance }}, indicating performance issues."

      - alert: MariaDBHighConnections
        expr: mysql_global_status_threads_connected / mysql_global_variables_max_connections > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MariaDB high connection usage"
          description: "MariaDB connection usage is above 80% on {{ $labels.instance }}."

      - alert: MariaDBSlowQueries
        expr: rate(mysql_global_status_slow_queries[5m]) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High number of slow queries"
          description: "MariaDB is experiencing {{ $value }} slow queries per second on {{ $labels.instance }}."

      - alert: MariaDBHighQPS
        expr: rate(mysql_global_status_queries[5m]) > 1000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High query rate"
          description: "MariaDB is processing {{ $value }} queries per second on {{ $labels.instance }}."
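
An indentation slip in either YAML file keeps Prometheus from starting, so it is worth checking both with promtool, which was linked into /usr/local/bin earlier. This is a minimal sketch. The wrapper name validate_prometheus_files is hypothetical, and the default rule path assumes the rule file sits next to prometheus.yml in /etc/prometheus.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate the Prometheus config and the Galera rule file
# with promtool before (re)starting the service. Stops at the first failure.
validate_prometheus_files() {
  local config="${1:-/etc/prometheus/prometheus.yml}"
  local rules="${2:-/etc/prometheus/galera_alerts.yml}"
  promtool check config "$config" || return 1
  promtool check rules "$rules" || return 1
  echo "configuration and rules look valid"
}
```

Calling validate_prometheus_files with no arguments checks the default paths. A non-zero exit status makes it easy to gate a systemctl restart on the result.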

Install Grafana

Install Grafana to create dashboards for visualizing MariaDB Galera cluster metrics and performance data.

# Ubuntu / Debian
sudo apt install -y software-properties-common
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana

# AlmaLinux / Rocky Linux
sudo tee /etc/yum.repos.d/grafana.repo <

Configure firewall rules

Open the necessary ports for Prometheus, Grafana, and the exporters to communicate across the cluster.

# Ubuntu / Debian (ufw)
sudo ufw allow 9090/tcp comment 'Prometheus'
sudo ufw allow 3000/tcp comment 'Grafana'
sudo ufw allow 9104/tcp comment 'MariaDB Exporter'
sudo ufw allow 9100/tcp comment 'Node Exporter'

# AlmaLinux / Rocky Linux (firewalld)
sudo firewall-cmd --permanent --add-port=9090/tcp --add-port=3000/tcp --add-port=9104/tcp --add-port=9100/tcp
sudo firewall-cmd --reload
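
The rules above open the exporter ports to any source. On the database nodes, a tighter option is to admit only the Prometheus server. The following is a minimal sketch for ufw that prints the commands rather than running them, so they can be reviewed first. The helper name and the example address 203.0.113.5 are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) ufw rules that admit only the
# Prometheus server to the exporter ports on a database node.
# Argument: IP address of the Prometheus server.
exporter_ufw_rules() {
  local prom_ip="$1" port
  for port in 9104 9100; do
    echo "sudo ufw allow from ${prom_ip} to any port ${port} proto tcp"
  done
}
```

For example, exporter_ufw_rules 203.0.113.5 prints one rule per exporter port. If the broad 9104 and 9100 rules were already added, they would need to be removed for the restriction to take effect.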

Start and enable all services

Start all monitoring services and enable them to start automatically on system boot.

sudo systemctl daemon-reload
sudo systemctl enable --now prometheus
sudo systemctl enable --now mysqld_exporter
sudo systemctl enable --now node_exporter
sudo systemctl enable --now grafana-server

Configure Grafana data source

Open the Grafana web interface, add a new Prometheus data source, and enter the following settings.

URL: http://localhost:9090
Access: Server (default)
Scrape interval: 15s
Query timeout: 60s
HTTP Method: POST
Note: Access Grafana at http://your-server:3000 with default credentials admin/admin. Change the password on first login.

Import MariaDB Galera dashboard

Create a comprehensive Grafana dashboard to monitor all aspects of your MariaDB Galera cluster performance and health.

{
  "dashboard": {
    "title": "MariaDB Galera Cluster Monitoring",
    "panels": [
      {
        "title": "Cluster Status",
        "type": "stat",
        "targets": [
          {
            "expr": "mysql_global_status_wsrep_cluster_size",
            "legendFormat": "Cluster Size"
          }
        ]
      },
      {
        "title": "Node Status",
        "type": "table",
        "targets": [
          {
            "expr": "mysql_global_status_wsrep_ready",
            "legendFormat": "{{instance}} Ready"
          }
        ]
      },
      {
        "title": "Replication Queue",
        "type": "graph",
        "targets": [
          {
            "expr": "mysql_global_status_wsrep_local_recv_queue",
            "legendFormat": "{{instance}} Receive Queue"
          }
        ]
      },
      {
        "title": "Query Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(mysql_global_status_queries[5m])",
            "legendFormat": "{{instance}} QPS"
          }
        ]
      }
    ]
  }
}
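
Besides pasting the JSON into the Grafana import dialog, a definition wrapped in a "dashboard" key like the one above can be posted to Grafana's HTTP API. This is a minimal sketch, assuming basic authentication is enabled and the JSON has been saved to a local file. The function name, the file path, and the default credentials shown are assumptions to adjust.

```shell
#!/usr/bin/env bash
# Hypothetical helper: upload a dashboard JSON file through Grafana's
# /api/dashboards/db endpoint.
# Arguments: path to the JSON file, Grafana base URL, "user:password".
import_dashboard() {
  local file="$1"
  local base_url="${2:-http://localhost:3000}"
  local auth="${3:-admin:admin}"
  curl -fsS -X POST \
    -H 'Content-Type: application/json' \
    -u "$auth" \
    --data-binary "@${file}" \
    "${base_url}/api/dashboards/db"
}
```

A call such as import_dashboard /tmp/galera-dashboard.json http://localhost:3000 admin:yourpassword would create the dashboard. Grafana may reject or adjust panels that lack layout fields, so the result is worth checking in the UI.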

Verify your setup

Check that all services are running correctly and metrics are being collected from your MariaDB Galera cluster.

sudo systemctl status prometheus mysqld_exporter node_exporter grafana-server
curl -s http://localhost:9090/-/healthy
curl -s http://localhost:9104/metrics | grep wsrep
curl -s http://localhost:9100/metrics | grep node_load1

Verify Prometheus can scrape MariaDB Galera metrics:

# Check Prometheus targets
wget -qO- http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.job=="mariadb-galera") | .health'

# Test Galera-specific metrics
wget -qO- 'http://localhost:9090/api/v1/query?query=mysql_global_status_wsrep_cluster_size'

# Verify alerting rules
wget -qO- http://localhost:9090/api/v1/rules
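
For a quick per-node check that does not depend on Prometheus, the exporter output can be reduced to the three Galera values this tutorial alerts on. The following is a minimal sketch, assuming the exporter prints these metrics without labels, one "name value" pair per line. The function name galera_summary is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read exporter output on stdin, print the Galera status
# values used by the alert rules, and exit non-zero if the node is not ready.
galera_summary() {
  awk '
    $1 == "mysql_global_status_wsrep_cluster_size"     { size  = $2 }
    $1 == "mysql_global_status_wsrep_ready"            { ready = $2 }
    $1 == "mysql_global_status_wsrep_local_recv_queue" { queue = $2 }
    END {
      printf "cluster_size=%d ready=%d recv_queue=%d\n", size, ready, queue
      exit (ready == 1 ? 0 : 1)
    }'
}
```

On a cluster node, curl -s http://localhost:9104/metrics | galera_summary prints a single line. The exit status can drive a cron job or a load balancer health check.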

Configure alerting with Alertmanager

Install Alertmanager

Install and configure Alertmanager to handle alert notifications from Prometheus for Galera cluster issues.

wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar xzf alertmanager-0.26.0.linux-amd64.tar.gz
sudo mv alertmanager-0.26.0.linux-amd64 /opt/alertmanager
sudo useradd --no-create-home --shell /bin/false alertmanager
sudo chown -R alertmanager:alertmanager /opt/alertmanager
sudo ln -s /opt/alertmanager/alertmanager /usr/local/bin/

Configure Alertmanager for email notifications

Create /opt/alertmanager/alertmanager.yml so that Alertmanager sends email notifications when Galera cluster issues are detected.

global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'EmailPassword123!'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'galera-alerts'

receivers:
  - name: 'galera-alerts'
    email_configs:
      - to: 'dba@example.com'
        headers:
          Subject: 'MariaDB Galera Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Instance: {{ .Labels.instance }}
          Severity: {{ .Labels.severity }}
          {{ end }}

Create Alertmanager systemd service

Save the following unit as /etc/systemd/system/alertmanager.service to run Alertmanager as a system service that handles alert routing and notifications.

[Unit]
Description=Alertmanager for Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
Restart=always
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/opt/alertmanager/alertmanager.yml \
  --storage.path=/opt/alertmanager/data \
  --web.external-url=http://localhost:9093

[Install]
WantedBy=multi-user.target

Create the data directory, then reload systemd and start Alertmanager.

sudo mkdir -p /opt/alertmanager/data
sudo chown alertmanager:alertmanager /opt/alertmanager/data
sudo systemctl daemon-reload
sudo systemctl enable --now alertmanager
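
Email delivery can be exercised without taking a real node down by posting a synthetic alert to Alertmanager. This is a minimal sketch using Alertmanager's v2 API. The function name and the alert name GaleraTestAlert are hypothetical, and the labels are chosen only to match the fields used in the email template.

```shell
#!/usr/bin/env bash
# Hypothetical helper: post a synthetic alert to Alertmanager so the email
# route can be tested end to end.
# Argument: Alertmanager base URL.
send_test_alert() {
  local base_url="${1:-http://localhost:9093}"
  curl -fsS -X POST \
    -H 'Content-Type: application/json' \
    -d '[{"labels":{"alertname":"GaleraTestAlert","severity":"warning","instance":"test-node"},"annotations":{"summary":"Test alert","description":"Synthetic alert to verify email delivery."}}]' \
    "${base_url}/api/v2/alerts"
}
```

After send_test_alert runs, the alert should appear in the Alertmanager UI on port 9093, and an email should follow once group_wait has elapsed. Because no end time is set, the alert resolves on its own after Alertmanager's resolve timeout.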

Common issues

Symptom | Cause | Fix
MariaDB exporter fails to start | Database connection or credentials issue | Check /etc/mysql/.mysqld_exporter.cnf and test the connection with mysql -u exporter -p
No Galera metrics in Prometheus | Exporter not collecting wsrep status variables | Verify the MariaDB user has the REPLICATION CLIENT privilege and wsrep is enabled
Prometheus targets showing as down | Firewall blocking scraping or wrong port | Check firewall rules and verify exporters are listening on the correct ports
Grafana can't connect to Prometheus | Network connectivity or Prometheus not running | Verify Prometheus is accessible at http://localhost:9090 and check systemctl status
Alerts not firing despite issues | Alert rules not loaded or Alertmanager misconfigured | Check http://localhost:9090/rules and verify the Alertmanager configuration
High memory usage by Prometheus | Too many metrics or long retention period | Reduce scrape frequency, limit metrics collection, or decrease retention time
