Configure MariaDB Galera cluster 10.11 for multi-master replication with automatic failover

Advanced 45 min May 05, 2026 96 views
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up a highly available MariaDB Galera cluster with multi-master replication, automatic failover, and synchronous data replication across multiple database nodes for enterprise applications.

Prerequisites

  • At least 3 servers with 4GB RAM each
  • Network connectivity between all nodes
  • Root or sudo access on all servers
  • Basic understanding of MySQL/MariaDB administration

What this solves

MariaDB Galera cluster provides multi-master database replication with automatic failover, ensuring your applications remain available even when database nodes fail. This tutorial configures a three-node Galera cluster with synchronous replication, conflict detection, and automatic node recovery for production workloads requiring high availability.

Step-by-step installation

Update system packages

Start by updating all packages to ensure compatibility with MariaDB Galera cluster components.

sudo apt update && sudo apt upgrade -y
sudo dnf update -y

Install MariaDB and Galera packages

Install MariaDB server with Galera clustering support. The galera-4 package provides the wsrep library for cluster replication.

sudo apt install -y mariadb-server mariadb-client galera-4 rsync
sudo dnf install -y mariadb-server mariadb galera rsync

Stop MariaDB service

Stop the default MariaDB service before configuring clustering. We'll start it manually after configuration.

sudo systemctl stop mariadb
sudo systemctl disable mariadb

Configure MariaDB for Galera clustering

Create the Galera cluster configuration file. Replace the IP addresses with your actual node IPs.

[galera]

Mandatory settings

wsrep_on=ON wsrep_provider=/usr/lib/galera/libgalera_smm.so wsrep_cluster_name="galera_cluster" wsrep_cluster_address="gcomm://203.0.113.10,203.0.113.11,203.0.113.12" wsrep_node_address="203.0.113.10" wsrep_node_name="node1" wsrep_sst_method=rsync

Connection and replication settings

bind-address=0.0.0.0 default_storage_engine=InnoDB innodb_autoinc_lock_mode=2 innodb_doublewrite=1 innodb_flush_log_at_trx_commit=0

Cluster optimization

wsrep_slave_threads=4 wsrep_certify_nonPK=1 wsrep_max_ws_rows=131072 wsrep_max_ws_size=1073741824 wsrep_debug=0 wsrep_convert_LOCK_to_trx=0 wsrep_retry_autocommit=1 wsrep_auto_increment_control=1

Performance tuning

innodb_buffer_pool_size=512M innodb_log_file_size=100M innodb_flush_method=O_DIRECT max_connections=1000
Important: Update the wsrep_node_address and wsrep_node_name for each node. Node1 uses 203.0.113.10/node1, Node2 uses 203.0.113.11/node2, Node3 uses 203.0.113.12/node3.

Configure firewall rules

Open required ports for Galera cluster communication: 3306 (MySQL), 4567 (cluster replication), 4568 (incremental state transfer), 4444 (state snapshot transfer).

sudo ufw allow 3306/tcp
sudo ufw allow 4567/tcp
sudo ufw allow 4568/tcp
sudo ufw allow 4444/tcp
sudo ufw reload
sudo firewall-cmd --permanent --add-port=3306/tcp
sudo firewall-cmd --permanent --add-port=4567/tcp
sudo firewall-cmd --permanent --add-port=4568/tcp
sudo firewall-cmd --permanent --add-port=4444/tcp
sudo firewall-cmd --reload

Initialize the first cluster node

Bootstrap the Galera cluster on the first node. This creates the initial cluster that other nodes will join.

sudo galera_new_cluster

Secure MariaDB installation

Run the security script to set root password and remove test databases. This only needs to be done on one node as changes replicate automatically.

sudo mysql_secure_installation
Note: Choose a strong root password and answer 'Y' to remove anonymous users, disable remote root login, and remove test database.

Create cluster user for monitoring

Create a dedicated user for cluster status monitoring and application connections.

sudo mysql -u root -p -e "CREATE USER 'clustercheck'@'localhost' IDENTIFIED BY 'clusterpass';"
sudo mysql -u root -p -e "GRANT PROCESS ON . TO 'clustercheck'@'localhost';"
sudo mysql -u root -p -e "CREATE USER 'appuser'@'%' IDENTIFIED BY 'secure_app_password';"
sudo mysql -u root -p -e "GRANT ALL PRIVILEGES ON . TO 'appuser'@'%';"
sudo mysql -u root -p -e "FLUSH PRIVILEGES;"

Start remaining cluster nodes

On nodes 2 and 3, start MariaDB to join the existing cluster. Each node will synchronize with the cluster automatically.

# On node 2 and node 3
sudo systemctl start mariadb
sudo systemctl enable mariadb

Configure automatic cluster recovery

Create a systemd service to handle automatic cluster bootstrap on system restart.

[Unit]
Description=MariaDB Galera Cluster Recovery
After=network.target
Before=mariadb.service

[Service]
Type=oneshot
User=mysql
Group=mysql
ExecStart=/usr/bin/galera_recovery
RemainAfterExit=yes
TimeoutSec=300

[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable galera-recovery

Configure cluster monitoring script

Create a monitoring script to check cluster health and automatically handle split-brain scenarios.

#!/bin/bash

Galera cluster monitoring and recovery script

LOG_FILE="/var/log/galera-monitor.log" MYSQL_USER="clustercheck" MYSQL_PASS="clusterpass" log_message() { echo "$(date): $1" >> $LOG_FILE } check_cluster_status() { local cluster_size=$(mysql -u$MYSQL_USER -p$MYSQL_PASS -e "SHOW STATUS LIKE 'wsrep_cluster_size';" 2>/dev/null | grep wsrep_cluster_size | awk '{print $2}') local local_state=$(mysql -u$MYSQL_USER -p$MYSQL_PASS -e "SHOW STATUS LIKE 'wsrep_local_state';" 2>/dev/null | grep wsrep_local_state | awk '{print $2}') if [ "$cluster_size" -lt 2 ] || [ "$local_state" != "4" ]; then log_message "WARNING: Cluster issue detected. Size: $cluster_size, State: $local_state" return 1 fi log_message "Cluster healthy. Size: $cluster_size, State: $local_state" return 0 } if ! check_cluster_status; then log_message "Attempting cluster recovery..." systemctl restart mariadb fi
sudo chmod +x /usr/local/bin/galera-monitor.sh

Create monitoring cron job

Schedule the monitoring script to run every 5 minutes for proactive cluster health checking.

echo "/5    * /usr/local/bin/galera-monitor.sh" | sudo crontab -

Configure automatic failover with ProxySQL

Install ProxySQL for connection routing

ProxySQL provides intelligent connection routing and automatic failover for applications connecting to the Galera cluster.

wget https://github.com/sysown/proxysql/releases/download/v2.5.5/proxysql_2.5.5-ubuntu22_amd64.deb
sudo dpkg -i proxysql_2.5.5-ubuntu22_amd64.deb
wget https://github.com/sysown/proxysql/releases/download/v2.5.5/proxysql-2.5.5-1-centos8.x86_64.rpm
sudo rpm -i proxysql-2.5.5-1-centos8.x86_64.rpm

Configure ProxySQL for Galera

Configure ProxySQL to monitor Galera nodes and route connections to healthy servers.

mysql -u admin -padmin -h 127.0.0.1 -P6032 --prompt='ProxySQL> '
INSERT INTO mysql_galera_hostgroups(writer_hostgroup,backup_writer_hostgroup,reader_hostgroup,offline_hostgroup,active,max_writers,writer_is_also_reader,max_transactions_behind) VALUES (0,1,2,3,1,1,2,100);

INSERT INTO mysql_servers(hostgroup_id,hostname,port,weight,comment) VALUES
(0,'203.0.113.10',3306,1000,'node1'),
(0,'203.0.113.11',3306,900,'node2'),
(0,'203.0.113.12',3306,800,'node3');

INSERT INTO mysql_users(username,password,active,default_hostgroup,max_connections,comment) VALUES
('appuser','secure_app_password',1,0,200,'Application user');

LOAD MYSQL SERVERS TO RUNTIME;
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
SAVE MYSQL USERS TO DISK;

Verify your setup

Test cluster synchronization and failover capabilities across all nodes.

# Check cluster status on each node
sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep%';"

Verify cluster size

sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"

Test replication by creating a database

sudo mysql -u root -p -e "CREATE DATABASE test_replication;" sudo mysql -u root -p -e "SHOW DATABASES;"

Check ProxySQL status

mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "SELECT hostgroup_id,hostname,port,status,weight FROM stats_mysql_servers;"

Test application connection through ProxySQL

mysql -u appuser -psecure_app_password -h 127.0.0.1 -P6033 -e "SELECT @@hostname;"

Monitor cluster health

sudo tail -f /var/log/galera-monitor.log

Testing failover scenarios

Validate automatic failover by simulating node failures and network partitions.

Test node failure recovery

Simulate a node failure and verify automatic recovery.

# Stop MariaDB on one node
sudo systemctl stop mariadb

Check cluster adapts (run on remaining nodes)

sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"

Restart failed node

sudo systemctl start mariadb

Verify node rejoins cluster

sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_ready';" sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"

Test split-brain protection

Verify the cluster handles network partitions correctly by testing quorum behavior.

# Check current primary component
sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_status';"

Verify read-only protection during split-brain

sudo mysql -u root -p -e "SHOW VARIABLES LIKE 'read_only';"

Test application failover through ProxySQL

mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "SELECT hostgroup_id,hostname,status FROM stats_mysql_servers ORDER BY hostgroup_id,hostname;"

Performance optimization

Fine-tune Galera cluster settings for production workloads and high throughput applications.

Optimize replication performance

Adjust Galera-specific parameters for better write performance and reduced certification conflicts.

[galera]

Write-set replication optimization

wsrep_provider_options="gcache.size=1G;gcache.keep_pages_size=1G;gcs.fc_limit=128;gcs.fc_factor=0.8"

Parallel applying

wsrep_slave_threads=8 innodb_flush_log_at_trx_commit=2

Flow control tuning

wsrep_provider_options="gcs.fc_debug=1;gcs.fc_master_slave=YES"

Memory allocation

innodb_buffer_pool_size=2G innodb_buffer_pool_instances=4 innodb_log_buffer_size=64M

Connection optimization

max_connections=2000 thread_cache_size=100 table_open_cache=4000
sudo systemctl restart mariadb

Monitoring and alerting

Set up comprehensive monitoring for cluster health, performance metrics, and automated alerting. You can integrate with existing monitoring solutions like Prometheus and Grafana for MariaDB Galera monitoring.

Configure cluster health alerts

Create email alerts for critical cluster events like node failures or split-brain scenarios.

#!/bin/bash

SMTP_SERVER="smtp.example.com"
ALERT_EMAIL="admin@example.com"
FROM_EMAIL="galera@example.com"

send_alert() {
    local subject="$1"
    local message="$2"
    
    echo "Subject: $subject
From: $FROM_EMAIL
To: $ALERT_EMAIL

$message" | \
    sendmail -S $SMTP_SERVER $ALERT_EMAIL
}

check_cluster_critical() {
    local cluster_size=$(mysql -u clustercheck -pclusterpass -e "SHOW STATUS LIKE 'wsrep_cluster_size';" 2>/dev/null | grep wsrep_cluster_size | awk '{print $2}')
    
    if [ "$cluster_size" -lt 2 ]; then
        send_alert "CRITICAL: Galera Cluster Size Alert" "Cluster size is $cluster_size. Immediate attention required."
    fi
}

check_cluster_critical
sudo chmod +x /usr/local/bin/galera-alert.sh

Backup and recovery procedures

Implement automated backup strategies specific to Galera cluster environments with point-in-time recovery capabilities.

Set up cluster-aware backups

Create backup scripts that work safely with Galera cluster replication.

#!/bin/bash

BACKUP_DIR="/backup/galera"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="galera_backup_$DATE.sql"

Create backup directory

mkdir -p $BACKUP_DIR

Use mariabackup for consistent cluster backup

mariabackup --backup --target-dir=$BACKUP_DIR/mariabackup_$DATE \ --user=root --password=your_root_password

Create SQL dump as secondary backup

mysqldump --single-transaction --routines --triggers \ --all-databases --user=root --password=your_root_password \ > $BACKUP_DIR/$BACKUP_FILE

Compress backups

tar -czf $BACKUP_DIR/mariabackup_$DATE.tar.gz -C $BACKUP_DIR mariabackup_$DATE gzip $BACKUP_DIR/$BACKUP_FILE

Clean up uncompressed backup

rm -rf $BACKUP_DIR/mariabackup_$DATE echo "Backup completed: $BACKUP_DIR/mariabackup_$DATE.tar.gz"
sudo chmod +x /usr/local/bin/galera-backup.sh
echo "0 2   * /usr/local/bin/galera-backup.sh" | sudo crontab -

Common issues

SymptomCauseFix
Node fails to join cluster Firewall blocking ports or wrong cluster address Check firewall rules and verify wsrep_cluster_address in config
Split-brain scenario Network partition with no clear quorum Bootstrap primary component: SET GLOBAL wsrep_provider_options='pc.bootstrap=true';
High certification failure rate Write conflicts between nodes Review application logic and implement proper transaction isolation
Slow cluster performance Network latency or inadequate hardware Tune wsrep_provider_options and check network between nodes
Node stuck in 'Joining' state SST transfer issues or disk space Check disk space and SST method configuration

Next steps

Running this in production?

Need this managed? Running this at scale adds a second layer of work: capacity planning, failover drills, cost control, and on-call expertise. See how we run infrastructure like this for European teams.

Automated install script

Run this to automate the entire setup

Need help?

Don't want to manage this yourself?

We handle high availability infrastructure for businesses that depend on uptime. From initial setup to ongoing operations.