Set up a highly available MariaDB Galera cluster with multi-master replication, automatic failover, and synchronous data replication across multiple database nodes for enterprise applications.
Prerequisites
- At least 3 servers with 4GB RAM each
- Network connectivity between all nodes
- Root or sudo access on all servers
- Basic understanding of MySQL/MariaDB administration
What this solves
MariaDB Galera cluster provides multi-master database replication with automatic failover, ensuring your applications remain available even when database nodes fail. This tutorial configures a three-node Galera cluster with synchronous replication, conflict detection, and automatic node recovery for production workloads requiring high availability.
Step-by-step installation
Update system packages
Start by updating all packages to ensure compatibility with MariaDB Galera cluster components.
sudo apt update && sudo apt upgrade -y
Install MariaDB and Galera packages
Install MariaDB server with Galera clustering support. The galera-4 package provides the wsrep library for cluster replication.
sudo apt install -y mariadb-server mariadb-client galera-4 rsync
Stop MariaDB service
Stop the default MariaDB service before configuring clustering. We'll start it manually after configuration.
sudo systemctl stop mariadb
sudo systemctl disable mariadb
Configure MariaDB for Galera clustering
Create the Galera cluster configuration file. Replace the IP addresses with your actual node IPs.
[galera]
Mandatory settings
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="galera_cluster"
wsrep_cluster_address="gcomm://203.0.113.10,203.0.113.11,203.0.113.12"
wsrep_node_address="203.0.113.10"
wsrep_node_name="node1"
wsrep_sst_method=rsync
Connection and replication settings
bind-address=0.0.0.0
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
innodb_doublewrite=1
innodb_flush_log_at_trx_commit=0
Cluster optimization
wsrep_slave_threads=4
wsrep_certify_nonPK=1
wsrep_max_ws_rows=131072
wsrep_max_ws_size=1073741824
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1
Performance tuning
innodb_buffer_pool_size=512M
innodb_log_file_size=100M
innodb_flush_method=O_DIRECT
max_connections=1000
Configure firewall rules
Open required ports for Galera cluster communication: 3306 (MySQL), 4567 (cluster replication), 4568 (incremental state transfer), 4444 (state snapshot transfer).
sudo ufw allow 3306/tcp
sudo ufw allow 4567/tcp
sudo ufw allow 4568/tcp
sudo ufw allow 4444/tcp
sudo ufw reload
Initialize the first cluster node
Bootstrap the Galera cluster on the first node. This creates the initial cluster that other nodes will join.
sudo galera_new_cluster
Secure MariaDB installation
Run the security script to set root password and remove test databases. This only needs to be done on one node as changes replicate automatically.
sudo mysql_secure_installation
Create cluster user for monitoring
Create a dedicated user for cluster status monitoring and application connections.
sudo mysql -u root -p -e "CREATE USER 'clustercheck'@'localhost' IDENTIFIED BY 'clusterpass';"
sudo mysql -u root -p -e "GRANT PROCESS ON . TO 'clustercheck'@'localhost';"
sudo mysql -u root -p -e "CREATE USER 'appuser'@'%' IDENTIFIED BY 'secure_app_password';"
sudo mysql -u root -p -e "GRANT ALL PRIVILEGES ON . TO 'appuser'@'%';"
sudo mysql -u root -p -e "FLUSH PRIVILEGES;"
Start remaining cluster nodes
On nodes 2 and 3, start MariaDB to join the existing cluster. Each node will synchronize with the cluster automatically.
# On node 2 and node 3
sudo systemctl start mariadb
sudo systemctl enable mariadb
Configure automatic cluster recovery
Create a systemd service to handle automatic cluster bootstrap on system restart.
[Unit]
Description=MariaDB Galera Cluster Recovery
After=network.target
Before=mariadb.service
[Service]
Type=oneshot
User=mysql
Group=mysql
ExecStart=/usr/bin/galera_recovery
RemainAfterExit=yes
TimeoutSec=300
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable galera-recovery
Configure cluster monitoring script
Create a monitoring script to check cluster health and automatically handle split-brain scenarios.
#!/bin/bash
Galera cluster monitoring and recovery script
LOG_FILE="/var/log/galera-monitor.log"
MYSQL_USER="clustercheck"
MYSQL_PASS="clusterpass"
log_message() {
echo "$(date): $1" >> $LOG_FILE
}
check_cluster_status() {
local cluster_size=$(mysql -u$MYSQL_USER -p$MYSQL_PASS -e "SHOW STATUS LIKE 'wsrep_cluster_size';" 2>/dev/null | grep wsrep_cluster_size | awk '{print $2}')
local local_state=$(mysql -u$MYSQL_USER -p$MYSQL_PASS -e "SHOW STATUS LIKE 'wsrep_local_state';" 2>/dev/null | grep wsrep_local_state | awk '{print $2}')
if [ "$cluster_size" -lt 2 ] || [ "$local_state" != "4" ]; then
log_message "WARNING: Cluster issue detected. Size: $cluster_size, State: $local_state"
return 1
fi
log_message "Cluster healthy. Size: $cluster_size, State: $local_state"
return 0
}
if ! check_cluster_status; then
log_message "Attempting cluster recovery..."
systemctl restart mariadb
fi
sudo chmod +x /usr/local/bin/galera-monitor.sh
Create monitoring cron job
Schedule the monitoring script to run every 5 minutes for proactive cluster health checking.
echo "/5 * /usr/local/bin/galera-monitor.sh" | sudo crontab -
Configure automatic failover with ProxySQL
Install ProxySQL for connection routing
ProxySQL provides intelligent connection routing and automatic failover for applications connecting to the Galera cluster.
wget https://github.com/sysown/proxysql/releases/download/v2.5.5/proxysql_2.5.5-ubuntu22_amd64.deb
sudo dpkg -i proxysql_2.5.5-ubuntu22_amd64.deb
Configure ProxySQL for Galera
Configure ProxySQL to monitor Galera nodes and route connections to healthy servers.
mysql -u admin -padmin -h 127.0.0.1 -P6032 --prompt='ProxySQL> '
INSERT INTO mysql_galera_hostgroups(writer_hostgroup,backup_writer_hostgroup,reader_hostgroup,offline_hostgroup,active,max_writers,writer_is_also_reader,max_transactions_behind) VALUES (0,1,2,3,1,1,2,100);
INSERT INTO mysql_servers(hostgroup_id,hostname,port,weight,comment) VALUES
(0,'203.0.113.10',3306,1000,'node1'),
(0,'203.0.113.11',3306,900,'node2'),
(0,'203.0.113.12',3306,800,'node3');
INSERT INTO mysql_users(username,password,active,default_hostgroup,max_connections,comment) VALUES
('appuser','secure_app_password',1,0,200,'Application user');
LOAD MYSQL SERVERS TO RUNTIME;
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
SAVE MYSQL USERS TO DISK;
Verify your setup
Test cluster synchronization and failover capabilities across all nodes.
# Check cluster status on each node
sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep%';"
Verify cluster size
sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
Test replication by creating a database
sudo mysql -u root -p -e "CREATE DATABASE test_replication;"
sudo mysql -u root -p -e "SHOW DATABASES;"
Check ProxySQL status
mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "SELECT hostgroup_id,hostname,port,status,weight FROM stats_mysql_servers;"
Test application connection through ProxySQL
mysql -u appuser -psecure_app_password -h 127.0.0.1 -P6033 -e "SELECT @@hostname;"
Monitor cluster health
sudo tail -f /var/log/galera-monitor.log
Testing failover scenarios
Validate automatic failover by simulating node failures and network partitions.
Test node failure recovery
Simulate a node failure and verify automatic recovery.
# Stop MariaDB on one node
sudo systemctl stop mariadb
Check cluster adapts (run on remaining nodes)
sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
Restart failed node
sudo systemctl start mariadb
Verify node rejoins cluster
sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_ready';"
sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
Test split-brain protection
Verify the cluster handles network partitions correctly by testing quorum behavior.
# Check current primary component
sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_status';"
Verify read-only protection during split-brain
sudo mysql -u root -p -e "SHOW VARIABLES LIKE 'read_only';"
Test application failover through ProxySQL
mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "SELECT hostgroup_id,hostname,status FROM stats_mysql_servers ORDER BY hostgroup_id,hostname;"
Performance optimization
Fine-tune Galera cluster settings for production workloads and high throughput applications.
Optimize replication performance
Adjust Galera-specific parameters for better write performance and reduced certification conflicts.
[galera]
Write-set replication optimization
wsrep_provider_options="gcache.size=1G;gcache.keep_pages_size=1G;gcs.fc_limit=128;gcs.fc_factor=0.8"
Parallel applying
wsrep_slave_threads=8
innodb_flush_log_at_trx_commit=2
Flow control tuning
wsrep_provider_options="gcs.fc_debug=1;gcs.fc_master_slave=YES"
Memory allocation
innodb_buffer_pool_size=2G
innodb_buffer_pool_instances=4
innodb_log_buffer_size=64M
Connection optimization
max_connections=2000
thread_cache_size=100
table_open_cache=4000
sudo systemctl restart mariadb
Monitoring and alerting
Set up comprehensive monitoring for cluster health, performance metrics, and automated alerting. You can integrate with existing monitoring solutions like Prometheus and Grafana for MariaDB Galera monitoring.
Configure cluster health alerts
Create email alerts for critical cluster events like node failures or split-brain scenarios.
#!/bin/bash
SMTP_SERVER="smtp.example.com"
ALERT_EMAIL="admin@example.com"
FROM_EMAIL="galera@example.com"
send_alert() {
local subject="$1"
local message="$2"
echo "Subject: $subject
From: $FROM_EMAIL
To: $ALERT_EMAIL
$message" | \
sendmail -S $SMTP_SERVER $ALERT_EMAIL
}
check_cluster_critical() {
local cluster_size=$(mysql -u clustercheck -pclusterpass -e "SHOW STATUS LIKE 'wsrep_cluster_size';" 2>/dev/null | grep wsrep_cluster_size | awk '{print $2}')
if [ "$cluster_size" -lt 2 ]; then
send_alert "CRITICAL: Galera Cluster Size Alert" "Cluster size is $cluster_size. Immediate attention required."
fi
}
check_cluster_critical
sudo chmod +x /usr/local/bin/galera-alert.sh
Backup and recovery procedures
Implement automated backup strategies specific to Galera cluster environments with point-in-time recovery capabilities.
Set up cluster-aware backups
Create backup scripts that work safely with Galera cluster replication.
#!/bin/bash
BACKUP_DIR="/backup/galera"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="galera_backup_$DATE.sql"
Create backup directory
mkdir -p $BACKUP_DIR
Use mariabackup for consistent cluster backup
mariabackup --backup --target-dir=$BACKUP_DIR/mariabackup_$DATE \
--user=root --password=your_root_password
Create SQL dump as secondary backup
mysqldump --single-transaction --routines --triggers \
--all-databases --user=root --password=your_root_password \
> $BACKUP_DIR/$BACKUP_FILE
Compress backups
tar -czf $BACKUP_DIR/mariabackup_$DATE.tar.gz -C $BACKUP_DIR mariabackup_$DATE
gzip $BACKUP_DIR/$BACKUP_FILE
Clean up uncompressed backup
rm -rf $BACKUP_DIR/mariabackup_$DATE
echo "Backup completed: $BACKUP_DIR/mariabackup_$DATE.tar.gz"
sudo chmod +x /usr/local/bin/galera-backup.sh
echo "0 2 * /usr/local/bin/galera-backup.sh" | sudo crontab -
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Node fails to join cluster | Firewall blocking ports or wrong cluster address | Check firewall rules and verify wsrep_cluster_address in config |
| Split-brain scenario | Network partition with no clear quorum | Bootstrap primary component: SET GLOBAL wsrep_provider_options='pc.bootstrap=true'; |
| High certification failure rate | Write conflicts between nodes | Review application logic and implement proper transaction isolation |
| Slow cluster performance | Network latency or inadequate hardware | Tune wsrep_provider_options and check network between nodes |
| Node stuck in 'Joining' state | SST transfer issues or disk space | Check disk space and SST method configuration |
Next steps
- Monitor MariaDB Galera cluster with Prometheus and Grafana
- Implement MariaDB connection pooling with ProxySQL
- Configure MariaDB SSL encryption and authentication
- Set up MariaDB Galera cluster backup automation
- Optimize MariaDB Galera performance for high-traffic applications
Running this in production?
Automated install script
Run this to automate the entire setup
#!/usr/bin/env bash
set -euo pipefail
# MariaDB Galera Cluster Installation Script
# Configures a 3-node Galera cluster for high availability
# Color codes
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
# Cleanup on error
trap cleanup ERR
cleanup() {
echo -e "${RED}[ERROR] Installation failed. Cleaning up...${NC}"
systemctl stop mariadb 2>/dev/null || true
}
usage() {
echo "Usage: $0 NODE_IP NODE_NAME NODE2_IP NODE3_IP"
echo "Example: $0 203.0.113.10 node1 203.0.113.11 203.0.113.12"
exit 1
}
# Check arguments
if [ $# -ne 4 ]; then
usage
fi
NODE_IP="$1"
NODE_NAME="$2"
NODE2_IP="$3"
NODE3_IP="$4"
# Validate IP format
for ip in "$NODE_IP" "$NODE2_IP" "$NODE3_IP"; do
if ! [[ $ip =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]; then
echo -e "${RED}Invalid IP address: $ip${NC}"
exit 1
fi
done
# Check root privileges
if [ "$EUID" -ne 0 ]; then
echo -e "${RED}Please run as root or with sudo${NC}"
exit 1
fi
# Detect distribution
if [ -f /etc/os-release ]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_UPDATE="apt update && apt upgrade -y"
PKG_INSTALL="apt install -y"
GALERA_LIB="/usr/lib/galera/libgalera_smm.so"
MARIADB_CONF="/etc/mysql/mariadb.conf.d/60-galera.cnf"
FIREWALL_CMD="ufw"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_UPDATE="dnf update -y"
PKG_INSTALL="dnf install -y"
GALERA_LIB="/usr/lib64/galera-4/libgalera_smm.so"
MARIADB_CONF="/etc/my.cnf.d/galera.cnf"
FIREWALL_CMD="firewall-cmd"
;;
amzn)
PKG_MGR="yum"
PKG_UPDATE="yum update -y"
PKG_INSTALL="yum install -y"
GALERA_LIB="/usr/lib64/galera-4/libgalera_smm.so"
MARIADB_CONF="/etc/my.cnf.d/galera.cnf"
FIREWALL_CMD="firewall-cmd"
;;
*)
echo -e "${RED}Unsupported distribution: $ID${NC}"
exit 1
;;
esac
else
echo -e "${RED}Cannot detect distribution${NC}"
exit 1
fi
echo -e "${GREEN}[1/9] Updating system packages...${NC}"
$PKG_UPDATE
echo -e "${GREEN}[2/9] Installing MariaDB and Galera packages...${NC}"
if [ "$PKG_MGR" = "apt" ]; then
$PKG_INSTALL mariadb-server mariadb-client galera-4 rsync
else
$PKG_INSTALL mariadb-server mariadb galera rsync
fi
echo -e "${GREEN}[3/9] Stopping and disabling MariaDB service...${NC}"
systemctl stop mariadb || true
systemctl disable mariadb
echo -e "${GREEN}[4/9] Configuring Galera cluster settings...${NC}"
mkdir -p $(dirname "$MARIADB_CONF")
cat > "$MARIADB_CONF" << EOF
[galera]
# Mandatory settings
wsrep_on=ON
wsrep_provider=$GALERA_LIB
wsrep_cluster_name="galera_cluster"
wsrep_cluster_address="gcomm://$NODE_IP,$NODE2_IP,$NODE3_IP"
wsrep_node_address="$NODE_IP"
wsrep_node_name="$NODE_NAME"
wsrep_sst_method=rsync
# Connection and replication settings
bind-address=0.0.0.0
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
innodb_doublewrite=1
innodb_flush_log_at_trx_commit=0
# Cluster optimization
wsrep_slave_threads=4
wsrep_certify_nonPK=1
wsrep_max_ws_rows=131072
wsrep_max_ws_size=1073741824
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1
# Performance tuning
innodb_buffer_pool_size=512M
innodb_log_file_size=100M
innodb_flush_method=O_DIRECT
max_connections=1000
EOF
chown mysql:mysql "$MARIADB_CONF"
chmod 644 "$MARIADB_CONF"
echo -e "${GREEN}[5/9] Configuring firewall rules...${NC}"
if [ "$FIREWALL_CMD" = "ufw" ]; then
if command -v ufw >/dev/null 2>&1; then
ufw allow 3306/tcp
ufw allow 4567/tcp
ufw allow 4568/tcp
ufw allow 4444/tcp
ufw --force enable || true
fi
else
if command -v firewall-cmd >/dev/null 2>&1; then
systemctl enable firewalld --now || true
firewall-cmd --permanent --add-port=3306/tcp
firewall-cmd --permanent --add-port=4567/tcp
firewall-cmd --permanent --add-port=4568/tcp
firewall-cmd --permanent --add-port=4444/tcp
firewall-cmd --reload
fi
fi
# Check if this is the first node (bootstrap)
if [ "$NODE_NAME" = "node1" ]; then
echo -e "${GREEN}[6/9] Bootstrapping Galera cluster (first node)...${NC}"
galera_new_cluster
echo -e "${GREEN}[7/9] Waiting for cluster to initialize...${NC}"
sleep 10
echo -e "${GREEN}[8/9] Running security configuration...${NC}"
echo -e "${YELLOW}Setting up root password and securing installation...${NC}"
# Generate random passwords
ROOT_PASS=$(openssl rand -base64 32)
CLUSTER_PASS=$(openssl rand -base64 16)
APP_PASS=$(openssl rand -base64 24)
# Secure the installation
mysql << EOF
UPDATE mysql.user SET Password=PASSWORD('$ROOT_PASS') WHERE User='root';
DELETE FROM mysql.user WHERE User='';
DELETE FROM mysql.user WHERE User='root' AND Host NOT IN ('localhost', '127.0.0.1', '::1');
DROP DATABASE IF EXISTS test;
DELETE FROM mysql.db WHERE Db='test' OR Db='test\\_%';
CREATE USER 'clustercheck'@'localhost' IDENTIFIED BY '$CLUSTER_PASS';
GRANT PROCESS ON *.* TO 'clustercheck'@'localhost';
CREATE USER 'appuser'@'%' IDENTIFIED BY '$APP_PASS';
GRANT ALL PRIVILEGES ON *.* TO 'appuser'@'%';
FLUSH PRIVILEGES;
EOF
# Save passwords securely
cat > /root/.galera_passwords << EOF
ROOT_PASSWORD=$ROOT_PASS
CLUSTER_PASSWORD=$CLUSTER_PASS
APP_PASSWORD=$APP_PASS
EOF
chmod 600 /root/.galera_passwords
echo -e "${YELLOW}Passwords saved to /root/.galera_passwords${NC}"
else
echo -e "${GREEN}[6/9] Starting MariaDB to join cluster...${NC}"
systemctl start mariadb
systemctl enable mariadb
echo -e "${GREEN}[7/9] Waiting for node to join cluster...${NC}"
sleep 15
echo -e "${YELLOW}[8/9] Skipping security setup (run only on first node)${NC}"
fi
echo -e "${GREEN}[9/9] Verifying cluster status...${NC}"
if systemctl is-active --quiet mariadb; then
echo -e "${GREEN}MariaDB service is running${NC}"
# Check cluster status
if mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';" 2>/dev/null; then
CLUSTER_SIZE=$(mysql -sN -e "SHOW STATUS LIKE 'wsrep_cluster_size';" | awk '{print $2}')
NODE_STATUS=$(mysql -sN -e "SHOW STATUS LIKE 'wsrep_local_state_comment';" | awk '{print $2}')
echo -e "${GREEN}Cluster size: $CLUSTER_SIZE${NC}"
echo -e "${GREEN}Node status: $NODE_STATUS${NC}"
if [ "$NODE_STATUS" = "Synced" ]; then
echo -e "${GREEN}✓ Node successfully joined cluster${NC}"
else
echo -e "${YELLOW}⚠ Node status: $NODE_STATUS (may still be syncing)${NC}"
fi
fi
else
echo -e "${RED}MariaDB service failed to start${NC}"
exit 1
fi
echo -e "${GREEN}MariaDB Galera cluster node setup completed!${NC}"
echo -e "${YELLOW}Next steps:${NC}"
echo "1. Run this script on the other nodes with their respective IPs"
echo "2. Verify cluster status: mysql -e \"SHOW STATUS LIKE 'wsrep_%';\""
echo "3. Test replication by creating a database on one node"
if [ "$NODE_NAME" = "node1" ]; then
echo -e "${YELLOW}Passwords are saved in /root/.galera_passwords${NC}"
fi
Review the script before running. Execute with: bash install.sh