Set up a highly available MariaDB Galera cluster with multi-master replication and automatic failover for production database workloads. This tutorial covers cluster initialization, node configuration, and monitoring setup across multiple servers.
Prerequisites
- At least 3 servers with 4GB RAM each
- Network connectivity between cluster nodes
- Root or sudo access on all nodes
What this solves
MariaDB Galera cluster provides synchronous multi-master replication for database high availability. This eliminates single points of failure by allowing read and write operations on any cluster node with automatic failover when nodes become unavailable.
Step-by-step installation
Prepare cluster nodes
You'll need at least 3 nodes for a production cluster. Update all systems and configure hostnames for cluster communication.
sudo apt update && sudo apt upgrade -y
sudo hostnamectl set-hostname galera-node1
Configure /etc/hosts on all nodes with cluster member IP addresses:
203.0.113.10 galera-node1
203.0.113.11 galera-node2
203.0.113.12 galera-node3
Install MariaDB Galera packages
Install MariaDB server and Galera cluster components on all nodes. The galera package provides the wsrep provider for synchronous replication.
sudo apt install -y mariadb-server galera-4 mariadb-client mariadb-backup rsync
Stop MariaDB services
Stop MariaDB on all nodes before configuring the cluster. This prevents individual node startup during cluster configuration.
sudo systemctl stop mariadb
sudo systemctl disable mariadb
Configure Galera cluster settings
Create the Galera configuration file on all nodes. This defines cluster membership, replication settings, and wsrep provider options.
[galera]
wsrep_on = ON
wsrep_provider = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name = "galera_cluster"
wsrep_cluster_address = "gcomm://203.0.113.10,203.0.113.11,203.0.113.12"
wsrep_sst_method = rsync
wsrep_node_address = "203.0.113.10"
wsrep_node_name = "galera-node1"
binlog_format = row
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2
innodb_doublewrite = 1
query_cache_size = 0
query_cache_type = 0
[mysqld]
bind-address = 0.0.0.0
wsrep_node_address and wsrep_node_name for each node to match their respective IP addresses and hostnames.Configure cluster authentication
Create a cluster user for state snapshot transfers (SST). This user enables nodes to synchronize data during startup and recovery.
[galera]
wsrep_sst_auth = sst_user:secure_sst_password
Initialize the bootstrap node
Start the first node in bootstrap mode to create the initial cluster. This establishes the primary component for other nodes to join.
sudo galera_new_cluster
Verify the bootstrap node started successfully:
sudo systemctl status mariadb
Secure MariaDB installation
Run the security script on the bootstrap node to set root password and remove test databases.
sudo mysql_secure_installation
Create cluster SST user
Connect to MariaDB and create the state snapshot transfer user referenced in your Galera configuration.
sudo mysql -u root -p
CREATE USER 'sst_user'@'localhost' IDENTIFIED BY 'secure_sst_password';
GRANT PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENT ON . TO 'sst_user'@'localhost';
FLUSH PRIVILEGES;
EXIT;
Join additional nodes
Start MariaDB on the remaining nodes. They will automatically connect to the cluster and synchronize data from the bootstrap node.
sudo systemctl start mariadb
sudo systemctl enable mariadb
Enable automatic startup
Enable MariaDB service on the bootstrap node after successful cluster formation.
sudo systemctl enable mariadb
Configure cluster monitoring
Install cluster status monitoring
Set up cluster health monitoring to track node status and detect split-brain scenarios. This is crucial for production cluster management.
sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep%'"
Create monitoring user
Create a dedicated user for monitoring tools and health checks with minimal required privileges.
CREATE USER 'monitoring'@'%' IDENTIFIED BY 'monitor_password';
GRANT PROCESS, REPLICATION CLIENT ON . TO 'monitoring'@'%';
FLUSH PRIVILEGES;
Configure cluster recovery
Set up automatic cluster recovery by configuring the safe_to_bootstrap flag. This prevents split-brain conditions during cluster restarts.
[Unit]
After=network.target
[Service]
ExecStartPre=/bin/sh -c "systemctl unset-environment _WSREP_START_POSITION"
ExecStartPre=/bin/sh -c "[ ! -e /var/lib/mysql/grastate.dat ] || systemctl set-environment _WSREP_START_POSITION=--wsrep_start_position=$(sudo -u mysql grep -E '^seqno:' /var/lib/mysql/grastate.dat | cut -d: -f2)"
ExecStart=
ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION
KillMode=process
sudo systemctl daemon-reload
Verify your setup
Check cluster status and node synchronization across all members:
# Check cluster size and status
sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_local_state_comment';"
Test write operations on different nodes
sudo mysql -u root -p -e "CREATE DATABASE test_cluster; USE test_cluster; CREATE TABLE test (id INT PRIMARY KEY, data VARCHAR(100)); INSERT INTO test VALUES (1, 'Node 1 write');"
Verify replication on other nodes
sudo mysql -u root -p -e "USE test_cluster; SELECT * FROM test;"
Implement automatic failover
Configure application connection pooling
Set up connection pooling with automatic failover using ProxySQL or HAProxy. This ensures applications seamlessly handle node failures. For detailed ProxySQL configuration, see our MySQL connection pooling guide.
global
maxconn 4096
log stdout local0
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660
stats timeout 30s
user haproxy
group haproxy
daemon
defaults
mode tcp
timeout connect 5000ms
timeout client 50000ms
timeout server 50000ms
option tcplog
listen galera_cluster
bind *:3306
mode tcp
option tcpka
option mysql-check user monitoring password monitor_password
balance roundrobin
server galera-node1 203.0.113.10:3306 check weight 1
server galera-node2 203.0.113.11:3306 check weight 1
server galera-node3 203.0.113.12:3306 check weight 1
listen stats
bind *:8404
stats enable
stats uri /stats
stats refresh 30s
Test failover scenarios
Simulate node failures to verify automatic failover functionality and application resilience.
# Stop one node to test failover
sudo systemctl stop mariadb
Check cluster status on remaining nodes
sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
Restart failed node
sudo systemctl start mariadb
Verify node rejoined cluster
sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_local_state_comment';"
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Node fails to join cluster | Incorrect cluster address or firewall blocking ports 4567-4568 | sudo ufw allow 4567:4568/tcp and verify wsrep_cluster_address |
| Split-brain condition | Network partition or simultaneous node restart | Check grastate.dat safe_to_bootstrap flag, bootstrap from most recent node |
| SST failure during node startup | Insufficient disk space or incorrect SST user permissions | Verify disk space and recreate SST user with correct privileges |
| High replication lag | Network latency or heavy write workload | Monitor wsrep_local_recv_queue and consider increasing wsrep_slave_threads |
| Cluster startup fails after shutdown | All nodes shutdown gracefully, need bootstrap node | Find node with highest seqno in grastate.dat and start with galera_new_cluster |
Next steps
- Monitor MariaDB Galera cluster with Prometheus and Grafana
- Implement MariaDB connection pooling with ProxySQL
- Configure MariaDB backup automation with systemd timers
- Setup MariaDB SSL encryption and security hardening
Running this in production?
Automated install script
Run this to automate the entire setup
#!/usr/bin/env bash
set -euo pipefail
# MariaDB Galera Cluster Installation Script
# Usage: ./install_galera.sh <node_number> <node1_ip> <node2_ip> <node3_ip>
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# Print colored output
print_status() {
echo -e "${GREEN}[INFO]${NC} $1"
}
print_warning() {
echo -e "${YELLOW}[WARNING]${NC} $1"
}
print_error() {
echo -e "${RED}[ERROR]${NC} $1"
}
# Usage message
usage() {
echo "Usage: $0 <node_number> <node1_ip> <node2_ip> <node3_ip>"
echo "Example: $0 1 192.168.1.10 192.168.1.11 192.168.1.12"
echo "node_number: 1, 2, or 3 (which node this is)"
exit 1
}
# Cleanup function
cleanup() {
print_error "Installation failed. Cleaning up..."
systemctl stop mariadb 2>/dev/null || true
systemctl disable mariadb 2>/dev/null || true
if [ "$PKG_MGR" = "apt" ]; then
apt remove --purge -y mariadb-server galera-4 2>/dev/null || true
else
$PKG_INSTALL remove -y mariadb-server galera 2>/dev/null || true
fi
rm -f /etc/mysql/mariadb.conf.d/60-galera.cnf /etc/my.cnf.d/galera.cnf 2>/dev/null || true
}
trap cleanup ERR
# Check arguments
if [ $# -ne 4 ]; then
usage
fi
NODE_NUM=$1
NODE1_IP=$2
NODE2_IP=$3
NODE3_IP=$4
# Validate node number
if [[ ! "$NODE_NUM" =~ ^[1-3]$ ]]; then
print_error "Node number must be 1, 2, or 3"
exit 1
fi
# Validate IP addresses
for ip in "$NODE1_IP" "$NODE2_IP" "$NODE3_IP"; do
if ! [[ "$ip" =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]; then
print_error "Invalid IP address: $ip"
exit 1
fi
done
# Set current node IP based on node number
case $NODE_NUM in
1) CURRENT_IP=$NODE1_IP; HOSTNAME="galera-node1" ;;
2) CURRENT_IP=$NODE2_IP; HOSTNAME="galera-node2" ;;
3) CURRENT_IP=$NODE3_IP; HOSTNAME="galera-node3" ;;
esac
# Check if running as root
if [ "$EUID" -ne 0 ]; then
print_error "This script must be run as root"
exit 1
fi
echo "[1/10] Detecting distribution and package manager..."
if [ -f /etc/os-release ]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_INSTALL="apt install -y"
PKG_UPDATE="apt update && apt upgrade -y"
GALERA_CONF="/etc/mysql/mariadb.conf.d/60-galera.cnf"
GALERA_PKG="galera-4"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_INSTALL="dnf install -y"
PKG_UPDATE="dnf update -y"
GALERA_CONF="/etc/my.cnf.d/galera.cnf"
GALERA_PKG="galera"
;;
amzn)
PKG_MGR="yum"
PKG_INSTALL="yum install -y"
PKG_UPDATE="yum update -y"
GALERA_CONF="/etc/my.cnf.d/galera.cnf"
GALERA_PKG="galera"
;;
*)
print_error "Unsupported distribution: $ID"
exit 1
;;
esac
else
print_error "Cannot detect distribution"
exit 1
fi
print_status "Detected: $PRETTY_NAME using $PKG_MGR"
echo "[2/10] Updating system packages..."
$PKG_UPDATE
echo "[3/10] Setting hostname..."
hostnamectl set-hostname "$HOSTNAME"
echo "[4/10] Configuring /etc/hosts..."
# Remove existing galera entries
sed -i '/galera-node/d' /etc/hosts
# Add cluster nodes
cat >> /etc/hosts << EOF
$NODE1_IP galera-node1
$NODE2_IP galera-node2
$NODE3_IP galera-node3
EOF
echo "[5/10] Installing MariaDB and Galera packages..."
if [ "$PKG_MGR" = "apt" ]; then
$PKG_INSTALL mariadb-server $GALERA_PKG mariadb-client mariadb-backup rsync
else
$PKG_INSTALL mariadb-server $GALERA_PKG mariadb rsync
fi
echo "[6/10] Configuring firewall..."
if command -v firewall-cmd &> /dev/null; then
# RHEL-based firewall
systemctl enable --now firewalld
firewall-cmd --permanent --add-port=3306/tcp # MariaDB
firewall-cmd --permanent --add-port=4567/tcp # Galera cluster replication
firewall-cmd --permanent --add-port=4568/tcp # IST
firewall-cmd --permanent --add-port=4444/tcp # SST
firewall-cmd --reload
elif command -v ufw &> /dev/null; then
# Ubuntu/Debian firewall
ufw allow 3306/tcp
ufw allow 4567/tcp
ufw allow 4568/tcp
ufw allow 4444/tcp
fi
echo "[7/10] Stopping MariaDB service..."
systemctl stop mariadb 2>/dev/null || true
systemctl disable mariadb
echo "[8/10] Creating Galera configuration..."
# Create directory if it doesn't exist
mkdir -p "$(dirname "$GALERA_CONF")"
# Determine wsrep provider path
if [ -f /usr/lib64/galera-4/libgalera_smm.so ]; then
WSREP_PROVIDER="/usr/lib64/galera-4/libgalera_smm.so"
elif [ -f /usr/lib/galera/libgalera_smm.so ]; then
WSREP_PROVIDER="/usr/lib/galera/libgalera_smm.so"
elif [ -f /usr/lib64/galera/libgalera_smm.so ]; then
WSREP_PROVIDER="/usr/lib64/galera/libgalera_smm.so"
else
print_error "Galera wsrep provider not found"
exit 1
fi
cat > "$GALERA_CONF" << EOF
[galera]
wsrep_on = ON
wsrep_provider = $WSREP_PROVIDER
wsrep_cluster_name = "galera_cluster"
wsrep_cluster_address = "gcomm://$NODE1_IP,$NODE2_IP,$NODE3_IP"
wsrep_sst_method = rsync
wsrep_node_address = "$CURRENT_IP"
wsrep_node_name = "$HOSTNAME"
wsrep_sst_auth = sst_user:$(openssl rand -base64 12)
binlog_format = row
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2
innodb_doublewrite = 1
query_cache_size = 0
query_cache_type = 0
[mysqld]
bind-address = 0.0.0.0
EOF
chmod 644 "$GALERA_CONF"
echo "[9/10] Node-specific setup..."
if [ "$NODE_NUM" = "1" ]; then
print_status "Initializing bootstrap node..."
galera_new_cluster
# Wait for MariaDB to start
sleep 5
print_status "Securing MariaDB installation..."
# Generate random root password
ROOT_PASS=$(openssl rand -base64 16)
mysql -u root << EOF
UPDATE mysql.user SET Password=PASSWORD('$ROOT_PASS') WHERE User='root';
DELETE FROM mysql.user WHERE User='';
DELETE FROM mysql.user WHERE User='root' AND Host NOT IN ('localhost', '127.0.0.1', '::1');
DROP DATABASE IF EXISTS test;
DELETE FROM mysql.db WHERE Db='test' OR Db='test\\_%';
CREATE USER 'sst_user'@'localhost' IDENTIFIED BY '$(grep wsrep_sst_auth $GALERA_CONF | cut -d: -f2)';
GRANT PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sst_user'@'localhost';
CREATE USER 'monitoring'@'%' IDENTIFIED BY '$(openssl rand -base64 12)';
GRANT PROCESS, REPLICATION CLIENT ON *.* TO 'monitoring'@'%';
FLUSH PRIVILEGES;
EOF
echo "MariaDB root password: $ROOT_PASS" > /root/.mariadb_root_password
chmod 600 /root/.mariadb_root_password
systemctl enable mariadb
print_status "Bootstrap node configured. Root password saved to /root/.mariadb_root_password"
else
print_warning "This is node $NODE_NUM. Start MariaDB after the bootstrap node (node 1) is running:"
print_warning "systemctl start mariadb && systemctl enable mariadb"
fi
echo "[10/10] Verifying installation..."
if systemctl is-active --quiet mariadb; then
print_status "MariaDB is running"
mysql -u root -p$(cat /root/.mariadb_root_password 2>/dev/null || echo '') -e "SHOW STATUS LIKE 'wsrep_cluster_size';" 2>/dev/null || true
else
print_status "MariaDB is configured but not running (normal for non-bootstrap nodes)"
fi
print_status "Galera cluster node $NODE_NUM installation complete!"
print_status "Configuration file: $GALERA_CONF"
Review the script before running. Execute with: bash install.sh