Configure MariaDB Galera cluster for multi-master replication with automatic failover

Advanced 45 min Jun 03, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up a highly available MariaDB Galera cluster with multi-master replication and automatic failover for production database workloads. This tutorial covers cluster initialization, node configuration, and monitoring setup across multiple servers.

Prerequisites

  • At least 3 servers with 4GB RAM each
  • Network connectivity between cluster nodes
  • Root or sudo access on all nodes

What this solves

MariaDB Galera cluster provides synchronous multi-master replication for database high availability. This eliminates single points of failure by allowing read and write operations on any cluster node with automatic failover when nodes become unavailable.

Step-by-step installation

Prepare cluster nodes

You'll need at least 3 nodes for a production cluster so that a majority (quorum) remains when one node is lost. Update all systems and set a unique hostname on each node for cluster communication.

# Ubuntu 24.04 / Debian 12
sudo apt update && sudo apt upgrade -y
# AlmaLinux 9 / Rocky Linux 9
sudo dnf update -y
# All distributions (use galera-node2 and galera-node3 on the other nodes)
sudo hostnamectl set-hostname galera-node1

Configure /etc/hosts on all nodes with cluster member IP addresses:

203.0.113.10  galera-node1
203.0.113.11  galera-node2
203.0.113.12  galera-node3
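
Because every node needs the same three entries, a quick check on each server can catch a missed line before the cluster is started. The following is a minimal sketch, not part of the original procedure: missing_hosts is a hypothetical helper name, and it only checks that a name appears in the file, not that the address is correct.

```shell
# missing_hosts HOSTS_FILE NAME...
# Prints every NAME that has no entry in HOSTS_FILE.
# Returns 0 when all names are present, 1 otherwise.
missing_hosts() {
    hosts_file=$1
    shift
    missing=0
    for name in "$@"; do
        # Skip comment lines; look for the name in any field after the address.
        if ! awk -v n="$name" '
            /^[ \t]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit found ? 0 : 1 }
        ' "$hosts_file"; then
            echo "$name"
            missing=1
        fi
    done
    return "$missing"
}
```

Run on each node, for example: missing_hosts /etc/hosts galera-node1 galera-node2 galera-node3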

Install MariaDB Galera packages

Install MariaDB server and Galera cluster components on all nodes. The galera package provides the wsrep provider for synchronous replication.

# Ubuntu 24.04 / Debian 12
sudo apt install -y mariadb-server galera-4 mariadb-client mariadb-backup rsync
# AlmaLinux 9 / Rocky Linux 9
sudo dnf install -y mariadb-server galera mariadb rsync

Stop MariaDB services

Stop MariaDB on all nodes before configuring the cluster. This prevents individual node startup during cluster configuration.

sudo systemctl stop mariadb
sudo systemctl disable mariadb

Configure Galera cluster settings

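Create the Galera configuration file on all nodes, for example /etc/mysql/mariadb.conf.d/60-galera.cnf on Ubuntu and Debian or /etc/my.cnf.d/galera.cnf on AlmaLinux and Rocky Linux. It defines cluster membership, replication settings, and wsrep provider options. On AlmaLinux and Rocky Linux the provider library is usually installed under /usr/lib64/galera/ rather than /usr/lib/galera/; confirm the path with rpm -ql galera and adjust wsrep_provider to match.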

[galera]
wsrep_on = ON
wsrep_provider = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name = "galera_cluster"
wsrep_cluster_address = "gcomm://203.0.113.10,203.0.113.11,203.0.113.12"
wsrep_sst_method = rsync
wsrep_node_address = "203.0.113.10"
wsrep_node_name = "galera-node1"

binlog_format = row
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2
innodb_doublewrite = 1
query_cache_size = 0
query_cache_type = 0

[mysqld]
bind-address = 0.0.0.0

Important: Change wsrep_node_address and wsrep_node_name on each node to match that node's IP address and hostname.
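
Since only two values differ between nodes, the node-specific part of the file can be generated rather than edited by hand. The sketch below assumes the settings shown above; render_galera_cnf is a hypothetical helper name, and the optional fourth argument is an assumption added so the provider path can be overridden on distributions that install it elsewhere. It prints to standard output, so nothing is written until you redirect it.

```shell
# render_galera_cnf NODE_NAME NODE_IP CLUSTER_IPS [PROVIDER_PATH]
# Prints the node-specific wsrep settings for one cluster member.
# CLUSTER_IPS is the comma-separated list of all member addresses.
render_galera_cnf() {
    node_name=$1
    node_ip=$2
    cluster_ips=$3
    # Default matches the path used in this tutorial; override if it differs.
    provider=${4:-/usr/lib/galera/libgalera_smm.so}
    cat <<EOF
[galera]
wsrep_on = ON
wsrep_provider = $provider
wsrep_cluster_name = "galera_cluster"
wsrep_cluster_address = "gcomm://$cluster_ips"
wsrep_sst_method = rsync
wsrep_node_address = "$node_ip"
wsrep_node_name = "$node_name"
EOF
}
```

For the second node this would be called as render_galera_cnf galera-node2 203.0.113.11 203.0.113.10,203.0.113.11,203.0.113.12, with the remaining InnoDB and [mysqld] settings appended unchanged.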

Configure cluster authentication

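Define the credentials that state snapshot transfers (SST) use to synchronize data during startup and recovery. Only the mariabackup SST method reads wsrep_sst_auth; the rsync method configured above ignores it, so this setting takes effect if you switch wsrep_sst_method to mariabackup. Add it to the same configuration file on every node.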

[galera]
wsrep_sst_auth = sst_user:secure_sst_password

Initialize the bootstrap node

Start the first node in bootstrap mode to create the initial cluster. This establishes the primary component for other nodes to join.

sudo galera_new_cluster

Verify the bootstrap node started successfully:

sudo systemctl status mariadb

Secure MariaDB installation

Run the security script on the bootstrap node to set root password and remove test databases.

sudo mysql_secure_installation

Create cluster SST user

Connect to MariaDB and create the state snapshot transfer user referenced in your Galera configuration.

sudo mysql -u root -p
CREATE USER 'sst_user'@'localhost' IDENTIFIED BY 'secure_sst_password';
GRANT PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sst_user'@'localhost';
FLUSH PRIVILEGES;
EXIT;

Join additional nodes

Start MariaDB on the remaining nodes one at a time. Each node connects to the cluster and receives a state transfer from a donor node that is already synced, so wait until it reports Synced before starting the next one.

sudo systemctl start mariadb
sudo systemctl enable mariadb

Enable automatic startup

Enable MariaDB service on the bootstrap node after successful cluster formation.

sudo systemctl enable mariadb

Configure cluster monitoring

Install cluster status monitoring

Set up cluster health monitoring to track node status and detect split-brain scenarios. This is crucial for production cluster management.

sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep%'"
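
The full wsrep status list is long, and only a few variables decide whether a node is healthy. As a rough sketch, the function below reads tab-separated name and value pairs, which is the format the mysql client prints with its -N and -B options, and reports any of four variables that are off. The name wsrep_health and the choice of these four checks are assumptions for illustration, not a complete monitoring solution.

```shell
# wsrep_health EXPECTED_SIZE
# Reads "name<TAB>value" status lines on stdin.
# Prints "OK" and returns 0 when the node looks healthy; otherwise prints
# one line per failed check and returns 1.
wsrep_health() {
    expected_size=$1
    awk -F '\t' -v want="$expected_size" '
        { v[$1] = $2 }
        END {
            bad = 0
            if (v["wsrep_cluster_status"] != "Primary") {
                print "wsrep_cluster_status=" v["wsrep_cluster_status"]; bad = 1
            }
            if (v["wsrep_local_state_comment"] != "Synced") {
                print "wsrep_local_state_comment=" v["wsrep_local_state_comment"]; bad = 1
            }
            if (v["wsrep_ready"] != "ON") {
                print "wsrep_ready=" v["wsrep_ready"]; bad = 1
            }
            if (v["wsrep_cluster_size"] + 0 != want + 0) {
                print "wsrep_cluster_size=" v["wsrep_cluster_size"] " (expected " want ")"; bad = 1
            }
            if (!bad) print "OK"
            exit bad
        }'
}
```

A typical call would be: sudo mysql -u root -p -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | wsrep_health 3. A cluster status other than Primary means the node has lost quorum, which is the condition to alert on first.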

Create monitoring user

Create a dedicated user for monitoring tools and health checks with minimal required privileges.

CREATE USER 'monitoring'@'%' IDENTIFIED BY 'monitor_password';
GRANT PROCESS, REPLICATION CLIENT ON *.* TO 'monitoring'@'%';
FLUSH PRIVILEGES;

Configure cluster recovery

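Add a systemd drop-in for the mariadb service (for example with sudo systemctl edit mariadb) that reads the seqno recorded in /var/lib/mysql/grastate.dat and passes it to mysqld as the start position. The safe_to_bootstrap flag in the same file marks the node that was last to leave the cluster, which is the only node that should be bootstrapped after a full shutdown.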

[Unit]
After=network.target

[Service]
ExecStartPre=/bin/sh -c "systemctl unset-environment _WSREP_START_POSITION"
ExecStartPre=/bin/sh -c "[ ! -e /var/lib/mysql/grastate.dat ] || systemctl set-environment _WSREP_START_POSITION=--wsrep_start_position=$(sudo -u mysql grep -E '^seqno:' /var/lib/mysql/grastate.dat | cut -d: -f2)"
ExecStart=
ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION
KillMode=process

Reload systemd so the change takes effect:

sudo systemctl daemon-reload
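
Before relying on automatic recovery, it helps to see what a node has actually recorded. The sketch below summarizes a grastate.dat file as UUID:SEQNO plus the safe_to_bootstrap flag; grastate_summary is a hypothetical helper name. A seqno of -1 means the file does not hold a usable position, typically because the node is running or did not shut down cleanly.

```shell
# grastate_summary FILE
# Prints "UUID:SEQNO safe_to_bootstrap=N" for a Galera grastate.dat file.
grastate_summary() {
    awk -F: '
        # Trim the padding that follows the colon in each "key: value" line.
        { gsub(/^[ \t]+|[ \t]+$/, "", $2) }
        $1 == "uuid"              { uuid = $2 }
        $1 == "seqno"             { seqno = $2 }
        $1 == "safe_to_bootstrap" { safe = $2 }
        END { printf "%s:%s safe_to_bootstrap=%s\n", uuid, seqno, safe }
    ' "$1"
}
```

On a stopped node, run it as: sudo sh -c '. ./grastate.sh; grastate_summary /var/lib/mysql/grastate.dat', where grastate.sh is an assumed file name holding the function.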

Verify your setup

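Check cluster status and node synchronization on every member. With all three nodes joined, wsrep_cluster_size should be 3 and wsrep_local_state_comment should be Synced: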

# Check cluster size and status
sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_local_state_comment';"

# Test write operations on different nodes

sudo mysql -u root -p -e "CREATE DATABASE test_cluster; USE test_cluster; CREATE TABLE test (id INT PRIMARY KEY, data VARCHAR(100)); INSERT INTO test VALUES (1, 'Node 1 write');"

# Verify replication on other nodes

sudo mysql -u root -p -e "USE test_cluster; SELECT * FROM test;"

Implement automatic failover

Configure application connection pooling

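Put a TCP load balancer such as HAProxy or ProxySQL in front of the cluster so that applications connect to a single address and are routed only to nodes that pass health checks. Connections that are open on a node when it fails are dropped, so applications still need to reconnect and retry. The HAProxy example below normally goes in /etc/haproxy/haproxy.cfg. For detailed ProxySQL configuration, see our MySQL connection pooling guide.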

global
    maxconn 4096
    log stdout local0
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    mode tcp
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
    option tcplog

listen galera_cluster
    bind *:3306
    mode tcp
    option tcpka
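    # mysql-check accepts a user name only, not a password. haproxy_check is a
    # hypothetical passwordless account with no privileges that you create for it.
    option mysql-check user haproxy_check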
    balance roundrobin
    server galera-node1 203.0.113.10:3306 check weight 1
    server galera-node2 203.0.113.11:3306 check weight 1
    server galera-node3 203.0.113.12:3306 check weight 1

listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 30s

Test failover scenarios

Simulate node failures to verify automatic failover functionality and application resilience.

# Stop one node to test failover
sudo systemctl stop mariadb

# Check cluster status on remaining nodes

sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"

# Restart failed node

sudo systemctl start mariadb

# Verify node rejoined cluster

sudo mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_local_state_comment';"
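
A restarted node passes through intermediate states before it is usable again, so a failover drill is easier to script with a polling loop than with a single status query. The sketch below is one way to do that; wait_for_state is a hypothetical helper, it compares only the last word of the command output, and it assumes the status command can run without a password prompt (for example through socket authentication or a client option file).

```shell
# wait_for_state EXPECTED TIMEOUT_SECONDS COMMAND [ARG...]
# Runs COMMAND once per second until the last word of its output equals
# EXPECTED. Returns 0 on a match, 1 if TIMEOUT_SECONDS pass without one.
wait_for_state() {
    expected=$1
    timeout=$2
    shift 2
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Keep the last field of the last non-empty output line as the state.
        state=$("$@" 2>/dev/null | awk 'NF { s = $NF } END { print s }')
        if [ "$state" = "$expected" ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

For example: wait_for_state Synced 120 sudo mysql -N -B -e "SHOW STATUS LIKE 'wsrep_local_state_comment'" waits up to two minutes for the rejoined node to finish its state transfer.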

Common issues

Symptom | Cause | Fix
Node fails to join cluster | Incorrect cluster address or firewall blocking ports 4567-4568 | Run sudo ufw allow 4567:4568/tcp (also open 4444/tcp, which SST uses) and verify wsrep_cluster_address
Split-brain condition | Network partition or simultaneous node restart | Check the safe_to_bootstrap flag in grastate.dat and bootstrap from the most recent node
SST failure during node startup | Insufficient disk space or incorrect SST user permissions | Verify disk space and recreate the SST user with the correct privileges
High replication lag | Network latency or heavy write workload | Monitor wsrep_local_recv_queue and consider increasing wsrep_slave_threads
Cluster startup fails after shutdown | All nodes were shut down gracefully and one must be bootstrapped | Find the node with the highest seqno in grastate.dat and start it with galera_new_cluster
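
Choosing the node with the highest seqno is a simple comparison once the values have been collected from each server. As an illustrative sketch, the function below takes one "node seqno" pair per line and prints the node to bootstrap; pick_bootstrap_node is a hypothetical name, and gathering the values from each grastate.dat is left to you. Nodes reporting -1 never win against a node with a real position, and on a tie the first node listed is kept.

```shell
# pick_bootstrap_node
# Reads "NODE SEQNO" lines on stdin and prints the NODE with the highest
# SEQNO. Returns 1 if no usable line was read.
pick_bootstrap_node() {
    awk '
        NF >= 2 && (best == "" || $2 + 0 > max) { best = $1; max = $2 + 0 }
        END { if (best != "") print best; else exit 1 }
    '
}
```

Example: printf 'galera-node1 1201\ngalera-node2 1234\ngalera-node3 -1\n' | pick_bootstrap_node selects galera-node2.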
