Set up S3-compatible disaster recovery with cross-region replication using MinIO

Advanced 45 min Jun 11, 2026 87 views
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Configure MinIO clusters across multiple regions with automated cross-region bucket replication, SSL encryption, and comprehensive monitoring for enterprise-grade disaster recovery.

Prerequisites

  • At least 8 servers (4 per region)
  • Valid SSL certificates or ability to generate them
  • Network connectivity between regions
  • Root or sudo access

What this solves

Enterprise applications need reliable disaster recovery with geographically distributed storage that can survive regional outages. MinIO provides S3-compatible object storage that you can deploy across multiple data centers with automated replication. This tutorial shows you how to build a production-ready disaster recovery system using MinIO clusters in different regions with SSL encryption, automated failover testing, and comprehensive monitoring through Prometheus and Grafana.

Step-by-step configuration

Install MinIO server on both regions

Download and install MinIO on your primary and disaster recovery servers. We'll set up a minimum of 4 nodes per region for high availability.

wget https://dl.min.io/server/minio/release/linux-amd64/minio
sudo chmod +x minio
sudo mv minio /usr/local/bin/
sudo useradd -r minio-user
sudo mkdir -p /etc/minio /opt/minio/data{1..4}
sudo chown -R minio-user:minio-user /opt/minio /etc/minio

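Before moving a downloaded binary into /usr/local/bin, it may be worth checking its checksum. The sketch below assumes a matching minio.sha256sum file is published next to the binary (an assumption to confirm on dl.min.io); verify_sha256 is a hypothetical helper name, not part of MinIO.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Succeeds only if the SHA-256 digest of FILE equals EXPECTED_HEX
# (comparison is case-insensitive on the hex digits).
verify_sha256() (
  file=$1
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  actual=$(sha256sum "$file" | cut -d' ' -f1)
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
)

# Example, run in the download directory (the .sha256sum URL is an assumption):
#   wget https://dl.min.io/server/minio/release/linux-amd64/minio.sha256sum
#   verify_sha256 minio "$(cut -d' ' -f1 minio.sha256sum)" && echo "checksum OK"
```

If the function fails, discard the download rather than installing it.
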
Create SSL certificates for secure communication

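Generate SSL certificates for encrypted communication between MinIO clusters. Replace example.com with your actual domain, and on the disaster recovery nodes list the minio-dr1 through minio-dr4 hostnames in subjectAltName instead. Because this certificate is self-signed, every node and client that connects must either trust it or skip verification.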

sudo mkdir -p /etc/minio/certs
sudo openssl req -x509 -nodes -days 365 -newkey rsa:4096 \
  -keyout /etc/minio/certs/private.key \
  -out /etc/minio/certs/public.crt \
  -subj "/C=US/ST=State/L=City/O=Organization/CN=minio1.example.com" \
  -addext "subjectAltName=DNS:minio1.example.com,DNS:minio2.example.com,DNS:minio3.example.com,DNS:minio4.example.com"
sudo chown -R minio-user:minio-user /etc/minio/certs
sudo chmod 600 /etc/minio/certs/private.key
sudo chmod 644 /etc/minio/certs/public.crt

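The openssl command above hard-codes the primary hostnames in subjectAltName. One way to reuse it for the disaster recovery region is to build that value from a host list. This is a minimal sketch; san_ext is a hypothetical helper name.

```shell
# san_ext HOST...
# Prints an openssl -addext value such as
#   subjectAltName=DNS:host1,DNS:host2
san_ext() (
  out=""
  for host in "$@"; do
    # Prefix a comma only when out already holds an entry.
    out="${out:+$out,}DNS:$host"
  done
  printf 'subjectAltName=%s\n' "$out"
)

# Example for the DR region (hostnames taken from this guide):
#   -addext "$(san_ext minio-dr1.example.com minio-dr2.example.com \
#                      minio-dr3.example.com minio-dr4.example.com)"
```
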
Configure MinIO environment for primary region

Set up the primary MinIO cluster configuration with strong credentials and distributed storage. Save the following as /etc/minio/minio.conf on every primary node, replacing the example credentials with your own.

MINIO_ROOT_USER=minioadmin
MINIO_ROOT_PASSWORD=SuperSecureMinIOPassword123!
MINIO_VOLUMES="https://minio1.example.com:9000/opt/minio/data1 https://minio2.example.com:9000/opt/minio/data2 https://minio3.example.com:9000/opt/minio/data3 https://minio4.example.com:9000/opt/minio/data4"
MINIO_OPTS="--console-address :9001 --certs-dir /etc/minio/certs"
MINIO_PROMETHEUS_AUTH_TYPE=public

Then restrict access to the file, since it contains the root credentials:

sudo chown minio-user:minio-user /etc/minio/minio.conf
sudo chmod 640 /etc/minio/minio.conf

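A typo in the environment file usually only shows up when the service fails to start. The following sketch checks the file first. It assumes the file contains only simple KEY=value lines that also parse as shell, and it relies on MinIO's documented minimum lengths (3 characters for the root user, 8 for the root password). check_minio_env is a hypothetical helper name.

```shell
# check_minio_env FILE
# Sanity-checks a MinIO environment file; prints the first problem to stderr.
check_minio_env() (
  # The function body is a subshell, so sourced variables do not leak out.
  unset MINIO_VOLUMES MINIO_ROOT_USER MINIO_ROOT_PASSWORD
  . "$1" || exit 1
  if [ -z "$MINIO_VOLUMES" ]; then
    echo "MINIO_VOLUMES is empty" >&2
    exit 1
  fi
  if [ "${#MINIO_ROOT_USER}" -lt 3 ]; then
    echo "MINIO_ROOT_USER is shorter than 3 characters" >&2
    exit 1
  fi
  if [ "${#MINIO_ROOT_PASSWORD}" -lt 8 ]; then
    echo "MINIO_ROOT_PASSWORD is shorter than 8 characters" >&2
    exit 1
  fi
)

# Example:
#   sudo sh -c '. ./check.sh; check_minio_env /etc/minio/minio.conf' && echo "config looks sane"
```
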
Create systemd service for MinIO primary cluster

Configure MinIO to run as a systemd service with automatic restart on failure. Save the unit as /etc/systemd/system/minio.service on every node in both regions.

[Unit]
Description=MinIO Object Storage Server
Documentation=https://docs.min.io
Wants=network-online.target
After=network-online.target
AssertFileIsExecutable=/usr/local/bin/minio

[Service]
WorkingDirectory=/usr/local/
User=minio-user
Group=minio-user
ProtectProc=invisible
EnvironmentFile=/etc/minio/minio.conf
ExecStartPre=/bin/bash -c "if [ -z \"${MINIO_VOLUMES}\" ]; then echo \"Variable MINIO_VOLUMES not set in /etc/minio/minio.conf\"; exit 1; fi"
ExecStart=/usr/local/bin/minio server $MINIO_OPTS $MINIO_VOLUMES
Restart=always
LimitNOFILE=65536
TimeoutStopSec=infinity
SendSIGKILL=no

[Install]
WantedBy=multi-user.target

Configure MinIO disaster recovery region

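Set up the secondary MinIO cluster in your disaster recovery region with different endpoints. Save the following as /etc/minio/minio.conf on every disaster recovery node and apply the same ownership and permissions as on the primary nodes.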

MINIO_ROOT_USER=minioadmin
MINIO_ROOT_PASSWORD=SuperSecureMinIOPassword123!
MINIO_VOLUMES="https://minio-dr1.example.com:9000/opt/minio/data1 https://minio-dr2.example.com:9000/opt/minio/data2 https://minio-dr3.example.com:9000/opt/minio/data3 https://minio-dr4.example.com:9000/opt/minio/data4"
MINIO_OPTS="--console-address :9001 --certs-dir /etc/minio/certs"
MINIO_PROMETHEUS_AUTH_TYPE=public

Start MinIO services on both regions

Enable and start MinIO on all nodes in both the primary and disaster recovery regions.

sudo systemctl daemon-reload
sudo systemctl enable minio
sudo systemctl start minio
sudo systemctl status minio

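systemctl status only confirms that the process started, and a distributed cluster can take a moment before it answers requests. A small retry helper can wait for the health endpoint before the next step. This is a sketch: wait_until is a hypothetical helper, and the example assumes MinIO's /minio/health/live endpoint with -k used only because the certificates in this guide are self-signed.

```shell
# wait_until TRIES DELAY CMD [ARGS...]
# Runs CMD until it succeeds, at most TRIES times, sleeping DELAY seconds
# between attempts. Returns 0 on success, 1 if every attempt failed.
wait_until() (
  tries=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$tries" ]; then
      exit 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  exit 0
)

# Example: up to 30 attempts, 2 seconds apart.
#   wait_until 30 2 curl -fsk https://minio1.example.com:9000/minio/health/live
```
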
Install and configure MinIO client

Install the MinIO client to manage both clusters and configure replication.

wget https://dl.min.io/client/mc/release/linux-amd64/mc
sudo chmod +x mc
sudo mv mc /usr/local/bin/
mc alias set primary https://minio1.example.com:9000 minioadmin SuperSecureMinIOPassword123!
mc alias set disaster-recovery https://minio-dr1.example.com:9000 minioadmin SuperSecureMinIOPassword123!

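With the self-signed certificates from this guide, mc will likely refuse the connection until it trusts them. mc reads additional CA certificates from ~/.mc/certs/CAs, so one option is to copy each cluster's public.crt there under a distinct name. This is a sketch; trust_cert_for_mc and the file names in the example are hypothetical, and it assumes a copy of each public.crt is available on the machine running mc.

```shell
# trust_cert_for_mc CERT NAME [MC_DIR]
# Copies CERT into mc's CA directory as NAME (MC_DIR defaults to ~/.mc).
trust_cert_for_mc() (
  cert=$1
  name=$2
  mc_dir=${3:-$HOME/.mc}
  mkdir -p "$mc_dir/certs/CAs" || exit 1
  cp "$cert" "$mc_dir/certs/CAs/$name"
)

# Example (local file names are placeholders):
#   trust_cert_for_mc ./primary-public.crt primary.crt
#   trust_cert_for_mc ./dr-public.crt disaster-recovery.crt
```
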
Create buckets and configure cross-region replication

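Set up buckets on both clusters and configure automated replication from primary to disaster recovery. Replication requires versioning on both the source and the target bucket, so enable it before adding the rules. Recent mc releases expect --remote-bucket to be the full URL of the target bucket with credentials embedded, not an alias path.

mc mb primary/production-data
mc mb primary/backups
mc mb disaster-recovery/production-data
mc mb disaster-recovery/backups

mc version enable primary/production-data
mc version enable primary/backups
mc version enable disaster-recovery/production-data
mc version enable disaster-recovery/backups

mc replicate add primary/production-data \
  --remote-bucket 'https://minioadmin:SuperSecureMinIOPassword123!@minio-dr1.example.com:9000/production-data'
mc replicate add primary/backups \
  --remote-bucket 'https://minioadmin:SuperSecureMinIOPassword123!@minio-dr1.example.com:9000/backups'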

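When the replication target is passed as a URL with embedded credentials, as recent mc releases expect, characters such as @, / or : in a password would break the URL unless they are percent-encoded. A minimal sketch for ASCII credentials follows; urlencode is a hypothetical helper name.

```shell
# urlencode STRING
# Percent-encodes every character except unreserved ones (A-Z a-z 0-9 . ~ _ -).
# Intended for ASCII input such as generated passwords.
urlencode() (
  LC_ALL=C
  s=$1
  out=""
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Example:
#   PASS=$(urlencode 'p@ss/w:rd')
#   mc replicate add primary/production-data \
#     --remote-bucket "https://minioadmin:${PASS}@minio-dr1.example.com:9000/production-data"
```
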
Configure bucket versioning and lifecycle policies

Versioning protects against accidental overwrites and deletes, and bucket replication depends on it. The enable command is safe to re-run on a bucket that is already versioned. Lifecycle policies then clean up old versions automatically.

mc version enable primary/production-data
mc version enable primary/backups
mc version enable disaster-recovery/production-data
mc version enable disaster-recovery/backups

Save the following lifecycle policy as /tmp/lifecycle-policy.json:

{
  "Rules": [
    {
      "ID": "DeleteOldVersions",
      "Status": "Enabled",
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 30
      }
    },
    {
      "ID": "AbortIncompleteMultipartUploads",
      "Status": "Enabled",
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 7
      }
    }
  ]
}

Import the policy on both clusters:

mc ilm import primary/production-data < /tmp/lifecycle-policy.json
mc ilm import disaster-recovery/production-data < /tmp/lifecycle-policy.json

Set up Prometheus monitoring for MinIO clusters

Configure Prometheus to monitor both MinIO clusters with custom metrics collection.

wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xvfz prometheus-*.tar.gz
sudo mv prometheus-2.45.0.linux-amd64/prometheus /usr/local/bin/
sudo mv prometheus-2.45.0.linux-amd64/promtool /usr/local/bin/
sudo useradd -r prometheus
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus

Configure Prometheus for MinIO monitoring

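Set up Prometheus configuration to scrape metrics from both MinIO clusters. Save the following as /etc/prometheus/prometheus.yml. The insecure_skip_verify: true setting is there because the certificates in this guide are self-signed; remove it if you use CA-signed certificates.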

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "minio_alerts.yml"

scrape_configs:
  - job_name: 'minio-primary'
    metrics_path: /minio/v2/metrics/cluster
    scheme: https
    tls_config:
      insecure_skip_verify: true
    static_configs:
      - targets: ['minio1.example.com:9000']
    relabel_configs:
      - source_labels: [__address__]
        target_label: cluster
        replacement: 'primary'

  - job_name: 'minio-disaster-recovery'
    metrics_path: /minio/v2/metrics/cluster
    scheme: https
    tls_config:
      insecure_skip_verify: true
    static_configs:
      - targets: ['minio-dr1.example.com:9000']
    relabel_configs:
      - source_labels: [__address__]
        target_label: cluster
        replacement: 'disaster-recovery'

Then hand the file to the prometheus user:

sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml

Create MinIO alerting rules

Configure Prometheus alerting rules to monitor replication health and cluster status. Save the following as /etc/prometheus/minio_alerts.yml.

# Metric names differ between MinIO releases; confirm each one against
# the output of your cluster's metrics endpoint before relying on these rules.
groups:
  - name: minio_alerts
    rules:
      - alert: MinIOClusterDown
        expr: up{job=~"minio.*"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "MinIO cluster {{ $labels.cluster }} is down"
          description: "MinIO cluster {{ $labels.cluster }} has been down for more than 2 minutes."

      - alert: MinIOReplicationFailure
        expr: increase(minio_replication_failed_bytes_total[5m]) > 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "MinIO replication failure detected"
          description: "MinIO replication has failed for cluster {{ $labels.cluster }}."

      - alert: MinIOHighDiskUsage
        expr: (minio_cluster_disk_total_bytes - minio_cluster_disk_free_bytes) / minio_cluster_disk_total_bytes * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MinIO cluster disk usage is high"
          description: "MinIO cluster {{ $labels.cluster }} disk usage is above 80%."

      - alert: MinIOReplicationLag
        expr: time() - minio_replication_last_activity_time > 3600
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MinIO replication lag detected"
          description: "MinIO replication for cluster {{ $labels.cluster }} has been inactive for over 1 hour."

Then set ownership of the rules file:

sudo chown prometheus:prometheus /etc/prometheus/minio_alerts.yml

Create Prometheus systemd service

Configure Prometheus to run as a system service with automatic restart capabilities. Save the unit as /etc/systemd/system/prometheus.service.

[Unit]
Description=Prometheus Monitoring System
Documentation=https://prometheus.io/docs/
After=network.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.listen-address=0.0.0.0:9090 \
  --web.enable-lifecycle
ExecReload=/bin/kill -HUP $MAINPID
Restart=always

[Install]
WantedBy=multi-user.target

Reload systemd, then enable and start Prometheus:

sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus

Install and configure Grafana for visualization

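Set up Grafana to create dashboards for monitoring MinIO cluster health and replication status.

On Ubuntu and Debian (apt-key is deprecated, so the repository key goes into a dedicated keyring):

sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana

On AlmaLinux and Rocky Linux:

sudo dnf install -y https://dl.grafana.com/oss/release/grafana-10.2.0-1.x86_64.rpm

On all distributions, enable and start the service:

sudo systemctl enable grafana-server
sudo systemctl start grafana-server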

Create disaster recovery testing automation

Set up automated scripts to test disaster recovery procedures and validate replication integrity. Save the following as /opt/minio/dr-test.sh. The primary and disaster-recovery mc aliases must be configured for the user that runs the script.

#!/bin/bash
# MinIO Disaster Recovery Test Script

set -e

TEST_BUCKET="production-data"
TEST_FILE="dr-test-$(date +%Y%m%d-%H%M%S).txt"
TEST_CONTENT="Disaster recovery test at $(date)"
MAX_WAIT=300

echo "Starting disaster recovery test..."

# Create test file in primary cluster
echo "$TEST_CONTENT" | mc pipe "primary/$TEST_BUCKET/$TEST_FILE"
echo "Test file uploaded to primary cluster: $TEST_FILE"

# Wait for replication to DR cluster
echo "Waiting for replication to disaster recovery cluster..."
start_time=$(date +%s)
while true; do
    if mc stat "disaster-recovery/$TEST_BUCKET/$TEST_FILE" >/dev/null 2>&1; then
        echo "File replicated successfully to DR cluster"
        break
    fi

    current_time=$(date +%s)
    if [ $((current_time - start_time)) -gt "$MAX_WAIT" ]; then
        echo "ERROR: Replication timeout after $MAX_WAIT seconds"
        exit 1
    fi

    sleep 5
done
# Measure after the loop so the value is set even when the first check succeeds
elapsed=$(( $(date +%s) - start_time ))

# Verify file integrity
PRIMARY_HASH=$(mc cat "primary/$TEST_BUCKET/$TEST_FILE" | sha256sum | cut -d' ' -f1)
DR_HASH=$(mc cat "disaster-recovery/$TEST_BUCKET/$TEST_FILE" | sha256sum | cut -d' ' -f1)

if [ "$PRIMARY_HASH" = "$DR_HASH" ]; then
    echo "SUCCESS: File integrity verified across clusters"
else
    echo "ERROR: File integrity check failed"
    exit 1
fi

# Cleanup test files
mc rm "primary/$TEST_BUCKET/$TEST_FILE"
# The DR copy may already be gone if the delete was replicated, so do not fail here
mc rm "disaster-recovery/$TEST_BUCKET/$TEST_FILE" || true

echo "Disaster recovery test completed successfully"
echo "Replication time: $elapsed seconds"

Make the script executable and owned by the MinIO user:

sudo chmod +x /opt/minio/dr-test.sh
sudo chown minio-user:minio-user /opt/minio/dr-test.sh

Schedule automated disaster recovery testing

Create a systemd timer to run disaster recovery tests automatically every 6 hours.

Save the service unit as /etc/systemd/system/minio-dr-test.service:

[Unit]
Description=MinIO Disaster Recovery Test
After=minio.service

[Service]
Type=oneshot
User=minio-user
Group=minio-user
WorkingDirectory=/opt/minio
ExecStart=/opt/minio/dr-test.sh
StandardOutput=journal
StandardError=journal

Save the timer unit as /etc/systemd/system/minio-dr-test.timer:

[Unit]
Description=Run MinIO Disaster Recovery Test
Requires=minio-dr-test.service

[Timer]
OnCalendar=*-*-* 00,06,12,18:00:00
Persistent=true

[Install]
WantedBy=timers.target

Reload systemd, then enable and start the timer:

sudo systemctl daemon-reload
sudo systemctl enable minio-dr-test.timer
sudo systemctl start minio-dr-test.timer

Configure MinIO access policies for applications

Create specific access policies for applications to use the MinIO clusters securely. Save the following policy as /tmp/app-policy.json.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::production-data",
        "arn:aws:s3:::production-data/*",
        "arn:aws:s3:::backups",
        "arn:aws:s3:::backups/*"
      ]
    }
  ]
}

Create the policy, add the application user, and attach the policy on both clusters:

mc admin policy create primary app-access-policy /tmp/app-policy.json
mc admin policy create disaster-recovery app-access-policy /tmp/app-policy.json

mc admin user add primary app-user SecureAppPassword123!
mc admin user add disaster-recovery app-user SecureAppPassword123!

mc admin policy attach primary app-access-policy --user app-user
mc admin policy attach disaster-recovery app-access-policy --user app-user

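The passwords shown in this guide are placeholders. One way to avoid reusing them is to generate a random secret and pass the same value to both clusters. This is a sketch: gen_secret is a hypothetical helper, it reads 1024 random bytes and keeps only letters and digits, so it is meant for lengths up to roughly 100 characters.

```shell
# gen_secret [LENGTH]
# Prints a random alphanumeric string of LENGTH characters (default 32).
gen_secret() (
  len=${1:-32}
  # Filter random bytes down to letters and digits, then trim to length.
  pool=$(head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9')
  printf '%s\n' "$pool" | cut -c "1-$len"
)

# Example:
#   APP_SECRET=$(gen_secret 32)
#   mc admin user add primary app-user "$APP_SECRET"
#   mc admin user add disaster-recovery app-user "$APP_SECRET"
```

Store the generated value in your secrets manager, since it cannot be read back from MinIO afterwards.
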
Set up automated failover procedures

Create scripts for automated failover to the disaster recovery cluster during outages. Save the following as /opt/minio/failover.sh.

#!/bin/bash
# MinIO Failover Script

set -e

PRIMARY_ENDPOINT="https://minio1.example.com:9000"
DR_ENDPOINT="https://minio-dr1.example.com:9000"
HEALTH_CHECK_TIMEOUT=10

check_cluster_health() {
    local endpoint=$1
    local cluster_name=$2

    if timeout $HEALTH_CHECK_TIMEOUT mc admin info "$cluster_name" >/dev/null 2>&1; then
        return 0
    else
        return 1
    fi
}

echo "Checking primary cluster health..."
if check_cluster_health "$PRIMARY_ENDPOINT" "primary"; then
    echo "Primary cluster is healthy - no failover needed"
    exit 0
fi

echo "Primary cluster unhealthy - initiating failover to DR cluster"

# Check DR cluster health
if ! check_cluster_health "$DR_ENDPOINT" "disaster-recovery"; then
    echo "ERROR: Disaster recovery cluster is also unhealthy!"
    exit 1
fi

# Update application configuration to use DR cluster
echo "Updating application endpoints to DR cluster..."
# This would typically update load balancer configuration
# or application configuration files

# Log failover event
echo "$(date): Failover completed - applications now using DR cluster" >> /var/log/minio-failover.log

# Send notification (configure with your notification system)
echo "Failover to disaster recovery cluster completed at $(date)" | \
    mail -s "MinIO Failover Alert" admin@example.com 2>/dev/null || true

echo "Failover procedure completed successfully"

Make the script executable and create its log file:

sudo chmod +x /opt/minio/failover.sh
sudo chown minio-user:minio-user /opt/minio/failover.sh
sudo touch /var/log/minio-failover.log
sudo chown minio-user:minio-user /var/log/minio-failover.log

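A single failed health check can be a transient network blip, and failing over on it may cause unnecessary switches. One common mitigation is to require several consecutive failures first. The sketch below is not part of the failover script above; confirm_down and the probe function in the example are hypothetical names.

```shell
# confirm_down NEEDED DELAY CMD [ARGS...]
# Returns 0 only if CMD fails NEEDED times in a row, waiting DELAY seconds
# between attempts. Returns 1 as soon as CMD succeeds once.
confirm_down() (
  needed=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$needed" ]; do
    if "$@"; then
      exit 1              # probe succeeded: treat the cluster as up
    fi
    i=$((i + 1))
    if [ "$i" -lt "$needed" ]; then
      sleep "$delay"
    fi
  done
  exit 0
)

# Example: fail over only after 3 failed checks, 10 seconds apart.
#   probe_primary() { timeout 10 mc admin info primary >/dev/null 2>&1; }
#   if confirm_down 3 10 probe_primary; then
#     echo "Primary confirmed down - initiating failover"
#   fi
```
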
Verify your setup

Test your disaster recovery configuration to ensure everything works correctly.

# Check MinIO cluster status
mc admin info primary
mc admin info disaster-recovery

# Verify replication configuration
mc replicate ls primary/production-data

# Test replication with a sample file
echo "Test data" | mc pipe primary/production-data/test.txt
sleep 30
mc cat disaster-recovery/production-data/test.txt

# Check Prometheus metrics (-k skips verification of the self-signed certificate)
curl -sk https://minio1.example.com:9000/minio/v2/metrics/cluster | grep minio_replication

# Run disaster recovery test
sudo -u minio-user /opt/minio/dr-test.sh

# Check systemd timer status
sudo systemctl status minio-dr-test.timer

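The grep above prints raw metric lines. To turn them into a single number for a script or a quick check, the Prometheus text format can be summed with awk. This is a sketch under assumptions: metric_sum is a hypothetical helper, it expects lines without trailing timestamps, and the metric name in the example is the one used in this guide's alert rules, which may differ in your MinIO release.

```shell
# metric_sum NAME
# Reads Prometheus text-format metrics on stdin and prints the sum of all
# series named NAME. Exits non-zero if no such series is present.
metric_sum() {
  awk -v name="$1" '
    /^#/ { next }                       # skip HELP and TYPE comment lines
    NF >= 2 {
      series = $1
      brace = index(series, "{")        # strip the {label="..."} part
      if (brace > 0) series = substr(series, 1, brace - 1)
      if (series == name) { sum += $NF; found = 1 }
    }
    END {
      if (!found) exit 1
      printf "%.0f\n", sum
    }
  '
}

# Example:
#   curl -sk https://minio1.example.com:9000/minio/v2/metrics/cluster \
#     | metric_sum minio_replication_failed_bytes_total
```
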
Common issues

  • Symptom: Replication not working
    Cause: Network connectivity or SSL certificate issues
    Fix: Check mc admin trace primary and verify SSL certificates

  • Symptom: High replication lag
    Cause: Network bandwidth limitations or disk I/O bottlenecks
    Fix: Monitor bandwidth with iftop and disk I/O with iotop

  • Symptom: Prometheus not scraping metrics
    Cause: Firewall blocking port 9000 or SSL certificate validation
    Fix: Check firewall rules and use insecure_skip_verify: true in Prometheus config

  • Symptom: DR test script failing
    Cause: Incorrect MinIO client aliases or permission issues
    Fix: Verify aliases with mc alias list and check script permissions

  • Symptom: Cluster nodes not joining
    Cause: Time synchronization issues or incorrect MINIO_VOLUMES configuration
    Fix: Sync time with chrony and verify all endpoints are reachable
