Implement Consul multi-datacenter replication with WAN federation

Advanced 45 min Jun 09, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up Consul WAN federation to connect multiple datacenters for cross-datacenter service discovery, with ACL token replication, health monitoring, and prepared-query failover.

Prerequisites

  • At least three Consul servers per datacenter (the configurations below set bootstrap_expect = 3), in different locations
  • Root or sudo access
  • Network connectivity between datacenters
  • Basic understanding of Consul architecture

What this solves

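Consul WAN federation connects multiple Consul datacenters for cross-datacenter service discovery and communication. Each datacenter keeps its own service catalog and key/value store; federation forwards cross-datacenter requests between servers and replicates ACLs from the primary datacenter. This setup provides geographic redundancy, disaster recovery capabilities, and centralized service mesh management across distributed infrastructure.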

Step-by-step configuration

Install Consul on all datacenter nodes

Install Consul on each node that will participate in the WAN federation. We'll use the official HashiCorp repository for consistent versions.

# Ubuntu 24.04 / Debian 12
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install -y consul

# AlmaLinux 9 / Rocky Linux 9
sudo dnf install -y dnf-plugins-core
sudo dnf config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo
sudo dnf install -y consul

Create Consul user and directories

Set up the required user account and directory structure for Consul to run securely.

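# Create the user only if the package has not already created it
id consul >/dev/null 2>&1 || sudo useradd --system --home /etc/consul.d --shell /bin/false consul
sudo mkdir -p /opt/consul /etc/consul.d /var/lib/consul
sudo chown -R consul:consul /opt/consul /etc/consul.d /var/lib/consul
# The configuration holds the gossip key and ACL tokens, so deny access to other users
sudo chmod 750 /opt/consul /etc/consul.d /var/lib/consul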

Generate encryption keys and certificates

Create the gossip encryption key and generate TLS certificates for secure communication between datacenters.

consul keygen

Save this encryption key for use in all datacenter configurations. Next, generate TLS certificates:

consul tls ca create
consul tls cert create -server -dc dc1
consul tls cert create -server -dc dc2
Note: Replace dc1 and dc2 with your actual datacenter names. Copy the CA certificate to all nodes and the appropriate server certificates to each datacenter.
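
Before the gossip key is pasted into every configuration, it can be worth checking that it survived copying intact. The sketch below assumes GNU coreutils (base64, wc, tr) and assumes the key should decode to 32 bytes, which is the size consul keygen generates. The function name is hypothetical.

```shell
# Succeeds only if the argument is Base64 that decodes to exactly 32 bytes,
# the size of the keys that `consul keygen` generates (assumption stated above).
is_valid_gossip_key() {
  local key_bytes
  key_bytes=$(printf '%s' "$1" | base64 -d 2>/dev/null | wc -c | tr -d '[:space:]')
  [ "$key_bytes" = "32" ]
}
```

For example, is_valid_gossip_key "$(consul keygen)" should succeed, while the unedited placeholder string from the configuration examples fails.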

Configure primary datacenter (DC1)

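Create the Consul configuration for the primary datacenter, which is the source of truth for ACL replication. Save it as /etc/consul.d/consul.hcl on each dc1 server, adjusting node_name, bind_addr, and retry_join for each node.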

datacenter = "dc1"
data_dir = "/var/lib/consul"
log_level = "INFO"
node_name = "consul-dc1-01"
bind_addr = "203.0.113.10"
client_addr = "0.0.0.0"

server = true
bootstrap_expect = 3

ui_config {
  enabled = true
}

connect {
  enabled = true
}

encrypt = "your-gossip-encryption-key-here"

tls {
  defaults {
    ca_file = "/etc/consul.d/consul-agent-ca.pem"
    cert_file = "/etc/consul.d/dc1-server-consul-0.pem"
    key_file = "/etc/consul.d/dc1-server-consul-0-key.pem"
    verify_incoming = true
    verify_outgoing = true
  }
  internal_rpc {
    verify_server_hostname = true
  }
}

acl = {
  enabled = true
  default_policy = "deny"
  enable_token_persistence = true
  tokens = {
    initial_management = "your-bootstrap-token-here"
  }
}

primary_datacenter = "dc1"

retry_join = ["203.0.113.11", "203.0.113.12"]
retry_join_wan = ["203.0.113.20", "203.0.113.21", "203.0.113.22"]

ports {
  grpc = 8502
  grpc_tls = 8503
}

performance {
  raft_multiplier = 1
}
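
The configuration above contains placeholder values such as "your-gossip-encryption-key-here" that must be replaced before the agent starts. A minimal sketch of a pre-start check follows; the function name is hypothetical and the pattern only covers placeholders written in the "your-...-here" style used in this guide.

```shell
# Prints every line of a config file that still holds a "your-...-here"
# placeholder, prefixed with its line number.
# Returns 0 if at least one placeholder remains, non-zero otherwise.
find_placeholders() {
  grep -nE '"your-[a-z-]+-here"' "$1"
}
```

A possible use is find_placeholders /etc/consul.d/consul.hcl before the first start. Running consul validate /etc/consul.d/ afterwards checks the syntax of the same directory.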

Configure secondary datacenter (DC2)

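Configure the secondary datacenter to replicate from the primary and participate in WAN federation. Save it as /etc/consul.d/consul.hcl on each dc2 server, adjusting node_name, bind_addr, and retry_join for each node.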

datacenter = "dc2"
data_dir = "/var/lib/consul"
log_level = "INFO"
node_name = "consul-dc2-01"
bind_addr = "203.0.113.20"
client_addr = "0.0.0.0"

server = true
bootstrap_expect = 3

ui_config {
  enabled = true
}

connect {
  enabled = true
}

encrypt = "your-gossip-encryption-key-here"

tls {
  defaults {
    ca_file = "/etc/consul.d/consul-agent-ca.pem"
    cert_file = "/etc/consul.d/dc2-server-consul-0.pem"
    key_file = "/etc/consul.d/dc2-server-consul-0-key.pem"
    verify_incoming = true
    verify_outgoing = true
  }
  internal_rpc {
    verify_server_hostname = true
  }
}

acl = {
  enabled = true
  default_policy = "deny"
  enable_token_persistence = true
  enable_token_replication = true
  tokens = {
    replication = "your-replication-token-here"
  }
}

primary_datacenter = "dc1"

retry_join = ["203.0.113.21", "203.0.113.22"]
retry_join_wan = ["203.0.113.10", "203.0.113.11", "203.0.113.12"]

ports {
  grpc = 8502
  grpc_tls = 8503
}

performance {
  raft_multiplier = 1
}
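
Both datacenters must share the same gossip encryption key, so comparing the two files before deployment can catch a copy error early. The helpers below are a rough sketch with hypothetical names. They read only top-level key = "value" lines with a simple sed pattern and are not a real HCL parser.

```shell
# Prints the quoted value of a top-level `key = "value"` setting in an HCL file.
# Indented (nested) settings are ignored because the pattern anchors at column 0.
hcl_string() {
  sed -n "s/^$1[[:space:]]*=[[:space:]]*\"\\(.*\\)\"[[:space:]]*\$/\\1/p" "$2" | head -n 1
}

# Succeeds when both files define the key and carry the same value for it.
same_setting() {
  local left right
  left=$(hcl_string "$1" "$2")
  right=$(hcl_string "$1" "$3")
  [ -n "$left" ] && [ "$left" = "$right" ]
}
```

With local copies of both files (the file names here are illustrative), same_setting encrypt dc1-consul.hcl dc2-consul.hcl should succeed, while same_setting datacenter on the same pair should fail.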

Set up systemd service files

Create a systemd unit file at /etc/systemd/system/consul.service to manage Consul as a system service with a raised open-file limit.

[Unit]
Description=Consul
Documentation=https://www.consul.io/
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/consul.d/consul.hcl

[Service]
Type=notify
User=consul
Group=consul
ExecStart=/usr/bin/consul agent -config-dir=/etc/consul.d/
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
LimitNOFILE=65536
TimeoutStopSec=30

[Install]
WantedBy=multi-user.target

Configure firewall rules

Open the required ports for Consul communication between datacenters.

# Ubuntu / Debian (ufw)
sudo ufw allow 8300/tcp comment "Consul server RPC"
sudo ufw allow 8301/tcp comment "Consul serf LAN"
sudo ufw allow 8301/udp comment "Consul serf LAN"
sudo ufw allow 8302/tcp comment "Consul serf WAN"
sudo ufw allow 8302/udp comment "Consul serf WAN"
sudo ufw allow 8500/tcp comment "Consul HTTP API"
sudo ufw allow 8501/tcp comment "Consul HTTPS API"
sudo ufw allow 8502/tcp comment "Consul gRPC"
sudo ufw allow 8503/tcp comment "Consul gRPC TLS"
sudo ufw reload

# AlmaLinux / Rocky Linux (firewalld)
sudo firewall-cmd --permanent --add-port=8300/tcp
sudo firewall-cmd --permanent --add-port=8301/tcp
sudo firewall-cmd --permanent --add-port=8301/udp
sudo firewall-cmd --permanent --add-port=8302/tcp
sudo firewall-cmd --permanent --add-port=8302/udp
sudo firewall-cmd --permanent --add-port=8500/tcp
sudo firewall-cmd --permanent --add-port=8501/tcp
sudo firewall-cmd --permanent --add-port=8502/tcp
sudo firewall-cmd --permanent --add-port=8503/tcp
sudo firewall-cmd --reload
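
The same rules can be generated from a single port list, which makes them easier to review or to adapt per host. This is only a sketch: the function names are hypothetical, and the commands are printed rather than executed so they can be inspected first.

```shell
# Ports opened by the rules in this step, as port/protocol pairs.
consul_ports() {
  printf '%s\n' 8300/tcp 8301/tcp 8301/udp 8302/tcp 8302/udp \
                8500/tcp 8501/tcp 8502/tcp 8503/tcp
}

# Prints (does not run) one firewall command per port for the chosen tool.
firewall_commands() {
  local prefix port
  case "$1" in
    ufw)       prefix="sudo ufw allow " ;;
    firewalld) prefix="sudo firewall-cmd --permanent --add-port=" ;;
    *)         echo "unknown firewall tool: $1" >&2; return 1 ;;
  esac
  for port in $(consul_ports); do
    echo "${prefix}${port}"
  done
}
```

For example, firewall_commands firewalld prints nine firewall-cmd lines. Between datacenters, only server RPC (8300/tcp) and serf WAN (8302 tcp/udp) need to be reachable; serf LAN (8301) stays inside each datacenter.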

Start Consul services

Enable and start Consul on all nodes, beginning with the servers in the primary datacenter.

sudo systemctl daemon-reload
sudo systemctl enable consul
sudo systemctl start consul
sudo systemctl status consul
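
Once the servers are up, a leader has to be elected before the ACL steps can run. The sketch below polls the /v1/status/leader endpoint, which returns a JSON string such as "203.0.113.10:8300", or "" while there is no leader. It assumes curl is installed and the HTTP API listens on 127.0.0.1:8500. The function names are hypothetical.

```shell
# Succeeds when the body returned by /v1/status/leader names a leader.
leader_elected() {
  case "$1" in
    '""'|'') return 1 ;;   # empty JSON string or no response: no leader yet
    *)       return 0 ;;
  esac
}

# Polls the local agent every 2 seconds until a leader appears
# or the given number of attempts (default 30) runs out.
wait_for_leader() {
  local attempts body
  attempts="${1:-30}"
  while [ "$attempts" -gt 0 ]; do
    body=$(curl -s http://127.0.0.1:8500/v1/status/leader) && leader_elected "$body" && return 0
    attempts=$((attempts - 1))
    sleep 2
  done
  return 1
}
```

Running wait_for_leader 15 on a dc1 server before moving on gives the cluster about 30 seconds to elect a leader.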

Bootstrap ACL system

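Initialize the ACL system on the primary datacenter and create the replication token. If acl.tokens.initial_management is set in the server configuration, as in the dc1 example above, Consul creates that management token when the first leader is elected and a later consul acl bootstrap is typically rejected. In that case, skip the bootstrap command and export the configured token instead.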

consul acl bootstrap

Save the bootstrap token and create a replication token for the secondary datacenter:

export CONSUL_HTTP_TOKEN="your-bootstrap-token-here"
consul acl policy create -name "replication" -rules '
acl = "write"
operator = "write"
service_prefix "" { policy = "read" intentions = "read" }
node_prefix "" { policy = "write" }
# Consul Enterprise only: namespace_prefix "" { policy = "read" }
'
consul acl token create -description "ACL Token Replication" -policy-name "replication"
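
The secondary datacenter needs the SecretID of the token created above. A small helper can pull it out of the command output instead of copying it by hand. This is a sketch that assumes the default text output, where the secret appears on a line starting with SecretID:. The function name is hypothetical.

```shell
# Reads `consul acl token create` (or `consul acl bootstrap`) output on stdin
# and prints the value of the first SecretID field.
extract_secret_id() {
  awk '$1 == "SecretID:" { print $2; exit }'
}
```

For example: REPLICATION_TOKEN=$(consul acl token create -description "ACL Token Replication" -policy-name "replication" | extract_secret_id)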

Configure ACL token replication

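Set up automatic ACL token replication from the primary to the secondary datacenter. The token created in the previous step is the replication token; note its SecretID. On each dc2 server, you can apply it at runtime:

consul acl set-agent-token replication "your-replication-token-here"

Alternatively, set the token as acl.tokens.replication in the secondary datacenter configuration and restart Consul: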

sudo systemctl restart consul

Join datacenters via WAN

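Connect the datacenters using WAN federation to enable cross-datacenter service discovery. The retry_join_wan settings in the configurations above make the servers join automatically; to join manually from a dc1 server, run: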

consul join -wan 203.0.113.20

Verify the WAN federation status:

consul members -wan
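
To turn that listing into a quick per-datacenter summary, the output can be piped through a small filter. This sketch relies on WAN member names having the form <node>.<datacenter> and on the third column being the member status. The function name is hypothetical.

```shell
# Reads `consul members -wan` output on stdin and prints "<datacenter> <count>"
# for every datacenter with at least one member in the "alive" state.
alive_servers_per_dc() {
  awk 'NR > 1 && $3 == "alive" {
         n = split($1, parts, ".")   # WAN names look like <node>.<datacenter>
         count[parts[n]]++
       }
       END { for (dc in count) print dc, count[dc] }' | sort
}
```

Usage: consul members -wan | alive_servers_per_dc. With three healthy servers in each datacenter, the expected result is one line per datacenter with a count of 3.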

Configure health monitoring

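Set up cross-datacenter health checks to monitor the federation. Save the definition below as a JSON file in /etc/consul.d/ and reload Consul. The first check is written for a dc1 server and passes while at least one dc2 server is alive in the WAN pool. Script checks only run when enable_local_script_checks = true is set in the agent configuration, and with default_policy = "deny" the consul command needs a token with node read permission (for example through CONSUL_HTTP_TOKEN in the service environment). The second check passes the token in a header rather than in the URL.

{
  "checks": [
    {
      "id": "wan-connectivity",
      "name": "WAN Connectivity Check",
      "args": ["/bin/sh", "-c", "consul members -wan | grep dc2 | grep -q alive"],
      "interval": "30s",
      "timeout": "10s"
    },
    {
      "id": "acl-replication",
      "name": "ACL Replication Status",
      "http": "http://localhost:8500/v1/acl/replication",
      "header": { "X-Consul-Token": ["your-token-here"] },
      "interval": "60s",
      "timeout": "10s"
    }
  ]
}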
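
An HTTP check against /v1/acl/replication passes whenever the API answers with a success status, even if replication itself has stopped. A script-based variant can inspect the response body instead. The sketch below looks for the Running field in the JSON. It is a plain text match rather than a JSON parser, and the function name is hypothetical.

```shell
# Reads the JSON body of /v1/acl/replication on stdin and succeeds only when
# it reports "Running": true.
replication_running() {
  tr -d '[:space:]' | grep -q '"Running":true'
}
```

On a dc2 server this could be used as: curl -s --header "X-Consul-Token: $CONSUL_HTTP_TOKEN" http://127.0.0.1:8500/v1/acl/replication | replication_running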

Configure automatic failover

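Set up a prepared query for automatic service failover between datacenters. Prepared queries are managed through the HTTP API; run this on a dc1 server:

curl --request POST --header "X-Consul-Token: your-token-here" --data '{"Name": "web-failover", "Service": {"Service": "web", "OnlyPassing": true, "Failover": {"Datacenters": ["dc2"]}}}' http://127.0.0.1:8500/v1/query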

Create a sample service registration for testing:

{
  "service": {
    "name": "web",
    "tags": ["v1"],
    "port": 80,
    "check": {
      "http": "http://localhost:80/health",
      "interval": "10s"
    }
  }
}
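
To see which datacenter actually answers the web-failover prepared query, the query can be executed through the HTTP API and the top-level Datacenter field read from the response. This is a rough sketch using a text match instead of a JSON parser. It assumes the top-level Datacenter field appears after the node list in the response, and the function name is hypothetical.

```shell
# Reads the JSON body of /v1/query/<name>/execute on stdin and prints the
# datacenter that answered the query (the last "Datacenter" field in the body).
answering_datacenter() {
  tr -d '[:space:]' | sed -n 's/.*"Datacenter":"\([^"]*\)".*/\1/p'
}
```

Usage: curl -s --header "X-Consul-Token: $CONSUL_HTTP_TOKEN" http://127.0.0.1:8500/v1/query/web-failover/execute | answering_datacenter. While healthy web instances exist in dc1 the result should be dc1, and it should switch to dc2 once they all fail.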

Verify your setup

Check that WAN federation is working correctly and that ACLs are replicating to the secondary datacenter:

consul members -wan
consul catalog services
curl --header "X-Consul-Token: $CONSUL_HTTP_TOKEN" http://127.0.0.1:8500/v1/acl/replication
consul operator raft list-peers

Test cross-datacenter service discovery:

dig @127.0.0.1 -p 8600 web.service.dc2.consul
dig @127.0.0.1 -p 8600 web-failover.query.consul
consul catalog services -datacenter=dc2

Monitor health checks and replication status:

consul monitor -log-level=INFO
curl --header "X-Consul-Token: $CONSUL_HTTP_TOKEN" http://127.0.0.1:8500/v1/health/state/any

Common issues

Symptom | Cause | Fix
WAN join fails | Firewall blocking ports | Ensure ports 8300 TCP and 8302 TCP/UDP are open between the servers in both datacenters
ACL replication not working | Missing or invalid replication token | Check token permissions with consul acl token read -id ACCESSOR_ID
Service discovery fails across DC | DNS configuration incorrect | Verify DNS forwarding to port 8600 and check service registration
TLS certificate errors | Hostname verification failing | Ensure each server certificate was created for its own datacenter name and the CA is properly distributed
Raft leader election issues | Network partitions or clock drift | Check NTP synchronization and network connectivity between nodes
High memory usage | Large number of services/nodes | Enable metrics monitoring and size server memory for the catalog; performance.raft_multiplier tunes Raft timing, not memory use
