Set up Jaeger multi-datacenter replication for disaster recovery and high availability

Advanced 90 min Apr 11, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Configure Jaeger distributed tracing with multi-datacenter replication for high availability and disaster recovery. Learn to set up primary and secondary datacenters with automated failover and cross-region data synchronization.

Prerequisites

  • At least 16GB RAM per datacenter
  • Network connectivity between datacenters
  • Docker and Docker Compose installed
  • Basic understanding of Elasticsearch and distributed systems

What this solves

Jaeger multi-datacenter replication provides high availability and disaster recovery for your distributed tracing infrastructure. This setup ensures that trace data remains available even if an entire datacenter fails, preventing loss of critical observability data during outages or disasters.

Step-by-step configuration

Update system packages

Start by updating your package manager on all servers in both datacenters.

# Ubuntu / Debian
sudo apt update && sudo apt upgrade -y
sudo apt install -y wget curl gnupg2 software-properties-common

# AlmaLinux / Rocky Linux
sudo dnf update -y
sudo dnf install -y wget curl gnupg2 epel-release
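
If the same script has to run on every server, the choice between the apt and dnf commands can be made from /etc/os-release. The following is a minimal sketch; the function name pkg_family is hypothetical, and it assumes the ID and ID_LIKE fields are present as they are on the four distributions listed above.

```shell
#!/bin/bash
# Print "apt" or "dnf" for the distribution described by an os-release file.
# The file path is a parameter (default /etc/os-release) so the function can
# be exercised against sample files.
pkg_family() {
    local file="${1:-/etc/os-release}" id id_like
    # Read ID and ID_LIKE without sourcing the file into the current shell.
    id=$(sed -n 's/^ID=//p' "$file" | tr -d '"')
    id_like=$(sed -n 's/^ID_LIKE=//p' "$file" | tr -d '"')
    case " $id $id_like " in
        *" debian "*|*" ubuntu "*) echo apt ;;
        *" rhel "*|*" fedora "*|*" centos "*) echo dnf ;;
        *) echo unknown; return 1 ;;
    esac
}
```

Calling pkg_family with no argument inspects the local machine, so a wrapper can branch on its output before running the matching block of commands.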

Install Docker and Docker Compose

Docker is required for running Jaeger components with consistent configuration across datacenters.

# Ubuntu / Debian (on Debian 12, replace "ubuntu" with "debian" in both download.docker.com URLs)
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
sudo usermod -aG docker $USER

# AlmaLinux / Rocky Linux
sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
sudo systemctl enable --now docker
sudo usermod -aG docker $USER

# Log out and back in so the new docker group membership takes effect

Configure Elasticsearch cluster for primary datacenter

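Set up a three-node Elasticsearch cluster in the primary datacenter to store trace data with high availability. Save the following as docker-compose.yml in /opt/jaeger/primary-dc, the directory used in the remaining steps.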

version: '3.8'
services:
  elasticsearch-1:
    image: elasticsearch:8.11.0
    container_name: elasticsearch-1
    environment:
      - node.name=elasticsearch-1
      - cluster.name=jaeger-primary
      - cluster.initial_master_nodes=elasticsearch-1,elasticsearch-2,elasticsearch-3
      - discovery.seed_hosts=elasticsearch-2:9300,elasticsearch-3:9300
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.keystore.path=/usr/share/elasticsearch/config/certs/elastic-certificates.p12
      - xpack.security.transport.ssl.truststore.path=/usr/share/elasticsearch/config/certs/elastic-certificates.p12
      - ELASTIC_PASSWORD=YourSecurePassword123
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - elasticsearch-1-data:/usr/share/elasticsearch/data
      # Mount into a subdirectory so the image's own config files (jvm.options, log4j2.properties) stay visible
      - ./certs:/usr/share/elasticsearch/config/certs
    ports:
      - "9200:9200"
      - "9300:9300"
    networks:
      - jaeger-network

  elasticsearch-2:
    image: elasticsearch:8.11.0
    container_name: elasticsearch-2
    environment:
      - node.name=elasticsearch-2
      - cluster.name=jaeger-primary
      - cluster.initial_master_nodes=elasticsearch-1,elasticsearch-2,elasticsearch-3
      - discovery.seed_hosts=elasticsearch-1:9300,elasticsearch-3:9300
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.keystore.path=/usr/share/elasticsearch/config/certs/elastic-certificates.p12
      - xpack.security.transport.ssl.truststore.path=/usr/share/elasticsearch/config/certs/elastic-certificates.p12
      - ELASTIC_PASSWORD=YourSecurePassword123
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - elasticsearch-2-data:/usr/share/elasticsearch/data
      - ./certs:/usr/share/elasticsearch/config/certs
    ports:
      - "9201:9200"
      - "9301:9300"
    networks:
      - jaeger-network

  elasticsearch-3:
    image: elasticsearch:8.11.0
    container_name: elasticsearch-3
    environment:
      - node.name=elasticsearch-3
      - cluster.name=jaeger-primary
      - cluster.initial_master_nodes=elasticsearch-1,elasticsearch-2,elasticsearch-3
      - discovery.seed_hosts=elasticsearch-1:9300,elasticsearch-2:9300
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.keystore.path=/usr/share/elasticsearch/config/certs/elastic-certificates.p12
      - xpack.security.transport.ssl.truststore.path=/usr/share/elasticsearch/config/certs/elastic-certificates.p12
      - ELASTIC_PASSWORD=YourSecurePassword123
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - elasticsearch-3-data:/usr/share/elasticsearch/data
      - ./certs:/usr/share/elasticsearch/config/certs
    ports:
      - "9202:9200"
      - "9302:9300"
    networks:
      - jaeger-network

volumes:
  elasticsearch-1-data:
  elasticsearch-2-data:
  elasticsearch-3-data:

networks:
  jaeger-network:
    driver: bridge
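
Elasticsearch containers that form a multi-node cluster run the production bootstrap checks, one of which expects the host's vm.max_map_count to be at least 262144. The sketch below checks that value before the cluster is started; the function name map_count_ok is hypothetical.

```shell
#!/bin/bash
# Minimum vm.max_map_count that Elasticsearch's bootstrap check expects on the host.
REQUIRED_MAP_COUNT=262144

# map_count_ok [VALUE]: succeed if VALUE (default: the kernel's current
# setting) meets the minimum.
map_count_ok() {
    local current="${1:-$(sysctl -n vm.max_map_count)}"
    [ "$current" -ge "$REQUIRED_MAP_COUNT" ]
}
```

A typical use on each Docker host is: map_count_ok || sudo sysctl -w vm.max_map_count=262144. Adding the setting to a file under /etc/sysctl.d makes it survive reboots.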

Generate Elasticsearch security certificates

Create SSL certificates for secure communication between Elasticsearch nodes.

mkdir -p /opt/jaeger/primary-dc/certs
cd /opt/jaeger/primary-dc

# Generate CA and node certificates
docker run --rm -v $(pwd)/certs:/certs --user $(id -u):$(id -g) \
  elasticsearch:8.11.0 \
  bin/elasticsearch-certutil ca --out /certs/elastic-ca.p12 --pass ""
docker run --rm -v $(pwd)/certs:/certs --user $(id -u):$(id -g) \
  elasticsearch:8.11.0 \
  bin/elasticsearch-certutil cert --ca /certs/elastic-ca.p12 --ca-pass "" \
  --out /certs/elastic-certificates.p12 --pass ""

sudo chown -R $USER:$USER /opt/jaeger/primary-dc/certs
sudo chmod 755 /opt/jaeger/primary-dc/certs
sudo chmod 644 /opt/jaeger/primary-dc/certs/*

Start primary datacenter Elasticsearch cluster

Launch the Elasticsearch cluster in the primary datacenter and verify cluster health.

cd /opt/jaeger/primary-dc
docker compose up -d

# Wait for cluster to be ready
sleep 60

# Verify cluster health
curl -u elastic:YourSecurePassword123 "http://localhost:9200/_cluster/health?pretty"
curl -u elastic:YourSecurePassword123 "http://localhost:9200/_nodes?pretty"
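
A fixed sleep can be too short on a slow host and needlessly long on a fast one. As an alternative, the health endpoint can be polled until it reports green. This is a sketch only: the helper names (health_status, wait_for_green, fetch_primary_health) are hypothetical, and the status is extracted with sed rather than a JSON parser to avoid extra dependencies.

```shell
#!/bin/bash
# Extract the "status" field from _cluster/health JSON read on stdin.
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# wait_for_green FETCH_CMD [TRIES] [DELAY_SECONDS]
# FETCH_CMD is any command or function that prints the health JSON.
wait_for_green() {
    local fetch="$1" tries="${2:-30}" delay="${3:-5}" status="" i
    for ((i = 1; i <= tries; i++)); do
        status=$($fetch 2>/dev/null | health_status)
        if [ "$status" = "green" ]; then
            echo "cluster is green after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
    done
    echo "cluster not green after $tries attempts (last status: ${status:-none})" >&2
    return 1
}

# Fetcher for the primary cluster, using the credentials from this guide.
fetch_primary_health() {
    curl -s -u elastic:YourSecurePassword123 "http://localhost:9200/_cluster/health"
}
```

Running wait_for_green fetch_primary_health 30 5 waits up to roughly two and a half minutes and returns a non-zero status if the cluster never turns green.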

Configure Jaeger components for primary datacenter

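Add Jaeger collector, query service, and agent to the primary datacenter configuration. Place them under the existing services: key, above the top-level volumes: block, then run docker compose up -d again to start them.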

# Add these services to the existing docker-compose.yml file

  jaeger-collector:
    image: jaegertracing/jaeger-collector:1.52
    container_name: jaeger-collector-primary
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch-1:9200,http://elasticsearch-2:9200,http://elasticsearch-3:9200
      - ES_USERNAME=elastic
      - ES_PASSWORD=YourSecurePassword123
      - ES_INDEX_PREFIX=jaeger-primary
      - COLLECTOR_ZIPKIN_HOST_PORT=:9411
      - COLLECTOR_GRPC_TLS_ENABLED=false
      - LOG_LEVEL=info
    ports:
      - "9411:9411"
      - "14250:14250"
      - "14268:14268"
    depends_on:
      - elasticsearch-1
      - elasticsearch-2
      - elasticsearch-3
    networks:
      - jaeger-network

  jaeger-query:
    image: jaegertracing/jaeger-query:1.52
    container_name: jaeger-query-primary
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch-1:9200,http://elasticsearch-2:9200,http://elasticsearch-3:9200
      - ES_USERNAME=elastic
      - ES_PASSWORD=YourSecurePassword123
      - ES_INDEX_PREFIX=jaeger-primary
      - QUERY_BASE_PATH=/jaeger
      - LOG_LEVEL=info
    ports:
      - "16686:16686"
    depends_on:
      - elasticsearch-1
      - elasticsearch-2
      - elasticsearch-3
    networks:
      - jaeger-network

  jaeger-agent:
    image: jaegertracing/jaeger-agent:1.52
    container_name: jaeger-agent-primary
    environment:
      - REPORTER_GRPC_HOST_PORT=jaeger-collector:14250
      - LOG_LEVEL=info
    ports:
      - "6831:6831/udp"
      - "6832:6832/udp"
      - "5778:5778"
    depends_on:
      - jaeger-collector
    networks:
      - jaeger-network
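
The ES_INDEX_PREFIX value determines the index names that later steps (replication and lifecycle policies) must match. With a prefix set, Jaeger's Elasticsearch storage is understood to write daily indices named prefix-jaeger-span-YYYY-MM-DD and prefix-jaeger-service-YYYY-MM-DD. The helper below, whose name jaeger_index is hypothetical, sketches that naming under the assumption of the default date separator and daily rotation.

```shell
#!/bin/bash
# jaeger_index PREFIX KIND [DATE]
# Print the daily index name for KIND ("span" or "service").
# DATE defaults to today's UTC date in YYYY-MM-DD form.
jaeger_index() {
    local prefix="$1" kind="$2" day="${3:-$(date -u +%Y-%m-%d)}"
    # An empty prefix yields the plain "jaeger-<kind>-<date>" name.
    echo "${prefix:+$prefix-}jaeger-$kind-$day"
}
```

For example, jaeger_index jaeger-primary span 2026-04-11 prints jaeger-primary-jaeger-span-2026-04-11, which can be compared against the output of the _cat/indices endpoint on the primary cluster.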

Set up secondary datacenter infrastructure

Create the secondary datacenter configuration with cross-cluster replication settings.

mkdir -p /opt/jaeger/secondary-dc/certs
cd /opt/jaeger/secondary-dc

# Copy certificates from primary datacenter
scp -r primary-server:/opt/jaeger/primary-dc/certs/* ./certs/

Save the following as /opt/jaeger/secondary-dc/docker-compose.yml:

version: '3.8'
services:
  elasticsearch-1:
    image: elasticsearch:8.11.0
    container_name: elasticsearch-secondary-1
    environment:
      - node.name=elasticsearch-secondary-1
      - cluster.name=jaeger-secondary
      - cluster.initial_master_nodes=elasticsearch-secondary-1,elasticsearch-secondary-2,elasticsearch-secondary-3
      - discovery.seed_hosts=elasticsearch-secondary-2:9300,elasticsearch-secondary-3:9300
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.keystore.path=/usr/share/elasticsearch/config/certs/elastic-certificates.p12
      - xpack.security.transport.ssl.truststore.path=/usr/share/elasticsearch/config/certs/elastic-certificates.p12
      - ELASTIC_PASSWORD=YourSecurePassword123
      - xpack.ccr.enabled=true
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - elasticsearch-secondary-1-data:/usr/share/elasticsearch/data
      # Mount into a subdirectory so the image's own config files (jvm.options, log4j2.properties) stay visible
      - ./certs:/usr/share/elasticsearch/config/certs
    ports:
      - "9200:9200"
      - "9300:9300"
    networks:
      - jaeger-network

  elasticsearch-2:
    image: elasticsearch:8.11.0
    container_name: elasticsearch-secondary-2
    environment:
      - node.name=elasticsearch-secondary-2
      - cluster.name=jaeger-secondary
      - cluster.initial_master_nodes=elasticsearch-secondary-1,elasticsearch-secondary-2,elasticsearch-secondary-3
      - discovery.seed_hosts=elasticsearch-secondary-1:9300,elasticsearch-secondary-3:9300
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.keystore.path=/usr/share/elasticsearch/config/certs/elastic-certificates.p12
      - xpack.security.transport.ssl.truststore.path=/usr/share/elasticsearch/config/certs/elastic-certificates.p12
      - ELASTIC_PASSWORD=YourSecurePassword123
      - xpack.ccr.enabled=true
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - elasticsearch-secondary-2-data:/usr/share/elasticsearch/data
      - ./certs:/usr/share/elasticsearch/config/certs
    ports:
      - "9201:9200"
      - "9301:9300"
    networks:
      - jaeger-network

  elasticsearch-3:
    image: elasticsearch:8.11.0
    container_name: elasticsearch-secondary-3
    environment:
      - node.name=elasticsearch-secondary-3
      - cluster.name=jaeger-secondary
      - cluster.initial_master_nodes=elasticsearch-secondary-1,elasticsearch-secondary-2,elasticsearch-secondary-3
      - discovery.seed_hosts=elasticsearch-secondary-1:9300,elasticsearch-secondary-2:9300
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.keystore.path=/usr/share/elasticsearch/config/certs/elastic-certificates.p12
      - xpack.security.transport.ssl.truststore.path=/usr/share/elasticsearch/config/certs/elastic-certificates.p12
      - ELASTIC_PASSWORD=YourSecurePassword123
      - xpack.ccr.enabled=true
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - elasticsearch-secondary-3-data:/usr/share/elasticsearch/data
      - ./certs:/usr/share/elasticsearch/config/certs
    ports:
      - "9202:9200"
      - "9302:9300"
    networks:
      - jaeger-network

  jaeger-collector:
    image: jaegertracing/jaeger-collector:1.52
    container_name: jaeger-collector-secondary
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch-1:9200,http://elasticsearch-2:9200,http://elasticsearch-3:9200
      - ES_USERNAME=elastic
      - ES_PASSWORD=YourSecurePassword123
      - ES_INDEX_PREFIX=jaeger-secondary
      - COLLECTOR_ZIPKIN_HOST_PORT=:9411
      - COLLECTOR_GRPC_TLS_ENABLED=false
      - LOG_LEVEL=info
    ports:
      - "9411:9411"
      - "14250:14250"
      - "14268:14268"
    depends_on:
      - elasticsearch-1
      - elasticsearch-2
      - elasticsearch-3
    networks:
      - jaeger-network

  jaeger-query:
    image: jaegertracing/jaeger-query:1.52
    container_name: jaeger-query-secondary
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch-1:9200,http://elasticsearch-2:9200,http://elasticsearch-3:9200
      - ES_USERNAME=elastic
      - ES_PASSWORD=YourSecurePassword123
      - ES_INDEX_PREFIX=jaeger-secondary
      - QUERY_BASE_PATH=/jaeger
      - LOG_LEVEL=info
    ports:
      - "16686:16686"
    depends_on:
      - elasticsearch-1
      - elasticsearch-2
      - elasticsearch-3
    networks:
      - jaeger-network

volumes:
  elasticsearch-secondary-1-data:
  elasticsearch-secondary-2-data:
  elasticsearch-secondary-3-data:

networks:
  jaeger-network:
    driver: bridge

Configure cross-cluster replication

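Set up Elasticsearch cross-cluster replication (CCR) to sync data from the primary to the secondary datacenter. CCR requires a Platinum or Enterprise license (or an active trial license) on both clusters, and the transport port (9300) of the primary nodes must be reachable from the secondary datacenter.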

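# Start secondary datacenter cluster
cd /opt/jaeger/secondary-dc
docker compose up -d

# Wait for secondary cluster to be ready
sleep 60

# Configure remote cluster connection on secondary
curl -X PUT -u elastic:YourSecurePassword123 \
  "http://localhost:9200/_cluster/settings" \
  -H "Content-Type: application/json" \
  -d '{
    "persistent": {
      "cluster": {
        "remote": {
          "primary-cluster": {
            "seeds": ["203.0.113.10:9300", "203.0.113.11:9300", "203.0.113.12:9300"]
          }
        }
      }
    }
  }'

# Create auto-follow patterns for cross-cluster replication.
# The follow API (PUT /<index>/_ccr/follow) takes one concrete index name, not a
# wildcard, so the daily Jaeger indices are matched with auto-follow patterns.
# With ES_INDEX_PREFIX=jaeger-primary the leader indices are named
# jaeger-primary-jaeger-span-YYYY-MM-DD and jaeger-primary-jaeger-service-YYYY-MM-DD.
# Follower indices keep the leader's name, so the secondary jaeger-query needs
# ES_INDEX_PREFIX=jaeger-primary if it should serve the replicated traces.
# Auto-follow only picks up indices created after the pattern exists.
curl -X PUT -u elastic:YourSecurePassword123 \
  "http://localhost:9200/_ccr/auto_follow/jaeger-spans" \
  -H "Content-Type: application/json" \
  -d '{
    "remote_cluster": "primary-cluster",
    "leader_index_patterns": ["jaeger-primary-jaeger-span-*"],
    "follow_index_pattern": "{{leader_index}}",
    "settings": { "index.number_of_replicas": 1 }
  }'
curl -X PUT -u elastic:YourSecurePassword123 \
  "http://localhost:9200/_ccr/auto_follow/jaeger-services" \
  -H "Content-Type: application/json" \
  -d '{
    "remote_cluster": "primary-cluster",
    "leader_index_patterns": ["jaeger-primary-jaeger-service-*"],
    "follow_index_pattern": "{{leader_index}}",
    "settings": { "index.number_of_replicas": 1 }
  }'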
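
Before relying on replication, it helps to confirm that the secondary cluster has actually connected to the remote cluster alias. Elasticsearch reports this through its _remote/info endpoint, which includes a connected flag per alias. The sketch below checks that flag; the function name remote_connected is hypothetical, and the text matching assumes the flat per-alias object that endpoint returns.

```shell
#!/bin/bash
# remote_connected ALIAS
# Reads _remote/info JSON on stdin and succeeds if ALIAS reports "connected": true.
remote_connected() {
    local alias="$1"
    # Strip whitespace so the match works with or without ?pretty.
    tr -d ' \n\t' | grep -q "\"$alias\":{[^}]*\"connected\":true"
}
```

On the secondary, a usage along these lines would work: curl -s -u elastic:YourSecurePassword123 "http://localhost:9200/_remote/info" | remote_connected primary-cluster && echo connected.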

Set up automated failover with HAProxy

Configure HAProxy to automatically route traffic to the secondary datacenter when the primary fails.

# Ubuntu / Debian
sudo apt install -y haproxy

# AlmaLinux / Rocky Linux
sudo dnf install -y haproxy

Replace the contents of /etc/haproxy/haproxy.cfg with:

global
    log 127.0.0.1:514 local0
    stats timeout 30s
    daemon
    user haproxy
    group haproxy

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend jaeger_ui_frontend
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/jaeger.pem
    redirect scheme https if !{ ssl_fc }
    default_backend jaeger_ui_backend

backend jaeger_ui_backend
    balance roundrobin
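    # jaeger-query is started with QUERY_BASE_PATH=/jaeger, so its API is served under that prefix
    option httpchk GET /jaeger/api/services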
    server primary-jaeger 203.0.113.10:16686 check inter 10s fall 3 rise 2
    server secondary-jaeger 203.0.113.20:16686 check inter 10s fall 3 rise 2 backup

frontend jaeger_collector_frontend
    bind *:14268
    default_backend jaeger_collector_backend

backend jaeger_collector_backend
    balance roundrobin
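    # Plain TCP health check: the collector's HTTP health endpoint is on its admin port (14269), which the compose files do not publish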
    server primary-collector 203.0.113.10:14268 check inter 10s fall 3 rise 2
    server secondary-collector 203.0.113.20:14268 check inter 10s fall 3 rise 2 backup

frontend jaeger_grpc_frontend
    bind *:14250
    mode tcp
    default_backend jaeger_grpc_backend

backend jaeger_grpc_backend
    mode tcp
    balance roundrobin
    server primary-grpc 203.0.113.10:14250 check inter 10s fall 3 rise 2
    server secondary-grpc 203.0.113.20:14250 check inter 10s fall 3 rise 2 backup

listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 30s
    stats admin if TRUE
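
Failover in this configuration depends entirely on which server lines carry the backup keyword: HAProxy sends traffic to a backup server only when the non-backup servers in the same backend are down. To review the roles at a glance, a small awk wrapper can list them per backend. This is a sketch with a hypothetical function name (list_server_roles); it assumes one server per line, as in the configuration above.

```shell
#!/bin/bash
# list_server_roles CFG
# Print "<backend> <server> <active|backup>" for every server line in CFG.
list_server_roles() {
    awk '
        $1 == "backend" || $1 == "listen" { section = $2 }
        $1 == "frontend"                  { section = "" }
        $1 == "server" && section != "" {
            role = "active"
            # Options start at field 4 (after "server", name, address).
            for (i = 4; i <= NF; i++) if ($i == "backup") role = "backup"
            print section, $2, role
        }
    ' "$1"
}
```

Running list_server_roles /etc/haproxy/haproxy.cfg should show each primary server as active and each secondary server as backup. The configuration syntax itself can be validated with haproxy -c -f /etc/haproxy/haproxy.cfg.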

Create monitoring and alerting scripts

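Set up automated monitoring to detect failover events and send alerts. Save the following script as /opt/jaeger/monitor-failover.sh. It sends alerts with the mail command, so a mail utility must be installed on the HAProxy host.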

#!/bin/bash

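# jaeger-query runs with QUERY_BASE_PATH=/jaeger, so the API is served under that prefix
PRIMARY_DC_URL="http://203.0.113.10:16686/jaeger"
SECONDARY_DC_URL="http://203.0.113.20:16686/jaeger"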
ALERT_EMAIL="admin@example.com"
LOG_FILE="/var/log/jaeger-failover.log"

check_primary() {
    curl -sf "$PRIMARY_DC_URL/api/services" > /dev/null 2>&1
    return $?
}

check_secondary() {
    curl -sf "$SECONDARY_DC_URL/api/services" > /dev/null 2>&1
    return $?
}

send_alert() {
    local message="$1"
    echo "$(date): $message" >> "$LOG_FILE"
    echo "$message" | mail -s "Jaeger Failover Alert" "$ALERT_EMAIL"
}

if ! check_primary; then
    if check_secondary; then
        # Act only on the first failed run, so the alert is not re-sent every 30 seconds
        if grep -q " backup$" /etc/haproxy/haproxy.cfg; then
            send_alert "Primary Jaeger datacenter is down. Failed over to secondary datacenter."
            # Update HAProxy configuration to prefer secondary
            sed -i 's/ backup$//' /etc/haproxy/haproxy.cfg
            systemctl reload haproxy
        fi
    else
        send_alert "CRITICAL: Both Jaeger datacenters are down!"
    fi
else
    # Primary is healthy, ensure proper configuration
    if ! grep -q " backup$" /etc/haproxy/haproxy.cfg; then
        send_alert "Primary Jaeger datacenter recovered. Failing back from secondary."
        # Server lines read "server secondary-...", so the pattern must match in that order
        sed -i '/server secondary-/s/$/ backup/' /etc/haproxy/haproxy.cfg
        systemctl reload haproxy
    fi
fi

Make the script executable and owned by root:

sudo chmod 755 /opt/jaeger/monitor-failover.sh
sudo chown root:root /opt/jaeger/monitor-failover.sh
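
Because the script rewrites the live HAProxy configuration with sed, it is worth rehearsing the two edits on a copy first. The sketch below isolates them into two functions (promote_secondary and demote_secondary are hypothetical names) and makes the failback edit idempotent, so running it twice does not append the keyword twice. It assumes GNU sed, as shipped on the distributions covered here.

```shell
#!/bin/bash
# promote_secondary CFG
# Drop the trailing "backup" keyword so secondary servers take regular traffic.
promote_secondary() {
    sed -i 's/[[:space:]]*backup$//' "$1"
}

# demote_secondary CFG
# Append "backup" to secondary server lines that do not already end with it.
demote_secondary() {
    sed -i '/^[[:space:]]*server secondary-/{/ backup$/!s/$/ backup/;}' "$1"
}
```

A dry run could copy /etc/haproxy/haproxy.cfg to a temporary file, call promote_secondary and then demote_secondary on the copy, and compare it with the original using diff before trusting the edits in production.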

Configure automated failover with systemd timer

Set up a systemd timer to run the failover monitoring script every 30 seconds. Create /etc/systemd/system/jaeger-failover.service:

[Unit]
Description=Jaeger Failover Monitor
After=network.target

[Service]
Type=oneshot
User=root
ExecStart=/opt/jaeger/monitor-failover.sh
StandardOutput=journal
StandardError=journal

Then create /etc/systemd/system/jaeger-failover.timer:

[Unit]
Description=Run Jaeger Failover Monitor every 30 seconds
Requires=jaeger-failover.service

[Timer]
OnBootSec=30
OnUnitActiveSec=30
# Timers default to one-minute accuracy; tighten it so the 30-second interval is honored
AccuracySec=1s

[Install]
WantedBy=timers.target

Reload systemd, then enable the timer and HAProxy:

sudo systemctl daemon-reload
sudo systemctl enable --now jaeger-failover.timer
sudo systemctl enable --now haproxy

Configure data retention and archival

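Set up Index Lifecycle Management (ILM) policies to manage trace data retention across both datacenters. Note that the rollover action expects Jaeger to write through the rollover alias (ES_USE_ALIASES=true, with the indices initialized by the jaeger-es-rollover tool); the daily indices configured earlier in this guide are not attached to that alias.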

# Create ILM policy on primary cluster
curl -X PUT -u elastic:YourSecurePassword123 \
  "http://203.0.113.10:9200/_ilm/policy/jaeger-trace-policy" \
  -H "Content-Type: application/json" \
  -d '{
    "policy": {
      "phases": {
        "hot": {
          "actions": {
            "rollover": {
              "max_size": "5gb",
              "max_age": "1d"
            }
          }
        },
        "warm": {
          "min_age": "7d",
          "actions": {
            "allocate": {
              "number_of_replicas": 1
            }
          }
        },
        "cold": {
          "min_age": "30d",
          "actions": {
            "allocate": {
              "number_of_replicas": 0
            }
          }
        },
        "delete": {
          "min_age": "90d",
          "actions": {
            "delete": {}
          }
        }
      }
    }
  }'

# Apply policy to index templates
# With ES_INDEX_PREFIX=jaeger-primary, span indices are named jaeger-primary-jaeger-span-YYYY-MM-DD
curl -X PUT -u elastic:YourSecurePassword123 \
  "http://203.0.113.10:9200/_index_template/jaeger-spans" \
  -H "Content-Type: application/json" \
  -d '{
    "index_patterns": ["jaeger-primary-jaeger-span-*"],
    "template": {
      "settings": {
        "index.lifecycle.name": "jaeger-trace-policy",
        "index.lifecycle.rollover_alias": "jaeger-spans-write",
        "number_of_shards": 3,
        "number_of_replicas": 1
      }
    }
  }'
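
The min_age thresholds in the policy translate into a simple age-to-phase mapping: hot until 7 days, warm until 30, cold until 90, then deletion. The function below (ilm_phase is a hypothetical name) sketches that mapping for planning purposes. It assumes age is counted in whole days, measured from rollover when an index has rolled over and from index creation otherwise.

```shell
#!/bin/bash
# ilm_phase AGE_DAYS
# Print the phase an index of that age falls into under jaeger-trace-policy.
ilm_phase() {
    local age="$1"
    if   [ "$age" -ge 90 ]; then echo delete
    elif [ "$age" -ge 30 ]; then echo cold
    elif [ "$age" -ge 7 ];  then echo warm
    else                         echo hot
    fi
}
```

The phase ILM has actually assigned to an index can be compared against this by querying the index's _ilm/explain endpoint on the primary cluster.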

Verify your setup

# Check primary datacenter cluster health
curl -u elastic:YourSecurePassword123 "http://203.0.113.10:9200/_cluster/health?pretty"

# Check secondary datacenter cluster health
curl -u elastic:YourSecurePassword123 "http://203.0.113.20:9200/_cluster/health?pretty"

# Verify cross-cluster replication status
curl -u elastic:YourSecurePassword123 "http://203.0.113.20:9200/_ccr/stats?pretty"

# Check Jaeger UI accessibility through HAProxy (the API is served under /jaeger)
curl -I http://your-haproxy-server/jaeger/api/services

# Test failover monitoring script
sudo /opt/jaeger/monitor-failover.sh

# Check systemd timer status
sudo systemctl status jaeger-failover.timer

# View HAProxy stats
curl http://your-haproxy-server:8404/stats
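
The individual checks above can be wrapped in a small runner that prints one PASS or FAIL line per check and keeps a failure count, which is convenient for cron jobs or CI. This is a sketch; run_check and the failures counter are hypothetical names, and each check is judged only by the exit status of its command.

```shell
#!/bin/bash
# Number of failed checks so far.
failures=0

# run_check NAME CMD [ARGS...]
# Run CMD quietly, print PASS or FAIL with NAME, and count failures.
run_check() {
    local name="$1"
    shift
    if "$@" > /dev/null 2>&1; then
        echo "PASS  $name"
    else
        echo "FAIL  $name"
        failures=$((failures + 1))
    fi
}
```

A check would then read, for example: run_check "primary health" curl -sf -u elastic:YourSecurePassword123 "http://203.0.113.10:9200/_cluster/health". Using curl -sf makes HTTP error responses count as failures, and a final exit "$failures" reports the overall result.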

Common issues

Symptom | Cause | Fix
Cross-cluster replication failing | Network connectivity issues between datacenters | Check firewall rules and network connectivity between Elasticsearch clusters
HAProxy shows backend servers as down | Health check URL incorrect or services not responding | Verify health check URLs and ensure Jaeger services are running
High memory usage in Elasticsearch | JVM heap size not optimized for available memory | Adjust ES_JAVA_OPTS so each node's heap is about 50% of the RAM available to that node
Trace data not replicating | Index patterns don't match in CCR configuration | Ensure follower index patterns match leader index naming
Failover script not executing | Permissions issue or mail not configured | Check script permissions and install mail utilities
SSL certificate errors between clusters | Certificate mismatch or expired certificates | Regenerate certificates and ensure they're copied to all nodes
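
For the heap-size fix, the 50% guideline can be turned into a concrete ES_JAVA_OPTS value. The sketch below computes half of the memory available to one node and caps it at 31 GB, a commonly recommended ceiling that keeps compressed object pointers enabled. The function name es_heap_mb is hypothetical, and the input is the memory budget of a single Elasticsearch node, not of the whole host when several nodes share it.

```shell
#!/bin/bash
# es_heap_mb NODE_RAM_MB
# Print a heap size in MB: half of the node's RAM, capped at 31 GB (31744 MB).
es_heap_mb() {
    local ram_mb="$1" cap_mb=31744 heap_mb
    heap_mb=$((ram_mb / 2))
    if [ "$heap_mb" -gt "$cap_mb" ]; then
        heap_mb=$cap_mb
    fi
    echo "$heap_mb"
}
```

With 4096 MB budgeted per node, es_heap_mb 4096 prints 2048, which corresponds to the -Xms2g -Xmx2g setting used in the compose files.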
