Set up automated Consul snapshots with GPG encryption, systemd timers, and complete disaster recovery procedures. Includes monitoring integration with Prometheus and automated restoration workflows for production environments.
What this solves
Consul stores critical service discovery data, key-value pairs, and configuration that your applications depend on. Without proper backups, a cluster failure can bring down your entire infrastructure. This tutorial shows you how to implement automated Consul snapshots with encryption, monitoring, and disaster recovery procedures to protect against data loss and minimize downtime during failures.
Prerequisites
- Running Consul cluster with ACL enabled
- Root or sudo access to all Consul nodes
- GPG installed for backup encryption
- Basic understanding of systemd services
Step-by-step configuration
Install required packages
Install GPG for encryption, AWS CLI for remote storage, and monitoring tools.
sudo apt update
sudo apt install -y gnupg2 awscli jq curl
Create backup user and directories
Create a dedicated user for backup operations with minimal privileges.
sudo useradd -r -s /bin/bash -d /opt/consul-backup consul-backup
sudo mkdir -p /opt/consul-backup/{scripts,backups,logs,keys}
sudo chown -R consul-backup:consul-backup /opt/consul-backup
sudo chmod 755 /opt/consul-backup
sudo chmod 700 /opt/consul-backup/{backups,keys}
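The split matters: the parent directory stays world-traversable at 755, while the backups and keys directories are locked down to the backup user. The effect of the brace-expanded chmod can be checked in a scratch directory:

```shell
# Reproduce the permission layout in a throwaway directory and inspect it.
base=$(mktemp -d)
mkdir -p "$base"/{scripts,backups,logs,keys}
chmod 755 "$base"
chmod 700 "$base"/{backups,keys}
# GNU stat: %a prints the octal permission bits
perms=$(stat -c '%a' "$base/backups" "$base/keys" | xargs)
echo "$perms"
rm -rf "$base"
```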
Generate GPG encryption key
Create a GPG key pair for encrypting backup files. Store the private key securely.
sudo -u consul-backup gpg --batch --gen-key <<EOF
Key-Type: RSA
Key-Length: 4096
Name-Real: Consul Backup
Name-Email: consul-backup@example.com
Expire-Date: 0
Passphrase: YourSecurePassphrase123!
%commit
EOF
Export the public key for verification and store the key ID:
sudo -u consul-backup gpg --list-keys
export CONSUL_GPG_KEY=$(sudo -u consul-backup gpg --list-keys --with-colons | grep fpr | head -1 | cut -d: -f10)
echo "CONSUL_GPG_KEY=$CONSUL_GPG_KEY" | sudo tee /opt/consul-backup/keys/gpg-key-id
sudo chown consul-backup:consul-backup /opt/consul-backup/keys/gpg-key-id
sudo chmod 600 /opt/consul-backup/keys/gpg-key-id
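The grep/cut pipeline works, but awk over the machine-readable --with-colons output is less fragile, since it keys on the record type rather than a substring match. Exercised here against a sample fpr record (the fingerprint value is made up for illustration):

```shell
# `gpg --list-keys --with-colons` emits one `fpr` record per key;
# field 10 of that record holds the full fingerprint.
sample='fpr:::::::::ABCDEF0123456789ABCDEF0123456789ABCDEF01:'
fpr=$(printf '%s\n' "$sample" | awk -F: '$1 == "fpr" { print $10; exit }')
echo "$fpr"
```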
Create Consul ACL token for backups
Generate a dedicated ACL token for snapshot operations. Note that Consul's snapshot endpoints require management-level privileges; if consul snapshot save is denied when using the policy below, attach the builtin global-management policy to the token instead.
consul acl policy create -name "consul-backup" -rules - <<EOF
acl = "read"
key_prefix "" {
  policy = "read"
}
node_prefix "" {
  policy = "read"
}
operator = "read"
service_prefix "" {
  policy = "read"
}
session_prefix "" {
  policy = "read"
}
EOF
Create the token and store it securely:
BACKUP_TOKEN=$(consul acl token create -policy-name "consul-backup" -description "Consul backup token" -format json | jq -r '.SecretID')
echo "CONSUL_HTTP_TOKEN=$BACKUP_TOKEN" | sudo tee /opt/consul-backup/keys/consul-token
sudo chown consul-backup:consul-backup /opt/consul-backup/keys/consul-token
sudo chmod 600 /opt/consul-backup/keys/consul-token
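The token file is written as a plain KEY=value line deliberately: that is valid shell syntax, so the backup and restore scripts can load it with source. A quick sketch of the mechanism, using a placeholder value:

```shell
# The key files contain single KEY=value lines, which are valid shell,
# so `source` loads them straight into the calling script's environment.
token_file=$(mktemp)
echo 'CONSUL_HTTP_TOKEN=placeholder-token-value' > "$token_file"  # placeholder, not a real token
source "$token_file"
echo "$CONSUL_HTTP_TOKEN"
rm -f "$token_file"
```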
Create backup script
Create the main backup script at /opt/consul-backup/scripts/consul-backup.sh. It handles snapshot creation, encryption, and remote storage.
#!/bin/bash
set -euo pipefail
Configuration
BACKUP_DIR="/opt/consul-backup/backups"
LOG_FILE="/opt/consul-backup/logs/backup.log"
RETENTION_DAYS=30
S3_BUCKET="your-consul-backups"
CONSUL_ADDR="http://localhost:8500"
GPG_KEY_FILE="/opt/consul-backup/keys/gpg-key-id"
TOKEN_FILE="/opt/consul-backup/keys/consul-token"
Source configuration files
[ -f "$GPG_KEY_FILE" ] && source "$GPG_KEY_FILE"
[ -f "$TOKEN_FILE" ] && source "$TOKEN_FILE"
Logging function
log() {
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "$LOG_FILE"
}
Create timestamp
TIMESTAMP=$(date '+%Y%m%d_%H%M%S')
SNAPSHOT_FILE="consul-snapshot-$TIMESTAMP.snap"
ENCRYPTED_FILE="$SNAPSHOT_FILE.gpg"
log "Starting Consul backup process"
Check Consul connectivity
if ! consul members &>/dev/null; then
log "ERROR: Cannot connect to Consul cluster"
exit 1
fi
Create snapshot
log "Creating Consul snapshot"
if consul snapshot save -http-addr="$CONSUL_ADDR" "$BACKUP_DIR/$SNAPSHOT_FILE"; then
log "Snapshot created successfully: $SNAPSHOT_FILE"
else
log "ERROR: Failed to create snapshot"
exit 1
fi
Encrypt snapshot
log "Encrypting snapshot with GPG"
if gpg --trust-model always --encrypt -r "$CONSUL_GPG_KEY" --cipher-algo AES256 --compress-algo 2 --output "$BACKUP_DIR/$ENCRYPTED_FILE" "$BACKUP_DIR/$SNAPSHOT_FILE"; then
log "Snapshot encrypted successfully: $ENCRYPTED_FILE"
# Remove unencrypted file
rm "$BACKUP_DIR/$SNAPSHOT_FILE"
else
log "ERROR: Failed to encrypt snapshot"
exit 1
fi
Upload to S3 (optional)
if [ -n "${S3_BUCKET:-}" ]; then
log "Uploading encrypted snapshot to S3"
if aws s3 cp "$BACKUP_DIR/$ENCRYPTED_FILE" "s3://$S3_BUCKET/consul-backups/$ENCRYPTED_FILE"; then
log "Snapshot uploaded to S3 successfully"
else
log "WARNING: Failed to upload snapshot to S3"
fi
fi
Clean up old local backups
log "Cleaning up backups older than $RETENTION_DAYS days"
find "$BACKUP_DIR" -name "consul-snapshot-*.snap.gpg" -type f -mtime +"$RETENTION_DAYS" -delete
Verify backup integrity
log "Verifying backup integrity"
if gpg --trust-model always --decrypt "$BACKUP_DIR/$ENCRYPTED_FILE" >/dev/null 2>&1; then
log "Backup integrity verification successful"
else
log "ERROR: Backup integrity verification failed"
exit 1
fi
Update metrics file for monitoring
echo "consul_backup_last_success_timestamp $(date +%s)" > /opt/consul-backup/logs/backup-metrics.prom
echo "consul_backup_file_size_bytes $(stat -c%s "$BACKUP_DIR/$ENCRYPTED_FILE")" >> /opt/consul-backup/logs/backup-metrics.prom
log "Consul backup completed successfully"
log "Backup file: $ENCRYPTED_FILE"
log "File size: $(du -h "$BACKUP_DIR/$ENCRYPTED_FILE" | cut -f1)"
Make the script executable:
sudo chmod +x /opt/consul-backup/scripts/consul-backup.sh
sudo chown consul-backup:consul-backup /opt/consul-backup/scripts/consul-backup.sh
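The retention step relies on find -mtime +N, which matches files whose modification time is more than N whole 24-hour periods in the past. The rule can be checked in isolation with back-dated files:

```shell
# Back-date one file past the 30-day window and confirm only it is deleted.
dir=$(mktemp -d)
touch -d '40 days ago' "$dir/consul-snapshot-old.snap.gpg"
touch "$dir/consul-snapshot-new.snap.gpg"
find "$dir" -name 'consul-snapshot-*.snap.gpg' -type f -mtime +30 -delete
remaining=$(ls "$dir")
echo "$remaining"
rm -rf "$dir"
```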
Create restore script
Create a disaster recovery script at /opt/consul-backup/scripts/consul-restore.sh for restoring from encrypted snapshots.
#!/bin/bash
set -euo pipefail
Configuration
BACKUP_DIR="/opt/consul-backup/backups"
LOG_FILE="/opt/consul-backup/logs/restore.log"
CONSUL_ADDR="http://localhost:8500"
TOKEN_FILE="/opt/consul-backup/keys/consul-token"
Source token file
[ -f "$TOKEN_FILE" ] && source "$TOKEN_FILE"
Logging function
log() {
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "$LOG_FILE"
}
Check if backup file is provided
if [ $# -eq 0 ]; then
echo "Usage: $0 <encrypted-snapshot-filename>"
echo "Available backups:"
ls -la "$BACKUP_DIR"/*.snap.gpg 2>/dev/null || echo "No backups found"
exit 1
fi
ENCRYPTED_FILE="$1"
SNAPSHOT_FILE="${ENCRYPTED_FILE%.gpg}"
Verify file exists
if [ ! -f "$BACKUP_DIR/$ENCRYPTED_FILE" ]; then
log "ERROR: Encrypted snapshot file not found: $BACKUP_DIR/$ENCRYPTED_FILE"
exit 1
fi
log "Starting Consul restore process"
log "Restoring from: $ENCRYPTED_FILE"
Decrypt snapshot
log "Decrypting snapshot"
if gpg --trust-model always --decrypt "$BACKUP_DIR/$ENCRYPTED_FILE" > "$BACKUP_DIR/$SNAPSHOT_FILE"; then
log "Snapshot decrypted successfully"
else
log "ERROR: Failed to decrypt snapshot"
exit 1
fi
Confirm restore operation
read -p "Are you sure you want to restore Consul data? This will overwrite existing data. (yes/no): " -r
if [[ ! $REPLY =~ ^[Yy][Ee][Ss]$ ]]; then
log "Restore operation cancelled"
rm "$BACKUP_DIR/$SNAPSHOT_FILE"
exit 0
fi
Ensure the target cluster is running
log "Snapshot restore requires a running cluster with an elected leader"
echo "For full disaster recovery, bring up a fresh cluster first, then restore into it"
read -p "Press Enter once the cluster is up and a leader is elected..."
Restore snapshot
log "Restoring Consul snapshot"
if consul snapshot restore -http-addr="$CONSUL_ADDR" "$BACKUP_DIR/$SNAPSHOT_FILE"; then
log "Snapshot restored successfully"
else
log "ERROR: Failed to restore snapshot"
rm "$BACKUP_DIR/$SNAPSHOT_FILE"
exit 1
fi
Clean up decrypted file
rm "$BACKUP_DIR/$SNAPSHOT_FILE"
log "Consul restore completed successfully"
log "Please start Consul service on all nodes"
Make the restore script executable:
sudo chmod +x /opt/consul-backup/scripts/consul-restore.sh
sudo chown consul-backup:consul-backup /opt/consul-backup/scripts/consul-restore.sh
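The restore script derives the decrypted filename from the encrypted one with ${ENCRYPTED_FILE%.gpg}, bash's shortest-suffix removal:

```shell
# `%` strips the shortest matching suffix, leaving the .snap name intact.
ENCRYPTED_FILE='consul-snapshot-20240101_020000.snap.gpg'
SNAPSHOT_FILE="${ENCRYPTED_FILE%.gpg}"
echo "$SNAPSHOT_FILE"
```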
Configure automated backups with systemd
Create a systemd service at /etc/systemd/system/consul-backup.service for running backups.
[Unit]
Description=Consul Backup Service
After=consul.service
Requires=consul.service
[Service]
Type=oneshot
User=consul-backup
Group=consul-backup
ExecStart=/opt/consul-backup/scripts/consul-backup.sh
Environment="HOME=/opt/consul-backup"
WorkingDirectory=/opt/consul-backup
StandardOutput=append:/opt/consul-backup/logs/backup.log
StandardError=append:/opt/consul-backup/logs/backup.log
[Install]
WantedBy=multi-user.target
Create a systemd timer at /etc/systemd/system/consul-backup.timer for automated execution:
[Unit]
Description=Run Consul Backup Daily
Requires=consul-backup.service
[Timer]
OnCalendar=daily
RandomizedDelaySec=3600
Persistent=true
[Install]
WantedBy=timers.target
Enable and start the timer:
sudo systemctl daemon-reload
sudo systemctl enable consul-backup.timer
sudo systemctl start consul-backup.timer
sudo systemctl status consul-backup.timer
Configure backup monitoring
Create a monitoring script at /opt/consul-backup/scripts/backup-monitor.sh that exports metrics for Prometheus.
#!/bin/bash
set -euo pipefail
Configuration
BACKUP_DIR="/opt/consul-backup/backups"
METRICS_FILE="/opt/consul-backup/logs/backup-metrics.prom"
MAX_AGE_HOURS=26 # Alert if backup is older than 26 hours
Initialize metrics
echo "# HELP consul_backup_last_success_timestamp Unix timestamp of last successful backup" > "$METRICS_FILE"
echo "# TYPE consul_backup_last_success_timestamp gauge" >> "$METRICS_FILE"
echo "# HELP consul_backup_file_size_bytes Size of latest backup file in bytes" >> "$METRICS_FILE"
echo "# TYPE consul_backup_file_size_bytes gauge" >> "$METRICS_FILE"
echo "# HELP consul_backup_age_hours Age of latest backup in hours" >> "$METRICS_FILE"
echo "# TYPE consul_backup_age_hours gauge" >> "$METRICS_FILE"
Find latest backup
LATEST_BACKUP=$(find "$BACKUP_DIR" -name "consul-snapshot-*.snap.gpg" -type f -printf '%T@ %p\n' | sort -n | tail -1 | cut -d' ' -f2-)
if [ -n "$LATEST_BACKUP" ]; then
# Get backup timestamp and size
BACKUP_TIMESTAMP=$(stat -c %Y "$LATEST_BACKUP")
BACKUP_SIZE=$(stat -c %s "$LATEST_BACKUP")
CURRENT_TIME=$(date +%s)
AGE_HOURS=$(( (CURRENT_TIME - BACKUP_TIMESTAMP) / 3600 ))
echo "consul_backup_last_success_timestamp $BACKUP_TIMESTAMP" >> "$METRICS_FILE"
echo "consul_backup_file_size_bytes $BACKUP_SIZE" >> "$METRICS_FILE"
echo "consul_backup_age_hours $AGE_HOURS" >> "$METRICS_FILE"
# Health check
if [ $AGE_HOURS -gt $MAX_AGE_HOURS ]; then
echo "consul_backup_healthy 0" >> "$METRICS_FILE"
else
echo "consul_backup_healthy 1" >> "$METRICS_FILE"
fi
else
echo "consul_backup_healthy 0" >> "$METRICS_FILE"
echo "consul_backup_age_hours 999" >> "$METRICS_FILE"
fi
Export metrics to node_exporter textfile directory (if available)
if [ -d "/var/lib/prometheus/node-exporter" ]; then
cp "$METRICS_FILE" "/var/lib/prometheus/node-exporter/consul-backup.prom"
fi
Make the monitoring script executable and create a cron job:
sudo chmod +x /opt/consul-backup/scripts/backup-monitor.sh
sudo chown consul-backup:consul-backup /opt/consul-backup/scripts/backup-monitor.sh
Add to consul-backup user's crontab:
sudo -u consul-backup crontab -l 2>/dev/null | { cat; echo "*/5 * * * * /opt/consul-backup/scripts/backup-monitor.sh"; } | sudo -u consul-backup crontab -
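The health decision in the monitor reduces to integer arithmetic on Unix timestamps. Here is the same computation with fixed values standing in for the stat output:

```shell
# 25 hours between backup mtime and now -> inside the 26-hour threshold.
BACKUP_TIMESTAMP=1700000000
CURRENT_TIME=$(( BACKUP_TIMESTAMP + 25 * 3600 ))
AGE_HOURS=$(( (CURRENT_TIME - BACKUP_TIMESTAMP) / 3600 ))
MAX_AGE_HOURS=26
if [ "$AGE_HOURS" -gt "$MAX_AGE_HOURS" ]; then healthy=0; else healthy=1; fi
echo "age=${AGE_HOURS}h healthy=$healthy"
```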
Configure S3 storage (optional)
Set up AWS credentials for remote backup storage by creating /opt/consul-backup/.aws/credentials:
[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
region = us-west-2
sudo mkdir -p /opt/consul-backup/.aws
sudo chown consul-backup:consul-backup /opt/consul-backup/.aws
sudo chmod 700 /opt/consul-backup/.aws
Create the S3 bucket:
aws s3 mb s3://your-consul-backups
aws s3api put-bucket-versioning --bucket your-consul-backups --versioning-configuration Status=Enabled
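To have S3 expire old remote copies in line with the local RETENTION_DAYS setting, a lifecycle rule can be applied. This is a sketch; the bucket prefix and the 30-day window mirror the tutorial's examples, so adjust both to your environment:

```json
{
  "Rules": [
    {
      "ID": "expire-old-consul-snapshots",
      "Status": "Enabled",
      "Filter": { "Prefix": "consul-backups/" },
      "Expiration": { "Days": 30 }
    }
  ]
}
```

Apply it with: aws s3api put-bucket-lifecycle-configuration --bucket your-consul-backups --lifecycle-configuration file://lifecycle.json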
Create disaster recovery documentation
Document the complete disaster recovery procedure.
# Consul Disaster Recovery Procedure
Emergency Restore Steps
- Stop all Consul agents across the cluster
sudo systemctl stop consul
- Clean Consul data directory on all nodes
sudo rm -rf /opt/consul/data/*
- Choose restore point
ls -la /opt/consul-backup/backups/
- Start Consul on all nodes so a fresh cluster forms, then wait for leader election
sudo systemctl start consul
consul members
consul operator raft list-peers
- Restore from backup (run on one node only; Raft replicates the restored state to every server)
sudo -u consul-backup /opt/consul-backup/scripts/consul-restore.sh consul-snapshot-YYYYMMDD_HHMMSS.snap.gpg
Recovery Verification
- Check cluster health:
consul members
- Verify services:
consul catalog services
- Test KV store:
consul kv get -recurse
- Monitor logs:
sudo journalctl -u consul -f
Emergency Contacts
- Infrastructure Team: [contact info]
- On-call Engineer: [contact info]
Verify your setup
Test the backup and restore process to ensure everything works correctly.
# Test manual backup
sudo -u consul-backup /opt/consul-backup/scripts/consul-backup.sh
Check backup files
ls -la /opt/consul-backup/backups/
Verify systemd timer status
sudo systemctl status consul-backup.timer
sudo systemctl list-timers consul-backup.timer
Check monitoring metrics
cat /opt/consul-backup/logs/backup-metrics.prom
View backup logs
tail -f /opt/consul-backup/logs/backup.log
Test decryption (without restoring)
sudo -u consul-backup gpg --decrypt /opt/consul-backup/backups/consul-snapshot-*.snap.gpg > /dev/null
Verify Consul connectivity
consul members
consul kv put test/backup "$(date)"
consul kv get test/backup
Configure Prometheus alerting
If you have Prometheus monitoring set up, add these alerting rules to monitor backup health.
groups:
- name: consul-backup
rules:
- alert: ConsulBackupFailed
expr: consul_backup_healthy == 0
for: 1h
labels:
severity: critical
annotations:
summary: "Consul backup is failing"
description: "Consul backup has not completed successfully in the last 26 hours"
- alert: ConsulBackupAging
expr: consul_backup_age_hours > 30
for: 15m
labels:
severity: warning
annotations:
summary: "Consul backup is getting old"
description: "Last Consul backup is {{ $value }} hours old"
- alert: ConsulBackupSizeAnomaly
expr: |
consul_backup_file_size_bytes < 1000 or
consul_backup_file_size_bytes > 1000000000
for: 5m
labels:
severity: warning
annotations:
summary: "Consul backup size is unusual"
description: "Consul backup file size is {{ $value }} bytes, which seems unusual"
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Permission denied creating snapshot | Token lacks snapshot privileges | Inspect the token with consul acl token read -id $TOKEN_ID; the snapshot endpoints require management-level privileges |
| GPG encryption fails | GPG key not found or expired | List keys: sudo -u consul-backup gpg --list-keys and regenerate if needed |
| Backup script fails silently | Missing environment variables | Check log file: tail /opt/consul-backup/logs/backup.log |
| S3 upload fails | Invalid AWS credentials or permissions | Test AWS CLI: aws s3 ls and verify IAM permissions |
| Restore fails with "no leader" | Target cluster has no elected leader | Start Consul and wait for leader election (consul operator raft list-peers) before restoring |
| Timer not running backups | Systemd timer not enabled | Enable timer: sudo systemctl enable consul-backup.timer |
| Monitoring metrics not updating | Node exporter textfile directory missing | Create directory: sudo mkdir -p /var/lib/prometheus/node-exporter |
Next steps
- Secure your Consul cluster with proper ACL policies and TLS encryption
- Monitor Consul performance with comprehensive Prometheus metrics and Grafana dashboards
- Implement Consul Connect service mesh for secure service-to-service communication
- Set up multi-datacenter Consul replication for geographic disaster recovery
- Configure Prometheus alerting for backup monitoring and failure notifications
Automated install script
Run this to automate the entire setup
#!/usr/bin/env bash
set -euo pipefail
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Configuration
CONSUL_BACKUP_USER="consul-backup"
BACKUP_BASE_DIR="/opt/consul-backup"
GPG_PASSPHRASE="${GPG_PASSPHRASE:-ConsulBackup$(date +%s)!}"
CONSUL_DATACENTER="${CONSUL_DATACENTER:-dc1}"
S3_BUCKET="${S3_BUCKET:-}"
# Usage message
usage() {
echo "Usage: $0 [options]"
echo "Options:"
echo " -b, --s3-bucket BUCKET S3 bucket for remote backups (optional)"
echo " -d, --datacenter NAME Consul datacenter name (default: dc1)"
echo " -p, --passphrase PHRASE GPG passphrase (default: auto-generated)"
echo " -h, --help Show this help message"
exit 1
}
# Parse command line arguments
while [[ $# -gt 0 ]]; do
case $1 in
-b|--s3-bucket)
S3_BUCKET="$2"
shift 2
;;
-d|--datacenter)
CONSUL_DATACENTER="$2"
shift 2
;;
-p|--passphrase)
GPG_PASSPHRASE="$2"
shift 2
;;
-h|--help)
usage
;;
*)
echo -e "${RED}Unknown option: $1${NC}"
usage
;;
esac
done
# Logging functions
log_info() {
echo -e "${GREEN}[INFO]${NC} $1"
}
log_warn() {
echo -e "${YELLOW}[WARN]${NC} $1"
}
log_error() {
echo -e "${RED}[ERROR]${NC} $1"
}
log_step() {
echo -e "${BLUE}$1${NC}"
}
# Cleanup function
cleanup() {
local exit_code=$?
if [[ $exit_code -ne 0 ]]; then
log_error "Installation failed. Cleaning up..."
systemctl stop consul-backup.timer 2>/dev/null || true
systemctl disable consul-backup.timer 2>/dev/null || true
rm -f /etc/systemd/system/consul-backup.service
rm -f /etc/systemd/system/consul-backup.timer
systemctl daemon-reload
fi
exit $exit_code
}
trap cleanup ERR
# Check if running as root
if [[ $EUID -ne 0 ]]; then
log_error "This script must be run as root"
exit 1
fi
# Detect OS distribution
log_step "[1/12] Detecting operating system..."
if [[ -f /etc/os-release ]]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_UPDATE="apt update"
PKG_INSTALL="apt install -y"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_UPDATE="dnf check-update || true"
PKG_INSTALL="dnf install -y"
;;
amzn)
PKG_MGR="yum"
PKG_UPDATE="yum check-update || true"
PKG_INSTALL="yum install -y"
;;
*)
log_error "Unsupported distribution: $ID"
exit 1
;;
esac
log_info "Detected OS: $PRETTY_NAME"
else
log_error "Cannot detect operating system"
exit 1
fi
# Check prerequisites
log_step "[2/12] Checking prerequisites..."
if ! command -v consul &> /dev/null; then
log_error "Consul is not installed or not in PATH"
exit 1
fi
if ! consul members &> /dev/null; then
log_error "Cannot connect to Consul cluster"
exit 1
fi
log_info "Prerequisites check passed"
# Install required packages
log_step "[3/12] Installing required packages..."
$PKG_UPDATE
$PKG_INSTALL gnupg2 jq curl
if [[ -n "$S3_BUCKET" ]]; then
if [[ "$PKG_MGR" == "apt" ]]; then
$PKG_INSTALL awscli
else
$PKG_INSTALL awscli2
fi
fi
log_info "Required packages installed"
# Create backup user and directories
log_step "[4/12] Creating backup user and directories..."
if ! id "$CONSUL_BACKUP_USER" &>/dev/null; then
useradd -r -s /bin/bash -d "$BACKUP_BASE_DIR" "$CONSUL_BACKUP_USER"
fi
mkdir -p "$BACKUP_BASE_DIR"/{scripts,backups,logs,keys}
chown -R "$CONSUL_BACKUP_USER:$CONSUL_BACKUP_USER" "$BACKUP_BASE_DIR"
chmod 755 "$BACKUP_BASE_DIR"
chmod 700 "$BACKUP_BASE_DIR"/{backups,keys}
chmod 755 "$BACKUP_BASE_DIR"/{scripts,logs}
log_info "Backup user and directories created"
# Generate GPG encryption key
log_step "[5/12] Generating GPG encryption key..."
sudo -u "$CONSUL_BACKUP_USER" gpg --batch --gen-key <<EOF
Key-Type: RSA
Key-Length: 4096
Name-Real: Consul Backup
Name-Email: consul-backup@$(hostname -f)
Expire-Date: 0
Passphrase: $GPG_PASSPHRASE
%commit
EOF
GPG_KEY_ID=$(sudo -u "$CONSUL_BACKUP_USER" gpg --list-keys --with-colons | grep fpr | head -1 | cut -d: -f10)
echo "CONSUL_GPG_KEY=$GPG_KEY_ID" > "$BACKUP_BASE_DIR/keys/gpg-key-id"
echo "GPG_PASSPHRASE=$GPG_PASSPHRASE" >> "$BACKUP_BASE_DIR/keys/gpg-key-id"
chown "$CONSUL_BACKUP_USER:$CONSUL_BACKUP_USER" "$BACKUP_BASE_DIR/keys/gpg-key-id"
chmod 600 "$BACKUP_BASE_DIR/keys/gpg-key-id"
log_info "GPG key generated with ID: ${GPG_KEY_ID:0:16}..."
# Create Consul ACL policy and token
log_step "[6/12] Creating Consul ACL policy and token..."
consul acl policy create -name "consul-backup" -rules - <<EOF || true
acl = "read"
key_prefix "" {
  policy = "read"
}
node_prefix "" {
  policy = "read"
}
operator = "read"
service_prefix "" {
  policy = "read"
}
session_prefix "" {
  policy = "read"
}
EOF
BACKUP_TOKEN=$(consul acl token create -policy-name "consul-backup" -description "Consul backup token" -format json | jq -r '.SecretID')
echo "CONSUL_HTTP_TOKEN=$BACKUP_TOKEN" > "$BACKUP_BASE_DIR/keys/consul-token"
chown "$CONSUL_BACKUP_USER:$CONSUL_BACKUP_USER" "$BACKUP_BASE_DIR/keys/consul-token"
chmod 600 "$BACKUP_BASE_DIR/keys/consul-token"
log_info "Consul ACL token created"
# Create backup script
log_step "[7/12] Creating backup script..."
cat > "$BACKUP_BASE_DIR/scripts/consul-backup.sh" <<'SCRIPT_EOF'
#!/bin/bash
set -euo pipefail
BACKUP_DIR="/opt/consul-backup/backups"
LOG_FILE="/opt/consul-backup/logs/backup.log"
RETENTION_DAYS=30
S3_BUCKET="${S3_BUCKET:-}"
CONSUL_ADDR="http://localhost:8500"
GPG_KEY_FILE="/opt/consul-backup/keys/gpg-key-id"
TOKEN_FILE="/opt/consul-backup/keys/consul-token"
[[ -f "$GPG_KEY_FILE" ]] && source "$GPG_KEY_FILE"
[[ -f "$TOKEN_FILE" ]] && source "$TOKEN_FILE"
log() {
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "$LOG_FILE"
}
TIMESTAMP=$(date '+%Y%m%d_%H%M%S')
SNAPSHOT_FILE="consul-snapshot-$TIMESTAMP.snap"
ENCRYPTED_FILE="$SNAPSHOT_FILE.gpg"
log "Starting Consul backup process"
if ! consul members &>/dev/null; then
log "ERROR: Cannot connect to Consul cluster"
exit 1
fi
log "Creating Consul snapshot"
if consul snapshot save -http-addr="$CONSUL_ADDR" "$BACKUP_DIR/$SNAPSHOT_FILE"; then
log "Snapshot created successfully: $SNAPSHOT_FILE"
else
log "ERROR: Failed to create snapshot"
exit 1
fi
log "Encrypting snapshot with GPG"
# Public-key encryption needs no passphrase; only decryption does
if gpg --batch --yes --trust-model always --encrypt -r "$CONSUL_GPG_KEY" --cipher-algo AES256 --output "$BACKUP_DIR/$ENCRYPTED_FILE" "$BACKUP_DIR/$SNAPSHOT_FILE"; then
log "Snapshot encrypted successfully"
rm -f "$BACKUP_DIR/$SNAPSHOT_FILE"
else
log "ERROR: Failed to encrypt snapshot"
exit 1
fi
if [[ -n "$S3_BUCKET" ]] && command -v aws &>/dev/null; then
log "Uploading encrypted snapshot to S3"
if aws s3 cp "$BACKUP_DIR/$ENCRYPTED_FILE" "s3://$S3_BUCKET/consul-backups/$(date +%Y/%m)/"; then
log "Snapshot uploaded to S3 successfully"
else
log "WARNING: Failed to upload snapshot to S3"
fi
fi
log "Cleaning up old backups (older than $RETENTION_DAYS days)"
find "$BACKUP_DIR" -name "consul-snapshot-*.snap.gpg" -mtime +$RETENTION_DAYS -delete
find "$LOG_FILE" -mtime +$RETENTION_DAYS -delete 2>/dev/null || true
log "Backup process completed successfully"
SCRIPT_EOF
chown "$CONSUL_BACKUP_USER:$CONSUL_BACKUP_USER" "$BACKUP_BASE_DIR/scripts/consul-backup.sh"
chmod 750 "$BACKUP_BASE_DIR/scripts/consul-backup.sh"
log_info "Backup script created"
# Create systemd service
log_step "[8/12] Creating systemd service..."
cat > /etc/systemd/system/consul-backup.service <<EOF
[Unit]
Description=Consul Backup Service
After=consul.service
[Service]
Type=oneshot
User=$CONSUL_BACKUP_USER
Group=$CONSUL_BACKUP_USER
ExecStart=$BACKUP_BASE_DIR/scripts/consul-backup.sh
Environment=S3_BUCKET=$S3_BUCKET
StandardOutput=journal
StandardError=journal
EOF
log_info "Systemd service created"
# Create systemd timer
log_step "[9/12] Creating systemd timer..."
cat > /etc/systemd/system/consul-backup.timer <<EOF
[Unit]
Description=Run Consul backup every 6 hours
Requires=consul-backup.service
[Timer]
OnCalendar=*-*-* 00,06,12,18:00:00
Persistent=true
[Install]
WantedBy=timers.target
EOF
systemctl daemon-reload
systemctl enable consul-backup.timer
systemctl start consul-backup.timer
log_info "Systemd timer configured and started"
# Create restore script
log_step "[10/12] Creating restore script..."
cat > "$BACKUP_BASE_DIR/scripts/consul-restore.sh" <<'RESTORE_EOF'
#!/bin/bash
set -euo pipefail
if [[ $# -ne 1 ]]; then
echo "Usage: $0 <encrypted_snapshot_file>"
exit 1
fi
ENCRYPTED_FILE="$1"
GPG_KEY_FILE="/opt/consul-backup/keys/gpg-key-id"
TOKEN_FILE="/opt/consul-backup/keys/consul-token"
[[ -f "$GPG_KEY_FILE" ]] && source "$GPG_KEY_FILE"
[[ -f "$TOKEN_FILE" ]] && source "$TOKEN_FILE"
if [[ ! -f "$ENCRYPTED_FILE" ]]; then
echo "ERROR: Encrypted snapshot file not found: $ENCRYPTED_FILE"
exit 1
fi
TEMP_SNAPSHOT="/tmp/consul-restore-$(date +%s).snap"
echo "Decrypting snapshot..."
# --pinentry-mode loopback lets GnuPG 2.1+ accept the passphrase on stdin in batch mode
if echo "$GPG_PASSPHRASE" | gpg --batch --yes --pinentry-mode loopback --passphrase-fd 0 --decrypt --output "$TEMP_SNAPSHOT" "$ENCRYPTED_FILE"; then
echo "Snapshot decrypted successfully"
else
echo "ERROR: Failed to decrypt snapshot"
exit 1
fi
echo "Restoring Consul snapshot..."
if consul snapshot restore "$TEMP_SNAPSHOT"; then
echo "Snapshot restored successfully"
rm -f "$TEMP_SNAPSHOT"
else
echo "ERROR: Failed to restore snapshot"
rm -f "$TEMP_SNAPSHOT"
exit 1
fi
RESTORE_EOF
chown "$CONSUL_BACKUP_USER:$CONSUL_BACKUP_USER" "$BACKUP_BASE_DIR/scripts/consul-restore.sh"
chmod 750 "$BACKUP_BASE_DIR/scripts/consul-restore.sh"
log_info "Restore script created"
# Configure log rotation
log_step "[11/12] Configuring log rotation..."
cat > /etc/logrotate.d/consul-backup <<EOF
$BACKUP_BASE_DIR/logs/*.log {
daily
rotate 30
compress
delaycompress
missingok
notifempty
create 644 $CONSUL_BACKUP_USER $CONSUL_BACKUP_USER
}
EOF
log_info "Log rotation configured"
# Verify installation
log_step "[12/12] Verifying installation..."
if systemctl is-active consul-backup.timer &>/dev/null; then
log_info "Consul backup timer is active"
else
log_error "Consul backup timer is not active"
exit 1
fi
# Test backup script
log_info "Running test backup..."
sudo -u "$CONSUL_BACKUP_USER" "$BACKUP_BASE_DIR/scripts/consul-backup.sh"
if [[ -n "$(find "$BACKUP_BASE_DIR/backups" -name "consul-snapshot-*.snap.gpg" -mtime -1)" ]]; then
log_info "Test backup completed successfully"
else
log_error "Test backup failed"
exit 1
fi
echo
log_info "Consul backup and disaster recovery setup completed successfully!"
echo
echo -e "${BLUE}Configuration Summary:${NC}"
echo "- Backup user: $CONSUL_BACKUP_USER"
echo "- Backup directory: $BACKUP_BASE_DIR"
echo "- GPG key ID: ${GPG_KEY_ID:0:16}..."
Review the script before running. Execute with: bash install.sh