Configure automated backup strategies for CockroachDB with systemd timers, implement comprehensive disaster recovery procedures, and set up monitoring with Prometheus and Grafana for production-grade database infrastructure.
Prerequisites
- Root or sudo access
- S3-compatible storage account
- Three or more servers for the cluster
- Basic understanding of SQL and systemd
What this solves
CockroachDB clusters need automated backup strategies and disaster recovery procedures to protect against data loss and maintain business continuity. This tutorial sets up automated database backups using systemd timers, configures cross-region disaster recovery with point-in-time recovery capabilities, and implements comprehensive monitoring with alerting to ensure your distributed SQL database remains resilient and recoverable.
Step-by-step installation
Install CockroachDB cluster
First, install CockroachDB on your primary cluster nodes. We'll set up a three-node cluster for high availability.
wget -qO- https://binaries.cockroachdb.com/cockroach-v24.3.0.linux-amd64.tgz | tar xz
sudo mv cockroach-v24.3.0.linux-amd64/cockroach /usr/local/bin/
sudo chmod 755 /usr/local/bin/cockroach
Create CockroachDB user and directories
Create a dedicated user for CockroachDB and set up the required directory structure with proper permissions.
sudo useradd -m -s /bin/bash cockroach
sudo mkdir -p /var/lib/cockroach /var/log/cockroach /etc/cockroach
sudo chown cockroach:cockroach /var/lib/cockroach /var/log/cockroach
sudo chmod 750 /var/lib/cockroach /var/log/cockroach
sudo chmod 755 /etc/cockroach
Generate cluster certificates
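Create TLS certificates for secure cluster communication. This sets up a certificate authority (CA), a node certificate, and a client certificate for the root user. Generate the CA only once and sign every node's certificate with the same ca.crt and ca.key; nodes whose certificates come from different CAs cannot join the same cluster.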
sudo mkdir -p /etc/cockroach/certs /etc/cockroach/private
sudo cockroach cert create-ca --certs-dir=/etc/cockroach/certs --ca-key=/etc/cockroach/private/ca.key
sudo cockroach cert create-node localhost 203.0.113.10 203.0.113.11 203.0.113.12 --certs-dir=/etc/cockroach/certs --ca-key=/etc/cockroach/private/ca.key
sudo cockroach cert create-client root --certs-dir=/etc/cockroach/certs --ca-key=/etc/cockroach/private/ca.key
sudo chown -R cockroach:cockroach /etc/cockroach/certs
sudo chmod 400 /etc/cockroach/private/ca.key
Configure CockroachDB systemd service
Create a systemd service file for automatic startup and process management.
[Unit]
Description=Cockroach Database cluster node
Requires=network.target
After=network.target
[Service]
Type=notify
User=cockroach
Group=cockroach
ExecStart=/usr/local/bin/cockroach start --certs-dir=/etc/cockroach/certs --store=/var/lib/cockroach --listen-addr=203.0.113.10:26257 --http-addr=203.0.113.10:8080 --join=203.0.113.10:26257,203.0.113.11:26257,203.0.113.12:26257 --background=false
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
SyslogIdentifier=cockroach
KillMode=mixed
KillSignal=SIGTERM
TimeoutStopSec=60
[Install]
WantedBy=multi-user.target
Initialize the cluster
Start the services on all nodes, then initialize the cluster from one node.
sudo systemctl daemon-reload
sudo systemctl enable cockroach
sudo systemctl start cockroach
sudo systemctl status cockroach
On the first node only, initialize the cluster:
sudo -u cockroach cockroach init --certs-dir=/etc/cockroach/certs --host=203.0.113.10:26257
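A node can take a few seconds to start listening after systemctl start, so running cockroach init immediately may fail with a connection error. The following is a minimal sketch of a retry wrapper for that case. The function name retry and its arguments (number of attempts, delay in seconds) are hypothetical helpers for illustration and are not part of CockroachDB.

```shell
#!/bin/bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
    local attempts="$1" delay="$2"
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        echo "attempt ${i}/${attempts} failed: $*" >&2
        # Do not sleep after the final attempt
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}

# Example (assumes the certificate paths and address used in this tutorial):
#   retry 10 3 sudo -u cockroach cockroach init \
#       --certs-dir=/etc/cockroach/certs --host=203.0.113.10:26257
```

Retrying only helps with transient connection failures; if the cluster has already been initialized, cockroach init will keep failing and the wrapper simply gives up after the last attempt.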
Install backup dependencies
Install required packages for backup automation and monitoring.
sudo apt update
sudo apt install -y awscli postgresql-client-common curl jq
Configure S3-compatible backup storage
Set up credentials for S3-compatible storage where backups will be stored.
#!/bin/bash
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"
export BACKUP_BUCKET="cockroachdb-backups"
export CLUSTER_NAME="production"
export BACKUP_RETENTION_DAYS="30"
sudo chown cockroach:cockroach /etc/cockroach/backup-config
sudo chmod 600 /etc/cockroach/backup-config
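Because every backup script sources this file, a missing or placeholder value only shows up later as a failed backup. A small pre-flight check can catch that earlier. The sketch below is one possible approach; the function name check_backup_config is hypothetical, and it assumes the variable names shown in the configuration file above.

```shell
#!/bin/bash
# check_backup_config: verify that every variable the backup scripts rely on
# is set, non-empty, and not still a "your-..." placeholder.
# Prints one line per problem to stderr and returns 1 if any were found.
check_backup_config() {
    local var value status=0
    for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION \
               BACKUP_BUCKET CLUSTER_NAME BACKUP_RETENTION_DAYS; do
        # ${!var} is bash indirect expansion: the value of the variable named by $var
        value="${!var:-}"
        if [[ -z "$value" ]]; then
            echo "missing: $var" >&2
            status=1
        elif [[ "$value" == your-* ]]; then
            echo "placeholder value: $var" >&2
            status=1
        fi
    done
    # Retention must be a positive whole number of days
    if [[ ! "${BACKUP_RETENTION_DAYS:-}" =~ ^[1-9][0-9]*$ ]]; then
        echo "BACKUP_RETENTION_DAYS must be a positive integer" >&2
        status=1
    fi
    return "$status"
}

# Example:
#   source /etc/cockroach/backup-config && check_backup_config
```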
Create backup automation script
Create a comprehensive backup script with error handling, logging, and cleanup.
#!/bin/bash
set -euo pipefail
# Source configuration
source /etc/cockroach/backup-config
# Logging setup
LOGFILE="/var/log/cockroach/backup-$(date +%Y%m%d-%H%M%S).log"
exec 1> >(tee -a "$LOGFILE")
exec 2>&1
echo "[$(date)] Starting CockroachDB backup"
# Backup timestamp
BACKUP_TIMESTAMP=$(date +%Y%m%d-%H%M%S)
BACKUP_PATH="s3://${BACKUP_BUCKET}/${CLUSTER_NAME}/full/${BACKUP_TIMESTAMP}"
# The cluster nodes write to S3 themselves and do not see this script's environment,
# so the credentials are passed as URI parameters (URL-encode the secret key if it
# contains special characters)
BACKUP_URI="${BACKUP_PATH}?AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}&AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}"
# Perform backup. Testing the command directly keeps "set -e" from exiting
# before the failure branch can log anything.
echo "[$(date)] Creating full backup to: $BACKUP_PATH"
if cockroach sql --certs-dir=/etc/cockroach/certs --host=localhost:26257 --execute="BACKUP TO '$BACKUP_URI' WITH revision_history;"; then
    echo "[$(date)] Backup completed successfully"
    # Update latest backup marker
    echo "$BACKUP_TIMESTAMP" > /var/lib/cockroach/last-backup
    # Cleanup old backups. "aws s3 ls" lists each backup directory as "PRE <name>/".
    echo "[$(date)] Cleaning up backups older than $BACKUP_RETENTION_DAYS days"
    CUTOFF_EPOCH=$(date -d "$BACKUP_RETENTION_DAYS days ago" +%s)
    aws s3 ls "s3://${BACKUP_BUCKET}/${CLUSTER_NAME}/full/" | awk '$1 == "PRE" {print $2}' | while read -r line; do
        BACKUP_NAME="${line%/}"
        # Directory names are YYYYMMDD-HHMMSS; the first 8 characters give the backup date
        BACKUP_EPOCH=$(date -d "${BACKUP_NAME:0:8}" +%s 2>/dev/null) || continue
        if [[ $BACKUP_EPOCH -lt $CUTOFF_EPOCH ]]; then
            echo "[$(date)] Deleting old backup: $BACKUP_NAME"
            aws s3 rm "s3://${BACKUP_BUCKET}/${CLUSTER_NAME}/full/${BACKUP_NAME}/" --recursive
        fi
    done
    echo "[$(date)] Backup process completed successfully"
    exit 0
else
    rc=$?
    echo "[$(date)] Backup failed with exit code $rc"
    exit 1
fi
sudo chmod 755 /usr/local/bin/cockroach-backup.sh
sudo chown cockroach:cockroach /usr/local/bin/cockroach-backup.sh
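The retention cleanup depends on turning a backup directory name into a date and comparing it with a cutoff. The sketch below isolates that comparison so it can be tested without touching S3. The function name backup_is_expired is hypothetical; it assumes the YYYYMMDD-HHMMSS naming used by the backup script and GNU date.

```shell
#!/bin/bash
# backup_is_expired NAME RETENTION_DAYS
# NAME is a backup directory name in the form YYYYMMDD-HHMMSS.
# Returns 0 if the backup is older than RETENTION_DAYS, 1 if it is not,
# and 2 if NAME does not match the expected format.
# Requires GNU date (for the -d option).
backup_is_expired() {
    local name="$1" days="$2"
    [[ "$name" =~ ^([0-9]{8})-([0-9]{2})([0-9]{2})([0-9]{2})$ ]] || return 2
    # Rebuild "YYYYMMDD HH:MM:SS", a form GNU date parses directly
    local stamp="${BASH_REMATCH[1]} ${BASH_REMATCH[2]}:${BASH_REMATCH[3]}:${BASH_REMATCH[4]}"
    local backup_epoch cutoff_epoch
    backup_epoch=$(date -d "$stamp" +%s 2>/dev/null) || return 2
    cutoff_epoch=$(date -d "$days days ago" +%s)
    # Arithmetic test: exit status 0 when the backup predates the cutoff
    (( backup_epoch < cutoff_epoch ))
}

# Example:
#   if backup_is_expired "20241201-120000" 30; then echo "delete it"; fi
```

Returning a distinct status for unparseable names means an unexpected object under the backup prefix is skipped rather than deleted.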
Create incremental backup script
Set up incremental backups for more frequent data protection between full backups.
#!/bin/bash
set -euo pipefail
# Source configuration
source /etc/cockroach/backup-config
# Logging setup
LOGFILE="/var/log/cockroach/incremental-backup-$(date +%Y%m%d-%H%M%S).log"
exec 1> >(tee -a "$LOGFILE")
exec 2>&1
echo "[$(date)] Starting CockroachDB incremental backup"
# Get latest full backup
if [ ! -f "/var/lib/cockroach/last-backup" ]; then
    echo "[$(date)] No full backup found. Please run full backup first."
    exit 1
fi
LAST_FULL_BACKUP=$(cat /var/lib/cockroach/last-backup)
BACKUP_TIMESTAMP=$(date +%Y%m%d-%H%M%S)
FULL_BACKUP_PATH="s3://${BACKUP_BUCKET}/${CLUSTER_NAME}/full/${LAST_FULL_BACKUP}"
INCREMENTAL_PATH="s3://${BACKUP_BUCKET}/${CLUSTER_NAME}/incremental/${BACKUP_TIMESTAMP}"
# The cluster nodes access S3 themselves, so credentials are passed as URI parameters
# (URL-encode the secret key if it contains special characters)
S3_AUTH="AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}&AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}"
echo "[$(date)] Creating incremental backup based on: $FULL_BACKUP_PATH"
echo "[$(date)] Incremental backup destination: $INCREMENTAL_PATH"
# Testing the command directly keeps "set -e" from exiting before the failure is logged
if cockroach sql --certs-dir=/etc/cockroach/certs --host=localhost:26257 --execute="BACKUP TO '${INCREMENTAL_PATH}?${S3_AUTH}' INCREMENTAL FROM '${FULL_BACKUP_PATH}?${S3_AUTH}' WITH revision_history;"; then
    echo "[$(date)] Incremental backup completed successfully"
    exit 0
else
    rc=$?
    echo "[$(date)] Incremental backup failed with exit code $rc"
    exit 1
fi
sudo chmod 755 /usr/local/bin/cockroach-incremental-backup.sh
sudo chown cockroach:cockroach /usr/local/bin/cockroach-incremental-backup.sh
Configure systemd timers for automated backups
Create systemd timer units for both full and incremental backups.
[Unit]
Description=CockroachDB Full Backup
Wants=network-online.target
After=network-online.target cockroach.service
Requires=cockroach.service
[Service]
Type=oneshot
User=cockroach
Group=cockroach
ExecStart=/usr/local/bin/cockroach-backup.sh
Environment=PATH=/usr/local/bin:/usr/bin:/bin
StandardOutput=journal
StandardError=journal
[Unit]
Description=Run CockroachDB full backup daily
Requires=cockroach-backup.service
[Timer]
OnCalendar=daily
RandomizedDelaySec=1800
Persistent=true
[Install]
WantedBy=timers.target
[Unit]
Description=CockroachDB Incremental Backup
Wants=network-online.target
After=network-online.target cockroach.service
Requires=cockroach.service
[Service]
Type=oneshot
User=cockroach
Group=cockroach
ExecStart=/usr/local/bin/cockroach-incremental-backup.sh
Environment=PATH=/usr/local/bin:/usr/bin:/bin
StandardOutput=journal
StandardError=journal
[Unit]
Description=Run CockroachDB incremental backup every 4 hours
Requires=cockroach-incremental-backup.service
[Timer]
OnCalendar=*-*-* 00,04,08,12,16,20:00:00
RandomizedDelaySec=300
Persistent=true
[Install]
WantedBy=timers.target
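The incremental timer fires every four hours by listing each hour explicitly in its OnCalendar value (the date part is the wildcard *-*-*). If you want a different interval, the hour list has to be regenerated. The sketch below builds the expression for any interval that divides 24 evenly; the function name build_oncalendar is a hypothetical helper, not a systemd tool. On a host with systemd, systemd-analyze calendar can be used to check the resulting expression.

```shell
#!/bin/bash
# build_oncalendar INTERVAL_HOURS
# Prints a systemd OnCalendar expression that fires every INTERVAL_HOURS hours,
# starting at midnight. INTERVAL_HOURS must divide 24 evenly.
build_oncalendar() {
    local interval="$1" hours="" h
    if [[ ! "$interval" =~ ^[1-9][0-9]*$ ]] || (( interval > 24 || 24 % interval != 0 )); then
        echo "interval must be a divisor of 24" >&2
        return 1
    fi
    for ((h = 0; h < 24; h += interval)); do
        # Zero-pad each hour and append it with a trailing comma
        hours+=$(printf '%02d' "$h"),
    done
    # Strip the final comma and add the wildcard date plus minutes and seconds
    echo "*-*-* ${hours%,}:00:00"
}

build_oncalendar 4   # prints: *-*-* 00,04,08,12,16,20:00:00
```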
Enable backup timers
Start and enable the systemd timers for automated backup execution.
sudo systemctl daemon-reload
sudo systemctl enable cockroach-backup.timer cockroach-incremental-backup.timer
sudo systemctl start cockroach-backup.timer cockroach-incremental-backup.timer
sudo systemctl status cockroach-backup.timer cockroach-incremental-backup.timer
Create disaster recovery script
Create a comprehensive disaster recovery script for point-in-time restoration.
#!/bin/bash
set -euo pipefail
# Source configuration
source /etc/cockroach/backup-config
if [ "$#" -lt 1 ]; then
    echo "Usage: $0 <backup-timestamp> [target-time]"
    echo "Example: $0 20241201-120000"
    echo "Example: $0 20241201-120000 '2024-12-01 15:30:00'"
    exit 1
fi
BACKUP_TIMESTAMP="$1"
TARGET_TIME="${2:-}"
BACKUP_PATH="s3://${BACKUP_BUCKET}/${CLUSTER_NAME}/full/${BACKUP_TIMESTAMP}"
# The cluster nodes read from S3 themselves, so credentials are passed as URI parameters
# (URL-encode the secret key if it contains special characters)
BACKUP_URI="${BACKUP_PATH}?AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}&AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}"
# Logging setup
LOGFILE="/var/log/cockroach/restore-$(date +%Y%m%d-%H%M%S).log"
exec 1> >(tee -a "$LOGFILE")
exec 2>&1
echo "[$(date)] Starting CockroachDB restore from backup: $BACKUP_TIMESTAMP"
# Verify backup exists. Testing the command directly keeps "set -e" from
# exiting before the error message is printed.
echo "[$(date)] Verifying backup exists at: $BACKUP_PATH"
if ! aws s3 ls "$BACKUP_PATH/" > /dev/null; then
    echo "[$(date)] ERROR: Backup not found at $BACKUP_PATH"
    exit 1
fi
# Build restore command
if [ -n "$TARGET_TIME" ]; then
    RESTORE_CMD="RESTORE DATABASE defaultdb FROM '$BACKUP_URI' AS OF SYSTEM TIME '$TARGET_TIME' WITH skip_missing_foreign_keys;"
    echo "[$(date)] Performing point-in-time restore to: $TARGET_TIME"
else
    RESTORE_CMD="RESTORE DATABASE defaultdb FROM '$BACKUP_URI' WITH skip_missing_foreign_keys;"
    echo "[$(date)] Performing full restore to latest backup time"
fi
echo "[$(date)] Executing restore command"
if cockroach sql --certs-dir=/etc/cockroach/certs --host=localhost:26257 --execute="$RESTORE_CMD"; then
    echo "[$(date)] Restore completed successfully"
    exit 0
else
    rc=$?
    echo "[$(date)] Restore failed with exit code $rc"
    exit 1
fi
sudo chmod 755 /usr/local/bin/cockroach-restore.sh
sudo chown cockroach:cockroach /usr/local/bin/cockroach-restore.sh
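The restore script places both of its arguments directly into a SQL statement, so a mistyped or malicious value ends up inside the RESTORE command. A minimal sketch of argument validation is shown below. The function names valid_backup_timestamp and valid_target_time are hypothetical, and the accepted formats are assumptions taken from the usage examples in the script (YYYYMMDD-HHMMSS and 'YYYY-MM-DD HH:MM:SS'). GNU date is assumed.

```shell
#!/bin/bash
# valid_backup_timestamp VALUE: true if VALUE looks like YYYYMMDD-HHMMSS
valid_backup_timestamp() {
    [[ "$1" =~ ^[0-9]{8}-[0-9]{6}$ ]]
}

# valid_target_time VALUE: true if VALUE looks like 'YYYY-MM-DD HH:MM:SS'
# and GNU date accepts it as a real date and time.
valid_target_time() {
    [[ "$1" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}\ [0-9]{2}:[0-9]{2}:[0-9]{2}$ ]] || return 1
    date -d "$1" +%s > /dev/null 2>&1
}

# Example guard near the top of a restore script:
#   valid_backup_timestamp "$1" || { echo "bad backup timestamp: $1" >&2; exit 1; }
#   [ -z "${2:-}" ] || valid_target_time "$2" || { echo "bad target time: $2" >&2; exit 1; }
```

Anchoring both patterns with ^ and $ rejects any value that carries extra characters such as quotes or semicolons.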
Install Prometheus for monitoring
Install Prometheus to collect metrics from CockroachDB and backup processes.
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xzf prometheus-2.45.0.linux-amd64.tar.gz
sudo mv prometheus-2.45.0.linux-amd64/prometheus /usr/local/bin/
sudo mv prometheus-2.45.0.linux-amd64/promtool /usr/local/bin/
sudo useradd -M -s /bin/false prometheus
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus /var/lib/prometheus
Configure Prometheus for CockroachDB monitoring
Set up Prometheus configuration to scrape CockroachDB metrics and backup job status.
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - "/etc/prometheus/cockroach-rules.yml"
scrape_configs:
  - job_name: 'cockroachdb'
    # The cluster runs in secure mode, so the HTTP port serves /_status/vars over TLS
    scheme: https
    tls_config:
      ca_file: /etc/cockroach/certs/ca.crt
    static_configs:
      - targets:
          - '203.0.113.10:8080'
          - '203.0.113.11:8080'
          - '203.0.113.12:8080'
    metrics_path: '/_status/vars'
    scrape_interval: 10s
  - job_name: 'node-exporter'
    static_configs:
      - targets:
          - '203.0.113.10:9100'
          - '203.0.113.11:9100'
          - '203.0.113.12:9100'
  - job_name: 'backup-monitoring'
    static_configs:
      - targets: ['localhost:9090']
    metrics_path: '/metrics'
    scrape_interval: 60s
Create CockroachDB alerting rules
Define Prometheus alerting rules for CockroachDB cluster health and backup monitoring.
groups:
  - name: cockroachdb
    rules:
      - alert: CockroachDBNodeDown
        expr: up{job="cockroachdb"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "CockroachDB node is down"
          description: "CockroachDB node {{ $labels.instance }} has been down for more than 5 minutes."
      - alert: CockroachDBHighCPU
        expr: rate(sys_cpu_user_percent{job="cockroachdb"}[5m]) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CockroachDB high CPU usage"
          description: "CockroachDB node {{ $labels.instance }} CPU usage is above 80% for more than 10 minutes."
      - alert: CockroachDBHighMemory
        expr: sys_rss{job="cockroachdb"} / sys_rss_limit{job="cockroachdb"} > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CockroachDB high memory usage"
          description: "CockroachDB node {{ $labels.instance }} memory usage is above 90% for more than 10 minutes."
      - alert: CockroachDBBackupFailed
        expr: time() - cockroachdb_last_backup_timestamp > 86400
        for: 1h
        labels:
          severity: critical
        annotations:
          summary: "CockroachDB backup failed"
          description: "CockroachDB backup has not completed successfully in the last 24 hours."
      - alert: CockroachDBReplicationLag
        expr: replication_lag_seconds{job="cockroachdb"} > 300
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CockroachDB replication lag"
          description: "CockroachDB replication lag on {{ $labels.instance }} is above 5 minutes."
      - alert: CockroachDBUnderReplicated
        expr: ranges_underreplicated{job="cockroachdb"} > 0
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: "CockroachDB under-replicated ranges"
          description: "CockroachDB has {{ $value }} under-replicated ranges for more than 15 minutes."
Create backup monitoring script
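Create a script that writes backup metrics to a file in the Prometheus text exposition format. Prometheus does not read this file on its own: something must serve it over HTTP, typically node_exporter's textfile collector started with --collector.textfile.directory=/var/lib/prometheus. The script also reads /etc/cockroach/backup-config and /var/lib/cockroach/last-backup, so the account that runs it needs read access to both.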
#!/bin/bash
set -euo pipefail
# Source configuration
source /etc/cockroach/backup-config
METRICS_FILE="/var/lib/prometheus/backup-metrics.prom"
# Initialize metrics file
echo "# HELP cockroachdb_last_backup_timestamp Unix timestamp of last successful backup" > "$METRICS_FILE"
echo "# TYPE cockroachdb_last_backup_timestamp gauge" >> "$METRICS_FILE"
# Check last backup time (the marker file holds a YYYYMMDD-HHMMSS timestamp)
if [ -f "/var/lib/cockroach/last-backup" ]; then
    LAST_BACKUP_FILE=$(cat /var/lib/cockroach/last-backup)
    BACKUP_TIMESTAMP=$(date -d "${LAST_BACKUP_FILE:0:8} ${LAST_BACKUP_FILE:9:2}:${LAST_BACKUP_FILE:11:2}:${LAST_BACKUP_FILE:13:2}" +%s)
    echo "cockroachdb_last_backup_timestamp $BACKUP_TIMESTAMP" >> "$METRICS_FILE"
else
    echo "cockroachdb_last_backup_timestamp 0" >> "$METRICS_FILE"
fi
# Check backup job status from the most recent backup log only
echo "# HELP cockroachdb_backup_success Last backup job success (1=success, 0=failure)" >> "$METRICS_FILE"
echo "# TYPE cockroachdb_backup_success gauge" >> "$METRICS_FILE"
LATEST_LOG=$(ls -t /var/log/cockroach/backup-*.log 2>/dev/null | head -1 || true)
if [ -n "$LATEST_LOG" ] && grep -q "Backup completed successfully" "$LATEST_LOG"; then
    echo "cockroachdb_backup_success 1" >> "$METRICS_FILE"
else
    echo "cockroachdb_backup_success 0" >> "$METRICS_FILE"
fi
# Count backup directories ("aws s3 ls" lists each one as "PRE <name>/")
echo "# HELP cockroachdb_backup_count Total number of backups in storage" >> "$METRICS_FILE"
echo "# TYPE cockroachdb_backup_count gauge" >> "$METRICS_FILE"
# grep -c prints 0 and exits non-zero when nothing matches, so "|| true" keeps set -e quiet
BACKUP_COUNT=$(aws s3 ls "s3://${BACKUP_BUCKET}/${CLUSTER_NAME}/full/" 2>/dev/null | grep -c "PRE " || true)
echo "cockroachdb_backup_count $BACKUP_COUNT" >> "$METRICS_FILE"
sudo chmod 755 /usr/local/bin/cockroach-backup-metrics.sh
sudo chown prometheus:prometheus /usr/local/bin/cockroach-backup-metrics.sh
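The metrics script builds its output file line by line, so anything that reads the file while the script is running can see a partial set of metrics. A common way to avoid that is to write to a temporary file and rename it into place. The sketch below shows the idea; the function name write_metrics_atomically is hypothetical, and the metric value in the example is only a sample.

```shell
#!/bin/bash
# write_metrics_atomically DEST
# Reads metric lines from stdin, writes them to a temporary file in the same
# directory as DEST, then renames it over DEST. A rename within one filesystem
# is atomic, so a reader sees either the old file or the new one, never a mix.
write_metrics_atomically() {
    local dest="$1" tmp
    tmp=$(mktemp "${dest}.XXXXXX") || return 1
    # mktemp creates the file with mode 600; relax it so the collector can read it
    if cat > "$tmp" && chmod 644 "$tmp" && mv "$tmp" "$dest"; then
        return 0
    fi
    rm -f "$tmp"
    return 1
}

# Example:
#   { echo "# TYPE cockroachdb_last_backup_timestamp gauge"
#     echo "cockroachdb_last_backup_timestamp 1733054400"; } |
#     write_metrics_atomically /var/lib/prometheus/backup-metrics.prom
```

The temporary file is created next to the destination on purpose: a rename across filesystems falls back to copy and delete, which is no longer atomic.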
Create systemd timer for backup metrics
Set up a systemd timer to update backup metrics for Prometheus scraping.
[Unit]
Description=Update CockroachDB backup metrics
Wants=network-online.target
After=network-online.target
[Service]
Type=oneshot
User=prometheus
Group=prometheus
ExecStart=/usr/local/bin/cockroach-backup-metrics.sh
StandardOutput=journal
StandardError=journal
[Unit]
Description=Update backup metrics every 5 minutes
Requires=cockroach-backup-metrics.service
[Timer]
OnBootSec=5min
OnUnitActiveSec=5min
Persistent=true
[Install]
WantedBy=timers.target
Configure Prometheus systemd service
Create and start the Prometheus service for monitoring.
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/prometheus \
--config.file /etc/prometheus/prometheus.yml \
--storage.tsdb.path /var/lib/prometheus/ \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries \
--web.listen-address=0.0.0.0:9090 \
--web.enable-lifecycle \
--storage.tsdb.retention.time=30d
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable prometheus cockroach-backup-metrics.timer
sudo systemctl start prometheus cockroach-backup-metrics.timer
Install and configure Grafana
Install Grafana for visualizing CockroachDB metrics and backup status.
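sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list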
sudo apt update
sudo apt install -y grafana
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
Configure alerting with Alertmanager
Install and configure Alertmanager for sending backup failure notifications.
wget https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz
tar xzf alertmanager-0.25.0.linux-amd64.tar.gz
sudo mv alertmanager-0.25.0.linux-amd64/alertmanager /usr/local/bin/
sudo useradd -M -s /bin/false alertmanager
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
sudo chown alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager
global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alerts@example.com'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    email_configs:
      - to: 'admin@example.com'
        # Alertmanager sets the subject through "headers" and the plain-text body through "text"
        headers:
          Subject: 'CockroachDB Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Instance: {{ .Labels.instance }}
          {{ end }}
Verify your setup
Test the backup system and verify monitoring is working correctly.
# Check CockroachDB cluster status
sudo -u cockroach cockroach node status --certs-dir=/etc/cockroach/certs --host=localhost:26257
# Test manual backup
sudo -u cockroach /usr/local/bin/cockroach-backup.sh
# Check systemd timer status
sudo systemctl status cockroach-backup.timer cockroach-incremental-backup.timer
# Verify that Prometheus has ingested the backup metrics (an empty "result" means they are not being scraped)
curl -s 'http://localhost:9090/api/v1/query?query=cockroachdb_last_backup_timestamp'
# Check backup logs
sudo tail -f /var/log/cockroach/backup-*.log
Access Grafana at http://your-server:3000 (admin/admin) and verify that the Prometheus data source is configured correctly. You can reference our advanced Grafana dashboards tutorial for detailed monitoring setup.
Test disaster recovery procedure
Practice the disaster recovery process with a test restoration.
# Create test data
sudo -u cockroach cockroach sql --certs-dir=/etc/cockroach/certs --host=localhost:26257 --execute="CREATE DATABASE testdr; USE testdr; CREATE TABLE test (id INT PRIMARY KEY, data STRING); INSERT INTO test VALUES (1, 'test-data');"
# Perform backup
sudo -u cockroach /usr/local/bin/cockroach-backup.sh
# Simulate disaster and restore (on a test cluster).
# Replace the timestamp with the one recorded in /var/lib/cockroach/last-backup.
sudo -u cockroach /usr/local/bin/cockroach-restore.sh 20241201-120000
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Backup fails with permission denied | Incorrect file ownership or AWS credentials | sudo chown cockroach:cockroach /etc/cockroach/backup-config && sudo chmod 600 /etc/cockroach/backup-config |
| Systemd timer not running | Timer not enabled or service file issues | sudo systemctl enable cockroach-backup.timer && sudo systemctl start cockroach-backup.timer |
| Prometheus not scraping CockroachDB | Firewall blocking port 8080 or wrong target | Run netstat -ln \| grep 8080 on each node and verify the prometheus.yml targets |
| Backup restoration fails | Target database already exists or wrong backup path | Drop existing database first or verify S3 backup path exists |
| High backup storage costs | Retention policy not working | Check backup cleanup script logs and verify AWS CLI permissions |
Next steps
- Configure Prometheus Alertmanager with Slack integration for team notifications
- Implement backup rotation policies for optimized storage management
- Configure multi-region CockroachDB deployment for geographic redundancy
- Set up cross-cluster replication for disaster recovery
- Monitor CockroachDB performance with advanced Grafana dashboards
Automated install script
Run this to automate the entire setup
#!/usr/bin/env bash
set -euo pipefail
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
# Default configuration
COCKROACH_VERSION="24.3.0"
CLUSTER_IPS="${1:-}"
NODE_IP="${2:-}"
BACKUP_BUCKET="${3:-cockroachdb-backups}"
usage() {
echo "Usage: $0 <cluster_ips> <node_ip> [backup_bucket]"
echo "Example: $0 '10.0.1.10,10.0.1.11,10.0.1.12' '10.0.1.10' 'my-backup-bucket'"
exit 1
}
if [[ -z "$CLUSTER_IPS" || -z "$NODE_IP" ]]; then
usage
fi
# Error handling
cleanup() {
echo -e "${RED}[ERROR] Installation failed. Cleaning up...${NC}"
systemctl stop cockroach 2>/dev/null || true
userdel cockroach 2>/dev/null || true
rm -rf /var/lib/cockroach /var/log/cockroach /etc/cockroach
exit 1
}
trap cleanup ERR
# Check prerequisites
if [[ $EUID -ne 0 ]]; then
echo -e "${RED}[ERROR] This script must be run as root${NC}"
exit 1
fi
# Auto-detect distribution
echo -e "${YELLOW}[1/12] Detecting distribution...${NC}"
if [ -f /etc/os-release ]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_UPDATE="apt update"
PKG_INSTALL="apt install -y"
PSQL_CLIENT="postgresql-client-common"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_UPDATE="dnf check-update || true"
PKG_INSTALL="dnf install -y"
PSQL_CLIENT="postgresql"
;;
amzn)
PKG_MGR="yum"
PKG_UPDATE="yum check-update || true"
PKG_INSTALL="yum install -y"
PSQL_CLIENT="postgresql"
;;
*)
echo -e "${RED}[ERROR] Unsupported distribution: $ID${NC}"
exit 1
;;
esac
echo -e "${GREEN}Detected: $PRETTY_NAME${NC}"
else
echo -e "${RED}[ERROR] Cannot detect distribution${NC}"
exit 1
fi
# Install dependencies
echo -e "${YELLOW}[2/12] Installing dependencies...${NC}"
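# eval so that the "|| true" stored in the variable is parsed as shell syntax, not passed as arguments
eval "$PKG_UPDATE"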
$PKG_INSTALL curl wget tar awscli $PSQL_CLIENT jq
# Download and install CockroachDB
echo -e "${YELLOW}[3/12] Downloading CockroachDB v${COCKROACH_VERSION}...${NC}"
cd /tmp
wget -q "https://binaries.cockroachdb.com/cockroach-v${COCKROACH_VERSION}.linux-amd64.tgz"
tar xzf "cockroach-v${COCKROACH_VERSION}.linux-amd64.tgz"
mv "cockroach-v${COCKROACH_VERSION}.linux-amd64/cockroach" /usr/local/bin/
chmod 755 /usr/local/bin/cockroach
rm -rf "cockroach-v${COCKROACH_VERSION}.linux-amd64"*
# Create user and directories
echo -e "${YELLOW}[4/12] Creating CockroachDB user and directories...${NC}"
useradd -m -s /bin/bash cockroach || true
mkdir -p /var/lib/cockroach /var/log/cockroach /etc/cockroach/{certs,private}
chown cockroach:cockroach /var/lib/cockroach /var/log/cockroach
chown -R cockroach:cockroach /etc/cockroach
chmod 750 /var/lib/cockroach /var/log/cockroach
chmod 755 /etc/cockroach
# Generate certificates
echo -e "${YELLOW}[5/12] Generating cluster certificates...${NC}"
cockroach cert create-ca --certs-dir=/etc/cockroach/certs --ca-key=/etc/cockroach/private/ca.key
cockroach cert create-node localhost $NODE_IP $(echo $CLUSTER_IPS | tr ',' ' ') --certs-dir=/etc/cockroach/certs --ca-key=/etc/cockroach/private/ca.key
cockroach cert create-client root --certs-dir=/etc/cockroach/certs --ca-key=/etc/cockroach/private/ca.key
chown -R cockroach:cockroach /etc/cockroach/certs
chmod 400 /etc/cockroach/private/ca.key
# Create systemd service
echo -e "${YELLOW}[6/12] Creating systemd service...${NC}"
cat > /etc/systemd/system/cockroach.service << EOF
[Unit]
Description=Cockroach Database cluster node
Requires=network.target
After=network.target
[Service]
Type=notify
User=cockroach
Group=cockroach
ExecStart=/usr/local/bin/cockroach start --certs-dir=/etc/cockroach/certs --store=/var/lib/cockroach --listen-addr=${NODE_IP}:26257 --http-addr=${NODE_IP}:8080 --join=$(echo $CLUSTER_IPS | sed 's/,/:26257,/g'):26257 --background=false
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
SyslogIdentifier=cockroach
KillMode=mixed
KillSignal=SIGTERM
TimeoutStopSec=60
[Install]
WantedBy=multi-user.target
EOF
# Start CockroachDB service
echo -e "${YELLOW}[7/12] Starting CockroachDB service...${NC}"
systemctl daemon-reload
systemctl enable cockroach
systemctl start cockroach
sleep 10
# Create backup configuration
echo -e "${YELLOW}[8/12] Creating backup configuration...${NC}"
# Unquoted heredoc so the bucket name passed as the third argument is written into the file
cat > /etc/cockroach/backup-config << EOF
#!/bin/bash
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"
export BACKUP_BUCKET="${BACKUP_BUCKET}"
export CLUSTER_NAME="production"
export BACKUP_RETENTION_DAYS="30"
EOF
chown cockroach:cockroach /etc/cockroach/backup-config
chmod 600 /etc/cockroach/backup-config
# Create backup script
echo -e "${YELLOW}[9/12] Creating backup automation script...${NC}"
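cat > /usr/local/bin/cockroach-backup << EOF
#!/bin/bash
set -euo pipefail
source /etc/cockroach/backup-config
TIMESTAMP=\$(date +%Y%m%d_%H%M%S)
BACKUP_PATH="s3://\${BACKUP_BUCKET}/\${CLUSTER_NAME}/\${TIMESTAMP}"
# The cluster nodes write to S3 themselves, so credentials are passed as URI parameters
BACKUP_URI="\${BACKUP_PATH}?AWS_ACCESS_KEY_ID=\${AWS_ACCESS_KEY_ID}&AWS_SECRET_ACCESS_KEY=\${AWS_SECRET_ACCESS_KEY}"
echo "\$(date): Starting backup to \${BACKUP_PATH}"
# Full backup. It runs in the foreground (no DETACHED option) so the
# completion message below is only printed after the backup has finished.
sudo -u cockroach cockroach sql --certs-dir=/etc/cockroach/certs --host=${NODE_IP}:26257 \\
    --execute="BACKUP INTO '\${BACKUP_URI}';"
# Log backup completion
echo "\$(date): Backup completed successfully"
# Cleanup old backups. "aws s3 ls" prints each backup as "PRE <name>/" and the
# name starts with YYYYMMDD, so compare that prefix against the cutoff date.
CUTOFF=\$(date -d "\${BACKUP_RETENTION_DAYS} days ago" +%Y%m%d)
aws s3 ls "s3://\${BACKUP_BUCKET}/\${CLUSTER_NAME}/" | \\
    awk -v cutoff="\${CUTOFF}" '\$1 == "PRE" && substr(\$2, 1, 8) < cutoff {print \$2}' | \\
    xargs -r -I {} aws s3 rm "s3://\${BACKUP_BUCKET}/\${CLUSTER_NAME}/{}" --recursive
EOF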
chmod 755 /usr/local/bin/cockroach-backup
# Create systemd timer for backups
echo -e "${YELLOW}[10/12] Creating backup timer...${NC}"
cat > /etc/systemd/system/cockroach-backup.service << EOF
[Unit]
Description=CockroachDB Backup Service
Wants=network-online.target
After=network-online.target
[Service]
Type=oneshot
User=root
ExecStart=/usr/local/bin/cockroach-backup
StandardOutput=journal
StandardError=journal
SyslogIdentifier=cockroach-backup
EOF
cat > /etc/systemd/system/cockroach-backup.timer << EOF
[Unit]
Description=CockroachDB Backup Timer
Requires=cockroach-backup.service
[Timer]
OnCalendar=daily
Persistent=true
RandomizedDelaySec=3600
[Install]
WantedBy=timers.target
EOF
systemctl daemon-reload
systemctl enable cockroach-backup.timer
# Create monitoring script
echo -e "${YELLOW}[11/12] Creating monitoring script...${NC}"
cat > /usr/local/bin/cockroach-monitor << EOF
#!/bin/bash
set -euo pipefail
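# The cluster runs in secure mode, so the HTTP port serves TLS
HEALTH_URL="https://${NODE_IP}:8080/health"
CLUSTER_URL="https://${NODE_IP}:8080/_status/cluster"
CA_CERT="/etc/cockroach/certs/ca.crt"
# Check node health
if ! curl -sf --cacert "\$CA_CERT" "\$HEALTH_URL" > /dev/null; then
echo "CRITICAL: CockroachDB node health check failed"
exit 2
fi
# Check cluster status; fall back to "unknown" so set -e does not abort before the warning
CLUSTER_STATUS=\$(curl -sf --cacert "\$CA_CERT" "\$CLUSTER_URL" | jq -r '.cluster_id // "unknown"' || echo "unknown")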
if [[ "\$CLUSTER_STATUS" == "unknown" ]]; then
echo "WARNING: Could not retrieve cluster status"
exit 1
fi
echo "OK: CockroachDB cluster is healthy (ID: \$CLUSTER_STATUS)"
exit 0
EOF
chmod 755 /usr/local/bin/cockroach-monitor
# Configure firewall
echo -e "${YELLOW}[12/12] Configuring firewall...${NC}"
if command -v firewall-cmd >/dev/null 2>&1; then
firewall-cmd --permanent --add-port=26257/tcp --add-port=8080/tcp
firewall-cmd --reload
elif command -v ufw >/dev/null 2>&1; then
ufw allow 26257/tcp
ufw allow 8080/tcp
fi
# Verification
echo -e "${GREEN}[SUCCESS] CockroachDB installation completed!${NC}"
echo ""
echo "Next steps:"
echo "1. Configure backup credentials in /etc/cockroach/backup-config"
echo "2. Start backup timer: systemctl start cockroach-backup.timer"
echo "3. Initialize cluster from first node:"
echo " sudo -u cockroach cockroach init --certs-dir=/etc/cockroach/certs --host=${NODE_IP}:26257"
echo "4. Access web UI: https://${NODE_IP}:8080"
echo "5. Test monitoring: /usr/local/bin/cockroach-monitor"
echo ""
echo "Service status:"
systemctl status cockroach --no-pager -l
Review the script before running. Execute with: bash install.sh
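The install script accepts the cluster IPs as a comma-separated list and turns them into the --join value with sed, without checking that each entry is a valid address. The sketch below shows one way to validate the list and build the same join string. The function names valid_ipv4 and build_join_list are hypothetical helpers, and only IPv4 addresses are handled.

```shell
#!/bin/bash
# valid_ipv4 ADDR: true if ADDR is a dotted-quad IPv4 address (each octet 0-255)
valid_ipv4() {
    local IFS=. octet
    local -a parts
    [[ "$1" =~ ^[0-9]{1,3}(\.[0-9]{1,3}){3}$ ]] || return 1
    # IFS is "." inside this function, so read splits the address into octets
    read -r -a parts <<< "$1"
    for octet in "${parts[@]}"; do
        # 10# forces base 10 so octets such as 08 are not read as octal
        (( 10#$octet <= 255 )) || return 1
    done
}

# build_join_list "IP1,IP2,..." PORT: prints IP1:PORT,IP2:PORT,...
# Returns 1 if the list is empty or any entry is not a valid IPv4 address.
build_join_list() {
    local port="$2" ip joined=""
    local -a ips
    IFS=, read -r -a ips <<< "$1"
    (( ${#ips[@]} > 0 )) || { echo "empty IP list" >&2; return 1; }
    for ip in "${ips[@]}"; do
        valid_ipv4 "$ip" || { echo "invalid IP: $ip" >&2; return 1; }
        joined+="${ip}:${port},"
    done
    # Strip the trailing comma
    echo "${joined%,}"
}

build_join_list '10.0.1.10,10.0.1.11,10.0.1.12' 26257
# prints: 10.0.1.10:26257,10.0.1.11:26257,10.0.1.12:26257
```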