Set up a production-ready CockroachDB cluster with SSL certificates, load balancing, and automated backups for distributed SQL workloads. This tutorial covers multi-node deployment, security hardening, and performance optimization for enterprise environments.
Prerequisites
- Root or sudo access
- At least 3 servers for production cluster
- 8GB RAM minimum per node
- Fast SSD storage recommended
- Stable network connectivity between nodes
What this solves
CockroachDB provides horizontally scalable, distributed SQL capabilities with automatic failover and strong consistency guarantees. This tutorial helps you deploy a production-ready CockroachDB cluster with SSL encryption, load balancing, and automated backup strategies for mission-critical applications requiring global distribution and high availability.
Step-by-step installation
Update system packages and install dependencies
Start by updating your package manager and installing required dependencies for CockroachDB cluster deployment.
sudo apt update && sudo apt upgrade -y
sudo apt install -y wget curl gnupg2 software-properties-common
Download and install CockroachDB binary
Download the latest CockroachDB release and install it system-wide with proper permissions.
cd /tmp
wget https://binaries.cockroachdb.com/cockroach-v23.2.4.linux-amd64.tgz
tar -xzf cockroach-v23.2.4.linux-amd64.tgz
sudo cp cockroach-v23.2.4.linux-amd64/cockroach /usr/local/bin/
sudo chmod 755 /usr/local/bin/cockroach
Create CockroachDB user and directories
Create a dedicated system user for CockroachDB and set up the required directory structure with secure permissions.
sudo useradd --system --shell /bin/false --home-dir /var/lib/cockroach cockroach
sudo mkdir -p /var/lib/cockroach/{data,certs,logs}
sudo mkdir -p /etc/cockroach
sudo chown -R cockroach:cockroach /var/lib/cockroach
sudo chmod 700 /var/lib/cockroach/data
sudo chmod 755 /var/lib/cockroach/certs
Generate SSL certificates for secure cluster communication
Create a Certificate Authority and generate SSL certificates for secure inter-node communication and client connections.
cd /var/lib/cockroach/certs
sudo -u cockroach /usr/local/bin/cockroach cert create-ca --certs-dir=/var/lib/cockroach/certs --ca-key=/var/lib/cockroach/certs/ca.key
sudo -u cockroach /usr/local/bin/cockroach cert create-node node1.example.com node2.example.com node3.example.com localhost 127.0.0.1 203.0.113.10 203.0.113.11 203.0.113.12 --certs-dir=/var/lib/cockroach/certs --ca-key=/var/lib/cockroach/certs/ca.key
sudo -u cockroach /usr/local/bin/cockroach cert create-client root --certs-dir=/var/lib/cockroach/certs --ca-key=/var/lib/cockroach/certs/ca.key
Configure CockroachDB cluster settings
Create the main configuration file with cluster settings, performance tuning, and security parameters.
# CockroachDB Cluster Configuration
[logging]
level = info
file-defaults:
dir = /var/lib/cockroach/logs
max-file-size = 100MiB
max-group-size = 1GiB
file-defaults:
format = crdb-v2-tty
[cluster]
name = production-cluster
organization = example-corp
[security]
max-offset = 500ms
allow-node-client-cert = true
[performance]
cache-size = 4GiB
max-sql-memory = 8GiB
max-disk-temp-storage = 32GiB
Create systemd service for cluster management
Set up a systemd service file for automatic startup, restart policies, and resource management.
[Unit]
Description=CockroachDB Distributed SQL Database
Requires=network.target
After=network.target
[Service]
Type=notify
User=cockroach
Group=cockroach
ExecStart=/usr/local/bin/cockroach start \
--certs-dir=/var/lib/cockroach/certs \
--store=/var/lib/cockroach/data \
--listen-addr=0.0.0.0:26257 \
--http-addr=0.0.0.0:8080 \
--join=node1.example.com:26257,node2.example.com:26257,node3.example.com:26257 \
--cache=4GiB \
--max-sql-memory=8GiB \
--background
ExecReload=/bin/kill -HUP $MAINPID
KillMode=control-group
KillSignal=SIGINT
TimeoutStopSec=60
Restart=always
RestartSec=10
SyslogIdentifier=cockroachdb
LimitNOFILE=65536
LimitMEMLOCK=infinity
[Install]
WantedBy=multi-user.target
Configure firewall rules for cluster communication
Open required ports for CockroachDB cluster communication and admin interface access.
sudo ufw allow 26257/tcp comment 'CockroachDB SQL'
sudo ufw allow 8080/tcp comment 'CockroachDB Admin UI'
sudo ufw allow from 203.0.113.0/24 to any port 26257
sudo ufw reload
Initialize the cluster on the first node
Start the CockroachDB service on the first node and initialize the cluster with security settings.
sudo systemctl daemon-reload
sudo systemctl enable cockroachdb
sudo systemctl start cockroachdb
sudo systemctl status cockroachdb
Initialize cluster database
Once the first node is running, initialize the cluster and create the initial database structure.
sudo -u cockroach /usr/local/bin/cockroach init \
--certs-dir=/var/lib/cockroach/certs \
--host=localhost:26257
Set up HAProxy load balancer
Configure HAProxy to distribute SQL connections across cluster nodes with health checks and failover support. This integrates with our HAProxy high availability setup.
sudo apt install -y haproxy
global
daemon
user haproxy
group haproxy
log stdout local0
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
defaults
mode tcp
timeout connect 5000ms
timeout client 50000ms
timeout server 50000ms
option dontlognull
option redispatch
retries 3
maxconn 3000
listen cockroachdb-sql
bind *:5432
mode tcp
option tcplog
option httpchk GET /health?ready=1
server node1 node1.example.com:26257 check port 8080 check-ssl verify none
server node2 node2.example.com:26257 check port 8080 check-ssl verify none
server node3 node3.example.com:26257 check port 8080 check-ssl verify none
listen cockroachdb-admin
bind *:8081
mode tcp
option tcplog
server node1 node1.example.com:8080 check
server node2 node2.example.com:8080 check backup
server node3 node3.example.com:8080 check backup
listen stats
bind *:1936
mode http
stats enable
stats uri /stats
stats refresh 30s
Create database users and security policies
Connect to the cluster and create application users with appropriate privileges and security constraints.
sudo -u cockroach /usr/local/bin/cockroach sql \
--certs-dir=/var/lib/cockroach/certs \
--host=localhost:26257 \
--execute="CREATE USER appuser WITH PASSWORD 'SecureP@ssw0rd123!';"
sudo -u cockroach /usr/local/bin/cockroach sql \
--certs-dir=/var/lib/cockroach/certs \
--host=localhost:26257 \
--execute="CREATE DATABASE production;"
sudo -u cockroach /usr/local/bin/cockroach sql \
--certs-dir=/var/lib/cockroach/certs \
--host=localhost:26257 \
--execute="GRANT ALL ON DATABASE production TO appuser;"
sudo -u cockroach /usr/local/bin/cockroach sql \
--certs-dir=/var/lib/cockroach/certs \
--host=localhost:26257 \
--execute="SET CLUSTER SETTING server.host_based_authentication.configuration = 'host all all 0.0.0.0/0 cert-password';"
Configure automated backup system
Set up automated daily backups with retention policies and monitoring. This works with our backup automation strategies.
#!/bin/bash
CockroachDB Backup Script
set -euo pipefail
BACKUP_DIR="/var/backups/cockroach"
LOG_FILE="/var/log/cockroach-backup.log"
RETENTION_DAYS=30
DATE=$(date +"%Y-%m-%d_%H-%M-%S")
Create backup directory
mkdir -p "$BACKUP_DIR"
Function to log messages
log_message() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}
log_message "Starting CockroachDB backup"
Perform backup
if /usr/local/bin/cockroach dump production \
--certs-dir=/var/lib/cockroach/certs \
--host=localhost:26257 > "$BACKUP_DIR/backup_$DATE.sql"; then
log_message "Backup completed successfully: backup_$DATE.sql"
# Compress backup
gzip "$BACKUP_DIR/backup_$DATE.sql"
log_message "Backup compressed: backup_$DATE.sql.gz"
else
log_message "ERROR: Backup failed"
exit 1
fi
Clean old backups
find "$BACKUP_DIR" -name "backup_*.sql.gz" -mtime +$RETENTION_DAYS -delete
log_message "Cleaned backups older than $RETENTION_DAYS days"
log_message "Backup process completed"
sudo chmod 755 /usr/local/bin/cockroach-backup.sh
sudo chown cockroach:cockroach /usr/local/bin/cockroach-backup.sh
Create systemd timer for automated backups
Set up automated backup scheduling with systemd timers and proper logging.
[Unit]
Description=CockroachDB Backup Service
Wants=cockroachdb.service
After=cockroachdb.service
[Service]
Type=oneshot
User=cockroach
Group=cockroach
ExecStart=/usr/local/bin/cockroach-backup.sh
StandardOutput=append:/var/log/cockroach-backup.log
StandardError=append:/var/log/cockroach-backup.log
[Unit]
Description=Run CockroachDB backup daily
Requires=cockroach-backup.service
[Timer]
OnCalendar=daily
RandomizedDelaySec=1800
Persistent=true
[Install]
WantedBy=timers.target
sudo systemctl daemon-reload
sudo systemctl enable cockroach-backup.timer
sudo systemctl start cockroach-backup.timer
Configure cluster monitoring and alerting
Set up comprehensive monitoring with custom metrics collection for cluster health and performance tracking.
#!/bin/bash
CockroachDB Health Check Script
set -euo pipefail
LOG_FILE="/var/log/cockroach-health.log"
ALERT_THRESHOLD_CPU=80
ALERT_THRESHOLD_MEMORY=85
log_message() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}
Check cluster status
if ! /usr/local/bin/cockroach node status \
--certs-dir=/var/lib/cockroach/certs \
--host=localhost:26257 &>/dev/null; then
log_message "ALERT: Cluster health check failed"
exit 1
fi
Check individual node health
NODE_STATUS=$(/usr/local/bin/cockroach sql \
--certs-dir=/var/lib/cockroach/certs \
--host=localhost:26257 \
--execute="SELECT node_id, address, is_available FROM crdb_internal.gossip_nodes;" \
--format=csv)
log_message "Node status check: $NODE_STATUS"
Check replication status
REPL_STATUS=$(/usr/local/bin/cockroach sql \
--certs-dir=/var/lib/cockroach/certs \
--host=localhost:26257 \
--execute="SELECT range_id, replicas FROM crdb_internal.ranges WHERE database_name='production' LIMIT 5;" \
--format=csv)
log_message "Replication status: $REPL_STATUS"
log_message "Health check completed successfully"
sudo chmod 755 /usr/local/bin/cockroach-health-check.sh
sudo chown cockroach:cockroach /usr/local/bin/cockroach-health-check.sh
Verify your setup
Test cluster connectivity, replication, and performance to ensure proper deployment.
# Check cluster status
sudo -u cockroach /usr/local/bin/cockroach node status \
--certs-dir=/var/lib/cockroach/certs \
--host=localhost:26257
Verify database connectivity
sudo -u cockroach /usr/local/bin/cockroach sql \
--certs-dir=/var/lib/cockroach/certs \
--host=localhost:26257 \
--execute="SHOW DATABASES;"
Test load balancer
curl -k https://localhost:8081/health
Check backup timer status
sudo systemctl status cockroach-backup.timer
Verify cluster replication
sudo -u cockroach /usr/local/bin/cockroach sql \
--certs-dir=/var/lib/cockroach/certs \
--host=localhost:26257 \
--execute="SELECT count(*) FROM crdb_internal.ranges;"
Test performance with sample data
sudo -u cockroach /usr/local/bin/cockroach workload init bank \
--certs-dir=/var/lib/cockroach/certs \
--host=localhost:26257
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Node won't join cluster | Certificate mismatch or network connectivity | Verify certificates copied correctly and firewall rules allow port 26257 |
| SSL connection errors | Certificate permissions or missing client certificates | Check cert permissions (644) and ensure client certificates exist |
| High memory usage | Cache settings too aggressive | Adjust --cache and --max-sql-memory parameters in systemd service |
| Backup script fails | Insufficient disk space or permission issues | Check /var/backups space and verify cockroach user ownership |
| HAProxy health checks failing | Admin UI not accessible or SSL verification issues | Verify port 8080 accessible and adjust check-ssl verify settings |
| Cluster split-brain | Network partition between nodes | Ensure odd number of nodes (3,5,7) and stable network connectivity |
Next steps
Automated install script
Run this to automate the entire setup
#!/usr/bin/env bash
set -euo pipefail
# Color codes
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
# Default values
COCKROACH_VERSION="23.2.4"
CLUSTER_NAME="${1:-production-cluster}"
NODE_HOSTNAME="${2:-$(hostname -f)}"
JOIN_NODES="${3:-}"
usage() {
echo "Usage: $0 [cluster_name] [node_hostname] [join_nodes]"
echo "Example: $0 production-cluster node1.example.com 'node1.example.com:26257,node2.example.com:26257,node3.example.com:26257'"
exit 1
}
log() {
echo -e "${GREEN}[INFO]${NC} $1"
}
warn() {
echo -e "${YELLOW}[WARN]${NC} $1"
}
error() {
echo -e "${RED}[ERROR]${NC} $1" >&2
}
cleanup() {
if [ $? -ne 0 ]; then
error "Installation failed. Cleaning up..."
systemctl stop cockroachdb 2>/dev/null || true
systemctl disable cockroachdb 2>/dev/null || true
rm -f /etc/systemd/system/cockroachdb.service
rm -f /usr/local/bin/cockroach
userdel cockroach 2>/dev/null || true
rm -rf /var/lib/cockroach
rm -rf /etc/cockroach
fi
}
trap cleanup ERR
# Check if running as root
if [[ $EUID -ne 0 ]]; then
error "This script must be run as root"
exit 1
fi
# Detect distribution
if [ -f /etc/os-release ]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_UPDATE="apt update && apt upgrade -y"
PKG_INSTALL="apt install -y"
FIREWALL_CMD="ufw"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_UPDATE="dnf update -y"
PKG_INSTALL="dnf install -y"
FIREWALL_CMD="firewall-cmd"
;;
amzn)
PKG_MGR="yum"
PKG_UPDATE="yum update -y"
PKG_INSTALL="yum install -y"
FIREWALL_CMD="firewall-cmd"
;;
*)
error "Unsupported distribution: $ID"
exit 1
;;
esac
else
error "Cannot detect distribution"
exit 1
fi
log "[1/8] Updating system packages and installing dependencies"
$PKG_UPDATE
$PKG_INSTALL wget curl gnupg2
log "[2/8] Downloading and installing CockroachDB binary"
cd /tmp
wget -O "cockroach-v${COCKROACH_VERSION}.linux-amd64.tgz" \
"https://binaries.cockroachdb.com/cockroach-v${COCKROACH_VERSION}.linux-amd64.tgz"
tar -xzf "cockroach-v${COCKROACH_VERSION}.linux-amd64.tgz"
cp "cockroach-v${COCKROACH_VERSION}.linux-amd64/cockroach" /usr/local/bin/
chmod 755 /usr/local/bin/cockroach
rm -rf "cockroach-v${COCKROACH_VERSION}.linux-amd64"*
log "[3/8] Creating CockroachDB user and directories"
if ! id cockroach &>/dev/null; then
useradd --system --shell /bin/false --home-dir /var/lib/cockroach cockroach
fi
mkdir -p /var/lib/cockroach/{data,certs,logs}
mkdir -p /etc/cockroach
chown -R cockroach:cockroach /var/lib/cockroach
chmod 750 /var/lib/cockroach/data
chmod 755 /var/lib/cockroach/certs
chmod 755 /var/lib/cockroach/logs
log "[4/8] Generating SSL certificates"
cd /var/lib/cockroach/certs
sudo -u cockroach /usr/local/bin/cockroach cert create-ca \
--certs-dir=/var/lib/cockroach/certs \
--ca-key=/var/lib/cockroach/certs/ca.key
sudo -u cockroach /usr/local/bin/cockroach cert create-node \
"$NODE_HOSTNAME" localhost 127.0.0.1 \
--certs-dir=/var/lib/cockroach/certs \
--ca-key=/var/lib/cockroach/certs/ca.key
sudo -u cockroach /usr/local/bin/cockroach cert create-client root \
--certs-dir=/var/lib/cockroach/certs \
--ca-key=/var/lib/cockroach/certs/ca.key
log "[5/8] Creating CockroachDB configuration"
cat > /etc/cockroach/cockroach.conf << 'EOF'
[logging]
level = info
file-defaults:
dir = /var/lib/cockroach/logs
max-file-size = 100MiB
max-group-size = 1GiB
[cluster]
name = production-cluster
organization = example-corp
[security]
max-offset = 500ms
allow-node-client-cert = true
EOF
chown cockroach:cockroach /etc/cockroach/cockroach.conf
chmod 644 /etc/cockroach/cockroach.conf
log "[6/8] Creating systemd service"
JOIN_FLAG=""
if [ -n "$JOIN_NODES" ]; then
JOIN_FLAG="--join=$JOIN_NODES"
fi
cat > /etc/systemd/system/cockroachdb.service << EOF
[Unit]
Description=CockroachDB Distributed SQL Database
Requires=network.target
After=network.target
[Service]
Type=notify
User=cockroach
Group=cockroach
ExecStart=/usr/local/bin/cockroach start \\
--certs-dir=/var/lib/cockroach/certs \\
--store=/var/lib/cockroach/data \\
--listen-addr=0.0.0.0:26257 \\
--http-addr=0.0.0.0:8080 \\
$JOIN_FLAG \\
--cache=25%% \\
--max-sql-memory=25%% \\
--background
ExecReload=/bin/kill -HUP \$MAINPID
KillMode=control-group
KillSignal=SIGINT
TimeoutStopSec=60
Restart=always
RestartSec=10
SyslogIdentifier=cockroachdb
LimitNOFILE=65536
LimitMEMLOCK=infinity
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable cockroachdb
log "[7/8] Configuring firewall"
case "$FIREWALL_CMD" in
"ufw")
if command -v ufw >/dev/null; then
ufw allow 26257/tcp comment "CockroachDB cluster"
ufw allow 8080/tcp comment "CockroachDB web UI"
fi
;;
"firewall-cmd")
if command -v firewall-cmd >/dev/null && systemctl is-active firewalld >/dev/null; then
firewall-cmd --permanent --add-port=26257/tcp
firewall-cmd --permanent --add-port=8080/tcp
firewall-cmd --reload
fi
;;
esac
log "[8/8] Starting CockroachDB service"
systemctl start cockroachdb
# Verification
log "Verifying installation..."
sleep 5
if systemctl is-active cockroachdb >/dev/null; then
log "✓ CockroachDB service is running"
else
error "✗ CockroachDB service failed to start"
systemctl status cockroachdb --no-pager
exit 1
fi
if /usr/local/bin/cockroach version >/dev/null 2>&1; then
log "✓ CockroachDB binary is functional"
else
error "✗ CockroachDB binary is not working"
exit 1
fi
log "CockroachDB installation completed successfully!"
log "Node hostname: $NODE_HOSTNAME"
log "Web UI: https://$NODE_HOSTNAME:8080"
log "SQL port: 26257"
if [ -z "$JOIN_NODES" ]; then
warn "No join nodes specified. To initialize cluster, run:"
warn "/usr/local/bin/cockroach init --certs-dir=/var/lib/cockroach/certs --host=$NODE_HOSTNAME:26257"
fi
log "Certificate files are in /var/lib/cockroach/certs/"
log "Copy ca.crt and client.root.* to other nodes for cluster setup"
Review the script before running. Execute with: bash install.sh