Implement ClickHouse backup automation with compression and S3 integration

Intermediate 45 min May 18, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up automated ClickHouse backups with compression, S3 storage, and systemd timers. Includes monitoring, encryption, and recovery procedures for production environments.

Prerequisites

  • ClickHouse server installed and running
  • S3-compatible storage bucket
  • Root or sudo access

What this solves

This tutorial sets up automated ClickHouse backups with compression and S3 integration. You'll configure systemd timers to run regular backups, compress data to reduce storage costs, and store backups securely in S3-compatible storage with monitoring and alerting.

Step-by-step configuration

Install backup dependencies

Install required packages for backup operations, compression, and S3 integration.

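# Ubuntu / Debian
sudo apt update
sudo apt install -y awscli gzip lz4 pigz clickhouse-backup
# If apt cannot find clickhouse-backup (or awscli) in your configured repositories,
# install the release binary as shown in the AlmaLinux / Rocky Linux commands.
# AlmaLinux / Rocky Linux
sudo dnf install -y awscli gzip lz4 pigz
wget https://github.com/Altinity/clickhouse-backup/releases/latest/download/clickhouse-backup-linux-amd64.tar.gz
# The archive is expected to unpack to build/linux/amd64/clickhouse-backup; adjust the path if your release differs.
tar -xzf clickhouse-backup-linux-amd64.tar.gz -C /tmp
sudo install -m 0755 /tmp/build/linux/amd64/clickhouse-backup /usr/local/bin/clickhouse-backup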
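
Before moving on, it can help to confirm that every tool the later steps rely on is actually on the PATH. The following is a minimal sketch; `check_tools` is a hypothetical helper name, and the tool list is taken from the install commands above (the awscli package provides the `aws` command).

```shell
#!/bin/sh
# Report any required command that is not on PATH.
# Returns 0 when everything is present, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Tool names come from the packages installed in this step.
check_tools aws gzip lz4 pigz clickhouse-backup \
    || echo "Install the missing tools before continuing." >&2
```

The function only checks that each command resolves; it does not verify versions.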

Configure S3 credentials

Set up AWS credentials and S3 bucket access for backup storage. Replace with your actual S3 credentials and bucket details.

sudo mkdir -p /etc/clickhouse-backup
sudo aws configure set aws_access_key_id YOUR_ACCESS_KEY_ID
sudo aws configure set aws_secret_access_key YOUR_SECRET_ACCESS_KEY
sudo aws configure set default.region us-east-1
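Security: Store credentials securely. In production, prefer IAM roles or a credentials file with restricted permissions over long-lived access keys written into scripts or configuration files. The aws CLI profile configured here is used for manual checks; clickhouse-backup reads its keys from its own configuration file, created in the next step.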

Create backup configuration

Configure clickhouse-backup with S3 settings, compression options, and retention policies. Save the following as /etc/clickhouse-backup/config.yml.

general:
  remote_storage: s3
  max_file_size: 1073741824
  disable_progress_bar: true
  backups_to_keep_local: 3
  backups_to_keep_remote: 30
  log_level: info

clickhouse:
  username: default
  password: ""
  host: localhost
  port: 9000
  data_path: /var/lib/clickhouse
  skip_tables:
    - system.*
    - INFORMATION_SCHEMA.*
  timeout: 5m

s3:
  access_key: YOUR_ACCESS_KEY_ID
  secret_key: YOUR_SECRET_ACCESS_KEY
  bucket: your-clickhouse-backups
  endpoint: s3.amazonaws.com
  region: us-east-1
  acl: private
  force_path_style: false
  path: clickhouse-backups/
  disable_ssl: false
  part_size: 104857600
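  storage_class: STANDARD_IA
  # clickhouse-backup reads compression and server-side encryption settings
  # from the s3 section rather than from separate top-level sections. Run
  # `clickhouse-backup default-config` to confirm the keys your version supports.
  compression_format: lz4
  compression_level: 1
  sse: AES256
  sse_kms_key_id: ""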
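
The byte values in this configuration are easier to review once converted to binary units. The sketch below is plain shell arithmetic on the two numbers from the file; `to_mib` and `parts_needed` are hypothetical helper names, and treating part_size as the chunk size of a multipart upload is an assumption based on how the setting is usually described.

```shell
#!/bin/sh
# Convert a byte count to whole mebibytes (1 MiB = 1024 * 1024 bytes).
to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Number of chunks needed to upload SIZE bytes in parts of PART bytes (rounded up).
parts_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

echo "max_file_size = $(to_mib 1073741824) MiB"
echo "part_size     = $(to_mib 104857600) MiB"
echo "parts per full-size file = $(parts_needed 1073741824 104857600)"
```

With the values above this prints 1024 MiB (1 GiB) for max_file_size, 100 MiB for part_size, and 11 parts for a file that reaches the maximum size.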

Set secure file permissions

Protect the backup configuration file containing sensitive credentials.

sudo chown clickhouse:clickhouse /etc/clickhouse-backup/config.yml
sudo chmod 600 /etc/clickhouse-backup/config.yml
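
To confirm the permissions took effect, a small check like the following can be run after the commands above. It is a sketch that assumes GNU stat (coreutils); `is_private_file` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeeds only if the file exists and its permission bits are exactly 600.
# Uses GNU stat; the -c option is not available in BSD stat.
is_private_file() {
    [ "$(stat -c '%a' "$1" 2>/dev/null)" = "600" ]
}

config=/etc/clickhouse-backup/config.yml
if is_private_file "$config"; then
    echo "OK: $config is readable and writable by its owner only"
else
    echo "WARNING: $config is missing or not mode 600" >&2
fi
```

The check covers the mode only; ownership can be inspected separately with `stat -c '%U:%G'`.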

Create backup script

Create the backup script at /usr/local/bin/clickhouse-backup.sh. It creates a full backup, uploads it to S3, checks that the backup is listed in remote storage, and logs each step with basic error handling.

#!/bin/bash
# ClickHouse Backup Script with S3 Integration

set -euo pipefail

# Configuration
BACKUP_NAME="backup-$(date +%Y%m%d-%H%M%S)"
LOG_FILE="/var/log/clickhouse-backup.log"
CONFIG_FILE="/etc/clickhouse-backup/config.yml"
TIMEOUT=3600

# Logging function
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "$LOG_FILE"
}

# Error handling
error_exit() {
    log "ERROR: $1"
    exit 1
}

# Check if ClickHouse is running
if ! systemctl is-active --quiet clickhouse-server; then
    error_exit "ClickHouse server is not running"
fi

log "Starting ClickHouse backup: $BACKUP_NAME"

# Create local backup
log "Creating local backup..."
timeout "$TIMEOUT" clickhouse-backup -c "$CONFIG_FILE" create "$BACKUP_NAME" || error_exit "Failed to create local backup"

# Upload to S3
log "Uploading backup to S3..."
timeout "$TIMEOUT" clickhouse-backup -c "$CONFIG_FILE" upload "$BACKUP_NAME" || error_exit "Failed to upload backup to S3"

# Verify that the backup is listed in remote storage
log "Verifying backup integrity..."
clickhouse-backup -c "$CONFIG_FILE" list remote | grep "$BACKUP_NAME" >/dev/null || error_exit "Backup verification failed"

# Clean up old backups
# The delete command removes one named backup at a time, not backups by age.
# Retention is enforced by backups_to_keep_local and backups_to_keep_remote in the config file.
log "Old backups are pruned according to backups_to_keep_local and backups_to_keep_remote"

# Get backup size information
BACKUP_SIZE=$(clickhouse-backup -c "$CONFIG_FILE" list local | grep "$BACKUP_NAME" | awk '{print $3}' || echo "Unknown")
log "Backup completed successfully. Size: $BACKUP_SIZE"

# Send metrics to monitoring (optional)
if command -v curl >/dev/null 2>&1 && [ -n "${WEBHOOK_URL:-}" ]; then
    curl -X POST "$WEBHOOK_URL" -d "backup_completed=1&backup_name=$BACKUP_NAME&backup_size=$BACKUP_SIZE" || log "Warning: Failed to send metrics"
fi

log "Backup process finished successfully"
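
As written, a single transient network error during the upload aborts the whole run. One optional hardening step is a small retry wrapper. The sketch below is an assumption, not part of the original script: `retry` is a hypothetical helper, and whether a second upload attempt continues cleanly after a partial upload depends on your clickhouse-backup version, so test that before relying on it.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        echo "attempt $n failed, retrying in ${delay}s: $*" >&2
        n=$((n + 1))
        sleep "$delay"
    done
}

# Possible use inside the backup script (three attempts, 60 seconds apart):
# retry 3 60 clickhouse-backup -c "$CONFIG_FILE" upload "$BACKUP_NAME" \
#     || error_exit "Failed to upload backup to S3"
```

Because the command is tested in the `until` condition, the wrapper is safe to use under `set -e`.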

Make backup script executable

Set appropriate permissions for the backup script and create the log file.

sudo chmod +x /usr/local/bin/clickhouse-backup.sh
sudo mkdir -p /var/log
sudo touch /var/log/clickhouse-backup.log
sudo chown clickhouse:clickhouse /var/log/clickhouse-backup.log
sudo chmod 644 /var/log/clickhouse-backup.log

Create systemd service

Create a systemd service unit at /etc/systemd/system/clickhouse-backup.service for running backups with proper isolation and resource limits.

[Unit]
Description=ClickHouse Backup Service
After=clickhouse-server.service
Requires=clickhouse-server.service
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
User=clickhouse
Group=clickhouse
ExecStart=/usr/local/bin/clickhouse-backup.sh
TimeoutStartSec=7200
PrivateTmp=yes
ProtectSystem=strict
ProtectHome=yes
NoNewPrivileges=yes
ReadWritePaths=/var/lib/clickhouse /var/log /tmp
Environment=HOME=/var/lib/clickhouse
WorkingDirectory=/var/lib/clickhouse

[Install]
WantedBy=multi-user.target

Create systemd timer

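Configure a systemd timer at /etc/systemd/system/clickhouse-backup.timer for automated daily backups. OnCalendar=daily fires at midnight, and RandomizedDelaySec=3600 delays each run by a random amount of up to one hour so the start time varies.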

[Unit]
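Description=Run ClickHouse Backup Daily
# No Requires= on the service here: the timer activates clickhouse-backup.service by name,
# and Requires= would start a backup every time the timer itself is started.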

[Timer]
OnCalendar=daily
RandomizedDelaySec=3600
Persistent=true
AccuracySec=1h

[Install]
WantedBy=timers.target
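
As a rough worked example of when the backup can start: OnCalendar=daily resolves to 00:00, RandomizedDelaySec=3600 adds up to 3600 seconds, and AccuracySec=1h lets systemd place the actual elapse within a further window of up to one hour. The arithmetic sketch below adds the two windows; `latest_start` is a hypothetical helper, and the result is an upper bound under those assumptions, not a guarantee.

```shell
#!/bin/sh
# Print the latest possible start time (HH:MM) for a timer scheduled at
# midnight, given RandomizedDelaySec and AccuracySec in seconds.
latest_start() {
    total=$(( $1 + $2 ))
    printf '%02d:%02d\n' $(( total / 3600 )) $(( total % 3600 / 60 ))
}

latest_start 3600 3600
```

For the values in this timer the sketch prints 02:00. On a machine with systemd, `systemd-analyze calendar daily` shows how the OnCalendar expression itself is interpreted.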

Enable and start the backup timer

Enable the systemd timer to start automatic backups and check the schedule.

sudo systemctl daemon-reload
sudo systemctl enable clickhouse-backup.timer
sudo systemctl start clickhouse-backup.timer
sudo systemctl list-timers clickhouse-backup.timer

Create recovery script

Create a restore script at /usr/local/bin/clickhouse-restore.sh that downloads a named backup from S3 and restores it.

#!/bin/bash
# ClickHouse Restore Script

set -euo pipefail

CONFIG_FILE="/etc/clickhouse-backup/config.yml"
LOG_FILE="/var/log/clickhouse-restore.log"

# Check arguments before reading $1 (set -u aborts on an unset parameter)
if [ $# -eq 0 ]; then
    echo "Usage: $0 <backup_name>"
    echo "Available backups:"
    clickhouse-backup -c "$CONFIG_FILE" list remote
    exit 1
fi

BACKUP_NAME="$1"

log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "$LOG_FILE"
}

log "Starting restore from backup: $BACKUP_NAME"

# Download backup from S3
log "Downloading backup from S3..."
clickhouse-backup -c "$CONFIG_FILE" download "$BACKUP_NAME"

# Make sure ClickHouse is running
# clickhouse-backup restores through a connection to a live server,
# so the server is started here rather than stopped.
log "Ensuring ClickHouse server is running..."
sudo systemctl start clickhouse-server

# Wait for server to be ready
log "Waiting for ClickHouse to be ready..."
READY=0
for i in {1..30}; do
    if clickhouse-client --query "SELECT 1" >/dev/null 2>&1; then
        log "ClickHouse server is ready"
        READY=1
        break
    fi
    sleep 5
done
if [ "$READY" -ne 1 ]; then
    log "ERROR: ClickHouse did not become ready"
    exit 1
fi

# Restore backup
log "Restoring backup..."
clickhouse-backup -c "$CONFIG_FILE" restore "$BACKUP_NAME"

log "Restore completed successfully"
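
A mistyped backup name is only caught once the download fails. A cheap guard is to validate the argument against the naming scheme the backup script uses (backup-YYYYMMDD-HHMMSS) before doing anything else. This is a sketch; `valid_backup_name` is a hypothetical helper and only applies if you keep that naming scheme.

```shell
#!/bin/sh
# Succeeds if the argument looks like backup-YYYYMMDD-HHMMSS,
# the name format produced by the backup script in this tutorial.
valid_backup_name() {
    printf '%s\n' "$1" | grep -Eq '^backup-[0-9]{8}-[0-9]{6}$'
}

valid_backup_name "backup-20260518-020000" && echo "name accepted"
valid_backup_name "backup-2026" || echo "name rejected"
```

In the restore script this could run right after the argument check, exiting with the usage message when the name does not match.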

Make recovery script executable

Set permissions for the recovery script and create its log file.

sudo chmod +x /usr/local/bin/clickhouse-restore.sh
sudo touch /var/log/clickhouse-restore.log
sudo chown clickhouse:clickhouse /var/log/clickhouse-restore.log
sudo chmod 644 /var/log/clickhouse-restore.log

Configure backup monitoring

Set up a monitoring script at /usr/local/bin/clickhouse-backup-monitor.sh that checks backup status and sends an email alert on failure.

#!/bin/bash
# ClickHouse Backup Monitoring Script

set -euo pipefail

CONFIG_FILE="/etc/clickhouse-backup/config.yml"
LOG_FILE="/var/log/clickhouse-backup.log"
ALERT_EMAIL="admin@example.com"
MAX_BACKUP_AGE_HOURS=25

# Check if backup ran successfully in the last 25 hours
# Timestamped names sort chronologically, so the last one after sorting is the newest.
LAST_BACKUP=$(clickhouse-backup -c "$CONFIG_FILE" list local | grep -o 'backup-[0-9]\{8\}-[0-9]\{6\}' | sort | tail -n1 || echo "")
if [ -z "$LAST_BACKUP" ]; then
    echo "ERROR: No backups found" | mail -s "ClickHouse Backup Alert: No backups found" "$ALERT_EMAIL"
    exit 1
fi

# Check backup age
BACKUP_TIME=$(echo "$LAST_BACKUP" | grep -o '[0-9]\{8\}-[0-9]\{6\}' || echo "")
if [ -n "$BACKUP_TIME" ]; then
    BACKUP_TIMESTAMP=$(date -d "${BACKUP_TIME:0:8} ${BACKUP_TIME:9:2}:${BACKUP_TIME:11:2}:${BACKUP_TIME:13:2}" +%s)
    CURRENT_TIMESTAMP=$(date +%s)
    AGE_HOURS=$(( (CURRENT_TIMESTAMP - BACKUP_TIMESTAMP) / 3600 ))
    if [ "$AGE_HOURS" -gt "$MAX_BACKUP_AGE_HOURS" ]; then
        echo "ERROR: Last backup is $AGE_HOURS hours old (max allowed: $MAX_BACKUP_AGE_HOURS)" | \
            mail -s "ClickHouse Backup Alert: Backup too old" "$ALERT_EMAIL"
        exit 1
    fi
fi

# Check for recent errors in log
ERRORS=$(tail -n 100 "$LOG_FILE" | grep "ERROR" || true)
if [ -n "$ERRORS" ]; then
    printf 'Recent errors found in backup log:\n%s\n' "$ERRORS" | mail -s "ClickHouse Backup Alert: Errors detected" "$ALERT_EMAIL"
fi

echo "Backup monitoring check completed successfully"
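
The age check is easiest to follow with concrete numbers. The sketch below isolates it as a function and evaluates a backup taken at 02:00 against a fixed "now" of 04:00 the next day. `backup_age_hours` is a hypothetical helper, GNU date is assumed for the -d option, and the time zone is pinned to UTC only to make the example reproducible.

```shell
#!/bin/sh
# backup_age_hours NAME NOW_EPOCH
# Prints the age in whole hours of a backup named backup-YYYYMMDD-HHMMSS.
# Requires GNU date for the -d option.
backup_age_hours() {
    stamp=${1#backup-}      # e.g. 20260518-020000
    day=${stamp%-*}         # e.g. 20260518
    clock=${stamp#*-}       # e.g. 020000
    hh=$(printf '%s' "$clock" | cut -c1-2)
    mm=$(printf '%s' "$clock" | cut -c3-4)
    ss=$(printf '%s' "$clock" | cut -c5-6)
    created=$(date -d "$day $hh:$mm:$ss" +%s) || return 1
    echo $(( ($2 - created) / 3600 ))
}

export TZ=UTC   # fixed zone so the example gives the same result everywhere
now=$(date -d "2026-05-19 04:00:00" +%s)
backup_age_hours "backup-20260518-020000" "$now"
```

The example prints 26, which exceeds MAX_BACKUP_AGE_HOURS=25 and would therefore trigger the "Backup too old" alert.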

Install mail utility and configure monitoring cron

Install mail utilities for alerting and set up monitoring checks.

# Ubuntu / Debian
sudo apt install -y mailutils postfix
# AlmaLinux / Rocky Linux
sudo dnf install -y mailx postfix
# All distributions
sudo chmod +x /usr/local/bin/clickhouse-backup-monitor.sh
sudo systemctl enable --now postfix

# Add monitoring to crontab (runs every 4 hours)
echo "0 */4 * * * /usr/local/bin/clickhouse-backup-monitor.sh" | sudo crontab -u clickhouse -
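
Piping a single line into `crontab -` replaces the user's entire crontab. If the clickhouse user may already have other entries, a merge step avoids losing them. The following is a sketch; `merge_cron_line` is a hypothetical helper and `other-job.sh` is a made-up existing entry used only for the demonstration.

```shell
#!/bin/sh
# Read an existing crontab on stdin and print it followed by LINE,
# unless an identical LINE is already present.
merge_cron_line() {
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    if ! printf '%s\n' "$existing" | grep -Fxq -- "$1"; then
        printf '%s\n' "$1"
    fi
}

line='0 */4 * * * /usr/local/bin/clickhouse-backup-monitor.sh'

# Demonstration with sample input instead of a real crontab.
printf '30 2 * * * /usr/local/bin/other-job.sh\n' | merge_cron_line "$line"

# Possible real use:
# sudo crontab -u clickhouse -l 2>/dev/null | merge_cron_line "$line" | sudo crontab -u clickhouse -
```

Running the merge twice leaves the crontab unchanged, so the step is safe to repeat.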

Verify your setup

Test the backup system and verify all components are working correctly.

# Test backup configuration
sudo clickhouse-backup -c /etc/clickhouse-backup/config.yml list local

# Run a manual backup test
sudo systemctl start clickhouse-backup.service
sudo systemctl status clickhouse-backup.service

# Check backup logs
sudo tail -f /var/log/clickhouse-backup.log

# Verify S3 upload (sudo, because the aws credentials were configured with sudo)
sudo aws s3 ls s3://your-clickhouse-backups/clickhouse-backups/ --recursive

# Check timer status
sudo systemctl status clickhouse-backup.timer
sudo systemctl list-timers clickhouse-backup.timer

# Test monitoring script
sudo -u clickhouse /usr/local/bin/clickhouse-backup-monitor.sh

Common issues

Symptom | Cause | Fix
Permission denied errors | Incorrect file ownership | sudo chown -R clickhouse:clickhouse /etc/clickhouse-backup /var/log/clickhouse-*.log
S3 upload fails | Invalid credentials or bucket policy | Test with aws s3 ls s3://your-bucket and check IAM permissions
Backup timeout | Large database or slow storage | Increase timeout in service file and backup script
ClickHouse connection fails | Server not running or wrong credentials | Check service status and verify connection settings in config
Timer not running | Timer not enabled | sudo systemctl enable clickhouse-backup.timer && sudo systemctl start clickhouse-backup.timer
Compression fails | Missing compression utilities | Install lz4 or gzip packages for your distribution
