Implement TimescaleDB backup strategies and point-in-time recovery with automated failover

Advanced 45 min Apr 08, 2026 38 views
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Configure comprehensive TimescaleDB backup solutions with continuous WAL archiving, automated point-in-time recovery procedures, and monitoring systems to ensure data protection and business continuity.

Prerequisites

  • Root or sudo access
  • PostgreSQL 16+ installed
  • At least 20GB free disk space
  • Basic PostgreSQL administration knowledge

What this solves

TimescaleDB requires specialized backup strategies that account for its time-series data patterns and hypertable architecture. This tutorial implements production-grade continuous backup with Write-Ahead Log (WAL) archiving, automated point-in-time recovery (PITR) capabilities, and comprehensive monitoring to prevent data loss and minimize recovery time objectives.

Step-by-step configuration

Install TimescaleDB and backup dependencies

Install TimescaleDB extension and PostgreSQL backup utilities for comprehensive backup management.

curl -s https://packagecloud.io/install/repositories/timescale/timescaledb/script.deb.sh | sudo bash
sudo apt update
sudo apt install -y timescaledb-2-postgresql-16 postgresql-client-common pigz pv
sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sudo dnf install -y timescaledb_16 postgresql16-server pigz pv

Configure PostgreSQL for WAL archiving

Enable continuous WAL archiving to support point-in-time recovery and streaming replication.

# WAL configuration for continuous archiving
wal_level = replica
max_wal_senders = 3
max_replication_slots = 3
archive_mode = on
archive_command = 'test ! -f /var/lib/postgresql/wal_archive/%f && cp %p /var/lib/postgresql/wal_archive/%f'
archive_timeout = 300

Checkpoint and WAL settings

checkpoint_timeout = 5min max_wal_size = 2GB min_wal_size = 80MB checkpoint_completion_target = 0.9

Logging for backup monitoring

log_destination = 'csvlog' logging_collector = on log_directory = '/var/log/postgresql' log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' log_rotation_age = 1d log_rotation_size = 100MB

Create WAL archive directory structure

Set up directories for WAL archives and backups with proper permissions for the postgres user.

sudo mkdir -p /var/lib/postgresql/wal_archive
sudo mkdir -p /var/lib/postgresql/backups
sudo mkdir -p /var/lib/postgresql/pitr_recovery
sudo chown -R postgres:postgres /var/lib/postgresql/wal_archive
sudo chown -R postgres:postgres /var/lib/postgresql/backups
sudo chown -R postgres:postgres /var/lib/postgresql/pitr_recovery
sudo chmod 750 /var/lib/postgresql/wal_archive
sudo chmod 750 /var/lib/postgresql/backups

Configure TimescaleDB backup script

Create a comprehensive backup script that handles base backups with compression and metadata logging.

#!/bin/bash

TimescaleDB Backup Script with Compression and Validation

set -euo pipefail

Configuration

BACKUP_DIR="/var/lib/postgresql/backups" WAL_ARCHIVE_DIR="/var/lib/postgresql/wal_archive" DB_NAME="${1:-timescaledb}" RETENTION_DAYS=7 COMPRESSION_LEVEL=6 TIMESTAMP=$(date +%Y%m%d_%H%M%S) BACKUP_NAME="${DB_NAME}_backup_${TIMESTAMP}" LOG_FILE="/var/log/postgresql/backup_${TIMESTAMP}.log"

Logging function

log() { echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "$LOG_FILE" }

Error handling

trap 'log "ERROR: Backup failed at line $LINENO"' ERR log "Starting TimescaleDB backup: $BACKUP_NAME"

Create backup directory

mkdir -p "${BACKUP_DIR}/${BACKUP_NAME}"

Perform base backup with compression

log "Creating base backup with pg_basebackup" pg_basebackup -D "${BACKUP_DIR}/${BACKUP_NAME}" \ -Ft -z -Z$COMPRESSION_LEVEL \ -P -v -W \ -X stream \ --label="TimescaleDB_backup_${TIMESTAMP}" 2>&1 | tee -a "$LOG_FILE"

Create backup metadata

log "Creating backup metadata" cat > "${BACKUP_DIR}/${BACKUP_NAME}/backup_info.txt" << EOF Backup Name: $BACKUP_NAME Database: $DB_NAME Timestamp: $TIMESTAMP Backup Type: Full Base Backup Compression: Level $COMPRESSION_LEVEL WAL Archive Location: $WAL_ARCHIVE_DIR Backup Size: $(du -sh "${BACKUP_DIR}/${BACKUP_NAME}" | cut -f1) PostgreSQL Version: $(psql -t -c "SELECT version();" | head -1) TimescaleDB Version: $(psql -d $DB_NAME -t -c "SELECT extversion FROM pg_extension WHERE extname='timescaledb';" | head -1) EOF

Validate backup integrity

log "Validating backup integrity" if [ -f "${BACKUP_DIR}/${BACKUP_NAME}/base.tar.gz" ]; then tar -tzf "${BACKUP_DIR}/${BACKUP_NAME}/base.tar.gz" >/dev/null 2>&1 log "Base backup validation: PASSED" else log "ERROR: Base backup file not found" exit 1 fi

Cleanup old backups

log "Cleaning up backups older than $RETENTION_DAYS days" find "$BACKUP_DIR" -name "${DB_NAME}_backup_*" -type d -mtime +$RETENTION_DAYS -exec rm -rf {} +

Cleanup old WAL archives (keep 2x retention period)

find "$WAL_ARCHIVE_DIR" -name ".gz" -type f -mtime +$((RETENTION_DAYS 2)) -delete log "Backup completed successfully: $BACKUP_NAME" log "Total backup size: $(du -sh "${BACKUP_DIR}/${BACKUP_NAME}" | cut -f1)" exit 0
sudo chmod 755 /usr/local/bin/timescale-backup.sh
sudo chown postgres:postgres /usr/local/bin/timescale-backup.sh

Implement point-in-time recovery script

Create an automated PITR script that can restore to any point in time within the WAL archive retention period.

#!/bin/bash

TimescaleDB Point-in-Time Recovery Script

set -euo pipefail

Configuration

BACKUP_DIR="/var/lib/postgresql/backups" WAL_ARCHIVE_DIR="/var/lib/postgresql/wal_archive" RECOVERY_DIR="/var/lib/postgresql/pitr_recovery" POSTGRES_DATA_DIR="/var/lib/postgresql/16/main" TARGET_TIME="${1:-}" BACKUP_NAME="${2:-}" RECOVERY_TARGET_ACTION="promote"

Validation

if [[ -z "$TARGET_TIME" ]]; then echo "Usage: $0 'YYYY-MM-DD HH:MM:SS' [backup_name]" echo "Example: $0 '2024-01-15 14:30:00'" exit 1 fi

Get latest backup if not specified

if [[ -z "$BACKUP_NAME" ]]; then BACKUP_NAME=$(find "$BACKUP_DIR" -name "_backup_" -type d | sort | tail -1 | basename) if [[ -z "$BACKUP_NAME" ]]; then echo "ERROR: No backups found in $BACKUP_DIR" exit 1 fi fi echo "Starting Point-in-Time Recovery" echo "Target Time: $TARGET_TIME" echo "Using Backup: $BACKUP_NAME" echo "Recovery Directory: $RECOVERY_DIR"

Stop PostgreSQL service

echo "Stopping PostgreSQL service" sudo systemctl stop postgresql

Clean recovery directory

echo "Preparing recovery directory" rm -rf "$RECOVERY_DIR"/* mkdir -p "$RECOVERY_DIR"

Extract base backup

echo "Extracting base backup" cd "$RECOVERY_DIR" tar -xzf "${BACKUP_DIR}/${BACKUP_NAME}/base.tar.gz" if [ -f "${BACKUP_DIR}/${BACKUP_NAME}/pg_wal.tar.gz" ]; then mkdir -p pg_wal tar -xzf "${BACKUP_DIR}/${BACKUP_NAME}/pg_wal.tar.gz" -C pg_wal fi

Create recovery configuration

echo "Creating recovery configuration" cat > "$RECOVERY_DIR/postgresql.auto.conf" << EOF

Point-in-Time Recovery Configuration

restore_command = 'cp $WAL_ARCHIVE_DIR/%f %p' recovery_target_time = '$TARGET_TIME' recovery_target_action = '$RECOVERY_TARGET_ACTION' EOF

Create recovery signal file

touch "$RECOVERY_DIR/recovery.signal"

Set proper permissions

chown -R postgres:postgres "$RECOVERY_DIR" chmod 700 "$RECOVERY_DIR" echo "Recovery setup completed" echo "To complete recovery:" echo "1. Backup current data directory: sudo mv $POSTGRES_DATA_DIR ${POSTGRES_DATA_DIR}.backup" echo "2. Move recovery directory: sudo mv $RECOVERY_DIR $POSTGRES_DATA_DIR" echo "3. Start PostgreSQL: sudo systemctl start postgresql" echo "4. Monitor recovery in PostgreSQL logs" exit 0
sudo chmod 755 /usr/local/bin/timescale-pitr.sh
sudo chown postgres:postgres /usr/local/bin/timescale-pitr.sh

Configure automated backup scheduling

Set up systemd service and timer for automated daily backups with proper logging and error handling.

[Unit]
Description=TimescaleDB Automated Backup Service
After=postgresql.service
Requires=postgresql.service

[Service]
Type=oneshot
User=postgres
Group=postgres
ExecStart=/usr/local/bin/timescale-backup.sh timescaledb
Environment=PGUSER=postgres
Environment=PGDATABASE=timescaledb
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
[Unit]
Description=TimescaleDB Backup Timer
Requires=timescale-backup.service

[Timer]
OnCalendar=--* 02:00:00
Persistent=true
RandomizedDelaySec=300

[Install]
WantedBy=timers.target
sudo systemctl daemon-reload
sudo systemctl enable timescale-backup.timer
sudo systemctl start timescale-backup.timer

Implement backup monitoring and alerting

Create monitoring script to validate backup integrity and send alerts on failures.

#!/bin/bash

TimescaleDB Backup Monitoring Script

set -euo pipefail BACKUP_DIR="/var/lib/postgresql/backups" WAL_ARCHIVE_DIR="/var/lib/postgresql/wal_archive" MAX_BACKUP_AGE_HOURS=26 # Allow up to 26 hours for daily backups MAX_WAL_AGE_MINUTES=10 # WAL should be archived within 10 minutes ALERT_EMAIL="admin@example.com" SMTP_SERVER="localhost"

Check latest backup age

check_backup_age() { local latest_backup latest_backup=$(find "$BACKUP_DIR" -name "_backup_" -type d -printf '%T@ %p\n' | sort -n | tail -1 | cut -d' ' -f2-) if [[ -z "$latest_backup" ]]; then echo "CRITICAL: No backups found in $BACKUP_DIR" return 1 fi local backup_time backup_time=$(stat -c %Y "$latest_backup") local current_time current_time=$(date +%s) local age_hours age_hours=$(( (current_time - backup_time) / 3600 )) if [[ $age_hours -gt $MAX_BACKUP_AGE_HOURS ]]; then echo "CRITICAL: Latest backup is $age_hours hours old (max: $MAX_BACKUP_AGE_HOURS)" return 1 fi echo "OK: Latest backup is $age_hours hours old" return 0 }

Check WAL archiving

check_wal_archiving() { local latest_wal latest_wal=$(find "$WAL_ARCHIVE_DIR" -name "*.gz" -type f -printf '%T@ %p\n' | sort -n | tail -1 | cut -d' ' -f2-) if [[ -z "$latest_wal" ]]; then echo "WARNING: No WAL archives found" return 1 fi local wal_time wal_time=$(stat -c %Y "$latest_wal") local current_time current_time=$(date +%s) local age_minutes age_minutes=$(( (current_time - wal_time) / 60 )) if [[ $age_minutes -gt $MAX_WAL_AGE_MINUTES ]]; then echo "WARNING: Latest WAL archive is $age_minutes minutes old (max: $MAX_WAL_AGE_MINUTES)" return 1 fi echo "OK: WAL archiving is current ($age_minutes minutes ago)" return 0 }

Send alert email

send_alert() { local subject="$1" local message="$2" if command -v mail >/dev/null 2>&1; then echo "$message" | mail -s "$subject" "$ALERT_EMAIL" elif command -v sendmail >/dev/null 2>&1; then { echo "Subject: $subject" echo "To: $ALERT_EMAIL" echo "" echo "$message" } | sendmail "$ALERT_EMAIL" else echo "No mail command available for alerts" fi }

Main monitoring logic

main() { local backup_status=0 local wal_status=0 local backup_result local wal_result backup_result=$(check_backup_age) || backup_status=1 wal_result=$(check_wal_archiving) || wal_status=1 echo "Backup Status: $backup_result" echo "WAL Status: $wal_result" if [[ $backup_status -ne 0 ]] || [[ $wal_status -ne 0 ]]; then local alert_message alert_message="TimescaleDB Backup Alert\n\nBackup Status: $backup_result\nWAL Status: $wal_result\n\nTime: $(date)\nHost: $(hostname)" send_alert "TimescaleDB Backup Alert - $(hostname)" "$alert_message" exit 1 fi echo "All backup systems operating normally" exit 0 } main "$@"
sudo chmod 755 /usr/local/bin/timescale-backup-monitor.sh
sudo chown postgres:postgres /usr/local/bin/timescale-backup-monitor.sh

Configure monitoring systemd service

Set up automated monitoring checks every hour to ensure backup systems are functioning properly.

[Unit]
Description=TimescaleDB Backup Monitoring Service
After=postgresql.service

[Service]
Type=oneshot
User=postgres
Group=postgres
ExecStart=/usr/local/bin/timescale-backup-monitor.sh
StandardOutput=journal
StandardError=journal
[Unit]
Description=TimescaleDB Backup Monitoring Timer
Requires=timescale-backup-monitor.service

[Timer]
OnBootSec=15min
OnUnitActiveSec=1h
Persistent=true

[Install]
WantedBy=timers.target
sudo systemctl daemon-reload
sudo systemctl enable timescale-backup-monitor.timer
sudo systemctl start timescale-backup-monitor.timer

Restart PostgreSQL with new configuration

Apply the WAL archiving configuration by restarting the PostgreSQL service.

sudo systemctl restart postgresql
sudo systemctl status postgresql
Note: PostgreSQL restart is required for WAL configuration changes to take effect.

Initialize TimescaleDB in test database

Create a test database with TimescaleDB extension to verify backup functionality.

sudo -u postgres createdb timescaledb
sudo -u postgres psql -d timescaledb -c "CREATE EXTENSION IF NOT EXISTS timescaledb;"
sudo -u postgres psql -d timescaledb -c "SELECT timescaledb_pre_restore();"
sudo -u postgres psql -d timescaledb -c "SELECT timescaledb_post_restore();"

Test backup and recovery procedures

Perform initial backup test

Execute the backup script manually to verify functionality and generate baseline backup.

sudo -u postgres /usr/local/bin/timescale-backup.sh timescaledb

Verify WAL archiving

Check that WAL files are being archived automatically by PostgreSQL.

sudo -u postgres psql -c "SELECT pg_switch_wal();"
sleep 30
ls -la /var/lib/postgresql/wal_archive/

Test point-in-time recovery setup

Prepare PITR configuration without actually performing recovery to validate the process.

sudo -u postgres /usr/local/bin/timescale-pitr.sh "$(date '+%Y-%m-%d %H:%M:%S')" --dry-run

Verify your setup

# Check backup timer status
sudo systemctl status timescale-backup.timer

Check monitoring timer status

sudo systemctl status timescale-backup-monitor.timer

Verify WAL archiving is enabled

sudo -u postgres psql -c "SHOW archive_mode;" sudo -u postgres psql -c "SHOW archive_command;"

Check recent backups

ls -la /var/lib/postgresql/backups/

Verify WAL archives

ls -la /var/lib/postgresql/wal_archive/

Test backup monitoring

sudo -u postgres /usr/local/bin/timescale-backup-monitor.sh

View backup logs

sudo journalctl -u timescale-backup.service --no-pager

Configure retention and cleanup policies

Implement advanced WAL cleanup

Create intelligent WAL archive cleanup that respects backup retention periods and recovery requirements.

#!/bin/bash

TimescaleDB WAL Archive Cleanup Script

set -euo pipefail WAL_ARCHIVE_DIR="/var/lib/postgresql/wal_archive" BACKUP_DIR="/var/lib/postgresql/backups" RETENTION_DAYS=14 DRY_RUN=${1:-false}

Find oldest backup to determine safe cleanup point

oldest_backup=$(find "$BACKUP_DIR" -name "_backup_" -type d -printf '%T@ %p\n' | sort -n | head -1 | cut -d' ' -f2-) if [[ -n "$oldest_backup" ]]; then # Get backup start WAL from backup_label backup_start_wal=$(find "$oldest_backup" -name "backup_label*" -exec cat {} \; | grep "START WAL LOCATION" | awk '{print $4}' | head -1) if [[ -n "$backup_start_wal" ]]; then echo "Oldest backup start WAL: $backup_start_wal" # Remove WAL files older than oldest backup (safely) find "$WAL_ARCHIVE_DIR" -name "*.gz" -type f -mtime +$RETENTION_DAYS | while read -r wal_file; do if [[ "$DRY_RUN" == "true" ]]; then echo "Would remove: $wal_file" else echo "Removing old WAL: $wal_file" rm -f "$wal_file" fi done fi fi echo "WAL cleanup completed"
sudo chmod 755 /usr/local/bin/timescale-wal-cleanup.sh
sudo chown postgres:postgres /usr/local/bin/timescale-wal-cleanup.sh

Schedule WAL cleanup automation

Configure daily WAL cleanup to maintain optimal archive sizes while preserving recovery capabilities.

[Unit]
Description=TimescaleDB WAL Archive Cleanup Service
After=postgresql.service

[Service]
Type=oneshot
User=postgres
Group=postgres
ExecStart=/usr/local/bin/timescale-wal-cleanup.sh
StandardOutput=journal
StandardError=journal
[Unit]
Description=TimescaleDB WAL Cleanup Timer
Requires=timescale-wal-cleanup.service

[Timer]
OnCalendar=--* 03:00:00
Persistent=true
RandomizedDelaySec=600

[Install]
WantedBy=timers.target
sudo systemctl daemon-reload
sudo systemctl enable timescale-wal-cleanup.timer
sudo systemctl start timescale-wal-cleanup.timer

Implement automated failover monitoring

Configure streaming replication for failover

Set up streaming replication to enable automated failover capabilities with hot standby.

# Add replication connection for standby servers
host replication replicator 203.0.113.0/24 md5
# Create replication user
sudo -u postgres psql -c "CREATE USER replicator REPLICATION LOGIN PASSWORD 'secure_replication_password';"

Reload configuration

sudo systemctl reload postgresql

Create failover detection script

Implement automated failover detection and promotion for high availability scenarios.

#!/bin/bash

TimescaleDB Failover Detection and Management Script

set -euo pipefail PRIMARY_HOST="203.0.113.10" PRIMARY_PORT="5432" STANDBY_DATA_DIR="/var/lib/postgresql/16/main" FAILOVER_TRIGGER="/tmp/postgresql.trigger" HEALTH_CHECK_TIMEOUT=10 MAX_FAILURES=3 CURRENT_FAILURES=0

Check if primary is accessible

check_primary() { timeout $HEALTH_CHECK_TIMEOUT pg_isready -h "$PRIMARY_HOST" -p "$PRIMARY_PORT" -U postgres >/dev/null 2>&1 }

Check if this is a standby server

is_standby() { sudo -u postgres psql -t -c "SELECT pg_is_in_recovery();" 2>/dev/null | grep -q "t" }

Promote standby to primary

promote_standby() { echo "$(date): Promoting standby to primary" touch "$FAILOVER_TRIGGER" # Wait for promotion to complete local promoted=false for i in {1..30}; do if ! is_standby; then promoted=true break fi sleep 2 done if [[ "$promoted" == "true" ]]; then echo "$(date): Promotion successful" # Update configuration for new primary role sudo -u postgres psql -c "ALTER SYSTEM RESET primary_conninfo;" sudo -u postgres psql -c "SELECT pg_reload_conf();" return 0 else echo "$(date): Promotion failed" return 1 fi }

Main monitoring loop

main() { if ! is_standby; then echo "$(date): This server is not a standby, exiting" exit 0 fi if ! check_primary; then CURRENT_FAILURES=$((CURRENT_FAILURES + 1)) echo "$(date): Primary health check failed ($CURRENT_FAILURES/$MAX_FAILURES)" if [[ $CURRENT_FAILURES -ge $MAX_FAILURES ]]; then echo "$(date): Primary appears to be down, initiating failover" promote_standby fi else CURRENT_FAILURES=0 echo "$(date): Primary is healthy" fi } main "$@"
sudo chmod 755 /usr/local/bin/timescale-failover-monitor.sh
sudo chown postgres:postgres /usr/local/bin/timescale-failover-monitor.sh

Common issues

SymptomCauseFix
WAL archiving failsPermission issues or full diskCheck sudo tail /var/log/postgresql/postgresql*.log and verify disk space
Backup script failsInsufficient privileges or missing directoriesVerify postgres user owns backup directories and has SUPERUSER role
PITR cannot find WAL filesWAL archive cleanup too aggressiveAdjust retention in WAL cleanup script and verify archive_command
Base backup takes too longLarge database or network issuesUse pg_basebackup -c fast and check network bandwidth
Recovery stops with errorsMissing WAL segmentsCheck WAL archive completeness with pg_waldump
Monitoring alerts not workingMail system not configuredInstall mailutils: sudo apt install mailutils and configure SMTP

Performance optimization

Optimize backup performance

Configure parallel backup processing and compression settings for large TimescaleDB instances.

# Optimize for backup performance
max_wal_senders = 10
max_replication_slots = 10
wal_keep_size = 1GB

Checkpoint optimization for backup windows

checkpoint_timeout = 15min checkpoint_completion_target = 0.7 checkpoint_warning = 300s
sudo systemctl reload postgresql

Configure backup compression optimization

Implement adaptive compression based on data patterns and available resources.

#!/bin/bash

Enhanced backup with dynamic compression

set -euo pipefail

Detect available CPU cores for compression

CPU_CORES=$(nproc) COMPRESSION_THREADS=$((CPU_CORES / 2)) [[ $COMPRESSION_THREADS -lt 1 ]] && COMPRESSION_THREADS=1

Dynamic compression level based on system load

LOAD_AVG=$(uptime | awk -F'load average:' '{print $2}' | awk '{print $1}' | sed 's/,//') if (( $(echo "$LOAD_AVG > $CPU_CORES" | bc -l) )); then COMPRESSION_LEVEL=1 # Lower compression when system is busy else COMPRESSION_LEVEL=6 # Standard compression fi echo "Using $COMPRESSION_THREADS threads with compression level $COMPRESSION_LEVEL"

Rest of backup script with optimizations

export PGPASSWORD="your_backup_password" pg_basebackup -D "${BACKUP_DIR}/${BACKUP_NAME}" \ -Ft -z -Z$COMPRESSION_LEVEL \ -P -v -W \ -X stream \ --checkpoint=fast \ --label="TimescaleDB_optimized_${TIMESTAMP}"

Next steps

Automated install script

Run this to automate the entire setup

#timescaledb #postgresql #backup #pitr #wal-archiving

Need help?

Don't want to manage this yourself?

We handle infrastructure for businesses that depend on uptime. From initial setup to ongoing operations.

Talk to an engineer