Configure Linux file compression and archiving with tar, gzip and backup automation

Beginner 25 min Apr 03, 2026
Ubuntu 24.04 Ubuntu 22.04 Debian 12 AlmaLinux 9 Rocky Linux 9 Fedora 41

Learn to create compressed archives with tar and gzip, manage different compression formats, and implement automated backup scripts with rotation for efficient system maintenance and data protection.

Prerequisites

  • Root or sudo access
  • Basic command line knowledge
  • At least 2GB free disk space for testing

What this solves

File compression and archiving are essential for system backups, log management, and efficient storage utilization. This tutorial covers the tar command for creating archives, combining it with gzip, bzip2, and xz compression formats, and implementing automated backup scripts with rotation policies. You'll learn to optimize compression ratios, manage archive sizes, and create production-ready backup automation that integrates with log rotation systems and rsync backup automation.

Understanding compression formats and use cases

Compare compression formats

Different compression algorithms offer trade-offs between speed, compression ratio, and CPU usage.

echo "Compression format comparison:" > test_file.txt
for i in {1..1000}; do echo "This is line $i with some repetitive content for testing compression ratios" >> test_file.txt; done

Create archives with different compression

tar -cf test_uncompressed.tar test_file.txt
tar -czf test_gzip.tar.gz test_file.txt
tar -cjf test_bzip2.tar.bz2 test_file.txt
tar -cJf test_xz.tar.xz test_file.txt

Compare sizes

ls -lh test_*

Understand compression characteristics

Each format serves different purposes based on your requirements.

Format         Speed     Compression ratio   Best use case
gzip (.gz)     Fast      Good                Daily backups, log compression
bzip2 (.bz2)   Medium    Better              Long-term archives, limited bandwidth
xz (.xz)       Slow      Best                Archival storage, distribution packages
uncompressed   Fastest   None                Temporary archives, fast extraction needed
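The table is a rule of thumb; actual speed and ratio depend heavily on the data, so it is worth measuring your own workload. A quick sketch that times each format on the same sample input (directory and file names are arbitrary):

```shell
# Times and ratios vary with the data, so treat the table above as a
# starting point and benchmark your own files.
mkdir -p /tmp/compress_bench
cd /tmp/compress_bench
seq 1 5000 | sed 's/^/repetitive test line number /' > sample.txt

for fmt in gz bz2 xz; do
    case $fmt in
        gz)  flag=z ;;
        bz2) flag=j ;;
        xz)  flag=J ;;
    esac
    start=$(date +%s%N)
    tar -c${flag}f "sample.tar.$fmt" sample.txt
    elapsed_ms=$(( ($(date +%s%N) - start) / 1000000 ))
    size=$(stat -c %s "sample.tar.$fmt")
    echo "tar.$fmt: ${size} bytes in ${elapsed_ms} ms"
done
```

On highly repetitive text like this sample, xz typically produces the smallest archive but takes the longest; your results will differ for binary or already-compressed data.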

Basic tar operations for archiving and extraction

Create basic archives

The tar command creates archives from files and directories with various options for different scenarios.

# Create directory structure for testing
mkdir -p /tmp/backup_test/{config,logs,data}
echo "database_host=localhost" > /tmp/backup_test/config/app.conf
echo "log_level=info" > /tmp/backup_test/config/logging.conf
echo "$(date): Application started" > /tmp/backup_test/logs/app.log
echo "user_data_sample" > /tmp/backup_test/data/users.dat

Create uncompressed archive

tar -cvf backup_test.tar /tmp/backup_test/

List archive contents

tar -tvf backup_test.tar

Extract archives safely

Always verify archive contents and extract to controlled locations to prevent directory traversal issues.

# List contents before extraction (security best practice)
tar -tvf backup_test.tar
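Listing is a good first step; for untrusted archives you can go further and reject dangerous member paths outright. A minimal sketch (the `check_archive_paths` helper is our own, not a tar feature; recent GNU tar already strips leading slashes by default, so this is an extra belt-and-braces check):

```shell
# Reject archive members with absolute paths or ".." components before
# extracting; such entries could place files outside the target directory.
check_archive_paths() {
    local archive="$1"
    if tar -tf "$archive" | grep -qE '^/|(^|/)\.\.(/|$)'; then
        echo "REFUSING: $archive contains unsafe paths" >&2
        return 1
    fi
    echo "Archive paths look safe: $archive"
}
```

Usage: `check_archive_paths backup_test.tar && tar -xf backup_test.tar -C /tmp/restore_test`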

Extract to specific directory

mkdir -p /tmp/restore_test
cd /tmp/restore_test
tar -xvf /tmp/backup_test.tar

Extract specific files only

tar -xvf /tmp/backup_test.tar --strip-components=3 tmp/backup_test/config/app.conf

Extract with path stripping for security

tar -xvf /tmp/backup_test.tar --strip-components=2

Use advanced tar options

Tar provides numerous options for excluding files, following symlinks, and preserving permissions.

# Create archive excluding specific patterns
tar -cvf backup_selective.tar --exclude='*.log' --exclude='*.tmp' /tmp/backup_test/

Create archive with exclusion file

echo "*.tmp" > exclude_patterns.txt
echo "cache/*" >> exclude_patterns.txt
tar -cvf backup_filtered.tar --exclude-from=exclude_patterns.txt /tmp/backup_test/

Preserve all permissions and attributes

tar -cpvf backup_full_permissions.tar /tmp/backup_test/

Show progress for large archives

tar -cvf backup_with_progress.tar --checkpoint=100 --checkpoint-action=echo="%T: %d" /tmp/backup_test/

Combining tar with gzip, bzip2, and xz compression

Create compressed archives with tar

Tar can automatically detect and apply compression based on file extension or explicit flags.

# Gzip compression (fastest)
tar -czf backup_gzip.tar.gz /tmp/backup_test/

Bzip2 compression (balanced)

tar -cjf backup_bzip2.tar.bz2 /tmp/backup_test/

XZ compression (highest ratio)

tar -cJf backup_xz.tar.xz /tmp/backup_test/

Auto-detect compression by extension

tar -caf backup_auto_gzip.tar.gz /tmp/backup_test/ tar -caf backup_auto_xz.tar.xz /tmp/backup_test/

Compare compression results

ls -lh backup_.tar

Set compression levels

Most compression formats support multiple levels to balance speed and compression ratio.

# Gzip with different compression levels (1=fast, 9=best)
export GZIP=-1; tar -czf backup_gzip_fast.tar.gz /tmp/backup_test/
export GZIP=-9; tar -czf backup_gzip_best.tar.gz /tmp/backup_test/
unset GZIP

Bzip2 with compression levels

export BZIP2=-1; tar -cjf backup_bzip2_fast.tar.bz2 /tmp/backup_test/ export BZIP2=-9; tar -cjf backup_bzip2_best.tar.bz2 /tmp/backup_test/ unset BZIP2

XZ with compression levels and memory usage

export XZ_OPT="-1"; tar -cJf backup_xz_fast.tar.xz /tmp/backup_test/ export XZ_OPT="-9"; tar -cJf backup_xz_best.tar.xz /tmp/backup_test/ unset XZ_OPT

Compare file sizes and creation time

ls -lh backup__{fast,best}.

Extract compressed archives

Tar automatically detects compression format during extraction, but you can specify it explicitly.

# Auto-detect compression (recommended)
tar -xaf backup_gzip.tar.gz -C /tmp/restore_gzip/
tar -xaf backup_bzip2.tar.bz2 -C /tmp/restore_bzip2/
tar -xaf backup_xz.tar.xz -C /tmp/restore_xz/

Explicit compression flags

tar -xzf backup_gzip.tar.gz -C /tmp/restore_explicit_gzip/ tar -xjf backup_bzip2.tar.bz2 -C /tmp/restore_explicit_bzip2/ tar -xJf backup_xz.tar.xz -C /tmp/restore_explicit_xz/

List compressed archive contents

tar -tzf backup_gzip.tar.gz tar -tjf backup_bzip2.tar.bz2 tar -tJf backup_xz.tar.xz

Automated backup scripts with compression and rotation

Create basic backup script

A production backup script should handle errors, logging, and different backup types efficiently.

#!/bin/bash

System backup script with compression and rotation

Usage: backup_system.sh [full|incremental]

Configuration

BACKUP_BASE="/backup" SOURCE_DIRS=("/etc" "/home" "/var/log" "/opt") BACKUP_TYPE="${1:-incremental}" MAX_BACKUPS=7 COMPRESSION="gzip" # gzip, bzip2, or xz LOGFILE="/var/log/backup.log" EXCLUDE_FILE="/etc/backup_exclude.txt"

Logging function

log_message() { echo "$(date '+%Y-%m-%d %H:%M:%S'): $1" | tee -a "$LOGFILE" }

Error handling

set -euo pipefail trap 'log_message "ERROR: Backup failed at line $LINENO"' ERR

Create backup directory

mkdir -p "$BACKUP_BASE"

Set compression extension

case "$COMPRESSION" in "gzip") EXT=".tar.gz"; TAR_FLAG="z" ;; "bzip2") EXT=".tar.bz2"; TAR_FLAG="j" ;; "xz") EXT=".tar.xz"; TAR_FLAG="J" ;; *) EXT=".tar"; TAR_FLAG="" ;; esac BACKUP_FILE="$BACKUP_BASE/system_${BACKUP_TYPE}_$(date +%Y%m%d_%H%M%S)${EXT}" log_message "Starting $BACKUP_TYPE backup to $BACKUP_FILE"

Create exclude file if it doesn't exist

if [ ! -f "$EXCLUDE_FILE" ]; then cat > "$EXCLUDE_FILE" << 'EOF' *.tmp *.cache /var/cache/* /tmp/* /proc/* /sys/* /dev/* /run/* /mnt/* /media/* EOF log_message "Created default exclude file: $EXCLUDE_FILE" fi

Perform backup based on type

if [ "$BACKUP_TYPE" = "full" ]; then tar -c${TAR_FLAG}f "$BACKUP_FILE" --exclude-from="$EXCLUDE_FILE" "${SOURCE_DIRS[@]}" 2>&1 | tee -a "$LOGFILE" else # Incremental backup (files modified in last 24 hours) find "${SOURCE_DIRS[@]}" -type f -mtime -1 2>/dev/null | tar -c${TAR_FLAG}f "$BACKUP_FILE" --files-from=- 2>&1 | tee -a "$LOGFILE" fi BACKUP_SIZE=$(du -h "$BACKUP_FILE" | cut -f1) log_message "Backup completed successfully. Size: $BACKUP_SIZE"

Verify backup integrity

if tar -t${TAR_FLAG}f "$BACKUP_FILE" >/dev/null 2>&1; then log_message "Backup integrity verified" else log_message "ERROR: Backup integrity check failed" exit 1 fi

Add backup rotation functionality

Implement rotation to prevent backup directories from consuming excessive disk space.

# Add this to the backup script after backup completion

Backup rotation function

rotate_backups() { local backup_pattern="$1" local max_backups="$2" log_message "Starting backup rotation for pattern: $backup_pattern" # Find and sort backups by modification time (oldest first) local backup_count=$(find "$BACKUP_BASE" -name "$backup_pattern" -type f | wc -l) if [ "$backup_count" -gt "$max_backups" ]; then local excess=$((backup_count - max_backups)) log_message "Found $backup_count backups, removing $excess oldest backups" find "$BACKUP_BASE" -name "$backup_pattern" -type f -printf '%T@ %p\n' | \ sort -n | \ head -n "$excess" | \ cut -d' ' -f2- | \ while read -r backup_file; do log_message "Removing old backup: $(basename "$backup_file")" rm -f "$backup_file" done else log_message "Backup count ($backup_count) within limit ($max_backups)" fi }

Rotate backups

rotate_backups "system_${BACKUP_TYPE}_*${EXT}" "$MAX_BACKUPS"

Create backup report

BACKUP_REPORT="$BACKUP_BASE/backup_report_$(date +%Y%m%d).txt" echo "Backup Report - $(date)" > "$BACKUP_REPORT" echo "==========================" >> "$BACKUP_REPORT" echo "Backup Type: $BACKUP_TYPE" >> "$BACKUP_REPORT" echo "Compression: $COMPRESSION" >> "$BACKUP_REPORT" echo "Backup File: $(basename "$BACKUP_FILE")" >> "$BACKUP_REPORT" echo "Backup Size: $BACKUP_SIZE" >> "$BACKUP_REPORT" echo "" >> "$BACKUP_REPORT" echo "Current Backups:" >> "$BACKUP_REPORT" ls -lh "$BACKUP_BASE"/system_*"${EXT}" >> "$BACKUP_REPORT" 2>/dev/null || echo "No backups found" >> "$BACKUP_REPORT" log_message "Backup process completed successfully"

Make script executable and test

Set proper permissions and test the backup script functionality with error handling.

# Make script executable
sudo chmod 755 /usr/local/bin/backup_system.sh

Create backup directory with proper permissions

sudo mkdir -p /backup sudo chown root:root /backup sudo chmod 750 /backup

Test incremental backup

sudo /usr/local/bin/backup_system.sh incremental

Test full backup

sudo /usr/local/bin/backup_system.sh full

Check backup results

sudo ls -lh /backup/ sudo cat /var/log/backup.log

Create systemd service and timer

Automate backups using systemd timers for reliable scheduling without cron dependencies.

[Unit]
Description=System Backup Service
After=multi-user.target

[Service]
Type=oneshot
User=root
Group=root
ExecStart=/usr/local/bin/backup_system.sh incremental
ExecStartPost=/bin/bash -c 'if [ $(date +%u) -eq 7 ]; then /usr/local/bin/backup_system.sh full; fi'
StandardOutput=journal
StandardError=journal
SyslogIdentifier=backup-system

Configure backup timer

Set up automated execution schedule with systemd timer for daily incremental and weekly full backups.

[Unit]
Description=Daily System Backup Timer
Requires=backup-system.service

[Timer]
OnCalendar=daily
RandomizedDelaySec=1h
Persistent=true

[Install]
WantedBy=timers.target

Enable and start backup automation

Activate the systemd timer and verify the backup automation is working correctly.

# Reload systemd configuration
sudo systemctl daemon-reload

Enable and start the timer

sudo systemctl enable backup-system.timer sudo systemctl start backup-system.timer

Check timer status

sudo systemctl status backup-system.timer sudo systemctl list-timers backup-system.timer

Test manual execution

sudo systemctl start backup-system.service sudo systemctl status backup-system.service

Check logs

sudo journalctl -u backup-system.service -f

Advanced compression techniques

Multi-threaded compression

Use parallel compression tools to speed up backup creation on multi-core systems.

sudo apt update
sudo apt install -y pigz pbzip2 pxz
sudo dnf install -y pigz pbzip2 pxz

Use parallel compression

Parallel compression tools can significantly reduce backup time on systems with multiple CPU cores.

# Parallel gzip compression
tar -cf - /tmp/backup_test/ | pigz -p 4 > backup_parallel_gzip.tar.gz

Parallel bzip2 compression

tar -cf - /tmp/backup_test/ | pbzip2 -p4 > backup_parallel_bzip2.tar.bz2

Parallel xz compression

tar -cf - /tmp/backup_test/ | pxz -T 4 > backup_parallel_xz.tar.xz

Compare creation times

time tar -czf backup_single_gzip.tar.gz /tmp/backup_test/ time tar -cf - /tmp/backup_test/ | pigz -p 4 > backup_parallel_gzip2.tar.gz

Create differential backups

Implement differential backup strategy to minimize backup size and time for frequently changing data.

#!/bin/bash

Differential backup script

BACKUP_BASE="/backup/differential" SOURCE_DIR="/home" REFERENCE_FILE="$BACKUP_BASE/last_full_backup.timestamp" BACKUP_DATE=$(date +%Y%m%d_%H%M%S) mkdir -p "$BACKUP_BASE" if [ -f "$REFERENCE_FILE" ]; then # Create differential backup (files newer than last full backup) find "$SOURCE_DIR" -newer "$REFERENCE_FILE" -type f | \ tar -czf "$BACKUP_BASE/differential_$BACKUP_DATE.tar.gz" --files-from=- echo "Differential backup created: differential_$BACKUP_DATE.tar.gz" else # Create full backup and timestamp reference tar -czf "$BACKUP_BASE/full_$BACKUP_DATE.tar.gz" "$SOURCE_DIR" touch "$REFERENCE_FILE" echo "Full backup created: full_$BACKUP_DATE.tar.gz" fi

Verify your setup

# Check installed compression tools
tar --version
gzip --version
bzip2 --version
xz --version

Verify backup script permissions

ls -l /usr/local/bin/backup_system.sh

Check systemd timer status

sudo systemctl status backup-system.timer sudo systemctl list-timers | grep backup

Test backup functionality

sudo /usr/local/bin/backup_system.sh incremental ls -lh /backup/

Verify backup integrity

sudo tar -tzf /backup/system_incremental_*.tar.gz | head -10

Check backup logs

sudo tail -n 20 /var/log/backup.log

Common issues

SymptomCauseFix
tar: Permission deniedInsufficient privileges to read source filesRun backup script with sudo or fix source directory permissions with sudo chown -R user:group /path
Backup file corruptedDisk space exhaustion or interrupted processCheck available space with df -h, verify backup with tar -tf backup.tar.gz
Timer not executingSystemd service misconfigurationCheck timer syntax with sudo systemd-analyze verify backup-system.timer
High CPU usage during backupAggressive compression settingsReduce compression level or use parallel tools: export GZIP=-1 for faster compression
Backup script fails silentlyMissing error handling or loggingAdd set -euo pipefail and proper logging to script
Cannot extract archiveCompression format mismatchUse file backup.tar.gz to identify format, then extract with correct tar flags

Next steps

Automated install script

Run this to automate the entire setup

#tar command #linux file compression #gzip compression #backup automation #archive files linux

Need help?

Don't want to manage this yourself?

We handle infrastructure for businesses that depend on uptime. From initial setup to ongoing operations.

Talk to an engineer