Set up automated disk monitoring, log cleanup, and email alerts using systemd timers to prevent disk space issues. Configure log rotation, temporary file cleanup, and threshold-based alerting for production systems.
Prerequisites
- Root or sudo access
- Basic familiarity with systemd
- Email server configuration knowledge
What this solves
Running out of disk space can cause system failures, service outages, and data loss in production environments. This tutorial sets up automated disk monitoring with email alerts when thresholds are reached, configures systemd timers for regular cleanup tasks, and implements log rotation to prevent uncontrolled disk usage growth.
Step-by-step configuration
Update system packages
Start by refreshing the package index and upgrading installed packages so that you have current system tools and utilities.
sudo apt update && sudo apt upgrade -y
Install monitoring and mail utilities
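Install the packages needed for disk monitoring, email notifications, and disk usage inspection. The Postfix installer may prompt for a mail configuration type; choose Internet Site, which matches the configuration used later in this tutorial.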
sudo apt install -y mailutils postfix logrotate ncdu tree
Create disk monitoring script
Create a script that checks disk usage and sends email alerts when thresholds are exceeded.
sudo mkdir -p /opt/disk-monitor
sudo tee /opt/disk-monitor/disk-check.sh > /dev/null << 'EOF'
#!/bin/bash
# Configuration
THRESHOLD_WARNING=80
THRESHOLD_CRITICAL=90
EMAIL_RECIPIENT="admin@example.com"
HOSTNAME=$(hostname -f)
LOG_FILE="/var/log/disk-monitor.log"
# Function to log messages
log_message() {
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}
# Function to send email alert
send_alert() {
local severity=$1
local filesystem=$2
local usage=$3
local available=$4
local subject="[$severity] Disk Space Alert - $HOSTNAME"
local body="Disk space alert for $HOSTNAME:
Filesystem: $filesystem
Usage: $usage%
Available: $available
Threshold: ${severity,,} at ${THRESHOLD_WARNING}%/${THRESHOLD_CRITICAL}%
Please investigate and free up disk space immediately."
echo -e "$body" | mail -s "$subject" "$EMAIL_RECIPIENT"
log_message "$severity alert sent for $filesystem ($usage% used)"
}
# Check disk usage for all mounted filesystems
df -h | awk 'NR>1 && !/tmpfs|devtmpfs|udev/ {print $5 " " $6 " " $4}' | while read output; do
usage=$(echo $output | awk '{print $1}' | sed 's/%//')
filesystem=$(echo $output | awk '{print $2}')
available=$(echo $output | awk '{print $3}')
if [ $usage -ge $THRESHOLD_CRITICAL ]; then
send_alert "CRITICAL" "$filesystem" "$usage" "$available"
elif [ $usage -ge $THRESHOLD_WARNING ]; then
send_alert "WARNING" "$filesystem" "$usage" "$available"
fi
done
# Log successful completion
log_message "Disk check completed successfully"
EOF
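The threshold comparison in the script can be tried out without touching a real filesystem. The following is a minimal sketch, not part of the script above: `classify_usage` and `report_usage` are hypothetical helper names, and `report_usage` assumes GNU `df` (coreutils), which supports `--output` and `-x`, as an alternative to selecting columns with awk.

```shell
#!/bin/bash
# Hypothetical helper: classify a usage percentage against two thresholds.
# Mirrors the -ge comparisons used in disk-check.sh.
classify_usage() {
    local usage=$1 warning=${2:-80} critical=${3:-90}
    if [ "$usage" -ge "$critical" ]; then
        echo "CRITICAL"
    elif [ "$usage" -ge "$warning" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

# Hypothetical helper: print "<status> <percent>% <mount point>" per filesystem.
# Assumes GNU df; tmpfs and devtmpfs are excluded by type.
report_usage() {
    local warning=${1:-80} critical=${2:-90} pcent target
    df --output=pcent,target -x tmpfs -x devtmpfs 2>/dev/null | tail -n +2 |
    while read -r pcent target; do
        pcent=${pcent%\%}                      # strip the trailing percent sign
        [[ $pcent =~ ^[0-9]+$ ]] || continue   # skip entries without a number
        echo "$(classify_usage "$pcent" "$warning" "$critical") ${pcent}% $target"
    done
}

classify_usage 85   # prints WARNING
```

Calling `report_usage 80 90` prints one status line per mounted filesystem, which is a quick way to see which mount points would trigger an alert at the chosen thresholds.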
Make the monitoring script executable
Set proper permissions on the disk monitoring script to allow execution by the system.
sudo chmod 755 /opt/disk-monitor/disk-check.sh
sudo chown root:root /opt/disk-monitor/disk-check.sh
Create cleanup script for temporary files
Create a script to automatically clean up temporary files, old logs, and cache directories.
sudo tee /opt/disk-monitor/cleanup.sh > /dev/null << 'EOF'
#!/bin/bash
LOG_FILE="/var/log/disk-cleanup.log"
CLEANED_SPACE=0
# Function to log messages with space saved
log_cleanup() {
local action=$1
local space_before=$2
local space_after=$3
local saved=$((space_before - space_after))
CLEANED_SPACE=$((CLEANED_SPACE + saved))
echo "$(date '+%Y-%m-%d %H:%M:%S') - $action: ${saved}KB freed" >> "$LOG_FILE"
}
# Function to get directory size in KB
get_size() {
du -sk "$1" 2>/dev/null | cut -f1 || echo 0
}
echo "$(date '+%Y-%m-%d %H:%M:%S') - Starting cleanup process" >> "$LOG_FILE"
# Clean temporary directories
for temp_dir in "/tmp" "/var/tmp"; do
if [ -d "$temp_dir" ]; then
before=$(get_size "$temp_dir")
find "$temp_dir" -type f -atime +7 -delete 2>/dev/null
find "$temp_dir" -type d -empty -delete 2>/dev/null
after=$(get_size "$temp_dir")
log_cleanup "Cleaned $temp_dir" "$before" "$after"
fi
done
# Clean old rotated log files (older than 30 days)
if [ -d "/var/log" ]; then
before=$(get_size "/var/log")
find /var/log -name "*.log.*.gz" -mtime +30 -delete 2>/dev/null
find /var/log -name "*.log.*" -mtime +30 -delete 2>/dev/null
after=$(get_size "/var/log")
log_cleanup "Cleaned old logs" "$before" "$after"
fi
# Clean package cache
if command -v apt-get >/dev/null 2>&1; then
before=$(get_size "/var/cache/apt")
apt-get clean >/dev/null 2>&1
after=$(get_size "/var/cache/apt")
log_cleanup "APT cache cleanup" "$before" "$after"
elif command -v dnf >/dev/null 2>&1; then
before=$(get_size "/var/cache/dnf")
dnf clean all >/dev/null 2>&1
after=$(get_size "/var/cache/dnf")
log_cleanup "DNF cache cleanup" "$before" "$after"
fi
# Clean journal logs older than 30 days
# Sizes are measured with du in KB; journalctl --disk-usage reports
# mixed units that cannot be subtracted with shell integer arithmetic
before=$(get_size "/var/log/journal")
journalctl --vacuum-time=30d >/dev/null 2>&1
after=$(get_size "/var/log/journal")
log_cleanup "Journal cleanup" "$before" "$after"
echo "$(date '+%Y-%m-%d %H:%M:%S') - Cleanup completed. Total space freed: ${CLEANED_SPACE}KB" >> "$LOG_FILE"
EOF
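Because the cleanup script deletes files, it can help to preview what a `find ... -delete` pass would remove before enabling it. The sketch below swaps `-delete` for `-print`; `preview_cleanup` is a hypothetical helper name, and the 7-day default is assumed to match the temporary-file pass above.

```shell
#!/bin/bash
# Hypothetical dry-run helper: list regular files under a directory whose
# last access time is more than N days ago, without deleting anything.
preview_cleanup() {
    local dir=$1 days=${2:-7}
    [ -d "$dir" ] || return 1
    find "$dir" -type f -atime +"$days" -print 2>/dev/null
}
```

For example, `preview_cleanup /tmp 7` lists the files that the `/tmp` pass would delete. Run it as root to see the same set of files the service would see.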
Make cleanup script executable
Set proper permissions on the cleanup script and ensure it's owned by root for security.
sudo chmod 755 /opt/disk-monitor/cleanup.sh
sudo chown root:root /opt/disk-monitor/cleanup.sh
Create systemd service for disk monitoring
Create a systemd service unit at /etc/systemd/system/disk-monitor.service that runs the disk monitoring script.
[Unit]
Description=Disk Usage Monitor
Wants=network-online.target
After=network-online.target
[Service]
Type=oneshot
User=root
Group=root
ExecStart=/opt/disk-monitor/disk-check.sh
StandardOutput=journal
StandardError=journal
Create systemd service for cleanup
Create a systemd service unit at /etc/systemd/system/disk-cleanup.service for the automated cleanup tasks.
[Unit]
Description=Disk Cleanup Service
Wants=network-online.target
After=network-online.target
[Service]
Type=oneshot
User=root
Group=root
ExecStart=/opt/disk-monitor/cleanup.sh
StandardOutput=journal
StandardError=journal
Create systemd timers
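Create timer units to schedule regular execution of the monitoring and cleanup services. Save the first unit (every 15 minutes) as /etc/systemd/system/disk-monitor.timer and the second (daily at 2 AM) as /etc/systemd/system/disk-cleanup.timer.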
[Unit]
Description=Run disk monitor every 15 minutes
Requires=disk-monitor.service
[Timer]
OnBootSec=5min
OnUnitActiveSec=15min
Persistent=true
[Install]
WantedBy=timers.target
[Unit]
Description=Run disk cleanup daily at 2 AM
Requires=disk-cleanup.service
[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true
[Install]
WantedBy=timers.target
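Calendar expressions are easy to mistype, so it is worth validating them before reloading systemd. `systemd-analyze calendar` parses an `OnCalendar=` expression and reports its normalized form and next elapse time. The wrapper below is a small sketch; `check_calendar` is a hypothetical name, and the daily 2 AM schedule is written here as `*-*-* 02:00:00`.

```shell
#!/bin/bash
# Hypothetical wrapper: validate an OnCalendar expression before using it in
# a timer unit. Skips the check when systemd-analyze is not installed.
check_calendar() {
    local expr=$1
    if ! command -v systemd-analyze >/dev/null 2>&1; then
        echo "systemd-analyze not found, skipping check for: $expr"
        return 0
    fi
    systemd-analyze calendar "$expr"
}

check_calendar '*-*-* 02:00:00'
```

A non-zero exit status means systemd could not parse the expression, in which case the timer unit would fail to load as well.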
Configure enhanced log rotation
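Set up log rotation so that log files cannot consume excessive disk space. Place the rules below in a file under /etc/logrotate.d/. If your distribution already ships rules for /var/log/syslog and /var/log/auth.log (Debian and Ubuntu do, in /etc/logrotate.d/rsyslog), adjust those existing rules instead of adding new ones, because logrotate reports an error for duplicate log entries.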
# Disk monitor logs
/var/log/disk-monitor.log {
weekly
missingok
rotate 12
compress
delaycompress
notifempty
create 644 root root
}
/var/log/disk-cleanup.log {
weekly
missingok
rotate 12
compress
delaycompress
notifempty
create 644 root root
}
# Enhanced system log rotation
/var/log/syslog {
daily
missingok
rotate 14
compress
delaycompress
notifempty
create 640 syslog adm
postrotate
systemctl reload rsyslog
endscript
}
/var/log/auth.log {
weekly
missingok
rotate 8
compress
delaycompress
notifempty
create 640 syslog adm
postrotate
systemctl reload rsyslog
endscript
}
Configure Postfix for email notifications
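Set up a basic Postfix configuration for sending email alerts directly from the host. Many cloud providers block or restrict outbound traffic on port 25; in that case, set relayhost to your provider's or organization's SMTP relay instead of leaving it empty.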
sudo debconf-set-selections <<< "postfix postfix/mailname string $(hostname -f)"
sudo debconf-set-selections <<< "postfix postfix/main_mailer_type string 'Internet Site'"
# Add or modify these settings in /etc/postfix/main.cf
sudo postconf -e "myhostname = $(hostname -f)"
sudo postconf -e "mydestination = \$myhostname, localhost.\$mydomain, localhost"
sudo postconf -e "relayhost = "
sudo postconf -e "mynetworks = 127.0.0.0/8 [::ffff:127.0.0.0]/104 [::1]/128"
sudo postconf -e "inet_protocols = ipv4"
Enable and start services
Reload systemd configuration and enable the timer services to start automatically.
sudo systemctl daemon-reload
sudo systemctl enable --now disk-monitor.timer
sudo systemctl enable --now disk-cleanup.timer
sudo systemctl enable --now postfix
Create initial log files
Create the log files with the correct ownership and permissions for the monitoring and cleanup scripts.
sudo touch /var/log/disk-monitor.log /var/log/disk-cleanup.log
sudo chmod 644 /var/log/disk-monitor.log /var/log/disk-cleanup.log
sudo chown root:root /var/log/disk-monitor.log /var/log/disk-cleanup.log
Verify your setup
Check that all timers are active and services are properly configured.
# Check timer status
sudo systemctl status disk-monitor.timer disk-cleanup.timer
# List all active timers
sudo systemctl list-timers
# Test the monitoring script manually
sudo /opt/disk-monitor/disk-check.sh
# Test the cleanup script manually
sudo /opt/disk-monitor/cleanup.sh
# Check log files were created
ls -la /var/log/disk-*.log
# Verify current disk usage
df -h
# Check postfix is running
sudo systemctl status postfix
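Running the monitoring script manually only produces an alert if a filesystem is already above a threshold. One way to exercise the alert path is to run a modified copy of the script with both thresholds forced low, leaving the installed file untouched. This is a sketch under assumptions: `run_with_threshold` is a hypothetical helper, and it relies on the `THRESHOLD_WARNING=` and `THRESHOLD_CRITICAL=` assignments starting at the beginning of a line, as in the script above.

```shell
#!/bin/bash
# Hypothetical helper: pipe a copy of a monitoring script through bash with
# both thresholds overridden. The script file on disk is not modified.
run_with_threshold() {
    local script=$1 threshold=${2:-0}
    sed -e "s/^THRESHOLD_WARNING=.*/THRESHOLD_WARNING=$threshold/" \
        -e "s/^THRESHOLD_CRITICAL=.*/THRESHOLD_CRITICAL=$threshold/" \
        "$script" | bash
}
```

From a root shell, `run_with_threshold /opt/disk-monitor/disk-check.sh 0` should send one CRITICAL message per monitored filesystem to the configured recipient and add matching entries to /var/log/disk-monitor.log.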
Advanced configuration
Add filesystem-specific monitoring
Create custom thresholds for specific filesystems that may need different monitoring levels.
# Custom thresholds per filesystem
# Format: filesystem:warning_threshold:critical_threshold
/:85:95
/var:80:90
/home:75:85
/tmp:90:95
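The monitoring script shown earlier uses global thresholds and does not read a file like this on its own, so it would need a small lookup step. The following is one possible sketch, not the tutorial's implementation: `lookup_thresholds` is a hypothetical function, and the configuration path is whatever file you saved the entries above in.

```shell
#!/bin/bash
# Hypothetical lookup: print "warning critical" for a mount point, reading
# lines of the form mount:warning:critical. Falls back to the given defaults
# when the mount point is not listed or the file cannot be read.
lookup_thresholds() {
    local mount=$1 conf=$2 default_warning=${3:-80} default_critical=${4:-90}
    local fs warning critical
    if [ -r "$conf" ]; then
        # The "|| [ -n ... ]" part handles a last line without a newline
        while IFS=: read -r fs warning critical || [ -n "$fs" ]; do
            # Skip blank lines and comments
            case "$fs" in ''|'#'*) continue ;; esac
            if [ "$fs" = "$mount" ]; then
                echo "$warning $critical"
                return 0
            fi
        done < "$conf"
    fi
    echo "$default_warning $default_critical"
}
```

Inside the monitoring loop, the result could be read with `read -r warn crit < <(lookup_thresholds "$filesystem" "$CONF_FILE")`, where `CONF_FILE` is an assumed variable holding the path to the thresholds file, and the two values then replace the global thresholds in the comparisons.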
Configure logrotate for application logs
Add rotation rules for common application log directories to prevent them from filling the disk.
# Nginx logs
/var/log/nginx/*.log {
daily
missingok
rotate 30
compress
delaycompress
notifempty
create 644 www-data adm
sharedscripts
postrotate
systemctl reload nginx
endscript
}
# Apache logs
/var/log/apache2/*.log {
weekly
missingok
rotate 12
compress
delaycompress
notifempty
create 644 www-data adm
sharedscripts
postrotate
systemctl reload apache2
endscript
}
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Timer not running | Service not enabled | sudo systemctl enable --now disk-monitor.timer |
| No email alerts | Postfix not configured | Check sudo systemctl status postfix and mail logs |
| Script permission denied | Incorrect file permissions | sudo chmod 755 /opt/disk-monitor/*.sh |
| Cleanup not working | Insufficient permissions | Ensure scripts run as root user in service files |
| Log rotation fails | Service reload issues | Check service status and logrotate configuration syntax |
| High disk usage persists | Large files not cleaned | Use ncdu / to identify large directories manually |
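When usage stays high after cleanup, a non-interactive listing can complement `ncdu`, for example in a script or over a slow connection. The sketch below assumes GNU `du` and `sort`; `largest_dirs` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: list the N largest entries directly under a directory,
# sizes in KB, without crossing filesystem boundaries (-x).
# The first line of output is the total for the directory itself.
largest_dirs() {
    local dir=${1:-/} count=${2:-10}
    du -xk --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, `sudo bash -c "$(declare -f largest_dirs); largest_dirs /var 5"` shows the total for /var followed by its four largest subdirectories.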
Monitoring and maintenance
Regular maintenance tasks to keep your disk monitoring system healthy.
# View recent timer executions
sudo systemctl list-timers --all
# Check monitoring logs
sudo tail -f /var/log/disk-monitor.log
# Check cleanup logs
sudo tail -f /var/log/disk-cleanup.log
# Test logrotate manually
sudo logrotate -d /etc/logrotate.conf
# Force logrotate to run
sudo logrotate -f /etc/logrotate.conf
# Analyze disk usage with ncdu
sudo ncdu /var/log
# Check journal disk usage
journalctl --disk-usage
Next steps
- Configure Linux system logging with rsyslog and journald for centralized log management
- Set up Prometheus and Grafana monitoring stack with Docker compose
- Configure Linux performance monitoring with collectd and InfluxDB
- Setup centralized log aggregation with Elasticsearch, Logstash, and Kibana
- Configure system backup automation with BorgBackup and systemd timers
Automated install script
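Run this script to automate the entire setup. It accepts an optional email recipient, warning threshold, and critical threshold as arguments.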
#!/usr/bin/env bash
set -euo pipefail
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
# Default configuration
EMAIL_RECIPIENT="${1:-root@localhost}"
WARNING_THRESHOLD="${2:-80}"
CRITICAL_THRESHOLD="${3:-90}"
usage() {
echo "Usage: $0 [email_recipient] [warning_threshold] [critical_threshold]"
echo "Example: $0 admin@example.com 75 85"
exit 1
}
log_message() {
echo -e "${GREEN}[INFO]${NC} $1"
}
log_warning() {
echo -e "${YELLOW}[WARNING]${NC} $1"
}
log_error() {
echo -e "${RED}[ERROR]${NC} $1"
}
cleanup_on_error() {
log_error "Installation failed. Cleaning up..."
systemctl disable disk-monitor.timer 2>/dev/null || true
systemctl disable disk-cleanup.timer 2>/dev/null || true
rm -rf /opt/disk-monitor
rm -f /etc/systemd/system/disk-{monitor,cleanup}.{service,timer}
systemctl daemon-reload
}
trap cleanup_on_error ERR
# Check if running as root
if [[ $EUID -ne 0 ]]; then
log_error "This script must be run as root"
exit 1
fi
# Validate thresholds
if [[ ! "$WARNING_THRESHOLD" =~ ^[0-9]+$ ]] || [[ ! "$CRITICAL_THRESHOLD" =~ ^[0-9]+$ ]]; then
log_error "Thresholds must be numeric"
usage
fi
if [[ $WARNING_THRESHOLD -ge $CRITICAL_THRESHOLD ]]; then
log_error "Warning threshold must be less than critical threshold"
usage
fi
# Detect distribution
log_message "[1/8] Detecting distribution..."
if [ -f /etc/os-release ]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_INSTALL="apt install -y"
MAIL_PACKAGE="mailutils"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_INSTALL="dnf install -y"
MAIL_PACKAGE="mailx"
;;
amzn)
PKG_MGR="yum"
PKG_INSTALL="yum install -y"
MAIL_PACKAGE="mailx"
;;
*)
log_error "Unsupported distribution: $ID"
exit 1
;;
esac
else
log_error "Cannot detect distribution"
exit 1
fi
log_message "Detected: $PRETTY_NAME"
# Update system packages
log_message "[2/8] Updating system packages..."
if [[ $PKG_MGR == "apt" ]]; then
apt update && apt upgrade -y
else
$PKG_MGR -y update
fi
# Install required packages
log_message "[3/8] Installing monitoring utilities..."
$PKG_INSTALL $MAIL_PACKAGE postfix logrotate ncdu tree
# Create monitoring directory
log_message "[4/8] Creating monitoring directory and scripts..."
mkdir -p /opt/disk-monitor
mkdir -p /var/log/disk-monitor
chown root:root /opt/disk-monitor
chmod 755 /opt/disk-monitor
# Create disk monitoring script
cat > /opt/disk-monitor/disk-check.sh << 'EOF'
#!/bin/bash
EMAIL_RECIPIENT="EMAIL_PLACEHOLDER"
THRESHOLD_WARNING=WARNING_PLACEHOLDER
THRESHOLD_CRITICAL=CRITICAL_PLACEHOLDER
LOG_FILE="/var/log/disk-monitor/disk-check.log"
HOSTNAME=$(hostname)
log_message() {
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}
send_alert() {
local severity=$1
local filesystem=$2
local usage=$3
local available=$4
local subject="[$severity] Disk Space Alert - $HOSTNAME"
local body="Disk space alert for $HOSTNAME:
Filesystem: $filesystem
Usage: $usage%
Available: $available
Threshold: ${severity,,} at ${THRESHOLD_WARNING}%/${THRESHOLD_CRITICAL}%
Please investigate and free up disk space immediately."
echo -e "$body" | mail -s "$subject" "$EMAIL_RECIPIENT" 2>/dev/null || true
log_message "$severity alert sent for $filesystem ($usage% used)"
}
df -h | awk 'NR>1 && !/tmpfs|devtmpfs|udev/ {print $5 " " $6 " " $4}' | while read output; do
usage=$(echo $output | awk '{print $1}' | sed 's/%//')
filesystem=$(echo $output | awk '{print $2}')
available=$(echo $output | awk '{print $3}')
if [ $usage -ge $THRESHOLD_CRITICAL ]; then
send_alert "CRITICAL" "$filesystem" "$usage" "$available"
elif [ $usage -ge $THRESHOLD_WARNING ]; then
send_alert "WARNING" "$filesystem" "$usage" "$available"
fi
done
log_message "Disk check completed successfully"
EOF
# Replace placeholders in monitoring script
sed -i "s/EMAIL_PLACEHOLDER/$EMAIL_RECIPIENT/g" /opt/disk-monitor/disk-check.sh
sed -i "s/WARNING_PLACEHOLDER/$WARNING_THRESHOLD/g" /opt/disk-monitor/disk-check.sh
sed -i "s/CRITICAL_PLACEHOLDER/$CRITICAL_THRESHOLD/g" /opt/disk-monitor/disk-check.sh
# Create cleanup script
cat > /opt/disk-monitor/cleanup.sh << 'EOF'
#!/bin/bash
LOG_FILE="/var/log/disk-monitor/cleanup.log"
log_cleanup() {
local action=$1
local before=$2
local after=$3
local saved=$((before - after))
echo "$(date '+%Y-%m-%d %H:%M:%S') - $action: Freed ${saved}KB" >> "$LOG_FILE"
}
get_size() {
du -sk "$1" 2>/dev/null | cut -f1 || echo 0
}
echo "$(date '+%Y-%m-%d %H:%M:%S') - Starting cleanup process" >> "$LOG_FILE"
for temp_dir in "/tmp" "/var/tmp"; do
if [ -d "$temp_dir" ]; then
before=$(get_size "$temp_dir")
find "$temp_dir" -type f -atime +7 -delete 2>/dev/null || true
find "$temp_dir" -type d -empty -delete 2>/dev/null || true
after=$(get_size "$temp_dir")
log_cleanup "Cleaned $temp_dir" "$before" "$after"
fi
done
if [ -d "/var/log" ]; then
before=$(get_size "/var/log")
find /var/log -name "*.log.*.gz" -mtime +30 -delete 2>/dev/null || true
find /var/log -name "*.log.*" -mtime +30 -delete 2>/dev/null || true
after=$(get_size "/var/log")
log_cleanup "Cleaned old logs" "$before" "$after"
fi
if command -v apt-get >/dev/null 2>&1; then
before=$(get_size "/var/cache/apt")
apt-get clean >/dev/null 2>&1 || true
after=$(get_size "/var/cache/apt")
log_cleanup "APT cache cleanup" "$before" "$after"
elif command -v dnf >/dev/null 2>&1; then
before=$(get_size "/var/cache/dnf")
dnf clean all >/dev/null 2>&1 || true
after=$(get_size "/var/cache/dnf")
log_cleanup "DNF cache cleanup" "$before" "$after"
fi
echo "$(date '+%Y-%m-%d %H:%M:%S') - Cleanup process completed" >> "$LOG_FILE"
EOF
# Set script permissions
chmod 755 /opt/disk-monitor/*.sh
chown root:root /opt/disk-monitor/*.sh
# Create systemd service files
log_message "[5/8] Creating systemd services..."
cat > /etc/systemd/system/disk-monitor.service << EOF
[Unit]
Description=Disk Space Monitor
After=network.target
[Service]
Type=oneshot
ExecStart=/opt/disk-monitor/disk-check.sh
User=root
EOF
cat > /etc/systemd/system/disk-cleanup.service << EOF
[Unit]
Description=Disk Space Cleanup
After=network.target
[Service]
Type=oneshot
ExecStart=/opt/disk-monitor/cleanup.sh
User=root
EOF
# Create systemd timer files
log_message "[6/8] Creating systemd timers..."
cat > /etc/systemd/system/disk-monitor.timer << EOF
[Unit]
Description=Run disk monitor every 30 minutes
Requires=disk-monitor.service
[Timer]
OnCalendar=*:0/30
Persistent=true
[Install]
WantedBy=timers.target
EOF
cat > /etc/systemd/system/disk-cleanup.timer << EOF
[Unit]
Description=Run disk cleanup daily at 2 AM
Requires=disk-cleanup.service
[Timer]
OnCalendar=*-*-* 02:00:00
RandomizedDelaySec=1800
Persistent=true
[Install]
WantedBy=timers.target
EOF
# Configure log rotation
log_message "[7/8] Configuring log rotation..."
cat > /etc/logrotate.d/disk-monitor << EOF
/var/log/disk-monitor/*.log {
daily
missingok
rotate 30
compress
notifempty
create 644 root root
}
EOF
# Start and enable services
systemctl daemon-reload
systemctl enable disk-monitor.timer
systemctl enable disk-cleanup.timer
systemctl start disk-monitor.timer
systemctl start disk-cleanup.timer
# Verification
log_message "[8/8] Verifying installation..."
if systemctl is-active --quiet disk-monitor.timer; then
log_message "✓ Disk monitor timer is active"
else
log_error "✗ Disk monitor timer failed to start"
exit 1
fi
if systemctl is-active --quiet disk-cleanup.timer; then
log_message "✓ Disk cleanup timer is active"
else
log_error "✗ Disk cleanup timer failed to start"
exit 1
fi
# Test monitoring script
if /opt/disk-monitor/disk-check.sh; then
log_message "✓ Disk monitoring script works correctly"
else
log_warning "⚠ Disk monitoring script test failed (mail configuration may be incomplete)"
fi
log_message "Installation completed successfully!"
log_message "Configuration:"
log_message " Email recipient: $EMAIL_RECIPIENT"
log_message " Warning threshold: $WARNING_THRESHOLD%"
log_message " Critical threshold: $CRITICAL_THRESHOLD%"
log_message " Monitor runs every 30 minutes"
log_message " Cleanup runs daily at 2 AM"
log_message " Logs: /var/log/disk-monitor/"
Review the script before running. It must be run as root, for example: sudo bash install.sh admin@example.com 80 90
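One detail to check during review: the installer inserts the email recipient into the generated script with `sed`, so an address containing `/`, `&`, or a backslash would be misinterpreted in the replacement text. A possible safeguard is to escape the value first. This is a sketch only; `escape_sed_replacement` is a hypothetical helper and is not part of the installer above.

```shell
#!/bin/bash
# Hypothetical helper: escape the characters that are special on the
# replacement side of a sed s/// command (backslash, slash, ampersand).
escape_sed_replacement() {
    printf '%s' "$1" | sed 's|[\\/&]|\\&|g'
}

# Demonstration on a throwaway string rather than the installed script.
recipient='ops&alerts@example.com'
escaped=$(escape_sed_replacement "$recipient")
echo 'EMAIL_RECIPIENT="EMAIL_PLACEHOLDER"' | sed "s/EMAIL_PLACEHOLDER/$escaped/g"
```

In the installer, the same idea would mean passing the escaped value to the `sed -i` command that replaces EMAIL_PLACEHOLDER.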