Linux Storage Monitoring with smartmontools

Monitor disk health and catch failing drives early with S.M.A.R.T monitoring, automated email alerts, and Prometheus metrics for dashboards. Covers smartd daemon configuration, health checks, and integration with monitoring systems.

Prerequisites

Root access to the server
Email system configured (postfix/sendmail)
At least one storage device with S.M.A.R.T support

What this solves

Disk failures are one of the most common causes of data loss and service outages in production environments. S.M.A.R.T (Self-Monitoring, Analysis and Reporting Technology) provides early warning signs of impending disk failures, allowing you to replace drives before they fail completely. This tutorial sets up smartmontools to continuously monitor disk health, send automated alerts, and integrate with monitoring dashboards.

Step-by-step installation

Install smartmontools package

Install the smartmontools package, which provides the smartctl command and the smartd daemon for continuous monitoring, together with a command-line mail client.

Debian/Ubuntu:

sudo apt update
sudo apt install -y smartmontools mailutils

AlmaLinux/Rocky Linux/RHEL/Fedora:

sudo dnf update -y
sudo dnf install -y smartmontools mailx

Identify available storage devices

Scan for all storage devices and check which ones support S.M.A.R.T monitoring capabilities.

sudo smartctl --scan
sudo smartctl --info /dev/sda
sudo smartctl --health /dev/sda

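--scan lists the detected drives and the device type smartctl will use for each, --info shows the model, serial number and whether S.M.A.R.T support is available and enabled, and --health prints the drive's overall health self-assessment. Note the device paths (such as /dev/sda or /dev/nvme0n1) for the configuration below.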
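If you prefer to generate device entries instead of typing them, the --scan output can be turned into draft smartd.conf lines. The function below is a minimal sketch, assuming the usual `smartctl --scan` line format (`/dev/sda -d sat # /dev/sda [SAT], ATA device`); `scan_to_smartd_conf` is a hypothetical helper, and the directives you pass are appended unchanged to every line.

```shell
#!/usr/bin/env bash
# Sketch: convert `smartctl --scan` output (stdin) into draft smartd.conf lines.
# Assumes the usual scan format: "/dev/sda -d sat # /dev/sda [SAT], ATA device".
# scan_to_smartd_conf is a hypothetical helper, not part of smartmontools.
scan_to_smartd_conf() {
    local directives="$1"
    awk -v d="$directives" '
        # Only device lines; keep the detected "-d TYPE" so smartd uses the same driver
        /^\/dev\// {
            line = $1
            if ($2 == "-d") line = line " -d " $3
            print line " " d
        }
    '
}

# Example: sudo smartctl --scan | scan_to_smartd_conf '-a -n standby,q -m admin@example.com'
```

Review the generated lines before pasting them into smartd.conf; the helper does not check whether a device actually supports S.M.A.R.T.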

Configure email notifications

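Set up a local mail transfer agent so that smartd and the scripts below can send alerts. Postfix is used here; an external SMTP relay works as well.

Debian/Ubuntu:

sudo apt install -y postfix
sudo dpkg-reconfigure postfix

AlmaLinux/Rocky Linux/RHEL/Fedora:

sudo dnf install -y postfix
sudo systemctl enable --now postfix

On Debian/Ubuntu, choose "Internet Site" when prompted and enter your server's fully qualified hostname. In production, send mail through a proper SMTP relay (the relayhost setting in /etc/postfix/main.cf) so that alerts are actually delivered.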

Configure smartd daemon

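Create the smartd configuration file to monitor specific drives and define alert conditions. On Debian/Ubuntu the file is /etc/smartd.conf; on RHEL-family systems it is /etc/smartmontools/smartd.conf.

# Monitor SATA/SAS drives with all default checks (-a) and scheduled self-tests
/dev/sda -a -d auto -n standby,q -s (S/../.././02|L/../../6/03) -m admin@example.com -M exec /usr/share/smartmontools/smartd-runner
/dev/sdb -a -d auto -n standby,q -s (S/../.././02|L/../../6/03) -m admin@example.com -M exec /usr/share/smartmontools/smartd-runner

# Monitor NVMe drives
/dev/nvme0n1 -a -n standby,q -s (S/../.././02|L/../../6/03) -m admin@example.com -M exec /usr/share/smartmontools/smartd-runner

# Global settings: pick up any other devices (keep this as the last entry)
DEVICESCAN -d removable -n standby -m admin@example.com -M exec /usr/share/smartmontools/smartd-runner

Replace admin@example.com with your actual email address. This configuration monitors the listed drives plus any others found by DEVICESCAN, skips checks while a drive is spun down (-n standby), and schedules short self-tests daily at 02:00 and long self-tests every Saturday at 03:00 (-s). /usr/share/smartmontools/smartd-runner is the Debian/Ubuntu helper that runs the scripts in /etc/smartmontools/run.d/; on RHEL-family systems omit the -M exec option and smartd mails the -m address directly.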
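The -s value is an extended regular expression that smartd matches against strings of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday, hour). The sketch below reproduces that matching so you can check when a schedule fires before deploying it. It assumes GNU date (for `-d`), and `tests_due` is a hypothetical helper, not part of smartmontools.

```shell
#!/usr/bin/env bash
# Sketch: show which self-test types a smartd -s schedule selects at a given time.
# smartd matches the regex against "T/MM/DD/d/HH" (d: 1=Monday ... 7=Sunday).
# Assumes GNU date (for -d); tests_due is a hypothetical helper.
SCHEDULE='(S/../.././02|L/../../6/03)'

tests_due() {
    local when="$1" type stamp
    for type in S L C O; do
        # %u gives the ISO day of week, which uses the same 1..7 numbering as smartd
        stamp="$type/$(date -d "$when" '+%m/%d/%u/%H')"
        if [[ "$stamp" =~ ^${SCHEDULE}$ ]]; then
            echo "$type"
        fi
    done
}

# Example: tests_due "2024-06-01 03:00"   # a Saturday at 03:00, prints L
```

Swap in your own SCHEDULE string to confirm, for example, that a weekly short test lands on the intended day.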

Create custom alert script

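Create /usr/local/bin/smart-alert.sh for richer alerts: it logs each alert to a file and to syslog, emails the full smartctl output, and can optionally notify a webhook. When a device line uses -M exec, smartd runs the script with the alert details in environment variables such as SMARTD_DEVICE, SMARTD_MESSAGE, SMARTD_FAILTYPE and SMARTD_ADDRESS (see smartd.conf(5)), not as positional arguments. smartd also waits for the script to finish and logs any output on stdout or stderr as a sign of trouble, so the script stays silent.

#!/bin/bash

# Enhanced S.M.A.R.T alert script
# Usage: run by smartd through "-M exec"; details arrive in SMARTD_* variables

# smartd runs the script with a minimal environment, so set PATH explicitly
PATH=/usr/sbin:/usr/bin:/sbin:/bin

DEVICE="${SMARTD_DEVICE:-unknown}"
MSG="${SMARTD_MESSAGE:-no message}"
FAILTYPE="${SMARTD_FAILTYPE:-unknown}"
RECIPIENT="${SMARTD_ADDRESS:-admin@example.com}"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
HOSTNAME=$(hostname -f)
LOGFILE="/var/log/smartd-alerts.log"

# Log the alert
echo "[$TIMESTAMP] $HOSTNAME: $DEVICE ($FAILTYPE) - $MSG" >> "$LOGFILE"

# Get detailed S.M.A.R.T information (SAS drives report "SMART Health Status")
SMART_INFO=$(smartctl -a "$DEVICE" 2>/dev/null)
HEALTH_STATUS=$(smartctl -H "$DEVICE" 2>/dev/null | grep -E "SMART overall-health|SMART Health Status")

# Create detailed email
EMAIL_BODY="S.M.A.R.T Alert - $HOSTNAME

Timestamp: $TIMESTAMP
Device: $DEVICE
Failure type: $FAILTYPE
Message: $MSG
Health Status: $HEALTH_STATUS

Full S.M.A.R.T Data:
$SMART_INFO

Please check the drive immediately and consider replacement if errors persist."

# Send email; smartd separates multiple -m addresses with spaces, so $RECIPIENT is unquoted
echo "$EMAIL_BODY" | mail -s "[ALERT] S.M.A.R.T Issue on $HOSTNAME - $DEVICE" $RECIPIENT >/dev/null 2>&1

# Optional: send to monitoring system (placeholder URL; -m caps the wait at 10 seconds)
curl -fsS -m 10 -X POST -H "Content-Type: application/json" \
    -d "{\"alert\":\"smart\",\"device\":\"$DEVICE\",\"message\":\"$MSG\"}" \
    https://monitoring.example.com/webhook >/dev/null 2>&1 || true

# Log to syslog
logger -t smartd-alert "S.M.A.R.T issue on $DEVICE: $MSG"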

sudo chmod +x /usr/local/bin/smart-alert.sh
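The webhook payload in the alert script is assembled by string interpolation, so a smartd message containing a double quote, backslash or newline would produce invalid JSON. A minimal escaping sketch follows; `json_escape` and `build_payload` are hypothetical helper names, and the payload fields simply mirror the curl call in the script.

```shell
#!/usr/bin/env bash
# Sketch: build the webhook JSON safely. json_escape and build_payload are
# hypothetical helpers; they cover backslash, double quote, newline and tab only.
json_escape() {
    local s=$1 bs='\' q='"'
    s=${s//"$bs"/"$bs$bs"}       # escape backslashes first
    s=${s//"$q"/"$bs$q"}         # then double quotes
    s=${s//$'\n'/"${bs}n"}       # newlines become \n
    s=${s//$'\t'/"${bs}t"}       # tabs become \t
    printf '%s' "$s"
}

build_payload() {
    printf '{"alert":"smart","device":"%s","message":"%s"}' \
        "$(json_escape "$1")" "$(json_escape "$2")"
}

# Example, inside the alert script:
#   curl -fsS -m 10 -X POST -H "Content-Type: application/json" \
#       -d "$(build_payload "$DEVICE" "$MSG")" https://monitoring.example.com/webhook
```

The replacement strings are quoted so that bash 5.2's `patsub_replacement` option cannot reinterpret them.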

Update smartd configuration for custom script

Modify the smartd configuration to use the custom alert script instead of default email.

# Monitor all drives with custom alerting
/dev/sda -a -d auto -n standby,q -s (S/../.././02|L/../../6/03) -m admin@example.com -M exec /usr/local/bin/smart-alert.sh
/dev/sdb -a -d auto -n standby,q -s (S/../.././02|L/../../6/03) -m admin@example.com -M exec /usr/local/bin/smart-alert.sh
/dev/nvme0n1 -a -n standby,q -s (S/../.././02|L/../../6/03) -m admin@example.com -M exec /usr/local/bin/smart-alert.sh

# Scan for any other devices at startup/reload; -d removable means a missing device is not an error
DEVICESCAN -d removable -n standby -m admin@example.com -M exec /usr/local/bin/smart-alert.sh

Enable and start smartd service

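Enable the smartd daemon so that it starts at boot and begins monitoring immediately. The unit is called smartd on RHEL-family systems and smartmontools on Debian/Ubuntu, where smartd is only an alias that systemctl may refuse to enable.

AlmaLinux/Rocky Linux/RHEL/Fedora:

sudo systemctl enable --now smartd
sudo systemctl status smartd

Debian/Ubuntu:

sudo systemctl enable --now smartmontools
sudo systemctl status smartmontools

Restart the service after every change to smartd.conf.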

Create monitoring dashboard integration

Create /usr/local/bin/smart-metrics.sh to export S.M.A.R.T metrics in Prometheus text format. node_exporter picks the file up through its textfile collector, which must be enabled with --collector.textfile.directory=/var/lib/node_exporter/textfile_collector.

#!/bin/bash

# S.M.A.R.T metrics exporter for the node_exporter textfile collector
# Outputs metrics in Prometheus text format; run as root (for example from root's crontab)

METRICS_DIR="/var/lib/node_exporter/textfile_collector"
METRICS_FILE="$METRICS_DIR/smart.prom"

# Create directory if it doesn't exist
mkdir -p "$METRICS_DIR"

# Write to a temp file in the same directory so the final mv is an atomic rename
# (the collector only reads *.prom files, so the temporary name is ignored)
TEMP_FILE=$(mktemp "$METRICS_DIR/.smart.prom.XXXXXX") || exit 1
trap 'rm -f "$TEMP_FILE"' EXIT

# Write HELP/TYPE metadata
echo "# HELP smart_device_health S.M.A.R.T device health status (1=healthy, 0=failing)" > "$TEMP_FILE"
echo "# TYPE smart_device_health gauge" >> "$TEMP_FILE"
echo "# HELP smart_temperature_celsius Current drive temperature" >> "$TEMP_FILE"
echo "# TYPE smart_temperature_celsius gauge" >> "$TEMP_FILE"
echo "# HELP smart_power_on_hours Drive power-on hours" >> "$TEMP_FILE"
echo "# TYPE smart_power_on_hours gauge" >> "$TEMP_FILE"

# Scan all SATA/SAS and NVMe block devices
for device in $(lsblk -dpno NAME | grep -E '(sd[a-z]+|nvme[0-9]+n[0-9]+)$'); do
    # Skip devices smartctl cannot open
    if smartctl -i "$device" >/dev/null 2>&1; then
        device_name=$(basename "$device")

        # Health: ATA and NVMe print "PASSED", SAS prints "SMART Health Status: OK"
        health=$(smartctl -H "$device" 2>/dev/null | grep -cE 'PASSED|Health Status: OK')
        echo "smart_device_health{device=\"$device_name\"} $health" >> "$TEMP_FILE"

        # Get temperature (ATA attribute 194, raw value column)
        temp=$(smartctl -A "$device" 2>/dev/null | awk '/Temperature_Celsius/ {print $10}' | head -1)
        if [[ -n "$temp" && "$temp" =~ ^[0-9]+$ ]]; then
            echo "smart_temperature_celsius{device=\"$device_name\"} $temp" >> "$TEMP_FILE"
        fi

        # Get power-on hours (ATA attribute 9, raw value column)
        hours=$(smartctl -A "$device" 2>/dev/null | awk '/Power_On_Hours/ {print $10}' | head -1)
        if [[ -n "$hours" && "$hours" =~ ^[0-9]+$ ]]; then
            echo "smart_power_on_hours{device=\"$device_name\"} $hours" >> "$TEMP_FILE"
        fi
    fi
done

# Make the file readable by node_exporter, then atomically replace the old metrics
chmod 644 "$TEMP_FILE"
mv "$TEMP_FILE" "$METRICS_FILE"

sudo chmod +x /usr/local/bin/smart-metrics.sh
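The awk field positions in the exporter only fit the ATA attribute table; NVMe drives print `Temperature:` and `Power On Hours:` lines instead, so they yield no temperature or power-on metrics. A sketch of a parser that handles both layouts follows, assuming the standard `smartctl -A` formats (raw value in column 10 for ATA, comma-grouped numbers for NVMe); `smart_attrs_to_prom` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: convert `smartctl -A` output (stdin) into Prometheus sample lines.
# Assumptions: ATA attribute rows keep the raw value in column 10; NVMe output
# has "Temperature: 41 Celsius" and "Power On Hours: 1,234" lines.
# smart_attrs_to_prom is a hypothetical helper name.
smart_attrs_to_prom() {
    local dev="$1"
    awk -v dev="$dev" '
        # ATA/SATA attribute table
        $2 == "Temperature_Celsius" && $10 ~ /^[0-9]+$/ && !t { temp = $10; t = 1 }
        $2 == "Power_On_Hours" && $10 ~ /^[0-9]+$/ && !h { hours = $10; h = 1 }
        # NVMe SMART/Health log
        /^Temperature:/ && $2 ~ /^[0-9]+$/ && !t { temp = $2; t = 1 }
        /^Power On Hours:/ && !h {
            v = $4; gsub(/,/, "", v)       # strip thousands separators
            if (v ~ /^[0-9]+$/) { hours = v; h = 1 }
        }
        END {
            if (t) printf "smart_temperature_celsius{device=\"%s\"} %s\n", dev, temp
            if (h) printf "smart_power_on_hours{device=\"%s\"} %s\n", dev, hours
        }
    '
}

# Example, inside the exporter loop:
#   smartctl -A "$device" 2>/dev/null | smart_attrs_to_prom "$device_name" >> "$TEMP_FILE"
```

Calling smartctl -A once per device and parsing both values from the same output also halves the number of drive queries per run.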

Set up automated metrics collection

Create a cron job to regularly update S.M.A.R.T metrics for your monitoring system.

sudo crontab -e

# Update S.M.A.R.T metrics every 5 minutes
*/5 * * * * /usr/local/bin/smart-metrics.sh

# Weekly S.M.A.R.T health report, Mondays at 09:00
0 9 * * 1 /usr/local/bin/smart-health-report.sh

Create health reporting script

Generate a weekly health report summarising each drive's identity, health status, key degradation attributes and recent error-log entries. Save it as /usr/local/bin/smart-health-report.sh.

#!/bin/bash

# Weekly S.M.A.R.T health report generator

# Use an unpredictable temp file (the script runs as root); the trap cleans it up on exit
REPORT_FILE=$(mktemp /tmp/smart_health_report.XXXXXX) || exit 1
trap 'rm -f "$REPORT_FILE"' EXIT
HOSTNAME=$(hostname -f)

echo "S.M.A.R.T Health Report for $HOSTNAME" > "$REPORT_FILE"
echo "Generated: $(date)" >> "$REPORT_FILE"
echo "======================================" >> "$REPORT_FILE"
echo >> "$REPORT_FILE"

for device in $(lsblk -dpno NAME | grep -E '(sd[a-z]+|nvme[0-9]+n[0-9]+)$'); do
    if smartctl -i "$device" >/dev/null 2>&1; then
        echo "Device: $device" >> "$REPORT_FILE"
        echo "------------------" >> "$REPORT_FILE"
        
        # Basic info
        smartctl -i "$device" | grep -E '(Model|Serial|Capacity)' >> "$REPORT_FILE"
        
        # Health status
        echo >> "$REPORT_FILE"
        smartctl -H "$device" >> "$REPORT_FILE"
        
        # Key attributes
        echo >> "$REPORT_FILE"
        echo "Key Attributes:" >> "$REPORT_FILE"
        smartctl -A "$device" | grep -E '(Reallocated_Sector_Ct|Spin_Retry_Count|End-to-End_Error|Reported_Uncorrect|Command_Timeout|Current_Pending_Sector|Offline_Uncorrectable|Temperature_Celsius|Power_On_Hours)' >> "$REPORT_FILE"
        
        # Recent errors
        echo >> "$REPORT_FILE"
        echo "Recent Errors:" >> "$REPORT_FILE"
        smartctl -l error "$device" | head -10 >> "$REPORT_FILE"
        
        echo >> "$REPORT_FILE"
        echo "======================================" >> "$REPORT_FILE"
        echo >> "$REPORT_FILE"
    fi
done

# Email the report
mail -s "Weekly S.M.A.R.T Health Report - $HOSTNAME" admin@example.com < "$REPORT_FILE"

sudo chmod +x /usr/local/bin/smart-health-report.sh
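The report lists raw attribute values but leaves interpretation to the reader. A small filter can add an explicit warning line for degradation counters that should normally stay at zero. This is a sketch, assuming ATA-style `smartctl -A` output; the attribute list is a common starting point rather than a standard, and `flag_nonzero_attrs` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: print a warning for degradation counters whose raw value is non-zero.
# Input: ATA-style `smartctl -A` output on stdin (raw value in column 10).
# The attribute list is a common starting point, not a standard;
# flag_nonzero_attrs is a hypothetical helper.
flag_nonzero_attrs() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Spin_Retry_Count|End-to-End_Error|Reported_Uncorrect|Current_Pending_Sector|Offline_Uncorrectable)$/ &&
        $10 ~ /^[0-9]+$/ && $10 + 0 > 0 {
            printf "WARNING: %s raw value is %s\n", $2, $10
        }
    '
}

# Example, inside the report loop:
#   smartctl -A "$device" | flag_nonzero_attrs >> "$REPORT_FILE"
```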

Configure advanced monitoring options

Set up temperature monitoring

Configure specific temperature thresholds and cooling alerts for high-performance environments.

# Temperature monitoring with custom thresholds
/dev/sda -a -d auto -W 4,35,40 -s (S/../.././02|L/../../6/03) -m admin@example.com -M exec /usr/local/bin/smart-alert.sh
/dev/sdb -a -d auto -W 4,35,40 -s (S/../.././02|L/../../6/03) -m admin@example.com -M exec /usr/local/bin/smart-alert.sh

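The -W DIFF,INFO,CRIT option controls temperature reporting: smartd reports a change of at least 4°C since the last report (DIFF), logs an informational message at 35°C or above (INFO), and logs a critical message and sends an alert at 40°C or above (CRIT). These values are conservative; set INFO and CRIT from the operating range in the drive's datasheet.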
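To pick thresholds, it helps to see how the three -W values interact for a single reading. The function below is a rough model written from the smartd.conf(5) description, not smartd's code: it returns only the most severe condition and ignores the new-minimum/maximum reports smartd also makes. `classify_temp` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: rough model of smartd's -W DIFF,INFO,CRIT handling for one reading.
# Not smartd's implementation; reports only the most severe condition.
# Args: current temp, previously reported temp, DIFF, INFO, CRIT (Celsius).
classify_temp() {
    local cur=$1 prev=$2 diff=$3 info=$4 crit=$5
    local delta=$(( cur > prev ? cur - prev : prev - cur ))
    if (( crit > 0 && cur >= crit )); then
        echo "critical (LOG_CRIT, email if -m is set)"
    elif (( info > 0 && cur >= info )); then
        echo "info (LOG_INFO)"
    elif (( diff > 0 && delta >= diff )); then
        echo "changed by ${delta}C"
    else
        echo "quiet"
    fi
}

# Example: classify_temp 36 30 4 35 40   # INFO threshold reached
```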

Configure attribute monitoring

Monitor specific S.M.A.R.T attributes that indicate drive degradation.

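# Monitor critical attributes with custom reporting
/dev/sda -a -d auto -R 5! -C 197+ -U 198+ -I 194 -s (S/../.././02|L/../../6/03) -m admin@example.com -M exec /usr/local/bin/smart-alert.sh

This treats any change in the raw Reallocated_Sector_Ct value as critical (-R 5!), reports pending (-C 197+) and offline uncorrectable (-U 198+) sectors only when their count increases, and ignores changes of the temperature attribute when tracking attribute changes (-I 194), leaving temperature to -W. The -a directive already includes the usage-attribute failure check (-f), so it does not need to be repeated.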
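smartd can track changes in attribute raw values between checks (the -r and -R directives in smartd.conf(5)). The same idea is useful when reviewing two saved `smartctl -A` snapshots, for example last week's and this week's. This sketch assumes ATA-style output and that both snapshots come from the same drive; `raw_changes` is a hypothetical helper and is not how smartd implements tracking.

```shell
#!/usr/bin/env bash
# Sketch: list attributes whose raw value differs between two `smartctl -A` snapshots.
# Assumes ATA-style output (ID in column 1, raw value in column 10) from one drive.
# raw_changes is a hypothetical helper, not smartd's tracking code.
raw_changes() {
    local before="$1" after="$2"
    awk '
        # First file: remember the raw value of each attribute ID
        NR == FNR { if ($1 ~ /^[0-9]+$/) old[$1] = $10; next }
        # Second file: print attributes whose raw value changed
        $1 ~ /^[0-9]+$/ && ($1 in old) && old[$1] != $10 {
            printf "%s %s: %s -> %s\n", $1, $2, old[$1], $10
        }
    ' "$before" "$after"
}

# Example: raw_changes /var/tmp/sda-last-week.txt /var/tmp/sda-today.txt
```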

Verify your setup

Test the monitoring configuration and verify all components are working correctly.

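# Check smartd service status (use smartmontools instead of smartd on Debian/Ubuntu)
sudo systemctl status smartd

# Parse the configuration, run one check cycle on all devices, then exit
sudo smartd -q onecheck

# Run a manual S.M.A.R.T check and a short self-test; read the self-test log a few minutes later
sudo smartctl -a /dev/sda
sudo smartctl -t short /dev/sda
sudo smartctl -l selftest /dev/sda

# Check if metrics are being generated
ls -la /var/lib/node_exporter/textfile_collector/
cat /var/lib/node_exporter/textfile_collector/smart.prom

# Test email notifications
echo "Test S.M.A.R.T alert" | mail -s "Test Alert" admin@example.com

# Follow smartd logs (use -u smartmontools on Debian/Ubuntu)
journalctl -u smartd -f

To test the full path from smartd through the alert script, temporarily add -M test next to -M exec on one device line and restart the service: smartd sends a single test alert at startup, with SMARTD_FAILTYPE set to EmailTest.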
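If Prometheus's promtool is installed, `promtool check metrics < smart.prom` validates the exporter output. Without it, a rough check of the sample lines catches the most common mistake, a metric line with an empty or non-numeric value. This is a sketch only; `check_prom_file` is a hypothetical helper and accepts a narrower syntax than the real exposition format.

```shell
#!/usr/bin/env bash
# Sketch: rough sanity check of a textfile-collector file. Not a full Prometheus
# parser; it only accepts "name{labels} number" sample lines plus # comments.
# check_prom_file is a hypothetical helper; returns 1 if any line is malformed.
check_prom_file() {
    local file="$1" line bad=0
    local re='^[a-zA-Z_:][a-zA-Z0-9_:]*(\{[^}]*\})? -?[0-9]+(\.[0-9]+)?$'
    while IFS= read -r line; do
        # Skip blank lines and # HELP / # TYPE comments
        [[ -z "$line" || "$line" == \#* ]] && continue
        if [[ ! "$line" =~ $re ]]; then
            echo "bad sample line: $line"
            bad=1
        fi
    done < "$file"
    return "$bad"
}

# Example: check_prom_file /var/lib/node_exporter/textfile_collector/smart.prom
```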

You can also feed these metrics into an existing system monitoring or Prometheus/Grafana stack.

Common issues

Symptom	Cause	Fix
smartd service fails to start	Invalid device paths in config	Run `sudo smartctl --scan` and update the device paths in smartd.conf (`/etc/smartd.conf` on Debian/Ubuntu, `/etc/smartmontools/smartd.conf` on RHEL-family systems)
No email alerts received	Mail system not configured	Test with `echo "test" | mail -s test admin@example.com`, check the postfix logs, and configure postfix properly
USB/removable drives cause errors	smartd trying to monitor disconnected drives	Use the `-n standby,q` option and make sure DEVICESCAN includes `-d removable`
Disk I/O slows down during self-tests	Long self-tests scheduled during busy hours	Move long tests to off-peak hours or run short tests weekly instead of daily: `-s (S/../../7/02|L/../../6/03)`
Metrics not appearing in Prometheus	Wrong file permissions or path	Check `/var/lib/node_exporter/textfile_collector/` permissions and that node_exporter runs with `--collector.textfile.directory` pointing at it
False temperature alerts	Thresholds too close to normal operating temperature	Raise the INFO and CRIT values in the `-W` option; `-I 194` only stops temperature changes being reported as attribute changes and does not silence `-W` warnings

Next steps

Set up backup monitoring with Prometheus and Grafana to complement your storage monitoring
Configure system resource monitoring for comprehensive server health tracking
Implement automated database backups as part of your data protection strategy
Create advanced Grafana dashboards for disk health visualization
Set up RAID monitoring and automated alerts for hardware RAID systems


Automated install script

Run this to automate the entire setup

install.sh

#!/usr/bin/env bash
set -euo pipefail

# Production-quality S.M.A.R.T monitoring setup script
# Installs smartmontools with automated health alerts

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'

# Global variables
EMAIL=""
HOSTNAME=$(hostname -f)
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# Cleanup function
cleanup() {
    if [[ $? -ne 0 ]]; then
        echo -e "${RED}[ERROR] Installation failed. Check logs above.${NC}"
    fi
}
trap cleanup EXIT

usage() {
    echo "Usage: $0 [-e EMAIL] [-h]"
    echo "  -e EMAIL    Email address for S.M.A.R.T alerts (required)"
    echo "  -h          Show this help message"
    exit 1
}

log_info() {
    echo -e "${GREEN}[INFO]${NC} $1"
}

log_warn() {
    echo -e "${YELLOW}[WARN]${NC} $1"
}

log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

# Parse arguments
while getopts "e:h" opt; do
    case $opt in
        e) EMAIL="$OPTARG" ;;
        h) usage ;;
        *) usage ;;
    esac
done

if [[ -z "$EMAIL" ]]; then
    log_error "Email address is required"
    usage
fi

# Email validation
if [[ ! "$EMAIL" =~ ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ ]]; then
    log_error "Invalid email address format"
    exit 1
fi

# Check if running as root
if [[ $EUID -ne 0 ]]; then
    log_error "This script must be run as root"
    exit 1
fi

echo -e "${BLUE}S.M.A.R.T Monitoring Setup${NC}"
echo "Email: $EMAIL"
echo "Hostname: $HOSTNAME"
echo ""

# Detect distribution
echo -e "${BLUE}[1/8]${NC} Detecting system..."
if [ -f /etc/os-release ]; then
    . /etc/os-release
    case "$ID" in
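        ubuntu|debian) 
            PKG_MGR="apt"
            PKG_UPDATE="apt update -qq"
            PKG_INSTALL="apt install -y"
            MAIL_PKG="mailutils"
            # Avoid interactive debconf prompts (e.g. postfix pulled in by mailutils)
            export DEBIAN_FRONTEND=noninteractive
            ;;
        almalinux|rocky|centos|rhel|ol|fedora) 
            PKG_MGR="dnf"
            # Refresh metadata only; a full system upgrade is out of scope here
            PKG_UPDATE="dnf makecache -q"
            PKG_INSTALL="dnf install -y"
            MAIL_PKG="mailx"
            ;;
        amzn) 
            PKG_MGR="yum"
            PKG_UPDATE="yum makecache -q"
            PKG_INSTALL="yum install -y"
            MAIL_PKG="mailx"
            ;;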
        *) 
            log_error "Unsupported distribution: $ID"
            exit 1
            ;;
    esac
    log_info "Detected: $PRETTY_NAME ($PKG_MGR)"
else
    log_error "Cannot detect distribution (/etc/os-release not found)"
    exit 1
fi

# Update package cache
echo -e "${BLUE}[2/8]${NC} Updating package cache..."
$PKG_UPDATE

# Install smartmontools and mail utilities
echo -e "${BLUE}[3/8]${NC} Installing smartmontools and mail utilities..."
$PKG_INSTALL smartmontools $MAIL_PKG

# Install and configure postfix for email
echo -e "${BLUE}[4/8]${NC} Configuring mail system..."
if [[ "$PKG_MGR" == "apt" ]]; then
    DEBIAN_FRONTEND=noninteractive $PKG_INSTALL postfix
    echo "$HOSTNAME" > /etc/mailname
else
    $PKG_INSTALL postfix
fi

# Same basic settings on every distribution: deliver directly as an Internet site
postconf -e "myhostname = $HOSTNAME"
postconf -e "mydestination = $HOSTNAME, localhost"
postconf -e "relayhost ="

# Restart rather than start, so the settings apply even if postfix was already running
systemctl enable postfix
systemctl restart postfix

# Scan for storage devices
echo -e "${BLUE}[5/8]${NC} Scanning for storage devices..."
log_info "Available storage devices:"
smartctl --scan || true

DEVICES=($(smartctl --scan | awk '{print $1}' | head -10))
if [[ ${#DEVICES[@]} -eq 0 ]]; then
    log_warn "No S.M.A.R.T capable devices found"
fi

for device in "${DEVICES[@]}"; do
    if [[ -e "$device" ]]; then
        health=$(smartctl --health "$device" 2>/dev/null | grep "SMART overall-health" || echo "Unknown")
        log_info "  $device: $health"
    fi
done

# Create custom alert script
echo -e "${BLUE}[6/8]${NC} Creating custom alert script..."
cat > /usr/local/bin/smart-alert.sh << 'EOF'
#!/bin/bash
# Enhanced S.M.A.R.T alert script
# Usage: run by smartd through "-M exec"; details arrive in SMARTD_* variables

# smartd runs the script with a minimal environment, so set PATH explicitly
PATH=/usr/sbin:/usr/bin:/sbin:/bin

DEVICE="${SMARTD_DEVICE:-unknown}"
MSG="${SMARTD_MESSAGE:-no message}"
FAILTYPE="${SMARTD_FAILTYPE:-unknown}"
RECIPIENT="${SMARTD_ADDRESS:-root}"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
HOSTNAME=$(hostname -f)
LOGFILE="/var/log/smartd-alerts.log"

# Log the alert
echo "[$TIMESTAMP] $HOSTNAME: $DEVICE ($FAILTYPE) - $MSG" >> "$LOGFILE"

# Get detailed S.M.A.R.T information
SMART_INFO=$(smartctl -a "$DEVICE" 2>/dev/null)
HEALTH_STATUS=$(smartctl -H "$DEVICE" 2>/dev/null | grep -E "SMART overall-health|SMART Health Status" || echo "Status unknown")

# Create detailed email
EMAIL_BODY="S.M.A.R.T Alert - $HOSTNAME

Timestamp: $TIMESTAMP
Device: $DEVICE
Failure type: $FAILTYPE
Message: $MSG
Health Status: $HEALTH_STATUS

Full S.M.A.R.T Data:
$SMART_INFO

Please check the drive immediately and consider replacement if errors persist."

# Send email; smartd separates multiple -m addresses with spaces, so $RECIPIENT is unquoted
echo "$EMAIL_BODY" | mail -s "[ALERT] S.M.A.R.T Issue on $HOSTNAME - $DEVICE" $RECIPIENT >/dev/null 2>&1

# Log to syslog
logger -t smartd-alert "S.M.A.R.T issue on $DEVICE: $MSG"
EOF

chmod 755 /usr/local/bin/smart-alert.sh
chown root:root /usr/local/bin/smart-alert.sh

# Configure smartd
echo -e "${BLUE}[7/8]${NC} Configuring smartd daemon..."

# Config path and unit name differ by distribution family: Debian/Ubuntu use
# /etc/smartd.conf and smartmontools.service (smartd.service is only an alias there);
# RHEL-family systems use /etc/smartmontools/smartd.conf and smartd.service
if [[ "$PKG_MGR" == "apt" ]]; then
    SMARTD_CONF="/etc/smartd.conf"
    SMARTD_SERVICE="smartmontools"
else
    SMARTD_CONF="/etc/smartmontools/smartd.conf"
    SMARTD_SERVICE="smartd"
fi

cp "$SMARTD_CONF" "$SMARTD_CONF.backup" 2>/dev/null || true

cat > "$SMARTD_CONF" << EOF
# smartd configuration for automated monitoring
# Generated by smart-setup script on $(date)

# Specific device monitoring (uncomment and modify as needed;
# device entries must appear before DEVICESCAN)
EOF

# Add specific devices if found
for device in "${DEVICES[@]}"; do
    if [[ -e "$device" ]]; then
        echo "# $device -a -d auto -n standby,q -s (S/../.././02|L/../../6/03) -m $EMAIL -M exec /usr/local/bin/smart-alert.sh" >> "$SMARTD_CONF"
    fi
done

cat >> "$SMARTD_CONF" << EOF

# Global settings - scan for all other devices (keep this as the last entry)
DEVICESCAN -a -d removable -n standby -s (S/../.././02|L/../../6/03) -m $EMAIL -M exec /usr/local/bin/smart-alert.sh
EOF

# Create log file
touch /var/log/smartd-alerts.log
chmod 644 /var/log/smartd-alerts.log
chown root:root /var/log/smartd-alerts.log

# Enable and (re)start smartd
systemctl enable "$SMARTD_SERVICE"
systemctl restart "$SMARTD_SERVICE"

# Final checks
echo -e "${BLUE}[8/8]${NC} Performing final verification..."

# Test email functionality
log_info "Testing email functionality..."
echo "S.M.A.R.T monitoring setup completed on $HOSTNAME at $(date)" | mail -s "S.M.A.R.T Monitoring Setup Complete" "$EMAIL" || log_warn "Email test may have failed"

# Check service status
if systemctl is-active --quiet "$SMARTD_SERVICE"; then
    log_info "smartd service ($SMARTD_SERVICE) is running"
else
    log_error "smartd service ($SMARTD_SERVICE) failed to start"
    exit 1
fi

if systemctl is-active --quiet postfix; then
    log_info "postfix service is running"
else
    log_warn "postfix service is not running - email alerts may not work"
fi

# Display configuration summary
echo ""
echo -e "${GREEN}=== Setup Complete ===${NC}"
echo "Email alerts: $EMAIL"
echo "Monitored devices: ${#DEVICES[@]} found"
echo "Config file: $SMARTD_CONF"
echo "Alert script: /usr/local/bin/smart-alert.sh"
echo "Log file: /var/log/smartd-alerts.log"
echo ""
echo "S.M.A.R.T tests scheduled:"
echo "  - Short test: Daily at 2:00 AM"
echo "  - Long test: Weekly on Saturday at 3:00 AM"
echo ""
log_info "Monitor logs with: tail -f /var/log/smartd-alerts.log"
log_info "Manual device check: smartctl -a /dev/sda"
log_info "Check service status: systemctl status $SMARTD_SERVICE"

Review the script before running. Run it as root and pass the alert address: sudo bash install.sh -e admin@example.com
