Configure Linux process resource monitoring and alerting with cgroups and systemd

Intermediate 25 min Apr 03, 2026
Ubuntu 24.04 Ubuntu 22.04 Debian 12 AlmaLinux 9 Rocky Linux 9 Fedora 41

Set up comprehensive process resource monitoring using cgroups v2 and systemd to track CPU, memory, and I/O usage with automated alerting when processes exceed defined limits.

Prerequisites

  • Root or sudo access
  • systemd-based Linux distribution
  • Basic understanding of Linux processes

What this solves

This tutorial shows how to monitor process resource usage with cgroups v2 and systemd service units that carry explicit resource limits. You'll track CPU usage, memory consumption, and I/O throughput for critical applications, and raise automated alerts when a process approaches or exceeds its allocation. The limits keep a single misbehaving service from exhausting the host, and the collected figures show how an application's resource use changes over time.

Step-by-step configuration

Verify cgroups v2 support

Check that your system is using cgroups v2, which provides a unified hierarchy and the resource-control interface that the systemd directives in this guide rely on.

mount | grep cgroup
grep cgroup /proc/filesystems

You should see a cgroup2 filesystem mounted at /sys/fs/cgroup. If only cgroup v1 controllers are mounted, enable v2 by adding systemd.unified_cgroup_hierarchy=1 to your kernel boot parameters and rebooting.
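
A quicker check is the filesystem type of /sys/fs/cgroup, which stat reports as cgroup2fs on a pure v2 system. The sketch below wraps that check. The function name cgroup_mode is made up for this example, and treating tmpfs as "v1 or hybrid" is an assumption that matches typical systemd layouts rather than a guarantee.

```shell
# Hypothetical helper: map the filesystem type of the cgroup mount point to a cgroup mode.
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1-or-hybrid" ;;   # assumption: a tmpfs here indicates a v1 or hybrid layout
        *)         echo "unknown" ;;
    esac
}

# stat -f reports on the filesystem; %T prints its type in human-readable form.
fs_type=$(stat -fc %T /sys/fs/cgroup 2>/dev/null)
echo "cgroup mode: $(cgroup_mode "$fs_type")"

# On v2, the controllers available at the root are listed in this file.
if [ -r /sys/fs/cgroup/cgroup.controllers ]; then
    echo "controllers: $(cat /sys/fs/cgroup/cgroup.controllers)"
fi
```

The controller list should include cpu, memory, io, and pids for the limits used later in this guide to take effect.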

Install monitoring tools

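Install the packages used for cgroup monitoring and system resource analysis. Run the block that matches your distribution. The systemd cgroup utilities (systemd-cgls, systemd-cgtop) ship with systemd itself, so they need no separate package.

# Ubuntu / Debian
sudo apt update
sudo apt install -y cgroup-tools htop iotop sysstat mailutils

# AlmaLinux / Rocky Linux / Fedora (on AlmaLinux and Rocky Linux, htop may require the EPEL repository)
sudo dnf update -y
sudo dnf install -y libcgroup-tools systemd htop iotop sysstat mailx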
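
If the same setup has to run unattended on both distribution families, the package manager can be chosen from /etc/os-release. This is a minimal sketch: pkg_manager_for is a name invented here, and the list of recognized IDs covers only the distributions named in this guide.

```shell
# Hypothetical helper: choose a package manager from the os-release ID and ID_LIKE values.
# Arguments: ID, ID_LIKE (ID_LIKE may hold several space-separated names).
pkg_manager_for() {
    case " $1 $2 " in
        *" debian "*|*" ubuntu "*) echo "apt" ;;
        *" rhel "*|*" fedora "*|*" almalinux "*|*" rocky "*) echo "dnf" ;;
        *) echo "unknown" ;;
    esac
}

# /etc/os-release defines ID and, on derivative distributions, ID_LIKE.
if [ -r /etc/os-release ]; then
    . /etc/os-release
fi
echo "package manager: $(pkg_manager_for "${ID:-}" "${ID_LIKE:-}")"
```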

Create a sample application service

Create a test application that we'll monitor and apply resource limits to demonstrate the monitoring system.

sudo mkdir -p /opt/testapp
sudo tee /opt/testapp/cpu-intensive.sh > /dev/null << 'EOF'
#!/bin/bash
# CPU intensive test application

echo "Starting CPU intensive application - PID: $$"
while true; do
    # Simulate CPU work
    dd if=/dev/zero of=/dev/null bs=1M count=100 2>/dev/null
    sleep 2
done
EOF
sudo chmod 755 /opt/testapp/cpu-intensive.sh

Create systemd service with resource limits

Define a systemd service unit with CPU, memory, and I/O limits using cgroups v2 directives. Save it as /etc/systemd/system/testapp.service.

[Unit]
Description=Test Application for Resource Monitoring
After=network.target

[Service]
Type=simple
User=nobody
# "nogroup" exists on Debian/Ubuntu; on AlmaLinux, Rocky Linux, and Fedora use Group=nobody
Group=nogroup
ExecStart=/opt/testapp/cpu-intensive.sh
Restart=always
RestartSec=10

# CPU limits - 50% of one CPU core
CPUQuota=50%
CPUWeight=100

# Memory limits - hard cap at 512MB, reclaim pressure starts at 400MB
MemoryMax=512M
MemoryHigh=400M

# I/O limits - 10MB/s read, 5MB/s write
# Replace /dev/sda with the block device that backs the service's data
IOReadBandwidthMax=/dev/sda 10M
IOWriteBandwidthMax=/dev/sda 5M

# Task limits
TasksMax=50

# Enable accounting
CPUAccounting=true
MemoryAccounting=true
IOAccounting=true
TasksAccounting=true

[Install]
WantedBy=multi-user.target
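
systemd translates CPUQuota= into the cgroup v2 cpu.max file, which holds a quota and a period in microseconds. With the default 100ms period, CPUQuota=50% should appear there as 50000 100000. The sketch below reads such a value back as a percentage so you can confirm what was actually applied. The name cpu_max_to_percent is invented for this example, and the path in the closing comment assumes the service lives in the default system.slice.

```shell
# Hypothetical helper: convert the contents of cpu.max ("QUOTA PERIOD" or "max PERIOD")
# into a percentage of one CPU core.
cpu_max_to_percent() {
    quota=${1%% *}     # text before the first space
    period=${1##* }    # text after the last space
    case "$quota" in
        max)          echo "unlimited" ;;
        ''|*[!0-9]*)  echo "unknown" ;;
        *)            echo "$(( quota * 100 / period ))%" ;;
    esac
}

# The value expected for CPUQuota=50% with the default period:
cpu_max_to_percent "50000 100000"

# Against the live service (assumes testapp.service is running in system.slice):
# cpu_max_to_percent "$(cat /sys/fs/cgroup/system.slice/testapp.service/cpu.max)"
```

A result other than 50% after editing the unit usually means the change has not been reloaded yet.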

Create monitoring script

Build a monitoring script that checks resource usage and sends alerts when thresholds are exceeded. Save it as /opt/testapp/monitor-resources.sh.

#!/bin/bash

# Configuration
SERVICE_NAME="testapp"
ALERT_EMAIL="admin@example.com"
CPU_THRESHOLD=80       # Alert if throttled time > 80% of CPU time used
MEMORY_THRESHOLD=90    # Alert if memory usage > 90% of limit
LOG_FILE="/var/log/resource-monitor.log"
ALERT_COOLDOWN=300     # 5 minutes between identical alerts
COOLDOWN_FILE="/tmp/alert-cooldown-${SERVICE_NAME}"

# Logging function
log_message() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "$LOG_FILE"
}

# Check if service is running
if ! systemctl is-active --quiet "$SERVICE_NAME"; then
    log_message "WARNING: Service $SERVICE_NAME is not running"
    exit 1
fi

# Get service cgroup path
CGROUP_PATH=$(systemctl show "$SERVICE_NAME" --property=ControlGroup --value)
SYS_CGROUP_PATH="/sys/fs/cgroup$CGROUP_PATH"
if [ ! -d "$SYS_CGROUP_PATH" ]; then
    log_message "ERROR: Cgroup path not found: $SYS_CGROUP_PATH"
    exit 1
fi

# CPU monitoring
if [ -f "$SYS_CGROUP_PATH/cpu.stat" ]; then
    CPU_USAGE_USEC=$(awk '$1 == "usage_usec" {print $2}' "$SYS_CGROUP_PATH/cpu.stat")
    CPU_THROTTLED_USEC=$(awk '$1 == "throttled_usec" {print $2}' "$SYS_CGROUP_PATH/cpu.stat")

    # Throttled time as a share of CPU time used (cumulative since the cgroup was created).
    # Both values must be non-zero to avoid dividing by zero on a freshly started service.
    if [ "${CPU_THROTTLED_USEC:-0}" -gt 0 ] && [ "${CPU_USAGE_USEC:-0}" -gt 0 ]; then
        CPU_THROTTLE_PERCENT=$((CPU_THROTTLED_USEC * 100 / CPU_USAGE_USEC))
        if [ "$CPU_THROTTLE_PERCENT" -gt "$CPU_THRESHOLD" ]; then
            ALERT_MSG="CPU throttling detected: ${CPU_THROTTLE_PERCENT}% for service $SERVICE_NAME"
            log_message "ALERT: $ALERT_MSG"
        fi
    fi
fi

# Memory monitoring
if [ -f "$SYS_CGROUP_PATH/memory.current" ] && [ -f "$SYS_CGROUP_PATH/memory.max" ]; then
    MEMORY_CURRENT=$(cat "$SYS_CGROUP_PATH/memory.current")
    MEMORY_MAX=$(cat "$SYS_CGROUP_PATH/memory.max")

    if [ "$MEMORY_MAX" != "max" ]; then
        MEMORY_PERCENT=$((MEMORY_CURRENT * 100 / MEMORY_MAX))
        MEMORY_MB=$((MEMORY_CURRENT / 1024 / 1024))
        MEMORY_MAX_MB=$((MEMORY_MAX / 1024 / 1024))
        log_message "INFO: Memory usage: ${MEMORY_MB}MB / ${MEMORY_MAX_MB}MB (${MEMORY_PERCENT}%)"

        if [ "$MEMORY_PERCENT" -gt "$MEMORY_THRESHOLD" ]; then
            ALERT_MSG="High memory usage: ${MEMORY_PERCENT}% (${MEMORY_MB}MB/${MEMORY_MAX_MB}MB) for service $SERVICE_NAME"
            log_message "ALERT: $ALERT_MSG"

            # Send email alert with cooldown (a missing cooldown file counts as "never alerted")
            NOW=$(date +%s)
            LAST_ALERT=$(stat -c %Y "$COOLDOWN_FILE" 2>/dev/null || echo 0)
            if [ $((NOW - LAST_ALERT)) -gt "$ALERT_COOLDOWN" ]; then
                echo "$ALERT_MSG" | mail -s "Resource Alert: $SERVICE_NAME" "$ALERT_EMAIL" 2>/dev/null
                touch "$COOLDOWN_FILE"
            fi
        fi
    fi
fi

# I/O monitoring
if [ -f "$SYS_CGROUP_PATH/io.stat" ]; then
    # Each io.stat line looks like "MAJ:MIN rbytes=N wbytes=N rios=N ...".
    # Sum the counters across all devices; printf "%.0f" keeps large totals as plain integers.
    READ_BYTES=$(awk '{for (i = 2; i <= NF; i++) if ($i ~ /^rbytes=/) {sub(/^rbytes=/, "", $i); s += $i}} END {printf "%.0f\n", s}' "$SYS_CGROUP_PATH/io.stat")
    WRITE_BYTES=$(awk '{for (i = 2; i <= NF; i++) if ($i ~ /^wbytes=/) {sub(/^wbytes=/, "", $i); s += $i}} END {printf "%.0f\n", s}' "$SYS_CGROUP_PATH/io.stat")

    READ_MB=$((READ_BYTES / 1024 / 1024))
    WRITE_MB=$((WRITE_BYTES / 1024 / 1024))
    log_message "INFO: I/O usage: Read ${READ_MB}MB, Write ${WRITE_MB}MB"
fi

# Task monitoring
if [ -f "$SYS_CGROUP_PATH/pids.current" ] && [ -f "$SYS_CGROUP_PATH/pids.max" ]; then
    TASKS_CURRENT=$(cat "$SYS_CGROUP_PATH/pids.current")
    TASKS_MAX=$(cat "$SYS_CGROUP_PATH/pids.max")
    log_message "INFO: Tasks: $TASKS_CURRENT / $TASKS_MAX"

    if [ "$TASKS_MAX" != "max" ]; then
        TASKS_PERCENT=$((TASKS_CURRENT * 100 / TASKS_MAX))
        if [ "$TASKS_PERCENT" -gt 80 ]; then
            log_message "ALERT: High task count: ${TASKS_PERCENT}% ($TASKS_CURRENT/$TASKS_MAX) for service $SERVICE_NAME"
        fi
    fi
fi

Make the script executable and create the log file. The timer runs the script as root, so the log file can stay owned by root:

sudo chmod 755 /opt/testapp/monitor-resources.sh
sudo touch /var/log/resource-monitor.log
sudo chmod 640 /var/log/resource-monitor.log
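
The throttling figure in the script is cumulative since the service's cgroup was created, so a long-running service can hide a recent spike. A common alternative is to sample usage_usec twice and compute the rate over that window. The sketch below shows the calculation. The names read_usage_usec and cpu_percent_between are invented here, and the path in the usage comment assumes the default system.slice placement.

```shell
# Hypothetical helper: read usage_usec from a cpu.stat file.
read_usage_usec() {
    awk '$1 == "usage_usec" {print $2}' "$1"
}

# Hypothetical helper: CPU use between two usage_usec samples, as a percentage of one core.
# Arguments: first sample, second sample, elapsed wall-clock seconds.
cpu_percent_between() {
    if [ "$3" -le 0 ]; then
        echo 0
        return
    fi
    echo $(( ($2 - $1) * 100 / ($3 * 1000000) ))
}

# Usage sketch (assumes testapp.service is running):
# stat_file=/sys/fs/cgroup/system.slice/testapp.service/cpu.stat
# first=$(read_usage_usec "$stat_file"); sleep 10; second=$(read_usage_usec "$stat_file")
# echo "CPU over the last 10s: $(cpu_percent_between "$first" "$second" 10)%"
```

With CPUQuota=50% in place, the windowed figure should stay at or below roughly 50. A value that sits at the cap for many consecutive windows suggests the quota is too tight for the workload.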

Create systemd timer for automated monitoring

Set up a systemd timer to run the monitoring script at regular intervals. This takes two unit files: a oneshot service that runs the script, and a timer that triggers it.

# /etc/systemd/system/resource-monitor.service
[Unit]
Description=Resource Monitor for Applications
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/opt/testapp/monitor-resources.sh
User=root
StandardOutput=journal
StandardError=journal

# /etc/systemd/system/resource-monitor.timer
[Unit]
Description=Run Resource Monitor every 2 minutes
Requires=resource-monitor.service

[Timer]
OnCalendar=*:0/2
Persistent=true

[Install]
WantedBy=timers.target
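
Calendar expressions are easy to get wrong, and systemd-analyze calendar can parse one and print its upcoming trigger times before the timer is enabled. The wrapper below is a small sketch: check_calendar is a made-up name, and it skips the check when systemd-analyze is not installed.

```shell
# Hypothetical helper: validate an OnCalendar= expression and show upcoming trigger times.
check_calendar() {
    if ! command -v systemd-analyze >/dev/null 2>&1; then
        echo "systemd-analyze not found, skipping check"
        return 0
    fi
    systemd-analyze calendar --iterations=3 "$1"
}

# "*:0/2" means every hour, at minutes 0, 2, 4, ... (an every-two-minutes schedule).
check_calendar '*:0/2'
```

The listed trigger times should be two minutes apart. If they are seconds apart, the expression has an extra field.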

Enable and start services

Start the test application and monitoring system, then enable automatic startup on boot.

sudo systemctl daemon-reload
sudo systemctl enable --now testapp.service
sudo systemctl enable --now resource-monitor.timer
sudo systemctl status testapp.service
sudo systemctl status resource-monitor.timer

Create manual monitoring commands

Set up a convenient command for on-demand resource inspection of any systemd service. Save the script as /usr/local/bin/check-cgroup-resources.

#!/bin/bash

if [ $# -eq 0 ]; then
    echo "Usage: $0 SERVICE_NAME"
    echo "Example: $0 testapp"
    exit 1
fi

SERVICE_NAME="$1"

if ! systemctl is-active --quiet "$SERVICE_NAME"; then
    echo "Service $SERVICE_NAME is not running"
    exit 1
fi

CGROUP_PATH=$(systemctl show "$SERVICE_NAME" --property=ControlGroup --value)
SYS_CGROUP_PATH="/sys/fs/cgroup$CGROUP_PATH"

echo "=== Resource Usage for $SERVICE_NAME ==="
echo "Cgroup Path: $SYS_CGROUP_PATH"
echo

# CPU Information
echo "CPU Usage:"
if [ -f "$SYS_CGROUP_PATH/cpu.stat" ]; then
    cat "$SYS_CGROUP_PATH/cpu.stat"
else
    echo "CPU stats not available"
fi
echo

# Memory Information
echo "Memory Usage:"
if [ -f "$SYS_CGROUP_PATH/memory.current" ]; then
    CURRENT=$(cat "$SYS_CGROUP_PATH/memory.current")
    MAX=$(cat "$SYS_CGROUP_PATH/memory.max" 2>/dev/null || echo "unlimited")
    echo "Current: $((CURRENT / 1024 / 1024)) MB"
    if [ "$MAX" != "max" ] && [ "$MAX" != "unlimited" ]; then
        echo "Limit: $((MAX / 1024 / 1024)) MB"
        echo "Usage: $((CURRENT * 100 / MAX))%"
    else
        echo "Limit: unlimited"
    fi
fi
echo

# I/O Information
echo "I/O Usage:"
if [ -f "$SYS_CGROUP_PATH/io.stat" ]; then
    cat "$SYS_CGROUP_PATH/io.stat"
else
    echo "I/O stats not available"
fi
echo

# Task Information
echo "Task Count:"
if [ -f "$SYS_CGROUP_PATH/pids.current" ]; then
    CURRENT_TASKS=$(cat "$SYS_CGROUP_PATH/pids.current")
    MAX_TASKS=$(cat "$SYS_CGROUP_PATH/pids.max" 2>/dev/null || echo "unlimited")
    echo "Current: $CURRENT_TASKS"
    echo "Limit: $MAX_TASKS"
fi

Make the script executable:

sudo chmod 755 /usr/local/bin/check-cgroup-resources
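
memory.current is a point-in-time value, so a short spike between two checks goes unseen. The cgroup v2 memory.events file keeps counters such as high, max, and oom_kill that record how often each boundary was hit. The sketch below reads one counter. The name cgroup_event_count is invented for this example, and the path assumes the default system.slice placement of testapp.service.

```shell
# Hypothetical helper: print one counter from a flat "key value" cgroup file
# such as memory.events or pids.events. Prints 0 when the key is absent.
# Arguments: file path, key name.
cgroup_event_count() {
    awk -v key="$2" '$1 == key {v = $2} END {printf "%d\n", v}' "$1"
}

# Usage sketch: report how often the MemoryHigh and MemoryMax boundaries were hit.
events=/sys/fs/cgroup/system.slice/testapp.service/memory.events
if [ -r "$events" ]; then
    echo "MemoryHigh breaches: $(cgroup_event_count "$events" high)"
    echo "MemoryMax hits:      $(cgroup_event_count "$events" max)"
    echo "OOM kills:           $(cgroup_event_count "$events" oom_kill)"
fi
```

A rising high counter with no oom_kill entries suggests the service is being held back by MemoryHigh without yet reaching the hard limit.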

Verify your setup

Test the monitoring system and verify that resource limits are being enforced correctly.

# Check service status
sudo systemctl status testapp.service

# View real-time resource usage
check-cgroup-resources testapp

# Monitor system resources
sudo systemctl show testapp.service --property=CPUUsageNSec,MemoryCurrent,TasksCurrent

# Check monitoring logs
sudo tail -f /var/log/resource-monitor.log

# View timer status
sudo systemctl list-timers resource-monitor.timer

Note: The test application will consume CPU resources but should be limited by the 50% quota. Memory usage should remain below the 512MB limit. Check the logs for any alerts or resource violations.

Advanced monitoring configuration

Configure email notifications

Set up email alerts for critical resource threshold breaches. The monitoring script sends mail through the local mail command, so the host needs a working mail transfer agent such as postfix.

# Debian/Ubuntu: configure postfix for local mail delivery
sudo dpkg-reconfigure postfix

For production environments, integrate with your existing SMTP server or monitoring system like Prometheus and Grafana for more sophisticated alerting.
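
One low-effort route into Prometheus is the node_exporter textfile collector, which reads *.prom files from a directory you configure. The sketch below prints cgroup figures in the Prometheus text exposition format. The metric names, the emit_cgroup_metrics function, and the output path in the usage comment are all assumptions made for illustration, not an established convention.

```shell
# Hypothetical helper: print memory and task figures for one service
# in Prometheus text exposition format.
# Arguments: service name, memory bytes, task count.
emit_cgroup_metrics() {
    printf '# TYPE service_memory_bytes gauge\n'
    printf 'service_memory_bytes{service="%s"} %s\n' "$1" "$2"
    printf '# TYPE service_tasks gauge\n'
    printf 'service_tasks{service="%s"} %s\n' "$1" "$3"
}

# Usage sketch (the directory must match node_exporter's --collector.textfile.directory):
# cg=/sys/fs/cgroup/system.slice/testapp.service
# emit_cgroup_metrics testapp "$(cat "$cg/memory.current")" "$(cat "$cg/pids.current")" \
#     > /var/lib/node_exporter/textfile/testapp.prom
```

Once the values are scraped, alert thresholds and cooldowns can live in Prometheus alerting rules instead of the shell script.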

Create dashboard script

Build a simple terminal dashboard that shows several services at once. Save it as /usr/local/bin/resource-dashboard and adjust the SERVICES list to the units running on your host.

#!/bin/bash

SERVICES=("testapp" "nginx" "postgresql")

while true; do
    clear
    echo "=== System Resource Dashboard ==="
    echo "$(date)"
    echo
    
    for service in "${SERVICES[@]}"; do
        if systemctl is-active --quiet "$service" 2>/dev/null; then
            echo "[$service] - RUNNING"
            CGROUP_PATH=$(systemctl show "$service" --property=ControlGroup --value 2>/dev/null)
            if [ -n "$CGROUP_PATH" ]; then
                SYS_CGROUP_PATH="/sys/fs/cgroup$CGROUP_PATH"
                
                # Memory usage
                if [ -f "$SYS_CGROUP_PATH/memory.current" ]; then
                    CURRENT=$(cat "$SYS_CGROUP_PATH/memory.current")
                    echo "  Memory: $((CURRENT / 1024 / 1024)) MB"
                fi
                
                # Task count
                if [ -f "$SYS_CGROUP_PATH/pids.current" ]; then
                    TASKS=$(cat "$SYS_CGROUP_PATH/pids.current")
                    echo "  Tasks: $TASKS"
                fi
            fi
            echo
        else
            echo "[$service] - STOPPED"
            echo
        fi
    done
    
    echo "Press Ctrl+C to exit"
    sleep 5
done
sudo chmod 755 /usr/local/bin/resource-dashboard
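
The dashboard prints absolute memory only. To show usage relative to MemoryMax, the value of memory.max needs special handling because it holds the literal string max when no limit is set. A minimal sketch, with mem_usage_label as a hypothetical helper name:

```shell
# Hypothetical helper: format memory usage as "N MB (no limit)" or "N MB / M MB (P%)".
# Arguments: memory.current value, memory.max value (a byte count or the string "max").
mem_usage_label() {
    current_mb=$(( $1 / 1024 / 1024 ))
    case "$2" in
        max|'') echo "${current_mb} MB (no limit)" ;;
        *)      echo "${current_mb} MB / $(( $2 / 1024 / 1024 )) MB ($(( $1 * 100 / $2 ))%)" ;;
    esac
}

mem_usage_label 104857600 536870912   # prints: 100 MB / 512 MB (19%)
mem_usage_label 104857600 max         # prints: 100 MB (no limit)
```

Inside the dashboard loop, the helper could be called with the contents of memory.current and memory.max in place of the plain Memory line.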

Production deployment considerations

Warning: Always test resource limits in a staging environment before applying to production services. Overly restrictive limits can cause service failures or performance degradation.

For production deployments, consider these additional configurations:

  • Integrate monitoring with your existing observability stack
  • Use more sophisticated alerting rules based on historical usage patterns
  • Implement gradual limit enforcement with warning thresholds
  • Set up log rotation for monitoring logs to prevent disk space issues
  • Configure backup notification channels beyond email
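
For the log rotation item, a logrotate drop-in is usually enough. The sketch below writes one for /var/log/resource-monitor.log. The weekly schedule, the four-file retention, and the write_logrotate_conf helper are illustrative assumptions to adapt to your retention policy.

```shell
# Hypothetical helper: write a logrotate drop-in for the monitoring log to the given path.
write_logrotate_conf() {
    cat > "$1" << 'EOF'
/var/log/resource-monitor.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
EOF
}

# Usage sketch (requires root):
# write_logrotate_conf /etc/logrotate.d/resource-monitor
```

The monitoring script reopens the log on every run, so rotation needs no extra signal or restart step.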

You can extend this monitoring approach to work with container orchestration systems or integrate with CPU scheduling optimization for more comprehensive resource management.

Common issues

Symptom | Cause | Fix
Monitoring script shows "Cgroup path not found" | Service not using cgroups v2 or not managed by systemd | Verify the service is managed by systemd: systemctl show service --property=ControlGroup
CPU limits not enforced | Unit changes not reloaded, or CPU accounting disabled | Keep CPUAccounting=true in the service unit, then run sudo systemctl daemon-reload and restart the service
Memory alerts not working | Memory accounting disabled | Enable with MemoryAccounting=true in the service unit
Email alerts not sending | Mail system not configured | Install and configure postfix: sudo apt install postfix mailutils
Timer not running monitoring | Timer not enabled | Enable the timer: sudo systemctl enable --now resource-monitor.timer
Resource limits too restrictive | Insufficient resources allocated | Analyze usage patterns and adjust limits in the service unit
