Implement Linux resource quotas with systemd and automated enforcement

Intermediate 45 min Apr 17, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Configure systemd resource control and cgroups v2 to implement CPU, memory, and I/O quotas with automated enforcement. Set up monitoring and alerts for resource violations across production workloads.

Prerequisites

  • Root or sudo access
  • systemd version 245 or newer
  • Basic understanding of Linux system administration
  • Python 3 for monitoring scripts

What this solves

Resource quotas prevent applications from consuming unlimited system resources, which can crash servers and affect other workloads. This tutorial shows you how to implement comprehensive resource limits using systemd and cgroups v2, with automated enforcement and monitoring to maintain system stability in production environments.

Prerequisites and system setup

Verify cgroups v2 support

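Modern Linux distributions use cgroups v2 by default. Verify that your system mounts the unified hierarchy: the first command should show a cgroup2 filesystem on /sys/fs/cgroup, and the second should list the available controllers (for example cpu, memory, io, and pids).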

mount | grep cgroup2
cat /sys/fs/cgroup/cgroup.controllers
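
As a rough sketch of the same check in script form, the helper below reports which controllers are missing from the contents of cgroup.controllers. The function name `missing_controllers` and the required set (cpu, memory, io, pids) are assumptions chosen to match the limits used later in this tutorial, not part of any systemd tooling.

```python
from pathlib import Path

# Controllers this tutorial relies on (an assumption; adjust to your needs).
REQUIRED = ("cpu", "memory", "io", "pids")

def missing_controllers(controllers_text, required=REQUIRED):
    """Return the required controllers absent from cgroup.controllers content."""
    available = set(controllers_text.split())
    return [name for name in required if name not in available]

if __name__ == "__main__":
    path = Path("/sys/fs/cgroup/cgroup.controllers")
    if path.exists():
        print("missing:", missing_controllers(path.read_text()))
    else:
        print("cgroup v2 unified hierarchy not found at /sys/fs/cgroup")
```

An empty list means every controller needed for the CPU, memory, I/O, and task limits below is available at the root of the hierarchy.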

Update system packages

Ensure you have the latest systemd and resource management tools.

# Ubuntu / Debian
sudo apt update && sudo apt upgrade -y
sudo apt install -y systemd-cron cgroup-tools htop

# AlmaLinux / Rocky Linux
sudo dnf update -y
sudo dnf install -y systemd libcgroup-tools htop

Enable resource accounting

Configure systemd to track resource usage for all services.

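# /etc/systemd/system.conf
[Manager]
DefaultCPUAccounting=yes
DefaultMemoryAccounting=yes
# On cgroups v2 the I/O accounting switch is DefaultIOAccounting
# (DefaultBlockIOAccounting is the deprecated cgroups v1 name)
DefaultIOAccounting=yes
DefaultIPAccounting=yes

sudo systemctl daemon-reload
sudo systemctl daemon-reexec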
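
To double-check what was actually written to /etc/systemd/system.conf, a small parser can list every accounting switch found in the [Manager] section. This is a minimal sketch using Python's standard `configparser`; the helper name `accounting_settings` is hypothetical, and it only inspects the text it is given, so drop-in files under system.conf.d are not considered.

```python
import configparser

def accounting_settings(conf_text):
    """Map each Default*Accounting key in [Manager] to True/False."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep systemd's case-sensitive key names
    parser.read_string(conf_text)
    if not parser.has_section("Manager"):
        return {}
    truthy = {"yes", "true", "on", "1"}
    return {
        key: value.strip().lower() in truthy
        for key, value in parser["Manager"].items()
        if key.startswith("Default") and key.endswith("Accounting")
    }

if __name__ == "__main__":
    sample = "[Manager]\nDefaultCPUAccounting=yes\nDefaultMemoryAccounting=no\n"
    print(accounting_settings(sample))
```

Lines that are still commented out with `#` are ignored, which makes it easy to spot a switch that was never enabled.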

Configure CPU quotas and limits

Create CPU-limited service unit

Configure a service with CPU percentage and quota limits using systemd unit file directives.

[Unit]
Description=Web Application with CPU Limits
After=network.target

[Service]
Type=simple
User=webapp
Group=webapp
ExecStart=/opt/webapp/bin/server
Restart=always

# CPU limits
CPUQuota=50%
CPUWeight=100
CPUAccounting=yes

# Memory limits
MemoryMax=512M
MemoryHigh=400M
MemoryAccounting=yes

[Install]
WantedBy=multi-user.target
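
To illustrate what a CPUQuota= percentage means at the kernel level: on cgroups v2 the quota is written to the unit's cpu.max file as "<quota> <period>" in microseconds. The sketch below assumes systemd's default period of 100 ms (that is, CPUQuotaPeriodSec= left unset); the helper name `cpu_max_for_quota` is hypothetical.

```python
def cpu_max_for_quota(quota_percent, period_us=100_000):
    """Return the cpu.max content expected for a CPUQuota= percentage."""
    if quota_percent <= 0:
        raise ValueError("CPUQuota must be a positive percentage")
    # The quota is the share of each period the cgroup may run, in microseconds.
    quota_us = round(period_us * quota_percent / 100)
    return f"{quota_us} {period_us}"

if __name__ == "__main__":
    print(cpu_max_for_quota(50))   # half of one CPU
    print(cpu_max_for_quota(200))  # two full CPUs
```

Values above 100% are valid and simply allow the unit to use more than one CPU's worth of time per period.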

Configure slice-based resource limits

Create a custom slice to group related services with shared resource limits.

[Unit]
Description=Web Services Resource Slice
Before=slices.target

[Slice]

# Limit entire slice to 2 CPU cores
CPUQuota=200%
CPUWeight=200

# Limit slice memory to 2GB
MemoryMax=2G
MemoryHigh=1.6G

# I/O bandwidth limits
IOWeight=200
IOReadBandwidthMax=/dev/sda 50M
IOWriteBandwidthMax=/dev/sda 30M

Apply slice to services

Modify existing services to run within the resource-limited slice.

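# Create the drop-in directory first
sudo mkdir -p /etc/systemd/system/nginx.service.d

# Drop-in file placed inside /etc/systemd/system/nginx.service.d/
[Service]
Slice=webservices.slice

# Reload systemd and restart the service so it moves into the slice
sudo systemctl daemon-reload
sudo systemctl restart nginx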
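
One way to confirm that a restarted service landed in the slice is to look at the cgroup entry of its main process in /proc/<pid>/cgroup, which on cgroups v2 is a single line of the form "0::/<path>". The sketch below parses such text; `slice_of` is a hypothetical helper name, and the sample paths are illustrative.

```python
def slice_of(proc_cgroup_text):
    """Return the innermost *.slice component of a cgroup v2 path, or None."""
    for line in proc_cgroup_text.splitlines():
        hierarchy, _, rest = line.partition(":")
        controllers, _, path = rest.partition(":")
        # The unified (v2) hierarchy has ID 0 and an empty controller list.
        if hierarchy == "0" and controllers == "":
            slices = [part for part in path.split("/") if part.endswith(".slice")]
            return slices[-1] if slices else None
    return None

if __name__ == "__main__":
    print(slice_of("0::/webservices.slice/nginx.service"))
```

If the result is still system.slice after the restart, the drop-in was most likely not picked up by the daemon-reload.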

Implement memory quotas and OOM protection

Configure memory limits with graceful handling

Set memory limits with proper high watermarks to prevent sudden OOM kills.

[Unit]
Description=Database Service with Memory Limits
After=network.target

[Service]
Type=forking
User=postgres
Group=postgres
ExecStart=/usr/bin/pg_ctl start -D /var/lib/postgresql/data
ExecStop=/usr/bin/pg_ctl stop -D /var/lib/postgresql/data

# Memory management
MemoryMax=1G
MemoryHigh=800M
MemorySwapMax=200M
MemoryAccounting=yes

# OOM behavior
OOMPolicy=continue
OOMScoreAdjust=-100

[Install]
WantedBy=multi-user.target
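
MemoryHigh is the point at which the kernel starts throttling and reclaiming, while MemoryMax is the hard cap that triggers the OOM killer, so the gap between the two is the headroom a service has to recover. The sketch below computes that ratio for the values above, assuming the K/M/G/T suffixes are powers of 1024 as systemd documents for memory settings; `parse_size` and `high_to_max_ratio` are hypothetical helper names.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def parse_size(text):
    """Convert a size such as '800M' or '1G' to bytes (base 1024)."""
    text = text.strip()
    if text and text[-1].upper() in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1].upper()])
    return int(text)

def high_to_max_ratio(memory_high, memory_max):
    """Return MemoryHigh as a fraction of MemoryMax."""
    return parse_size(memory_high) / parse_size(memory_max)

if __name__ == "__main__":
    ratio = high_to_max_ratio("800M", "1G")
    print(f"MemoryHigh is {ratio:.1%} of MemoryMax")
```

A ratio very close to 1.0 leaves little room for throttling to take effect before the hard limit is reached.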

Configure user session limits

Apply resource limits to user sessions to prevent runaway processes. This builds on our user session limits tutorial.

[Slice]

# Per-user CPU limits
CPUQuota=100%
CPUWeight=100

# Per-user memory limits
MemoryMax=2G
MemoryHigh=1.6G

# Process limits
TasksMax=500

Set up disk I/O and network bandwidth controls

Configure I/O bandwidth limits

Limit disk I/O to prevent storage bottlenecks affecting other services.

[Unit]
Description=Backup Service with I/O Limits
After=multi-user.target

[Service]
Type=oneshot
User=backup
ExecStart=/opt/backup/scripts/daily-backup.sh

# I/O limits
IOWeight=50
IOReadBandwidthMax=/dev/sda 20M
IOWriteBandwidthMax=/dev/sda 10M
IOReadIOPSMax=/dev/sda 1000
IOWriteIOPSMax=/dev/sda 500

[Install]
WantedBy=multi-user.target
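
On cgroups v2 these settings end up in the unit's io.max file, one line per device keyed by major:minor number, with rbps, wbps, riops, and wiops fields. The following sketch parses that format so the applied limits can be compared with the unit file; `parse_io_max` is a hypothetical helper, and the device number and byte values in the sample are illustrative rather than what systemd necessarily writes for the settings above.

```python
def parse_io_max(text):
    """Parse io.max content into {device: {key: int or None}}; None means 'max'."""
    limits = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        device, settings = fields[0], fields[1:]
        limits[device] = {}
        for item in settings:
            key, _, value = item.partition("=")
            limits[device][key] = None if value == "max" else int(value)
    return limits

if __name__ == "__main__":
    sample = "8:0 rbps=20000000 wbps=10000000 riops=1000 wiops=500"
    print(parse_io_max(sample))
```

An empty result for a unit that declares I/O limits usually means the io controller is not enabled for that part of the hierarchy.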

Implement network bandwidth controls

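Use traffic control (tc) with systemd integration for network bandwidth limits. Note that the tc cgroup classifier matches on the net_cls class ID, which is a cgroups v1 controller, so on a host running only the unified hierarchy the filter at the end of the script may not classify any traffic without additional setup.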

[Unit]
Description=Network Bandwidth Limits
After=network.target
Before=multi-user.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/opt/scripts/setup-network-limits.sh
ExecStop=/opt/scripts/cleanup-network-limits.sh

[Install]
WantedBy=multi-user.target

#!/bin/bash
# /opt/scripts/setup-network-limits.sh

# Create traffic control hierarchy
tc qdisc add dev eth0 root handle 1: htb default 30
tc class add dev eth0 parent 1: classid 1:1 htb rate 1gbit

# High priority class (50% bandwidth)
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 500mbit ceil 1gbit prio 1

# Normal priority class (30% bandwidth)
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 300mbit ceil 800mbit prio 2

# Low priority class (20% bandwidth)
tc class add dev eth0 parent 1:1 classid 1:30 htb rate 200mbit ceil 400mbit prio 3

# Add filter for cgroup integration
tc filter add dev eth0 parent 1: protocol ip prio 10 handle 1: cgroup

# Create the script directory before saving the script, then make it executable
sudo mkdir -p /opt/scripts
sudo chmod +x /opt/scripts/setup-network-limits.sh
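
HTB child classes may borrow bandwidth up to their ceil, but their guaranteed rate values should not add up to more than the parent's rate. The sketch below checks that arithmetic for the three classes in the script; `parse_rate` and `oversubscribed` are hypothetical helpers, and only the kbit, mbit, and gbit suffixes (decimal multiples of bits per second) are handled.

```python
RATE_UNITS = {"kbit": 1_000, "mbit": 1_000_000, "gbit": 1_000_000_000}

def parse_rate(text):
    """Convert a tc rate such as '500mbit' to bits per second."""
    for suffix, factor in RATE_UNITS.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    raise ValueError(f"unsupported rate unit: {text}")

def oversubscribed(parent_rate, child_rates):
    """Return True if the children's guaranteed rates exceed the parent's rate."""
    return sum(parse_rate(rate) for rate in child_rates) > parse_rate(parent_rate)

if __name__ == "__main__":
    print(oversubscribed("1gbit", ["500mbit", "300mbit", "200mbit"]))
```

With the 50/30/20 split used here the guaranteed rates add up to exactly the 1gbit parent rate, so the check passes.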

Automated policy enforcement and monitoring

Create resource monitoring script

Implement automated monitoring that checks resource usage and enforces policies.

#!/usr/bin/env python3
import subprocess
import json
import logging
import smtplib
from email.mime.text import MIMEText
from datetime import datetime

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

def get_cgroup_stats(service_name):
    """Get resource usage statistics for a systemd service"""
    try:
        cmd = f"systemctl show {service_name} --property=CPUUsageNSec,MemoryCurrent,TasksCurrent"
        result = subprocess.run(cmd.split(), capture_output=True, text=True)
        
        stats = {}
        for line in result.stdout.strip().split('\n'):
            if '=' in line:
                key, value = line.split('=', 1)
                stats[key] = value
        return stats
    except Exception as e:
        logging.error(f"Failed to get stats for {service_name}: {e}")
        return None

def check_resource_violations():
    """Check for services violating resource limits"""
    services = ['nginx', 'webapp', 'database']
    violations = []
    
    for service in services:
        stats = get_cgroup_stats(service)
        if not stats:
            continue
            
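        # systemctl prints "[not set]" when accounting data is unavailable
        memory_raw = stats.get('MemoryCurrent', '0')
        tasks_raw = stats.get('TasksCurrent', '0')
        memory_current = int(memory_raw) if memory_raw.isdigit() else 0
        tasks_current = int(tasks_raw) if tasks_raw.isdigit() else 0
        
        # Check memory against a fixed 900 MiB threshold (about 90% of a 1G limit)
        if memory_current > 900 * 1024 * 1024:
            violations.append(f"{service}: Memory usage {memory_current // 1024 // 1024}MB")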
            
        # Check process count threshold
        if tasks_current > 450:  # 90% of 500 task limit
            violations.append(f"{service}: Task count {tasks_current}")
    
    return violations

def send_alert(violations):
    """Send email alert for resource violations"""
    if not violations:
        return
        
    message = "\n".join(violations)
    msg = MIMEText(f"Resource violations detected:\n\n{message}")
    msg['Subject'] = 'Resource Quota Violations Detected'
    msg['From'] = 'monitor@example.com'
    msg['To'] = 'admin@example.com'
    
    try:
        with smtplib.SMTP('localhost') as server:
            server.send_message(msg)
        logging.info("Alert sent successfully")
    except Exception as e:
        logging.error(f"Failed to send alert: {e}")

if __name__ == '__main__':
    violations = check_resource_violations()
    if violations:
        logging.warning(f"Resource violations: {violations}")
        send_alert(violations)
    else:
        logging.info("All services within resource limits")

sudo chmod +x /opt/scripts/resource-monitor.py
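
The monitoring script compares usage against a fixed byte count, which only matches services whose limit is about 1G. A possible refinement, sketched below under the assumption that MemoryMax is fetched with the same `systemctl show` call as MemoryCurrent, is to compare usage with each unit's own limit. The helper names are hypothetical; the strings "infinity" and "[not set]" are what systemctl reports for an unlimited unit and for missing accounting data.

```python
def usage_fraction(memory_current, memory_max):
    """Return usage as a fraction of the limit, or None when it cannot be computed."""
    if not memory_current.isdigit() or not memory_max.isdigit():
        return None  # e.g. "[not set]" or "infinity"
    limit = int(memory_max)
    return int(memory_current) / limit if limit > 0 else None

def is_violation(memory_current, memory_max, threshold=0.9):
    """Return True when usage exceeds the given fraction of the unit's limit."""
    fraction = usage_fraction(memory_current, memory_max)
    return fraction is not None and fraction > threshold

if __name__ == "__main__":
    print(is_violation("1000000000", "1073741824"))
```

Units without a limit never count as violations here, which avoids false alerts for services that were deliberately left unrestricted.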

Create automated enforcement service

Set up a systemd timer to run monitoring and enforcement actions.

# resource-monitor.service
[Unit]
Description=Resource Quota Monitor
After=multi-user.target

[Service]
Type=oneshot
User=root
ExecStart=/opt/scripts/resource-monitor.py
StandardOutput=journal
StandardError=journal

# resource-monitor.timer
[Unit]
Description=Run Resource Monitor Every 5 Minutes
Requires=resource-monitor.service

[Timer]
OnBootSec=5min
OnUnitActiveSec=5min
Persistent=true

[Install]
WantedBy=timers.target

sudo systemctl daemon-reload
sudo systemctl enable --now resource-monitor.timer

Configure resource usage alerts

Set up integration with monitoring systems for proactive alerts. This complements our memory cgroups tutorial.

#!/usr/bin/env python3
import time
import subprocess
from http.server import HTTPServer, BaseHTTPRequestHandler

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/metrics':
            metrics = self.get_systemd_metrics()
            self.send_response(200)
            self.send_header('Content-Type', 'text/plain')
            self.end_headers()
            self.wfile.write(metrics.encode())
        else:
            self.send_error(404)
    
    def get_systemd_metrics(self):
        services = ['nginx', 'webapp', 'database']
        metrics = []
        
        for service in services:
            try:
                cmd = f"systemctl show {service} --property=CPUUsageNSec,MemoryCurrent,TasksCurrent"
                result = subprocess.run(cmd.split(), capture_output=True, text=True)
                
                for line in result.stdout.strip().split('\n'):
                    if '=' in line:
                        key, value = line.split('=', 1)
                        # Skip values such as "[not set]" that are not valid samples
                        if not value.isdigit():
                            continue
                        if key == 'MemoryCurrent':
                            metrics.append(f'systemd_memory_bytes{{service="{service}"}} {value}')
                        elif key == 'TasksCurrent':
                            metrics.append(f'systemd_tasks_count{{service="{service}"}} {value}')
            except Exception as e:
                continue
                
        return '\n'.join(metrics) + '\n'

if __name__ == '__main__':
    server = HTTPServer(('localhost', 9100), MetricsHandler)
    server.serve_forever()
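
The exporter requests CPUUsageNSec but does not publish it. Because that property is a cumulative counter in nanoseconds, a usage percentage has to be derived from two samples. The sketch below shows the arithmetic; `cpu_percent` is a hypothetical helper, and how the two samples are collected is left open.

```python
def cpu_percent(prev_nsec, curr_nsec, interval_sec):
    """Return CPU usage between two CPUUsageNSec samples as a percentage of one CPU."""
    if interval_sec <= 0:
        raise ValueError("interval must be positive")
    delta = curr_nsec - prev_nsec
    if delta < 0:
        return None  # counter went backwards, e.g. the service restarted
    return delta / (interval_sec * 1_000_000_000) * 100

if __name__ == "__main__":
    # One second of CPU time consumed over a two-second window
    print(cpu_percent(0, 1_000_000_000, 2))
```

The result uses the same scale as CPUQuota=, so a service limited to 50% should hover at or below 50 here.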

Verify your setup

Check systemd resource accounting

Verify that resource accounting is active for your services.

sudo systemctl status webapp
sudo systemctl show webapp --property=CPUUsageNSec,MemoryCurrent,TasksCurrent
sudo systemd-cgtop

Test resource limits

Verify that configured limits are enforced by checking cgroup hierarchies.

cat /sys/fs/cgroup/system.slice/webapp.service/memory.max
cat /sys/fs/cgroup/system.slice/webapp.service/cpu.max
sudo systemctl show webservices.slice --property=MemoryMax,CPUQuotaPerSecUSec
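
The raw cgroup files can be hard to read at a glance: memory.max holds a byte count or the word "max", and cpu.max holds "<quota> <period>" in microseconds or "max <period>". A minimal sketch for turning both into readable values follows; the helper names are hypothetical.

```python
def parse_cpu_max(text):
    """Return the CPU limit as a percentage of one CPU, or None if unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period) * 100

def parse_memory_max(text):
    """Return the memory limit in bytes, or None if unlimited."""
    value = text.strip()
    return None if value == "max" else int(value)

if __name__ == "__main__":
    print(parse_cpu_max("50000 100000"))
    print(parse_memory_max("536870912"))
```

For the webapp unit defined earlier (CPUQuota=50%, MemoryMax=512M), the two files would be expected to decode to 50.0 and 536870912 bytes.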

Monitor enforcement actions

Check that monitoring and alerts are functioning correctly.

sudo systemctl status resource-monitor.timer
sudo journalctl -u resource-monitor.service -f
sudo /opt/scripts/resource-monitor.py

Common issues

Symptom | Cause | Fix
Services killed by OOM | Memory limits too restrictive | Increase MemoryMax or tune MemoryHigh thresholds
CPU throttling affecting performance | CPU quota too low | Adjust CPUQuota or use CPUWeight for relative limits
Resource accounting not working | Missing accounting configuration | Enable accounting in /etc/systemd/system.conf
Slice limits not applied | Services not assigned to slice | Add Slice= directive to service unit files
Network limits not working | Traffic control not configured | Verify tc qdisc and cgroup integration
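
When diagnosing the OOM row above, the memory.events file in the unit's cgroup directory records how often each memory limit was hit, one "name count" pair per line. The sketch below parses that format and suggests a next step; the helper names and the wording of the suggestions are assumptions, not systemd output.

```python
def parse_memory_events(text):
    """Parse memory.events content into a {name: count} dictionary."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            events[parts[0]] = int(parts[1])
    return events

def diagnose(events):
    """Suggest a next step based on the recorded memory events."""
    if events.get("oom_kill", 0) > 0:
        return "processes were OOM-killed: raise MemoryMax or reduce usage"
    if events.get("high", 0) > 0:
        return "MemoryHigh throttling occurred: the service is under memory pressure"
    return "no memory limit events recorded"

if __name__ == "__main__":
    sample = "low 0\nhigh 12\nmax 3\noom 1\noom_kill 1\n"
    print(diagnose(parse_memory_events(sample)))
```

A growing high count with no oom_kill entries suggests MemoryHigh is doing its job and the hard limit is not yet being reached.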
