Linux Resource Quotas with systemd and cgroups v2

Configure systemd resource control and cgroups v2 to implement CPU, memory, and I/O quotas with automated enforcement. Set up monitoring and alerts for resource violations across production workloads.

Prerequisites

Root or sudo access
systemd version 245 or newer
Basic understanding of Linux system administration
Python 3 for monitoring scripts

What this solves

Resource quotas prevent applications from consuming unlimited system resources, which can crash servers and affect other workloads. This tutorial shows you how to implement comprehensive resource limits using systemd and cgroups v2, with automated enforcement and monitoring to maintain system stability in production environments.

Prerequisites and system setup

Verify cgroups v2 support

Modern Linux distributions use cgroups v2 by default. Verify your system supports the unified hierarchy.

mount | grep cgroup2
cat /sys/fs/cgroup/cgroup.controllers
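
The same check can be scripted. The following is a minimal sketch, assuming the space-separated format that `cgroup.controllers` uses; the helper names and the `REQUIRED` set are illustrative choices for this tutorial, not part of systemd.

```python
from pathlib import Path

# Controllers this tutorial relies on (an assumption made for illustration).
REQUIRED = {"cpu", "memory", "io", "pids"}

def missing_controllers(controllers_text, required=REQUIRED):
    """Return the required controllers absent from a cgroup.controllers listing."""
    available = set(controllers_text.split())
    return sorted(set(required) - available)

def read_controllers(path="/sys/fs/cgroup/cgroup.controllers"):
    """Read the controller list, or return None if the file cannot be read."""
    try:
        return Path(path).read_text()
    except OSError:
        return None

# Hand-written sample listings, not output captured from a real host.
print(missing_controllers("cpuset cpu io memory hugetlb pids"))  # []
print(missing_controllers("cpuset cpu pids"))                    # ['io', 'memory']
```

On a real host, pass the result of `read_controllers()` to `missing_controllers()`; a `None` result suggests the unified hierarchy is not mounted at `/sys/fs/cgroup`.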

Update system packages

Ensure you have the latest systemd and resource management tools.

# Debian/Ubuntu
sudo apt update && sudo apt upgrade -y
sudo apt install -y systemd-cron cgroup-tools htop

# RHEL/Fedora and derivatives
sudo dnf update -y
sudo dnf install -y systemd libcgroup-tools htop

Enable resource accounting

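Configure systemd to track resource usage for all services by setting the accounting defaults in /etc/systemd/system.conf. On cgroups v2 the I/O setting is DefaultIOAccounting; the older BlockIO name belongs to the legacy cgroups v1 controller.

[Manager]
DefaultCPUAccounting=yes
DefaultMemoryAccounting=yes
DefaultIOAccounting=yes
DefaultIPAccounting=yes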

sudo systemctl daemon-reload
sudo systemctl daemon-reexec
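
To confirm which accounting defaults a configuration file actually sets, a small parser can help. This is only a sketch: it assumes the file is close enough to INI syntax for Python's `configparser` to read, and the function name `accounting_settings` is hypothetical.

```python
import configparser

def accounting_settings(conf_text):
    """Map each *Accounting key in the [Manager] section to a boolean."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep systemd's case-sensitive key names
    parser.read_string(conf_text)
    if not parser.has_section("Manager"):
        return {}
    return {
        key: parser.getboolean("Manager", key)
        for key in parser.options("Manager")
        if key.endswith("Accounting")
    }

# Hand-written sample; commented-out lines are ignored, as systemd does.
sample = """
[Manager]
DefaultCPUAccounting=yes
DefaultMemoryAccounting=yes
#DefaultIPAccounting=yes
"""
print(accounting_settings(sample))
```

Keys that are commented out or missing do not appear in the result, which makes it easy to spot an accounting default that was never enabled.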

Configure CPU quotas and limits

Create CPU-limited service unit

Define a service unit that caps CPU time with CPUQuota= and sets its relative share under contention with CPUWeight=.

[Unit]
Description=Web Application with CPU Limits
After=network.target

[Service]
Type=simple
User=webapp
Group=webapp
ExecStart=/opt/webapp/bin/server
Restart=always

# CPU limits
CPUQuota=50%
CPUWeight=100
CPUAccounting=yes

# Memory limits
MemoryMax=512M
MemoryHigh=400M
MemoryAccounting=yes

[Install]
WantedBy=multi-user.target
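
It can help to see what a CPUQuota= percentage means in kernel terms. The sketch below converts a percentage into the `<quota> <period>` pair stored in the cgroup's `cpu.max` file. It assumes systemd's default period of 100 ms; the function name is illustrative.

```python
def cpu_quota_to_cpu_max(quota_percent, period_usec=100_000):
    """Translate a CPUQuota= percentage into the "<quota> <period>" pair of cpu.max."""
    if quota_percent <= 0:
        raise ValueError("CPUQuota must be a positive percentage")
    # The quota is the CPU time, in microseconds, allowed per period.
    quota_usec = round(period_usec * quota_percent / 100)
    return f"{quota_usec} {period_usec}"

print(cpu_quota_to_cpu_max(50))   # 50000 100000 (half of one core)
print(cpu_quota_to_cpu_max(200))  # 200000 100000 (two full cores)
```

A quota above 100% is valid on multi-core machines: it grants more than one core's worth of CPU time per period.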

Configure slice-based resource limits

Create a custom slice, webservices.slice, to group related services under shared resource limits. The limits apply to the combined usage of every unit placed in the slice.

[Unit]
Description=Web Services Resource Slice
Before=slices.target

[Slice]
# Limit entire slice to 2 CPU cores
CPUQuota=200%
CPUWeight=200

# Limit slice memory to 2GB
MemoryMax=2G
MemoryHigh=1.6G

# I/O bandwidth limits
IOWeight=200
IOReadBandwidthMax=/dev/sda 50M
IOWriteBandwidthMax=/dev/sda 30M

Apply slice to services

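Modify existing services to run within the resource-limited slice. For nginx, place the [Service] override below in a drop-in file under /etc/systemd/system/nginx.service.d/, then reload systemd and restart the service.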

[Service]
Slice=webservices.slice

sudo mkdir -p /etc/systemd/system/nginx.service.d
sudo systemctl daemon-reload
sudo systemctl restart nginx
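
After the restart, the service's cgroup directory moves under the slice. The sketch below shows how a slice name maps to a directory, assuming the unified hierarchy is mounted at `/sys/fs/cgroup`; dashes in a slice name encode nesting. The function names are illustrative, and unit names that need escaping are not handled.

```python
from pathlib import PurePosixPath

def slice_cgroup_path(slice_name, root="/sys/fs/cgroup"):
    """Expand a slice name into its cgroup directory; dashes encode nesting."""
    if not slice_name.endswith(".slice"):
        raise ValueError("expected a name ending in .slice")
    path = PurePosixPath(root)
    if slice_name == "-.slice":  # the root slice maps to the hierarchy root
        return path
    parts = slice_name[: -len(".slice")].split("-")
    for depth in range(1, len(parts) + 1):
        path /= "-".join(parts[:depth]) + ".slice"
    return path

def unit_cgroup_path(unit, slice_name="system.slice"):
    """Return the cgroup directory of a unit running inside the given slice."""
    return slice_cgroup_path(slice_name) / unit

print(unit_cgroup_path("nginx.service", "webservices.slice"))
# /sys/fs/cgroup/webservices.slice/nginx.service
print(slice_cgroup_path("user-1000.slice"))
# /sys/fs/cgroup/user.slice/user-1000.slice
```

If nginx still appears under `system.slice` after the restart, the drop-in was probably not picked up.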

Implement memory quotas and OOM protection

Configure memory limits with graceful handling

Set MemoryHigh below MemoryMax so the kernel throttles the service and reclaims memory before the hard limit triggers an OOM kill.

[Unit]
Description=Database Service with Memory Limits
After=network.target

[Service]
Type=forking
User=postgres
Group=postgres
ExecStart=/usr/bin/pg_ctl start -D /var/lib/postgresql/data
ExecStop=/usr/bin/pg_ctl stop -D /var/lib/postgresql/data

# Memory management
MemoryMax=1G
MemoryHigh=800M
MemorySwapMax=200M
MemoryAccounting=yes

# OOM behavior
OOMPolicy=continue
OOMScoreAdjust=-100

[Install]
WantedBy=multi-user.target
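
To reason about how much headroom sits between the soft and hard limits, the size strings can be converted to bytes. This is a sketch, assuming the base-1024 interpretation systemd documents for memory sizes; `parse_memory_size` and `high_to_max_ratio` are hypothetical helpers.

```python
from fractions import Fraction

# systemd interprets K, M, G, T on memory settings as powers of 1024.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def parse_memory_size(value):
    """Convert a size such as "800M" or "1G" to bytes; "infinity" means no limit."""
    text = value.strip()
    if text == "infinity":
        return None
    multiplier = 1
    if text and text[-1] in _UNITS:
        multiplier = _UNITS[text[-1]]
        text = text[:-1]
    # Fraction keeps fractional values exact until the final truncation.
    return int(Fraction(text) * multiplier)

def high_to_max_ratio(memory_high, memory_max):
    """Fraction of the hard limit at which throttling starts."""
    return parse_memory_size(memory_high) / parse_memory_size(memory_max)

print(parse_memory_size("1G"))            # 1073741824
print(high_to_max_ratio("800M", "1G"))    # 0.78125
```

With the values in the unit above, throttling begins at roughly 78% of the hard limit, leaving about 224 MiB for the kernel to reclaim memory before an OOM kill.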

Configure user session limits

Apply resource limits to user sessions to prevent runaway processes. This builds on our user session limits tutorial.

[Slice]
# Per-user CPU limits
CPUQuota=100%
CPUWeight=100

# Per-user memory limits
MemoryMax=2G
MemoryHigh=1.6G

# Process limits
TasksMax=500

Set up disk I/O and network bandwidth controls

Configure I/O bandwidth limits

Limit disk I/O to prevent storage bottlenecks affecting other services.

[Unit]
Description=Backup Service with I/O Limits
After=multi-user.target

[Service]
Type=oneshot
User=backup
ExecStart=/opt/backup/scripts/daily-backup.sh

# I/O limits
IOWeight=50
IOReadBandwidthMax=/dev/sda 20M
IOWriteBandwidthMax=/dev/sda 10M
IOReadIOPSMax=/dev/sda 1000
IOWriteIOPSMax=/dev/sda 500

[Install]
WantedBy=multi-user.target
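
The kernel stores these limits in the cgroup's `io.max` file, keyed by device number rather than device path. The sketch below parses that format. The sample line is hand-written for illustration: it assumes `/dev/sda` is device `8:0` and that bandwidth suffixes are base 1000, so verify both on your own system (for example with `lsblk`).

```python
def parse_io_max(text):
    """Parse io.max content into {"MAJ:MIN": {key: int or None}}; None means "max"."""
    limits = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        device, pairs = fields[0], fields[1:]
        limits[device] = {}
        for pair in pairs:
            key, _, raw = pair.partition("=")
            limits[device][key] = None if raw == "max" else int(raw)
    return limits

# Illustrative content, not captured from a real host.
sample = "8:0 rbps=20000000 wbps=10000000 riops=1000 wiops=500\n"
for device, values in parse_io_max(sample).items():
    print(device, values)
```

An empty `io.max` file means no limit reached the kernel, which usually points at a wrong device path or a missing `io` controller.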

Implement network bandwidth controls

Use traffic control (tc) with systemd integration for network bandwidth limits.

[Unit]
Description=Network Bandwidth Limits
After=network.target
Before=multi-user.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/opt/scripts/setup-network-limits.sh
ExecStop=/opt/scripts/cleanup-network-limits.sh

[Install]
WantedBy=multi-user.target

#!/bin/bash
# Create traffic control hierarchy
tc qdisc add dev eth0 root handle 1: htb default 30
tc class add dev eth0 parent 1: classid 1:1 htb rate 1gbit

# High priority class (50% bandwidth)
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 500mbit ceil 1gbit prio 1

# Normal priority class (30% bandwidth)
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 300mbit ceil 800mbit prio 2

# Low priority class (20% bandwidth)
tc class add dev eth0 parent 1:1 classid 1:30 htb rate 200mbit ceil 400mbit prio 3

# Add filters for cgroup integration
tc filter add dev eth0 parent 1: protocol ip prio 10 handle 1: cgroup

sudo mkdir -p /opt/scripts
sudo chmod +x /opt/scripts/setup-network-limits.sh
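
When you adjust the class rates, the guaranteed rates of the child classes should not add up to more than the parent's rate. The following sketch checks that for the rate strings used in the script; it only understands the `kbit`, `mbit`, and `gbit` suffixes, and the helper names are illustrative.

```python
# tc rate suffixes are decimal: 1 mbit = 1,000,000 bits per second.
RATE_UNITS = {"kbit": 1_000, "mbit": 1_000_000, "gbit": 1_000_000_000}

def rate_to_bps(rate):
    """Convert a tc rate string such as "500mbit" to bits per second."""
    for suffix, factor in RATE_UNITS.items():
        if rate.endswith(suffix):
            return int(float(rate[: -len(suffix)]) * factor)
    return int(rate)  # a bare number is already bits per second

def oversubscribed(parent_rate, child_rates):
    """Return by how many bits/s the child rates exceed the parent (0 if they fit)."""
    total = sum(rate_to_bps(rate) for rate in child_rates)
    return max(0, total - rate_to_bps(parent_rate))

print(oversubscribed("1gbit", ["500mbit", "300mbit", "200mbit"]))  # 0
```

The `ceil` values may exceed a class's share because they only describe how much idle bandwidth a class can borrow.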

Automated policy enforcement and monitoring

Create resource monitoring script

Implement automated monitoring that checks resource usage and enforces policies.

#!/usr/bin/env python3
import subprocess
import json
import logging
import smtplib
from email.mime.text import MIMEText
from datetime import datetime

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

def get_cgroup_stats(service_name):
    """Get resource usage statistics for a systemd service"""
    try:
        cmd = f"systemctl show {service_name} --property=CPUUsageNSec,MemoryCurrent,TasksCurrent"
        result = subprocess.run(cmd.split(), capture_output=True, text=True)
        
        stats = {}
        for line in result.stdout.strip().split('\n'):
            if '=' in line:
                key, value = line.split('=', 1)
                stats[key] = value
        return stats
    except Exception as e:
        logging.error(f"Failed to get stats for {service_name}: {e}")
        return None

def check_resource_violations():
    """Check for services violating resource limits"""
    services = ['nginx', 'webapp', 'database']
    violations = []
    
    for service in services:
        stats = get_cgroup_stats(service)
        if not stats:
            continue
            
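        try:
            memory_current = int(stats.get('MemoryCurrent', 0))
            tasks_current = int(stats.get('TasksCurrent', 0))
        except ValueError:
            # systemctl prints "[not set]" when accounting is off or the unit is inactive
            continue
        
        # Check memory threshold (about 90% of a 1G limit)
        if memory_current > 900 * 1024 * 1024:  # 900MB threshold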
            violations.append(f"{service}: Memory usage {memory_current // 1024 // 1024}MB")
            
        # Check process count threshold
        if tasks_current > 450:  # 90% of 500 task limit
            violations.append(f"{service}: Task count {tasks_current}")
    
    return violations

def send_alert(violations):
    """Send email alert for resource violations"""
    if not violations:
        return
        
    message = "\n".join(violations)
    msg = MIMEText(f"Resource violations detected:\n\n{message}")
    msg['Subject'] = 'Resource Quota Violations Detected'
    msg['From'] = 'monitor@example.com'
    msg['To'] = 'admin@example.com'
    
    try:
        with smtplib.SMTP('localhost') as server:
            server.send_message(msg)
        logging.info("Alert sent successfully")
    except Exception as e:
        logging.error(f"Failed to send alert: {e}")

if __name__ == '__main__':
    violations = check_resource_violations()
    if violations:
        logging.warning(f"Resource violations: {violations}")
        send_alert(violations)
    else:
        logging.info("All services within resource limits")

sudo chmod +x /opt/scripts/resource-monitor.py
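
Values reported by `systemctl show` are not always numeric, so the parsing step deserves some care. The sketch below is one way to handle it, assuming unset properties appear either as `[not set]` or, on some systemd versions, as the largest 64-bit integer. The function names and the 90% default are illustrative.

```python
UINT64_MAX = 2**64 - 1  # printed by some systemd versions for unset counters

def parse_show_output(text):
    """Parse KEY=VALUE lines; values that are unset or non-numeric become None."""
    stats = {}
    for line in text.splitlines():
        key, sep, raw = line.partition("=")
        if not sep:
            continue
        value = int(raw) if raw.isdigit() else None
        stats[key] = None if value == UINT64_MAX else value
    return stats

def over_threshold(current, limit, fraction=0.9):
    """True when usage exceeds the given fraction of the limit; unknown values never alert."""
    if current is None or limit is None:
        return False
    return current > limit * fraction

# Hand-written sample output for illustration.
sample = "CPUUsageNSec=1234567\nMemoryCurrent=[not set]\nTasksCurrent=7\n"
print(parse_show_output(sample))
```

Comparing against each service's own limit, rather than one fixed number, keeps the alert meaningful when services have different `MemoryMax` values.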

Create automated enforcement service

Set up a systemd timer to run the monitoring script every five minutes. The first unit below is resource-monitor.service and the second is resource-monitor.timer.

[Unit]
Description=Resource Quota Monitor
After=multi-user.target

[Service]
Type=oneshot
User=root
ExecStart=/opt/scripts/resource-monitor.py
StandardOutput=journal
StandardError=journal

[Unit]
Description=Run Resource Monitor Every 5 Minutes
Requires=resource-monitor.service

[Timer]
OnBootSec=5min
OnUnitActiveSec=5min
Persistent=true

[Install]
WantedBy=timers.target

sudo systemctl daemon-reload
sudo systemctl enable --now resource-monitor.timer

Configure resource usage alerts

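Expose per-service memory and task counts over HTTP so that an external monitoring system can scrape them and raise alerts before limits are hit. The script below serves them as plain text at /metrics on localhost port 9100. This complements our memory cgroups tutorial.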

#!/usr/bin/env python3
import time
import subprocess
from http.server import HTTPServer, BaseHTTPRequestHandler

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/metrics':
            metrics = self.get_systemd_metrics()
            self.send_response(200)
            self.send_header('Content-Type', 'text/plain')
            self.end_headers()
            self.wfile.write(metrics.encode())
        else:
            self.send_error(404)
    
    def get_systemd_metrics(self):
        services = ['nginx', 'webapp', 'database']
        metrics = []
        
        for service in services:
            try:
                cmd = f"systemctl show {service} --property=CPUUsageNSec,MemoryCurrent,TasksCurrent"
                result = subprocess.run(cmd.split(), capture_output=True, text=True)
                
                for line in result.stdout.strip().split('\n'):
                    if '=' in line:
                        key, value = line.split('=', 1)
                        if key == 'MemoryCurrent':
                            metrics.append(f'systemd_memory_bytes{{service="{service}"}} {value}')
                        elif key == 'TasksCurrent':
                            metrics.append(f'systemd_tasks_count{{service="{service}"}} {value}')
            except Exception as e:
                continue
                
        return '\n'.join(metrics) + '\n'

if __name__ == '__main__':
    server = HTTPServer(('localhost', 9100), MetricsHandler)
    server.serve_forever()
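
Scrapers that consume this kind of text output generally expect `# HELP` and `# TYPE` lines and escaped label values. The sketch below shows one way to render a metric family in that style, assuming the Prometheus text exposition format is the target; `format_gauge` and its arguments are hypothetical names.

```python
def escape_label(value):
    """Escape backslashes, quotes, and newlines inside a label value."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def format_gauge(name, help_text, samples):
    """Render one gauge family; samples is a list of (labels_dict, value) pairs."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_text = ",".join(
            f'{key}="{escape_label(val)}"' for key, val in sorted(labels.items())
        )
        lines.append(f"{name}{{{label_text}}} {value}")
    return "\n".join(lines) + "\n"

print(format_gauge(
    "systemd_memory_bytes",
    "Current memory use of the unit in bytes",
    [({"service": "nginx"}, 52428800), ({"service": "webapp"}, 104857600)],
))
```

Grouping all samples of one metric under a single header keeps the output valid when more services are added.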

Verify your setup

Check systemd resource accounting

Verify that resource accounting is active for your services.

sudo systemctl status webapp
sudo systemctl show webapp --property=CPUUsageNSec,MemoryCurrent,TasksCurrent
sudo systemd-cgtop

Test resource limits

Verify that configured limits are enforced by checking cgroup hierarchies.

cat /sys/fs/cgroup/system.slice/webapp.service/memory.max
cat /sys/fs/cgroup/system.slice/webapp.service/cpu.max
sudo systemctl show webservices.slice --property=MemoryMax,CPUQuotaPerSecUSec
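
The raw file contents take a moment to interpret. The sketch below decodes them, assuming the cgroups v2 formats: `cpu.max` holds `<quota> <period>` in microseconds, and both files use the word `max` for "no limit". The function names are illustrative.

```python
def parse_cpu_max(text):
    """Return the CPU quota as a percentage of one core, or None when unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period) * 100

def parse_memory_max(text):
    """Return the memory limit in bytes, or None when unlimited."""
    value = text.strip()
    return None if value == "max" else int(value)

# Hand-written sample contents for illustration.
print(parse_cpu_max("50000 100000\n"))    # 50.0
print(parse_cpu_max("max 100000\n"))      # None
print(parse_memory_max("536870912\n"))    # 536870912
```

A result of `None` for a unit that should be limited means the directive was not applied, for example because the unit was not restarted after the change.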

Monitor enforcement actions

Check that monitoring and alerts are functioning correctly.

sudo systemctl status resource-monitor.timer
sudo journalctl -u resource-monitor.service -f
sudo /opt/scripts/resource-monitor.py

Common issues

Symptom	Cause	Fix
Services killed by OOM	Memory limits too restrictive	Increase `MemoryMax` or tune `MemoryHigh` thresholds
CPU throttling affecting performance	CPU quota too low	Adjust `CPUQuota` or use `CPUWeight` for relative limits
Resource accounting not working	Missing accounting configuration	Enable accounting in `/etc/systemd/system.conf`
Slice limits not applied	Services not assigned to slice	Add `Slice=` directive to service unit files
Network limits not working	Traffic control not configured	Verify `tc qdisc` and cgroup integration
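
For the memory-related symptoms, the counters in a unit's `memory.events` file (for example under `/sys/fs/cgroup/system.slice/webapp.service/`) show whether the service was throttled or killed. The sketch below reads that format; the classification labels returned by `diagnose` are hypothetical names chosen for this example.

```python
def parse_memory_events(text):
    """Parse "name count" lines from memory.events into a dict of integers."""
    events = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1].isdigit():
            events[fields[0]] = int(fields[1])
    return events

def diagnose(events):
    """Classify the most severe memory event seen so far."""
    if events.get("oom_kill", 0) > 0:
        return "oom_kill"   # a process in the cgroup was killed
    if events.get("max", 0) > 0:
        return "hit_max"    # usage reached MemoryMax
    if events.get("high", 0) > 0:
        return "throttled"  # usage exceeded MemoryHigh
    return "ok"

# Hand-written sample contents for illustration.
sample = "low 0\nhigh 12\nmax 3\noom 1\noom_kill 1\n"
print(diagnose(parse_memory_events(sample)))  # oom_kill
```

A growing `high` count with no `oom_kill` suggests the soft limit is doing its job and only the `MemoryHigh` threshold needs tuning.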
