Configure Linux NUMA optimization for multi-socket servers with memory policy tuning and CPU affinity

Advanced 45 min Apr 26, 2026 78 views
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Optimize multi-socket server performance by configuring NUMA memory policies, CPU affinity, and topology-aware application placement, so that processes run on the cores closest to the memory they use.

Prerequisites

  • Multi-socket server with NUMA architecture
  • Root or sudo access
  • Basic understanding of Linux system administration
  • Hardware with at least 2 NUMA nodes

What this solves

NUMA (Non-Uniform Memory Access) optimization is critical for multi-socket servers where memory access times vary based on the physical location of CPU cores and memory modules. Without proper NUMA configuration, applications may experience up to 40% performance degradation due to remote memory access penalties. This tutorial configures NUMA memory policies, CPU affinity, and topology-aware placement to maximize server performance.

Understanding NUMA topology and hardware architecture

Install NUMA tools

Install numactl and hwloc first. The following steps use them to show how CPU cores, memory, and I/O devices are distributed across NUMA nodes.

# Ubuntu / Debian
sudo apt update
sudo apt install -y numactl hwloc-nox

# AlmaLinux / Rocky Linux
sudo dnf update -y
sudo dnf install -y numactl hwloc

Analyze NUMA topology

Display detailed NUMA topology information including CPU cores, memory distribution, and interconnect distances between nodes.

numactl --hardware
lscpu | grep NUMA
lstopo-no-graphics --of txt
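
The distance matrix that numactl --hardware prints is also available per node in sysfs, where 10 denotes local access and larger values denote more distant nodes. A minimal sketch for reading it, assuming the standard /sys/devices/system/node layout; the function name and the optional root argument are illustrative and not part of any tool:

```shell
#!/bin/bash
# print_node_distances [SYSFS_ROOT]
# Prints one line per NUMA node: "<node>: <distance to node0> <distance to node1> ...".
# SYSFS_ROOT is a hypothetical parameter (default: the live sysfs tree) so the
# function can also be pointed at a copy of the tree.
print_node_distances() {
    local root="${1:-/sys/devices/system/node}"
    local dir
    for dir in "$root"/node[0-9]*; do
        # Skip entries without a readable distance file (or an unmatched glob).
        [ -r "$dir/distance" ] || continue
        printf '%s: %s\n' "$(basename "$dir")" "$(cat "$dir/distance")"
    done
}
```

Calling print_node_distances with no argument reads the running system; a wide gap between the local and remote values suggests that node binding is more likely to pay off.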

Check NUMA statistics

Monitor current NUMA memory allocation and access patterns to identify potential optimization opportunities.

numastat
grep '^numa_' /proc/vmstat
cat /sys/devices/system/node/node*/meminfo
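
The raw counters are easier to compare as a ratio. The sketch below assumes the per-node counter file /sys/devices/system/node/nodeN/numastat, which holds lines such as "numa_hit 1234"; the function name is hypothetical:

```shell
#!/bin/bash
# numa_miss_pct FILE
# Reads a per-node numastat file and prints numa_miss as a percentage of
# numa_hit + numa_miss, i.e. the share of pages allocated on this node that
# were originally intended for a different node.
numa_miss_pct() {
    awk '
        $1 == "numa_hit"  { hit = $2 }
        $1 == "numa_miss" { miss = $2 }
        END {
            total = hit + miss
            if (total == 0) { print "0.00"; exit }
            printf "%.2f\n", 100 * miss / total
        }
    ' "$1"
}

# Example loop over all nodes on a live system:
#   for f in /sys/devices/system/node/node*/numastat; do
#       echo "$f: $(numa_miss_pct "$f")%"
#   done
```

A percentage that keeps rising under load is a hint that some workload is not getting memory on its preferred node.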

Configure NUMA memory policies and CPU affinity

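Configure kernel NUMA parameters

Tune zone reclaim, automatic NUMA balancing, and memory compaction. Save the following as /etc/sysctl.d/99-numa-optimization.conf:

# Disable zone reclaim so the kernel allocates from a remote node before reclaiming local pages
vm.zone_reclaim_mode = 0

# Enable automatic NUMA balancing for better memory locality
kernel.numa_balancing = 1

# NUMA balancing scan delay and scan period.
# These three keys exist as sysctls only on kernels older than 5.13; newer kernels
# expose them under /sys/kernel/debug/sched/numa_balancing/ instead.
# The leading "-" tells sysctl to ignore a key that is absent.
-kernel.numa_balancing_scan_delay_ms = 1000
-kernel.numa_balancing_scan_period_min_ms = 1000
-kernel.numa_balancing_scan_period_max_ms = 60000

# Memory compaction.
# vm.compact_memory is a write-only trigger: it runs one compaction pass each time this file is loaded.
vm.compact_memory = 1
vm.compaction_proactiveness = 20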

Apply NUMA kernel parameters

Load the new NUMA optimization settings and verify they are applied correctly.

sudo sysctl -p /etc/sysctl.d/99-numa-optimization.conf
sudo sysctl -a | grep numa
sudo sysctl vm.zone_reclaim_mode
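
To check individual keys without scanning the full sysctl -a output, the values can be read straight from /proc/sys. A small sketch, assuming the usual mapping of dots in a key to path separators; check_sysctl and its optional root argument are illustrative names:

```shell
#!/bin/bash
# check_sysctl KEY EXPECTED [PROC_SYS_ROOT]
# Compares the running value of a sysctl key with the expected value.
# Returns 0 on match, 1 on mismatch, 2 if the key does not exist on this kernel.
check_sysctl() {
    local key="$1" expected="$2" root="${3:-/proc/sys}"
    # vm.zone_reclaim_mode -> <root>/vm/zone_reclaim_mode
    local path="$root/${key//.//}"
    if [ ! -r "$path" ]; then
        echo "MISSING $key"
        return 2
    fi
    local actual
    actual=$(cat "$path")
    if [ "$actual" = "$expected" ]; then
        echo "OK $key = $actual"
    else
        echo "DIFF $key = $actual (expected $expected)"
        return 1
    fi
}

# Example on a live system:
#   check_sysctl vm.zone_reclaim_mode 0
#   check_sysctl kernel.numa_balancing 1
```

A MISSING result is expected for keys that the running kernel no longer exposes as sysctls.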

Configure CPU governor for NUMA

Set CPU frequency scaling governor to performance mode for consistent NUMA performance across all cores.

echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
cpupower frequency-info
sudo cpupower frequency-set -g performance
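
Because the governor is set per CPU, it can be worth confirming that no core was left behind. The following sketch assumes the cpufreq sysfs files used above; the function name and the optional root argument are hypothetical:

```shell
#!/bin/bash
# cpus_not_in_governor WANTED [CPU_ROOT]
# Prints "<cpuN> <governor>" for every CPU whose scaling governor differs from WANTED.
# Prints nothing when all CPUs match (or when cpufreq is not available).
cpus_not_in_governor() {
    local wanted="$1" root="${2:-/sys/devices/system/cpu}"
    local f gov
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue
        gov=$(cat "$f")
        if [ "$gov" != "$wanted" ]; then
            # Path looks like .../cpu12/cpufreq/scaling_governor; report "cpu12".
            printf '%s %s\n' "$(basename "$(dirname "$(dirname "$f")")")" "$gov"
        fi
    done
}

# Example on a live system:
#   cpus_not_in_governor performance
```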

Create NUMA monitoring script

Set up automated logging of NUMA statistics to track memory access patterns over time. Save the following as /usr/local/bin/numa-monitor.sh:

#!/bin/bash
# NUMA Performance Monitor Script

LOG_FILE="/var/log/numa-performance.log"
DATE=$(date '+%Y-%m-%d %H:%M:%S')

echo "[$DATE] NUMA Performance Stats" >> "$LOG_FILE"

# Log NUMA hit/miss statistics
echo "NUMA Statistics:" >> "$LOG_FILE"
numastat >> "$LOG_FILE"

# Log memory distribution per node
# (printf is used because plain echo in bash does not expand \n)
printf '\nMemory per NUMA node:\n' >> "$LOG_FILE"
for node in /sys/devices/system/node/node*; do
    if [ -d "$node" ]; then
        node_id=$(basename "$node")
        echo "$node_id: $(grep MemTotal "$node/meminfo")" >> "$LOG_FILE"
    fi
done

# Log the CPUs that belong to each NUMA node
printf '\nCPUs per NUMA node:\n' >> "$LOG_FILE"
for node in /sys/devices/system/node/node*; do
    if [ -d "$node" ]; then
        node_id=$(basename "$node")
        cpus=$(cat "$node/cpulist")
        echo "$node_id CPUs: $cpus" >> "$LOG_FILE"
    fi
done

echo "" >> "$LOG_FILE"

Make monitoring script executable and schedule it

Enable the NUMA monitoring script and schedule it to run every 5 minutes for continuous performance tracking.

sudo chmod +x /usr/local/bin/numa-monitor.sh
sudo mkdir -p /var/log
sudo touch /var/log/numa-performance.log

# Add to root's crontab for regular monitoring (existing entries are kept)
( sudo crontab -l 2>/dev/null; echo "*/5 * * * * /usr/local/bin/numa-monitor.sh" ) | sudo crontab -
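
Piping a single line into crontab - replaces the whole crontab, and re-running an append adds duplicates. One way to keep the step repeatable is a small filter that only appends the entry when it is missing. This is a sketch; merge_cron_line is a hypothetical helper, not part of cron:

```shell
#!/bin/bash
# merge_cron_line LINE
# Reads an existing crontab on stdin and writes it to stdout, appending LINE
# only if an identical line is not already present.
merge_cron_line() {
    local line="$1" existing
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    # -F: fixed string (so * is literal), -x: whole-line match, -q: quiet
    if ! printf '%s\n' "$existing" | grep -Fxq -- "$line"; then
        printf '%s\n' "$line"
    fi
}

# Example (assumes root's crontab is the target):
#   sudo crontab -l 2>/dev/null \
#       | merge_cron_line "*/5 * * * * /usr/local/bin/numa-monitor.sh" \
#       | sudo crontab -
```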

Optimize application NUMA placement with numactl

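Configure database NUMA optimization

Create a systemd drop-in for the database service that binds it to a single NUMA node so its memory stays local. The example assumes node 0 owns CPUs 0-7; check numactl --hardware and adjust.

sudo mkdir -p /etc/systemd/system/postgresql.service.d

Place the following in a .conf file inside that directory:

[Service]
# Bind PostgreSQL to NUMA node 0 with local memory allocation.
# The empty ExecStart= clears the packaged command before the new one is set.
ExecStart=
ExecStart=/usr/bin/numactl --cpunodebind=0 --membind=0 /usr/lib/postgresql/15/bin/postgres -D /var/lib/postgresql/15/main -c config_file=/etc/postgresql/15/main/postgresql.conf

# Set CPU affinity for better cache locality
CPUAffinity=0-7

# Memory allocation policy (informational only: numactl does not read these variables)
Environment="NUMA_POLICY=bind"
Environment="NUMA_NODE=0"

On Debian and Ubuntu, postgresql.service is a wrapper unit and the server itself runs from the postgresql@<version>-<cluster>.service template, so the drop-in may need to target that unit, and the paths must match the installed PostgreSQL version.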
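
Hard-coded CPU ranges such as 0-7 only fit one machine. The sketch below generates the drop-in body from the node's actual CPU list and uses systemd's own NUMAPolicy= and NUMAMask= settings instead of wrapping the command in numactl. It assumes the installed systemd supports those two directives; print_numa_dropin and its root argument are hypothetical names:

```shell
#!/bin/bash
# print_numa_dropin NODE [SYSFS_ROOT]
# Prints a [Service] drop-in that pins a unit to one NUMA node, taking the CPU
# list from sysfs instead of hard-coding it. Returns 1 if the node does not exist.
print_numa_dropin() {
    local node="$1" root="${2:-/sys/devices/system/node}"
    local cpulist_file="$root/node$node/cpulist"
    if [ ! -r "$cpulist_file" ]; then
        echo "NUMA node $node not found under $root" >&2
        return 1
    fi
    printf '[Service]\n'
    # cpulist uses range syntax such as 0-7,16-23, which CPUAffinity= accepts.
    printf 'CPUAffinity=%s\n' "$(cat "$cpulist_file")"
    printf 'NUMAPolicy=bind\n'
    printf 'NUMAMask=%s\n' "$node"
}

# Example: review the output, then save it into the unit's drop-in directory.
#   print_numa_dropin 0
```

With this variant the packaged ExecStart stays untouched, which avoids having to track version-specific binary paths.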

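Configure web server NUMA optimization

Spread the web server across NUMA nodes to balance load and improve response times. The example assumes a 16-CPU machine (CPUs 0-15); adjust the range to your topology.

sudo mkdir -p /etc/systemd/system/nginx.service.d

Place the following in a .conf file inside that directory:

[Service]
# Distribute Nginx workers across NUMA nodes.
# The empty ExecStart= clears the packaged command before the new one is set.
ExecStart=
ExecStart=/usr/bin/numactl --interleave=all /usr/sbin/nginx -g 'daemon on; master_process on;'

# Allow access to all CPUs for load balancing
CPUAffinity=0-15

# Memory allocation policy label (informational only: numactl does not read this variable)
Environment="NUMA_POLICY=interleave"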

Configure application-specific NUMA policies

Create a wrapper script for applications that need a specific NUMA placement strategy. Save the following as /usr/local/bin/numa-launch-app.sh:

#!/bin/bash
# NUMA-aware application launcher

APP_NAME="$1"
NUMA_POLICY="$2"
NUMA_NODE="$3"
shift 3

# After the shift, "$@" holds the command and its arguments with quoting preserved.
case "$NUMA_POLICY" in
    bind)
        echo "Launching $APP_NAME with memory bound to node $NUMA_NODE"
        numactl --cpunodebind="$NUMA_NODE" --membind="$NUMA_NODE" "$@"
        ;;
    interleave)
        echo "Launching $APP_NAME with interleaved memory allocation"
        numactl --interleave=all "$@"
        ;;
    preferred)
        echo "Launching $APP_NAME with preferred node $NUMA_NODE"
        numactl --preferred="$NUMA_NODE" "$@"
        ;;
    *)
        echo "Unknown NUMA policy: $NUMA_POLICY"
        echo "Usage: $0 <app_name> <bind|interleave|preferred> <numa_node> <command> [args...]"
        exit 1
        ;;
esac

Reload systemd and restart services

Apply the NUMA optimization configurations by reloading systemd and restarting the configured services.

sudo chmod +x /usr/local/bin/numa-launch-app.sh
sudo systemctl daemon-reload
sudo systemctl restart postgresql
sudo systemctl restart nginx

Configure IRQ affinity for network interfaces

Bind the interrupts of a network interface to the CPUs of one NUMA node so packet processing stays close to the memory it touches. Save the following as /usr/local/bin/numa-irq-affinity.sh:

#!/bin/bash
# Configure IRQ affinity for NUMA optimization

INTERFACE="$1"
NUMA_NODE="$2"

if [ -z "$INTERFACE" ] || [ -z "$NUMA_NODE" ]; then
    echo "Usage: $0 <interface> <numa_node>"
    echo "Example: $0 eth0 0"
    exit 1
fi

# Get CPUs for the specified NUMA node.
# smp_affinity_list accepts the same range syntax as cpulist (for example 0-7,16-23),
# so the value is written unchanged. Replacing "-" with "," would turn the range
# 0-7 into the two CPUs 0 and 7.
NUMA_CPUS=$(cat "/sys/devices/system/node/node${NUMA_NODE}/cpulist")

echo "Configuring IRQ affinity for $INTERFACE on NUMA node $NUMA_NODE (CPUs: $NUMA_CPUS)"

# Find IRQs for the network interface
for irq in $(grep "$INTERFACE" /proc/interrupts | awk -F: '{print $1}' | tr -d ' '); do
    echo "Setting IRQ $irq affinity to CPUs $NUMA_CPUS"
    echo "$NUMA_CPUS" > "/proc/irq/$irq/smp_affinity_list"
done

echo "IRQ affinity configuration complete for $INTERFACE"
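
A plain grep for the interface name also matches longer names that merely start with it (eth0 matches eth01). The sketch below is a stricter lookup that treats the interface name as a whole word in /proc/interrupts. The function name and the optional file argument are hypothetical, and interface names containing regular-expression characters are not handled:

```shell
#!/bin/bash
# irqs_for_iface IFACE [INTERRUPTS_FILE]
# Prints the IRQ numbers whose line mentions IFACE as a whole name, so "eth0"
# matches "eth0", "eth0-TxRx-3" and "i40e-eth0-TxRx-1" but not "eth01".
irqs_for_iface() {
    local iface="$1" file="${2:-/proc/interrupts}"
    awk -v ifc="$iface" '
        BEGIN { pat = "(^|[^A-Za-z0-9])" ifc "([^A-Za-z0-9]|$)" }
        $1 ~ /^[0-9]+:$/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ pat) {
                    sub(/:$/, "", $1)   # "45:" -> "45"
                    print $1
                    break
                }
            }
        }
    ' "$file"
}

# Example on a live system:
#   irqs_for_iface eth0
```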

Apply IRQ affinity optimization

Make the script executable, apply it to the network interface, and create a systemd service so the configuration persists.

sudo chmod +x /usr/local/bin/numa-irq-affinity.sh

# Apply to primary network interface (adjust interface name as needed)
sudo /usr/local/bin/numa-irq-affinity.sh eth0 0

Save the following as /etc/systemd/system/numa-irq-affinity.service:

[Unit]
Description=NUMA IRQ Affinity Configuration
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/numa-irq-affinity.sh eth0 0
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

Enable IRQ affinity service

Enable the IRQ affinity service to ensure NUMA-optimized interrupt handling persists across reboots.

sudo systemctl daemon-reload
sudo systemctl enable numa-irq-affinity.service
sudo systemctl start numa-irq-affinity.service
sudo systemctl status numa-irq-affinity.service

Monitor and benchmark NUMA performance improvements

Install performance testing tools

Install additional tools for NUMA performance testing and benchmarking.

# Ubuntu / Debian
sudo apt install -y sysbench stress-ng mbw likwid-tools

# AlmaLinux / Rocky Linux (sysbench is packaged in EPEL)
sudo dnf install -y epel-release
sudo dnf install -y sysbench stress-ng

# Install mbw from source on RHEL-based systems
wget http://www.coker.com.au/mbw/mbw-1.5.tar.gz
tar -xzf mbw-1.5.tar.gz
cd mbw-1.5
make
sudo cp mbw /usr/local/bin/

Create NUMA benchmark script

Create a benchmark script that compares local and remote memory access and per-node CPU performance. It tests nodes 0 and 1 only. Save the following as /usr/local/bin/numa-benchmark.sh:

#!/bin/bash
# NUMA Performance Benchmark Script

BENCH_LOG="/var/log/numa-benchmark.log"
DATE=$(date '+%Y-%m-%d %H:%M:%S')

echo "[$DATE] Starting NUMA Performance Benchmark" | tee -a "$BENCH_LOG"

# Memory bandwidth test, local CPU and local memory on each node
# (printf is used because plain echo in bash does not expand \n)
printf '\n=== Memory Bandwidth Test ===\n' | tee -a "$BENCH_LOG"
for node in 0 1; do
    if [ -d "/sys/devices/system/node/node$node" ]; then
        echo "Testing memory bandwidth on NUMA node $node:" | tee -a "$BENCH_LOG"
        numactl --cpunodebind=$node --membind=$node mbw -t0 512 2>&1 | tee -a "$BENCH_LOG"
    fi
done

# CPU performance test per NUMA node
printf '\n=== CPU Performance Test ===\n' | tee -a "$BENCH_LOG"
for node in 0 1; do
    if [ -d "/sys/devices/system/node/node$node" ]; then
        echo "Testing CPU performance on NUMA node $node:" | tee -a "$BENCH_LOG"
        numactl --cpunodebind=$node sysbench cpu --cpu-max-prime=10000 --threads=4 run 2>&1 \
            | grep -E "events per second|total time" | tee -a "$BENCH_LOG"
    fi
done

# Memory latency test (sysbench reports throughput for 1K blocks, used here as a latency proxy)
printf '\n=== Memory Latency Test ===\n' | tee -a "$BENCH_LOG"
for node in 0 1; do
    if [ -d "/sys/devices/system/node/node$node" ]; then
        echo "Testing memory latency on NUMA node $node:" | tee -a "$BENCH_LOG"
        numactl --cpunodebind=$node --membind=$node sysbench memory --memory-block-size=1K --memory-total-size=1G run 2>&1 \
            | grep -E "transferred|total time" | tee -a "$BENCH_LOG"
    fi
done

# Cross-node memory access test
printf '\n=== Cross-Node Memory Access Test ===\n' | tee -a "$BENCH_LOG"
if [ -d "/sys/devices/system/node/node0" ] && [ -d "/sys/devices/system/node/node1" ]; then
    echo "Testing cross-node memory access (CPU node 0, Memory node 1):" | tee -a "$BENCH_LOG"
    numactl --cpunodebind=0 --membind=1 sysbench memory --memory-block-size=1K --memory-total-size=1G run 2>&1 \
        | grep -E "transferred|total time" | tee -a "$BENCH_LOG"
fi

printf '\n[%s] NUMA Performance Benchmark Complete\n' "$DATE" | tee -a "$BENCH_LOG"
echo "Results saved to $BENCH_LOG" | tee -a "$BENCH_LOG"
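
The local and cross-node runs are easiest to compare as a single percentage. A minimal sketch, assuming you copy the two throughput figures (for example the MiB/sec values from the sysbench "transferred" lines) by hand; pct_slower is a hypothetical helper:

```shell
#!/bin/bash
# pct_slower LOCAL REMOTE
# Prints how much lower the REMOTE throughput is than LOCAL, as a percentage of
# LOCAL with one decimal place. Prints "n/a" if LOCAL is not positive.
pct_slower() {
    awk -v l="$1" -v r="$2" 'BEGIN {
        if (l <= 0) { print "n/a"; exit }
        printf "%.1f\n", 100 * (l - r) / l
    }'
}

# Example: local run 10000 MiB/sec, cross-node run 8000 MiB/sec
#   pct_slower 10000 8000    # prints 20.0
```

Running the benchmark several times and comparing the spread between runs helps separate a real remote-access penalty from noise.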

Run initial performance baseline

Execute the benchmark script to establish a performance baseline and verify NUMA optimizations are working correctly.

sudo chmod +x /usr/local/bin/numa-benchmark.sh
sudo /usr/local/bin/numa-benchmark.sh

Create NUMA performance dashboard script

Set up a terminal dashboard that refreshes every 5 seconds and shows NUMA topology, per-node memory, hit/miss counters, and network IRQ counts. Save the following as /usr/local/bin/numa-dashboard.sh:

#!/bin/bash
# Real-time NUMA Performance Dashboard

while true; do
    clear
    echo "=== NUMA Performance Dashboard ==="
    echo "Last updated: $(date)"
    echo

    # NUMA topology summary
    echo "=== NUMA Topology ==="
    lscpu | grep -E "NUMA node|CPU\(s\):"
    echo

    # Memory usage per NUMA node
    echo "=== Memory Usage per NUMA Node ==="
    for node in /sys/devices/system/node/node*; do
        if [ -d "$node" ]; then
            node_id=$(basename "$node")
            memtotal=$(grep MemTotal "$node/meminfo" | awk '{print $4" "$5}')
            memfree=$(grep MemFree "$node/meminfo" | awk '{print $4" "$5}')
            echo "$node_id: Total $memtotal, Free $memfree"
        fi
    done
    echo

    # NUMA statistics
    echo "=== NUMA Hit/Miss Statistics ==="
    numastat | head -10
    echo

    # Overall CPU utilization
    echo "=== CPU Utilization ==="
    top -bn1 | grep "Cpu" | head -2
    echo

    # Network IRQ distribution (adjust interface name as needed)
    echo "=== Network IRQ Distribution ==="
    grep eth0 /proc/interrupts | head -3
    echo

    echo "Press Ctrl+C to exit"
    sleep 5
done

Make dashboard script executable

Enable the NUMA dashboard script and test its functionality for real-time monitoring.

sudo chmod +x /usr/local/bin/numa-dashboard.sh

# Test the dashboard (run for a few seconds then exit with Ctrl+C)

sudo /usr/local/bin/numa-dashboard.sh

Verify your setup

# Check NUMA optimization settings
sudo sysctl -a | grep -E "numa|zone_reclaim"

# Verify NUMA statistics
numastat

# Check service NUMA binding
sudo systemctl status postgresql | grep -A 5 "Main PID"
sudo systemctl status nginx | grep -A 5 "Main PID"

# Verify IRQ affinity
cat /proc/interrupts | grep eth0

# Test NUMA-aware application launch
/usr/local/bin/numa-launch-app.sh test-app bind 0 echo "NUMA test successful"

# Check CPU governor settings
cpupower frequency-info | grep governor

# Monitor NUMA performance
tail -n 20 /var/log/numa-performance.log
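
The systemctl status output shows the command line but not the binding the kernel actually applied. A more direct check reads the process entries under /proc: Cpus_allowed_list in status gives the CPU affinity, and the second column of numa_maps gives the memory policy of each mapping (for example bind:0). This is a sketch; show_binding and its root argument are hypothetical names, and reading another user's numa_maps usually requires root:

```shell
#!/bin/bash
# show_binding PID [PROC_ROOT]
# Prints the CPUs a process may run on and a count of its memory mappings
# grouped by NUMA memory policy.
show_binding() {
    local pid="$1" root="${2:-/proc}"
    # status lines look like "Cpus_allowed_list:<TAB>0-7"
    awk -F':[ \t]*' '$1 == "Cpus_allowed_list" { print "cpus=" $2 }' "$root/$pid/status"
    # numa_maps lines look like "<address> <policy> <details...>"
    awk '{ count[$2]++ }
         END { for (p in count) print "policy=" p " mappings=" count[p] }' \
        "$root/$pid/numa_maps" | sort
}

# Example on a live system (assumes a single nginx master process):
#   sudo bash -c "$(declare -f show_binding); show_binding \$(pgrep -o nginx)"
```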

Common issues

Symptom | Cause | Fix
High remote memory access | Applications not NUMA-aware | Use numactl --cpunodebind to bind processes to specific nodes
Uneven CPU utilization | IRQ affinity not configured | Run /usr/local/bin/numa-irq-affinity.sh for network interfaces
Poor memory bandwidth | Zone reclaim enabled | Set vm.zone_reclaim_mode = 0 in sysctl configuration
Service fails to start with NUMA | Invalid NUMA node specification | Check available nodes with numactl --hardware
Inconsistent performance | CPU frequency scaling | Set CPU governor to performance mode with cpupower frequency-set -g performance
Memory allocation failures | Insufficient memory on bound node | Use interleave policy or increase memory on target node
