Optimize multi-socket server performance by configuring NUMA memory policies, CPU affinity, and topology-aware application placement. Keeping each workload's threads and memory on the same node avoids the latency penalty of remote memory access.
Prerequisites
- Multi-socket server with at least 2 NUMA nodes
- Root or sudo access
- Basic understanding of Linux system administration
What this solves
NUMA (Non-Uniform Memory Access) optimization is critical for multi-socket servers where memory access times vary based on the physical location of CPU cores and memory modules. Without proper NUMA configuration, applications may experience up to 40% performance degradation due to remote memory access penalties. This tutorial configures NUMA memory policies, CPU affinity, and topology-aware placement to maximize server performance.
Understanding NUMA topology and hardware architecture
Install NUMA tools
First, install numactl and hwloc. They provide the commands used in the following steps to show how CPU cores, memory, and I/O devices are distributed across NUMA nodes.
sudo apt update
sudo apt install -y numactl hwloc-nox
Analyze NUMA topology
Display detailed NUMA topology information including CPU cores, memory distribution, and interconnect distances between nodes.
numactl --hardware
lscpu | grep NUMA
lstopo-no-graphics --of txt
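The interconnect distances reported by numactl --hardware are also available in sysfs, one matrix row per node. The following is a minimal sketch, assuming the usual /sys/devices/system/node/nodeN/distance layout and node numbers that are contiguous from 0. The function name print_node_distances and its optional root argument are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
# print_node_distances is a hypothetical helper, not part of numactl.
# It prints the relative access cost from every NUMA node to every other node.
# The optional argument points at a sysfs node directory (default: live system).
print_node_distances() {
    local root="${1:-/sys/devices/system/node}"
    local dir src dst d
    for dir in "$root"/node[0-9]*; do
        [ -r "$dir/distance" ] || continue
        src="${dir##*/node}"
        dst=0
        # Each distance file holds one space-separated row of the matrix.
        # Assumption: column N corresponds to node N (contiguous numbering).
        for d in $(cat "$dir/distance"); do
            echo "node$src -> node$dst: $d"
            dst=$((dst + 1))
        done
    done
}
```

Call print_node_distances with no arguments to read the live system. A value of 10 denotes local access, and larger values indicate proportionally more expensive remote access.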
Check NUMA statistics
Monitor current NUMA memory allocation and access patterns to identify potential optimization opportunities.
numastat
grep -i numa /proc/vmstat
cat /sys/devices/system/node/node*/meminfo
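The raw counters are easier to compare across nodes as a ratio. The sketch below assumes the per-node counter files at /sys/devices/system/node/nodeN/numastat with numa_hit and numa_miss lines. The helper name numa_hit_ratio is hypothetical.

```shell
#!/usr/bin/env bash
# numa_hit_ratio is a hypothetical helper. It reads one per-node numastat
# file and prints, as a percentage, how many of the pages allocated on that
# node were intended for it (numa_hit) rather than spilled over from
# another node (numa_miss).
numa_hit_ratio() {
    local file="$1"
    awk '
        $1 == "numa_hit"  { hit  = $2 }
        $1 == "numa_miss" { miss = $2 }
        END {
            total = hit + miss
            if (total == 0) { print "n/a"; exit }
            printf "%.1f\n", 100 * hit / total
        }
    ' "$file"
}
```

For example, numa_hit_ratio /sys/devices/system/node/node0/numastat prints the ratio for node 0. The counters are cumulative since boot, so compare values taken before and after a workload run rather than reading a single sample.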
Configure NUMA memory policies and CPU affinity
Configure kernel NUMA parameters
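Tune the kernel's NUMA behavior by setting zone reclaim mode, automatic NUMA balancing, and memory compaction. Save the following settings to /etc/sysctl.d/99-numa-optimization.conf.
# Disable zone reclaim to prefer remote memory over swap
vm.zone_reclaim_mode = 0
# Enable automatic NUMA balancing for better memory locality
kernel.numa_balancing = 1
# Configure NUMA balancing scan delay (ms)
# Note: some newer kernels expose the scan tunables through debugfs rather than sysctl; if sysctl reports an unknown key, remove the three scan lines below
kernel.numa_balancing_scan_delay_ms = 1000
# Set NUMA balancing scan period (ms)
kernel.numa_balancing_scan_period_min_ms = 1000
kernel.numa_balancing_scan_period_max_ms = 60000
# Configure memory compaction
# vm.compact_memory is a write-only trigger: it runs one compaction pass each time this file is loaded
vm.compact_memory = 1
vm.compaction_proactiveness = 20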
Apply NUMA kernel parameters
Load the new NUMA optimization settings and verify they are applied correctly.
sudo sysctl -p /etc/sysctl.d/99-numa-optimization.conf
sudo sysctl -a | grep numa
sudo sysctl vm.zone_reclaim_mode
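To confirm every setting in /etc/sysctl.d/99-numa-optimization.conf at once, each key can be compared with its live value under /proc/sys. This is a sketch under the assumption that the file contains only simple key = value lines and comments. The function check_sysctl_file and its second argument are hypothetical.

```shell
#!/usr/bin/env bash
# check_sysctl_file is a hypothetical helper. It compares each "key = value"
# line of a sysctl config file with the live value and reports the result.
# The second argument (default /proc/sys) only exists so the function can
# be tried against a test directory.
check_sysctl_file() {
    local conf="$1" root="${2:-/proc/sys}"
    local key value path current status=0
    while IFS='=' read -r key value || [ -n "$key" ]; do
        # Strip whitespace, then skip blank lines and comments.
        key="${key//[[:space:]]/}"
        value="${value//[[:space:]]/}"
        case "$key" in ''|'#'*) continue ;; esac
        # vm.zone_reclaim_mode -> /proc/sys/vm/zone_reclaim_mode
        path="$root/${key//.//}"
        if ! current=$(cat "$path" 2>/dev/null); then
            # Missing keys and write-only triggers both end up here.
            echo "UNREADABLE $key"
            status=1
            continue
        fi
        current="${current//[[:space:]]/}"
        if [ "$current" = "$value" ]; then
            echo "OK $key = $value"
        else
            echo "DIFF $key: file=$value live=$current"
            status=1
        fi
    done < "$conf"
    return "$status"
}
```

Run check_sysctl_file /etc/sysctl.d/99-numa-optimization.conf after applying the settings. Write-only keys such as vm.compact_memory are expected to show up as UNREADABLE.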
Configure CPU governor for NUMA
Set CPU frequency scaling governor to performance mode for consistent NUMA performance across all cores.
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
sudo cpupower frequency-set -g performance
cpupower frequency-info
Create NUMA monitoring script
Set up automated monitoring of NUMA statistics to track performance and identify memory access patterns. Save the following script as /usr/local/bin/numa-monitor.sh.
#!/bin/bash
# NUMA Performance Monitor Script
LOG_FILE="/var/log/numa-performance.log"
DATE=$(date '+%Y-%m-%d %H:%M:%S')
echo "[$DATE] NUMA Performance Stats" >> "$LOG_FILE"
# Log NUMA hit/miss statistics
echo "NUMA Statistics:" >> "$LOG_FILE"
numastat >> "$LOG_FILE"
# Log memory distribution per node
echo "" >> "$LOG_FILE"
echo "Memory per NUMA node:" >> "$LOG_FILE"
for node in /sys/devices/system/node/node*; do
    if [ -d "$node" ]; then
        node_id=$(basename "$node")
        echo "$node_id: $(grep MemTotal "$node/meminfo")" >> "$LOG_FILE"
    fi
done
# Log the CPU list of each NUMA node
echo "" >> "$LOG_FILE"
echo "CPUs per NUMA node:" >> "$LOG_FILE"
for node in /sys/devices/system/node/node*; do
    if [ -d "$node" ]; then
        node_id=$(basename "$node")
        cpus=$(cat "$node/cpulist")
        echo "$node_id CPUs: $cpus" >> "$LOG_FILE"
    fi
done
echo "" >> "$LOG_FILE"
Make monitoring script executable and schedule it
Enable the NUMA monitoring script and schedule it to run every 5 minutes for continuous performance tracking.
sudo chmod +x /usr/local/bin/numa-monitor.sh
sudo mkdir -p /var/log
sudo touch /var/log/numa-performance.log
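# Add to root's crontab for regular monitoring without overwriting existing entries
(sudo crontab -l 2>/dev/null | grep -v numa-monitor.sh; echo "*/5 * * * * /usr/local/bin/numa-monitor.sh") | sudo crontab -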
Optimize application NUMA placement with numactl
Configure database NUMA optimization
Create systemd override for database services to bind them to specific NUMA nodes for optimal memory locality.
sudo mkdir -p /etc/systemd/system/postgresql.service.d
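# Drop-in file in /etc/systemd/system/postgresql.service.d/ (the file name must end in .conf)
# Note: on Debian and Ubuntu, postgresql.service is typically an umbrella unit and the server runs as postgresql@15-main.service; check with systemctl cat before overriding
[Service]
# Bind PostgreSQL to NUMA node 0 with local memory allocation
ExecStart=
ExecStart=/usr/bin/numactl --cpunodebind=0 --membind=0 /usr/lib/postgresql/15/bin/postgres -D /var/lib/postgresql/15/main -c config_file=/etc/postgresql/15/main/postgresql.conf
# Set CPU affinity for better cache locality (adjust to the CPUs that belong to node 0 on your system)
CPUAffinity=0-7
# Record the chosen policy in the service environment (informational; numactl above enforces the binding)
Environment="NUMA_POLICY=bind"
Environment="NUMA_NODE=0"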
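The CPUAffinity range has to match the CPUs that actually belong to the chosen node, and that varies between machines. As a sketch, the line can be generated from sysfs instead of typed by hand. The helper name affinity_line_for_node and its optional root argument are hypothetical.

```shell
#!/usr/bin/env bash
# affinity_line_for_node is a hypothetical helper. It prints a CPUAffinity=
# line for a systemd drop-in, using the CPU list that the kernel reports
# for the given NUMA node. It returns 1 if the node does not exist.
affinity_line_for_node() {
    local node="$1" root="${2:-/sys/devices/system/node}"
    local file="$root/node$node/cpulist"
    if [ ! -r "$file" ]; then
        echo "NUMA node $node not found under $root" >&2
        return 1
    fi
    # cpulist uses the same comma and range syntax that CPUAffinity= accepts.
    echo "CPUAffinity=$(cat "$file")"
}
```

Running affinity_line_for_node 0 prints the line for node 0. The non-zero exit status for a missing node also makes the function usable as a pre-flight check before restarting a service with a new binding.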
Configure web server NUMA optimization
Optimize web server placement across NUMA nodes to balance load and improve response times.
sudo mkdir -p /etc/systemd/system/nginx.service.d
# Drop-in file in /etc/systemd/system/nginx.service.d/ (the file name must end in .conf)
[Service]
# Distribute Nginx workers across NUMA nodes
ExecStart=
ExecStart=/usr/bin/numactl --interleave=all /usr/sbin/nginx -g 'daemon on; master_process on;'
# Allow access to all CPUs for load balancing (adjust to the CPU count of your system)
CPUAffinity=0-15
# Set memory allocation policy for balanced access
Environment="NUMA_POLICY=interleave"
Configure application-specific NUMA policies
Create a wrapper script for applications that require specific NUMA placement strategies. Save the following script as /usr/local/bin/numa-launch-app.sh.
#!/bin/bash
# NUMA-aware application launcher
if [ "$#" -lt 4 ]; then
    echo "Usage: $0 <app_name> <bind|interleave|preferred> <numa_node> <command> [args...]"
    exit 1
fi
APP_NAME="$1"
NUMA_POLICY="$2"
NUMA_NODE="$3"
shift 3
# The remaining arguments ("$@") are the command to run, with quoting preserved
case "$NUMA_POLICY" in
    "bind")
        echo "Launching $APP_NAME with memory bound to node $NUMA_NODE"
        numactl --cpunodebind="$NUMA_NODE" --membind="$NUMA_NODE" "$@"
        ;;
    "interleave")
        echo "Launching $APP_NAME with interleaved memory allocation"
        numactl --interleave=all "$@"
        ;;
    "preferred")
        echo "Launching $APP_NAME with preferred node $NUMA_NODE"
        numactl --preferred="$NUMA_NODE" "$@"
        ;;
    *)
        echo "Unknown NUMA policy: $NUMA_POLICY"
        echo "Usage: $0 <app_name> <bind|interleave|preferred> <numa_node> <command> [args...]"
        exit 1
        ;;
esac
Reload systemd and restart services
Apply the NUMA optimization configurations by reloading systemd and restarting the configured services.
sudo chmod +x /usr/local/bin/numa-launch-app.sh
sudo systemctl daemon-reload
sudo systemctl restart postgresql
sudo systemctl restart nginx
Configure IRQ affinity for network interfaces
Optimize interrupt handling by binding network interface IRQs to specific NUMA nodes for better network performance. Save the following script as /usr/local/bin/numa-irq-affinity.sh.
#!/bin/bash
# Configure IRQ affinity for NUMA optimization (must run as root)
INTERFACE="$1"
NUMA_NODE="$2"
if [ -z "$INTERFACE" ] || [ -z "$NUMA_NODE" ]; then
    echo "Usage: $0 <interface> <numa_node>"
    echo "Example: $0 eth0 0"
    exit 1
fi
# Get CPUs for the specified NUMA node (range syntax such as 0-7,16-23)
NUMA_CPUS=$(cat "/sys/devices/system/node/node${NUMA_NODE}/cpulist")
echo "Configuring IRQ affinity for $INTERFACE on NUMA node $NUMA_NODE (CPUs: $NUMA_CPUS)"
# Find IRQs for the network interface
for irq in $(grep "$INTERFACE" /proc/interrupts | awk -F: '{print $1}' | tr -d ' '); do
    echo "Setting IRQ $irq affinity to CPUs $NUMA_CPUS"
    # smp_affinity_list accepts the cpulist range syntax unchanged
    echo "$NUMA_CPUS" > "/proc/irq/$irq/smp_affinity_list"
done
echo "IRQ affinity configuration complete for $INTERFACE"
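The smp_affinity_list file takes the same range syntax as a node's cpulist file, so 0-7 covers eight CPUs while 0,7 covers only two. To make the difference visible, the sketch below expands a list into individual CPU numbers. The function name expand_cpulist is hypothetical.

```shell
#!/usr/bin/env bash
# expand_cpulist is a hypothetical helper. It expands a kernel CPU list such
# as "0-3,8,10-11" into the individual CPU numbers, separated by spaces.
expand_cpulist() {
    local list="$1" part start end cpu
    local out=()
    local IFS=','            # split the list on commas
    for part in $list; do
        if [[ "$part" == *-* ]]; then
            start="${part%-*}"
            end="${part#*-}"
            for ((cpu = start; cpu <= end; cpu++)); do
                out+=("$cpu")
            done
        else
            out+=("$part")
        fi
    done
    IFS=' '                  # join the result with spaces
    echo "${out[*]}"
}
```

For example, expand_cpulist "0-3,8" prints 0 1 2 3 8. Passing the output to wc -w gives a quick count of the CPUs that an affinity list covers.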
Apply IRQ affinity optimization
Configure network interface IRQ affinity for optimal NUMA performance and create a systemd service for persistent configuration.
sudo chmod +x /usr/local/bin/numa-irq-affinity.sh
# Apply to primary network interface (adjust interface name as needed)
sudo /usr/local/bin/numa-irq-affinity.sh eth0 0
# Save the following unit as /etc/systemd/system/numa-irq-affinity.service
[Unit]
Description=NUMA IRQ Affinity Configuration
After=network.target
[Service]
Type=oneshot
ExecStart=/usr/local/bin/numa-irq-affinity.sh eth0 0
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
Enable IRQ affinity service
Enable the IRQ affinity service to ensure NUMA-optimized interrupt handling persists across reboots.
sudo systemctl daemon-reload
sudo systemctl enable numa-irq-affinity.service
sudo systemctl start numa-irq-affinity.service
sudo systemctl status numa-irq-affinity.service
Monitor and benchmark NUMA performance improvements
Install performance testing tools
Install additional tools for comprehensive NUMA performance testing and benchmarking.
sudo apt install -y sysbench stress-ng mbw likwid
Create NUMA benchmark script
Create a benchmark script that measures memory bandwidth, CPU performance, and local versus cross-node memory access. Save the following script as /usr/local/bin/numa-benchmark.sh.
#!/bin/bash
# NUMA Performance Benchmark Script
BENCH_LOG="/var/log/numa-benchmark.log"
DATE=$(date '+%Y-%m-%d %H:%M:%S')
echo "[$DATE] Starting NUMA Performance Benchmark" | tee -a "$BENCH_LOG"
# Memory bandwidth test - local vs remote
echo -e "\n=== Memory Bandwidth Test ===" | tee -a "$BENCH_LOG"
for node in 0 1; do
    if [ -d "/sys/devices/system/node/node$node" ]; then
        echo "Testing memory bandwidth on NUMA node $node:" | tee -a "$BENCH_LOG"
        numactl --cpunodebind=$node --membind=$node mbw -t0 512 2>&1 | tee -a "$BENCH_LOG"
    fi
done
# CPU performance test per NUMA node
echo -e "\n=== CPU Performance Test ===" | tee -a "$BENCH_LOG"
for node in 0 1; do
    if [ -d "/sys/devices/system/node/node$node" ]; then
        echo "Testing CPU performance on NUMA node $node:" | tee -a "$BENCH_LOG"
        numactl --cpunodebind=$node sysbench cpu --cpu-max-prime=10000 --threads=4 run 2>&1 | grep -E "events per second|total time" | tee -a "$BENCH_LOG"
    fi
done
# Memory latency test
echo -e "\n=== Memory Latency Test ===" | tee -a "$BENCH_LOG"
for node in 0 1; do
    if [ -d "/sys/devices/system/node/node$node" ]; then
        echo "Testing memory latency on NUMA node $node:" | tee -a "$BENCH_LOG"
        numactl --cpunodebind=$node --membind=$node sysbench memory --memory-block-size=1K --memory-total-size=1G run 2>&1 | grep -E "transferred|total time" | tee -a "$BENCH_LOG"
    fi
done
# Cross-node memory access test
echo -e "\n=== Cross-Node Memory Access Test ===" | tee -a "$BENCH_LOG"
if [ -d "/sys/devices/system/node/node0" ] && [ -d "/sys/devices/system/node/node1" ]; then
    echo "Testing cross-node memory access (CPU node 0, Memory node 1):" | tee -a "$BENCH_LOG"
    numactl --cpunodebind=0 --membind=1 sysbench memory --memory-block-size=1K --memory-total-size=1G run 2>&1 | grep -E "transferred|total time" | tee -a "$BENCH_LOG"
fi
echo -e "\n[$DATE] NUMA Performance Benchmark Complete" | tee -a "$BENCH_LOG"
echo "Results saved to $BENCH_LOG" | tee -a "$BENCH_LOG"
Run initial performance baseline
Execute the benchmark script to establish a performance baseline and verify NUMA optimizations are working correctly.
sudo chmod +x /usr/local/bin/numa-benchmark.sh
sudo /usr/local/bin/numa-benchmark.sh
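The local and cross-node results are easiest to compare as a percentage. The following sketch computes the remote access penalty from two throughput figures that you read from the benchmark log yourself. The function name remote_penalty_pct is hypothetical, and both inputs must use the same unit.

```shell
#!/usr/bin/env bash
# remote_penalty_pct is a hypothetical helper. Given a local and a remote
# throughput figure in the same unit, it prints how much slower the remote
# case is, as a percentage of the local figure.
remote_penalty_pct() {
    local local_rate="$1" remote_rate="$2"
    awk -v l="$local_rate" -v r="$remote_rate" 'BEGIN {
        if (l <= 0) { print "n/a"; exit }
        printf "%.1f\n", 100 * (l - r) / l
    }'
}
```

With made-up inputs that are not measurements, remote_penalty_pct 10000 7500 prints 25.0, meaning the remote case reached 25% less throughput than the local case.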
Create NUMA performance dashboard script
Set up a real-time performance monitoring dashboard to track NUMA metrics and identify optimization opportunities. Save the following script as /usr/local/bin/numa-dashboard.sh.
#!/bin/bash
# Real-time NUMA Performance Dashboard
while true; do
clear
echo "=== NUMA Performance Dashboard ==="
echo "Last updated: $(date)"
echo
# NUMA topology summary
echo "=== NUMA Topology ==="
lscpu | grep -E "NUMA node|CPU\(s\):"
echo
# Memory usage per NUMA node
echo "=== Memory Usage per NUMA Node ==="
for node in /sys/devices/system/node/node*; do
if [ -d "$node" ]; then
node_id=$(basename $node)
memtotal=$(grep MemTotal $node/meminfo | awk '{print $4" "$5}')
memfree=$(grep MemFree $node/meminfo | awk '{print $4" "$5}')
echo "$node_id: Total $memtotal, Free $memfree"
fi
done
echo
# NUMA statistics
echo "=== NUMA Hit/Miss Statistics ==="
numastat | head -10
echo
# CPU utilization per NUMA node
echo "=== CPU Utilization ==="
top -bn1 | grep "Cpu" | head -2
echo
# Network IRQ distribution
echo "=== Network IRQ Distribution ==="
grep eth0 /proc/interrupts | head -3
echo
echo "Press Ctrl+C to exit"
sleep 5
done
Make dashboard script executable
Enable the NUMA dashboard script and test its functionality for real-time monitoring.
sudo chmod +x /usr/local/bin/numa-dashboard.sh
# Test the dashboard (run for a few seconds then exit with Ctrl+C)
sudo /usr/local/bin/numa-dashboard.sh
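The dashboard prints total and free memory per node. A single utilization figure per node is often quicker to scan. This sketch derives it from the same per-node meminfo file, assuming the usual "Node N MemTotal: value kB" line format. The function name node_mem_used_pct is hypothetical.

```shell
#!/usr/bin/env bash
# node_mem_used_pct is a hypothetical helper. It reads a per-node meminfo
# file and prints the share of memory that is not free, as a percentage.
# Page cache counts as used here because only MemFree is subtracted.
node_mem_used_pct() {
    local file="$1"
    awk '
        $3 == "MemTotal:" { total = $4 }
        $3 == "MemFree:"  { free  = $4 }
        END {
            if (total == 0) { print "n/a"; exit }
            printf "%.1f\n", 100 * (total - free) / total
        }
    ' "$file"
}
```

For example, node_mem_used_pct /sys/devices/system/node/node0/meminfo reports node 0. A node that sits far above the others is a candidate for an interleave policy or for moving a bound service elsewhere.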
Verify your setup
# Check NUMA optimization settings
sudo sysctl -a | grep -E "numa|zone_reclaim"
# Verify NUMA statistics
numastat
# Check service NUMA binding
sudo systemctl status postgresql | grep -A 5 "Main PID"
sudo systemctl status nginx | grep -A 5 "Main PID"
# Verify IRQ affinity
cat /proc/interrupts | grep eth0
# Test NUMA-aware application launch
/usr/local/bin/numa-launch-app.sh test-app bind 0 echo "NUMA test successful"
# Check CPU governor settings
cpupower frequency-info | grep governor
# Monitor NUMA performance
tail -n 20 /var/log/numa-performance.log
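The systemctl status output does not show where a process's memory actually resides. The kernel reports that in /proc/PID/numa_maps, where each mapping carries N0=, N1= and similar fields holding page counts per node. As a sketch, these can be summed per node. The function name pages_per_node is hypothetical, and reading another user's numa_maps file requires root.

```shell
#!/usr/bin/env bash
# pages_per_node is a hypothetical helper. It sums the N<node>=<pages>
# fields of a numa_maps file and prints one "N<node> <pages>" line per
# node, ordered by node number.
pages_per_node() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^N[0-9]+=[0-9]+$/) {
                split($i, kv, "=")
                pages[kv[1]] += kv[2]
            }
    }
    END { for (n in pages) print n, pages[n] }' "$1" | sort -k1.2,1n
}
```

From a root shell, pages_per_node /proc/PID/numa_maps (with PID replaced by the service's main process ID) shows the distribution. A service bound with --membind=0 should report nearly all of its pages under N0.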
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| High remote memory access | Applications not NUMA-aware | Use numactl --cpunodebind to bind processes to specific nodes |
| Uneven CPU utilization | IRQ affinity not configured | Run /usr/local/bin/numa-irq-affinity.sh for network interfaces |
| Poor memory bandwidth | Zone reclaim enabled | Set vm.zone_reclaim_mode = 0 in sysctl configuration |
| Service fails to start with NUMA | Invalid NUMA node specification | Check available nodes with numactl --hardware |
| Inconsistent performance | CPU frequency scaling | Set CPU governor to performance mode with cpupower frequency-set -g performance |
| Memory allocation failures | Insufficient memory on bound node | Use interleave policy or increase memory on target node |
Next steps
- Optimize Linux system performance with kernel parameters and system tuning for additional performance improvements
- Configure Linux memory management and swap optimization for high-performance workloads to complement NUMA optimizations
- Configure NUMA-aware database clustering for multi-socket servers for advanced database NUMA optimization
- Implement NUMA-aware container orchestration with Kubernetes for containerized workloads
- Monitor NUMA performance with Prometheus and Grafana dashboards for comprehensive performance monitoring
Automated install script
Run this to automate the entire setup
#!/usr/bin/env bash
set -euo pipefail
# NUMA Optimization Install Script for Multi-Socket Servers
# Supports: Ubuntu, Debian, AlmaLinux, Rocky Linux, CentOS, RHEL
# Color definitions
readonly RED='\033[0;31m'
readonly GREEN='\033[0;32m'
readonly YELLOW='\033[1;33m'
readonly NC='\033[0m'
# Configuration
readonly LOG_FILE="/var/log/numa-performance.log"
readonly SYSCTL_FILE="/etc/sysctl.d/99-numa-optimization.conf"
readonly MONITOR_SCRIPT="/usr/local/bin/numa-monitor.sh"
# Cleanup function for errors
cleanup() {
echo -e "${RED}[ERROR] Installation failed. Cleaning up...${NC}"
rm -f "$SYSCTL_FILE" "$MONITOR_SCRIPT" 2>/dev/null || true
exit 1
}
trap cleanup ERR
# Helper functions
log_success() {
echo -e "${GREEN}[SUCCESS]${NC} $1"
}
log_warning() {
echo -e "${YELLOW}[WARNING]${NC} $1"
}
log_error() {
echo -e "${RED}[ERROR]${NC} $1"
}
# Check if running as root or with sudo
check_privileges() {
if [[ $EUID -ne 0 ]]; then
log_error "This script must be run as root or with sudo"
exit 1
fi
}
# Detect distribution and set package manager
detect_distro() {
if [[ ! -f /etc/os-release ]]; then
log_error "/etc/os-release not found. Cannot detect distribution."
exit 1
fi
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_UPDATE="apt update"
PKG_INSTALL="apt install -y"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_UPDATE="dnf makecache"
PKG_INSTALL="dnf install -y"
;;
amzn)
PKG_MGR="yum"
PKG_UPDATE="yum makecache"
PKG_INSTALL="yum install -y"
;;
*)
log_error "Unsupported distribution: $ID"
exit 1
;;
esac
log_success "Detected distribution: $PRETTY_NAME ($PKG_MGR)"
}
# Check if system has multiple NUMA nodes
check_numa_support() {
echo "[1/8] Checking NUMA support..."
local numa_nodes
numa_nodes=$(ls /sys/devices/system/node/ 2>/dev/null | grep -c "^node[0-9]" || true)
if [[ $numa_nodes -lt 2 ]]; then
log_warning "System has only $numa_nodes NUMA node(s). NUMA optimization may not be beneficial."
read -p "Continue anyway? (y/N): " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
exit 0
fi
else
log_success "Found $numa_nodes NUMA nodes"
fi
}
# Install required packages
install_packages() {
echo "[2/8] Installing required packages..."
$PKG_UPDATE
local packages="numactl"
# Add hwloc package with distro-specific names
case "$PKG_MGR" in
apt)
packages="$packages hwloc-nox"
;;
dnf|yum)
packages="$packages hwloc"
;;
esac
$PKG_INSTALL $packages
log_success "Packages installed successfully"
}
# Display current NUMA topology
show_numa_topology() {
echo "[3/8] Analyzing NUMA topology..."
echo "NUMA Hardware Information:"
numactl --hardware
echo -e "\nCPU NUMA Information:"
lscpu | grep NUMA || echo "No NUMA CPU information available"
if command -v lstopo-no-graphics >/dev/null 2>&1; then
echo -e "\nTopology Overview:"
lstopo-no-graphics --of txt | head -20
fi
log_success "NUMA topology analysis complete"
}
# Configure kernel NUMA parameters
configure_numa_kernel() {
echo "[4/8] Configuring kernel NUMA parameters..."
cat > "$SYSCTL_FILE" << 'EOF'
# NUMA Optimization Settings
# Disable zone reclaim to prefer remote memory over swap
vm.zone_reclaim_mode = 0
# Enable automatic NUMA balancing for better memory locality
kernel.numa_balancing = 1
# Configure NUMA balancing scan delay (ms)
kernel.numa_balancing_scan_delay_ms = 1000
# Set NUMA balancing scan period (ms)
kernel.numa_balancing_scan_period_min_ms = 1000
kernel.numa_balancing_scan_period_max_ms = 60000
# Configure memory compaction
vm.compact_memory = 1
vm.compaction_proactiveness = 20
EOF
chmod 644 "$SYSCTL_FILE"
chown root:root "$SYSCTL_FILE"
# Apply settings (-e skips keys that this kernel does not provide instead of aborting)
sysctl -e -p "$SYSCTL_FILE"
log_success "NUMA kernel parameters configured and applied"
}
# Configure CPU frequency governor
configure_cpu_governor() {
echo "[5/8] Configuring CPU frequency governor..."
# Check if cpufreq is available
if [[ ! -d /sys/devices/system/cpu/cpu0/cpufreq ]]; then
log_warning "CPU frequency scaling not available or not supported"
return 0
fi
# Set performance governor for all CPUs
for cpu in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
if [[ -w $cpu ]]; then
echo performance > "$cpu" 2>/dev/null || true
fi
done
# Install and configure cpupower if available
if command -v cpupower >/dev/null 2>&1; then
cpupower frequency-set -g performance 2>/dev/null || true
log_success "CPU governor set to performance mode"
else
log_success "CPU governor configured (cpupower not available)"
fi
}
# Create NUMA monitoring script
create_monitoring_script() {
echo "[6/8] Creating NUMA monitoring script..."
cat > "$MONITOR_SCRIPT" << 'EOF'
#!/bin/bash
# NUMA Performance Monitor Script
LOG_FILE="/var/log/numa-performance.log"
DATE=$(date '+%Y-%m-%d %H:%M:%S')
echo "[$DATE] NUMA Performance Stats" >> "$LOG_FILE"
# Log NUMA hit/miss statistics
echo "NUMA Statistics:" >> "$LOG_FILE"
numastat >> "$LOG_FILE" 2>/dev/null || echo "numastat not available" >> "$LOG_FILE"
# Log memory distribution per node
echo "" >> "$LOG_FILE"
echo "Memory per NUMA node:" >> "$LOG_FILE"
for node in /sys/devices/system/node/node*; do
if [[ -d "$node" ]]; then
node_id=$(basename "$node")
mem_total=$(grep MemTotal "$node/meminfo" 2>/dev/null || echo "MemTotal: N/A")
echo "$node_id: $mem_total" >> "$LOG_FILE"
fi
done
# Log the CPU list of each NUMA node
echo "" >> "$LOG_FILE"
echo "CPU configuration per NUMA node:" >> "$LOG_FILE"
for node in /sys/devices/system/node/node*; do
if [[ -d "$node" ]]; then
node_id=$(basename "$node")
cpus=$(cat "$node/cpulist" 2>/dev/null || echo "N/A")
echo "$node_id CPUs: $cpus" >> "$LOG_FILE"
fi
done
echo "" >> "$LOG_FILE"
EOF
chmod 755 "$MONITOR_SCRIPT"
chown root:root "$MONITOR_SCRIPT"
# Create log file with proper permissions
touch "$LOG_FILE"
chmod 644 "$LOG_FILE"
chown root:root "$LOG_FILE"
log_success "NUMA monitoring script created"
}
# Setup monitoring cron job
setup_monitoring() {
echo "[7/8] Setting up NUMA monitoring..."
# Create cron job for root user (runs every 5 minutes)
(crontab -l 2>/dev/null | grep -v numa-monitor.sh; echo "*/5 * * * * $MONITOR_SCRIPT") | crontab -
# Run initial monitoring
"$MONITOR_SCRIPT"
log_success "NUMA monitoring scheduled (every 5 minutes)"
}
# Verify installation and show status
verify_installation() {
echo "[8/8] Verifying installation..."
# Check if numactl is working
if numactl --hardware >/dev/null 2>&1; then
log_success "numactl is working correctly"
else
log_error "numactl verification failed"
return 1
fi
# Check kernel parameters
local zone_reclaim
zone_reclaim=$(sysctl -n vm.zone_reclaim_mode 2>/dev/null || echo "unknown")
if [[ $zone_reclaim == "0" ]]; then
log_success "Zone reclaim mode properly disabled"
else
log_warning "Zone reclaim mode: $zone_reclaim"
fi
# Check NUMA balancing
local numa_balancing
numa_balancing=$(sysctl -n kernel.numa_balancing 2>/dev/null || echo "unknown")
if [[ $numa_balancing == "1" ]]; then
log_success "NUMA balancing enabled"
else
log_warning "NUMA balancing: $numa_balancing"
fi
# Show current NUMA statistics
echo -e "\nCurrent NUMA Statistics:"
numastat 2>/dev/null || echo "numastat output not available"
echo -e "\nMonitoring log location: $LOG_FILE"
echo -e "Monitoring script: $MONITOR_SCRIPT"
echo -e "Kernel config: $SYSCTL_FILE"
log_success "NUMA optimization installation completed successfully"
}
# Main execution
main() {
echo "NUMA Optimization Setup for Multi-Socket Servers"
echo "================================================"
check_privileges
detect_distro
check_numa_support
install_packages
show_numa_topology
configure_numa_kernel
configure_cpu_governor
create_monitoring_script
setup_monitoring
verify_installation
echo -e "\n${GREEN}Installation completed successfully!${NC}"
echo "Kernel parameters are active now and persist across reboots. The CPU governor setting must be reapplied after a reboot."
echo "Monitor performance with: tail -f $LOG_FILE"
}
main "$@"
Review the script before running it. It must run as root, so execute it with: sudo bash install.sh