Optimize Linux I/O performance with kernel tuning and storage schedulers for high-throughput workloads

Intermediate 25 min Apr 03, 2026
Ubuntu 24.04 Ubuntu 22.04 Debian 12 AlmaLinux 9 Rocky Linux 9 Fedora 41

Learn how to optimize Linux I/O performance through kernel parameter tuning, storage scheduler configuration, and filesystem optimizations. This tutorial covers scheduler selection, queue depth tuning, and performance monitoring for high-throughput applications.

Prerequisites

  • Root access to the Linux server
  • Basic understanding of Linux command line
  • Storage devices to optimize (SSD, NVMe, or HDD)

What this solves

Poor I/O performance can severely impact database servers, web applications, and data processing workloads. Linux I/O schedulers and kernel parameters aren't optimized for all workload types by default, leading to bottlenecks in high-throughput scenarios.

This tutorial shows you how to identify I/O bottlenecks, select appropriate schedulers for different storage types, tune kernel parameters, and optimize filesystem mount options for maximum throughput and minimum latency.

Step-by-step configuration

Install monitoring and benchmarking tools

Install essential tools for monitoring I/O performance and benchmarking storage devices.

# Debian/Ubuntu
sudo apt update
sudo apt install -y sysstat iotop fio hdparm nvme-cli

# AlmaLinux/Rocky/Fedora
sudo dnf install -y sysstat iotop fio hdparm nvme-cli

Analyze current I/O performance

Check current I/O scheduler settings and baseline performance before making changes.

# Check current schedulers for all block devices
for dev in /sys/block/*/queue/scheduler; do
  echo "$dev: $(cat $dev)"
done

Show current I/O statistics

iostat -x 1 3

Check NVMe device information (if applicable)

sudo nvme list

Configure I/O schedulers for different storage types

Set optimal schedulers based on storage technology. NVMe SSDs benefit from none or mq-deadline, while HDDs work better with bfq or mq-deadline.

# Check storage type and set appropriate scheduler
lsblk -d -o name,rota

For NVMe/SSD (rota=0) - use none scheduler

echo none | sudo tee /sys/block/nvme0n1/queue/scheduler

For HDDs (rota=1) - use bfq scheduler

echo bfq | sudo tee /sys/block/sda/queue/scheduler

For SATA SSDs - use mq-deadline

echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler
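The per-device choices above can be applied in one pass. Below is a minimal sketch, assuming the rota-to-scheduler rule described in this step (none for non-rotational devices, bfq for rotational); the tee line is commented out so the loop is a dry run by default:

```shell
# Recommend a scheduler from the queue/rotational flag:
# 0 (SSD/NVMe) -> none, 1 (HDD) -> bfq.
pick_scheduler() {
  if [ "$1" = "0" ]; then
    echo "none"
  else
    echo "bfq"
  fi
}

for rotfile in /sys/block/*/queue/rotational; do
  [ -e "$rotfile" ] || continue
  dev=$(echo "$rotfile" | cut -d/ -f4)
  sched=$(pick_scheduler "$(cat "$rotfile")")
  echo "$dev: would set scheduler to $sched"
  # Uncomment to apply (requires root):
  # echo "$sched" | sudo tee "/sys/block/$dev/queue/scheduler" > /dev/null
done
```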

Make scheduler changes persistent

Create udev rules to automatically apply scheduler settings on boot based on device type.

# /etc/udev/rules.d/60-io-schedulers.rules
# Set scheduler for NVMe devices
ACTION=="add|change", KERNEL=="nvme[0-9]n[0-9]", ATTR{queue/scheduler}="none"

Set scheduler for SSDs

ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="mq-deadline"

Set scheduler for HDDs

ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"
# Apply udev rules
sudo udevadm control --reload-rules
sudo udevadm trigger

Optimize I/O queue depths

Adjust queue depths to match storage capabilities and workload requirements. Higher queue depths improve throughput but may increase latency.

# Check current queue depth
cat /sys/block/nvme0n1/queue/nr_requests

Increase queue depth for high-throughput NVMe (the kernel rejects values the device or scheduler cannot support; with the none scheduler this is capped at the hardware queue depth)

echo 1024 | sudo tee /sys/block/nvme0n1/queue/nr_requests

Set read-ahead for sequential workloads (in KB)

echo 4096 | sudo tee /sys/block/nvme0n1/queue/read_ahead_kb

For databases, reduce read-ahead to minimize memory usage

echo 128 | sudo tee /sys/block/nvme0n1/queue/read_ahead_kb
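Note the unit mismatch when cross-checking these values: sysfs reports read-ahead in KB, while blockdev --getra/--setra work in 512-byte sectors. A small sketch to list current values (the kb_to_sectors helper is illustrative, not a standard tool):

```shell
# sysfs read_ahead_kb is in KB; `blockdev --getra` reports 512-byte
# sectors, so the sector count is twice the KB value.
kb_to_sectors() { echo $(( $1 * 2 )); }

for f in /sys/block/*/queue/read_ahead_kb; do
  [ -e "$f" ] || continue
  dev=$(echo "$f" | cut -d/ -f4)
  kb=$(cat "$f")
  echo "$dev: ${kb} KB read-ahead ($(kb_to_sectors "$kb") sectors)"
done
```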

Configure kernel I/O parameters

Optimize kernel parameters for better I/O performance, including dirty page handling and CPU scaling.

# Reduce dirty page writeback for consistent performance
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
vm.dirty_writeback_centisecs = 100
vm.dirty_expire_centisecs = 200

Optimize for I/O intensive workloads

vm.swappiness = 1
vm.vfs_cache_pressure = 50

Increase maximum number of memory map areas

vm.max_map_count = 262144

TCP buffer tuning for network I/O

net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728

CPU scheduler for I/O bound processes (this sysctl was moved to /sys/kernel/debug/sched/migration_cost_ns in kernel 5.13; omit it on newer kernels)

kernel.sched_migration_cost_ns = 5000000
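Collected into a file, the settings above look like this, written to the path that the sysctl -p command in this step reads:

```shell
# Persist the kernel parameters in a sysctl drop-in file.
sudo tee /etc/sysctl.d/99-io-performance.conf > /dev/null << 'EOF'
# Dirty page writeback
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
vm.dirty_writeback_centisecs = 100
vm.dirty_expire_centisecs = 200

# I/O intensive workloads
vm.swappiness = 1
vm.vfs_cache_pressure = 50
vm.max_map_count = 262144

# TCP buffers for network I/O
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728

# CPU scheduler (sysctl available on kernels before 5.13 only)
kernel.sched_migration_cost_ns = 5000000
EOF
```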
# Apply kernel parameters
sudo sysctl -p /etc/sysctl.d/99-io-performance.conf
Note: The dirty page settings above optimize for consistent write performance. Lower values cause more frequent but smaller writes, reducing latency spikes.

Optimize filesystem mount options

Configure mount options for better I/O performance based on filesystem type and use case.

# High-performance ext4 options for databases
/dev/nvme0n1p1 /var/lib/mysql ext4 defaults,noatime,nobarrier,data=writeback 0 2

Balanced ext4 options for general use

/dev/nvme0n1p2 /opt/data ext4 defaults,noatime,commit=30 0 2

XFS options for large files and high throughput

/dev/nvme0n1p3 /var/lib/backups xfs defaults,noatime,logbsize=256k,largeio 0 2
# Test mount options before rebooting
sudo mount -o remount,noatime /var/lib/mysql

Verify mount options

mount | grep nvme
Warning: The nobarrier option improves performance but may risk data integrity during power failures. Only use with UPS protection or for non-critical data.
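Mount options can also be checked programmatically with findmnt. The has_option helper below and the mount point it checks are illustrative:

```shell
# Succeed if mount point $1 is mounted with option $2.
has_option() {
  findmnt -no OPTIONS "$1" 2>/dev/null | tr ',' '\n' | grep -qx "$2"
}

# Example: confirm noatime took effect after the remount above.
if has_option /var/lib/mysql noatime; then
  echo "/var/lib/mysql mounted with noatime"
else
  echo "noatime not active on /var/lib/mysql"
fi
```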

Configure per-process I/O scheduling

Set I/O priority classes for different applications to prevent I/O interference.

# Set real-time I/O priority for database
sudo ionice -c 1 -n 4 -p $(pgrep mysqld)

Set idle priority for backup processes

sudo ionice -c 3 -p $(pgrep backup)

Check I/O priorities

for pid in $(pgrep -f mysql); do
  echo "PID $pid: $(ionice -p $pid)"
done

Create I/O performance monitoring script

Set up continuous monitoring to track I/O performance improvements and identify bottlenecks.

#!/bin/bash
# Save as /usr/local/bin/io-monitor.sh

LOGFILE="/var/log/io-performance.log"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')

echo "=== I/O Performance Report - $TIMESTAMP ===" >> $LOGFILE

# Device statistics
echo "Device Statistics:" >> $LOGFILE
iostat -x 1 1 | grep -E '(Device|nvme|sd[a-z])' >> $LOGFILE

# Top I/O processes (iotop needs -b for non-interactive batch mode)
echo "Top I/O Processes:" >> $LOGFILE
iotop -b -a -o -d 1 -n 1 | head -20 >> $LOGFILE

# Queue depths and schedulers
echo "Scheduler Configuration:" >> $LOGFILE
for dev in /sys/block/*/queue/scheduler; do
  device=$(echo $dev | cut -d'/' -f4)
  scheduler=$(cat $dev | grep -o '\[.*\]' | tr -d '[]')
  queue_depth=$(cat /sys/block/$device/queue/nr_requests)
  echo "$device: scheduler=$scheduler, queue_depth=$queue_depth" >> $LOGFILE
done
echo "" >> $LOGFILE

sudo chmod 755 /usr/local/bin/io-monitor.sh

Create systemd timer for regular monitoring

sudo tee /etc/systemd/system/io-monitor.timer > /dev/null << 'EOF'
[Unit]
Description=I/O Performance Monitor Timer

[Timer]
OnCalendar=*:0/10
Persistent=true

[Install]
WantedBy=timers.target
EOF

sudo tee /etc/systemd/system/io-monitor.service > /dev/null << 'EOF'
[Unit]
Description=I/O Performance Monitor

[Service]
Type=oneshot
ExecStart=/usr/local/bin/io-monitor.sh
EOF

Enable monitoring timer

sudo systemctl daemon-reload
sudo systemctl enable --now io-monitor.timer

Benchmark I/O improvements

Run comprehensive I/O benchmarks

Use fio to test different I/O patterns and measure performance improvements.

# Random read performance (database-like workload)
sudo fio --name=randread --ioengine=libaio --iodepth=16 --rw=randread --bs=4k --direct=1 --size=1G --numjobs=4 --runtime=60 --group_reporting --filename=/dev/nvme0n1

Sequential write performance (backup/logging workload)

sudo fio --name=seqwrite --ioengine=libaio --iodepth=32 --rw=write --bs=64k --direct=1 --size=2G --numjobs=2 --runtime=60 --group_reporting --filename=/dev/nvme0n1

Mixed workload test

sudo fio --name=mixed --ioengine=libaio --iodepth=16 --rw=randrw --rwmixread=70 --bs=4k --direct=1 --size=1G --numjobs=4 --runtime=60 --group_reporting --filename=/dev/nvme0n1
Warning: The sequential write and mixed tests write directly to the block device and will destroy its data; the random read test only reads. Run write tests against a scratch device or test partition, and ensure you have backups.
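A non-destructive alternative is to point fio at a scratch file on a mounted filesystem instead of the raw device; the TESTFILE path below is just an example:

```shell
# Run the random-read benchmark against a scratch file, not the raw device.
TESTFILE="${TESTFILE:-/var/tmp/fio-scratch}"
if command -v fio >/dev/null 2>&1; then
  fio --name=randread --ioengine=libaio --iodepth=16 --rw=randread \
      --bs=4k --direct=1 --size=256M --runtime=30 --group_reporting \
      --filename="$TESTFILE" \
    || echo "fio run failed (filesystem may not support O_DIRECT)"
else
  echo "fio not installed; skipping"
fi
rm -f "$TESTFILE"
```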

Verify your setup

# Check active I/O schedulers
for dev in /sys/block/*/queue/scheduler; do
  echo "$dev: $(cat $dev)"
done

Verify kernel parameters

sudo sysctl -a | grep -E 'vm.dirty|vm.swappiness'

Check I/O statistics

iostat -x 1 3

Monitor top I/O processes

iotop -a -o -d 2

Check monitoring timer status

sudo systemctl status io-monitor.timer

Common issues

Symptom | Cause | Fix
High I/O wait times | Wrong scheduler for storage type | Switch to appropriate scheduler (none for NVMe, bfq for HDD)
Inconsistent write performance | Large dirty page ratio | Reduce vm.dirty_ratio to 10 or lower
Scheduler changes don't persist | Missing udev rules | Create /etc/udev/rules.d/60-io-schedulers.rules
Database slowdowns during backups | I/O priority conflicts | Set backup processes to idle priority with ionice -c 3
Low throughput on NVMe | Insufficient queue depth | Increase nr_requests to 1024 or higher
High memory usage | Excessive read-ahead buffering | Reduce read_ahead_kb for random access workloads

