Optimize ClickHouse performance for high-throughput workloads with advanced tuning and memory management

Advanced · 45 min · Apr 22, 2026
Applies to: Ubuntu 24.04, Debian 12, AlmaLinux 9, Rocky Linux 9

Learn how to optimize ClickHouse for high-throughput analytics workloads through advanced memory configuration, query performance tuning, storage engine optimization, and connection pooling strategies.

Prerequisites

  • ClickHouse server installed
  • Root or sudo access
  • Minimum 16GB RAM recommended

What this solves

ClickHouse excels at analytical workloads but requires careful tuning for high-throughput scenarios with millions of rows per second. This tutorial shows you how to optimize memory allocation, configure query execution parameters, tune storage engines, and implement connection pooling for maximum performance in production environments.

Step-by-step optimization

Update system and install dependencies

Start with system updates and install performance monitoring tools.

# Ubuntu / Debian
sudo apt update && sudo apt upgrade -y
sudo apt install -y htop iotop sysstat

# AlmaLinux / Rocky Linux
sudo dnf update -y
sudo dnf install -y htop iotop sysstat

Configure memory allocation settings

Optimize ClickHouse memory usage for high-throughput workloads by configuring memory limits and background processing. Create a drop-in file such as /etc/clickhouse-server/config.d/memory.xml:



    
<clickhouse>
    <!-- Cap total server memory at 80% of RAM -->
    <max_server_memory_usage_to_ram_ratio>0.8</max_server_memory_usage_to_ram_ratio>

    <!-- Absolute server-wide limit (~4 GB here; takes precedence over the ratio when non-zero) -->
    <max_server_memory_usage>4000000000</max_server_memory_usage>

    <!-- Background merge and mutation processing -->
    <background_pool_size>32</background_pool_size>
    <background_merges_mutations_concurrency_ratio>4</background_merges_mutations_concurrency_ratio>

    <!-- Memory profiler: record a stack trace every 4 MiB of allocation, no random sampling -->
    <total_memory_profiler_step>4194304</total_memory_profiler_step>
    <total_memory_tracker_sample_probability>0</total_memory_tracker_sample_probability>

    <!-- Caches: 10 GiB uncompressed blocks, 8 GiB marks, 128 MiB compiled expressions -->
    <uncompressed_cache_size>10737418240</uncompressed_cache_size>
    <mark_cache_size>8589934592</mark_cache_size>
    <compiled_expression_cache_size>134217728</compiled_expression_cache_size>
</clickhouse>
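The cache sizes above are raw byte counts. A small helper (the function names gib_to_bytes and mib_to_bytes are our own, not ClickHouse's) makes it easy to adapt them to your machine's RAM:

```shell
# Helpers (our own naming) that produce the byte values used in the config
gib_to_bytes() { echo $(( $1 * 1024 * 1024 * 1024 )); }
mib_to_bytes() { echo $(( $1 * 1024 * 1024 )); }

gib_to_bytes 10    # 10737418240 -> uncompressed cache
gib_to_bytes 8     # 8589934592  -> mark cache
mib_to_bytes 128   # 134217728   -> compiled expression cache
```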

Optimize query execution settings

Configure query processing parameters for maximum throughput and parallel execution.



    
Server-wide concurrency limits go in a drop-in such as /etc/clickhouse-server/config.d/concurrency.xml:

<clickhouse>
    <!-- Concurrent query limits -->
    <max_concurrent_queries>500</max_concurrent_queries>
    <max_concurrent_insert_queries>200</max_concurrent_insert_queries>
    <max_concurrent_select_queries>300</max_concurrent_select_queries>
</clickhouse>

Query-level settings belong in a profile, for example /etc/clickhouse-server/users.d/query-tuning.xml (setting names are typical for recent releases; confirm them against system.settings on your version):

<clickhouse>
    <profiles>
        <default>
            <!-- Parallel execution: 0 lets ClickHouse size max_threads automatically -->
            <max_threads>0</max_threads>
            <max_insert_threads>8</max_insert_threads>
            <max_insert_block_size>1048576</max_insert_block_size>

            <!-- Distributed connection pool and receive timeout (seconds) -->
            <max_distributed_connections>1024</max_distributed_connections>
            <receive_timeout>300</receive_timeout>

            <!-- Parser limits for very large INSERT statements -->
            <max_query_size>268435456</max_query_size>
            <max_parser_depth>1000</max_parser_depth>
            <max_ast_elements>50000</max_ast_elements>

            <!-- Cut per-thread logging overhead; the query log itself stays enabled -->
            <log_query_threads>0</log_query_threads>
            <log_profile_events>0</log_profile_events>
        </default>
    </profiles>
</clickhouse>
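After restarting with these drop-ins, you can list every session setting that differs from its compiled-in default (run this through clickhouse-client):

```sql
-- Show settings that have been changed from their defaults
SELECT name, value
FROM system.settings
WHERE changed;
```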

Configure storage engine optimization

Tune MergeTree engine settings for high-throughput insertions and efficient storage.



    
    
<clickhouse>
    <merge_tree>
        <!-- Parts-pressure control for high-rate inserts -->
        <max_suspicious_broken_parts>10</max_suspicious_broken_parts>
        <parts_to_delay_insert>300</parts_to_delay_insert>
        <parts_to_throw_insert>600</parts_to_throw_insert>
        <max_delay_to_insert>1</max_delay_to_insert>
        <max_parts_in_total>100000</max_parts_in_total>

        <!-- Index granularity and merge sizing (150 GiB max merge) -->
        <index_granularity>8192</index_granularity>
        <number_of_free_entries_in_pool_to_lower_max_size_of_merge>8</number_of_free_entries_in_pool_to_lower_max_size_of_merge>
        <max_bytes_to_merge_at_max_space_in_pool>161061273600</max_bytes_to_merge_at_max_space_in_pool>
    </merge_tree>

    <!-- Background thread pools -->
    <background_common_pool_size>32</background_common_pool_size>
    <background_move_pool_size>8</background_move_pool_size>
    <background_schedule_pool_size>128</background_schedule_pool_size>
    <background_fetches_pool_size>16</background_fetches_pool_size>

    <!-- Default column compression -->
    <compression>
        <case>
            <method>lz4</method>
        </case>
    </compression>
</clickhouse>

Set up connection pooling configuration

Configure connection limits and timeouts for high-concurrency scenarios.



    
<clickhouse>
    <!-- Listening ports: HTTP, native TCP, MySQL and PostgreSQL wire protocols -->
    <http_port>8123</http_port>
    <tcp_port>9000</tcp_port>
    <mysql_port>9004</mysql_port>
    <postgresql_port>9005</postgresql_port>

    <!-- Connection limits -->
    <max_connections>4096</max_connections>
    <keep_alive_timeout>10</keep_alive_timeout>
    <listen_backlog>4096</listen_backlog>

    <!-- Listener options -->
    <listen_try>0</listen_try>
    <listen_reuse_port>false</listen_reuse_port>

    <!-- HTTP timeouts; load databases eagerly so first connections aren't delayed -->
    <http_receive_timeout>300</http_receive_timeout>
    <http_send_timeout>300</http_send_timeout>
    <async_load_databases>false</async_load_databases>

    <!-- Query log used by the monitoring queries later in this guide -->
    <query_log>
        <database>system</database>
        <table>query_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        <max_size_rows>1048576</max_size_rows>
        <reserved_size_rows>8192</reserved_size_rows>
        <buffer_size_rows_flush_threshold>524288</buffer_size_rows_flush_threshold>
    </query_log>
</clickhouse>

Configure system kernel parameters

Optimize Linux kernel settings for high-throughput database workloads. Create /etc/sysctl.d/99-clickhouse.conf with the following content:

# Network optimization
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 30000
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 16384 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.ipv4.tcp_congestion_control = bbr
net.ipv4.tcp_slow_start_after_idle = 0

# Memory management
vm.swappiness = 1
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
vm.vfs_cache_pressure = 50

# File descriptor limits
fs.file-max = 1000000
fs.nr_open = 1000000

# Process limits
kernel.pid_max = 4194304

Apply the kernel parameters:

sudo sysctl -p /etc/sysctl.d/99-clickhouse.conf

Configure systemd limits

Set process and file descriptor limits for the ClickHouse service. Create the override directory if it doesn't exist:

sudo mkdir -p /etc/systemd/system/clickhouse-server.service.d

Then place the limits in /etc/systemd/system/clickhouse-server.service.d/override.conf:

[Service]
LimitNOFILE=1000000
LimitNPROC=1000000
LimitCORE=infinity
LimitMEMLOCK=infinity

Set up user profile optimization

Configure default user settings for optimal query performance.



    
        
            
Create a drop-in such as /etc/clickhouse-server/users.d/performance.xml (setting names are typical for recent releases; confirm them against system.settings on your version):

<clickhouse>
    <profiles>
        <high_performance>
            <!-- Memory: 8 GB per query; spill GROUP BY / ORDER BY to disk beyond 2 GB -->
            <max_memory_usage>8000000000</max_memory_usage>
            <max_bytes_before_external_group_by>2000000000</max_bytes_before_external_group_by>
            <max_bytes_before_external_sort>2000000000</max_bytes_before_external_sort>

            <!-- Execution: normal priority, 1 h cap, check execution speed after 10 s -->
            <priority>0</priority>
            <max_execution_time>3600</max_execution_time>
            <timeout_before_checking_execution_speed>10</timeout_before_checking_execution_speed>

            <!-- Insert batching and network timeouts (seconds) -->
            <min_insert_block_size_bytes>268435456</min_insert_block_size_bytes>
            <min_insert_block_size_rows>100000</min_insert_block_size_rows>
            <connect_timeout>10</connect_timeout>
            <receive_timeout>300</receive_timeout>
            <send_timeout>300</send_timeout>

            <!-- Two-level aggregation thresholds -->
            <group_by_two_level_threshold>100000</group_by_two_level_threshold>
            <group_by_two_level_threshold_bytes>50000000</group_by_two_level_threshold_bytes>

            <!-- Joins -->
            <join_algorithm>hash</join_algorithm>
            <max_bytes_in_join>1000000000</max_bytes_in_join>

            <!-- Async inserts for high-throughput ingestion -->
            <insert_quorum>0</insert_quorum>
            <insert_quorum_timeout>60000</insert_quorum_timeout>
            <async_insert>1</async_insert>
            <async_insert_threads>16</async_insert_threads>
            <wait_for_async_insert>0</wait_for_async_insert>
            <async_insert_max_data_size>10000000</async_insert_max_data_size>
            <async_insert_busy_timeout_ms>200</async_insert_busy_timeout_ms>
            <async_insert_max_query_number>1000</async_insert_max_query_number>
        </high_performance>
    </profiles>

    <users>
        <default>
            <profile>high_performance</profile>
        </default>
    </users>
</clickhouse>

Configure transparent huge pages

Optimize memory management for large datasets by configuring transparent huge pages.

echo 'never' | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo 'never' | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

Make the change persistent across reboots by adding the commands to /etc/rc.local (create the file if it doesn't exist):

#!/bin/bash
echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag
exit 0

Then make it executable:

sudo chmod +x /etc/rc.local
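On systemd-based distributions, /etc/rc.local only runs when the rc-local compatibility service is enabled, so a small oneshot unit is a more reliable alternative. A sketch (the unit name disable-thp.service is our choice, not a ClickHouse convention) to place in /etc/systemd/system/disable-thp.service:

```ini
[Unit]
Description=Disable transparent huge pages for ClickHouse
After=sysinit.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled; echo never > /sys/kernel/mm/transparent_hugepage/defrag'
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

Enable it with `sudo systemctl enable --now disable-thp.service`.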

Apply configuration and restart services

Reload systemd configuration and restart ClickHouse to apply all optimizations.

sudo systemctl daemon-reload
sudo systemctl restart clickhouse-server
sudo systemctl status clickhouse-server

Create performance monitoring queries

Set up queries to monitor ClickHouse performance and identify bottlenecks.

clickhouse-client --query "CREATE TABLE IF NOT EXISTS performance_metrics (
    timestamp DateTime,
    query_duration_ms UInt64,
    memory_usage UInt64,
    read_rows UInt64,
    read_bytes UInt64,
    written_rows UInt64,
    written_bytes UInt64
) ENGINE = MergeTree()
ORDER BY timestamp
TTL timestamp + INTERVAL 30 DAY"
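The table above is only a sink; a scheduled job (cron, or a materialized view if you prefer) can copy recent entries out of the built-in query log. A sketch, assuming the default system.query_log columns, to run through clickhouse-client:

```sql
-- Populate performance_metrics from completed queries in the last 5 minutes
INSERT INTO performance_metrics
SELECT
    event_time,
    query_duration_ms,
    memory_usage,
    read_rows,
    read_bytes,
    written_rows,
    written_bytes
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time >= now() - INTERVAL 5 MINUTE;
```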

Verify your setup

Test the optimized configuration with performance benchmarks and monitoring queries.

# Check ClickHouse status and memory usage
sudo systemctl status clickhouse-server
clickhouse-client --query "SELECT formatReadableSize(value) FROM system.metrics WHERE metric = 'MemoryTracking'"

Test query performance

clickhouse-client --query "SELECT count() FROM system.numbers LIMIT 100000000" --time

Monitor active queries

clickhouse-client --query "SELECT query, elapsed, memory_usage FROM system.processes WHERE query != ''"

Check system resource usage

clickhouse-client --query "SELECT * FROM system.metrics WHERE metric LIKE '%Memory%' OR metric LIKE '%Thread%'"

Verify async insert settings

clickhouse-client --query "SELECT name, value FROM system.settings WHERE name LIKE 'async_insert%'"

Advanced indexing strategies

Configure skip indexes for better query performance

Create specialized indexes for common query patterns to reduce data scanning.

clickhouse-client --query "CREATE TABLE events_optimized (
    timestamp DateTime,
    user_id UInt64,
    event_type String,
    category LowCardinality(String),
    value Float64,
    INDEX idx_timestamp timestamp TYPE minmax GRANULARITY 1,
    INDEX idx_user_id user_id TYPE bloom_filter GRANULARITY 1,
    INDEX idx_event_type event_type TYPE set(100) GRANULARITY 1,
    INDEX idx_category category TYPE set(0) GRANULARITY 1
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (category, timestamp, user_id)
SETTINGS index_granularity = 8192"
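To confirm a skip index is actually pruning granules rather than just existing, ask the planner (the exact output format varies by ClickHouse version):

```sql
-- Show which primary and skip indexes the planner uses for this filter
EXPLAIN indexes = 1
SELECT count()
FROM events_optimized
WHERE user_id = 42;
```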

Implement table partitioning strategy

Configure optimal partitioning for time-series data to improve query performance and data management.

clickhouse-client --query "CREATE TABLE analytics_partitioned (
    event_date Date,
    timestamp DateTime,
    session_id String,
    user_agent String,
    ip_address IPv4,
    page_url String,
    referrer String,
    country_code FixedString(2),
    INDEX idx_session session_id TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_country country_code TYPE set(300) GRANULARITY 1
) ENGINE = MergeTree()
PARTITION BY event_date
ORDER BY (country_code, timestamp)
SETTINGS index_granularity = 8192,
         merge_with_ttl_timeout = 86400"
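Daily partitions make retention cheap: you can inspect per-partition size from system.parts and drop whole partitions instead of deleting rows. For example:

```sql
-- Inspect partition sizes, then drop an expired day in one metadata operation
SELECT partition, sum(rows) AS rows, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE table = 'analytics_partitioned' AND active
GROUP BY partition
ORDER BY partition;

ALTER TABLE analytics_partitioned DROP PARTITION '2026-01-01';
```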

Connection pooling implementation

Configure connection pooling for applications

Start on the server side: configure the HTTP interface, session timeouts, and query cache that pooled clients rely on. Place these in a drop-in such as /etc/clickhouse-server/config.d/http.xml:



    
<clickhouse>
    <!-- HTTP interface: health-check response body and URI size limit -->
    <http_server_default_response>Ok.</http_server_default_response>
    <http_max_uri_size>1048576</http_max_uri_size>

    <!-- Session and idle-connection lifetimes (seconds) -->
    <max_session_timeout>3600</max_session_timeout>
    <idle_connection_timeout>3600</idle_connection_timeout>
    <default_session_timeout>60</default_session_timeout>

    <!-- Query result cache: 1 GiB total, bounded entry sizes -->
    <query_cache>
        <max_size_in_bytes>1073741824</max_size_in_bytes>
        <max_entries>1024</max_entries>
        <max_entry_size_in_bytes>1048576</max_entry_size_in_bytes>
        <max_entry_size_in_rows>30000000</max_entry_size_in_rows>
    </query_cache>
</clickhouse>

Implement application-level connection pooling

Example Python connection pool configuration for high-throughput applications.

from clickhouse_pool import ChPool

# Connection pool configuration
pool = ChPool(
    host='localhost',
    port=9000,
    database='default',
    user='default',
    password='',
    connections_min=10,
    connections_max=100,
    connect_timeout=30,
    send_receive_timeout=300,
)

def execute_query_with_pool(query):
    """Execute a query using the connection pool."""
    with pool.get_client() as client:
        try:
            return client.execute(query)
        except Exception as e:
            print(f"Query execution error: {e}")
            return None

# Async insert example
def async_insert_batch(data_batch):
    """Insert a batch of rows asynchronously for high throughput."""
    with pool.get_client() as client:
        client.execute(
            "INSERT INTO events_optimized VALUES",
            data_batch,  # list of tuples or dicts matching the table columns
            settings={
                'async_insert': 1,
                'wait_for_async_insert': 0,
                'async_insert_busy_timeout_ms': 200,
            },
        )

Note that clickhouse-driver speaks the native protocol, so batches are passed as Python rows rather than a FORMAT JSONEachRow payload.

Common issues

Symptom                  | Cause                          | Fix
High memory usage        | Inefficient memory settings    | Reduce max_memory_usage and tune cache sizes
Slow insert performance  | Small insert blocks            | Increase max_insert_block_size and use async inserts
Connection timeouts      | Insufficient connection limits | Increase max_connections and tune timeouts
Query performance issues | Missing indexes                | Add skip indexes and optimize table structure
Merge operations slow    | Too many small parts           | Tune parts_to_delay_insert settings
High CPU usage           | Excessive parallelism          | Limit max_threads and reduce concurrency

Monitoring and maintenance

Set up automated monitoring to track ClickHouse performance metrics and identify optimization opportunities. You can integrate with existing monitoring solutions like the ClickHouse monitoring with Prometheus and Grafana setup for comprehensive observability.

Create performance monitoring dashboard

Set up queries to track key performance indicators.

# Monitor query performance over time
clickhouse-client --query "SELECT 
    toStartOfMinute(event_time) as minute,
    avg(query_duration_ms) as avg_duration,
    max(memory_usage) as peak_memory,
    sum(read_rows) as total_rows
FROM system.query_log 
WHERE event_time >= now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute"

Check system resource utilization

clickhouse-client --query "SELECT metric, value, description FROM system.asynchronous_metrics WHERE metric LIKE '%CPU%' OR metric LIKE '%Memory%'"

Monitor merge operations

clickhouse-client --query "SELECT table, count() as active_merges, sum(bytes_read_uncompressed) as bytes_processing FROM system.merges GROUP BY table"

Next steps

Running this in production?

Want this handled for you? Running this at scale adds a second layer of work: capacity planning, failover drills, cost control, and on-call. See how we run infrastructure like this for European teams.


Need help?

Don't want to manage this yourself?

We handle infrastructure performance optimization for businesses that depend on uptime. From initial setup to ongoing operations.