Learn how to optimize ClickHouse for high-throughput analytics workloads through advanced memory configuration, query performance tuning, storage engine optimization, and connection pooling strategies.
Prerequisites
- ClickHouse server installed
- Root or sudo access
- Minimum 16GB RAM recommended
What this solves
ClickHouse excels at analytical workloads but requires careful tuning for high-throughput scenarios with millions of rows per second. This tutorial shows you how to optimize memory allocation, configure query execution parameters, tune storage engines, and implement connection pooling for maximum performance in production environments.
Step-by-step optimization
Update system and install dependencies
Start with system updates and install performance monitoring tools.
sudo apt update && sudo apt upgrade -y
sudo apt install -y htop iotop sysstat
Configure memory allocation settings
Optimize ClickHouse memory usage for high-throughput workloads by configuring memory limits and background processing.
Create /etc/clickhouse-server/config.d/memory.xml:

<?xml version="1.0"?>
<yandex>
    <max_server_memory_usage_to_ram_ratio>0.8</max_server_memory_usage_to_ram_ratio>
    <max_memory_usage>4000000000</max_memory_usage>
    <background_pool_size>32</background_pool_size>
    <background_merges_mutations_concurrency_ratio>4</background_merges_mutations_concurrency_ratio>
    <uncompressed_cache_size>4194304</uncompressed_cache_size>
    <use_uncompressed_cache>0</use_uncompressed_cache>
    <mark_cache_size>10737418240</mark_cache_size>
    <max_table_size_to_drop>8589934592</max_table_size_to_drop>
    <max_partition_size_to_drop>134217728</max_partition_size_to_drop>
</yandex>
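The two memory knobs interact: max_server_memory_usage_to_ram_ratio caps the whole server, while max_memory_usage caps a single query. The arithmetic can be sketched as follows (a hypothetical helper for reasoning about the values, not part of ClickHouse):

```python
def server_memory_cap(total_ram_bytes, ram_ratio=0.8):
    """Server-wide ceiling: RAM * max_server_memory_usage_to_ram_ratio."""
    return int(total_ram_bytes * ram_ratio)

def fits_per_query_limit(total_ram_bytes, per_query_bytes, ram_ratio=0.8):
    """A per-query max_memory_usage should stay below the server-wide cap."""
    return per_query_bytes <= server_memory_cap(total_ram_bytes, ram_ratio)

ram = 16 * 1024**3                              # the 16 GB minimum from the prerequisites
cap = server_memory_cap(ram)                    # roughly 12.8 GiB with the 0.8 ratio
ok = fits_per_query_limit(ram, 4_000_000_000)   # the 4 GB per-query limit fits under the cap
```

On smaller hosts, re-run the check before copying these values: a 4 GB per-query limit does not fit under the cap of a 4 GiB machine.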
Optimize query execution settings
Configure query processing parameters for maximum throughput and parallel execution.
Create /etc/clickhouse-server/config.d/query.xml:

<?xml version="1.0"?>
<yandex>
    <max_connections>500</max_connections>
    <keep_alive_timeout>200</keep_alive_timeout>
    <max_concurrent_queries>300</max_concurrent_queries>
    <compiled_expression_cache_size>0</compiled_expression_cache_size>
    <max_threads>8</max_threads>
    <max_block_size>1048576</max_block_size>
    <max_insert_block_size>1048576</max_insert_block_size>
    <min_insert_block_size_rows>300</min_insert_block_size_rows>
    <max_query_size>268435456</max_query_size>
    <interactive_delay>1000</interactive_delay>
    <connect_timeout>10</connect_timeout>
    <log_queries>1</log_queries>
    <log_queries_min_query_duration_ms>0</log_queries_min_query_duration_ms>
</yandex>
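max_block_size and max_insert_block_size bound how many rows travel in one block, so they determine how a large insert is split up. The effect can be sketched client-side (an illustrative helper, not a driver API):

```python
def chunk_rows(rows, max_insert_block_size):
    """Split a row list into insert blocks no larger than the configured size."""
    for start in range(0, len(rows), max_insert_block_size):
        yield rows[start:start + max_insert_block_size]

# 10,000 rows with a 4,096-row block size travel as three blocks: 4096, 4096, 1808.
blocks = list(chunk_rows(list(range(10_000)), 4096))
```

Fewer, larger blocks mean fewer parts created per insert, which is why the troubleshooting table below recommends raising max_insert_block_size for slow inserts.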
Configure storage engine optimization
Tune MergeTree engine settings for high-throughput insertions and efficient storage.
Create /etc/clickhouse-server/config.d/storage.xml:

<?xml version="1.0"?>
<yandex>
    <merge_tree>
        <max_suspicious_broken_parts>10</max_suspicious_broken_parts>
        <parts_to_delay_insert>300</parts_to_delay_insert>
        <parts_to_throw_insert>600</parts_to_throw_insert>
        <max_delay_to_insert>1</max_delay_to_insert>
        <max_parts_in_total>100000</max_parts_in_total>
        <merge_max_block_size>8192</merge_max_block_size>
        <number_of_free_entries_in_pool_to_lower_max_size_of_merge>8</number_of_free_entries_in_pool_to_lower_max_size_of_merge>
        <max_bytes_to_merge_at_max_space_in_pool>161061273600</max_bytes_to_merge_at_max_space_in_pool>
        <max_parts_to_merge_at_once>8</max_parts_to_merge_at_once>
        <merge_selecting_sleep_ms>128</merge_selecting_sleep_ms>
    </merge_tree>
    <max_partitions_per_insert_block>32</max_partitions_per_insert_block>
    <compression>
        <case>
            <method>lz4</method>
        </case>
    </compression>
</yandex>
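parts_to_delay_insert and parts_to_throw_insert form a two-stage backpressure valve: past the first threshold inserts are artificially slowed, past the second they fail with TOO_MANY_PARTS. A simplified model of that rule (not the actual ClickHouse code):

```python
def insert_backpressure(active_parts,
                        parts_to_delay_insert=300,
                        parts_to_throw_insert=600):
    """Simplified MergeTree backpressure: ok -> delay -> throw as parts pile up."""
    if active_parts >= parts_to_throw_insert:
        return "throw"   # insert rejected with TOO_MANY_PARTS
    if active_parts >= parts_to_delay_insert:
        return "delay"   # insert accepted but artificially slowed
    return "ok"
```

If you see inserts entering the "delay" band in production, batch rows into fewer, larger inserts rather than raising the thresholds.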
Set up connection pooling configuration
Configure connection limits and timeouts for high-concurrency scenarios.
Create /etc/clickhouse-server/config.d/network.xml:

<?xml version="1.0"?>
<yandex>
    <http_port>8123</http_port>
    <tcp_port>9000</tcp_port>
    <mysql_port>9004</mysql_port>
    <postgresql_port>9005</postgresql_port>
    <max_connections>4096</max_connections>
    <listen_backlog>4096</listen_backlog>
    <max_open_files>4096</max_open_files>
    <listen_reuse_port>0</listen_reuse_port>
    <listen_try>false</listen_try>
    <http_receive_timeout>300</http_receive_timeout>
    <http_send_timeout>300</http_send_timeout>
    <cancel_http_readonly_queries_on_client_close>false</cancel_http_readonly_queries_on_client_close>
    <query_log>
        <database>system</database>
        <table>query_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        <max_size_rows>1048576</max_size_rows>
        <reserved_size_rows>8192</reserved_size_rows>
        <buffer_size_rows_flush_threshold>524288</buffer_size_rows_flush_threshold>
    </query_log>
</yandex>
Configure system kernel parameters
Optimize Linux kernel settings for high-throughput database workloads.
# Network optimization
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 30000
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 16384 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.ipv4.tcp_congestion_control = bbr
net.ipv4.tcp_slow_start_after_idle = 0
# Memory management
vm.swappiness = 1
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
vm.vfs_cache_pressure = 50
# File descriptor limits
fs.file-max = 1000000
fs.nr_open = 1000000
# Process limits
kernel.pid_max = 4194304
Apply the kernel parameters:
sudo sysctl -p /etc/sysctl.d/99-clickhouse.conf
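When you manage several hosts it helps to validate the file programmatically before applying it. This small parser is an illustrative helper, not a system tool:

```python
def parse_sysctl(text):
    """Parse 'key = value' lines from a sysctl conf, skipping comments and blanks."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params

conf = """
# Network optimization
net.core.somaxconn = 65535
vm.swappiness = 1
"""
params = parse_sysctl(conf)
```

You can then assert the values you expect (e.g. that somaxconn matches the listen backlog you configured) before rolling the file out.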
Configure systemd limits
Set process and file descriptor limits for the ClickHouse service in /etc/systemd/system/clickhouse-server.service.d/override.conf.
[Service]
LimitNOFILE=1000000
LimitNPROC=1000000
LimitCORE=infinity
LimitMEMLOCK=infinity
Create the override directory if it doesn't exist:
sudo mkdir -p /etc/systemd/system/clickhouse-server.service.d
Set up user profile optimization
Configure default user settings for optimal query performance.
Create /etc/clickhouse-server/users.d/performance.xml:

<?xml version="1.0"?>
<yandex>
    <profiles>
        <high_performance>
            <max_memory_usage>8000000000</max_memory_usage>
            <max_bytes_before_external_group_by>2000000000</max_bytes_before_external_group_by>
            <max_bytes_before_external_sort>2000000000</max_bytes_before_external_sort>
            <use_uncompressed_cache>0</use_uncompressed_cache>
            <max_execution_time>3600</max_execution_time>
            <max_threads>10</max_threads>
            <max_query_size>268435456</max_query_size>
            <interactive_delay>100000</interactive_delay>
            <connect_timeout>10</connect_timeout>
            <receive_timeout>300</receive_timeout>
            <send_timeout>300</send_timeout>
            <join_algorithm>hash</join_algorithm>
        </high_performance>
    </profiles>
    <users>
        <default>
            <profile>high_performance</profile>
        </default>
    </users>
</yandex>
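One invariant worth checking in any profile: the external GROUP BY and sort spill thresholds must sit below max_memory_usage, or queries are killed before they ever get a chance to spill to disk. A hypothetical validation helper:

```python
def check_profile(settings):
    """Return the spill-threshold keys that are NOT below the per-query memory cap."""
    cap = settings["max_memory_usage"]
    return [
        key
        for key in ("max_bytes_before_external_group_by",
                    "max_bytes_before_external_sort")
        if settings[key] >= cap
    ]

profile = {
    "max_memory_usage": 8_000_000_000,
    "max_bytes_before_external_group_by": 2_000_000_000,
    "max_bytes_before_external_sort": 2_000_000_000,
}
# An empty result means the profile above is internally consistent.
```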
Configure transparent huge pages
Optimize memory management for large datasets by configuring transparent huge pages.
echo 'never' | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo 'never' | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
Make the change persistent across reboots by adding these commands to /etc/rc.local:
#!/bin/bash
echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag
exit 0
sudo chmod +x /etc/rc.local
Apply configuration and restart services
Reload systemd configuration and restart ClickHouse to apply all optimizations.
sudo systemctl daemon-reload
sudo systemctl restart clickhouse-server
sudo systemctl status clickhouse-server
Create performance monitoring queries
Set up queries to monitor ClickHouse performance and identify bottlenecks.
clickhouse-client --query "CREATE TABLE IF NOT EXISTS performance_metrics (
timestamp DateTime,
query_duration_ms UInt64,
memory_usage UInt64,
read_rows UInt64,
read_bytes UInt64,
written_rows UInt64,
written_bytes UInt64
) ENGINE = MergeTree()
ORDER BY timestamp
TTL timestamp + INTERVAL 30 DAY"
Verify your setup
Test the optimized configuration with performance benchmarks and monitoring queries.
# Check ClickHouse status and memory usage
sudo systemctl status clickhouse-server
clickhouse-client --query "SELECT formatReadableSize(value) FROM system.metrics WHERE metric = 'MemoryTracking'"
# Test query performance
clickhouse-client --query "SELECT count() FROM system.numbers LIMIT 100000000" --time
# Monitor active queries
clickhouse-client --query "SELECT query, elapsed, memory_usage FROM system.processes WHERE query != ''"
# Check system resource usage
clickhouse-client --query "SELECT * FROM system.metrics WHERE metric LIKE '%Memory%' OR metric LIKE '%Thread%'"
# Verify async insert settings
clickhouse-client --query "SELECT name, value FROM system.settings WHERE name LIKE 'async_insert%'"
Advanced indexing strategies
Configure skip indexes for better query performance
Create specialized indexes for common query patterns to reduce data scanning.
clickhouse-client --query "CREATE TABLE events_optimized (
timestamp DateTime,
user_id UInt64,
event_type String,
category LowCardinality(String),
value Float64,
INDEX idx_timestamp timestamp TYPE minmax GRANULARITY 1,
INDEX idx_user_id user_id TYPE bloom_filter GRANULARITY 1,
INDEX idx_event_type event_type TYPE set(100) GRANULARITY 1,
INDEX idx_category category TYPE set(0) GRANULARITY 1
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (category, timestamp, user_id)
SETTINGS index_granularity = 8192"
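To see why the bloom_filter index lets ClickHouse skip granules, here is a toy Bloom filter in Python (illustrative only; ClickHouse's on-disk implementation differs):

```python
import hashlib

class ToyBloomFilter:
    """Toy Bloom filter: 'no' is definitive, 'yes' means maybe-present.
    ClickHouse keeps one such filter per index granule and can skip any
    granule whose filter answers 'no' for the queried value."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, value):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).hexdigest()
            yield int(digest, 16) % self.size_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))
```

False positives only cost an unnecessary granule read; false negatives never happen, so the index is safe to use for equality filters like `WHERE user_id = 42`.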
Implement table partitioning strategy
Configure optimal partitioning for time-series data to improve query performance and data management.
clickhouse-client --query "CREATE TABLE analytics_partitioned (
event_date Date,
timestamp DateTime,
session_id String,
user_agent String,
ip_address IPv4,
page_url String,
referrer String,
country_code FixedString(2),
INDEX idx_session session_id TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_country country_code TYPE set(300) GRANULARITY 1
) ENGINE = MergeTree()
PARTITION BY event_date
ORDER BY (country_code, timestamp)
SETTINGS index_granularity = 8192,
merge_with_ttl_timeout = 86400"
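The partition key controls how many partitions a query and a retention policy must touch: daily keys like `event_date` above suit short retention, while `toYYYYMM` keys keep partition counts low over long retention. A Python equivalent of the two keys, useful for reasoning about partition counts (illustrative helper only):

```python
from datetime import date, timedelta

def to_yyyymm(d):
    """Python equivalent of ClickHouse toYYYYMM(): the monthly partition id."""
    return d.year * 100 + d.month

def partition_count(start, days, key):
    """How many partitions a contiguous date range spans under a given key."""
    return len({key(start + timedelta(n)) for n in range(days)})

start = date(2024, 1, 1)
daily = partition_count(start, 90, key=lambda d: d)   # one partition per day
monthly = partition_count(start, 90, key=to_yyyymm)   # Jan, Feb, Mar
```

As a rule of thumb, keep total partition counts in the hundreds, not thousands; each insert should also touch as few partitions as possible (see max_partitions_per_insert_block).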
Connection pooling implementation
Configure connection pooling for applications
Set up connection pooling in your application code to efficiently manage database connections.
Implement application-level connection pooling
Example Python connection pool configuration for high-throughput applications.
from clickhouse_pool import ChPool

# Connection pool configuration
pool = ChPool(
    host='localhost',
    port=9000,
    database='default',
    user='default',
    password='',
    connections_min=10,
    connections_max=100,
    connect_timeout=30,
    send_receive_timeout=300
)

def execute_query_with_pool(query):
    """Execute a query using the connection pool."""
    with pool.get_client() as client:
        try:
            return client.execute(query)
        except Exception as e:
            print(f"Query execution error: {e}")
            return None

# Async insert example
def async_insert_batch(data_batch):
    """Insert a batch asynchronously for high throughput.
    data_batch is a list of dicts keyed by column name."""
    with pool.get_client() as client:
        client.execute(
            "INSERT INTO events_optimized "
            "(timestamp, user_id, event_type, category, value) VALUES",
            data_batch,
            settings={
                'async_insert': 1,
                'wait_for_async_insert': 0,
                'async_insert_busy_timeout_ms': 200
            }
        )
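If server-side async inserts are unavailable, the same size/time batching can live in the client. A sketch (hypothetical class; flush_fn stands in for a pooled insert such as async_insert_batch above):

```python
import time

class InsertBuffer:
    """Accumulate rows and flush once the batch hits max_rows or max_age_s,
    mirroring async_insert's size and time triggers on the client side."""

    def __init__(self, flush_fn, max_rows=10_000, max_age_s=0.2):
        self.flush_fn = flush_fn
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.rows = []
        self.first_row_at = None

    def add(self, row):
        if not self.rows:
            self.first_row_at = time.monotonic()   # start the age clock
        self.rows.append(row)
        if (len(self.rows) >= self.max_rows
                or time.monotonic() - self.first_row_at >= self.max_age_s):
            self.flush()

    def flush(self):
        if self.rows:
            self.flush_fn(self.rows)
            self.rows = []
```

Call flush() once more on shutdown so a partial final batch is not lost.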
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| High memory usage | Inefficient memory settings | Reduce max_memory_usage and tune cache sizes |
| Slow insert performance | Small insert blocks | Increase max_insert_block_size and use async inserts |
| Connection timeouts | Insufficient connection limits | Increase max_connections and tune timeouts |
| Query performance issues | Missing indexes | Add skip indexes and optimize table structure |
| Merge operations slow | Too many small parts | Tune parts_to_delay_insert settings |
| High CPU usage | Excessive parallelism | Limit max_threads and reduce concurrency |
Monitoring and maintenance
Set up automated monitoring to track ClickHouse performance metrics and identify optimization opportunities. You can integrate with existing monitoring solutions such as Prometheus and Grafana for comprehensive observability.
Create performance monitoring dashboard
Set up queries to track key performance indicators.
# Monitor query performance over time
clickhouse-client --query "SELECT
toStartOfMinute(event_time) as minute,
avg(query_duration_ms) as avg_duration,
max(memory_usage) as peak_memory,
sum(read_rows) as total_rows
FROM system.query_log
WHERE event_time >= now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute"
# Check system resource utilization
clickhouse-client --query "SELECT
metric,
value,
description
FROM system.asynchronous_metrics
WHERE metric LIKE '%CPU%' OR metric LIKE '%Memory%'"
# Monitor merge operations
clickhouse-client --query "SELECT
table,
count() as active_merges,
sum(bytes_read_uncompressed) as bytes_processing
FROM system.merges
GROUP BY table"
Next steps
- Configure ClickHouse materialized views for real-time analytics
- Configure ClickHouse users and RBAC for production environments
- Implement automated ClickHouse backups with S3 storage
- Set up ClickHouse and Kafka real-time data pipeline
- Configure ClickHouse cluster with sharding and replication
Automated install script
Run this to automate the entire setup
#!/usr/bin/env bash
set -euo pipefail
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Configuration variables
CLICKHOUSE_CONFIG_DIR="/etc/clickhouse-server"
CLICKHOUSE_USER="clickhouse"
# Cleanup function for rollback
cleanup() {
echo -e "${RED}Error occurred. Cleaning up...${NC}"
systemctl stop clickhouse-server 2>/dev/null || true
exit 1
}
trap cleanup ERR
# Check if running as root
if [[ $EUID -ne 0 ]]; then
echo -e "${RED}This script must be run as root${NC}"
exit 1
fi
# Auto-detect distribution
if [ -f /etc/os-release ]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_UPDATE="apt update && apt upgrade -y"
PKG_INSTALL="apt install -y"
SYSCTL_DIR="/etc/sysctl.d"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_UPDATE="dnf update -y"
PKG_INSTALL="dnf install -y"
SYSCTL_DIR="/etc/sysctl.d"
;;
amzn)
PKG_MGR="yum"
PKG_UPDATE="yum update -y"
PKG_INSTALL="yum install -y"
SYSCTL_DIR="/etc/sysctl.d"
;;
*)
echo -e "${RED}Unsupported distro: $ID${NC}"
exit 1
;;
esac
else
echo -e "${RED}Cannot detect distribution${NC}"
exit 1
fi
echo -e "${GREEN}ClickHouse Performance Optimization Setup${NC}"
echo -e "${BLUE}Detected OS: $ID${NC}"
# Step 1: Update system and install dependencies
echo -e "${BLUE}[1/9] Updating system and installing dependencies...${NC}"
eval "$PKG_UPDATE"
$PKG_INSTALL htop iotop sysstat
# Step 2: Install ClickHouse if not present
echo -e "${BLUE}[2/9] Installing ClickHouse...${NC}"
if ! command -v clickhouse-server &> /dev/null; then
case "$PKG_MGR" in
"apt")
curl -fsSL 'https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key' | gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" | tee /etc/apt/sources.list.d/clickhouse.list
apt update
$PKG_INSTALL clickhouse-server clickhouse-client
;;
"dnf"|"yum")
$PKG_INSTALL yum-utils
rpm --import https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key
yum-config-manager --add-repo https://packages.clickhouse.com/rpm/clickhouse.repo
$PKG_INSTALL clickhouse-server clickhouse-client
;;
esac
fi
# Step 3: Configure memory allocation settings
echo -e "${BLUE}[3/9] Configuring memory allocation settings...${NC}"
mkdir -p "$CLICKHOUSE_CONFIG_DIR/config.d"
cat > "$CLICKHOUSE_CONFIG_DIR/config.d/memory.xml" << 'EOF'
<?xml version="1.0"?>
<yandex>
<max_server_memory_usage_to_ram_ratio>0.8</max_server_memory_usage_to_ram_ratio>
<max_memory_usage>4000000000</max_memory_usage>
<background_pool_size>32</background_pool_size>
<background_merges_mutations_concurrency_ratio>4</background_merges_mutations_concurrency_ratio>
<uncompressed_cache_size>4194304</uncompressed_cache_size>
<use_uncompressed_cache>0</use_uncompressed_cache>
<mark_cache_size>10737418240</mark_cache_size>
<max_table_size_to_drop>8589934592</max_table_size_to_drop>
<max_partition_size_to_drop>134217728</max_partition_size_to_drop>
</yandex>
EOF
# Step 4: Optimize query execution settings
echo -e "${BLUE}[4/9] Configuring query execution settings...${NC}"
cat > "$CLICKHOUSE_CONFIG_DIR/config.d/query.xml" << 'EOF'
<?xml version="1.0"?>
<yandex>
<max_connections>500</max_connections>
<keep_alive_timeout>200</keep_alive_timeout>
<max_concurrent_queries>300</max_concurrent_queries>
<compiled_expression_cache_size>0</compiled_expression_cache_size>
<max_threads>8</max_threads>
<max_block_size>1048576</max_block_size>
<max_insert_block_size>1048576</max_insert_block_size>
<min_insert_block_size_rows>300</min_insert_block_size_rows>
<max_query_size>268435456</max_query_size>
<interactive_delay>1000</interactive_delay>
<connect_timeout>10</connect_timeout>
<log_queries>1</log_queries>
<log_queries_min_query_duration_ms>0</log_queries_min_query_duration_ms>
</yandex>
EOF
# Step 5: Configure storage engine optimization
echo -e "${BLUE}[5/9] Configuring storage engine optimization...${NC}"
cat > "$CLICKHOUSE_CONFIG_DIR/config.d/storage.xml" << 'EOF'
<?xml version="1.0"?>
<yandex>
<merge_tree>
<max_suspicious_broken_parts>10</max_suspicious_broken_parts>
<parts_to_delay_insert>300</parts_to_delay_insert>
<parts_to_throw_insert>600</parts_to_throw_insert>
<max_delay_to_insert>1</max_delay_to_insert>
<max_parts_in_total>100000</max_parts_in_total>
<merge_max_block_size>8192</merge_max_block_size>
<number_of_free_entries_in_pool_to_lower_max_size_of_merge>8</number_of_free_entries_in_pool_to_lower_max_size_of_merge>
<max_bytes_to_merge_at_max_space_in_pool>161061273600</max_bytes_to_merge_at_max_space_in_pool>
<max_parts_to_merge_at_once>8</max_parts_to_merge_at_once>
<merge_selecting_sleep_ms>128</merge_selecting_sleep_ms>
</merge_tree>
<max_partitions_per_insert_block>32</max_partitions_per_insert_block>
<compression>
<case>
<method>lz4</method>
</case>
</compression>
</yandex>
EOF
# Step 6: Set up connection pooling configuration
echo -e "${BLUE}[6/9] Configuring connection pooling...${NC}"
cat > "$CLICKHOUSE_CONFIG_DIR/config.d/network.xml" << 'EOF'
<?xml version="1.0"?>
<yandex>
<http_port>8123</http_port>
<tcp_port>9000</tcp_port>
<mysql_port>9004</mysql_port>
<postgresql_port>9005</postgresql_port>
<max_connections>4096</max_connections>
<listen_backlog>4096</listen_backlog>
<max_open_files>4096</max_open_files>
<listen_reuse_port>0</listen_reuse_port>
<listen_try>false</listen_try>
<http_receive_timeout>300</http_receive_timeout>
<http_send_timeout>300</http_send_timeout>
<cancel_http_readonly_queries_on_client_close>false</cancel_http_readonly_queries_on_client_close>
<query_log>
<database>system</database>
<table>query_log</table>
<partition_by>toYYYYMM(event_date)</partition_by>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_log>
</yandex>
EOF
# Step 7: Configure system kernel parameters
echo -e "${BLUE}[7/9] Configuring kernel parameters...${NC}"
cat > "$SYSCTL_DIR/99-clickhouse.conf" << 'EOF'
# Network optimization
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 30000
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 16384 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.ipv4.tcp_congestion_control = bbr
net.ipv4.tcp_slow_start_after_idle = 0
# Memory management
vm.swappiness = 1
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
vm.vfs_cache_pressure = 50
# File descriptor limits
fs.file-max = 1000000
fs.nr_open = 1000000
# Process limits
kernel.pid_max = 4194304
EOF
sysctl -p "$SYSCTL_DIR/99-clickhouse.conf"
# Step 8: Configure systemd limits and user profiles
echo -e "${BLUE}[8/9] Configuring systemd limits and user profiles...${NC}"
mkdir -p /etc/systemd/system/clickhouse-server.service.d
cat > /etc/systemd/system/clickhouse-server.service.d/override.conf << 'EOF'
[Service]
LimitNOFILE=1000000
LimitNPROC=1000000
LimitCORE=infinity
LimitMEMLOCK=infinity
EOF
mkdir -p "$CLICKHOUSE_CONFIG_DIR/users.d"
cat > "$CLICKHOUSE_CONFIG_DIR/users.d/performance.xml" << 'EOF'
<?xml version="1.0"?>
<yandex>
<profiles>
<high_performance>
<max_memory_usage>8000000000</max_memory_usage>
<max_bytes_before_external_group_by>2000000000</max_bytes_before_external_group_by>
<max_bytes_before_external_sort>2000000000</max_bytes_before_external_sort>
<use_uncompressed_cache>0</use_uncompressed_cache>
<max_execution_time>3600</max_execution_time>
<max_threads>10</max_threads>
<max_query_size>268435456</max_query_size>
<interactive_delay>100000</interactive_delay>
<connect_timeout>10</connect_timeout>
<receive_timeout>300</receive_timeout>
<send_timeout>300</send_timeout>
<join_algorithm>hash</join_algorithm>
</high_performance>
</profiles>
<users>
<default>
<profile>high_performance</profile>
</default>
</users>
</yandex>
EOF
# Configure transparent huge pages
echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null || true
echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag 2>/dev/null || true
# Set proper ownership and permissions
chown -R $CLICKHOUSE_USER:$CLICKHOUSE_USER "$CLICKHOUSE_CONFIG_DIR"
chmod 755 "$CLICKHOUSE_CONFIG_DIR"
find "$CLICKHOUSE_CONFIG_DIR" -type f -exec chmod 644 {} \;
# Reload systemd and restart services
systemctl daemon-reload
systemctl enable clickhouse-server
systemctl restart clickhouse-server
# Step 9: Verification
echo -e "${BLUE}[9/9] Verifying installation...${NC}"
sleep 5
if systemctl is-active --quiet clickhouse-server; then
echo -e "${GREEN}✓ ClickHouse server is running${NC}"
else
echo -e "${RED}✗ ClickHouse server failed to start${NC}"
exit 1
fi
if clickhouse-client --query "SELECT 1" &>/dev/null; then
echo -e "${GREEN}✓ ClickHouse client connection successful${NC}"
else
echo -e "${RED}✗ ClickHouse client connection failed${NC}"
exit 1
fi
echo -e "${GREEN}ClickHouse performance optimization completed successfully!${NC}"
echo -e "${YELLOW}Configuration files created in: $CLICKHOUSE_CONFIG_DIR${NC}"
echo -e "${YELLOW}Service status: systemctl status clickhouse-server${NC}"
echo -e "${YELLOW}Connect using: clickhouse-client${NC}"
Review the script before running. Execute with: bash install.sh