Learn how to optimize ClickHouse for high-throughput analytics workloads through advanced memory configuration, query performance tuning, storage engine optimization, and connection pooling strategies.
Prerequisites
- ClickHouse server installed
- Root or sudo access
- Minimum 16GB RAM recommended
What this solves
ClickHouse excels at analytical workloads but requires careful tuning for high-throughput scenarios with millions of rows per second. This tutorial shows you how to optimize memory allocation, configure query execution parameters, tune storage engines, and implement connection pooling for maximum performance in production environments.
Step-by-step optimization
Update system and install dependencies
Start with system updates and install performance monitoring tools.
sudo apt update && sudo apt upgrade -y
sudo apt install -y htop iotop sysstat
Configure memory allocation settings
Optimize ClickHouse memory usage for high-throughput workloads by configuring memory limits and background processing.
Create /etc/clickhouse-server/config.d/memory.xml:

<?xml version="1.0"?>
<yandex>
    <max_server_memory_usage_to_ram_ratio>0.8</max_server_memory_usage_to_ram_ratio>
    <max_memory_usage>4000000000</max_memory_usage>
    <background_pool_size>32</background_pool_size>
    <background_merges_mutations_concurrency_ratio>4</background_merges_mutations_concurrency_ratio>
    <uncompressed_cache_size>4194304</uncompressed_cache_size>
    <use_uncompressed_cache>0</use_uncompressed_cache>
    <mark_cache_size>10737418240</mark_cache_size>
    <max_table_size_to_drop>8589934592</max_table_size_to_drop>
    <max_partition_size_to_drop>134217728</max_partition_size_to_drop>
</yandex>
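The two memory knobs interact: max_server_memory_usage_to_ram_ratio caps the whole server, while max_memory_usage caps a single query. The arithmetic can be sketched as follows (a hypothetical helper for reasoning about the values, not part of ClickHouse):

```python
def server_memory_cap(total_ram_bytes, ram_ratio=0.8):
    """Server-wide ceiling: RAM * max_server_memory_usage_to_ram_ratio."""
    return int(total_ram_bytes * ram_ratio)

def fits_per_query_limit(total_ram_bytes, per_query_bytes, ram_ratio=0.8):
    """A per-query max_memory_usage should stay below the server-wide cap."""
    return per_query_bytes <= server_memory_cap(total_ram_bytes, ram_ratio)

ram = 16 * 1024**3                              # the 16 GB minimum from the prerequisites
cap = server_memory_cap(ram)                    # roughly 12.8 GiB with the 0.8 ratio
ok = fits_per_query_limit(ram, 4_000_000_000)   # the 4 GB per-query limit fits under the cap
```

On smaller hosts, re-run the check before copying these values: a 4 GB per-query limit does not fit under the cap of a 4 GiB machine.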
Optimize query execution settings
Configure query processing parameters for maximum throughput and parallel execution.
Create /etc/clickhouse-server/config.d/query.xml:

<?xml version="1.0"?>
<yandex>
    <max_connections>500</max_connections>
    <keep_alive_timeout>200</keep_alive_timeout>
    <max_concurrent_queries>300</max_concurrent_queries>
    <compiled_expression_cache_size>0</compiled_expression_cache_size>
    <max_threads>8</max_threads>
    <max_block_size>1048576</max_block_size>
    <max_insert_block_size>1048576</max_insert_block_size>
    <min_insert_block_size_rows>300</min_insert_block_size_rows>
    <max_query_size>268435456</max_query_size>
    <interactive_delay>1000</interactive_delay>
    <connect_timeout>10</connect_timeout>
    <log_queries>1</log_queries>
    <log_queries_min_query_duration_ms>0</log_queries_min_query_duration_ms>
</yandex>
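max_block_size and max_insert_block_size bound how many rows travel in one block, so they determine how a large insert is split up. The effect can be sketched client-side (an illustrative helper, not a driver API):

```python
def chunk_rows(rows, max_insert_block_size):
    """Split a row list into insert blocks no larger than the configured size."""
    for start in range(0, len(rows), max_insert_block_size):
        yield rows[start:start + max_insert_block_size]

# 10,000 rows with a 4,096-row block size travel as three blocks: 4096, 4096, 1808.
blocks = list(chunk_rows(list(range(10_000)), 4096))
```

Fewer, larger blocks mean fewer parts created per insert, which is why the troubleshooting table below recommends raising max_insert_block_size for slow inserts.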
Configure storage engine optimization
Tune MergeTree engine settings for high-throughput insertions and efficient storage.
Create /etc/clickhouse-server/config.d/storage.xml:

<?xml version="1.0"?>
<yandex>
    <merge_tree>
        <max_suspicious_broken_parts>10</max_suspicious_broken_parts>
        <parts_to_delay_insert>300</parts_to_delay_insert>
        <parts_to_throw_insert>600</parts_to_throw_insert>
        <max_delay_to_insert>1</max_delay_to_insert>
        <max_parts_in_total>100000</max_parts_in_total>
        <merge_max_block_size>8192</merge_max_block_size>
        <number_of_free_entries_in_pool_to_lower_max_size_of_merge>8</number_of_free_entries_in_pool_to_lower_max_size_of_merge>
        <max_bytes_to_merge_at_max_space_in_pool>161061273600</max_bytes_to_merge_at_max_space_in_pool>
        <max_parts_to_merge_at_once>8</max_parts_to_merge_at_once>
        <merge_selecting_sleep_ms>128</merge_selecting_sleep_ms>
    </merge_tree>
    <max_partitions_per_insert_block>32</max_partitions_per_insert_block>
    <compression>
        <case>
            <method>lz4</method>
        </case>
    </compression>
</yandex>
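parts_to_delay_insert and parts_to_throw_insert form a two-stage backpressure valve: past the first threshold inserts are artificially slowed, past the second they fail with TOO_MANY_PARTS. A simplified model of that rule (not the actual ClickHouse code):

```python
def insert_backpressure(active_parts,
                        parts_to_delay_insert=300,
                        parts_to_throw_insert=600):
    """Simplified MergeTree backpressure: ok -> delay -> throw as parts pile up."""
    if active_parts >= parts_to_throw_insert:
        return "throw"   # insert rejected with TOO_MANY_PARTS
    if active_parts >= parts_to_delay_insert:
        return "delay"   # insert accepted but artificially slowed
    return "ok"
```

If you see inserts entering the "delay" band in production, batch rows into fewer, larger inserts rather than raising the thresholds.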
Set up connection pooling configuration
Configure connection limits and timeouts for high-concurrency scenarios.
Create /etc/clickhouse-server/config.d/network.xml:

<?xml version="1.0"?>
<yandex>
    <http_port>8123</http_port>
    <tcp_port>9000</tcp_port>
    <mysql_port>9004</mysql_port>
    <postgresql_port>9005</postgresql_port>
    <max_connections>4096</max_connections>
    <listen_backlog>4096</listen_backlog>
    <max_open_files>4096</max_open_files>
    <listen_reuse_port>0</listen_reuse_port>
    <listen_try>false</listen_try>
    <http_receive_timeout>300</http_receive_timeout>
    <http_send_timeout>300</http_send_timeout>
    <cancel_http_readonly_queries_on_client_close>false</cancel_http_readonly_queries_on_client_close>
    <query_log>
        <database>system</database>
        <table>query_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        <max_size_rows>1048576</max_size_rows>
        <reserved_size_rows>8192</reserved_size_rows>
        <buffer_size_rows_flush_threshold>524288</buffer_size_rows_flush_threshold>
    </query_log>
</yandex>
Configure system kernel parameters
Optimize Linux kernel settings for high-throughput database workloads.
# Network optimization
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 30000
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 16384 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.ipv4.tcp_congestion_control = bbr
net.ipv4.tcp_slow_start_after_idle = 0
# Memory management
vm.swappiness = 1
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
vm.vfs_cache_pressure = 50
# File descriptor limits
fs.file-max = 1000000
fs.nr_open = 1000000
# Process limits
kernel.pid_max = 4194304
Apply the kernel parameters:
sudo sysctl -p /etc/sysctl.d/99-clickhouse.conf
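When you manage several hosts it helps to validate the file programmatically before applying it. This small parser is an illustrative helper, not a system tool:

```python
def parse_sysctl(text):
    """Parse 'key = value' lines from a sysctl conf, skipping comments and blanks."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params

conf = """
# Network optimization
net.core.somaxconn = 65535
vm.swappiness = 1
"""
params = parse_sysctl(conf)
```

You can then assert the values you expect (e.g. that somaxconn matches the listen backlog you configured) before rolling the file out.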
Configure systemd limits
Set process and file descriptor limits for the ClickHouse service in /etc/systemd/system/clickhouse-server.service.d/override.conf.
[Service]
LimitNOFILE=1000000
LimitNPROC=1000000
LimitCORE=infinity
LimitMEMLOCK=infinity
Create the override directory if it doesn't exist:
sudo mkdir -p /etc/systemd/system/clickhouse-server.service.d
Set up user profile optimization
Configure default user settings for optimal query performance.
Create /etc/clickhouse-server/users.d/performance.xml:

<?xml version="1.0"?>
<yandex>
    <profiles>
        <high_performance>
            <max_memory_usage>8000000000</max_memory_usage>
            <max_bytes_before_external_group_by>2000000000</max_bytes_before_external_group_by>
            <max_bytes_before_external_sort>2000000000</max_bytes_before_external_sort>
            <use_uncompressed_cache>0</use_uncompressed_cache>
            <max_execution_time>3600</max_execution_time>
            <max_threads>10</max_threads>
            <max_query_size>268435456</max_query_size>
            <interactive_delay>100000</interactive_delay>
            <connect_timeout>10</connect_timeout>
            <receive_timeout>300</receive_timeout>
            <send_timeout>300</send_timeout>
            <join_algorithm>hash</join_algorithm>
        </high_performance>
    </profiles>
    <users>
        <default>
            <profile>high_performance</profile>
        </default>
    </users>
</yandex>
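One invariant worth checking in any profile: the external GROUP BY and sort spill thresholds must sit below max_memory_usage, or queries are killed before they ever get a chance to spill to disk. A hypothetical validation helper:

```python
def check_profile(settings):
    """Return the spill-threshold keys that are NOT below the per-query memory cap."""
    cap = settings["max_memory_usage"]
    return [
        key
        for key in ("max_bytes_before_external_group_by",
                    "max_bytes_before_external_sort")
        if settings[key] >= cap
    ]

profile = {
    "max_memory_usage": 8_000_000_000,
    "max_bytes_before_external_group_by": 2_000_000_000,
    "max_bytes_before_external_sort": 2_000_000_000,
}
# An empty result means the profile above is internally consistent.
```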
Configure transparent huge pages
Optimize memory management for large datasets by configuring transparent huge pages.
echo 'never' | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo 'never' | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
Make the change persistent across reboots by adding these commands to /etc/rc.local:
#!/bin/bash
echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag
exit 0
sudo chmod +x /etc/rc.local
Apply configuration and restart services
Reload systemd configuration and restart ClickHouse to apply all optimizations.
sudo systemctl daemon-reload
sudo systemctl restart clickhouse-server
sudo systemctl status clickhouse-server
Create performance monitoring queries
Set up queries to monitor ClickHouse performance and identify bottlenecks.
clickhouse-client --query "CREATE TABLE IF NOT EXISTS performance_metrics (
timestamp DateTime,
query_duration_ms UInt64,
memory_usage UInt64,
read_rows UInt64,
read_bytes UInt64,
written_rows UInt64,
written_bytes UInt64
) ENGINE = MergeTree()
ORDER BY timestamp
TTL timestamp + INTERVAL 30 DAY"
Verify your setup
Test the optimized configuration with performance benchmarks and monitoring queries.
# Check ClickHouse status and memory usage
sudo systemctl status clickhouse-server
clickhouse-client --query "SELECT formatReadableSize(value) FROM system.metrics WHERE metric = 'MemoryTracking'"
# Test query performance
clickhouse-client --query "SELECT count() FROM system.numbers LIMIT 100000000" --time
# Monitor active queries
clickhouse-client --query "SELECT query, elapsed, memory_usage FROM system.processes WHERE query != ''"
# Check system resource usage
clickhouse-client --query "SELECT * FROM system.metrics WHERE metric LIKE '%Memory%' OR metric LIKE '%Thread%'"
# Verify async insert settings
clickhouse-client --query "SELECT name, value FROM system.settings WHERE name LIKE 'async_insert%'"
Advanced indexing strategies
Configure skip indexes for better query performance
Create specialized indexes for common query patterns to reduce data scanning.
clickhouse-client --query "CREATE TABLE events_optimized (
timestamp DateTime,
user_id UInt64,
event_type String,
category LowCardinality(String),
value Float64,
INDEX idx_timestamp timestamp TYPE minmax GRANULARITY 1,
INDEX idx_user_id user_id TYPE bloom_filter GRANULARITY 1,
INDEX idx_event_type event_type TYPE set(100) GRANULARITY 1,
INDEX idx_category category TYPE set(0) GRANULARITY 1
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (category, timestamp, user_id)
SETTINGS index_granularity = 8192"
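To see why the bloom_filter index lets ClickHouse skip granules, here is a toy Bloom filter in Python (illustrative only; ClickHouse's on-disk implementation differs):

```python
import hashlib

class ToyBloomFilter:
    """Toy Bloom filter: 'no' is definitive, 'yes' means maybe-present.
    ClickHouse keeps one such filter per index granule and can skip any
    granule whose filter answers 'no' for the queried value."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, value):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).hexdigest()
            yield int(digest, 16) % self.size_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))
```

False positives only cost an unnecessary granule read; false negatives never happen, so the index is safe to use for equality filters like `WHERE user_id = 42`.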
Implement table partitioning strategy
Configure optimal partitioning for time-series data to improve query performance and data management.
clickhouse-client --query "CREATE TABLE analytics_partitioned (
event_date Date,
timestamp DateTime,
session_id String,
user_agent String,
ip_address IPv4,
page_url String,
referrer String,
country_code FixedString(2),
INDEX idx_session session_id TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_country country_code TYPE set(300) GRANULARITY 1
) ENGINE = MergeTree()
PARTITION BY event_date
ORDER BY (country_code, timestamp)
SETTINGS index_granularity = 8192,
merge_with_ttl_timeout = 86400"
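The partition key controls how many partitions a query and a retention policy must touch: daily keys like `event_date` above suit short retention, while `toYYYYMM` keys keep partition counts low over long retention. A Python equivalent of the two keys, useful for reasoning about partition counts (illustrative helper only):

```python
from datetime import date, timedelta

def to_yyyymm(d):
    """Python equivalent of ClickHouse toYYYYMM(): the monthly partition id."""
    return d.year * 100 + d.month

def partition_count(start, days, key):
    """How many partitions a contiguous date range spans under a given key."""
    return len({key(start + timedelta(n)) for n in range(days)})

start = date(2024, 1, 1)
daily = partition_count(start, 90, key=lambda d: d)   # one partition per day
monthly = partition_count(start, 90, key=to_yyyymm)   # Jan, Feb, Mar
```

As a rule of thumb, keep total partition counts in the hundreds, not thousands; each insert should also touch as few partitions as possible (see max_partitions_per_insert_block).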
Connection pooling implementation
Configure connection pooling for applications
Set up connection pooling in your application code to efficiently manage database connections.
Implement application-level connection pooling
Example Python connection pool configuration for high-throughput applications.
from clickhouse_pool import ChPool

# Connection pool configuration
pool = ChPool(
    host='localhost',
    port=9000,
    database='default',
    user='default',
    password='',
    connections_min=10,
    connections_max=100,
    connect_timeout=30,
    send_receive_timeout=300
)

def execute_query_with_pool(query):
    """Execute a query using the connection pool."""
    with pool.get_client() as client:
        try:
            return client.execute(query)
        except Exception as e:
            print(f"Query execution error: {e}")
            return None

# Async insert example
def async_insert_batch(data_batch):
    """Insert a batch asynchronously for high throughput.
    data_batch is a list of dicts keyed by column name."""
    with pool.get_client() as client:
        client.execute(
            "INSERT INTO events_optimized "
            "(timestamp, user_id, event_type, category, value) VALUES",
            data_batch,
            settings={
                'async_insert': 1,
                'wait_for_async_insert': 0,
                'async_insert_busy_timeout_ms': 200
            }
        )
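If server-side async inserts are unavailable, the same size/time batching can live in the client. A sketch (hypothetical class; flush_fn stands in for a pooled insert such as async_insert_batch above):

```python
import time

class InsertBuffer:
    """Accumulate rows and flush once the batch hits max_rows or max_age_s,
    mirroring async_insert's size and time triggers on the client side."""

    def __init__(self, flush_fn, max_rows=10_000, max_age_s=0.2):
        self.flush_fn = flush_fn
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.rows = []
        self.first_row_at = None

    def add(self, row):
        if not self.rows:
            self.first_row_at = time.monotonic()   # start the age clock
        self.rows.append(row)
        if (len(self.rows) >= self.max_rows
                or time.monotonic() - self.first_row_at >= self.max_age_s):
            self.flush()

    def flush(self):
        if self.rows:
            self.flush_fn(self.rows)
            self.rows = []
```

Call flush() once more on shutdown so a partial final batch is not lost.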
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| High memory usage | Inefficient memory settings | Reduce max_memory_usage and tune cache sizes |
| Slow insert performance | Small insert blocks | Increase max_insert_block_size and use async inserts |
| Connection timeouts | Insufficient connection limits | Increase max_connections and tune timeouts |
| Query performance issues | Missing indexes | Add skip indexes and optimize table structure |
| Merge operations slow | Too many small parts | Tune parts_to_delay_insert settings |
| High CPU usage | Excessive parallelism | Limit max_threads and reduce concurrency |
Monitoring and maintenance
Set up automated monitoring to track ClickHouse performance metrics and identify optimization opportunities. You can integrate with existing monitoring solutions such as Prometheus and Grafana for comprehensive observability.
Create performance monitoring dashboard
Set up queries to track key performance indicators.
# Monitor query performance over time
clickhouse-client --query "SELECT
toStartOfMinute(event_time) as minute,
avg(query_duration_ms) as avg_duration,
max(memory_usage) as peak_memory,
sum(read_rows) as total_rows
FROM system.query_log
WHERE event_time >= now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute"
# Check system resource utilization
clickhouse-client --query "SELECT
metric,
value,
description
FROM system.asynchronous_metrics
WHERE metric LIKE '%CPU%' OR metric LIKE '%Memory%'"
# Monitor merge operations
clickhouse-client --query "SELECT
table,
count() as active_merges,
sum(bytes_read_uncompressed) as bytes_processing
FROM system.merges
GROUP BY table"
Next steps
- Configure ClickHouse materialized views for real-time analytics
- Configure ClickHouse users and RBAC for production environments
- Implement automated ClickHouse backups with S3 storage
- Set up ClickHouse and Kafka real-time data pipeline
- Configure ClickHouse cluster with sharding and replication
Automated install script
Run this to automate the entire setup
#!/usr/bin/env bash
set -euo pipefail
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Configuration variables
CLICKHOUSE_CONFIG_DIR="/etc/clickhouse-server"
CLICKHOUSE_USER="clickhouse"
# Cleanup function for rollback
cleanup() {
echo -e "${RED}Error occurred. Cleaning up...${NC}"
systemctl stop clickhouse-server 2>/dev/null || true
exit 1
}
trap cleanup ERR
# Check if running as root
if [[ $EUID -ne 0 ]]; then
echo -e "${RED}This script must be run as root${NC}"
exit 1
fi
# Auto-detect distribution
if [ -f /etc/os-release ]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_UPDATE="apt update && apt upgrade -y"
PKG_INSTALL="apt install -y"
SYSCTL_DIR="/etc/sysctl.d"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_UPDATE="dnf update -y"
PKG_INSTALL="dnf install -y"
SYSCTL_DIR="/etc/sysctl.d"
;;
amzn)
PKG_MGR="yum"
PKG_UPDATE="yum update -y"
PKG_INSTALL="yum install -y"
SYSCTL_DIR="/etc/sysctl.d"
;;
*)
echo -e "${RED}Unsupported distro: $ID${NC}"
exit 1
;;
esac
else
echo -e "${RED}Cannot detect distribution${NC}"
exit 1
fi
echo -e "${GREEN}ClickHouse Performance Optimization Setup${NC}"
echo -e "${BLUE}Detected OS: $ID${NC}"
# Step 1: Update system and install dependencies
echo -e "${BLUE}[1/9] Updating system and installing dependencies...${NC}"
eval "$PKG_UPDATE"
$PKG_INSTALL htop iotop sysstat
# Step 2: Install ClickHouse if not present
echo -e "${BLUE}[2/9] Installing ClickHouse...${NC}"
if ! command -v clickhouse-server &> /dev/null; then
case "$PKG_MGR" in
"apt")
curl -fsSL 'https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key' | gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" | tee /etc/apt/sources.list.d/clickhouse.list
apt update
$PKG_INSTALL clickhouse-server clickhouse-client
;;
"dnf"|"yum")
$PKG_INSTALL yum-utils
rpm --import https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key
yum-config-manager --add-repo https://packages.clickhouse.com/rpm/clickhouse.repo
$PKG_INSTALL clickhouse-server clickhouse-client
;;
esac
fi
# Step 3: Configure memory allocation settings
echo -e "${BLUE}[3/9] Configuring memory allocation settings...${NC}"
mkdir -p "$CLICKHOUSE_CONFIG_DIR/config.d"
cat > "$CLICKHOUSE_CONFIG_DIR/config.d/memory.xml" << 'EOF'
<?xml version="1.0"?>
<yandex>
<max_server_memory_usage_to_ram_ratio>0.8</max_server_memory_usage_to_ram_ratio>
<max_memory_usage>4000000000</max_memory_usage>
<background_pool_size>32</background_pool_size>
<background_merges_mutations_concurrency_ratio>4</background_merges_mutations_concurrency_ratio>
<uncompressed_cache_size>4194304</uncompressed_cache_size>
<use_uncompressed_cache>0</use_uncompressed_cache>
<mark_cache_size>10737418240</mark_cache_size>
<max_table_size_to_drop>8589934592</max_table_size_to_drop>
<max_partition_size_to_drop>134217728</max_partition_size_to_drop>
</yandex>
EOF
# Step 4: Optimize query execution settings
echo -e "${BLUE}[4/9] Configuring query execution settings...${NC}"
cat > "$CLICKHOUSE_CONFIG_DIR/config.d/query.xml" << 'EOF'
<?xml version="1.0"?>
<yandex>
<max_connections>500</max_connections>
<keep_alive_timeout>200</keep_alive_timeout>
<max_concurrent_queries>300</max_concurrent_queries>
<compiled_expression_cache_size>0</compiled_expression_cache_size>
<max_threads>8</max_threads>
<max_block_size>1048576</max_block_size>
<max_insert_block_size>1048576</max_insert_block_size>
<min_insert_block_size_rows>300</min_insert_block_size_rows>
<max_query_size>268435456</max_query_size>
<interactive_delay>1000</interactive_delay>
<connect_timeout>10</connect_timeout>
<log_queries>1</log_queries>
<log_queries_min_query_duration_ms>0</log_queries_min_query_duration_ms>
</yandex>
EOF
# Step 5: Configure storage engine optimization
echo -e "${BLUE}[5/9] Configuring storage engine optimization...${NC}"
cat > "$CLICKHOUSE_CONFIG_DIR/config.d/storage.xml" << 'EOF'
<?xml version="1.0"?>
<yandex>
<merge_tree>
<max_suspicious_broken_parts>10</max_suspicious_broken_parts>
<parts_to_delay_insert>300</parts_to_delay_insert>
<parts_to_throw_insert>600</parts_to_throw_insert>
<max_delay_to_insert>1</max_delay_to_insert>
<max_parts_in_total>100000</max_parts_in_total>
<merge_max_block_size>8192</merge_max_block_size>
<number_of_free_entries_in_pool_to_lower_max_size_of_merge>8</number_of_free_entries_in_pool_to_lower_max_size_of_merge>
<max_bytes_to_merge_at_max_space_in_pool>161061273600</max_bytes_to_merge_at_max_space_in_pool>
<max_parts_to_merge_at_once>8</max_parts_to_merge_at_once>
<merge_selecting_sleep_ms>128</merge_selecting_sleep_ms>
</merge_tree>
<max_partitions_per_insert_block>32</max_partitions_per_insert_block>
<compression>
<case>
<method>lz4</method>
</case>
</compression>
</yandex>
EOF
# Step 6: Set up connection pooling configuration
echo -e "${BLUE}[6/9] Configuring connection pooling...${NC}"
cat > "$CLICKHOUSE_CONFIG_DIR/config.d/network.xml" << 'EOF'
<?xml version="1.0"?>
<yandex>
<http_port>8123</http_port>
<tcp_port>9000</tcp_port>
<mysql_port>9004</mysql_port>
<postgresql_port>9005</postgresql_port>
<max_connections>4096</max_connections>
<listen_backlog>4096</listen_backlog>
<max_open_files>4096</max_open_files>
<listen_reuse_port>0</listen_reuse_port>
<listen_try>false</listen_try>
<http_receive_timeout>300</http_receive_timeout>
<http_send_timeout>300</http_send_timeout>
<cancel_http_readonly_queries_on_client_close>false</cancel_http_readonly_queries_on_client_close>
<query_log>
<database>system</database>
<table>query_log</table>
<partition_by>toYYYYMM(event_date)</partition_by>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_log>
</yandex>
EOF
# Step 7: Configure system kernel parameters
echo -e "${BLUE}[7/9] Configuring kernel parameters...${NC}"
cat > "$SYSCTL_DIR/99-clickhouse.conf" << 'EOF'
# Network optimization
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 30000
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 16384 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.ipv4.tcp_congestion_control = bbr
net.ipv4.tcp_slow_start_after_idle = 0
# Memory management
vm.swappiness = 1
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
vm.vfs_cache_pressure = 50
# File descriptor limits
fs.file-max = 1000000
fs.nr_open = 1000000
# Process limits
kernel.pid_max = 4194304
EOF
sysctl -p "$SYSCTL_DIR/99-clickhouse.conf"
# Step 8: Configure systemd limits and user profiles
echo -e "${BLUE}[8/9] Configuring systemd limits and user profiles...${NC}"
mkdir -p /etc/systemd/system/clickhouse-server.service.d
cat > /etc/systemd/system/clickhouse-server.service.d/override.conf << 'EOF'
[Service]
LimitNOFILE=1000000
LimitNPROC=1000000
LimitCORE=infinity
LimitMEMLOCK=infinity
EOF
mkdir -p "$CLICKHOUSE_CONFIG_DIR/users.d"
cat > "$CLICKHOUSE_CONFIG_DIR/users.d/performance.xml" << 'EOF'
<?xml version="1.0"?>
<yandex>
<profiles>
<high_performance>
<max_memory_usage>8000000000</max_memory_usage>
<max_bytes_before_external_group_by>2000000000</max_bytes_before_external_group_by>
<max_bytes_before_external_sort>2000000000</max_bytes_before_external_sort>
<use_uncompressed_cache>0</use_uncompressed_cache>
<max_execution_time>3600</max_execution_time>
<max_threads>10</max_threads>
<max_query_size>268435456</max_query_size>
<interactive_delay>100000</interactive_delay>
<connect_timeout>10</connect_timeout>
<receive_timeout>300</receive_timeout>
<send_timeout>300</send_timeout>
<join_algorithm>hash</join_algorithm>
</high_performance>
</profiles>
<users>
<default>
<profile>high_performance</profile>
</default>
</users>
</yandex>
EOF
# Configure transparent huge pages
echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null || true
echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag 2>/dev/null || true
# Set proper ownership and permissions
chown -R $CLICKHOUSE_USER:$CLICKHOUSE_USER "$CLICKHOUSE_CONFIG_DIR"
chmod 755 "$CLICKHOUSE_CONFIG_DIR"
find "$CLICKHOUSE_CONFIG_DIR" -type f -exec chmod 644 {} \;
# Reload systemd and restart services
systemctl daemon-reload
systemctl enable clickhouse-server
systemctl restart clickhouse-server
# Step 9: Verification
echo -e "${BLUE}[9/9] Verifying installation...${NC}"
sleep 5
if systemctl is-active --quiet clickhouse-server; then
echo -e "${GREEN}✓ ClickHouse server is running${NC}"
else
echo -e "${RED}✗ ClickHouse server failed to start${NC}"
exit 1
fi
if clickhouse-client --query "SELECT 1" &>/dev/null; then
echo -e "${GREEN}✓ ClickHouse client connection successful${NC}"
else
echo -e "${RED}✗ ClickHouse client connection failed${NC}"
exit 1
fi
echo -e "${GREEN}ClickHouse performance optimization completed successfully!${NC}"
echo -e "${YELLOW}Configuration files created in: $CLICKHOUSE_CONFIG_DIR${NC}"
echo -e "${YELLOW}Service status: systemctl status clickhouse-server${NC}"
echo -e "${YELLOW}Connect using: clickhouse-client${NC}"
Review the script before running. Execute with: bash install.sh