Set up ClickHouse materialized views to transform raw data into real-time aggregations. Configure performance optimization with memory tuning and monitoring for high-throughput analytics workloads.
Prerequisites
- ClickHouse server installed and running
- Administrative access to ClickHouse configuration
- Basic understanding of SQL and time-series data
What this solves
ClickHouse materialized views automatically transform incoming data into pre-aggregated results, enabling real-time analytics on massive datasets. This tutorial shows you how to create materialized views for common analytics patterns like time-series aggregations, user behavior analysis, and performance metrics. You'll also optimize memory usage and set up monitoring to handle high-throughput data ingestion.
Prerequisites and ClickHouse setup verification
Verify ClickHouse installation
Check that ClickHouse server is running and accessible. If you need to install ClickHouse first, follow our ClickHouse installation guide.
sudo systemctl status clickhouse-server
clickhouse-client --query "SELECT version()"
Create database for analytics
Set up a dedicated database for your analytics workload with appropriate settings.
clickhouse-client --query "CREATE DATABASE analytics_db"
Configure memory settings
Optimize ClickHouse memory settings for materialized view processing. These settings improve performance for real-time aggregations.
Add the query-level settings to a profile override, for example /etc/clickhouse-server/users.d/memory_tuning.xml:

<clickhouse>
    <profiles>
        <default>
            <max_memory_usage>10000000000</max_memory_usage>
            <max_bytes_before_external_group_by>8000000000</max_bytes_before_external_group_by>
            <max_bytes_before_external_sort>8000000000</max_bytes_before_external_sort>
            <max_temporary_data_on_disk_size_for_query>1000000000000</max_temporary_data_on_disk_size_for_query>
        </default>
    </profiles>
</clickhouse>

Set the server-wide memory limit in a config override such as /etc/clickhouse-server/config.d/memory_tuning.xml (0 means no explicit cap):

<clickhouse>
    <max_server_memory_usage>0</max_server_memory_usage>
</clickhouse>
Restart ClickHouse service
Apply the memory configuration changes by restarting the service.
sudo systemctl restart clickhouse-server
sudo systemctl status clickhouse-server
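After the restart, you can confirm the effective query-level limits from system.settings; this assumes the settings were applied to the default profile as shown above:

```
clickhouse-client --query "
SELECT name, value
FROM system.settings
WHERE name IN ('max_memory_usage', 'max_bytes_before_external_group_by', 'max_bytes_before_external_sort')"
```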
Create source tables and data ingestion
Create events source table
Set up a source table that receives raw event data. This uses MergeTree engine optimized for time-series data.
clickhouse-client --database=analytics_db --query "
CREATE TABLE events (
timestamp DateTime64(3),
user_id UInt32,
event_type String,
page_url String,
session_id String,
country String,
device_type String,
revenue Decimal64(4)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id)
TTL timestamp + INTERVAL 1 YEAR
SETTINGS index_granularity = 8192
"
Create metrics source table
Set up a table for system metrics and performance data with high ingestion rates.
clickhouse-client --database=analytics_db --query "
CREATE TABLE metrics (
timestamp DateTime64(3),
metric_name String,
metric_value Float64,
host String,
service String,
environment String,
tags Map(String, String)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, metric_name, host)
TTL timestamp + INTERVAL 6 MONTH
SETTINGS index_granularity = 8192
"
Insert sample data
Add test data to verify materialized views work correctly with realistic datasets.
clickhouse-client --database=analytics_db --query "
INSERT INTO events VALUES
('2024-01-15 10:00:00.000', 1001, 'page_view', '/home', 'sess_001', 'US', 'desktop', 0),
('2024-01-15 10:01:30.000', 1001, 'click', '/product/123', 'sess_001', 'US', 'desktop', 0),
('2024-01-15 10:02:15.000', 1002, 'page_view', '/home', 'sess_002', 'UK', 'mobile', 0),
('2024-01-15 10:03:00.000', 1001, 'purchase', '/checkout', 'sess_001', 'US', 'desktop', 99.99),
('2024-01-15 10:04:22.000', 1003, 'page_view', '/about', 'sess_003', 'DE', 'tablet', 0)
"
Insert sample metrics data
Add performance metrics data for system monitoring aggregations.
clickhouse-client --database=analytics_db --query "
INSERT INTO metrics VALUES
('2024-01-15 10:00:00.000', 'cpu_usage', 45.2, 'web-01', 'nginx', 'production', {'region': 'us-east-1'}),
('2024-01-15 10:01:00.000', 'cpu_usage', 52.8, 'web-01', 'nginx', 'production', {'region': 'us-east-1'}),
('2024-01-15 10:01:00.000', 'memory_usage', 78.5, 'web-01', 'nginx', 'production', {'region': 'us-east-1'}),
('2024-01-15 10:02:00.000', 'response_time', 145.3, 'web-01', 'nginx', 'production', {'region': 'us-east-1'}),
('2024-01-15 10:02:00.000', 'cpu_usage', 38.1, 'db-01', 'clickhouse', 'production', {'region': 'us-east-1'})
"
Configure materialized views for real-time aggregations
Create hourly events aggregation
Set up a materialized view that aggregates events by hour, country, and device type for real-time dashboard queries.
clickhouse-client --database=analytics_db --query "
CREATE TABLE events_hourly (
hour DateTime,
country String,
device_type String,
total_events UInt64,
unique_users UInt64,
page_views UInt64,
purchases UInt64,
total_revenue Decimal64(4)
) ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, country, device_type)
TTL hour + INTERVAL 2 YEAR
"
Create materialized view for hourly aggregation
Configure the materialized view that automatically processes incoming events data.
clickhouse-client --database=analytics_db --query "
CREATE MATERIALIZED VIEW events_hourly_mv TO events_hourly AS
SELECT
toStartOfHour(timestamp) as hour,
country,
device_type,
count() as total_events,
uniq(user_id) as unique_users,
countIf(event_type = 'page_view') as page_views,
countIf(event_type = 'purchase') as purchases,
sum(revenue) as total_revenue
FROM events
GROUP BY hour, country, device_type
"
Create user session analysis table
Build aggregations for user behavior analysis with session-level metrics.
clickhouse-client --database=analytics_db --query "
CREATE TABLE user_sessions (
date Date,
user_id UInt32,
country String,
device_type String,
session_count UInt32,
total_events UInt32,
session_duration UInt32,
total_revenue Decimal64(4)
) ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (date, user_id)
TTL date + INTERVAL 1 YEAR
"
Create user session materialized view
Process events into user session summaries for behavioral analytics.
clickhouse-client --database=analytics_db --query "
CREATE MATERIALIZED VIEW user_sessions_mv TO user_sessions AS
SELECT
toDate(timestamp) as date,
user_id,
any(country) as country,
any(device_type) as device_type,
uniq(session_id) as session_count,
count() as total_events,
toUInt32(max(timestamp) - min(timestamp)) as session_duration,
sum(revenue) as total_revenue
FROM events
GROUP BY date, user_id
"
Create metrics aggregation table
Set up real-time metrics aggregation for system monitoring dashboards.
clickhouse-client --database=analytics_db --query "
CREATE TABLE metrics_5min (
time_bucket DateTime,
metric_name String,
host String,
service String,
environment String,
avg_value Float64,
max_value Float64,
min_value Float64,
count_values UInt64
) ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(time_bucket)
ORDER BY (time_bucket, metric_name, host, service)
TTL time_bucket + INTERVAL 3 MONTH
"
Create metrics materialized view
Aggregate metrics data into 5-minute buckets for real-time monitoring.
clickhouse-client --database=analytics_db --query "
CREATE MATERIALIZED VIEW metrics_5min_mv TO metrics_5min AS
SELECT
toStartOfFiveMinutes(timestamp) as time_bucket,
metric_name,
host,
service,
environment,
avg(metric_value) as avg_value,
max(metric_value) as max_value,
min(metric_value) as min_value,
count() as count_values
FROM metrics
GROUP BY time_bucket, metric_name, host, service, environment
"
Optimize materialized view performance and monitoring
Configure async insert settings
Enable asynchronous inserts to improve performance with high-volume data ingestion.
These are query-level settings; add them to a profile override such as /etc/clickhouse-server/users.d/async_insert.xml. The setting names below were inferred from the values, so verify them against your ClickHouse version:

<clickhouse>
    <profiles>
        <default>
            <async_insert>1</async_insert>
            <wait_for_async_insert>0</wait_for_async_insert>
            <async_insert_busy_timeout_ms>200</async_insert_busy_timeout_ms>
            <async_insert_max_data_size>10485760</async_insert_max_data_size>
            <async_insert_stale_timeout_ms>0</async_insert_stale_timeout_ms>
        </default>
    </profiles>
</clickhouse>

The flush thread pool is sized server-wide in a config override such as /etc/clickhouse-server/config.d/async_insert.xml:

<clickhouse>
    <async_insert_threads>16</async_insert_threads>
</clickhouse>
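Async inserts can also be enabled per connection instead of server-wide, since clickhouse-client accepts settings as command-line flags; the row values here are illustrative:

```
clickhouse-client --database=analytics_db \
    --async_insert=1 --wait_for_async_insert=0 \
    --query "INSERT INTO events VALUES
    (now(), 2001, 'page_view', '/pricing', 'sess_async', 'US', 'desktop', 0)"
```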
Optimize merge settings for materialized views
Configure merge behavior to handle frequent small inserts from materialized views efficiently.
Add a merge_tree section to a config override such as /etc/clickhouse-server/config.d/merge_tuning.xml. The setting names below were inferred from the values, so verify them against your ClickHouse version before applying:

<clickhouse>
    <merge_tree>
        <parts_to_throw_insert>3000</parts_to_throw_insert>
        <parts_to_delay_insert>1000</parts_to_delay_insert>
        <max_parts_in_total>10000</max_parts_in_total>
        <max_bytes_to_merge_at_max_space_in_pool>161061273600</max_bytes_to_merge_at_max_space_in_pool>
        <merge_selecting_sleep_ms>5000</merge_selecting_sleep_ms>
        <max_replicated_merges_in_queue>1000</max_replicated_merges_in_queue>
    </merge_tree>
</clickhouse>
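To see whether frequent small inserts are creating too many parts (the condition these merge settings guard against), inspect system.parts:

```
clickhouse-client --query "
SELECT database, table, count() AS active_parts
FROM system.parts
WHERE active AND database = 'analytics_db'
GROUP BY database, table
ORDER BY active_parts DESC"
```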
Create monitoring queries
Set up queries to monitor materialized view performance and data freshness.
clickhouse-client --database=analytics_db --query "
CREATE TABLE mv_monitoring (
timestamp DateTime DEFAULT now(),
view_name String,
source_table String,
target_table String,
rows_processed UInt64,
last_update DateTime,
processing_time_ms UInt64
) ENGINE = MergeTree()
ORDER BY (timestamp, view_name)
TTL timestamp + INTERVAL 30 DAY
"
Create performance monitoring view
Monitor materialized view execution statistics and identify performance bottlenecks.
clickhouse-client --database=analytics_db --query "
CREATE VIEW mv_performance_stats AS
SELECT
database,
table as view_name,
formatReadableSize(total_bytes) as data_size,
total_rows,
formatReadableSize(total_bytes_uncompressed) as uncompressed_size,
round(total_bytes / total_bytes_uncompressed, 3) as compression_ratio,
partition_key,
sorting_key
FROM system.tables
WHERE database = 'analytics_db'
AND engine LIKE '%MergeTree'
ORDER BY total_bytes DESC
"
Restart ClickHouse with optimized settings
Apply all performance optimizations by restarting the service.
sudo systemctl restart clickhouse-server
sudo systemctl status clickhouse-server
Test materialized views with bulk data
Insert additional data to verify materialized views process records correctly.
clickhouse-client --database=analytics_db --query "
INSERT INTO events SELECT
now() - INTERVAL number MINUTE as timestamp,
1000 + (number % 100) as user_id,
['page_view', 'click', 'purchase'][1 + (number % 3)] as event_type,
'/page/' || toString(number % 20) as page_url,
'session_' || toString(number % 50) as session_id,
['US', 'UK', 'DE', 'FR', 'JP'][1 + (number % 5)] as country,
['desktop', 'mobile', 'tablet'][1 + (number % 3)] as device_type,
if(number % 10 = 0, 50.0 + (number % 100), 0) as revenue
FROM numbers(10000)
"
Verify your setup
Check that materialized views are processing data correctly and aggregations are updating in real-time.
# Check materialized view status
clickhouse-client --database=analytics_db --query "SHOW TABLES LIKE '%_mv'"
Verify hourly aggregations
clickhouse-client --database=analytics_db --query "
SELECT hour, country, device_type, total_events, unique_users, total_revenue
FROM events_hourly
ORDER BY hour DESC, total_events DESC
LIMIT 10"
Check user session data
clickhouse-client --database=analytics_db --query "
SELECT date, count() as active_users, sum(total_revenue) as daily_revenue
FROM user_sessions
GROUP BY date
ORDER BY date DESC
LIMIT 5"
Monitor metrics aggregation
clickhouse-client --database=analytics_db --query "
SELECT time_bucket, metric_name, host, avg_value, max_value
FROM metrics_5min
ORDER BY time_bucket DESC
LIMIT 10"
Check materialized view performance
clickhouse-client --database=analytics_db --query "SELECT * FROM mv_performance_stats"
Flush the in-memory logs so all query statistics are written to the system tables used for monitoring:
clickhouse-client --query "SYSTEM FLUSH LOGS"
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Materialized view not updating | View creation failed or syntax error | SHOW CREATE TABLE view_name and check system.query_log |
| High memory usage during aggregation | Insufficient memory for GROUP BY operations | Increase max_memory_usage and max_bytes_before_external_group_by |
| Slow materialized view processing | Too many small parts or inefficient ORDER BY | Optimize merge settings and review table structure |
| Data duplication in SummingMergeTree | Non-optimal ORDER BY clause | Ensure ORDER BY includes all dimension columns |
| Materialized view consuming too much disk | Missing TTL or compression settings | Add TTL settings and enable compression |
Next steps
- Set up ClickHouse monitoring with Prometheus and Grafana for comprehensive observability
- Optimize ClickHouse performance for large datasets with advanced tuning techniques
- Set up ClickHouse and Kafka real-time data pipeline for streaming analytics
- Configure automated ClickHouse backups with S3 storage and monitoring
- Configure ClickHouse cluster replication for high availability production deployments
Running this in production?
Automated install script
Run this to automate the entire setup
#!/usr/bin/env bash
set -euo pipefail
# ClickHouse Materialized Views Setup Script
# Configures ClickHouse with materialized views for real-time analytics
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'
# Configuration
CLICKHOUSE_CONFIG="/etc/clickhouse-server/config.xml"
CLICKHOUSE_USERS_CONFIG="/etc/clickhouse-server/users.xml"
DATABASE_NAME="analytics_db"
CLICKHOUSE_USER="clickhouse"
print_status() {
echo -e "${BLUE}[INFO]${NC} $1"
}
print_success() {
echo -e "${GREEN}[SUCCESS]${NC} $1"
}
print_error() {
echo -e "${RED}[ERROR]${NC} $1"
}
print_warning() {
echo -e "${YELLOW}[WARNING]${NC} $1"
}
cleanup_on_error() {
print_error "Installation failed. Attempting cleanup..."
systemctl stop clickhouse-server 2>/dev/null || true
exit 1
}
trap cleanup_on_error ERR
check_root() {
if [[ $EUID -ne 0 ]]; then
print_error "This script must be run as root or with sudo"
exit 1
fi
}
detect_distro() {
if [ -f /etc/os-release ]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_INSTALL="apt install -y"
PKG_UPDATE="apt update"
FIREWALL_CMD="ufw"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_INSTALL="dnf install -y"
PKG_UPDATE="dnf update -y"
FIREWALL_CMD="firewall-cmd"
;;
amzn)
PKG_MGR="yum"
PKG_INSTALL="yum install -y"
PKG_UPDATE="yum update -y"
FIREWALL_CMD="firewall-cmd"
;;
*)
print_error "Unsupported distro: $ID"
exit 1
;;
esac
else
print_error "Cannot detect distribution"
exit 1
fi
}
install_clickhouse() {
print_status "[1/8] Installing ClickHouse..."
case "$ID" in
ubuntu|debian)
$PKG_UPDATE
$PKG_INSTALL apt-transport-https ca-certificates gnupg
# apt-key is deprecated on current Debian/Ubuntu; store the repo key in a dedicated keyring
gpg --no-default-keyring --keyring /usr/share/keyrings/clickhouse-keyring.gpg \
    --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 8919F6BD2B48D754
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" > /etc/apt/sources.list.d/clickhouse.list
$PKG_UPDATE
$PKG_INSTALL clickhouse-server clickhouse-client
;;
almalinux|rocky|centos|rhel|ol|fedora|amzn)
$PKG_INSTALL yum-utils
rpm --import https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key
$PKG_INSTALL yum-config-manager || $PKG_INSTALL dnf-plugins-core
yum-config-manager --add-repo https://packages.clickhouse.com/rpm/clickhouse.repo || \
dnf config-manager --add-repo https://packages.clickhouse.com/rpm/clickhouse.repo
$PKG_INSTALL clickhouse-server clickhouse-client
;;
esac
print_success "ClickHouse installed successfully"
}
configure_memory_settings() {
print_status "[2/8] Configuring memory settings..."
# Query-level settings (max_memory_usage etc.) belong in a users.d profile
# override, not at the top level of config.xml, so write a drop-in file
# instead of patching the main config with sed
mkdir -p /etc/clickhouse-server/users.d
cat > /etc/clickhouse-server/users.d/memory_tuning.xml << 'EOF'
<?xml version="1.0"?>
<clickhouse>
    <profiles>
        <default>
            <max_memory_usage>10000000000</max_memory_usage>
            <max_bytes_before_external_group_by>8000000000</max_bytes_before_external_group_by>
            <max_bytes_before_external_sort>8000000000</max_bytes_before_external_sort>
            <max_temporary_data_on_disk_size_for_query>1000000000000</max_temporary_data_on_disk_size_for_query>
        </default>
    </profiles>
</clickhouse>
EOF
# The server-wide memory limit is a server setting, so it goes in config.d
# (0 means no explicit cap)
mkdir -p /etc/clickhouse-server/config.d
cat > /etc/clickhouse-server/config.d/memory_tuning.xml << 'EOF'
<?xml version="1.0"?>
<clickhouse>
    <max_server_memory_usage>0</max_server_memory_usage>
</clickhouse>
EOF
chown -R $CLICKHOUSE_USER:$CLICKHOUSE_USER /etc/clickhouse-server/users.d /etc/clickhouse-server/config.d
chmod 644 /etc/clickhouse-server/users.d/memory_tuning.xml /etc/clickhouse-server/config.d/memory_tuning.xml
print_success "Memory settings configured"
}
start_clickhouse() {
print_status "[3/8] Starting ClickHouse service..."
systemctl enable clickhouse-server
systemctl start clickhouse-server
# Wait for service to be ready
sleep 10
# Verify service is running
if ! systemctl is-active --quiet clickhouse-server; then
print_error "ClickHouse service failed to start"
exit 1
fi
print_success "ClickHouse service started"
}
create_database() {
print_status "[4/8] Creating analytics database..."
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS $DATABASE_NAME"
print_success "Analytics database created"
}
create_source_tables() {
print_status "[5/8] Creating source tables..."
# Create events table
clickhouse-client --database="$DATABASE_NAME" --query "
CREATE TABLE IF NOT EXISTS events (
timestamp DateTime64(3),
user_id UInt32,
event_type String,
page_url String,
session_id String,
country String,
device_type String,
revenue Decimal64(4)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id)
TTL timestamp + INTERVAL 1 YEAR
SETTINGS index_granularity = 8192"
# Create metrics table
clickhouse-client --database="$DATABASE_NAME" --query "
CREATE TABLE IF NOT EXISTS metrics (
timestamp DateTime64(3),
metric_name String,
metric_value Float64,
host String,
service String,
environment String,
tags Map(String, String)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, metric_name, host)
TTL timestamp + INTERVAL 6 MONTH
SETTINGS index_granularity = 8192"
print_success "Source tables created"
}
create_materialized_views() {
print_status "[6/8] Creating materialized views..."
# Create hourly events aggregation table
clickhouse-client --database="$DATABASE_NAME" --query "
CREATE TABLE IF NOT EXISTS events_hourly_agg (
hour DateTime,
country String,
device_type String,
total_events UInt64,
unique_users UInt64,
page_views UInt64,
purchases UInt64,
total_revenue Decimal64(4)
) ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, country, device_type)"
# Create materialized view for hourly aggregation
clickhouse-client --database="$DATABASE_NAME" --query "
CREATE MATERIALIZED VIEW IF NOT EXISTS events_hourly_mv
TO events_hourly_agg AS
SELECT
toStartOfHour(timestamp) as hour,
country,
device_type,
count() as total_events,
uniq(user_id) as unique_users,
countIf(event_type = 'page_view') as page_views,
countIf(event_type = 'purchase') as purchases,
sum(revenue) as total_revenue
FROM events
GROUP BY hour, country, device_type"
# Create metrics aggregation table
clickhouse-client --database="$DATABASE_NAME" --query "
CREATE TABLE IF NOT EXISTS metrics_5min_agg (
time_bucket DateTime,
metric_name String,
host String,
service String,
avg_value Float64,
max_value Float64,
min_value Float64,
count_samples UInt64
) ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(time_bucket)
ORDER BY (time_bucket, metric_name, host, service)"
# Create materialized view for metrics aggregation
clickhouse-client --database="$DATABASE_NAME" --query "
CREATE MATERIALIZED VIEW IF NOT EXISTS metrics_5min_mv
TO metrics_5min_agg AS
SELECT
toStartOfFiveMinutes(timestamp) as time_bucket,
metric_name,
host,
service,
avg(metric_value) as avg_value,
max(metric_value) as max_value,
min(metric_value) as min_value,
count() as count_samples
FROM metrics
GROUP BY time_bucket, metric_name, host, service"
print_success "Materialized views created"
}
insert_sample_data() {
print_status "[7/8] Inserting sample data..."
# Insert sample events data
clickhouse-client --database="$DATABASE_NAME" --query "
INSERT INTO events VALUES
(now() - INTERVAL 1 HOUR, 1001, 'page_view', '/home', 'sess_001', 'US', 'desktop', 0),
(now() - INTERVAL 58 MINUTE, 1001, 'click', '/product/123', 'sess_001', 'US', 'desktop', 0),
(now() - INTERVAL 55 MINUTE, 1002, 'page_view', '/home', 'sess_002', 'UK', 'mobile', 0),
(now() - INTERVAL 50 MINUTE, 1001, 'purchase', '/checkout', 'sess_001', 'US', 'desktop', 99.99),
(now() - INTERVAL 45 MINUTE, 1003, 'page_view', '/about', 'sess_003', 'DE', 'tablet', 0)"
# Insert sample metrics data
clickhouse-client --database="$DATABASE_NAME" --query "
INSERT INTO metrics VALUES
(now() - INTERVAL 10 MINUTE, 'cpu_usage', 45.2, 'web-01', 'nginx', 'production', {'region': 'us-east-1'}),
(now() - INTERVAL 9 MINUTE, 'cpu_usage', 52.8, 'web-01', 'nginx', 'production', {'region': 'us-east-1'}),
(now() - INTERVAL 8 MINUTE, 'memory_usage', 78.5, 'web-01', 'nginx', 'production', {'region': 'us-east-1'}),
(now() - INTERVAL 7 MINUTE, 'response_time', 145.3, 'web-01', 'nginx', 'production', {'region': 'us-east-1'}),
(now() - INTERVAL 6 MINUTE, 'cpu_usage', 38.1, 'db-01', 'clickhouse', 'production', {'region': 'us-east-1'})"
print_success "Sample data inserted"
}
verify_installation() {
print_status "[8/8] Verifying installation..."
# Check ClickHouse version
VERSION=$(clickhouse-client --query "SELECT version()")
print_success "ClickHouse version: $VERSION"
# Verify database exists
DATABASES=$(clickhouse-client --query "SHOW DATABASES" | grep -c "$DATABASE_NAME" || true)
if [ "$DATABASES" -eq 1 ]; then
print_success "Analytics database verified"
else
print_error "Analytics database not found"
exit 1
fi
# Verify tables exist
TABLES=$(clickhouse-client --database="$DATABASE_NAME" --query "SHOW TABLES" | wc -l)
if [ "$TABLES" -ge 4 ]; then
print_success "Tables created successfully ($TABLES tables found)"
else
print_error "Not all tables were created"
exit 1
fi
# Test materialized view data
AGG_COUNT=$(clickhouse-client --database="$DATABASE_NAME" --query "SELECT count() FROM events_hourly_agg" 2>/dev/null || echo "0")
if [ "$AGG_COUNT" -gt 0 ]; then
print_success "Materialized views are working (found $AGG_COUNT aggregated records)"
else
print_warning "No aggregated data found yet (this is normal for fresh installation)"
fi
print_success "Installation verification completed"
}
main() {
print_status "Starting ClickHouse Materialized Views setup..."
check_root
detect_distro
install_clickhouse
configure_memory_settings
start_clickhouse
create_database
create_source_tables
create_materialized_views
insert_sample_data
verify_installation
echo
print_success "ClickHouse materialized views setup completed!"
echo
echo "Next steps:"
echo "1. Connect to ClickHouse: clickhouse-client --database=$DATABASE_NAME"
echo "2. Query aggregated data: SELECT * FROM events_hourly_agg LIMIT 10"
echo "3. Monitor real-time aggregations as new data arrives"
echo
}
main "$@"
Review the script before running. Execute with: bash install.sh