Master advanced Cassandra optimization with data modeling best practices, partition strategies, JVM tuning, and comprehensive monitoring. Learn to design efficient schemas, optimize queries, and implement Prometheus integration for production-grade performance.
Prerequisites
- Running Cassandra cluster (3+ nodes recommended)
- Administrator access to all cluster nodes
- Basic understanding of Cassandra architecture
- Prometheus and Grafana installed (or ability to install them)
- At least 16GB RAM per Cassandra node for production tuning
What this solves
Apache Cassandra's distributed architecture provides exceptional scalability, but achieving optimal performance requires careful data modeling, query optimization, and system tuning. This tutorial covers advanced techniques for designing efficient partition strategies, optimizing query patterns, tuning JVM parameters, and implementing comprehensive monitoring with Prometheus integration.
You'll learn to identify and resolve performance bottlenecks, implement proper data modeling patterns, and establish monitoring systems that provide deep insights into your Cassandra cluster's behavior. These techniques are essential for production environments handling high-throughput workloads.
Prerequisites and planning
Before starting the optimization process, ensure you have a running Cassandra cluster and understand your application's data access patterns. You'll need administrator access to modify configuration files and restart services.
Assess current cluster health
Start by examining your cluster's current state and identifying performance metrics.
nodetool status
nodetool info
nodetool tpstats
nodetool compactionstats
Install monitoring tools
Install a Java runtime and the download tools (wget, curl) used later to fetch the JMX exporter and Prometheus components.
sudo apt update
sudo apt install -y openjdk-11-jre-headless wget curl
Data modeling optimization
Analyze current partition patterns
Examine your existing tables to identify partition hotspots and inefficient data distribution.
nodetool cfstats keyspace_name.table_name
nodetool tablehistograms keyspace_name table_name
Design optimal partition keys
Create partition keys that distribute data evenly across nodes while supporting your query patterns.
-- Bad: Creates hotspots
CREATE TABLE user_events_bad (
event_date date,
user_id uuid,
event_type text,
data text,
PRIMARY KEY (event_date, user_id, event_type)
);
-- Good: Better distribution
CREATE TABLE user_events_optimized (
user_id uuid,
event_date date,
event_type text,
event_time timestamp,
data text,
PRIMARY KEY ((user_id, event_date), event_time, event_type)
) WITH CLUSTERING ORDER BY (event_time DESC);
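To see why the first schema creates hotspots, the sketch below counts how many distinct partitions one day of events lands in under each key. It is a simplified model: the sample `events` and the helper `partition_count` are illustrative names, and it counts partition keys rather than simulating Cassandra's token ring.

```python
from datetime import date
from uuid import UUID

# Hypothetical sample: 1,000 users each writing an event on the same day.
day = date(2024, 1, 15)
events = [(UUID(int=i), day) for i in range(1000)]


def partition_count(rows, key_fn):
    """Count the distinct partition keys that key_fn produces for rows."""
    return len({key_fn(user_id, event_date) for user_id, event_date in rows})


# user_events_bad: the partition key is event_date alone.
bad = partition_count(events, lambda user_id, event_date: event_date)

# user_events_optimized: the partition key is (user_id, event_date).
good = partition_count(events, lambda user_id, event_date: (user_id, event_date))

print(f"date-only key: {bad} partition(s); composite key: {good} partition(s)")
```

With the date-only key, every write for a given day targets the same partition and therefore the same replica set, while the composite key spreads the same writes over one partition per user per day.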
Implement bucketing strategies
Use time-based or hash-based bucketing to prevent partition growth issues.
-- Time-based bucketing for high-volume data
CREATE TABLE metrics_by_hour (
metric_name text,
bucket_hour timestamp,
recorded_at timestamp,
value double,
tags map<text, text>,
PRIMARY KEY ((metric_name, bucket_hour), recorded_at)
) WITH CLUSTERING ORDER BY (recorded_at DESC)
AND compaction = {'class': 'TimeWindowCompactionStrategy',
'compaction_window_unit': 'HOURS',
'compaction_window_size': 1};
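The application has to compute the bucket value itself before writing or reading. A minimal sketch of both bucketing styles follows; `bucket_hour` mirrors the column in the table above, while `hash_bucket` and its default of 16 buckets are assumptions chosen only for illustration.

```python
import zlib
from datetime import datetime, timezone


def bucket_hour(recorded_at: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour (time-based bucket)."""
    return recorded_at.replace(minute=0, second=0, microsecond=0)


def hash_bucket(key: str, buckets: int = 16) -> int:
    """Map a key to one of `buckets` sub-partitions using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % buckets


ts = datetime(2024, 1, 15, 13, 47, 12, tzinfo=timezone.utc)
print(bucket_hour(ts).isoformat())
print(hash_bucket("cpu.load"))
```

Readers must query every bucket that overlaps the requested range, so the bucket size is a trade-off between partition size and the number of partitions touched per read.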
Configure proper compaction strategies
Choose and configure compaction strategies based on your workload patterns.
-- For time-series data
ALTER TABLE time_series_data WITH compaction = {
'class': 'TimeWindowCompactionStrategy',
'compaction_window_unit': 'DAYS',
'compaction_window_size': 1,
'max_threshold': 32,
'min_threshold': 4
};
-- For read-heavy workloads (fewer SSTables per read, at the cost of more compaction I/O)
ALTER TABLE read_heavy_table WITH compaction = {
'class': 'LeveledCompactionStrategy',
'sstable_size_in_mb': 160
};
-- For write-heavy workloads (lower write amplification)
ALTER TABLE write_heavy_table WITH compaction = {
'class': 'SizeTieredCompactionStrategy',
'max_threshold': 32,
'min_threshold': 4
};
Query optimization techniques
Enable query tracing
Use Cassandra's built-in tracing to identify slow queries and optimization opportunities.
cqlsh -e "TRACING ON; SELECT * FROM keyspace.table WHERE partition_key = 'value';"
nodetool settraceprobability 0.01
Optimize read patterns with secondary indexes
Create efficient secondary indexes for non-primary key queries while avoiding performance pitfalls.
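-- SASI index for range queries (SASI is experimental and disabled by default in Cassandra 4.x)
-- PREFIX is the default mode; SPARSE expects each value to match only a few rows, which an age column does not
CREATE CUSTOM INDEX user_age_idx ON users (age)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'PREFIX'};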
-- Regular secondary index for equality queries
CREATE INDEX user_status_idx ON users (status);
-- Materialized view for complex query patterns
CREATE MATERIALIZED VIEW users_by_department AS
SELECT user_id, department, name, email, created_at
FROM users
WHERE department IS NOT NULL AND user_id IS NOT NULL
PRIMARY KEY (department, user_id);
Configure read repair and consistency
Balance consistency requirements with performance by tuning read repair and consistency levels.
# Request timeouts in cassandra.yaml (these bound how long the coordinator waits; they do not change read repair behavior)
read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
# Optimize hints for better consistency
hinted_handoff_enabled: true
max_hint_window_in_ms: 10800000 # 3 hours
hinted_handoff_throttle_in_kb: 1024
JVM tuning and memory management
Configure heap size and garbage collection
Optimize JVM heap settings and garbage collection for your workload and available memory.
# Set heap size (typically 1/4 to 1/2 of system RAM, max 32GB)
-Xms8G
-Xmx8G
# Use G1GC for better latency
-XX:+UseG1GC
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:MaxGCPauseMillis=300
-XX:InitiatingHeapOccupancyPercent=70
# Enable GC logging (Java 11 unified logging; several Java 8 flags such as -XX:+UseGCLogFileRotation are rejected by JDK 9+)
-Xlog:gc*,safepoint:file=/var/log/cassandra/gc.log:time,uptime,level,tags:filecount=10,filesize=10M
# Optimize for large heaps
# -XX:+UseCGroupMemoryLimitForHeap was removed in JDK 11; container memory limits are detected by default there
-XX:+UseStringDeduplication
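The heap rule of thumb quoted in the comment above (one quarter to one half of system RAM, capped at 32GB) can be written down as a small helper. This is a sketch of that rule only: the function names are hypothetical, and the right value within the range still depends on measured GC behavior.

```python
from typing import List, Tuple


def heap_range_gb(system_ram_gb: float, cap_gb: float = 32.0) -> Tuple[float, float]:
    """Return the (low, high) heap range: RAM/4 to RAM/2, each capped at cap_gb."""
    low = min(system_ram_gb / 4, cap_gb)
    high = min(system_ram_gb / 2, cap_gb)
    return low, high


def jvm_heap_flags(heap_gb: int) -> List[str]:
    """Build matching -Xms/-Xmx flags so the heap is never resized at runtime."""
    return [f"-Xms{heap_gb}G", f"-Xmx{heap_gb}G"]


print(heap_range_gb(16))
print(jvm_heap_flags(8))
```

For the 16GB node listed in the prerequisites the range is 4 to 8GB, which is where the `-Xms8G`/`-Xmx8G` setting sits.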
Configure off-heap memory settings
Tune off-heap memory pools for optimal caching performance.
# Key cache settings
key_cache_size_in_mb: 1024
key_cache_save_period: 14400
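# Row cache (use cautiously)
row_cache_size_in_mb: 512
row_cache_save_period: 14400
# row_cache_provider is not recognized by current Cassandra releases; the implementation is selected with row_cache_class_name
row_cache_class_name: org.apache.cassandra.cache.OHCProvider
# Counter cache
counter_cache_size_in_mb: 256
counter_cache_save_period: 7200
# Memtable settings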
memtable_allocation_type: heap_buffers
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048
Optimize native transport and networking
Configure networking parameters for better throughput and connection handling.
# Native transport optimization
native_transport_max_threads: 128
native_transport_max_frame_size_in_mb: 256
# Internode communication
inter_dc_tcp_nodelay: true
internode_compression: dc
# Concurrent operations
concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32
concurrent_materialized_view_writes: 32
# Commit log optimization
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
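The comments in the stock cassandra.yaml suggest sizing `concurrent_reads` at 16 per data drive and `concurrent_writes` at 8 per CPU core. A minimal sketch of that rule of thumb follows; the helper name is hypothetical, and the results are starting points to validate against `nodetool tpstats`, not fixed answers.

```python
def suggested_concurrency(data_drives: int, cpu_cores: int) -> dict:
    """Apply the cassandra.yaml rule of thumb: reads scale with drives, writes with cores."""
    if data_drives < 1 or cpu_cores < 1:
        raise ValueError("data_drives and cpu_cores must be at least 1")
    return {
        "concurrent_reads": 16 * data_drives,
        "concurrent_writes": 8 * cpu_cores,
    }


print(suggested_concurrency(data_drives=2, cpu_cores=4))
```

A node with two data drives and four cores yields the 32/32 values used in the configuration above.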
Advanced performance monitoring with Prometheus
Install and configure JMX exporter
Set up the Prometheus JMX exporter to collect Cassandra metrics.
cd /opt
sudo wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.19.0/jmx_prometheus_javaagent-0.19.0.jar
sudo chown cassandra:cassandra jmx_prometheus_javaagent-0.19.0.jar
Configure JMX exporter for Cassandra
Create a comprehensive JMX configuration to expose Cassandra metrics to Prometheus.
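rules:
  # Explicit rules come first (the first matching rule wins) so that the metric
  # names queried by the dashboards and alerts in this guide actually exist.
  # Latency percentiles are reported in microseconds; valueFactor converts to milliseconds.
  - pattern: org.apache.cassandra.metrics<type=Table, keyspace=(\S*), scope=(\S*), name=ReadLatency><>99thPercentile
    name: cassandra_table_read_latency
    valueFactor: 0.001
    labels:
      keyspace: $1
      table: $2
      quantile: "0.99"
  - pattern: org.apache.cassandra.metrics<type=Table, keyspace=(\S*), scope=(\S*), name=WriteLatency><>99thPercentile
    name: cassandra_table_write_latency
    valueFactor: 0.001
    labels:
      keyspace: $1
      table: $2
      quantile: "0.99"
  - pattern: org.apache.cassandra.metrics<type=Compaction, name=PendingTasks><>Value
    name: cassandra_compaction_pending_tasks
  # Cassandra specific metrics
  - pattern: org.apache.cassandra.metrics<type=Table, keyspace=(\S*), scope=(\S*), name=(\S*)><>Value
    name: cassandra_table_$3
    labels:
      keyspace: $1
      table: $2
  - pattern: org.apache.cassandra.metrics<type=Keyspace, keyspace=(\S*), name=(\S*)><>Value
    name: cassandra_keyspace_$2
    labels:
      keyspace: $1
  - pattern: org.apache.cassandra.metrics<type=ThreadPools, path=(\S*), scope=(\S*), name=(\S*)><>Value
    name: cassandra_threadpool_$3
    labels:
      path: $1
      scope: $2
  - pattern: org.apache.cassandra.metrics<type=ClientRequest, scope=(\S*), name=(\S*)><>(\w+)
    name: cassandra_client_request_$2_$3
    labels:
      request_type: $1
  # Storage metrics are counters, exposed through the Count attribute
  - pattern: org.apache.cassandra.metrics<type=Storage, name=(\S*)><>Count
    name: cassandra_storage_$1
  - pattern: org.apache.cassandra.metrics<type=Compaction, name=(\S*)><>Value
    name: cassandra_compaction_$1
  - pattern: org.apache.cassandra.metrics<type=CommitLog, name=(\S*)><>Value
    name: cassandra_commitlog_$1
  # JVM metrics
  - pattern: java.lang<type=Memory><HeapMemoryUsage>used
    name: jvm_memory_heap_used
  - pattern: java.lang<type=Memory><NonHeapMemoryUsage>used
    name: jvm_memory_nonheap_used
  - pattern: java.lang<type=GarbageCollector, name=(.+)><>CollectionCount
    name: jvm_gc_collections_total
    labels:
      gc: $1
  - pattern: java.lang<type=GarbageCollector, name=(.+)><>CollectionTime
    name: jvm_gc_collection_seconds_total
    labels:
      gc: $1
    valueFactor: 0.001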
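Each rule is a regular expression matched against a flattened bean string of the form `domain<properties><keys>attribute`, and the capture groups fill `$1`, `$2`, and so on in `name` and `labels`. The sketch below reproduces that substitution with Python's `re` module for one table metric. The keyspace `shop` and table `orders` are made-up sample values, and the exporter itself uses Java regular expressions, so treat this as an approximation for checking a pattern rather than the exporter's own logic.

```python
import re

# Sample bean string in the flattened form the exporter matches against.
bean = (
    "org.apache.cassandra.metrics<type=Table, keyspace=shop, "
    "scope=orders, name=LiveSSTableCount><>Value"
)

pattern = re.compile(
    r"org\.apache\.cassandra\.metrics<type=Table, keyspace=(\S*), "
    r"scope=(\S*), name=(\S*)><>Value"
)

match = pattern.search(bean)
keyspace, table, metric = match.groups()

# Equivalent of name: cassandra_table_$3 with labels keyspace: $1, table: $2
name = f"cassandra_table_{metric}"
labels = {"keyspace": keyspace, "table": table}
print(name, labels)
```

Running a candidate pattern against a few bean strings this way is a quick check before restarting Cassandra with a new exporter configuration.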
Enable JMX exporter in Cassandra
Add the JMX exporter agent to Cassandra's JVM options.
# Add JMX Prometheus exporter
-javaagent:/opt/jmx_prometheus_javaagent-0.19.0.jar=7070:/opt/cassandra-jmx-config.yaml
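# Remote JMX (optional): the javaagent runs inside the Cassandra JVM and does not need it.
# Unauthenticated, unencrypted remote JMX should only be exposed on a trusted network.
-Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false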
Configure Prometheus to scrape Cassandra metrics
Add Cassandra targets to your Prometheus configuration.
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'cassandra'
    static_configs:
      - targets: ['localhost:7070']
    scrape_interval: 30s
    metrics_path: /metrics
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        regex: '([^:]+):\d+'
        replacement: '${1}'
      - source_labels: [__address__]
        target_label: cassandra_cluster
        replacement: 'production'
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']
    scrape_interval: 15s
Restart services and verify metrics collection
Restart Cassandra and Prometheus to begin collecting metrics.
sudo systemctl restart cassandra
sudo systemctl restart prometheus
# Wait for Cassandra to start
sleep 30
# Verify JMX exporter is working
curl -s localhost:7070/metrics | grep cassandra_table_read_latency | head -5
# Check Prometheus targets (the job name is stored under .labels in the API response)
curl -s localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.job=="cassandra") | .health'
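The same target check can be scripted without jq. The sketch below filters a targets response by job label; the helper `job_health` is a hypothetical name, and `sample` is a trimmed, made-up response that keeps only the fields the function reads (a real response carries more fields per target).

```python
import json


def job_health(targets_response: dict, job: str) -> dict:
    """Map instance label to health for active targets that belong to `job`."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    return {
        target["labels"].get("instance", "unknown"): target["health"]
        for target in active
        if target.get("labels", {}).get("job") == job
    }


# Hypothetical, trimmed response from /api/v1/targets.
sample = json.loads("""
{
  "status": "success",
  "data": {
    "activeTargets": [
      {"labels": {"job": "cassandra", "instance": "localhost"}, "health": "up"},
      {"labels": {"job": "node-exporter", "instance": "localhost:9100"}, "health": "up"}
    ]
  }
}
""")

print(job_health(sample, "cassandra"))
```

Against a live server, the dictionary passed in would come from `json.load(urllib.request.urlopen("http://localhost:9090/api/v1/targets"))`.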
Performance monitoring dashboards
Create Grafana dashboards for Cassandra monitoring
Set up comprehensive dashboards to monitor Cassandra performance metrics. If you don't have Grafana installed, check our Cassandra monitoring guide.
{
"dashboard": {
"title": "Cassandra Cluster Overview",
"panels": [
{
"title": "Read Latency (99th percentile)",
"type": "graph",
"targets": [{
"expr": "cassandra_table_read_latency{quantile=\"0.99\"}",
"legendFormat": "{{keyspace}}.{{table}}"
}]
},
{
"title": "Write Latency (99th percentile)",
"type": "graph",
"targets": [{
"expr": "cassandra_table_write_latency{quantile=\"0.99\"}",
"legendFormat": "{{keyspace}}.{{table}}"
}]
},
{
"title": "Compaction Tasks",
"type": "graph",
"targets": [{
"expr": "cassandra_compaction_pending_tasks",
"legendFormat": "Pending Compactions"
}]
}
]
}
}
Set up alerting rules
Configure Prometheus alerting rules for critical Cassandra metrics.
groups:
  - name: cassandra-alerts
    rules:
      - alert: CassandraHighReadLatency
        expr: cassandra_table_read_latency{quantile="0.99"} > 100
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High Cassandra read latency detected"
          description: "Read latency is {{ $value }}ms on {{ $labels.keyspace }}.{{ $labels.table }}"
      - alert: CassandraHighWriteLatency
        expr: cassandra_table_write_latency{quantile="0.99"} > 100
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High Cassandra write latency detected"
          description: "Write latency is {{ $value }}ms on {{ $labels.keyspace }}.{{ $labels.table }}"
      - alert: CassandraPendingCompactions
        expr: cassandra_compaction_pending_tasks > 100
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High number of pending compactions"
          description: "{{ $value }} compactions are pending"
      - alert: CassandraNodeDown
        expr: up{job="cassandra"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Cassandra node is down"
          description: "Cassandra node {{ $labels.instance }} is not responding"
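The `for:` clause means an alert fires only after its expression has stayed true for the whole duration; a brief spike only puts it into a pending state. The sketch below is a simplified model of that behavior for a single series, with a hypothetical helper name and made-up samples taken at the 15-second evaluation interval configured earlier.

```python
def first_firing_time(samples, threshold, for_seconds):
    """Return the timestamp at which the alert first fires, or None.

    samples: list of (timestamp_seconds, value) pairs in evaluation order.
    """
    active_since = None
    for ts, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = ts  # condition became true: alert is pending
            if ts - active_since >= for_seconds:
                return ts  # condition held for the full duration: firing
        else:
            active_since = None  # condition cleared: pending state resets
    return None


# Latency stays at 150 ms for three minutes.
steady = [(t, 150) for t in range(0, 181, 15)]
# Latency spikes briefly, recovers, then spikes again.
spiky = [(0, 150), (15, 150), (30, 50), (45, 150), (60, 150)]

print(first_firing_time(steady, threshold=100, for_seconds=120))
print(first_firing_time(spiky, threshold=100, for_seconds=120))
```

Longer `for:` values trade detection delay for fewer false alarms, which is why the compaction alert above waits five minutes while the node-down alert waits one.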
Advanced tuning strategies
Implement read and write optimization
Configure advanced settings for read and write path optimization.
# Streaming optimization
stream_throughput_outbound_megabits_per_sec: 400
inter_dc_stream_throughput_outbound_megabits_per_sec: 200
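# SSTable optimization
sstable_preemptive_open_interval_in_mb: 50
index_summary_capacity_in_mb: 1024
index_summary_resize_interval_in_minutes: 60
# Bloom filter and index sampling are per-table options, not cassandra.yaml settings;
# unknown keys in cassandra.yaml prevent the node from starting. Set them in CQL instead:
#   ALTER TABLE keyspace_name.table_name
#     WITH bloom_filter_fp_chance = 0.01 AND min_index_interval = 128;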
Configure cluster-wide consistency tuning
Set up consistency levels and repair strategies for optimal performance.
# Schedule regular repair operations
sudo crontab -e
# Add (Sundays at 02:00): 0 2 * * 0 /usr/bin/nodetool repair -pr -j 2
# Configure hinted handoff for better consistency
nodetool sethintedhandoffthrottlekb 2048
nodetool setcompactionthreshold keyspace_name table_name 4 32
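Because `repair -pr` covers only the primary range of the node it runs on, every node needs its own scheduled repair, and running them all at the same moment concentrates the load. One option is to stagger the cron entries by weekday. The sketch below generates one line per node, assuming a weekly schedule at 02:00; the helper name and the node names are hypothetical.

```python
def staggered_repair_cron(nodes, hour=2, command="/usr/bin/nodetool repair -pr -j 2"):
    """Return {node: cron line}, rotating through weekdays (0 = Sunday)."""
    return {
        node: f"0 {hour} * * {index % 7} {command}"
        for index, node in enumerate(nodes)
    }


for node, line in staggered_repair_cron(["node1", "node2", "node3"]).items():
    print(f"{node}: {line}")
```

With more than seven nodes the weekdays repeat, so larger clusters would also need to vary the hour or use a dedicated repair scheduler.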
Verify your optimization
Test your optimized Cassandra cluster to ensure performance improvements and monitor key metrics.
# Check cluster health
nodetool status
nodetool info
# Monitor performance metrics
nodetool tpstats
nodetool cfstats
nodetool compactionstats
# Test JMX exporter
curl localhost:7070/metrics | grep cassandra_table_read_latency
# Verify Prometheus is collecting metrics (quote the URL so the shell does not interpret "?")
curl -s 'localhost:9090/api/v1/query?query=cassandra_table_read_latency'
# Run performance test
cassandra-stress write n=100000 -rate threads=50
cassandra-stress read n=100000 -rate threads=50
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| High read latency | Poor partition design or large partitions | Redesign partition keys, implement bucketing, check nodetool tablehistograms |
| JMX exporter not starting | Port conflict or config file issues | Check port 7070 availability with netstat -tulpn \| grep 7070 |
| High GC pause times | Incorrect heap sizing or GC settings | Adjust heap size in jvm.options, monitor with tail -f /var/log/cassandra/gc.log |
| Pending compactions building up | Insufficient I/O or CPU resources | Increase concurrent_compactors, check disk I/O with iostat -x 1 |
| Prometheus missing metrics | JMX authentication or connectivity issues | Verify JMX settings, check firewall rules for port 7070 |
| Write timeouts | Network issues or overloaded nodes | Check network connectivity, monitor CPU/memory usage |
Next steps
- Set up automated Cassandra backups to protect your optimized cluster
- Monitor system time drift for accurate timestamp handling
- Configure multi-datacenter replication for geographic distribution
- Implement security hardening for production clusters
- Set up load testing to validate your optimizations
Automated install script
Run this to automate the entire setup
#!/usr/bin/env bash
set -euo pipefail
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# Global variables
PROMETHEUS_VERSION="2.45.0"
JMX_EXPORTER_VERSION="0.19.0"
CASSANDRA_USER="cassandra"
# Usage function
usage() {
echo "Usage: $0 [--prometheus-port PORT] [--jmx-port PORT] [--cassandra-config-dir DIR]"
echo " --prometheus-port: Port for Prometheus (default: 9090)"
echo " --jmx-port: Port for JMX exporter (default: 9501)"
echo " --cassandra-config-dir: Cassandra config directory (auto-detected if not specified)"
exit 1
}
# Parse arguments
PROMETHEUS_PORT=9090
JMX_PORT=9501
CASSANDRA_CONFIG_DIR=""
while [[ $# -gt 0 ]]; do
case $1 in
--prometheus-port)
PROMETHEUS_PORT="$2"
shift 2
;;
--jmx-port)
JMX_PORT="$2"
shift 2
;;
--cassandra-config-dir)
CASSANDRA_CONFIG_DIR="$2"
shift 2
;;
-h|--help)
usage
;;
*)
echo -e "${RED}Unknown option: $1${NC}"
usage
;;
esac
done
# Cleanup function
cleanup() {
echo -e "${RED}Installation failed. Cleaning up...${NC}"
systemctl stop prometheus 2>/dev/null || true
rm -rf /opt/prometheus /etc/prometheus
userdel prometheus 2>/dev/null || true
}
trap cleanup ERR
# Check if running as root
if [[ $EUID -ne 0 ]]; then
echo -e "${RED}This script must be run as root${NC}"
exit 1
fi
echo -e "${GREEN}[1/12] Detecting distribution...${NC}"
if [ -f /etc/os-release ]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_INSTALL="apt install -y"
PKG_UPDATE="apt update"
JAVA_PACKAGE="openjdk-11-jre-headless"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_INSTALL="dnf install -y"
PKG_UPDATE="dnf update -y"
JAVA_PACKAGE="java-11-openjdk-headless"
;;
amzn)
PKG_MGR="yum"
PKG_INSTALL="yum install -y"
PKG_UPDATE="yum update -y"
JAVA_PACKAGE="java-11-openjdk-headless"
;;
*)
echo -e "${RED}Unsupported distro: $ID${NC}"
exit 1
;;
esac
else
echo -e "${RED}Cannot detect distribution${NC}"
exit 1
fi
echo -e "${GREEN}Detected: $PRETTY_NAME${NC}"
echo -e "${GREEN}[2/12] Updating package repositories...${NC}"
$PKG_UPDATE
echo -e "${GREEN}[3/12] Installing prerequisites...${NC}"
$PKG_INSTALL $JAVA_PACKAGE wget curl tar
echo -e "${GREEN}[4/12] Detecting Cassandra configuration...${NC}"
if [ -z "$CASSANDRA_CONFIG_DIR" ]; then
for dir in /etc/cassandra /opt/cassandra/conf /usr/local/cassandra/conf; do
if [ -d "$dir" ]; then
CASSANDRA_CONFIG_DIR="$dir"
break
fi
done
fi
if [ -z "$CASSANDRA_CONFIG_DIR" ] || [ ! -d "$CASSANDRA_CONFIG_DIR" ]; then
echo -e "${YELLOW}Warning: Cassandra config directory not found. Continuing with monitoring setup...${NC}"
CASSANDRA_CONFIG_DIR="/etc/cassandra"
fi
echo -e "${GREEN}Using Cassandra config directory: $CASSANDRA_CONFIG_DIR${NC}"
echo -e "${GREEN}[5/12] Creating prometheus user...${NC}"
if ! id "prometheus" &>/dev/null; then
useradd --system --shell /bin/false --home-dir /var/lib/prometheus --create-home prometheus
fi
echo -e "${GREEN}[6/12] Downloading Prometheus...${NC}"
cd /tmp
wget -q "https://github.com/prometheus/prometheus/releases/download/v${PROMETHEUS_VERSION}/prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz"
tar xzf "prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz"
echo -e "${GREEN}[7/12] Installing Prometheus...${NC}"
mkdir -p /opt/prometheus /etc/prometheus /var/lib/prometheus
cp "prometheus-${PROMETHEUS_VERSION}.linux-amd64/prometheus" /opt/prometheus/
cp "prometheus-${PROMETHEUS_VERSION}.linux-amd64/promtool" /opt/prometheus/
cp -r "prometheus-${PROMETHEUS_VERSION}.linux-amd64/consoles" /etc/prometheus/
cp -r "prometheus-${PROMETHEUS_VERSION}.linux-amd64/console_libraries" /etc/prometheus/
chown -R prometheus:prometheus /opt/prometheus /etc/prometheus /var/lib/prometheus
chmod 755 /opt/prometheus/prometheus /opt/prometheus/promtool
echo -e "${GREEN}[8/12] Downloading JMX Exporter...${NC}"
wget -q -O /opt/prometheus/jmx_prometheus_javaagent.jar \
"https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/${JMX_EXPORTER_VERSION}/jmx_prometheus_javaagent-${JMX_EXPORTER_VERSION}.jar"
chown prometheus:prometheus /opt/prometheus/jmx_prometheus_javaagent.jar
echo -e "${GREEN}[9/12] Creating JMX exporter configuration...${NC}"
cat > /etc/prometheus/cassandra-jmx.yml << EOF
rules:
- pattern: org.apache.cassandra.metrics<type=(\w+), name=(\w+)><>Value
name: cassandra_\$1_\$2
type: GAUGE
- pattern: org.apache.cassandra.metrics<type=(\w+), name=(\w+), scope=(\w+)><>Value
name: cassandra_\$1_\$2
type: GAUGE
labels:
scope: "\$3"
- pattern: org.apache.cassandra.db<type=(\w+)><>(\w+)
name: cassandra_db_\$1_\$2
type: GAUGE
EOF
chown prometheus:prometheus /etc/prometheus/cassandra-jmx.yml
chmod 644 /etc/prometheus/cassandra-jmx.yml
echo -e "${GREEN}[10/12] Creating Prometheus configuration...${NC}"
cat > /etc/prometheus/prometheus.yml << EOF
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090']
- job_name: 'cassandra'
static_configs:
- targets: ['localhost:$JMX_PORT']
scrape_interval: 30s
metrics_path: /metrics
EOF
chown prometheus:prometheus /etc/prometheus/prometheus.yml
chmod 644 /etc/prometheus/prometheus.yml
echo -e "${GREEN}[11/12] Creating systemd service...${NC}"
cat > /etc/systemd/system/prometheus.service << EOF
[Unit]
Description=Prometheus Server
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/opt/prometheus/prometheus \\
--config.file=/etc/prometheus/prometheus.yml \\
--storage.tsdb.path=/var/lib/prometheus \\
--web.console.templates=/etc/prometheus/consoles \\
--web.console.libraries=/etc/prometheus/console_libraries \\
--web.listen-address=0.0.0.0:$PROMETHEUS_PORT \\
--web.enable-lifecycle
ExecReload=/bin/kill -HUP \$MAINPID
Restart=always
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable prometheus
echo -e "${GREEN}[12/12] Starting services and configuring firewall...${NC}"
systemctl start prometheus
# Configure firewall based on distro
if command -v ufw >/dev/null 2>&1; then
ufw allow $PROMETHEUS_PORT/tcp
ufw allow $JMX_PORT/tcp
elif command -v firewall-cmd >/dev/null 2>&1; then
firewall-cmd --permanent --add-port=$PROMETHEUS_PORT/tcp
firewall-cmd --permanent --add-port=$JMX_PORT/tcp
firewall-cmd --reload
fi
# Clean up downloads
rm -rf /tmp/prometheus-${PROMETHEUS_VERSION}.linux-amd64*
echo -e "${GREEN}Installation completed successfully!${NC}"
echo
echo -e "${GREEN}Verification:${NC}"
echo "- Prometheus status: $(systemctl is-active prometheus)"
echo "- Prometheus web interface: http://localhost:$PROMETHEUS_PORT"
echo "- JMX exporter port configured: $JMX_PORT"
echo
echo -e "${YELLOW}Next steps:${NC}"
echo "1. Configure Cassandra JVM to use JMX exporter:"
echo " Add to cassandra-env.sh: JVM_OPTS=\"\$JVM_OPTS -javaagent:/opt/prometheus/jmx_prometheus_javaagent.jar=$JMX_PORT:/etc/prometheus/cassandra-jmx.yml\""
echo "2. Restart Cassandra to enable JMX monitoring"
echo "3. Access Prometheus at http://your-server:$PROMETHEUS_PORT"
echo "4. Run 'nodetool status' to check cluster health"
echo "5. Use 'nodetool tpstats' to monitor thread pools"
Review the script before running. It checks for root privileges, so execute it with: sudo bash install.sh