Set up a production-ready Apache Cassandra cluster with multiple nodes, configure replication strategies, and optimize JVM settings and performance parameters for high-throughput distributed NoSQL workloads.
Prerequisites
- 3+ servers with 8GB+ RAM each
- Network connectivity between nodes
- Java 11 support
- SSD storage recommended
What this solves
Apache Cassandra provides a distributed NoSQL database that scales horizontally across multiple nodes without single points of failure. This tutorial helps you configure a production-ready multi-node Cassandra cluster with proper replication, consistency levels, and performance optimization for high-availability applications that require linear scalability and fault tolerance.
Step-by-step installation
Update system packages and install Java
Cassandra 4.1 runs on Java 8 or Java 11. Install OpenJDK 11, along with pip (used later for the Python driver) and the tools needed to add the package repository.
sudo apt update && sudo apt upgrade -y
sudo apt install -y openjdk-11-jdk python3-pip wget gnupg
Add Apache Cassandra repository
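Add the official Apache Cassandra repository for the 4.1 release series (41x), which receives the 4.1 patch and security releases. apt-key is deprecated on current Debian and Ubuntu releases, so store the signing keys in their own keyring file and reference it with signed-by.
sudo mkdir -p /etc/apt/keyrings
wget -qO - https://downloads.apache.org/cassandra/KEYS | sudo tee /etc/apt/keyrings/apache-cassandra.asc > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/apache-cassandra.asc] https://downloads.apache.org/cassandra/debian 41x main" | sudo tee /etc/apt/sources.list.d/cassandra.sources.list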
sudo apt update
Install Apache Cassandra
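Install Cassandra. The package creates the cassandra user and the service, and it starts Cassandra immediately with the default 'Test Cluster' settings. On a fresh install, stop the service and clear the data it just created; otherwise the node refuses to start once you change cluster_name.
sudo apt install -y cassandra
sudo systemctl stop cassandra
sudo sh -c 'rm -rf /var/lib/cassandra/*'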
Configure system limits for Cassandra
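Cassandra needs higher limits than the defaults for locked memory, open files, processes and address space. Add the following PAM limits to a file under /etc/security/limits.d/ (for example cassandra.conf); they apply to sessions opened through PAM, such as su - cassandra.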
cassandra soft memlock unlimited
cassandra hard memlock unlimited
cassandra soft nofile 100000
cassandra hard nofile 100000
cassandra soft nproc 32768
cassandra hard nproc 32768
cassandra soft as unlimited
cassandra hard as unlimited
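PAM limits cover login sessions; a service launched by systemd takes its limits from the unit instead. The sketch below writes a systemd drop-in with the same values, assuming the unit is named cassandra.service. The function name write_cassandra_limits_dropin and its directory argument are hypothetical, chosen so the file can be previewed in a temporary directory first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a systemd drop-in mirroring the PAM limits above.
# $1 = systemd configuration root (default /etc/systemd/system); pass a temp dir to preview.
# Prints the path of the file it wrote.
write_cassandra_limits_dropin() {
  local root="${1:-/etc/systemd/system}"
  local dir="$root/cassandra.service.d"
  mkdir -p "$dir"
  printf '%s\n' \
    '[Service]' \
    'LimitMEMLOCK=infinity' \
    'LimitNOFILE=100000' \
    'LimitNPROC=32768' \
    'LimitAS=infinity' > "$dir/limits.conf"
  echo "$dir/limits.conf"
}
```

On a real node, run the function as root with no argument, then run sudo systemctl daemon-reload and restart Cassandra; systemctl show cassandra -p LimitNOFILE shows the value the service will get.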
For more details on configuring system resource limits, see our guide on configuring Linux system resource limits.
Configure Cassandra cluster settings
Edit /etc/cassandra/cassandra.yaml and set the following keys, leaving everything not listed here at its packaged default. The values below configure the first node (203.0.113.10) of a three-node cluster.
cluster_name: 'ProductionCluster'
# Cassandra 4.x defaults: 16 vnodes plus the token allocator, tuned for replication factor 3
num_tokens: 16
allocate_tokens_for_local_replication_factor: 3
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
role_manager: CassandraRoleManager
roles_validity_in_ms: 2000
permissions_validity_in_ms: 2000
credentials_validity_in_ms: 2000
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "203.0.113.10,203.0.113.11,203.0.113.12"
listen_address: 203.0.113.10
broadcast_address: 203.0.113.10
rpc_address: 0.0.0.0
broadcast_rpc_address: 203.0.113.10
endpoint_snitch: GossipingPropertyFileSnitch
data_file_directories:
- /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
saved_caches_directory: /var/lib/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32
concurrent_materialized_view_writes: 32
disk_optimization_strategy: ssd
memtable_allocation_type: heap_buffers
index_summary_capacity_in_mb: 100
index_summary_resize_interval_in_minutes: 60
trickle_fsync: true
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
native_transport_port: 9042
start_native_transport: true
native_transport_max_threads: 128
native_transport_max_frame_size_in_mb: 16
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: true
column_index_size_in_kb: 64
batch_size_warn_threshold_in_kb: 5
batch_size_fail_threshold_in_kb: 50
compaction_throughput_mb_per_sec: 16
compaction_large_partition_warning_threshold_mb: 100
read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
internode_compression: dc
inter_dc_tcp_nodelay: false
stream_throughput_outbound_megabits_per_sec: 200
inter_dc_stream_throughput_outbound_megabits_per_sec: 200
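Because these keys already exist in the packaged cassandra.yaml, editing them in place keeps every other setting at its default. The sketch below is an in-place editor for single-line keys, assuming GNU sed; set_yaml_key is a hypothetical helper, and nested blocks such as seed_provider still need a manual edit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set a top-level "key: value" line in a YAML file in place.
# 1. If an active "key:" line exists, replace it.
# 2. Otherwise uncomment the first "# key:" example line (e.g. "# broadcast_address: 1.2.3.4").
# 3. Otherwise append the key at the end of the file.
# Values must not contain '|' or '&' (they are special in the sed replacement).
set_yaml_key() {
  local file="$1" key="$2" value="$3"
  if grep -Eq "^${key}:" "$file"; then
    sed -i -E "s|^${key}:.*|${key}: ${value}|" "$file"
  elif grep -Eq "^# ?${key}:" "$file"; then
    # GNU sed "0,/re/" limits the substitution to the first commented occurrence
    sed -i -E "0,/^# ?${key}:/s|^# ?${key}:.*|${key}: ${value}|" "$file"
  else
    printf '%s: %s\n' "$key" "$value" >> "$file"
  fi
}
```

For example, running set_yaml_key /etc/cassandra/cassandra.yaml endpoint_snitch GossipingPropertyFileSnitch as root changes only that line.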
Configure datacenter and rack information
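Set this node's datacenter and rack in /etc/cassandra/cassandra-rackdc.properties. GossipingPropertyFileSnitch reads this file and gossips the values to the other nodes, and NetworkTopologyStrategy uses them to place replicas on different racks. prefer_local=true makes nodes in the same datacenter connect over listen_address when it differs from broadcast_address.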
dc=dc1
rack=rack1
prefer_local=true
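Set the datacenter and rack before the node's first start: Cassandra refuses to start if a node's datacenter or rack changes after it has joined the ring. When nodes sit in different racks, a small writer such as the hypothetical write_rackdc below keeps the file consistent across nodes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write cassandra-rackdc.properties for one node.
# $1 = datacenter, $2 = rack, $3 = output path (default /etc/cassandra/cassandra-rackdc.properties)
write_rackdc() {
  local dc="$1" rack="$2" out="${3:-/etc/cassandra/cassandra-rackdc.properties}"
  if [[ -z "$dc" || -z "$rack" ]]; then
    echo "usage: write_rackdc DC RACK [FILE]" >&2
    return 1
  fi
  printf 'dc=%s\nrack=%s\nprefer_local=true\n' "$dc" "$rack" > "$out"
}
```

For a node in a second rack, run write_rackdc dc1 rack2 as root before starting it.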
Optimize JVM heap settings
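Cassandra 4.x reads JVM flags from /etc/cassandra/jvm-server.options and, on Java 11, from /etc/cassandra/jvm11-server.options. Put the heap flags in jvm-server.options and the G1 flags in jvm11-server.options, and comment out the CMS settings that jvm11-server.options enables by default, because the JVM refuses to start with two collectors selected. Size the heap at 25-50% of RAM and no larger than 31 GB: above roughly 32 GB the JVM turns off compressed object pointers, so every reference doubles in size. The 8 GB heap below fits a node with 16-32 GB of RAM; on an 8 GB node use -Xms4G and -Xmx4G instead.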
-Xms8G
-Xmx8G
-XX:+UseG1GC
-XX:+UnlockExperimentalVMOptions
-XX:G1MaxNewSizePercent=75
-XX:G1NewSizePercent=40
-XX:MaxGCPauseMillis=100
-XX:+UseStringDeduplication
-XX:+UseTLAB
-XX:+ResizeTLAB
-XX:+UseNUMA
-XX:+PerfDisableSharedMem
-XX:+AlwaysPreTouch
-XX:+UseBiasedLocking
-XX:+OptimizeStringConcat
-XX:+UseCondCardMark
-Djava.net.preferIPv4Stack=true
# JMX (port 7199) is configured in cassandra-env.sh, which binds it to localhost by default (LOCAL_JMX=yes).
# Do not add com.sun.management.jmxremote flags here: they would expose JMX without authentication.
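The sizing rule is simple arithmetic: 25% of 32 GB is 8 GB, which matches the -Xms8G/-Xmx8G example, while 25% of 16 GB is 4 GB. The sketch below applies the low end of the 25-50% range with a 1 GB floor and the 31 GB cap; recommended_heap_mb is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: suggest a heap size in MB from total RAM in MB.
# 25% of RAM, at least 1024 MB, at most 31744 MB (31 GB) to keep compressed oops.
recommended_heap_mb() {
  local total_mb="$1"
  local heap=$(( total_mb / 4 ))
  if (( heap < 1024 )); then
    heap=1024
  fi
  if (( heap > 31744 )); then
    heap=31744
  fi
  echo "$heap"
}
```

For example, recommended_heap_mb "$(free -m | awk '/^Mem:/{print $2}')" prints the value to use for -Xms and -Xmx with an M suffix.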
Set up data directory permissions
Ensure Cassandra has proper ownership of its data directories. Never use chmod 777 as it creates security vulnerabilities.
sudo chown -R cassandra:cassandra /var/lib/cassandra
sudo chown -R cassandra:cassandra /var/log/cassandra
sudo chmod 755 /var/lib/cassandra
sudo chmod 755 /var/log/cassandra
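To confirm the ownership change took effect (for example after restoring data as root), a hypothetical check_owner helper can compare each directory's owner with the expected user. This sketch assumes GNU stat.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report directories that are missing or not owned by USER.
# Usage: check_owner USER DIR...; returns non-zero if any directory fails the check.
check_owner() {
  local user="$1"; shift
  local dir owner rc=0
  for dir in "$@"; do
    if [[ ! -d "$dir" ]]; then
      echo "missing: $dir"
      rc=1
      continue
    fi
    owner="$(stat -c %U "$dir")"
    if [[ "$owner" != "$user" ]]; then
      echo "wrong owner: $dir ($owner)"
      rc=1
    fi
  done
  return "$rc"
}
```

On a node, check_owner cassandra /var/lib/cassandra /var/log/cassandra prints nothing and returns 0 when both directories are correct.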
Configure firewall rules
Allow inter-node traffic (7000, plus 7001 if you enable internode TLS) and CQL client connections (9042) from the cluster subnet only. JMX (7199) stays bound to localhost by default, so do not open it unless you configure authenticated remote JMX.
sudo ufw allow from 203.0.113.0/24 to any port 7000 proto tcp
sudo ufw allow from 203.0.113.0/24 to any port 7001 proto tcp
sudo ufw allow from 203.0.113.0/24 to any port 9042 proto tcp
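With more ports or subnets, generating the rules avoids typos in the CIDR. The sketch below only prints ufw commands so they can be reviewed before anything changes; ufw_rules_for is a hypothetical helper and accepts IPv4 CIDRs only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (not run) ufw rules allowing TCP ports from one IPv4 CIDR.
# Usage: ufw_rules_for CIDR PORT...
ufw_rules_for() {
  local cidr="$1"; shift
  if [[ ! "$cidr" =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$ ]]; then
    echo "not an IPv4 CIDR: $cidr" >&2
    return 1
  fi
  local port
  for port in "$@"; do
    printf 'ufw allow from %s to any port %s proto tcp\n' "$cidr" "$port"
  done
}
```

After reviewing the output of ufw_rules_for 203.0.113.0/24 7000 7001 9042, pipe it to sudo sh to apply it. The rules only take effect once ufw is enabled (sudo ufw enable).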
Start Cassandra service
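Enable and start the service; the node contacts the seed nodes to join the cluster. Bring nodes up one at a time, seed nodes first, and wait until each node shows UN in nodetool status before starting the next, because Cassandra does not allow several nodes to bootstrap at once.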
sudo systemctl enable cassandra
sudo systemctl start cassandra
sudo systemctl status cassandra
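When starting nodes in sequence from a script, it helps to wait for the CQL port before moving on. The sketch below uses bash's /dev/tcp redirection and coreutils timeout; wait_for_port is a hypothetical name. An open port only shows that the node is up locally, so still confirm UN in nodetool status.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until HOST:PORT accepts TCP connections.
# Usage: wait_for_port HOST PORT [TIMEOUT_SECONDS]; returns 0 when open, 1 on timeout.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-180}"
  local deadline=$(( SECONDS + timeout ))
  while (( SECONDS < deadline )); do
    # Each attempt is capped at 3 s so a filtered port cannot hang the loop
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
      return 0
    fi
    sleep 2
  done
  return 1
}
```

For example, wait_for_port 203.0.113.10 9042 300 && nodetool status, before starting the next node.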
Repeat configuration on additional nodes
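Repeat every step above, from installing Java through starting the service, on each additional node. Change only listen_address, broadcast_address and broadcast_rpc_address to the node's own IP address (and rack in cassandra-rackdc.properties if the node is in a different rack). Keep cluster_name, num_tokens and the seed list identical on all nodes.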
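Because only the three address keys differ between nodes, one reference cassandra.yaml can be reused. The sketch below assumes the reference file has those keys uncommented, as in the example configuration; render_node_yaml is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy a reference cassandra.yaml to DST with the node-specific
# addresses set to IP. Other lines, including the seed list, are copied unchanged.
# Usage: render_node_yaml SRC DST IP
render_node_yaml() {
  if [[ $# -ne 3 ]]; then
    echo "usage: render_node_yaml SRC DST IP" >&2
    return 1
  fi
  local src="$1" dst="$2" ip="$3"
  sed -E \
    -e "s/^listen_address:.*/listen_address: ${ip}/" \
    -e "s/^broadcast_address:.*/broadcast_address: ${ip}/" \
    -e "s/^broadcast_rpc_address:.*/broadcast_rpc_address: ${ip}/" \
    "$src" > "$dst"
}
```

On the second node, for example: render_node_yaml cassandra.yaml.ref /etc/cassandra/cassandra.yaml 203.0.113.11, run as root.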
Configure authentication and create admin user
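Log in with the default superuser, raise the replication factor of system_auth so login data survives a node failure, create your own superuser, and change the default password. Replace the example passwords with your own strong ones.
cqlsh -u cassandra -p cassandra
ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
CREATE ROLE admin WITH PASSWORD = 'SecurePassword123!' AND LOGIN = true AND SUPERUSER = true;
ALTER ROLE cassandra WITH PASSWORD = 'NewCassandraPassword123!';
EXIT;
Then run a full repair of system_auth on each node in turn, so the existing role data is copied to the new replicas:
nodetool repair -full system_auth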
Configure replication strategy and consistency
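Create keyspaces with a replication strategy that matches your topology. NetworkTopologyStrategy sets a replication factor per datacenter and, using the rack information from the snitch, places replicas on different racks where possible. Use it in production even with a single datacenter: adding a datacenter later then only requires an ALTER KEYSPACE.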
cqlsh -u admin -p 'SecurePassword123!'
CREATE KEYSPACE production WITH replication = {
'class': 'NetworkTopologyStrategy',
'dc1': 3
} AND durable_writes = true;
CREATE TABLE production.users (
user_id UUID PRIMARY KEY,
username TEXT,
email TEXT,
created_at TIMESTAMP
) WITH compaction = {'class': 'SizeTieredCompactionStrategy'};
EXIT;
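The replication factor also determines what each consistency level tolerates. A quorum is floor(RF/2) + 1 replicas, so with RF 3, QUORUM (or LOCAL_QUORUM within dc1) needs 2 replicas and keeps working with one replica down, and quorum reads see quorum writes because 2 + 2 > 3. The hypothetical helpers below do this arithmetic and build a NetworkTopologyStrategy replication map.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for replication and consistency arithmetic.

# quorum RF: replicas that must answer a QUORUM / LOCAL_QUORUM request, floor(RF/2) + 1
quorum() { echo $(( $1 / 2 + 1 )); }

# tolerated_down RF: replicas of a partition that may be down while QUORUM still succeeds
tolerated_down() { echo $(( $1 - ($1 / 2 + 1) )); }

# nts_replication DC:RF...: CQL replication map for NetworkTopologyStrategy
nts_replication() {
  local out="{'class': 'NetworkTopologyStrategy'" pair
  for pair in "$@"; do
    out+=", '${pair%%:*}': ${pair#*:}"
  done
  echo "${out}}"
}
```

For example, nts_replication dc1:3 dc2:3 prints {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3}, ready to paste into a CREATE KEYSPACE or ALTER KEYSPACE statement.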
Verify your setup
Check cluster status and verify all nodes are operational with proper token distribution.
nodetool status
nodetool info
nodetool tpstats
cqlsh -u admin -p 'SecurePassword123!' -e "DESCRIBE KEYSPACES;"
nodetool ring
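nodetool status should list every node as UN (Up/Normal). Because every node uses the same num_tokens, the Load and Owns columns should be roughly equal; run nodetool status production to see effective ownership under that keyspace's replication settings (with RF 3 on three nodes, each node owns 100%).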
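For scripted checks, the two-letter code in the first column is enough: the first letter is Up/Down and the second is Normal/Leaving/Joining/Moving. The sketch below lists every node not in UN, assuming the nodetool status layout printed by Cassandra 4.x; non_un_nodes is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `nodetool status` output on stdin, print "STATE ADDRESS"
# for every node that is not UN, and exit 1 if at least one such node was found.
non_un_nodes() {
  awk '
    /^[UD][NLJM][ \t]/ {
      if ($1 != "UN") { print $1, $2; bad = 1 }
    }
    END { exit bad }
  '
}
```

For example: nodetool status | non_un_nodes || echo "cluster degraded".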
Performance optimization
Fine-tune Cassandra performance parameters based on your workload characteristics and hardware specifications.
Optimize compaction settings
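Choose the compaction strategy per table. SizeTieredCompactionStrategy (STCS) suits write-heavy tables; LeveledCompactionStrategy (LCS) suits read-heavy tables because it limits how many SSTables a read has to touch, at the cost of more compaction I/O. Changing the strategy makes Cassandra recompact the table's existing SSTables, so switch large tables during off-peak hours.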
cqlsh -u admin -p 'SecurePassword123!'
ALTER TABLE production.users WITH compaction = {
'class': 'LeveledCompactionStrategy',
'sstable_size_in_mb': 160
};
EXIT;
Configure kernel parameters for performance
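Create /etc/sysctl.d/99-cassandra.conf with the settings below, then load it. Cassandra memory-maps SSTable files, so it needs a high vm.max_map_count; vm.swappiness = 1 keeps the JVM heap out of swap; the lower dirty-page ratios start writeback earlier; and the larger socket buffers raise the limits for inter-node traffic.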
vm.max_map_count = 1048575
vm.swappiness = 1
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 40960
net.ipv4.tcp_rmem = 4096 65536 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
sudo sysctl -p /etc/sysctl.d/99-cassandra.conf
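sysctl -p reports what it sets, but a later package or tuning profile can override a value after a reboot. The sketch below compares the file against the live values under /proc/sys; check_sysctl_file is hypothetical, and its optional second argument exists only to point it at a test directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare each "key = value" line of a sysctl file with the live
# value under /proc/sys (or under $2/sys when a test root is given).
# Prints one line per mismatch and returns non-zero if any value differs or is missing.
check_sysctl_file() {
  local file="$1" root="${2:-/proc}" line key want have rc=0
  local -a words
  while IFS= read -r line || [[ -n "$line" ]]; do
    # Skip blank lines and comments
    if [[ "$line" =~ ^[[:space:]]*(#|$) ]]; then
      continue
    fi
    # Trim the key and collapse whitespace in the value (tabs in /proc become spaces)
    read -ra words <<< "${line%%=*}"; key="${words[*]}"
    read -ra words <<< "${line#*=}"; want="${words[*]}"
    if read -ra words 2>/dev/null < "$root/sys/${key//.//}"; then
      have="${words[*]}"
    else
      have="(missing)"
    fi
    if [[ "$have" != "$want" ]]; then
      echo "$key: want '$want', have '$have'"
      rc=1
    fi
  done < "$file"
  return "$rc"
}
```

After a reboot, check_sysctl_file /etc/sysctl.d/99-cassandra.conf prints nothing when every value is still in effect.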
For comprehensive kernel optimization, see our guide on optimizing Linux system performance.
Set up monitoring and alerting
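Cassandra publishes its metrics over JMX, which nodetool reads on the local node. Check thread pools (pending and blocked tasks, dropped messages), pending compactions, per-table statistics and coordinator latency histograms regularly, and have your monitoring system collect the same JMX metrics for alerting.
nodetool tpstats
nodetool compactionstats
nodetool tablestats production.users
nodetool proxyhistograms
Install the Python driver for scripted CQL checks and create a directory for custom monitoring scripts: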
pip3 install cassandra-driver
sudo mkdir -p /opt/cassandra-monitoring
sudo chown cassandra:cassandra /opt/cassandra-monitoring
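As an example of a custom check for that directory, the sketch below turns the heap line of nodetool info into a percentage and fails at or above a threshold, which a cron job or monitoring agent can alert on. It assumes the "Heap Memory (MB) : used / max" line format printed by Cassandra 4.x; heap_usage_check and the 85% default are hypothetical choices.

```bash
#!/usr/bin/env bash
# Hypothetical check: read `nodetool info` on stdin and print heap usage as a whole
# percentage. Exit codes: 0 below THRESHOLD (default 85), 1 at or above it,
# 2 if no "Heap Memory" line was found.
heap_usage_check() {
  local threshold="${1:-85}"
  awk -F':' -v t="$threshold" '
    BEGIN { t += 0 }
    /^Heap Memory/ {
      split($2, parts, "/")
      used = parts[1] + 0
      max = parts[2] + 0
      if (max > 0) { pct = int(used * 100 / max); found = 1 }
    }
    END {
      if (!found) { print "heap line not found"; exit 2 }
      print pct "%"
      if (pct >= t) exit 1
      exit 0
    }
  '
}
```

For example: nodetool info | heap_usage_check 85 || logger -t cassandra "heap usage high". Heap usage rises and falls between garbage collections, so alert on sustained readings rather than a single sample.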
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Nodes show as DOWN in nodetool status | Network connectivity or firewall blocking | Check firewall rules and network connectivity between nodes |
| OutOfMemoryError in logs | JVM heap size too small or memory leak | Increase heap size or investigate heap dumps with jmap |
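| High read/write latency | Consistency level higher than needed, or disk I/O bottleneck | Use LOCAL_QUORUM instead of QUORUM or ALL where the application allows it; check disk latency with iostat and move data to SSDs |
| Authentication failed errors | Replication factor for system_auth too low | ALTER the system_auth keyspace to a higher replication factor, then run nodetool repair system_auth on every node |
| Compaction falling behind (pending tasks growing in nodetool compactionstats) | Insufficient I/O capacity for compaction | Raise compaction_throughput_mb_per_sec (or use nodetool setcompactionthroughput at runtime), or add faster storage |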
| Permission denied on data directories | Incorrect file ownership or permissions | chown -R cassandra:cassandra /var/lib/cassandra, chmod 755 directories |
Next steps
- Install and configure Elasticsearch 8 for search capabilities alongside Cassandra
- Configure Cassandra SSL encryption and authentication for enhanced security
- Set up Cassandra backup automation with nodetool for data protection
- Monitor Cassandra cluster with Prometheus and Grafana for comprehensive observability
- Optimize Cassandra data modeling and query performance for application efficiency
Automated install script
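The script below automates the core per-node steps: packages, repository, cluster settings, heap size, firewall and service start. It does not configure authentication, per-node rack placement, system limits or kernel parameters; apply those as described above.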
#!/usr/bin/env bash
set -euo pipefail
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
# Default values
CLUSTER_NAME="${1:-TestCluster}"
SEED_NODES="${2:-127.0.0.1}"
NODE_IP="${3:-127.0.0.1}"
# Usage message
usage() {
echo "Usage: $0 [CLUSTER_NAME] [SEED_NODES] [NODE_IP]"
echo "Example: $0 MyCluster 10.0.0.1,10.0.0.2 10.0.0.3"
exit 1
}
if [[ "${1:-}" == "-h" || "${1:-}" == "--help" ]]; then
usage
fi
# Logging functions
log_info() { echo -e "${GREEN}[INFO]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }
# Cleanup on failure
cleanup() {
log_error "Installation failed. Cleaning up..."
systemctl stop cassandra 2>/dev/null || true
# PKG_MGR may still be unset if the failure happened before distribution detection
case "${PKG_MGR:-}" in
apt) apt remove -y cassandra 2>/dev/null || true ;;
dnf|yum) "$PKG_MGR" remove -y cassandra 2>/dev/null || true ;;
esac
rm -f /etc/apt/sources.list.d/cassandra.sources.list 2>/dev/null || true
rm -f /etc/yum.repos.d/cassandra.repo 2>/dev/null || true
}
trap cleanup ERR
# Check if running as root or with sudo
if [[ $EUID -ne 0 ]]; then
log_error "This script must be run as root or with sudo"
exit 1
fi
# Detect distribution
if [ -f /etc/os-release ]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_INSTALL="apt install -y"
PKG_UPDATE="apt update"
JAVA_PKG="openjdk-11-jdk"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_INSTALL="dnf install -y"
PKG_UPDATE="dnf update -y"
JAVA_PKG="java-11-openjdk-devel"
;;
amzn)
PKG_MGR="yum"
PKG_INSTALL="yum install -y"
PKG_UPDATE="yum update -y"
# Amazon Linux ships Java 11 as Amazon Corretto
JAVA_PKG="java-11-amazon-corretto-devel"
;;
*)
log_error "Unsupported distribution: $ID"
exit 1
;;
esac
else
log_error "Cannot detect distribution"
exit 1
fi
log_info "Detected distribution: $ID using $PKG_MGR"
echo "[1/8] Updating system packages and installing dependencies..."
$PKG_UPDATE
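# The GnuPG package is called gnupg2 on RHEL-family systems. curl is not installed:
# the script uses wget, and curl can conflict with curl-minimal on RHEL 9 / Amazon Linux 2023.
if [[ "$PKG_MGR" == "apt" ]]; then
GPG_PKG="gnupg"
else
GPG_PKG="gnupg2"
fi
$PKG_INSTALL $JAVA_PKG $GPG_PKG python3-pip wget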
echo "[2/8] Setting up Java environment..."
JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
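# Only add JAVA_HOME once, so re-running the script does not duplicate the line
grep -q '^export JAVA_HOME=' /etc/environment 2>/dev/null || echo "export JAVA_HOME=$JAVA_HOME" >> /etc/environment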
export JAVA_HOME=$JAVA_HOME
log_info "Java version: $(java -version 2>&1 | head -n1)"
echo "[3/8] Adding Apache Cassandra repository..."
case "$PKG_MGR" in
apt)
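# apt-key is deprecated; keep the keys in a dedicated keyring referenced by signed-by
mkdir -p /etc/apt/keyrings
wget -qO /etc/apt/keyrings/apache-cassandra.asc https://downloads.apache.org/cassandra/KEYS
echo "deb [signed-by=/etc/apt/keyrings/apache-cassandra.asc] https://downloads.apache.org/cassandra/debian 41x main" > /etc/apt/sources.list.d/cassandra.sources.list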
$PKG_UPDATE
;;
dnf|yum)
rpm --import https://downloads.apache.org/cassandra/KEYS
cat > /etc/yum.repos.d/cassandra.repo << 'EOF'
[cassandra]
name=Apache Cassandra
baseurl=https://downloads.apache.org/cassandra/redhat/41x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://downloads.apache.org/cassandra/KEYS
EOF
;;
esac
echo "[4/8] Installing Apache Cassandra..."
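$PKG_INSTALL cassandra
if [[ "$PKG_MGR" == "apt" ]]; then
# The Debian package starts Cassandra immediately as 'Test Cluster'. Stop it and wipe
# that data (fresh installs only), or the node refuses to start with the new cluster_name.
systemctl stop cassandra || true
rm -rf /var/lib/cassandra/*
CASSANDRA_CONF_DIR="/etc/cassandra"
else
# RPM packages keep the active configuration in /etc/cassandra/conf
CASSANDRA_CONF_DIR="/etc/cassandra/conf"
fi
echo "[5/8] Configuring Cassandra cluster settings..."
CASSANDRA_CONF="$CASSANDRA_CONF_DIR/cassandra.yaml"
CASSANDRA_ENV="$CASSANDRA_CONF_DIR/cassandra-env.sh"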
# Backup original configuration
cp $CASSANDRA_CONF $CASSANDRA_CONF.backup
cp $CASSANDRA_ENV $CASSANDRA_ENV.backup
# Configure cassandra.yaml
sed -i "s/cluster_name: 'Test Cluster'/cluster_name: '$CLUSTER_NAME'/" $CASSANDRA_CONF
# Cassandra 4.x ships '- seeds: "127.0.0.1:7000"', so match whatever value is present
sed -i -E "s/- seeds: \"[^\"]*\"/- seeds: \"$SEED_NODES\"/" $CASSANDRA_CONF
sed -i "s/listen_address: localhost/listen_address: $NODE_IP/" $CASSANDRA_CONF
sed -i "s/rpc_address: localhost/rpc_address: $NODE_IP/" $CASSANDRA_CONF
sed -i "s/# broadcast_address: 1.2.3.4/broadcast_address: $NODE_IP/" $CASSANDRA_CONF
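sed -i "s/# broadcast_rpc_address: 1.2.3.4/broadcast_rpc_address: $NODE_IP/" $CASSANDRA_CONF
# Use the datacenter/rack-aware snitch from the manual steps (reads cassandra-rackdc.properties)
sed -i "s/^endpoint_snitch: SimpleSnitch/endpoint_snitch: GossipingPropertyFileSnitch/" $CASSANDRA_CONF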
# Performance optimizations
sed -i 's/concurrent_reads: 32/concurrent_reads: 64/' $CASSANDRA_CONF
sed -i 's/concurrent_writes: 32/concurrent_writes: 64/' $CASSANDRA_CONF
sed -i 's/concurrent_counter_writes: 32/concurrent_counter_writes: 64/' $CASSANDRA_CONF
# Configure JVM heap size based on available memory
TOTAL_MEM=$(free -m | awk '/^Mem:/{print $2}')
HEAP_SIZE=$((TOTAL_MEM / 4))
if [ $HEAP_SIZE -gt 8192 ]; then
HEAP_SIZE=8192
elif [ $HEAP_SIZE -lt 1024 ]; then
HEAP_SIZE=1024
fi
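# HEAP_NEWSIZE is the young generation, not the whole heap; use cassandra-env.sh's
# own rule of min(100 MB per CPU core, 1/4 of the heap)
NEW_SIZE=$(( $(nproc) * 100 ))
if [ $NEW_SIZE -gt $((HEAP_SIZE / 4)) ]; then
NEW_SIZE=$((HEAP_SIZE / 4))
fi
# cassandra-env.sh reads these variables near the top of the file, so lines appended
# at the end would be ignored; replace the commented defaults in place instead
sed -i -E "s/^#?MAX_HEAP_SIZE=.*/MAX_HEAP_SIZE=\"${HEAP_SIZE}M\"/" $CASSANDRA_ENV
sed -i -E "s/^#?HEAP_NEWSIZE=.*/HEAP_NEWSIZE=\"${NEW_SIZE}M\"/" $CASSANDRA_ENV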
echo "[6/8] Setting up firewall rules..."
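# Thrift (9160) no longer exists in Cassandra 4.x, and JMX (7199) stays local by default.
# These rules allow the ports from any source; restrict them to your cluster subnet in production.
CASSANDRA_PORTS=(7000 7001 9042)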
case "$ID" in
ubuntu|debian)
if command -v ufw >/dev/null 2>&1; then
for port in "${CASSANDRA_PORTS[@]}"; do
ufw allow $port/tcp
done
fi
;;
*)
if command -v firewall-cmd >/dev/null 2>&1; then
for port in "${CASSANDRA_PORTS[@]}"; do
firewall-cmd --permanent --add-port=$port/tcp
done
firewall-cmd --reload
fi
;;
esac
echo "[7/8] Setting proper permissions and ownership..."
chown -R cassandra:cassandra /var/lib/cassandra
chown -R cassandra:cassandra /var/log/cassandra
chown cassandra:cassandra $CASSANDRA_CONF
chmod 644 $CASSANDRA_CONF
chmod 755 /var/lib/cassandra
chmod 755 /var/log/cassandra
echo "[8/8] Starting and enabling Cassandra service..."
systemctl daemon-reload
systemctl enable cassandra
systemctl start cassandra
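# Wait up to 180 s for the CQL port instead of sleeping a fixed time
log_info "Waiting for Cassandra to start (up to 180s)..."
for _ in $(seq 1 36); do
if ss -ltn 2>/dev/null | grep ':9042 ' >/dev/null; then
break
fi
sleep 5
done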
# Verification
echo "Verifying Cassandra installation..."
if systemctl is-active --quiet cassandra; then
log_info "✓ Cassandra service is running"
else
log_error "✗ Cassandra service is not running"
systemctl status cassandra --no-pager || true
exit 1
fi
# Check if Cassandra is listening on expected ports
for port in 7000 9042; do
if ss -ltn 2>/dev/null | grep ":$port " >/dev/null || netstat -ltn 2>/dev/null | grep ":$port " >/dev/null; then
log_info "✓ Cassandra is listening on port $port"
else
log_warn "⚠ Cassandra may not be listening on port $port"
fi
done
# Test connection
if timeout 10 cqlsh $NODE_IP -e "DESCRIBE keyspaces;" >/dev/null 2>&1; then
log_info "✓ CQL connection successful"
else
log_warn "⚠ CQL connection test failed - may need more time to initialize"
fi
log_info "Cassandra cluster node installation completed!"
log_info "Cluster Name: $CLUSTER_NAME"
log_info "Node IP: $NODE_IP"
log_info "Seed Nodes: $SEED_NODES"
log_info ""
log_info "Next steps:"
log_info "1. Install and configure other nodes in your cluster"
log_info "2. Use 'nodetool status' to check cluster health"
log_info "3. Connect using: cqlsh $NODE_IP"
log_info "4. Monitor logs: tail -f /var/log/cassandra/cassandra.log"
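Review the script before running it. Run it as root on each node, passing the cluster name, the comma-separated seed list and the node's own IP address, for example: sudo bash install.sh ProductionCluster 203.0.113.10,203.0.113.11,203.0.113.12 203.0.113.10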