Set up complete ScyllaDB cluster monitoring using Prometheus for metrics collection and Grafana for visualization. Configure alerting rules for proactive performance monitoring and issue detection.
Prerequisites
- Root or sudo access
- Three servers for ScyllaDB cluster
- One server for monitoring stack
- 8GB RAM per ScyllaDB node
- Basic understanding of NoSQL databases
What this solves
ScyllaDB clusters require continuous monitoring to track performance, resource usage, and cluster health. This tutorial sets up comprehensive monitoring using Prometheus to collect ScyllaDB metrics and Grafana for visualization, helping you detect issues before they impact your applications.
Step-by-step installation
Install ScyllaDB cluster nodes
Install ScyllaDB on each of the three cluster nodes. The repository commands below are for Ubuntu.
sudo apt update && sudo apt upgrade -y
wget -qO - https://downloads.scylladb.com/deb/ubuntu/scylla-5.4-$(lsb_release -s -c).list | sudo tee /etc/apt/sources.list.d/scylla.list
wget -qO - https://downloads.scylladb.com/downloads/scylla-drivers-repo/scylla.key | sudo apt-key add -
sudo apt update
sudo apt install -y scylla
Configure ScyllaDB for monitoring
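Enable the Prometheus metrics endpoint on each ScyllaDB node by editing /etc/scylla/scylla.yaml. Set listen_address and broadcast_rpc_address to the node's own IP; the example below shows the first node (203.0.113.10).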
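Only the two address settings differ from node to node. As a small sketch (the function name render_node_addresses is made up for illustration), the per-node lines can be printed for review before editing each file:

```shell
# Print the two scylla.yaml settings that must carry the node's own IP.
render_node_addresses() {
    node_ip="$1"
    printf 'listen_address: %s\n' "$node_ip"
    printf 'broadcast_rpc_address: %s\n' "$node_ip"
}

# Show the values for each node of the sample cluster used in this tutorial.
for ip in 203.0.113.10 203.0.113.11 203.0.113.12; do
    echo "# node $ip"
    render_node_addresses "$ip"
done
```

The remaining settings (cluster name, seeds, snitch, Prometheus port) are identical on all nodes.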
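cluster_name: 'ScyllaCluster'
# Seeds are configured through seed_provider, as in the default scylla.yaml
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "203.0.113.10,203.0.113.11,203.0.113.12"
listen_address: 203.0.113.10
rpc_address: 0.0.0.0
broadcast_rpc_address: 203.0.113.10
endpoint_snitch: GossipingPropertyFileSnitch
prometheus_port: 9180
prometheus_address: 0.0.0.0
Start ScyllaDB cluster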
Enable and start ScyllaDB on all cluster nodes, then verify cluster formation.
sudo scylla_setup --no-raid-setup --no-fstrim-setup --no-coredump-setup --no-sysconfig-setup --no-bootparam-setup --no-ec2-check
sudo systemctl enable scylla-server
sudo systemctl start scylla-server
sudo systemctl status scylla-server
Verify cluster status
Check that all nodes have joined the cluster successfully and are in UN (Up Normal) status.
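To make this check scriptable, one option is to count the rows of `nodetool status` output that start with the UN state code. This is a minimal sketch; the function name count_un_nodes is an illustration, and it assumes the usual layout in which each node row begins with its two-letter state.

```shell
# Count nodes reported as Up/Normal in `nodetool status` output read from stdin.
count_un_nodes() {
    # Node rows start with a two-letter state code; "UN" means Up and Normal.
    # grep -c exits non-zero when the count is 0, so keep the exit status clean.
    grep -c '^UN[[:space:]]' || true
}

# Usage on a cluster node:
#   nodetool status | count_un_nodes
```

For the three-node cluster in this tutorial the expected count is 3.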
nodetool status
nodetool describecluster
Install Prometheus server
Install Prometheus on a dedicated monitoring server to collect metrics from ScyllaDB nodes.
sudo apt update
sudo apt install -y prometheus prometheus-node-exporter
Configure Prometheus for ScyllaDB
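Edit /etc/prometheus/prometheus.yml so that Prometheus scrapes the ScyllaDB metrics endpoints and the node exporters on all three cluster nodes. The node-exporter job assumes prometheus-node-exporter is also installed on each ScyllaDB node.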
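The target lists in this configuration are the cluster IPs with a port appended. As a sketch, a helper (the name build_targets is hypothetical) can generate the list items from a comma-separated IP string, which avoids quoting mistakes when the node list changes. The ten-space indent is an assumption about how deeply the list is nested in your file.

```shell
# Turn "ip1,ip2,ip3" plus a port into YAML list items for a static_configs block.
build_targets() {
    ips="$1"
    port="$2"
    # Split on commas and emit one quoted host:port entry per line.
    printf '%s\n' "$ips" | tr ',' '\n' | while IFS= read -r ip; do
        if [ -n "$ip" ]; then
            printf "          - '%s:%s'\n" "$ip" "$port"
        fi
    done
}

# ScyllaDB metrics targets for the sample cluster.
build_targets "203.0.113.10,203.0.113.11,203.0.113.12" 9180
```

Calling it again with port 9100 produces the node exporter targets.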
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - "/etc/prometheus/rules/*.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'scylla'
    static_configs:
      - targets:
          - '203.0.113.10:9180'
          - '203.0.113.11:9180'
          - '203.0.113.12:9180'
    scrape_interval: 10s
    metrics_path: /metrics
  - job_name: 'node-exporter'
    static_configs:
      - targets:
          - '203.0.113.10:9100'
          - '203.0.113.11:9100'
          - '203.0.113.12:9100'
    scrape_interval: 15s
Create ScyllaDB alerting rules
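Define Prometheus alerting rules for common ScyllaDB issues and performance thresholds. Save them in a .yml file under /etc/prometheus/rules/ (for example scylla.yml) so that the rule_files glob picks them up.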
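The ScyllaDBLowDiskSpace rule in this step fires when the free fraction, (total - used) / total, stays below 0.1. A worked example of that arithmetic, as a sketch with a hypothetical helper name and made-up volume sizes:

```shell
# Fraction of disk still free, the quantity the low-disk rule compares to 0.1.
# Assumes total is greater than zero.
disk_free_ratio() {
    total="$1"
    used="$2"
    awk -v t="$total" -v u="$used" 'BEGIN { printf "%.2f\n", (t - u) / t }'
}

# A 500 GB volume with 460 GB used: 40 / 500 = 0.08, which is below 0.1.
disk_free_ratio 500 460
```

The same units must be used for both arguments; the rule itself works in bytes.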
sudo mkdir -p /etc/prometheus/rules
groups:
  - name: scylla.rules
    rules:
      - alert: ScyllaDBNodeDown
        expr: up{job="scylla"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "ScyllaDB node {{ $labels.instance }} is down"
          description: "ScyllaDB node {{ $labels.instance }} has been down for more than 1 minute."
      - alert: ScyllaDBHighLatency
        expr: scylla_storage_proxy_coordinator_read_latency{quantile="0.99"} > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High read latency on {{ $labels.instance }}"
          description: "99th percentile read latency is {{ $value }}ms on {{ $labels.instance }}"
      - alert: ScyllaDBHighCPUUsage
        expr: scylla_reactor_utilization > 0.8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU utilization is {{ $value | humanizePercentage }} on {{ $labels.instance }}"
      - alert: ScyllaDBLowDiskSpace
        expr: (scylla_database_total_disk_space_bytes - scylla_database_used_disk_space_bytes) / scylla_database_total_disk_space_bytes < 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Available disk space is less than 10% on {{ $labels.instance }}"
      - alert: ScyllaDBHighMemoryUsage
        expr: scylla_memory_allocated_memory / scylla_memory_total_memory > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is {{ $value | humanizePercentage }} on {{ $labels.instance }}"
      - alert: ScyllaDBCompactionBacklog
        expr: scylla_compaction_manager_pending_tasks > 100
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "High compaction backlog on {{ $labels.instance }}"
          description: "{{ $value }} pending compaction tasks on {{ $labels.instance }}"
      - alert: ScyllaDBTimeoutOperations
        expr: increase(scylla_storage_proxy_coordinator_read_timeouts_total[5m]) > 10
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High number of read timeouts on {{ $labels.instance }}"
          description: "{{ $value }} read timeouts in the last 5 minutes on {{ $labels.instance }}"
      - alert: ScyllaDBClusterNotHealthy
        expr: count(up{job="scylla"} == 1) < 2
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "ScyllaDB cluster unhealthy"
          description: "Only {{ $value }} ScyllaDB nodes are available out of expected 3 nodes"
Install and configure Alertmanager
Set up Alertmanager to handle alerts from Prometheus and send notifications.
sudo apt install -y prometheus-alertmanager
global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alerts@example.com'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    email_configs:
      - to: 'admin@example.com'
        # email_configs take the subject through headers and the plain-text body through text
        headers:
          Subject: 'ScyllaDB Alert: {{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Instance: {{ .Labels.instance }}
          Severity: {{ .Labels.severity }}
          {{ end }}
Start monitoring services
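Enable and start the Prometheus and Alertmanager services. On Debian and Ubuntu the prometheus-alertmanager package installs its unit under the name prometheus-alertmanager.
sudo systemctl enable prometheus prometheus-alertmanager
sudo systemctl start prometheus prometheus-alertmanager
sudo systemctl status prometheus prometheus-alertmanager
Install Grafana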
Install Grafana for creating dashboards and visualizing ScyllaDB metrics.
sudo apt install -y software-properties-common
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana
Configure Grafana
Edit /etc/grafana/grafana.ini to set the server address and admin credentials and to keep anonymous access disabled. Replace your_secure_password and your_secret_key with your own values.
[server]
http_port = 3000
domain = example.com
root_url = http://example.com:3000/
[security]
admin_user = admin
admin_password = your_secure_password
secret_key = your_secret_key
[auth.anonymous]
enabled = false
[alerting]
execute_alerts = true
Start Grafana
Enable and start the Grafana service, then open the web interface on port 3000 (the http_port set above).
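Grafana can take a few seconds to accept connections after it starts, so API calls made immediately afterwards may fail. A minimal sketch of a polling helper follows; the function name retry and the attempt counts are illustrative, and the usage line assumes Grafana's /api/health endpoint on localhost:3000.

```shell
# Run a command until it succeeds, up to max_attempts times, pausing between tries.
retry() {
    max_attempts="$1"
    delay_seconds="$2"
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay_seconds"
    done
    return 1
}

# Usage after starting the service (up to 30 tries, 2 seconds apart):
#   retry 30 2 curl -fsS http://localhost:3000/api/health
```

The same helper works for any readiness check that signals success through its exit status.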
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
sudo systemctl status grafana-server
Configure Prometheus data source in Grafana
Add Prometheus as a data source in Grafana to access ScyllaDB metrics.
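If Prometheus runs on a different host than Grafana, only the url field of the request in this step changes. As a sketch, the JSON payload can be built from parameters; the function name datasource_payload and the address 10.0.0.5 are placeholders.

```shell
# Build the JSON body for Grafana's data source API from a name and a Prometheus URL.
# Values are inserted as-is, so they must not contain double quotes.
datasource_payload() {
    name="$1"
    url="$2"
    printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' "$name" "$url"
}

# Payload for a Prometheus server on another host (placeholder address).
datasource_payload "Prometheus" "http://10.0.0.5:9090"
```

The output can be passed to curl with -d in place of the inline JSON.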
curl -X POST http://admin:your_secure_password@localhost:3000/api/datasources \
-H "Content-Type: application/json" \
-d '{
"name": "Prometheus",
"type": "prometheus",
"url": "http://localhost:9090",
"access": "proxy",
"isDefault": true
}'
Import ScyllaDB dashboard
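Save the following dashboard definition as /tmp/scylla-dashboard.json, then import it through the Grafana HTTP API with the curl command that follows it. The panels cover node availability, latency, CPU, memory, disk, throughput, compaction, and timeouts.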
{
"dashboard": {
"title": "ScyllaDB Cluster Monitoring",
"tags": ["scylladb", "performance"],
"timezone": "browser",
"panels": [
{
"title": "Cluster Status",
"type": "stat",
"targets": [
{
"expr": "count(up{job=\"scylla\"} == 1)",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"displayName": "Nodes Up",
"min": 0,
"max": 3
}
},
"gridPos": {"h": 8, "w": 6, "x": 0, "y": 0}
},
{
"title": "Read Latency (99th percentile)",
"type": "timeseries",
"targets": [
{
"expr": "scylla_storage_proxy_coordinator_read_latency{quantile=\"0.99\"}",
"refId": "A",
"legendFormat": "{{instance}}"
}
],
"fieldConfig": {
"defaults": {
"unit": "ms"
}
},
"gridPos": {"h": 8, "w": 18, "x": 6, "y": 0}
},
{
"title": "Write Latency (99th percentile)",
"type": "timeseries",
"targets": [
{
"expr": "scylla_storage_proxy_coordinator_write_latency{quantile=\"0.99\"}",
"refId": "A",
"legendFormat": "{{instance}}"
}
],
"fieldConfig": {
"defaults": {
"unit": "ms"
}
},
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 8}
},
{
"title": "CPU Utilization",
"type": "timeseries",
"targets": [
{
"expr": "scylla_reactor_utilization * 100",
"refId": "A",
"legendFormat": "{{instance}}"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100
}
},
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 8}
},
{
"title": "Memory Usage",
"type": "timeseries",
"targets": [
{
"expr": "scylla_memory_allocated_memory / scylla_memory_total_memory * 100",
"refId": "A",
"legendFormat": "{{instance}}"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100
}
},
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 16}
},
{
"title": "Disk Usage",
"type": "timeseries",
"targets": [
{
"expr": "scylla_database_used_disk_space_bytes / scylla_database_total_disk_space_bytes * 100",
"refId": "A",
"legendFormat": "{{instance}}"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100
}
},
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 16}
},
{
"title": "Operations per Second",
"type": "timeseries",
"targets": [
{
"expr": "rate(scylla_storage_proxy_coordinator_reads_total[5m])",
"refId": "A",
"legendFormat": "Reads - {{instance}}"
},
{
"expr": "rate(scylla_storage_proxy_coordinator_writes_total[5m])",
"refId": "B",
"legendFormat": "Writes - {{instance}}"
}
],
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 24}
},
{
"title": "Compaction Tasks",
"type": "timeseries",
"targets": [
{
"expr": "scylla_compaction_manager_pending_tasks",
"refId": "A",
"legendFormat": "Pending - {{instance}}"
},
{
"expr": "scylla_compaction_manager_active_tasks",
"refId": "B",
"legendFormat": "Active - {{instance}}"
}
],
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 32}
},
{
"title": "Error Rates",
"type": "timeseries",
"targets": [
{
"expr": "rate(scylla_storage_proxy_coordinator_read_timeouts_total[5m])",
"refId": "A",
"legendFormat": "Read Timeouts - {{instance}}"
},
{
"expr": "rate(scylla_storage_proxy_coordinator_write_timeouts_total[5m])",
"refId": "B",
"legendFormat": "Write Timeouts - {{instance}}"
}
],
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 32}
}
],
"time": {
"from": "now-1h",
"to": "now"
},
"refresh": "10s"
}
}
curl -X POST http://admin:your_secure_password@localhost:3000/api/dashboards/db \
-H "Content-Type: application/json" \
-d @/tmp/scylla-dashboard.json
Configure firewall access
Open necessary ports for monitoring services while maintaining security.
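The rules in this step open ports 9090, 9093, and 3000 to any source. If the web interfaces should only be reachable from a trusted network, a stricter variant can be generated first and reviewed before it is applied. This sketch only prints the commands; the helper name is hypothetical and the subnet reuses the 203.0.113.0/24 range from the metrics rule as an assumption.

```shell
# Print (not run) ufw rules that limit the given TCP ports to one source subnet.
print_restricted_rules() {
    subnet="$1"
    shift
    for port in "$@"; do
        echo "ufw allow from $subnet to any port $port proto tcp"
    done
}

# Prometheus, Alertmanager and Grafana limited to the example subnet.
print_restricted_rules 203.0.113.0/24 9090 9093 3000
```

Run the printed commands with sudo once they look right.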
sudo ufw allow 9090/tcp comment 'Prometheus'
sudo ufw allow 9093/tcp comment 'Alertmanager'
sudo ufw allow 3000/tcp comment 'Grafana'
sudo ufw allow from 203.0.113.0/24 to any port 9180 comment 'ScyllaDB metrics'
sudo ufw reload
Configure performance optimization
Tune Prometheus retention
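Configure Prometheus retention and storage settings for long-term monitoring data. On Debian and Ubuntu these flags are set through the ARGS variable in /etc/default/prometheus; restart Prometheus after changing them.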
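When choosing retention values, the Prometheus documentation gives a rough capacity formula: retention time in seconds, times ingested samples per second, times bytes per sample (typically 1 to 2). A sketch of that arithmetic, where the ingestion rate of 10,000 samples per second is an assumed figure rather than a measurement:

```shell
# Rough TSDB size: retention seconds x samples per second x bytes per sample.
estimate_tsdb_bytes() {
    retention_days="$1"
    samples_per_second="$2"
    bytes_per_sample="$3"
    echo $(( retention_days * 86400 * samples_per_second * bytes_per_sample ))
}

# 90 days at an assumed 10,000 samples/s and 2 bytes per sample.
estimate_tsdb_bytes 90 10000 2
```

With these inputs the estimate is about 155 GB, so a 50GB size limit would be reached well before the 90-day time limit. Measure the real ingestion rate on your own server before settling on values.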
ARGS="--config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus --storage.tsdb.retention.time=90d --storage.tsdb.retention.size=50GB --web.console.libraries=/etc/prometheus/console_libraries --web.console.templates=/etc/prometheus/consoles --web.enable-lifecycle"
Configure ScyllaDB monitoring user
Create a monitoring-specific user in ScyllaDB with limited privileges for security.
cqlsh -e "CREATE USER monitoring WITH PASSWORD 'monitoring_password' NOSUPERUSER;"
cqlsh -e "GRANT SELECT ON ALL KEYSPACES TO monitoring;"
Set up log monitoring
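Configure log monitoring for ScyllaDB error detection and troubleshooting. Prometheus scrapes metrics rather than log lines, so this job assumes a log agent that exposes a /metrics endpoint on port 9080 is already running on each node; this tutorial does not install one.
# Add this job to the existing scrape_configs section
  - job_name: 'scylla-logs'
    static_configs:
      - targets:
          - '203.0.113.10:9080'
          - '203.0.113.11:9080'
          - '203.0.113.12:9080'
    scrape_interval: 30s
    metrics_path: /metrics
Verify your setup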
Check that all monitoring components are working and collecting data properly.
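For a pass/fail check rather than a visual scan, the metrics output can be counted instead of printed. A minimal sketch with a hypothetical helper name; it reads the Prometheus text format from stdin and ignores the HELP and TYPE comment lines.

```shell
# Count sample lines for metrics whose name starts with the given prefix.
count_samples() {
    prefix="$1"
    # Drop "# HELP" / "# TYPE" comments, then count lines beginning with the prefix.
    grep -v '^#' | grep -c "^${prefix}" || true
}

# Usage against a ScyllaDB node:
#   curl -s http://203.0.113.10:9180/metrics | count_samples scylla_storage_proxy
```

A result of 0 means the endpoint answered but exposed no metrics with that prefix.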
# Check ScyllaDB metrics endpoint
curl http://203.0.113.10:9180/metrics | grep scylla_storage_proxy
# Verify Prometheus targets
curl http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job, instance, health}'
# Check Grafana datasource
curl -u admin:your_secure_password http://localhost:3000/api/datasources
# Test alerting rules
curl http://localhost:9090/api/v1/rules | jq '.data.groups[].rules[].name'
# Verify cluster status
nodetool status
nodetool info
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Prometheus can't reach ScyllaDB metrics | Firewall blocking port 9180 | Configure firewall rules or disable for testing |
| Grafana shows "No data" | Prometheus data source not configured | Check datasource URL and connectivity |
| High memory usage alerts | Normal ScyllaDB behavior | Adjust thresholds in alerting rules |
| Missing ScyllaDB metrics | prometheus_port not configured | Add prometheus_port to scylla.yaml and restart |
| Alertmanager not sending emails | SMTP configuration issues | Check SMTP settings and test with amtool |
| Dashboard shows connection refused | ScyllaDB node down | Check ScyllaDB service status with systemctl status scylla-server |
Next steps
You now have comprehensive ScyllaDB monitoring with Prometheus and Grafana. Consider these additional improvements:
- Monitor Apache Cassandra cluster with Prometheus and Grafana dashboards for comparison with other NoSQL systems
- Configure ScyllaDB backup and restore with automation for data protection strategies
- Set up ScyllaDB multi-datacenter replication for disaster recovery
- Optimize ScyllaDB performance tuning for production workloads
Automated install script
Run this to automate the entire setup
#!/usr/bin/env bash
set -euo pipefail
# ScyllaDB Cluster Monitoring Setup with Prometheus and Grafana
# Production-ready installation script
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
# Configuration
CLUSTER_NAME="ScyllaCluster"
PROMETHEUS_PORT="9180"
NODE_EXPORTER_PORT="9100"
PROMETHEUS_WEB_PORT="9090"
usage() {
echo "Usage: $0 [OPTIONS]"
echo "Options:"
echo " --cluster-ips IPs Comma-separated list of cluster IPs (required)"
echo " --local-ip IP Local IP address for this node (required)"
echo " --monitoring-only Install only monitoring components (Prometheus/Grafana)"
echo " --scylla-only Install only ScyllaDB"
echo " --help Show this help message"
echo ""
echo "Example:"
echo " $0 --cluster-ips 10.0.1.10,10.0.1.11,10.0.1.12 --local-ip 10.0.1.10"
exit 1
}
error() {
echo -e "${RED}ERROR: $1${NC}" >&2
exit 1
}
success() {
echo -e "${GREEN}✓ $1${NC}"
}
warning() {
echo -e "${YELLOW}⚠ $1${NC}"
}
info() {
echo -e "[$(date '+%H:%M:%S')] $1"
}
cleanup() {
if [ $? -ne 0 ]; then
error "Installation failed. Check logs above for details."
fi
}
trap cleanup ERR
check_prerequisites() {
info "[1/12] Checking prerequisites..."
if [[ $EUID -ne 0 ]]; then
error "This script must be run as root or with sudo"
fi
if ! command -v wget &> /dev/null; then
error "wget is required but not installed"
fi
success "Prerequisites check passed"
}
detect_distro() {
info "[2/12] Detecting distribution..."
if [ ! -f /etc/os-release ]; then
error "Cannot detect distribution - /etc/os-release not found"
fi
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_UPDATE="apt update"
PKG_INSTALL="apt install -y"
PKG_UPGRADE="apt upgrade -y"
PROMETHEUS_CONFIG="/etc/prometheus/prometheus.yml"
PROMETHEUS_RULES="/etc/prometheus/rules"
PROMETHEUS_SERVICE="prometheus"
FIREWALL_CMD="ufw"
;;
almalinux|rocky|centos|rhel|ol)
PKG_MGR="dnf"
PKG_UPDATE="dnf update -y"
PKG_INSTALL="dnf install -y"
PKG_UPGRADE="dnf upgrade -y"
PROMETHEUS_CONFIG="/etc/prometheus/prometheus.yml"
PROMETHEUS_RULES="/etc/prometheus/rules"
PROMETHEUS_SERVICE="prometheus"
FIREWALL_CMD="firewall-cmd"
;;
amzn)
PKG_MGR="yum"
PKG_UPDATE="yum update -y"
PKG_INSTALL="yum install -y"
PKG_UPGRADE="yum upgrade -y"
PROMETHEUS_CONFIG="/etc/prometheus/prometheus.yml"
PROMETHEUS_RULES="/etc/prometheus/rules"
PROMETHEUS_SERVICE="prometheus"
FIREWALL_CMD="firewall-cmd"
;;
fedora)
PKG_MGR="dnf"
PKG_UPDATE="dnf update -y"
PKG_INSTALL="dnf install -y"
PKG_UPGRADE="dnf upgrade -y"
PROMETHEUS_CONFIG="/etc/prometheus/prometheus.yml"
PROMETHEUS_RULES="/etc/prometheus/rules"
PROMETHEUS_SERVICE="prometheus"
FIREWALL_CMD="firewall-cmd"
;;
*)
error "Unsupported distribution: $ID"
;;
esac
success "Detected $PRETTY_NAME"
}
update_system() {
info "[3/12] Updating system packages..."
$PKG_UPDATE
success "System updated"
}
install_scylladb() {
info "[4/12] Installing ScyllaDB..."
case "$PKG_MGR" in
apt)
wget -qO - https://downloads.scylladb.com/deb/ubuntu/scylla-5.4-$(lsb_release -s -c).list | tee /etc/apt/sources.list.d/scylla.list
wget -qO - https://downloads.scylladb.com/downloads/scylla-drivers-repo/scylla.key | apt-key add -
$PKG_UPDATE
$PKG_INSTALL scylla
;;
dnf|yum)
curl -L --output /etc/yum.repos.d/scylla.repo http://downloads.scylladb.com/rpm/centos/scylla-5.4.repo
$PKG_INSTALL scylla
;;
esac
success "ScyllaDB installed"
}
configure_scylladb() {
info "[5/12] Configuring ScyllaDB..."
# Backup original config
cp /etc/scylla/scylla.yaml /etc/scylla/scylla.yaml.backup
# Create new configuration
cat > /etc/scylla/scylla.yaml << EOF
cluster_name: '$CLUSTER_NAME'
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "$CLUSTER_IPS"
listen_address: $LOCAL_IP
rpc_address: 0.0.0.0
broadcast_rpc_address: $LOCAL_IP
endpoint_snitch: GossipingPropertyFileSnitch
prometheus_port: $PROMETHEUS_PORT
prometheus_address: 0.0.0.0
data_file_directories:
- /var/lib/scylla/data
commitlog_directory: /var/lib/scylla/commitlog
hints_directory: /var/lib/scylla/hints
view_hints_directory: /var/lib/scylla/view_hints
EOF
chown scylla:scylla /etc/scylla/scylla.yaml
chmod 644 /etc/scylla/scylla.yaml
success "ScyllaDB configured"
}
setup_scylladb() {
info "[6/12] Setting up ScyllaDB..."
scylla_setup --no-raid-setup --no-fstrim-setup --no-coredump-setup --no-sysconfig-setup --no-bootparam-setup --no-ec2-check
systemctl enable scylla-server
systemctl start scylla-server
# Wait for ScyllaDB to start
sleep 10
success "ScyllaDB setup completed"
}
install_monitoring() {
info "[7/12] Installing monitoring components..."
case "$PKG_MGR" in
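apt)
# Grafana is not in the default Ubuntu/Debian repositories; add its repository first
wget -q -O - https://packages.grafana.com/gpg.key | apt-key add -
echo "deb https://packages.grafana.com/oss/deb stable main" | tee /etc/apt/sources.list.d/grafana.list
$PKG_UPDATE
$PKG_INSTALL prometheus prometheus-node-exporter grafana
;;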
dnf)
$PKG_INSTALL epel-release
$PKG_INSTALL prometheus2 node_exporter grafana
# Fix service name for RHEL-based
PROMETHEUS_SERVICE="prometheus"
;;
yum)
$PKG_INSTALL epel-release
$PKG_INSTALL prometheus2 node_exporter grafana
PROMETHEUS_SERVICE="prometheus"
;;
esac
success "Monitoring components installed"
}
configure_prometheus() {
info "[8/12] Configuring Prometheus..."
# Create rules directory
mkdir -p $PROMETHEUS_RULES
chmod 755 $PROMETHEUS_RULES
# Configure Prometheus
cat > $PROMETHEUS_CONFIG << EOF
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - "$PROMETHEUS_RULES/*.yml"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:$PROMETHEUS_WEB_PORT']
  - job_name: 'scylla'
    static_configs:
      - targets: ['$(echo "$CLUSTER_IPS" | sed "s/,/:$PROMETHEUS_PORT','/g"):$PROMETHEUS_PORT']
    scrape_interval: 10s
    metrics_path: /metrics
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['$(echo "$CLUSTER_IPS" | sed "s/,/:$NODE_EXPORTER_PORT','/g"):$NODE_EXPORTER_PORT']
    scrape_interval: 15s
EOF
chmod 644 $PROMETHEUS_CONFIG
# Create alerting rules
cat > $PROMETHEUS_RULES/scylla.yml << EOF
groups:
  - name: scylla.rules
    rules:
      - alert: ScyllaDBNodeDown
        expr: up{job="scylla"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "ScyllaDB node {{ \$labels.instance }} is down"
          description: "ScyllaDB node {{ \$labels.instance }} has been down for more than 1 minute."
      - alert: ScyllaDBHighLatency
        expr: scylla_storage_proxy_coordinator_read_latency{quantile="0.99"} > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High read latency on {{ \$labels.instance }}"
          description: "99th percentile read latency is {{ \$value }}ms on {{ \$labels.instance }}"
EOF
chmod 644 $PROMETHEUS_RULES/scylla.yml
success "Prometheus configured"
}
configure_firewall() {
info "[9/12] Configuring firewall..."
case "$FIREWALL_CMD" in
ufw)
ufw allow 22/tcp # keep SSH reachable before enabling; assumes sshd uses the default port
ufw --force enable
ufw allow $PROMETHEUS_PORT/tcp
ufw allow $NODE_EXPORTER_PORT/tcp
ufw allow $PROMETHEUS_WEB_PORT/tcp
ufw allow 3000/tcp # Grafana
ufw allow 7000/tcp # ScyllaDB inter-node
ufw allow 9042/tcp # ScyllaDB CQL
;;
firewall-cmd)
systemctl enable firewalld
systemctl start firewalld
firewall-cmd --permanent --add-port=$PROMETHEUS_PORT/tcp
firewall-cmd --permanent --add-port=$NODE_EXPORTER_PORT/tcp
firewall-cmd --permanent --add-port=$PROMETHEUS_WEB_PORT/tcp
firewall-cmd --permanent --add-port=3000/tcp
firewall-cmd --permanent --add-port=7000/tcp
firewall-cmd --permanent --add-port=9042/tcp
firewall-cmd --reload
;;
esac
success "Firewall configured"
}
start_services() {
info "[10/12] Starting services..."
# Debian/Ubuntu packages name the unit prometheus-node-exporter; other branches keep node_exporter
if [ "$PKG_MGR" = "apt" ]; then
NODE_EXPORTER_SERVICE="prometheus-node-exporter"
else
NODE_EXPORTER_SERVICE="node_exporter"
fi
# Start node exporter
systemctl enable "$NODE_EXPORTER_SERVICE"
systemctl start "$NODE_EXPORTER_SERVICE"
# Start Prometheus
systemctl enable $PROMETHEUS_SERVICE
systemctl start $PROMETHEUS_SERVICE
# Start Grafana
systemctl enable grafana-server
systemctl start grafana-server
success "Services started"
}
verify_installation() {
info "[11/12] Verifying installation..."
# Check ScyllaDB if installed
if [ "$MONITORING_ONLY" != "true" ]; then
if ! systemctl is-active --quiet scylla-server; then
error "ScyllaDB service is not running"
fi
fi
# Check monitoring services if installed
if [ "$SCYLLA_ONLY" != "true" ]; then
if ! systemctl is-active --quiet "$NODE_EXPORTER_SERVICE"; then
error "Node exporter service is not running"
fi
if ! systemctl is-active --quiet $PROMETHEUS_SERVICE; then
error "Prometheus service is not running"
fi
if ! systemctl is-active --quiet grafana-server; then
error "Grafana service is not running"
fi
fi
success "All services are running"
}
show_summary() {
info "[12/12] Installation completed!"
echo ""
success "ScyllaDB Monitoring Stack installed successfully!"
echo ""
echo "Access URLs:"
echo " Prometheus: http://$LOCAL_IP:$PROMETHEUS_WEB_PORT"
echo " Grafana: http://$LOCAL_IP:3000 (admin/admin)"
echo ""
echo "ScyllaDB Endpoints:"
echo " CQL: $LOCAL_IP:9042"
echo " Metrics: $LOCAL_IP:$PROMETHEUS_PORT/metrics"
echo ""
warning "Remember to:"
warning "1. Change Grafana admin password"
warning "2. Import ScyllaDB dashboards in Grafana"
warning "3. Configure alerting in Prometheus"
}
# Parse command line arguments
CLUSTER_IPS=""
LOCAL_IP=""
MONITORING_ONLY="false"
SCYLLA_ONLY="false"
while [[ $# -gt 0 ]]; do
case $1 in
--cluster-ips)
CLUSTER_IPS="$2"
shift 2
;;
--local-ip)
LOCAL_IP="$2"
shift 2
;;
--monitoring-only)
MONITORING_ONLY="true"
shift
;;
--scylla-only)
SCYLLA_ONLY="true"
shift
;;
--help)
usage
;;
*)
error "Unknown option: $1"
;;
esac
done
if [ -z "$CLUSTER_IPS" ] || [ -z "$LOCAL_IP" ]; then
usage
fi
# Main execution
check_prerequisites
detect_distro
update_system
if [ "$MONITORING_ONLY" != "true" ]; then
install_scylladb
configure_scylladb
setup_scylladb
fi
if [ "$SCYLLA_ONLY" != "true" ]; then
install_monitoring
configure_prometheus
fi
configure_firewall
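# The monitoring services are not installed in --scylla-only mode
if [ "$SCYLLA_ONLY" != "true" ]; then
start_services
fi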
verify_installation
show_summary
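Review the script before running it. It must run as root and requires both --cluster-ips and --local-ip, for example: sudo bash install.sh --cluster-ips 10.0.1.10,10.0.1.11,10.0.1.12 --local-ip 10.0.1.10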
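As part of that review, bash can parse the file without executing anything, which catches copy-and-paste damage such as an unterminated here-document. A minimal sketch; the wrapper name syntax_ok is illustrative and install.sh is the file name used above.

```shell
# Parse a script without executing it; returns non-zero on a syntax error.
syntax_ok() {
    bash -n "$1"
}

# Usage:
#   syntax_ok install.sh && echo "syntax looks fine"
```

A clean parse says nothing about whether the commands suit your environment, so it complements reading the script and does not replace it.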