Configure comprehensive monitoring for keepalived VRRP clusters using Prometheus metrics collection, alerting rules for failover events, and Grafana dashboards for high availability visualization.
Prerequisites
- Two servers for keepalived cluster
- One server for monitoring stack
- Basic networking knowledge
- Root access to all servers
What this solves
Keepalived provides high availability through VRRP (Virtual Router Redundancy Protocol), but it does not export Prometheus metrics on its own. This tutorial uses keepalived notify scripts and the Node Exporter textfile collector to publish VRRP state as Prometheus metrics, adds alerting rules for failover events, and builds a Grafana dashboard that shows the state of your high availability cluster in real time.
Step-by-step configuration
Install keepalived cluster
Set up keepalived on both cluster nodes to create a high availability pair with shared virtual IP addresses.
sudo apt update
sudo apt install -y keepalived
Configure primary keepalived node
Create /etc/keepalived/keepalived.conf on the primary node. It defines the VRRP instance, a health check for the local web server, and the notify scripts that run on each state change.
global_defs {
router_id KEEPALIVED_PRIMARY
script_user keepalived_script
enable_script_security
}
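vrrp_script chk_nginx {
# curl -f already exits non-zero on an HTTP error, so no shell fallback is needed
script "/bin/curl -f http://localhost/"
interval 2
# The penalty must exceed the priority gap between the nodes (110 - 100 = 10)
weight -20
fall 3
rise 2
}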
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 110
advert_int 1
authentication {
auth_type PASS
auth_pass changeme123
}
virtual_ipaddress {
203.0.113.100/24 dev eth0
}
track_script {
chk_nginx
}
notify_master "/etc/keepalived/scripts/notify_master.sh"
notify_backup "/etc/keepalived/scripts/notify_backup.sh"
notify_fault "/etc/keepalived/scripts/notify_fault.sh"
}
Configure backup keepalived node
Create /etc/keepalived/keepalived.conf on the backup node. It matches the primary's configuration except for router_id, state BACKUP, and a lower priority.
global_defs {
router_id KEEPALIVED_BACKUP
script_user keepalived_script
enable_script_security
}
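vrrp_script chk_nginx {
# curl -f already exits non-zero on an HTTP error, so no shell fallback is needed
script "/bin/curl -f http://localhost/"
interval 2
# The penalty must exceed the priority gap between the nodes (110 - 100 = 10)
weight -20
fall 3
rise 2
}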
vrrp_instance VI_1 {
state BACKUP
interface eth0
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass changeme123
}
virtual_ipaddress {
203.0.113.100/24 dev eth0
}
track_script {
chk_nginx
}
notify_master "/etc/keepalived/scripts/notify_master.sh"
notify_backup "/etc/keepalived/scripts/notify_backup.sh"
notify_fault "/etc/keepalived/scripts/notify_fault.sh"
}
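Whether a failed health check actually moves the virtual IP comes down to simple arithmetic: a negative weight is added to the node's priority while the check is failing, and the backup can only win the election if the result drops below its own priority. The sketch below works through that arithmetic. The function names are illustrative and not part of keepalived, and the sketch assumes the backup's own check is passing and that preemption is left at its default.

```shell
#!/usr/bin/env bash
# Illustrative helpers, not part of keepalived.

# effective_priority BASE WEIGHT CHECK_OK
#   BASE      configured priority
#   WEIGHT    weight of the tracked vrrp_script
#   CHECK_OK  1 if the check is passing, 0 if it is failing
effective_priority() {
  local base=$1 weight=$2 check_ok=$3
  if (( weight < 0 && check_ok == 0 )); then
    echo $(( base + weight ))   # a failing check subtracts |weight|
  elif (( weight > 0 && check_ok == 1 )); then
    echo $(( base + weight ))   # a passing check adds weight
  else
    echo "$base"
  fi
}

# check_failure_moves_vip PRIMARY_PRIORITY BACKUP_PRIORITY WEIGHT
# Succeeds when a failed check on the primary drops it below the backup.
check_failure_moves_vip() {
  local degraded
  degraded=$(effective_priority "$1" "$3" 0)
  (( degraded < $2 ))
}
```

With priorities 110 and 100, a weight of -2 leaves the primary at 108, so the backup never takes over on a failed check; a weight of -20 drops it to 90, and the virtual IP moves.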
Create keepalived notification scripts
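Create three notification scripts in /etc/keepalived/scripts. Keepalived runs the matching script on each VRRP state change, and each script rewrites the metrics file that Node Exporter reads.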
sudo mkdir -p /etc/keepalived/scripts
sudo mkdir -p /var/lib/prometheus/node-exporter
#!/bin/bash
# /etc/keepalived/scripts/notify_master.sh
# State encoding: 2 = MASTER, 1 = BACKUP, 0 = FAULT.
# keepalived_transitions_total holds the Unix timestamp of the latest transition.
echo "keepalived_vrrp_state{instance=\"VI_1\",state=\"master\"} 2" > /var/lib/prometheus/node-exporter/keepalived.prom
echo "keepalived_vrrp_priority{instance=\"VI_1\"} $(grep priority /etc/keepalived/keepalived.conf | awk '{print $2}')" >> /var/lib/prometheus/node-exporter/keepalived.prom
echo "keepalived_transitions_total{instance=\"VI_1\",type=\"master\"} $(date +%s)" >> /var/lib/prometheus/node-exporter/keepalived.prom
logger "Keepalived: Transitioned to MASTER state"
#!/bin/bash
# /etc/keepalived/scripts/notify_backup.sh
echo "keepalived_vrrp_state{instance=\"VI_1\",state=\"backup\"} 1" > /var/lib/prometheus/node-exporter/keepalived.prom
echo "keepalived_vrrp_priority{instance=\"VI_1\"} $(grep priority /etc/keepalived/keepalived.conf | awk '{print $2}')" >> /var/lib/prometheus/node-exporter/keepalived.prom
echo "keepalived_transitions_total{instance=\"VI_1\",type=\"backup\"} $(date +%s)" >> /var/lib/prometheus/node-exporter/keepalived.prom
logger "Keepalived: Transitioned to BACKUP state"
#!/bin/bash
# /etc/keepalived/scripts/notify_fault.sh
echo "keepalived_vrrp_state{instance=\"VI_1\",state=\"fault\"} 0" > /var/lib/prometheus/node-exporter/keepalived.prom
echo "keepalived_vrrp_priority{instance=\"VI_1\"} $(grep priority /etc/keepalived/keepalived.conf | awk '{print $2}')" >> /var/lib/prometheus/node-exporter/keepalived.prom
echo "keepalived_transitions_total{instance=\"VI_1\",type=\"fault\"} $(date +%s)" >> /var/lib/prometheus/node-exporter/keepalived.prom
logger "Keepalived: Transitioned to FAULT state"
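The scripts above build the metrics file in several steps with > and >>, so Node Exporter can scrape it between writes and see only part of it. A common way around this is to write to a temporary file in the same directory and rename it into place. The following is a minimal sketch of that idea, not a drop-in replacement; the function name and its arguments are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write all keepalived metrics to a temporary file,
# then rename it so readers never see a half-written file.
#
# write_keepalived_metrics DIR STATE_NAME STATE_VALUE PRIORITY
write_keepalived_metrics() {
  local dir=$1 state=$2 value=$3 priority=$4
  local tmp
  # The temporary name does not end in .prom, so the textfile collector skips it.
  tmp=$(mktemp "$dir/.keepalived.prom.XXXXXX") || return 1
  {
    printf 'keepalived_vrrp_state{instance="VI_1",state="%s"} %s\n' "$state" "$value"
    printf 'keepalived_vrrp_priority{instance="VI_1"} %s\n' "$priority"
    printf 'keepalived_transitions_total{instance="VI_1",type="%s"} %s\n' "$state" "$(date +%s)"
  } > "$tmp"
  chmod 644 "$tmp"                    # mktemp creates the file with mode 600
  mv "$tmp" "$dir/keepalived.prom"    # rename within one directory is atomic
}
```

A notify script could then reduce to a single call such as write_keepalived_metrics /var/lib/prometheus/node-exporter master 2 110, followed by the logger line.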
Set script permissions and user
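Create the unprivileged user that keepalived runs the scripts as (script_user in global_defs), and give it ownership of the scripts and of the metrics directory it writes to. The directory stays world-readable so Node Exporter can read the metrics file.
sudo useradd -r -s /bin/false keepalived_script
sudo chmod 755 /etc/keepalived/scripts/*.sh
sudo chown -R keepalived_script:keepalived_script /etc/keepalived/scripts
sudo chown keepalived_script:keepalived_script /var/lib/prometheus/node-exporter
sudo chmod 755 /var/lib/prometheus/node-exporter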
Install Prometheus Node Exporter
Install Node Exporter on both keepalived nodes. It collects system metrics and, through its textfile collector, exposes the custom keepalived metrics.
cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
tar -xzf node_exporter-1.8.2.linux-amd64.tar.gz
sudo cp node_exporter-1.8.2.linux-amd64/node_exporter /usr/local/bin/
sudo useradd -r -s /bin/false prometheus
Configure Node Exporter with text file collector
Create /etc/systemd/system/node_exporter.service and enable the textfile collector so Node Exporter reads the metrics file written by the notification scripts.
[Unit]
Description=Prometheus Node Exporter
After=network.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter \
--collector.textfile.directory=/var/lib/prometheus/node-exporter \
--collector.systemd \
--collector.processes
Restart=always
[Install]
WantedBy=multi-user.target
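Once Node Exporter is running, it helps to confirm that the custom metrics really appear in its output before moving on to Prometheus. The helper below is a small sketch for that check; has_metric is a hypothetical name, and it only looks at the text exposition format on standard input, so it works with any exporter.

```shell
#!/usr/bin/env bash
# has_metric NAME
# Reads Prometheus text exposition format on stdin and succeeds if at least
# one sample of the metric NAME is present (comment lines do not count).
has_metric() {
  grep -Eq "^$1(\{[^}]*\})? "
}

# Example, run on a keepalived node after a state change has written the file:
#   curl -s http://localhost:9100/metrics | has_metric keepalived_vrrp_state \
#     && echo "keepalived metrics are exported"
```

If the check fails, the usual suspects are a missing --collector.textfile.directory flag or a metrics file that has not been written yet because no state change has happened.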
Install Prometheus server
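On the monitoring server, install Prometheus to scrape metrics from your keepalived cluster nodes. The prometheus service user is created here if it does not already exist, and the console files referenced by the service unit are copied into /etc/prometheus.
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.54.1/prometheus-2.54.1.linux-amd64.tar.gz
tar -xzf prometheus-2.54.1.linux-amd64.tar.gz
sudo cp prometheus-2.54.1.linux-amd64/prometheus /usr/local/bin/
sudo cp prometheus-2.54.1.linux-amd64/promtool /usr/local/bin/
id -u prometheus >/dev/null 2>&1 || sudo useradd -r -s /bin/false prometheus
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo cp -r prometheus-2.54.1.linux-amd64/consoles prometheus-2.54.1.linux-amd64/console_libraries /etc/prometheus/
sudo chown prometheus:prometheus /var/lib/prometheus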
Configure Prometheus scraping
Create /etc/prometheus/prometheus.yml so that Prometheus scrapes Node Exporter on both keepalived cluster nodes.
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - "/etc/prometheus/rules/keepalived.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "localhost:9093"
scrape_configs:
  - job_name: 'keepalived-cluster'
    static_configs:
      - targets:
          - '203.0.113.10:9100' # Primary node
          - '203.0.113.11:9100' # Backup node
    scrape_interval: 5s
    metrics_path: '/metrics'
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
Create Prometheus alerting rules
Create /etc/prometheus/rules/keepalived.yml with alerting rules that detect keepalived failover events and cluster issues.
sudo mkdir -p /etc/prometheus/rules
groups:
  - name: keepalived.rules
    rules:
      # keepalived_transitions_total holds the Unix timestamp of the last transition,
      # so the alert compares it with the current time.
      - alert: KeepaliveFailover
        expr: time() - keepalived_transitions_total < 300
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Keepalived failover detected on {{ $labels.instance }}"
          description: "Keepalived instance {{ $labels.instance }} has experienced a state transition in the last 5 minutes."
      # "or vector(0)" yields 0 instead of an empty result when no node reports MASTER.
      - alert: KeepaliveNoMaster
        expr: (count(keepalived_vrrp_state == 2) or vector(0)) == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "No keepalived master found in cluster"
          description: "No keepalived instance is currently reporting MASTER state, indicating a cluster failure."
      # count() gives the number of masters; sum() would add up the state values instead.
      - alert: KeepaliveMultipleMasters
        expr: count(keepalived_vrrp_state == 2) > 1
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "Multiple keepalived masters detected"
          description: "{{ $value }} keepalived instances are in MASTER state, indicating a split-brain condition."
      - alert: KeepaliveInstanceDown
        expr: up{job="keepalived-cluster"} == 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Keepalived node {{ $labels.instance }} is down"
          description: "Cannot scrape metrics from keepalived node {{ $labels.instance }} for more than 1 minute."
      # changes() counts how often the transition timestamp was rewritten within the hour.
      - alert: KeepaliveHighFailoverRate
        expr: sum by (instance) (changes(keepalived_transitions_total[1h])) > 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High keepalived failover rate on {{ $labels.instance }}"
          description: "Keepalived instance {{ $labels.instance }} is experiencing frequent state transitions ({{ $value }} repeated transitions in the last hour)."
      - alert: KeepaliveFaultState
        expr: keepalived_vrrp_state == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Keepalived instance {{ $labels.instance }} in FAULT state"
          description: "Keepalived instance {{ $labels.instance }} has been in FAULT state for more than 1 minute."
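Because YAML indentation and PromQL typos are easy to get wrong, it can be worth validating both files with promtool (copied to /usr/local/bin earlier) before starting or reloading Prometheus. The wrapper below is a sketch under that assumption; the function name is illustrative, while promtool check rules and promtool check config are the actual subcommands.

```shell
#!/usr/bin/env bash
# validate_prometheus_files CONFIG_FILE RULES_FILE
# Runs promtool against the rules file and the main configuration.
# Returns 127 if promtool is not installed.
validate_prometheus_files() {
  local config=$1 rules=$2
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found in PATH" >&2
    return 127
  fi
  promtool check rules "$rules" && promtool check config "$config"
}

# Example on the monitoring server:
#   validate_prometheus_files /etc/prometheus/prometheus.yml \
#     /etc/prometheus/rules/keepalived.yml
```

The rules file is checked first because promtool check config also loads every file listed under rule_files, so a broken rule would otherwise surface as a configuration error.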
Install Prometheus Alertmanager
Install Alertmanager to handle alert notifications from Prometheus.
cd /tmp
wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
tar -xzf alertmanager-0.27.0.linux-amd64.tar.gz
sudo cp alertmanager-0.27.0.linux-amd64/alertmanager /usr/local/bin/
sudo mkdir -p /etc/alertmanager
Configure Alertmanager
Create /etc/alertmanager/alertmanager.yml with a basic configuration for email notifications. The subject is set through the headers map and the message body through the text field.
global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alerts@example.com'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'keepalived-alerts'
receivers:
  - name: 'keepalived-alerts'
    email_configs:
      - to: 'admin@example.com'
        headers:
          Subject: 'Keepalived Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Instance: {{ .Labels.instance }}
          Severity: {{ .Labels.severity }}
          {{ end }}
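To test the notification path without forcing a real failover, a synthetic alert can be posted straight to the Alertmanager v2 API. The sketch below assumes Alertmanager listens on localhost:9093 as in the Prometheus configuration; the function names and the alert name KeepalivedTestAlert are made up for this example.

```shell
#!/usr/bin/env bash
# test_alert_payload ALERTNAME SEVERITY
# Prints a JSON array with one alert, the body expected by POST /api/v2/alerts.
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"test"},"annotations":{"summary":"Test alert","description":"Manual test of the notification path"}}]\n' "$1" "$2"
}

# send_test_alert [ALERTMANAGER_URL]
# Posts the test alert; curl -f makes HTTP errors return a non-zero status.
send_test_alert() {
  local url=${1:-http://localhost:9093}
  test_alert_payload KeepalivedTestAlert warning |
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "$url/api/v2/alerts"
}
```

After calling send_test_alert, an email should arrive once group_wait has elapsed. If it does not, the Alertmanager log usually shows whether the SMTP connection or the template failed.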
Create systemd services
Create systemd unit files for both services: /etc/systemd/system/prometheus.service first, then /etc/systemd/system/alertmanager.service.
[Unit]
Description=Prometheus Server
After=network.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus/ \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries \
--web.listen-address=0.0.0.0:9090 \
--web.enable-lifecycle
Restart=always
[Install]
WantedBy=multi-user.target
[Unit]
Description=Prometheus Alertmanager
After=network.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/alertmanager \
--config.file=/etc/alertmanager/alertmanager.yml \
--storage.path=/var/lib/alertmanager/
Restart=always
[Install]
WantedBy=multi-user.target
Install Grafana
Install Grafana for visualizing keepalived cluster metrics and creating dashboards.
curl -fsSL https://packages.grafana.com/gpg.key | sudo gpg --dearmor -o /usr/share/keyrings/grafana.gpg
echo "deb [signed-by=/usr/share/keyrings/grafana.gpg] https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana
Configure Grafana data source
Add Prometheus as a data source by placing the following in a YAML file under /etc/grafana/provisioning/datasources/. Grafana loads it on startup.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
    editable: true
Create Grafana keepalived dashboard
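Create a dashboard for monitoring keepalived cluster status and metrics. The JSON below is wrapped in a top-level "dashboard" key, which is the shape the Grafana HTTP API (POST /api/dashboards/db) expects; to import through the UI or file provisioning, use only the inner object.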
{
"dashboard": {
"id": null,
"title": "Keepalived Cluster Monitoring",
"tags": ["keepalived", "vrrp", "high-availability"],
"timezone": "browser",
"panels": [
{
"id": 1,
"title": "VRRP Instance States",
"type": "stat",
"targets": [
{
"expr": "keepalived_vrrp_state",
"legendFormat": "{{instance}} - {{state}}"
}
],
"fieldConfig": {
"defaults": {
"mappings": [
{"type": "value", "options": {"0": {"text": "FAULT", "color": "red"}}},
{"type": "value", "options": {"1": {"text": "BACKUP", "color": "yellow"}}},
{"type": "value", "options": {"2": {"text": "MASTER", "color": "green"}}}
]
}
},
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
},
{
"id": 2,
"title": "Failover Events",
"type": "timeseries",
"targets": [
{
"expr": "increase(keepalived_transitions_total[5m])",
"legendFormat": "{{instance}} - {{type}}"
}
],
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
},
{
"id": 3,
"title": "Instance Priorities",
"type": "timeseries",
"targets": [
{
"expr": "keepalived_vrrp_priority",
"legendFormat": "{{instance}}"
}
],
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 8}
}
],
"time": {
"from": "now-1h",
"to": "now"
},
"refresh": "5s"
}
}
Set ownership and permissions
Configure proper ownership for all service directories and files.
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
sudo chown -R prometheus:prometheus /etc/alertmanager
sudo mkdir -p /var/lib/alertmanager
sudo chown -R prometheus:prometheus /var/lib/alertmanager
sudo chown -R grafana:grafana /etc/grafana
Start all services
Enable and start keepalived, monitoring, and visualization services on all nodes.
# On both keepalived nodes
sudo systemctl daemon-reload
sudo systemctl enable --now keepalived node_exporter
# On monitoring server
sudo systemctl enable --now prometheus alertmanager grafana-server
Configure firewall rules
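Open the Node Exporter port on the cluster nodes so Prometheus can scrape them, allow VRRP advertisements between the nodes, and open the monitoring web interfaces.
# On keepalived nodes
sudo ufw allow 9100/tcp # Node Exporter
sudo ufw allow to 224.0.0.18 # VRRP advertisements are sent to this multicast address
# On monitoring server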
sudo ufw allow 9090/tcp # Prometheus
sudo ufw allow 9093/tcp # Alertmanager
sudo ufw allow 3000/tcp # Grafana
Verify your setup
Test your keepalived cluster monitoring by checking service status and triggering failover events.
# Check keepalived status on both nodes
sudo systemctl status keepalived
ip addr show # Look for virtual IP
# Check monitoring services on the monitoring server
sudo systemctl status prometheus alertmanager grafana-server
# Check Node Exporter on both keepalived nodes
sudo systemctl status node_exporter
# Test Prometheus targets
curl http://localhost:9090/api/v1/targets
# Test Prometheus metrics
curl "http://localhost:9090/api/v1/query?query=keepalived_vrrp_state"
# Check Grafana dashboard access
curl -I http://localhost:3000
# Test keepalived failover
sudo systemctl stop keepalived # On master node
ip addr show # Verify VIP moved to backup
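Reading ip addr show by eye works, but for repeated failover tests a scripted check is handier. The helper below is a sketch that looks for an address in the one-line output of ip -o -4 addr show; holds_vip is a hypothetical name, and 203.0.113.100 is the virtual IP used throughout this tutorial.

```shell
#!/usr/bin/env bash
# holds_vip ADDRESS
# Reads `ip -o -4 addr show` output on stdin and succeeds if ADDRESS is
# configured on any interface of this host.
holds_vip() {
  local escaped
  # Escape the dots so they match literally in the regular expression.
  escaped=$(printf '%s' "$1" | sed 's/\./\\./g')
  # The trailing slash anchors the match at the prefix length, so
  # 203.0.113.10 does not match 203.0.113.100.
  grep -Eq "inet ${escaped}/"
}

# Example, run on each node before and after stopping keepalived on the master:
#   ip -o -4 addr show | holds_vip 203.0.113.100 && echo "this node holds the VIP"
```

After a successful failover the check should succeed on the backup node and fail on the former master.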
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Split-brain condition | Network partition or authentication mismatch | Check network connectivity and verify auth_pass matches on both nodes |
| Virtual IP not moving | Priority misconfiguration or script failures | Check priority values and test health check scripts manually |
| Prometheus can't scrape metrics | Node Exporter not running or firewall blocking | Verify systemctl status node_exporter and check firewall rules |
| No keepalived metrics in Prometheus | Text file collector not configured | Ensure --collector.textfile.directory flag is set and scripts have write permissions |
| Grafana dashboard shows no data | Prometheus data source not configured | Check /etc/grafana/provisioning/datasources/ configuration and restart grafana |
| Alert notifications not working | Alertmanager configuration or SMTP issues | Check /etc/alertmanager/alertmanager.yml and test SMTP connectivity |
Automated install script
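Run this on each keepalived node to automate the keepalived part of the setup: package installation, notification scripts, configuration, and firewall rules. Node Exporter, Prometheus, Alertmanager, and Grafana still need to be set up as described above.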
#!/usr/bin/env bash
set -euo pipefail
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# Default values
NODE_TYPE=""
INTERFACE="eth0"
VIRTUAL_IP=""
PRIORITY=""
AUTH_PASS="changeme123"
# Usage function
usage() {
echo "Usage: $0 --node-type primary|backup --virtual-ip IP/PREFIX [--interface INTERFACE] [--priority NUM] [--auth-pass PASS]"
echo "Example: $0 --node-type primary --virtual-ip 203.0.113.100/24 --interface eth0"
exit 1
}
# Parse arguments
while [[ $# -gt 0 ]]; do
case $1 in
--node-type)
NODE_TYPE="$2"
shift 2
;;
--virtual-ip)
VIRTUAL_IP="$2"
shift 2
;;
--interface)
INTERFACE="$2"
shift 2
;;
--priority)
PRIORITY="$2"
shift 2
;;
--auth-pass)
AUTH_PASS="$2"
shift 2
;;
-h|--help)
usage
;;
*)
echo -e "${RED}Unknown option: $1${NC}"
usage
;;
esac
done
# Validate arguments
if [[ -z "$NODE_TYPE" || -z "$VIRTUAL_IP" ]]; then
echo -e "${RED}Error: --node-type and --virtual-ip are required${NC}"
usage
fi
if [[ "$NODE_TYPE" != "primary" && "$NODE_TYPE" != "backup" ]]; then
echo -e "${RED}Error: --node-type must be 'primary' or 'backup'${NC}"
usage
fi
# Set default priorities if not specified
if [[ -z "$PRIORITY" ]]; then
if [[ "$NODE_TYPE" == "primary" ]]; then
PRIORITY=110
else
PRIORITY=100
fi
fi
# Cleanup function
cleanup() {
echo -e "${RED}Installation failed. Cleaning up...${NC}"
systemctl stop keepalived 2>/dev/null || true
systemctl disable keepalived 2>/dev/null || true
}
trap cleanup ERR
# Check if running as root
if [[ $EUID -ne 0 ]]; then
echo -e "${RED}This script must be run as root${NC}"
exit 1
fi
# Detect distribution
echo -e "${YELLOW}[1/8] Detecting distribution...${NC}"
if [ -f /etc/os-release ]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_UPDATE="apt update"
PKG_INSTALL="apt install -y"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_UPDATE="dnf update -y --refresh"
PKG_INSTALL="dnf install -y"
;;
amzn)
PKG_MGR="yum"
PKG_UPDATE="yum update -y"
PKG_INSTALL="yum install -y"
;;
*)
echo -e "${RED}Unsupported distribution: $ID${NC}"
exit 1
;;
esac
echo -e "${GREEN}Detected: $PRETTY_NAME${NC}"
else
echo -e "${RED}Cannot detect distribution${NC}"
exit 1
fi
# Update package repositories
echo -e "${YELLOW}[2/8] Updating package repositories...${NC}"
$PKG_UPDATE
# Install keepalived and required packages
echo -e "${YELLOW}[3/8] Installing keepalived and dependencies...${NC}"
$PKG_INSTALL keepalived curl
# Create keepalived user for scripts
echo -e "${YELLOW}[4/8] Creating keepalived script user...${NC}"
if ! id -u keepalived_script >/dev/null 2>&1; then
useradd -r -s /bin/false -d /var/empty keepalived_script
fi
# Create directories with proper permissions
echo -e "${YELLOW}[5/8] Creating directories and notification scripts...${NC}"
mkdir -p /etc/keepalived/scripts
mkdir -p /var/lib/prometheus/node-exporter
chown root:root /etc/keepalived/scripts
chmod 755 /etc/keepalived/scripts
# The notify scripts run as keepalived_script and must be able to write here
chown keepalived_script:keepalived_script /var/lib/prometheus/node-exporter
chmod 755 /var/lib/prometheus/node-exporter
# Create notification scripts
cat > /etc/keepalived/scripts/notify_master.sh << 'EOF'
#!/bin/bash
echo "keepalived_vrrp_state{instance=\"VI_1\",state=\"master\"} 2" > /var/lib/prometheus/node-exporter/keepalived.prom
echo "keepalived_vrrp_priority{instance=\"VI_1\"} $(grep priority /etc/keepalived/keepalived.conf | awk '{print $2}')" >> /var/lib/prometheus/node-exporter/keepalived.prom
echo "keepalived_transitions_total{instance=\"VI_1\",type=\"master\"} $(date +%s)" >> /var/lib/prometheus/node-exporter/keepalived.prom
logger "Keepalived: Transitioned to MASTER state"
EOF
cat > /etc/keepalived/scripts/notify_backup.sh << 'EOF'
#!/bin/bash
echo "keepalived_vrrp_state{instance=\"VI_1\",state=\"backup\"} 1" > /var/lib/prometheus/node-exporter/keepalived.prom
echo "keepalived_vrrp_priority{instance=\"VI_1\"} $(grep priority /etc/keepalived/keepalived.conf | awk '{print $2}')" >> /var/lib/prometheus/node-exporter/keepalived.prom
echo "keepalived_transitions_total{instance=\"VI_1\",type=\"backup\"} $(date +%s)" >> /var/lib/prometheus/node-exporter/keepalived.prom
logger "Keepalived: Transitioned to BACKUP state"
EOF
cat > /etc/keepalived/scripts/notify_fault.sh << 'EOF'
#!/bin/bash
echo "keepalived_vrrp_state{instance=\"VI_1\",state=\"fault\"} 0" > /var/lib/prometheus/node-exporter/keepalived.prom
echo "keepalived_vrrp_priority{instance=\"VI_1\"} $(grep priority /etc/keepalived/keepalived.conf | awk '{print $2}')" >> /var/lib/prometheus/node-exporter/keepalived.prom
echo "keepalived_transitions_total{instance=\"VI_1\",type=\"fault\"} $(date +%s)" >> /var/lib/prometheus/node-exporter/keepalived.prom
logger "Keepalived: Transitioned to FAULT state"
EOF
# Set proper permissions for scripts
chmod 755 /etc/keepalived/scripts/*.sh
chown root:root /etc/keepalived/scripts/*.sh
# Configure keepalived
echo -e "${YELLOW}[6/8] Creating keepalived configuration...${NC}"
if [[ "$NODE_TYPE" == "primary" ]]; then
STATE="MASTER"
ROUTER_ID="KEEPALIVED_PRIMARY"
else
STATE="BACKUP"
ROUTER_ID="KEEPALIVED_BACKUP"
fi
cat > /etc/keepalived/keepalived.conf << EOF
global_defs {
router_id $ROUTER_ID
script_user keepalived_script
enable_script_security
}
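vrrp_script chk_nginx {
# curl -f already exits non-zero on an HTTP error, so no shell fallback is needed
script "/bin/curl -f http://localhost/"
interval 2
# The penalty must exceed the priority gap between the two nodes (10 by default)
weight -20
fall 3
rise 2
}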
vrrp_instance VI_1 {
state $STATE
interface $INTERFACE
virtual_router_id 51
priority $PRIORITY
advert_int 1
authentication {
auth_type PASS
auth_pass $AUTH_PASS
}
virtual_ipaddress {
$VIRTUAL_IP dev $INTERFACE
}
track_script {
chk_nginx
}
notify_master "/etc/keepalived/scripts/notify_master.sh"
notify_backup "/etc/keepalived/scripts/notify_backup.sh"
notify_fault "/etc/keepalived/scripts/notify_fault.sh"
}
EOF
chown root:root /etc/keepalived/keepalived.conf
chmod 644 /etc/keepalived/keepalived.conf
# Configure firewall
echo -e "${YELLOW}[7/8] Configuring firewall...${NC}"
case "$ID" in
ubuntu|debian)
if command -v ufw >/dev/null 2>&1 && ufw status | grep -q "Status: active"; then
# VRRP advertisements are sent to the multicast address 224.0.0.18
ufw allow to 224.0.0.18
fi
;;
*)
if command -v firewall-cmd >/dev/null 2>&1 && systemctl is-active --quiet firewalld; then
firewall-cmd --permanent --add-rich-rule="rule protocol value='vrrp' accept"
firewall-cmd --permanent --add-rich-rule="rule family='ipv4' destination address='224.0.0.18' accept"
firewall-cmd --reload
fi
;;
esac
# Enable and start keepalived
echo -e "${YELLOW}[8/8] Starting and enabling keepalived service...${NC}"
systemctl enable keepalived
systemctl start keepalived
# Verification
echo -e "${YELLOW}Verifying installation...${NC}"
sleep 3
if systemctl is-active --quiet keepalived; then
echo -e "${GREEN}✓ Keepalived service is running${NC}"
else
echo -e "${RED}✗ Keepalived service is not running${NC}"
exit 1
fi
if [[ -f /var/lib/prometheus/node-exporter/keepalived.prom ]]; then
echo -e "${GREEN}✓ Prometheus metrics file created${NC}"
else
echo -e "${YELLOW}⚠ Prometheus metrics file not yet created (will be created on state change)${NC}"
fi
echo -e "${GREEN}Installation completed successfully!${NC}"
echo -e "${YELLOW}Node type: $NODE_TYPE${NC}"
echo -e "${YELLOW}Virtual IP: $VIRTUAL_IP${NC}"
echo -e "${YELLOW}Interface: $INTERFACE${NC}"
echo -e "${YELLOW}Priority: $PRIORITY${NC}"
echo
echo "Next steps:"
echo "1. Configure the other node with opposite node-type"
echo "2. Set up Prometheus node-exporter to read from /var/lib/prometheus/node-exporter/"
echo "3. Configure Prometheus alerts for keepalived state changes"
echo "4. Create Grafana dashboards for visualization"
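Review the script before running. Save it as install.sh and run it as root on each node with the required arguments, for example: sudo bash install.sh --node-type primary --virtual-ip 203.0.113.100/24 --interface eth0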
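The script requires --virtual-ip but writes the value into keepalived.conf without checking its format. A validation step along the following lines could catch typos before keepalived is started. This is a hypothetical helper, not part of the script above, and it assumes the address is always given with a prefix length, as in the examples in this tutorial.

```shell
#!/usr/bin/env bash
# valid_ipv4_cidr VALUE
# Succeeds if VALUE looks like a.b.c.d/n with each octet 0-255 and n 0-32.
valid_ipv4_cidr() {
  local input=$1 addr prefix octet
  local -a octets
  local re='^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$'
  [[ $input =~ $re ]] || return 1
  addr=${input%/*}
  prefix=${input#*/}
  # 10# forces base 10 so values such as 08 are not read as octal.
  (( 10#$prefix <= 32 )) || return 1
  IFS=. read -r -a octets <<< "$addr"
  for octet in "${octets[@]}"; do
    (( 10#$octet <= 255 )) || return 1
  done
}

# Possible use after argument parsing:
#   valid_ipv4_cidr "$VIRTUAL_IP" || { echo "Invalid --virtual-ip: $VIRTUAL_IP" >&2; exit 1; }
```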