Build a production-ready Grafana Enterprise cluster with PostgreSQL shared storage, HAProxy load balancing, and SSL encryption. Includes automated failover, session persistence, and comprehensive monitoring for enterprise observability platforms.
Prerequisites
- 3+ servers for Grafana instances
- 2+ servers for PostgreSQL cluster
- 1 server for HAProxy load balancer
- Valid Grafana Enterprise license
- SSL certificates for domain
What this solves
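Grafana Enterprise high availability clustering removes the individual Grafana server as a single point of failure for your monitoring infrastructure. Every Grafana instance reads and writes the same PostgreSQL database, so dashboards, users, and login tokens are shared, and HAProxy health checks take a failed instance out of rotation automatically. This gives you failover and horizontal scaling across multiple Grafana instances behind one load-balanced endpoint.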
Step-by-step configuration
Update system packages
Start by updating all servers in your cluster to ensure consistent package versions.
sudo apt update && sudo apt upgrade -y
Install PostgreSQL cluster
Set up a PostgreSQL cluster for shared Grafana data. Install on your designated database servers.
sudo apt install -y postgresql postgresql-contrib postgresql-client
sudo systemctl enable --now postgresql
Configure PostgreSQL for clustering
Enable streaming replication and configure authentication for the Grafana database cluster.
# Primary server configuration
listen_addresses = '*'
wal_level = replica
max_wal_senders = 3
max_replication_slots = 3
archive_mode = on
archive_command = 'cp %p /var/lib/postgresql/16/main/archive/%f'
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
log_checkpoints = on
log_connections = on
log_disconnections = on
Configure PostgreSQL authentication
Set up authentication rules for Grafana connections and replication users.
# Grafana database connections
host grafana grafana_user 203.0.113.10/32 md5
host grafana grafana_user 203.0.113.11/32 md5
host grafana grafana_user 203.0.113.12/32 md5
# Replication connections
host replication replicator 203.0.113.20/32 md5
host replication replicator 203.0.113.21/32 md5
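With more than a handful of nodes, writing one rule per address by hand gets error-prone. The following is a minimal sketch of a helper that prints rules in the same format as above; the function name gen_hba_rules is hypothetical and not part of PostgreSQL.

```shell
# Print one pg_hba.conf "host" rule per client address.
# Usage: gen_hba_rules DATABASE USER METHOD IP...
gen_hba_rules() {
  local db="$1" user="$2" method="$3"
  shift 3
  local ip
  for ip in "$@"; do
    # /32 restricts each rule to a single IPv4 host
    printf 'host %s %s %s/32 %s\n' "$db" "$user" "$ip" "$method"
  done
}

# Example: rules for the three Grafana nodes used in this guide
gen_hba_rules grafana grafana_user md5 203.0.113.10 203.0.113.11 203.0.113.12
```

Review the output before appending it to pg_hba.conf, then reload PostgreSQL so the new rules take effect.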
Create Grafana database and user
Set up the dedicated database and user account for Grafana Enterprise with proper privileges.
sudo -u postgres psql
CREATE USER grafana_user WITH ENCRYPTED PASSWORD 'secure_grafana_password_2024';
CREATE DATABASE grafana OWNER grafana_user;
\q
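The password shown above is a placeholder and should be replaced before use. As a rough sketch, a random value can be generated from /dev/urandom with standard tools; the helper name gen_secret is an assumption made for this example.

```shell
# Generate a random lowercase hex string from N random bytes (default 32).
# Usage: gen_secret [BYTES]
gen_secret() {
  local bytes="${1:-32}"
  # od prints the bytes as hex pairs; tr strips the spaces and newlines
  head -c "$bytes" /dev/urandom | od -An -tx1 | tr -d ' \n'
  printf '\n'
}

# Example: a 64-character value suitable for a database password
gen_secret 32
```

The same helper can produce values for the replication password and the Grafana secret_key used later in this guide.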
Set up PostgreSQL streaming replication
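Configure a standby server that continuously replays WAL from the primary. Streaming replication on its own does not promote the standby: failover stays a manual step (for example SELECT pg_promote(); on the standby) unless you add a cluster manager. Create the replication user on the primary first, then run the remaining commands on your secondary PostgreSQL server.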
# On primary server - create replication user
sudo -u postgres psql
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'replication_password_2024';
\q
# On standby server - create base backup
sudo systemctl stop postgresql
sudo rm -rf /var/lib/postgresql/16/main/*
sudo -u postgres pg_basebackup -h 203.0.113.20 -D /var/lib/postgresql/16/main -U replicator -W -v -P
sudo -u postgres touch /var/lib/postgresql/16/main/standby.signal
Configure standby server
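Add the recovery settings to postgresql.conf on the standby instance (PostgreSQL 12 and later read them from there instead of recovery.conf). The restore_command path must be readable from the standby, for example through a shared mount of the primary's archive directory.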
primary_conninfo = 'host=203.0.113.20 port=5432 user=replicator password=replication_password_2024'
restore_command = 'cp /var/lib/postgresql/16/main/archive/%f %p'
recovery_target_timeline = 'latest'
Install Grafana Enterprise
Install Grafana Enterprise on your application servers. You'll need a valid license key.
sudo wget -q -O /usr/share/keyrings/grafana.key https://apt.grafana.com/gpg.key
echo "deb [signed-by=/usr/share/keyrings/grafana.key] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana-enterprise
Configure Grafana Enterprise for clustering
Set up each Grafana instance with PostgreSQL backend and clustering parameters. Use this config on all Grafana servers.
[server]
http_port = 3000
domain = grafana.example.com
root_url = https://grafana.example.com
serve_from_sub_path = false
[database]
type = postgres
host = 203.0.113.20:5432
name = grafana
user = grafana_user
password = secure_grafana_password_2024
ssl_mode = require
# Keep (max_open_conn x number of Grafana nodes) below PostgreSQL max_connections
max_open_conn = 90
max_idle_conn = 30
conn_max_lifetime = 14400
[auth]
# Grafana 6.0 and later store login tokens in the main database, so the old [session] section is not needed
login_maximum_lifetime_duration = 1d
[security]
admin_user = admin
admin_password = secure_admin_password_2024
secret_key = very_long_random_secret_key_for_clustering_2024
cookie_secure = true
cookie_samesite = strict
strict_transport_security = true
[auth.anonymous]
enabled = false
[log]
mode = file
level = info
format = json
# Legacy [alerting] must not be enabled together with unified alerting
[unified_alerting]
enabled = true
[enterprise]
license_path = /etc/grafana/license.jwt
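Each Grafana node opens up to max_open_conn database connections, so the total across all nodes has to fit within the PostgreSQL max_connections limit. The sketch below shows one way to derive a per-node value; the function name per_node_conns and the headroom of 10 connections (for replication and administrative sessions) are assumptions for illustration.

```shell
# Compute a per-node connection cap from the database-wide limit.
# Usage: per_node_conns MAX_CONNECTIONS RESERVED NODES
per_node_conns() {
  local max="$1" reserved="$2" nodes="$3"
  if [ "$nodes" -le 0 ] || [ "$max" -le "$reserved" ]; then
    echo "invalid input" >&2
    return 1
  fi
  # Integer division rounds down, which keeps the total under the limit
  echo $(( (max - reserved) / nodes ))
}

# Example: max_connections = 300, 10 reserved, 3 Grafana nodes
per_node_conns 300 10 3   # prints 96
```

Re-run the calculation whenever you add Grafana nodes or change max_connections.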
Install and configure HAProxy
Set up HAProxy for load balancing across your Grafana Enterprise instances.
sudo apt install -y haproxy
Configure HAProxy load balancing
Set up sticky sessions and health checks for your Grafana Enterprise cluster.
global
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon
log 127.0.0.1:514 local0
ssl-default-bind-ciphers ECDHE+AESGCM:ECDHE+CHACHA20:DHE+AESGCM:DHE+CHACHA20:!aNULL:!SHA1:!AESCCM
ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11
tune.ssl.default-dh-param 2048
defaults
mode http
log global
option httplog
option dontlognull
option log-health-checks
option forwardfor
option http-server-close
timeout connect 10s
timeout client 300s
timeout server 300s
timeout http-keep-alive 10s
timeout check 10s
maxconn 3000
default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
frontend grafana_frontend
bind *:80
bind *:443 ssl crt /etc/ssl/certs/grafana.example.com.pem
redirect scheme https if !{ ssl_fc }
# Security headers
http-response set-header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
http-response set-header X-Frame-Options "DENY"
http-response set-header X-Content-Type-Options "nosniff"
http-response set-header X-XSS-Protection "1; mode=block"
default_backend grafana_backend
backend grafana_backend
balance roundrobin
cookie GRAFANA_SESSION_ID insert indirect nocache
# Health check
option httpchk GET /api/health
http-check expect status 200
server grafana1 203.0.113.10:3000 check cookie grafana1
server grafana2 203.0.113.11:3000 check cookie grafana2
server grafana3 203.0.113.12:3000 check cookie grafana3
listen stats
bind *:8404
stats enable
stats uri /stats
stats refresh 30s
stats admin if LOCALHOST
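When the cluster grows, the server lines in the backend can be generated instead of typed by hand. This is a small sketch, assuming Grafana serves plain HTTP on the port set by http_port; gen_backend_servers is a hypothetical helper name.

```shell
# Print one HAProxy "server" line per Grafana node, numbered from 1.
# Usage: gen_backend_servers PORT IP...
gen_backend_servers() {
  local port="$1"
  shift
  local i=1 ip
  for ip in "$@"; do
    # The cookie value must be unique per server for sticky sessions
    printf '    server grafana%d %s:%s check cookie grafana%d\n' "$i" "$ip" "$port" "$i"
    i=$((i + 1))
  done
}

# Example: the three nodes used in this guide
gen_backend_servers 3000 203.0.113.10 203.0.113.11 203.0.113.12
```

After editing the configuration, haproxy -c -f /etc/haproxy/haproxy.cfg checks the syntax before you reload the service.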
Generate SSL certificates
Create SSL certificates for encrypted communication. For production, use Let's Encrypt or your CA.
# Generate self-signed certificate for testing
sudo mkdir -p /etc/ssl/certs
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout /etc/ssl/certs/grafana.example.com.key \
-out /etc/ssl/certs/grafana.example.com.crt \
-subj "/C=US/ST=State/L=City/O=Organization/CN=grafana.example.com"
# Combine for HAProxy (the redirect must run as root, so wrap it in sh -c)
sudo sh -c 'cat /etc/ssl/certs/grafana.example.com.crt /etc/ssl/certs/grafana.example.com.key > /etc/ssl/certs/grafana.example.com.pem'
sudo chmod 600 /etc/ssl/certs/grafana.example.com.pem
Configure firewall rules
Open required ports for cluster communication and monitoring access.
# HAProxy
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow from 203.0.113.0/24 to any port 8404 proto tcp
# PostgreSQL
sudo ufw allow from 203.0.113.0/24 to any port 5432
# Grafana (traffic from HAProxy)
sudo ufw allow from 203.0.113.0/24 to any port 3000
Initialize database schema
Run database migrations on one Grafana instance to set up the schema.
# On first Grafana server only
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
# Check logs to ensure successful startup
sudo journalctl -u grafana-server -f
Start all cluster services
Enable and start all services in the correct order across your cluster.
# PostgreSQL servers
sudo systemctl restart postgresql
sudo systemctl status postgresql
# All Grafana servers
sudo systemctl enable --now grafana-server
sudo systemctl status grafana-server
# HAProxy server
sudo systemctl enable --now haproxy
sudo systemctl status haproxy
Install Grafana Enterprise license
Upload your Enterprise license to enable clustering features on all nodes.
# Copy license file to each Grafana server
sudo cp /path/to/license.jwt /etc/grafana/license.jwt
sudo chown grafana:grafana /etc/grafana/license.jwt
sudo chmod 644 /etc/grafana/license.jwt
# Restart to load license
sudo systemctl restart grafana-server
Configure monitoring and alerting
Enable HAProxy statistics
Configure monitoring for your load balancer to track cluster health and performance.
# Access HAProxy stats at http://your-haproxy-server:8404/stats
curl -s "http://203.0.113.100:8404/stats;csv" | head -10
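The stats page can also be read as CSV by appending ;csv to the URI, which makes the per-server state easy to extract in a script. The sketch below reads that CSV on standard input and prints each server with its status; backend_status is a hypothetical helper, and it relies on the status value being the 18th CSV column.

```shell
# Read HAProxy stats CSV on stdin and print "server status" for one backend.
# Usage: backend_status BACKEND_NAME < stats.csv
backend_status() {
  local backend="$1"
  # Column 1 is the proxy name, column 2 the server name, column 18 the status.
  # The FRONTEND and BACKEND summary rows are skipped.
  awk -F, -v be="$backend" '
    $1 == be && $2 != "FRONTEND" && $2 != "BACKEND" { print $2, $18 }
  '
}

# Typical use against the stats listener from this guide (not run here):
#   curl -s "http://203.0.113.100:8404/stats;csv" | backend_status grafana_backend
```

Any server whose status is not UP is a candidate for the troubleshooting steps in the Common issues table.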
Set up database monitoring
Monitor PostgreSQL replication status and connection health.
# Check replication status on primary
sudo -u postgres psql -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"
# Check replication status on standby
sudo -u postgres psql -c "SELECT pg_is_in_recovery(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();"
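The two LSN values returned on the standby can be turned into a replay lag in bytes. An LSN is written as two hexadecimal numbers separated by a slash, where the first is the high 32 bits of the WAL position. The following is a minimal sketch of that arithmetic; the function names are hypothetical.

```shell
# Convert an LSN such as "0/3000148" to an absolute byte position.
lsn_to_bytes() {
  local hi="${1%%/*}" lo="${1##*/}"
  # high part counts units of 2^32 bytes, low part is the offset within that
  echo $(( 0x$hi * 4294967296 + 0x$lo ))
}

# Bytes received by the standby but not yet replayed.
# Usage: replay_lag_bytes RECEIVE_LSN REPLAY_LSN
replay_lag_bytes() {
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

# Example with made-up LSN values
replay_lag_bytes 0/3000148 0/3000100   # prints 72
```

A lag that keeps growing under steady load suggests the standby cannot replay WAL as fast as it receives it.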
Configure log aggregation
Set up centralized logging for troubleshooting cluster issues. This integrates with existing log management systems.
# HAProxy logs
$ModLoad imudp
$UDPServerAddress 127.0.0.1
$UDPServerRun 514
local0.* /var/log/haproxy.log
& stop
# PostgreSQL logs (on the database servers; requires log_destination = 'syslog' in postgresql.conf)
local0.* /var/log/postgresql/cluster.log
Verify your setup
# Check cluster status
curl -k https://grafana.example.com/api/health
curl -s "http://203.0.113.100:8404/stats;csv" | grep -E "grafana[1-3]"
# Test failover
sudo systemctl stop grafana-server # on one node
curl -k https://grafana.example.com/api/health # should still work
# Check database connectivity (run from a Grafana node with postgresql-client installed; prompts for the grafana_user password)
psql "host=203.0.113.20 dbname=grafana user=grafana_user sslmode=require" -c "SELECT count(*) FROM dashboard;"
# Verify session persistence
curl -c cookies.txt -k https://grafana.example.com/login
curl -b cookies.txt -k https://grafana.example.com/api/user
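During a failover test it helps to poll the health endpoint for a short while instead of checking once, because HAProxy needs a few failed checks before it marks a node down. The helper below is a generic sketch of such a polling loop; the name retry and the attempt and delay values in the usage comment are assumptions.

```shell
# Run a command until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds between tries.
# Usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1   # give up after the last allowed attempt
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Typical use after stopping one Grafana node (not run here):
#   retry 10 3 curl -fsk -o /dev/null https://grafana.example.com/api/health
```

The -f flag makes curl return a non-zero exit code on HTTP errors, so the loop keeps trying until the endpoint answers successfully.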
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| 502 Bad Gateway from HAProxy | Grafana instances down or unreachable | Check `sudo systemctl status grafana-server` and firewall rules |
| Database connection errors | PostgreSQL authentication or network issues | Verify pg_hba.conf rules and test with `psql -h host -U user -d grafana` |
| Session not persisting across nodes | Nodes are not using the same database | Check the [database] settings in grafana.ini on every node and test database connectivity |
| License activation failures | Invalid license file or permissions | Verify license file permissions and check `journalctl -u grafana-server` |
| SSL certificate errors | Certificate mismatch or expired | Check certificate validity with `openssl x509 -in cert.pem -text -noout` |
| PostgreSQL replication lag | Network latency or high write load | Monitor with `SELECT * FROM pg_stat_replication;` and tune WAL settings |
Security hardening
Enable database SSL encryption
Configure PostgreSQL SSL for encrypted communication between Grafana and database.
# Generate SSL certificates for PostgreSQL
sudo -u postgres openssl req -new -x509 -days 365 -nodes -text -out /var/lib/postgresql/server.crt -keyout /var/lib/postgresql/server.key -subj "/CN=postgres.example.com"
sudo chmod 600 /var/lib/postgresql/server.key
sudo chmod 644 /var/lib/postgresql/server.crt
Configure authentication security
Implement additional security measures for production deployments.
[auth]
disable_login_form = false
disable_signout_menu = false
signout_redirect_url = https://grafana.example.com/login
oauth_auto_login = false
[auth.basic]
enabled = true
[security]
disable_gravatar = true
cookie_samesite = strict
allow_embedding = false
angular_support_enabled = false
[users]
allow_sign_up = false
allow_org_create = false
auto_assign_org = true
auto_assign_org_role = Viewer
Performance optimization
Optimize PostgreSQL for Grafana workload
Tune database parameters for optimal Grafana Enterprise performance.
# Memory settings
shared_buffers = '512MB'
effective_cache_size = '1GB'
work_mem = '16MB'
maintenance_work_mem = '128MB'
# Connection settings
max_connections = 300
idle_in_transaction_session_timeout = 300000
# Performance settings
random_page_cost = 1.1
effective_io_concurrency = 200
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
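The memory values above match a database host with about 2 GB of RAM: shared_buffers at a quarter of memory, which the PostgreSQL documentation suggests as a starting point, and effective_cache_size at half. The sketch below scales those two settings to a different host; the 50% ratio is inferred from the values in this guide and the function name pg_memory_settings is hypothetical.

```shell
# Print starting values for two memory settings from total RAM in MB.
# Usage: pg_memory_settings TOTAL_RAM_MB
pg_memory_settings() {
  local total_mb="$1"
  # 25% of RAM for shared_buffers, 50% for effective_cache_size
  printf "shared_buffers = '%dMB'\n" $(( total_mb / 4 ))
  printf "effective_cache_size = '%dMB'\n" $(( total_mb / 2 ))
}

# Example: a 2 GB host gives 512MB and 1024MB
pg_memory_settings 2048
```

Treat the result as a starting point and adjust it after observing the actual workload.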
Configure HAProxy performance tuning
Optimize load balancer settings for high-traffic Grafana deployments.
# Add to global section (nbproc was removed in HAProxy 2.5; use threads only)
nbthread 4
cpu-map auto:1/1-4 0-3
ssl-server-verify none
# Add to defaults section
timeout http-request 10s
timeout queue 1m
timeout tarpit 10s
compression algo gzip
compression type text/html text/plain text/css text/javascript application/javascript application/json
Next steps
- Configure Grafana LDAP authentication and role-based access control with Active Directory integration
- Set up Grafana Enterprise SSO authentication with LDAP, SAML, and OAuth2 integration
- Monitor PostgreSQL performance with Prometheus and Grafana dashboards
- Configure HAProxy multi-site SSL termination with SNI for secure load balancing
- Implement Grafana alerting with Prometheus and InfluxDB for comprehensive monitoring
Automated install script
Run this to automate the entire setup
#!/usr/bin/env bash
set -euo pipefail
# Grafana Enterprise HA Cluster Installation Script
# Supports: Ubuntu, Debian, AlmaLinux, Rocky Linux, CentOS, RHEL
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
# Default values
ROLE=""
PRIMARY_DB_IP=""
GRAFANA_DB_PASS=""
REPLICATION_PASS=""
usage() {
echo "Usage: $0 [OPTIONS]"
echo "Options:"
echo " --role [primary-db|standby-db|grafana] Required: Server role"
echo " --primary-db-ip IP Primary DB IP (required for standby-db/grafana)"
echo " --grafana-db-pass PASSWORD Grafana database password"
echo " --replication-pass PASSWORD Replication user password"
echo ""
echo "Examples:"
echo " $0 --role primary-db --grafana-db-pass mypass --replication-pass replpass"
echo " $0 --role standby-db --primary-db-ip 10.0.1.10 --replication-pass replpass"
echo " $0 --role grafana --primary-db-ip 10.0.1.10 --grafana-db-pass mypass"
exit 1
}
log_info() { echo -e "${GREEN}[INFO]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }
cleanup() {
log_error "Installation failed. Check logs above for details."
exit 1
}
trap cleanup ERR
# Parse arguments
while [[ $# -gt 0 ]]; do
case $1 in
--role) ROLE="$2"; shift 2 ;;
--primary-db-ip) PRIMARY_DB_IP="$2"; shift 2 ;;
--grafana-db-pass) GRAFANA_DB_PASS="$2"; shift 2 ;;
--replication-pass) REPLICATION_PASS="$2"; shift 2 ;;
-h|--help) usage ;;
*) log_error "Unknown option: $1"; usage ;;
esac
done
[[ -z "$ROLE" ]] && { log_error "Role is required"; usage; }
# Detect distro and package manager
if [[ ! -f /etc/os-release ]]; then
log_error "Cannot detect OS distribution"
exit 1
fi
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_UPDATE="apt update"
PKG_INSTALL="apt install -y"
PG_VERSION="${PG_VERSION:-16}" # override to match the installed PostgreSQL major version
PG_DATA_DIR="/var/lib/postgresql/$PG_VERSION/main"
PG_CONFIG_DIR="/etc/postgresql/$PG_VERSION/main"
PG_USER="postgres"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_UPDATE="dnf update -y"
PKG_INSTALL="dnf install -y"
PG_VERSION=""
PG_DATA_DIR="/var/lib/pgsql/data"
PG_CONFIG_DIR="/var/lib/pgsql/data"
PG_USER="postgres"
;;
amzn)
PKG_MGR="yum"
PKG_UPDATE="yum update -y"
PKG_INSTALL="yum install -y"
PG_DATA_DIR="/var/lib/pgsql/data"
PG_CONFIG_DIR="/var/lib/pgsql/data"
PG_USER="postgres"
;;
*)
log_error "Unsupported distribution: $ID"
exit 1
;;
esac
# Check prerequisites
if [[ $EUID -ne 0 ]]; then
log_error "This script must be run as root"
exit 1
fi
install_postgresql() {
echo "[1/6] Installing PostgreSQL..."
$PKG_UPDATE
if [[ "$ID" =~ ^(ubuntu|debian)$ ]]; then
$PKG_INSTALL postgresql postgresql-contrib postgresql-client
else
$PKG_INSTALL postgresql-server postgresql-contrib postgresql
postgresql-setup --initdb || true
fi
systemctl enable --now postgresql
log_info "PostgreSQL installed and started"
}
configure_primary_db() {
echo "[2/6] Configuring primary PostgreSQL..."
# Backup original config
cp "$PG_CONFIG_DIR/postgresql.conf" "$PG_CONFIG_DIR/postgresql.conf.backup"
# Configure PostgreSQL for replication
cat >> "$PG_CONFIG_DIR/postgresql.conf" << EOF
# Grafana HA Configuration
listen_addresses = '*'
wal_level = replica
max_wal_senders = 3
max_replication_slots = 3
archive_mode = on
archive_command = 'cp %p $PG_DATA_DIR/archive/%f'
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
log_checkpoints = on
log_connections = on
log_disconnections = on
EOF
# Create archive directory
mkdir -p "$PG_DATA_DIR/archive"
chown "$PG_USER:$PG_USER" "$PG_DATA_DIR/archive"
chmod 750 "$PG_DATA_DIR/archive"
systemctl restart postgresql
log_info "Primary PostgreSQL configured"
}
configure_pg_hba() {
echo "[3/6] Configuring PostgreSQL authentication..."
# Backup original pg_hba.conf
cp "$PG_CONFIG_DIR/pg_hba.conf" "$PG_CONFIG_DIR/pg_hba.conf.backup"
# Add Grafana and replication access
cat >> "$PG_CONFIG_DIR/pg_hba.conf" << EOF
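# Grafana connections (0.0.0.0/0 accepts any source address; narrow this to your Grafana subnet)
host grafana grafana_user 0.0.0.0/0 md5
# Replication connections (narrow this to the standby's address)
host replication replicator 0.0.0.0/0 md5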
EOF
systemctl reload postgresql
log_info "PostgreSQL authentication configured"
}
create_grafana_database() {
echo "[4/6] Creating Grafana database and user..."
[[ -z "$GRAFANA_DB_PASS" ]] && { log_error "Grafana database password required"; exit 1; }
sudo -u "$PG_USER" psql << EOF
CREATE USER grafana_user WITH ENCRYPTED PASSWORD '$GRAFANA_DB_PASS';
CREATE DATABASE grafana OWNER grafana_user;
EOF
log_info "Grafana database created"
}
create_replication_user() {
echo "[5/6] Creating replication user..."
[[ -z "$REPLICATION_PASS" ]] && { log_error "Replication password required"; exit 1; }
sudo -u "$PG_USER" psql << EOF
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD '$REPLICATION_PASS';
EOF
log_info "Replication user created"
}
setup_standby_db() {
echo "[1/4] Setting up PostgreSQL standby..."
[[ -z "$PRIMARY_DB_IP" ]] && { log_error "Primary DB IP required for standby"; exit 1; }
[[ -z "$REPLICATION_PASS" ]] && { log_error "Replication password required"; exit 1; }
install_postgresql
systemctl stop postgresql
# Remove existing data
rm -rf "$PG_DATA_DIR"/*
# Create base backup
sudo -u "$PG_USER" PGPASSWORD="$REPLICATION_PASS" pg_basebackup -h "$PRIMARY_DB_IP" -D "$PG_DATA_DIR" -U replicator -v -P -w
# Create standby signal
sudo -u "$PG_USER" touch "$PG_DATA_DIR/standby.signal"
echo "[2/4] Configuring standby recovery..."
cat > "$PG_DATA_DIR/postgresql.auto.conf" << EOF
primary_conninfo = 'host=$PRIMARY_DB_IP port=5432 user=replicator password=$REPLICATION_PASS'
restore_command = 'cp $PG_DATA_DIR/archive/%f %p'
recovery_target_timeline = 'latest'
EOF
chown "$PG_USER:$PG_USER" "$PG_DATA_DIR/postgresql.auto.conf"
chmod 600 "$PG_DATA_DIR/postgresql.auto.conf"
systemctl start postgresql
log_info "PostgreSQL standby configured"
}
install_grafana() {
echo "[1/3] Installing Grafana Enterprise..."
[[ -z "$PRIMARY_DB_IP" ]] && { log_error "Primary DB IP required for Grafana"; exit 1; }
[[ -z "$GRAFANA_DB_PASS" ]] && { log_error "Grafana database password required"; exit 1; }
if [[ "$ID" =~ ^(ubuntu|debian)$ ]]; then
wget -q -O /usr/share/keyrings/grafana.key https://apt.grafana.com/gpg.key
echo "deb [signed-by=/usr/share/keyrings/grafana.key] https://apt.grafana.com stable main" > /etc/apt/sources.list.d/grafana.list
$PKG_UPDATE
$PKG_INSTALL grafana-enterprise
else
cat > /etc/yum.repos.d/grafana.repo << EOF
[grafana]
name=grafana
baseurl=https://rpm.grafana.com
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
EOF
$PKG_INSTALL grafana-enterprise
fi
echo "[2/3] Configuring Grafana for PostgreSQL..."
cat > /etc/grafana/grafana.ini << EOF
[database]
type = postgres
host = $PRIMARY_DB_IP:5432
name = grafana
user = grafana_user
password = $GRAFANA_DB_PASS
[security]
admin_user = admin
admin_password = admin
[server]
http_port = 3000
EOF
chown grafana:grafana /etc/grafana/grafana.ini
chmod 640 /etc/grafana/grafana.ini
systemctl enable --now grafana-server
log_info "Grafana Enterprise installed and configured"
}
case "$ROLE" in
primary-db)
install_postgresql
configure_primary_db
configure_pg_hba
create_grafana_database
create_replication_user
echo "[6/6] Opening firewall for PostgreSQL..."
if command -v firewall-cmd &> /dev/null; then
firewall-cmd --permanent --add-port=5432/tcp
firewall-cmd --reload
elif command -v ufw &> /dev/null; then
ufw allow 5432/tcp
fi
log_info "Primary PostgreSQL database setup complete"
;;
standby-db)
setup_standby_db
echo "[3/4] Opening firewall for PostgreSQL..."
if command -v firewall-cmd &> /dev/null; then
firewall-cmd --permanent --add-port=5432/tcp
firewall-cmd --reload
elif command -v ufw &> /dev/null; then
ufw allow 5432/tcp
fi
echo "[4/4] Verifying replication status..."
sudo -u "$PG_USER" psql -c "SELECT pg_is_in_recovery();" || log_warn "Could not verify replication status"
log_info "Standby PostgreSQL database setup complete"
;;
grafana)
install_grafana
echo "[3/3] Opening firewall for Grafana..."
if command -v firewall-cmd &> /dev/null; then
firewall-cmd --permanent --add-port=3000/tcp
firewall-cmd --reload
elif command -v ufw &> /dev/null; then
ufw allow 3000/tcp
fi
log_info "Grafana Enterprise setup complete"
log_info "Access Grafana at http://$(hostname -I | awk '{print $1}'):3000"
;;
*)
log_error "Invalid role: $ROLE"
usage
;;
esac
log_info "Installation completed successfully!"
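Review the script before running. Execute it as root and pass a role, for example: sudo bash install.sh --role primary-db --grafana-db-pass mypass --replication-pass replpass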