Set up multiple Ollama instances behind an HAProxy load balancer for high-availability AI model serving. The setup adds health checks, SSL termination, and automatic failover, eliminating the single point of failure of a one-server deployment.
Prerequisites
- Multiple servers (minimum three: one for HAProxy, two or more for Ollama; the example configuration uses three Ollama backends)
- Root or sudo access
- Basic understanding of HTTP load balancing
- At least 8GB RAM per Ollama instance
What this solves
Running a single Ollama instance creates a single point of failure for AI model serving. When that instance goes down, your entire AI service becomes unavailable. This tutorial shows you how to deploy multiple Ollama instances behind an HAProxy load balancer for high availability, automatic failover, and improved performance through load distribution.
Step-by-step installation
Update system packages
Start by updating the package index and upgrading installed packages on every server.
sudo apt update && sudo apt upgrade -y
Install HAProxy load balancer
HAProxy will distribute requests across multiple Ollama instances and handle health checks for automatic failover.
sudo apt install -y haproxy
Install Ollama on multiple instances
Download and install Ollama on each server that will serve AI models. Run this on each target server.
curl -fsSL https://ollama.com/install.sh | sh
Configure Ollama service binding
Configure each Ollama instance to bind to all interfaces instead of localhost only, so HAProxy can reach it from another server. Create a systemd override file with the binding environment variable.
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf << 'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
EOF
Start Ollama services
Enable and start Ollama on each server instance. Check the status to confirm successful startup.
sudo systemctl daemon-reload
sudo systemctl enable --now ollama
sudo systemctl status ollama
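`systemctl status` confirms the process started, but the HTTP API can take a few seconds longer to accept requests. A small polling helper avoids racing the next steps; the `wait_for_http` name and timings here are our own choices, not part of Ollama.

```shell
#!/usr/bin/env bash
# wait_for_http URL TIMEOUT_SECONDS -- poll an HTTP endpoint until it responds,
# returning 1 if it never comes up within the timeout.
wait_for_http() {
  local url="$1" timeout="${2:-30}" waited=0
  until curl -sf --max-time 2 "$url" > /dev/null 2>&1; do
    sleep 1
    waited=$((waited + 1))
    if [[ $waited -ge $timeout ]]; then
      return 1
    fi
  done
  return 0
}

# Example: wait up to 30s for the local Ollama API before pulling models
# wait_for_http "http://localhost:11434/api/tags" 30 && echo "Ollama is up"
```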
Download AI models on each instance
Pull the same AI models on every Ollama instance so all backends return consistent responses.
ollama pull llama3.2:3b
ollama pull codellama:7b
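Since consistent responses depend on every backend serving the same models, it helps to verify that the model lists actually match. A sketch, assuming `jq` is installed and using this tutorial's example backend IPs; both helper names are our own:

```shell
#!/usr/bin/env bash
# same_models LIST1 LIST2 -- succeed only if two newline-separated model
# lists contain the same entries (order-insensitive)
same_models() {
  [ "$(sort <<< "$1")" = "$(sort <<< "$2")" ]
}

# list_models HOST -- fetch the model names one backend reports
list_models() {
  curl -s "http://$1:11434/api/tags" | jq -r '.models[].name'
}

# Example: flag model drift between two backends
# a=$(list_models 203.0.113.10); b=$(list_models 203.0.113.11)
# same_models "$a" "$b" || echo "model drift between backends"
```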
Configure HAProxy load balancer
Replace the default /etc/haproxy/haproxy.cfg with the following configuration, which defines the Ollama backends, health checks, and the load-balancing algorithm. Substitute the 203.0.113.x example addresses with the IPs of your Ollama servers.
global
log 127.0.0.1:514 local0
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon
defaults
mode http
log global
option httplog
option dontlognull
option http-server-close
option forwardfor
option redispatch
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout http-keep-alive 10s
timeout check 10s
maxconn 3000
frontend ollama_frontend
bind *:8080
bind *:8443 ssl crt /etc/ssl/certs/ollama.pem
redirect scheme https if !{ ssl_fc }
default_backend ollama_backend
backend ollama_backend
balance roundrobin
option httpchk GET /api/tags
http-check expect status 200
server ollama1 203.0.113.10:11434 check inter 30s fall 3 rise 2
server ollama2 203.0.113.11:11434 check inter 30s fall 3 rise 2
server ollama3 203.0.113.12:11434 check inter 30s fall 3 rise 2
listen stats
bind *:8404
stats enable
stats uri /stats
stats refresh 30s
stats admin if TRUE
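One note on the algorithm choice: `roundrobin` spreads requests evenly regardless of how long each takes, but inference requests vary widely in duration, so one backend can pile up slow generations while others sit idle. HAProxy's `leastconn` algorithm routes each new request to the server with the fewest active connections, which often suits this workload better; switching is a one-line change in the backend section:

```haproxy
backend ollama_backend
    # leastconn favors the least-busy server, a better fit for long, uneven requests
    balance leastconn
    option httpchk GET /api/tags
    http-check expect status 200
    server ollama1 203.0.113.10:11434 check inter 30s fall 3 rise 2
    server ollama2 203.0.113.11:11434 check inter 30s fall 3 rise 2
    server ollama3 203.0.113.12:11434 check inter 30s fall 3 rise 2
```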
Generate SSL certificates
Create SSL certificates for secure HTTPS connections to the load balancer. Replace example.com with your actual domain.
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout /etc/ssl/private/ollama.key \
-out /etc/ssl/certs/ollama.crt \
-subj "/C=US/ST=State/L=City/O=Organization/CN=example.com"
sudo cat /etc/ssl/certs/ollama.crt /etc/ssl/private/ollama.key | sudo tee /etc/ssl/certs/ollama.pem > /dev/null
sudo chmod 600 /etc/ssl/certs/ollama.pem
Configure firewall rules
Open necessary ports for HAProxy frontend and Ollama backend communication.
sudo ufw allow 8080/tcp
sudo ufw allow 8443/tcp
sudo ufw allow 8404/tcp
sudo ufw allow 11434/tcp
Start HAProxy service
Enable and start HAProxy to begin load balancing across Ollama instances.
sudo systemctl enable --now haproxy
sudo systemctl status haproxy
Configure health check monitoring
Create two systemd units on each backend server: /etc/systemd/system/ollama-health-check.service runs the health check script, and /etc/systemd/system/ollama-health-check.timer triggers it every minute.
# /etc/systemd/system/ollama-health-check.service
[Unit]
Description=Ollama Health Check Service
After=network.target
[Service]
Type=oneshot
# Runs as root: the script must be able to restart the ollama service
ExecStart=/usr/local/bin/ollama-health-check.sh
# /etc/systemd/system/ollama-health-check.timer
[Unit]
Description=Run Ollama Health Check Every 60 Seconds
Requires=ollama-health-check.service
[Timer]
OnCalendar=*-*-* *:*:00
Persistent=true
[Install]
WantedBy=timers.target
Create health check script
Create the monitoring script at /usr/local/bin/ollama-health-check.sh to check Ollama service health and restart it if needed.
#!/bin/bash
OLLAMA_URL="http://localhost:11434/api/tags"
HEALTH_CHECK_TIMEOUT=10
MAX_RESTART_ATTEMPTS=3
RESTART_COUNT_FILE="/tmp/ollama_restart_count"
# Function to check Ollama health
check_ollama_health() {
curl -s --max-time $HEALTH_CHECK_TIMEOUT "$OLLAMA_URL" > /dev/null 2>&1
return $?
}
# Function to get restart count
get_restart_count() {
if [[ -f "$RESTART_COUNT_FILE" ]]; then
cat "$RESTART_COUNT_FILE"
else
echo 0
fi
}
# Function to increment restart count
increment_restart_count() {
local count=$(get_restart_count)
echo $((count + 1)) > "$RESTART_COUNT_FILE"
}
# Function to reset restart count
reset_restart_count() {
echo 0 > "$RESTART_COUNT_FILE"
}
# Main health check logic
if check_ollama_health; then
echo "$(date): Ollama is healthy"
reset_restart_count
exit 0
else
echo "$(date): Ollama health check failed"
restart_count=$(get_restart_count)
if [[ $restart_count -lt $MAX_RESTART_ATTEMPTS ]]; then
echo "$(date): Attempting to restart Ollama (attempt $((restart_count + 1))/$MAX_RESTART_ATTEMPTS)"
systemctl restart ollama
increment_restart_count
# Wait a moment and check again
sleep 30
if check_ollama_health; then
echo "$(date): Ollama restart successful"
reset_restart_count
exit 0
else
echo "$(date): Ollama restart failed"
exit 1
fi
else
echo "$(date): Maximum restart attempts reached. Manual intervention required."
exit 1
fi
fi
sudo chmod 755 /usr/local/bin/ollama-health-check.sh
sudo systemctl enable --now ollama-health-check.timer
Configure SSL termination and security
Enable SSL security headers
Add security headers to HAProxy configuration for enhanced HTTPS security.
sudo cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.backup
# Add these lines to the frontend ollama_frontend section
frontend ollama_frontend
bind *:8080
bind *:8443 ssl crt /etc/ssl/certs/ollama.pem
redirect scheme https if !{ ssl_fc }
# Security headers
http-response set-header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
http-response set-header X-Frame-Options "DENY"
http-response set-header X-Content-Type-Options "nosniff"
http-response set-header X-XSS-Protection "1; mode=block"
http-response set-header Referrer-Policy "strict-origin-when-cross-origin"
# Rate limiting
stick-table type ip size 100k expire 30s store http_req_rate(10s)
http-request track-sc0 src
http-request reject if { sc_http_req_rate(0) gt 20 }
default_backend ollama_backend
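A caveat on the rate-limit line above: `http-request reject` closes the connection without sending an HTTP response, which clients see as a network error. If you would rather return an explicit status, HAProxy's `deny` action can send 429 Too Many Requests instead; a drop-in variant using the same stick-table and threshold:

```haproxy
    # Send 429 Too Many Requests instead of silently dropping the connection
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 20 }
```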
Configure backend connection limits
Set connection limits and timeouts for better resource management and security.
# Update backend ollama_backend section
backend ollama_backend
balance roundrobin
option httpchk GET /api/tags
http-check expect status 200
# Connection limits
timeout server 60s
timeout connect 5s
server ollama1 203.0.113.10:11434 check inter 30s fall 3 rise 2 maxconn 50
server ollama2 203.0.113.11:11434 check inter 30s fall 3 rise 2 maxconn 50
server ollama3 203.0.113.12:11434 check inter 30s fall 3 rise 2 maxconn 50
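It is worth seeing how these limits interact: with `maxconn 50` on each of three servers, at most 150 requests run concurrently against the pool; anything beyond that waits in HAProxy's queue for up to `timeout queue` (1m in the defaults section) before being rejected. A quick sanity check of the arithmetic:

```shell
#!/usr/bin/env bash
# Effective concurrency of the backend pool (values from the config above)
per_server_maxconn=50
servers=3
pool_capacity=$((per_server_maxconn * servers))
echo "pool capacity: $pool_capacity concurrent requests"
# Requests beyond this number wait in HAProxy's queue for up to
# 'timeout queue' before failing, so size maxconn to your GPU/CPU budget.
```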
Restart HAProxy with new configuration
Apply the updated configuration and verify HAProxy starts successfully.
sudo haproxy -f /etc/haproxy/haproxy.cfg -c
sudo systemctl restart haproxy
sudo systemctl status haproxy
Monitor performance and implement scaling
Set up HAProxy statistics monitoring
Configure detailed statistics collection for monitoring load balancer performance and backend health.
# Update the stats section with authentication
listen stats
bind *:8404
stats enable
stats uri /stats
stats refresh 10s
stats admin if TRUE
stats auth admin:secure_password_here
stats show-legends
stats show-desc "Ollama Load Balancer Statistics"
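The stats page also serves machine-readable output when you append `;csv` to the stats URI, which is easier to script than the HTML page. A sketch that lists unhealthy backends; `down_servers` is our own helper, and the field positions follow HAProxy's CSV layout (pxname is field 1, svname field 2, status field 18):

```shell
#!/usr/bin/env bash
# down_servers -- read HAProxy CSV stats on stdin and report any
# ollama backend server whose status is not UP
down_servers() {
  awk -F',' '$1 == "ollama_backend" && $2 ~ /^ollama/ && $18 != "UP" {
    print $2 " is " $18
  }'
}

# Example: feed the live CSV export into the filter
# curl -su admin:secure_password_here "http://localhost:8404/stats;csv" | down_servers
```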
Create performance monitoring script
Create the script at /usr/local/bin/ollama-performance-monitor.sh to track response times and queue depths.
#!/bin/bash
LOG_FILE="/var/log/ollama-performance.log"
METRICS_FILE="/var/log/ollama-metrics.json"
ALERT_THRESHOLD_MS=5000
EMAIL_ALERT="admin@example.com"
# Function to test response time
test_response_time() {
local url="$1"
local start_time=$(date +%s%N)
# Capture the HTTP status code via -w; previously the code reported curl's
# exit code as the status while discarding the -w output
local http_code=$(curl -s -o /dev/null -w "%{http_code}" --max-time 30 "$url" 2>/dev/null)
local end_time=$(date +%s%N)
local response_time_ms=$(( (end_time - start_time) / 1000000 ))
echo "$response_time_ms,$http_code"
}
# Test each backend directly
backends=("203.0.113.10:11434" "203.0.113.11:11434" "203.0.113.12:11434")
loadbalancer="localhost:8080"
# Test load balancer
lb_result=$(test_response_time "http://$loadbalancer/api/tags")
lb_time=$(echo "$lb_result" | cut -d',' -f1)
lb_status=$(echo "$lb_result" | cut -d',' -f2)
echo "$(date): LoadBalancer response_time=${lb_time}ms status=$lb_status" >> "$LOG_FILE"
# Test individual backends
for i in "${!backends[@]}"; do
backend="${backends[$i]}"
result=$(test_response_time "http://$backend/api/tags")
time=$(echo "$result" | cut -d',' -f1)
status=$(echo "$result" | cut -d',' -f2)
echo "$(date): Backend$((i+1)) ($backend) response_time=${time}ms status=$status" >> "$LOG_FILE"
# Alert on slow responses
if [[ $time -gt $ALERT_THRESHOLD_MS ]]; then
echo "ALERT: Backend$((i+1)) slow response: ${time}ms" | mail -s "Ollama Performance Alert" "$EMAIL_ALERT"
fi
done
# Create JSON metrics for external monitoring
cat > "$METRICS_FILE" << EOF
{
"timestamp": "$(date -Iseconds)",
"loadbalancer": {
"response_time_ms": $lb_time,
"status_code": $lb_status
},
"backends": [
EOF
for i in "${!backends[@]}"; do
backend="${backends[$i]}"
result=$(test_response_time "http://$backend/api/tags")
time=$(echo "$result" | cut -d',' -f1)
status=$(echo "$result" | cut -d',' -f2)
echo " {" >> "$METRICS_FILE"
echo " \"name\": \"backend$((i+1))\"," >> "$METRICS_FILE"
echo " \"address\": \"$backend\"," >> "$METRICS_FILE"
echo " \"response_time_ms\": $time," >> "$METRICS_FILE"
echo " \"status_code\": $status" >> "$METRICS_FILE"
if [[ $i -eq $((${#backends[@]} - 1)) ]]; then
echo " }" >> "$METRICS_FILE"
else
echo " }," >> "$METRICS_FILE"
fi
done
echo " ]" >> "$METRICS_FILE"
echo "}" >> "$METRICS_FILE"
sudo chmod 755 /usr/local/bin/ollama-performance-monitor.sh
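One fragility in the script above: building JSON with `echo` works until a curl call fails and leaves an empty field, producing an invalid document. If `jq` is available, it can assemble each entry safely. A sketch; `backend_json` is a hypothetical helper whose field names mirror the script above:

```shell
#!/usr/bin/env bash
# backend_json NAME ADDRESS RESPONSE_MS STATUS_CODE
# jq --arg always quotes values, so an empty measurement becomes ""
# instead of breaking the JSON document.
backend_json() {
  jq -n --arg name "$1" --arg addr "$2" --arg ms "$3" --arg code "$4" \
    '{name: $name, address: $addr, response_time_ms: $ms, status_code: $code}'
}

# Example:
# backend_json backend1 203.0.113.10:11434 42 200
```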
Configure log rotation
Create /etc/logrotate.d/ollama-monitoring with the following rules to keep the performance and metrics logs from consuming too much disk space.
/var/log/ollama-performance.log {
daily
rotate 30
compress
delaycompress
missingok
notifempty
copytruncate
}
/var/log/ollama-metrics.json {
daily
rotate 7
compress
delaycompress
missingok
notifempty
copytruncate
}
Verify your setup
Test the load balancer functionality and verify all components are working correctly.
# Check HAProxy status
sudo systemctl status haproxy
# Test load balancer endpoint
curl -k https://localhost:8443/api/tags
# Check backend health via HAProxy stats
curl -u admin:secure_password_here http://localhost:8404/stats
# Test model inference through load balancer
curl -k -X POST https://localhost:8443/api/generate -d '{
"model": "llama3.2:3b",
"prompt": "What is load balancing?",
"stream": false
}'
# Check each Ollama instance directly (substitute your backend IPs; requires jq)
for host in 203.0.113.10 203.0.113.11 203.0.113.12; do
echo "Testing direct connection to Ollama on $host:"
curl -s "http://$host:11434/api/tags" | jq '.models[].name'
done
# Verify SSL certificate
openssl s_client -connect localhost:8443 -servername example.com < /dev/null
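The self-signed certificate generated earlier expires after 365 days. A small helper (assuming GNU `date`) turns the `notAfter` date printed by `openssl x509 -enddate` into days remaining, handy for a renewal reminder; `days_until` is our own name:

```shell
#!/usr/bin/env bash
# days_until EXPIRY_DATE -- whole days from now until the given date string
days_until() {
  local end_epoch now_epoch
  end_epoch=$(date -d "$1" +%s)
  now_epoch=$(date +%s)
  echo $(( (end_epoch - now_epoch) / 86400 ))
}

# Example: check the live certificate behind the load balancer
# end=$(echo | openssl s_client -connect localhost:8443 2>/dev/null \
#   | openssl x509 -noout -enddate | cut -d= -f2)
# echo "certificate expires in $(days_until "$end") days"
```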
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| 502 Bad Gateway errors | Ollama instances not responding | Check sudo systemctl status ollama on backend servers |
| SSL certificate errors | Invalid or expired certificates | Regenerate certificates with correct domain: openssl req -x509 -nodes -days 365... |
| Health checks failing | Ollama not binding to 0.0.0.0 | Verify OLLAMA_HOST=0.0.0.0:11434 in systemd override |
| Connection refused on 8080/8443 | HAProxy not started or firewall blocking | sudo systemctl restart haproxy && sudo ufw allow 8080,8443/tcp |
| Models not found errors | Different models on different instances | Run ollama pull model_name on all backend servers |
| High response times | Insufficient resources or overloading | Scale up backend instances or adjust maxconn limits |
Next steps
- Monitor HAProxy with Prometheus and Grafana dashboards for advanced monitoring
- Implement HAProxy rate limiting and DDoS protection with advanced security rules for production security
- Set up Ollama GPU acceleration with Kubernetes cluster for high-performance AI model serving
- Configure Ollama model caching with Redis cluster to improve response times
- Implement Ollama auto-scaling with Kubernetes HPA for dynamic resource allocation
Automated install script
Run the following script to automate the setup on a single host; it installs HAProxy and a local Ollama instance, and additional backend IPs can be passed with -s.
#!/usr/bin/env bash
set -Eeuo pipefail  # -E so the ERR trap below also fires inside functions
# Colors for output
readonly RED='\033[0;31m'
readonly GREEN='\033[0;32m'
readonly YELLOW='\033[1;33m'
readonly NC='\033[0m'
# Usage
usage() {
echo "Usage: $0 [options]"
echo "Options:"
echo " -d DOMAIN Domain name for SSL certificate (default: localhost)"
echo " -s SERVERS Comma-separated list of Ollama server IPs (default: 127.0.0.1)"
echo " -h Show this help"
exit 1
}
# Parse arguments
DOMAIN="localhost"
SERVERS="127.0.0.1"
while getopts "d:s:h" opt; do
case $opt in
d) DOMAIN="$OPTARG" ;;
s) SERVERS="$OPTARG" ;;
h) usage ;;
*) usage ;;
esac
done
# Check root/sudo
if [[ $EUID -ne 0 ]]; then
echo -e "${RED}This script must be run as root or with sudo${NC}"
exit 1
fi
# Detect distribution
if [[ ! -f /etc/os-release ]]; then
echo -e "${RED}/etc/os-release not found. Cannot detect distribution${NC}"
exit 1
fi
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_INSTALL="apt install -y"
PKG_UPDATE="apt update && apt upgrade -y"
FIREWALL_CMD="ufw"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_INSTALL="dnf install -y"
PKG_UPDATE="dnf update -y"
FIREWALL_CMD="firewall-cmd"
;;
amzn)
PKG_MGR="yum"
PKG_INSTALL="yum install -y"
PKG_UPDATE="yum update -y"
FIREWALL_CMD="firewall-cmd"
;;
*)
echo -e "${RED}Unsupported distribution: $ID${NC}"
exit 1
;;
esac
# Cleanup function
cleanup() {
echo -e "${YELLOW}Cleaning up on error...${NC}"
systemctl stop haproxy 2>/dev/null || true
systemctl stop ollama 2>/dev/null || true
}
trap cleanup ERR
echo -e "${GREEN}[1/9] Updating system packages...${NC}"
# PKG_UPDATE contains '&&', so run it through eval rather than word splitting
eval "$PKG_UPDATE"
echo -e "${GREEN}[2/9] Installing HAProxy...${NC}"
$PKG_INSTALL haproxy curl openssl
echo -e "${GREEN}[3/9] Installing Ollama...${NC}"
curl -fsSL https://ollama.com/install.sh | sh
echo -e "${GREEN}[4/9] Configuring Ollama service...${NC}"
mkdir -p /etc/systemd/system/ollama.service.d
cat > /etc/systemd/system/ollama.service.d/override.conf << 'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
EOF
systemctl daemon-reload
systemctl enable ollama
systemctl start ollama
echo -e "${GREEN}[5/9] Downloading AI models...${NC}"
sleep 10
ollama pull llama3.2:3b || echo -e "${YELLOW}Warning: Failed to pull llama3.2:3b${NC}"
echo -e "${GREEN}[6/9] Generating SSL certificates...${NC}"
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout /etc/ssl/private/ollama.key \
-out /etc/ssl/certs/ollama.crt \
-subj "/C=US/ST=State/L=City/O=Organization/CN=$DOMAIN"
chmod 600 /etc/ssl/private/ollama.key
chmod 644 /etc/ssl/certs/ollama.crt
cat /etc/ssl/certs/ollama.crt /etc/ssl/private/ollama.key > /etc/ssl/certs/ollama.pem
chmod 600 /etc/ssl/certs/ollama.pem
chown haproxy:haproxy /etc/ssl/certs/ollama.pem
echo -e "${GREEN}[7/9] Configuring HAProxy...${NC}"
cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.backup
# Generate backend servers configuration
BACKEND_CONFIG=""
COUNTER=1
IFS=',' read -ra SERVER_ARRAY <<< "$SERVERS"
for server in "${SERVER_ARRAY[@]}"; do
server=$(echo "$server" | xargs)
BACKEND_CONFIG+="\n server ollama$COUNTER $server:11434 check inter 30s fall 3 rise 2"
COUNTER=$((COUNTER + 1))
done
cat > /etc/haproxy/haproxy.cfg << EOF
global
log 127.0.0.1:514 local0
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon
defaults
mode http
log global
option httplog
option dontlognull
option http-server-close
option forwardfor
option redispatch
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout http-keep-alive 10s
timeout check 10s
maxconn 3000
frontend ollama_frontend
bind *:8080
bind *:8443 ssl crt /etc/ssl/certs/ollama.pem
redirect scheme https if !{ ssl_fc }
default_backend ollama_backend
backend ollama_backend
balance roundrobin
option httpchk GET /api/tags
http-check expect status 200$(echo -e "$BACKEND_CONFIG")
listen stats
bind *:8404
stats enable
stats uri /stats
stats refresh 30s
stats admin if TRUE
EOF
chmod 644 /etc/haproxy/haproxy.cfg
chown root:root /etc/haproxy/haproxy.cfg
echo -e "${GREEN}[8/9] Configuring firewall...${NC}"
if [[ "$FIREWALL_CMD" == "ufw" ]]; then
ufw allow OpenSSH 2>/dev/null || true  # keep SSH reachable before enabling
ufw --force enable 2>/dev/null || true
ufw allow 8080/tcp
ufw allow 8443/tcp
ufw allow 8404/tcp
ufw allow 11434/tcp
else
systemctl enable firewalld 2>/dev/null || true
systemctl start firewalld 2>/dev/null || true
firewall-cmd --permanent --add-port=8080/tcp
firewall-cmd --permanent --add-port=8443/tcp
firewall-cmd --permanent --add-port=8404/tcp
firewall-cmd --permanent --add-port=11434/tcp
firewall-cmd --reload
fi
echo -e "${GREEN}[9/9] Starting services...${NC}"
systemctl enable haproxy
systemctl start haproxy
echo -e "${GREEN}Verifying installation...${NC}"
sleep 5
if systemctl is-active --quiet ollama; then
echo -e "${GREEN}✓ Ollama is running${NC}"
else
echo -e "${RED}✗ Ollama failed to start${NC}"
exit 1
fi
if systemctl is-active --quiet haproxy; then
echo -e "${GREEN}✓ HAProxy is running${NC}"
else
echo -e "${RED}✗ HAProxy failed to start${NC}"
exit 1
fi
if curl -skf https://localhost:8443/api/tags >/dev/null; then
echo -e "${GREEN}✓ Load balancer is responding${NC}"
else
echo -e "${YELLOW}⚠ Load balancer not responding (may need time to start)${NC}"
fi
echo -e "${GREEN}Installation completed successfully!${NC}"
echo -e "Access points:"
echo -e " HTTP: http://localhost:8080"
echo -e " HTTPS: https://localhost:8443"
echo -e " Stats: http://localhost:8404/stats"
echo -e " Direct Ollama: http://localhost:11434"
Review the script before running. Execute with: sudo bash install.sh