Configure Apache Airflow high availability with CeleryExecutor and Redis clustering for production deployments

Advanced · 45 min · Apr 23, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up Apache Airflow with CeleryExecutor and Redis clustering for high availability production deployments. Configure multiple workers, load balancing, monitoring, and automated failover to handle enterprise-scale workflow orchestration with zero downtime.

Prerequisites

  • Root or sudo access
  • At least 4GB RAM
  • Python 3.8+ installed
  • PostgreSQL server access
  • Multiple server instances for full HA

What this solves

Apache Airflow with CeleryExecutor and Redis clustering provides high availability for production workflow orchestration. This configuration eliminates single points of failure by distributing task execution across multiple workers with automatic failover and load balancing. You'll set up a scalable Airflow deployment that can handle enterprise workloads with zero downtime.

Step-by-step configuration

Update system packages

Update your package manager to ensure you have the latest security patches and packages.

# Debian / Ubuntu
sudo apt update && sudo apt upgrade -y
sudo apt install -y python3-pip python3-venv redis-server postgresql postgresql-contrib nginx

# RHEL family (AlmaLinux / Rocky Linux)
sudo dnf update -y
sudo dnf install -y python3-pip python3-virtualenv redis postgresql-server postgresql-contrib nginx

Configure Redis cluster for high availability

Set up a Redis cluster with multiple nodes for highly available message brokering. Here the three Redis instances run on different ports of one host for demonstration; for real fault tolerance, spread the nodes across the separate servers listed in the prerequisites.

sudo mkdir -p /etc/redis/cluster/{7000,7001,7002}
sudo mkdir -p /var/lib/redis/{7000,7001,7002}
sudo mkdir -p /var/log/redis

Create Redis cluster configuration files

Configure each Redis node with cluster settings and persistence. Each node runs on a different port for distribution.

# /etc/redis/cluster/7000/redis.conf
port 7000
cluster-enabled yes
cluster-config-file nodes-7000.conf
cluster-node-timeout 15000
appendonly yes
appendfilename "appendonly-7000.aof"
dir /var/lib/redis/7000
logfile /var/log/redis/redis-7000.log
bind 127.0.0.1 203.0.113.10
protected-mode yes
requirepass airflow-redis-pass
masterauth airflow-redis-pass
maxmemory 256mb
maxmemory-policy allkeys-lru

Create additional Redis node configurations

Set up the remaining Redis cluster nodes with unique ports and directories.

# /etc/redis/cluster/7001/redis.conf
port 7001
cluster-enabled yes
cluster-config-file nodes-7001.conf
cluster-node-timeout 15000
appendonly yes
appendfilename "appendonly-7001.aof"
dir /var/lib/redis/7001
logfile /var/log/redis/redis-7001.log
bind 127.0.0.1 203.0.113.10
protected-mode yes
requirepass airflow-redis-pass
masterauth airflow-redis-pass
maxmemory 256mb
maxmemory-policy allkeys-lru

# /etc/redis/cluster/7002/redis.conf
port 7002
cluster-enabled yes
cluster-config-file nodes-7002.conf
cluster-node-timeout 15000
appendonly yes
appendfilename "appendonly-7002.aof"
dir /var/lib/redis/7002
logfile /var/log/redis/redis-7002.log
bind 127.0.0.1 203.0.113.10
protected-mode yes
requirepass airflow-redis-pass
masterauth airflow-redis-pass
maxmemory 256mb
maxmemory-policy allkeys-lru
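The three node configurations differ only in the port number, so on a real host you may prefer to generate them from a single template. A hedged sketch (`gen_redis_conf` is a helper name invented here; the paths follow the layout created above):

```shell
# gen_redis_conf BASE PORT PASS — write one cluster node's redis.conf under BASE/PORT/
gen_redis_conf() {
  base="$1"; port="$2"; pass="$3"
  mkdir -p "$base/$port"
  cat > "$base/$port/redis.conf" <<EOF
port $port
cluster-enabled yes
cluster-config-file nodes-$port.conf
cluster-node-timeout 15000
appendonly yes
appendfilename "appendonly-$port.aof"
dir /var/lib/redis/$port
logfile /var/log/redis/redis-$port.log
bind 127.0.0.1 203.0.113.10
protected-mode yes
requirepass $pass
masterauth $pass
maxmemory 256mb
maxmemory-policy allkeys-lru
EOF
}

# On the Redis host (as root):
# for p in 7000 7001 7002; do gen_redis_conf /etc/redis/cluster "$p" airflow-redis-pass; done
```

This keeps the shared settings in one place, so a change such as a new password or memory limit only has to be made once.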

Set Redis directory permissions and ownership

Configure proper ownership and permissions for Redis directories. The redis user needs read/write access to data and log directories.

sudo chown -R redis:redis /var/lib/redis/
sudo chown -R redis:redis /var/log/redis/
sudo chown -R redis:redis /etc/redis/cluster/
sudo chmod 755 /var/lib/redis/{7000,7001,7002}
sudo chmod 644 /etc/redis/cluster/*/redis.conf
Never use chmod 777. It gives every user on the system full access to your files. Instead, fix ownership with chown and use minimal permissions.

Create Redis cluster systemd services

Create systemd service files for each Redis cluster node to manage them independently.

# /etc/systemd/system/redis-cluster-7000.service
[Unit]
Description=Redis Cluster Node 7000
After=network.target

[Service]
Type=simple
User=redis
Group=redis
ExecStart=/usr/bin/redis-server /etc/redis/cluster/7000/redis.conf
TimeoutStopSec=0
Restart=always

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/redis-cluster-7001.service
[Unit]
Description=Redis Cluster Node 7001
After=network.target

[Service]
Type=simple
User=redis
Group=redis
ExecStart=/usr/bin/redis-server /etc/redis/cluster/7001/redis.conf
TimeoutStopSec=0
Restart=always

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/redis-cluster-7002.service
[Unit]
Description=Redis Cluster Node 7002
After=network.target

[Service]
Type=simple
User=redis
Group=redis
ExecStart=/usr/bin/redis-server /etc/redis/cluster/7002/redis.conf
TimeoutStopSec=0
Restart=always

[Install]
WantedBy=multi-user.target

Note: Type=simple is used because these configs do not set daemonize yes, so redis-server stays in the foreground; Type=forking would make systemd wait forever for a daemonized child. Redis also has no config-reload signal, so no ExecReload is defined.

Start Redis cluster nodes

Enable and start all Redis cluster nodes. Reload systemd to recognize the new service files.

sudo systemctl daemon-reload
sudo systemctl enable --now redis-cluster-7000 redis-cluster-7001 redis-cluster-7002
sudo systemctl status redis-cluster-7000 redis-cluster-7001 redis-cluster-7002

Initialize Redis cluster

Create the Redis cluster by joining all nodes together. This establishes the cluster topology and enables automatic sharding. Note that --cluster-replicas 0 creates three masters with no replicas; for automatic failover, add replica nodes (for example, six nodes with --cluster-replicas 1).

redis-cli -a airflow-redis-pass --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 --cluster-replicas 0
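Sharding works by hash slot: Redis Cluster computes CRC16(key) mod 16384 and assigns each master a range of the 16384 slots. A minimal stdlib sketch of the slot calculation (standard CRC-16/XMODEM as used by Redis; the hash-tag rule for keys containing {braces} is omitted for brevity):

```python
def crc16(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, init 0, MSB-first, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: bytes) -> int:
    """Redis Cluster slot for a key (ignoring {hash tags})."""
    return crc16(key) % 16384

print(hash_slot(b"airflow-task-queue"))
```

With three masters and no replicas, each node owns roughly a third of the slot range; losing a node makes its slots unavailable, which is why replicas matter for failover.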

Configure PostgreSQL for Airflow metadata

Set up PostgreSQL database for Airflow metadata storage. Initialize the database and create dedicated user and database.

# Debian / Ubuntu
sudo systemctl enable --now postgresql

# RHEL family: initialize the database cluster first
sudo postgresql-setup --initdb
sudo systemctl enable --now postgresql

# Both: create the Airflow role and database
sudo -u postgres psql -c "CREATE USER airflow WITH PASSWORD 'airflow-db-pass';"
sudo -u postgres psql -c "CREATE DATABASE airflow OWNER airflow;"
sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;"

Create Airflow user and directories

Create dedicated system user for Airflow and set up directory structure with proper permissions.

sudo useradd -r -m -s /bin/bash airflow
sudo mkdir -p /opt/airflow/{dags,logs,plugins,config}
sudo mkdir -p /var/log/airflow
sudo chown -R airflow:airflow /opt/airflow
sudo chown -R airflow:airflow /var/log/airflow
sudo chmod 755 /opt/airflow/{dags,logs,plugins,config}

Install Airflow with CeleryExecutor

Install Apache Airflow with Celery and Redis dependencies in a Python virtual environment.

sudo -u airflow python3 -m venv /opt/airflow/venv
sudo -u airflow /opt/airflow/venv/bin/pip install --upgrade pip
sudo -u airflow /opt/airflow/venv/bin/pip install "apache-airflow[celery,redis,postgres]==2.8.1" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.8.1/constraints-3.11.txt"
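The constraint URL pairs the Airflow version with the Python version of your virtualenv; keeping that pairing explicit avoids mismatched upgrades. A small helper sketch (the function name is invented here):

```python
def constraints_url(airflow_version: str, python_version: str) -> str:
    """Build the pip constraint-file URL Airflow publishes per version pair."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )

# The pairing used in the install command above:
print(constraints_url("2.8.1", "3.11"))
```

If you later move to a different Python minor version, regenerate the URL rather than reusing the old constraint file.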

Configure Airflow with CeleryExecutor

Create Airflow configuration file with CeleryExecutor, Redis clustering, and PostgreSQL backend settings.

# /opt/airflow/config/airflow.cfg
[core]
dags_folder = /opt/airflow/dags
base_log_folder = /var/log/airflow
remote_logging = False
logging_level = INFO
fab_logging_level = WARN
executor = CeleryExecutor
parallelism = 32
max_active_tasks_per_dag = 16
max_active_runs_per_dag = 16
default_task_retries = 3
default_timezone = utc

[database]
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow-db-pass@localhost:5432/airflow
sql_alchemy_pool_size = 10
sql_alchemy_pool_recycle = 3600

[celery]
broker_url = redis://:airflow-redis-pass@127.0.0.1:7000/0
result_backend = redis://:airflow-redis-pass@127.0.0.1:7000/0
worker_concurrency = 4
worker_log_server_port = 8793
worker_enable_remote_control = True
worker_send_task_events = True
task_send_sent_event = True
operation_timeout = 10.0
visibility_timeout = 21600

[webserver]
base_url = http://localhost:8080
web_server_host = 127.0.0.1
web_server_port = 8080
web_server_worker_timeout = 120
workers = 4
# Shell substitution does not expand inside a config file.
# Generate a value once with: openssl rand -hex 32, then paste it here.
secret_key = replace-with-random-hex

[scheduler]
scheduler_heartbeat_sec = 5
max_tis_per_query = 512
processor_poll_interval = 1
min_file_process_interval = 30
dag_dir_list_interval = 300
catchup_by_default = False
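Since airflow.cfg is INI-format, a quick sanity check with Python's stdlib configparser can catch typos in the HA-critical settings before you restart services. A hedged sketch, shown against an inline sample of the config above (point it at /opt/airflow/config/airflow.cfg in real use):

```python
import configparser
import io

SAMPLE = """
[core]
executor = CeleryExecutor

[database]
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow-db-pass@localhost:5432/airflow

[celery]
broker_url = redis://:airflow-redis-pass@127.0.0.1:7000/0
"""

def check_cfg(fp) -> dict:
    """Parse an airflow.cfg stream and return the HA-critical settings.

    interpolation=None avoids errors on literal '%' characters that can
    appear in generated airflow.cfg values.
    """
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_file(fp)
    return {
        "executor": cfg.get("core", "executor"),
        "db": cfg.get("database", "sql_alchemy_conn"),
        "broker": cfg.get("celery", "broker_url"),
    }

print(check_cfg(io.StringIO(SAMPLE)))
```

Usage against the real file: `check_cfg(open("/opt/airflow/config/airflow.cfg"))`.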

Set Airflow configuration file ownership

Ensure the Airflow user owns the configuration file and set secure permissions.

sudo chown airflow:airflow /opt/airflow/config/airflow.cfg
sudo chmod 640 /opt/airflow/config/airflow.cfg

Initialize Airflow database

Initialize the Airflow metadata database and create an admin user for the web interface.

sudo -u airflow AIRFLOW_CONFIG=/opt/airflow/config/airflow.cfg /opt/airflow/venv/bin/airflow db migrate
sudo -u airflow AIRFLOW_CONFIG=/opt/airflow/config/airflow.cfg /opt/airflow/venv/bin/airflow users create \
  --username admin \
  --firstname Admin \
  --lastname User \
  --role Admin \
  --email admin@example.com \
  --password secure-admin-password

Create systemd service for Airflow webserver

Create systemd service file for the Airflow webserver component with proper environment configuration.

# /etc/systemd/system/airflow-webserver.service
[Unit]
Description=Airflow Webserver
After=network.target redis-cluster-7000.service redis-cluster-7001.service redis-cluster-7002.service postgresql.service
Wants=redis-cluster-7000.service redis-cluster-7001.service redis-cluster-7002.service postgresql.service

[Service]
Environment="AIRFLOW_CONFIG=/opt/airflow/config/airflow.cfg"
Environment="AIRFLOW_HOME=/opt/airflow"
User=airflow
Group=airflow
Type=simple
ExecStart=/opt/airflow/venv/bin/airflow webserver --port 8080 --workers 4
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=mixed
TimeoutStopSec=24
Restart=on-failure
RestartSec=5
PrivateTmp=true
WorkingDirectory=/opt/airflow

[Install]
WantedBy=multi-user.target

Create systemd service for Airflow scheduler

Create systemd service for the Airflow scheduler that manages DAG execution and task scheduling.

# /etc/systemd/system/airflow-scheduler.service
[Unit]
Description=Airflow Scheduler
After=network.target redis-cluster-7000.service redis-cluster-7001.service redis-cluster-7002.service postgresql.service
Wants=redis-cluster-7000.service redis-cluster-7001.service redis-cluster-7002.service postgresql.service

[Service]
Environment="AIRFLOW_CONFIG=/opt/airflow/config/airflow.cfg"
Environment="AIRFLOW_HOME=/opt/airflow"
User=airflow
Group=airflow
Type=simple
ExecStart=/opt/airflow/venv/bin/airflow scheduler
Restart=on-failure
RestartSec=5
PrivateTmp=true
WorkingDirectory=/opt/airflow

[Install]
WantedBy=multi-user.target

Create systemd service for Celery workers

Create systemd service for Celery workers that execute Airflow tasks across multiple processes.

# /etc/systemd/system/airflow-worker.service
[Unit]
Description=Airflow Celery Worker
After=network.target redis-cluster-7000.service redis-cluster-7001.service redis-cluster-7002.service postgresql.service
Wants=redis-cluster-7000.service redis-cluster-7001.service redis-cluster-7002.service postgresql.service

[Service]
Environment="AIRFLOW_CONFIG=/opt/airflow/config/airflow.cfg"
Environment="AIRFLOW_HOME=/opt/airflow"
User=airflow
Group=airflow
Type=simple
ExecStart=/opt/airflow/venv/bin/airflow celery worker --concurrency 4
Restart=on-failure
RestartSec=5
KillMode=mixed
TimeoutStopSec=10
PrivateTmp=true
WorkingDirectory=/opt/airflow

[Install]
WantedBy=multi-user.target

Create systemd service for Celery flower monitoring

Create systemd service for Flower, which provides a web interface to monitor Celery workers and tasks.

# /etc/systemd/system/airflow-flower.service
[Unit]
Description=Airflow Celery Flower
After=network.target redis-cluster-7000.service redis-cluster-7001.service redis-cluster-7002.service
Wants=redis-cluster-7000.service redis-cluster-7001.service redis-cluster-7002.service

[Service]
Environment="AIRFLOW_CONFIG=/opt/airflow/config/airflow.cfg"
Environment="AIRFLOW_HOME=/opt/airflow"
User=airflow
Group=airflow
Type=simple
ExecStart=/opt/airflow/venv/bin/airflow celery flower --port 5555
Restart=on-failure
RestartSec=5
PrivateTmp=true
WorkingDirectory=/opt/airflow

[Install]
WantedBy=multi-user.target

Configure NGINX reverse proxy with load balancing

Set up NGINX as a reverse proxy with load balancing for multiple Airflow webserver instances.

# /etc/nginx/sites-available/airflow
upstream airflow_webserver {
    server 127.0.0.1:8080 max_fails=3 fail_timeout=30s;
    # backup assumes a second webserver instance listening on 8081
    server 127.0.0.1:8081 max_fails=3 fail_timeout=30s backup;
}

upstream flower_monitoring {
    server 127.0.0.1:5555 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;
    server_name airflow.example.com;
    client_max_body_size 16M;
    
    location / {
        proxy_pass http://airflow_webserver;
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_buffering off;
        proxy_request_buffering off;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 86400;
    }
}

server {
    listen 80;
    server_name flower.example.com;
    
    location / {
        proxy_pass http://flower_monitoring;
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

Enable NGINX configuration and start services

Enable the NGINX configuration and start all Airflow services with proper dependencies.

# Debian/Ubuntu (on RHEL-family systems, copy the file to /etc/nginx/conf.d/airflow.conf instead)
sudo ln -s /etc/nginx/sites-available/airflow /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl restart nginx
sudo systemctl daemon-reload
sudo systemctl enable --now airflow-webserver airflow-scheduler airflow-worker airflow-flower
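systemctl returns before the webserver is actually accepting requests, so a first curl check can race startup. A small wait-for-healthy helper sketch (`retry_until` is a name invented here; the URL and timings are assumptions to adapt):

```shell
# retry_until ATTEMPTS DELAY CMD...  — run CMD until it succeeds or attempts run out.
retry_until() {
  attempts="$1"; delay="$2"; shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then return 0; fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example: wait up to ~60s for the webserver health endpoint.
# retry_until 12 5 curl -sf http://localhost:8080/health > /dev/null
```

The same helper works for Flower on port 5555 or any other slow-starting service.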

Configure firewall rules

Open the necessary ports for the Airflow web interface, Flower monitoring, and Redis cluster communication. Restrict the Redis cluster ports to trusted cluster hosts only; never expose Redis to the public internet.

# Debian / Ubuntu (ufw)
sudo ufw allow 80/tcp comment 'NGINX HTTP'
sudo ufw allow 8080/tcp comment 'Airflow Webserver'
sudo ufw allow 5555/tcp comment 'Celery Flower'
sudo ufw allow 7000:7002/tcp comment 'Redis Cluster'
sudo ufw reload

# RHEL family (firewalld)
sudo firewall-cmd --permanent --add-port=80/tcp
sudo firewall-cmd --permanent --add-port=8080/tcp
sudo firewall-cmd --permanent --add-port=5555/tcp
sudo firewall-cmd --permanent --add-port=7000-7002/tcp
sudo firewall-cmd --reload

Create sample DAG for testing

Create a test DAG to verify the high availability setup and task distribution across workers.

# /opt/airflow/dags/test_ha_dag.py
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.bash import BashOperator
import socket
import time

def get_worker_info(**context):
    hostname = socket.gethostname()
    task_instance = context['task_instance']
    print(f"Task {task_instance.task_id} running on worker: {hostname}")
    time.sleep(10)  # Simulate work
    return f"Completed on {hostname}"

default_args = {
    'owner': 'airflow-admin',
    'depends_on_past': False,
    'start_date': datetime(2024, 1, 1),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 2,
    'retry_delay': timedelta(minutes=1),
}

dag = DAG(
    'test_ha_celery_workers',
    default_args=default_args,
    description='Test DAG for HA Celery worker distribution',
    schedule_interval='@hourly',
    catchup=False,
    max_active_runs=3,
    max_active_tasks=8,
)

# Create multiple parallel tasks to test worker distribution
for i in range(8):
    task = PythonOperator(
        task_id=f'worker_test_{i}',
        python_callable=get_worker_info,
        dag=dag,
    )

# Add a system info task
system_info = BashOperator(
    task_id='system_info',
    bash_command='echo "System: $(uname -a)" && echo "Redis cluster status:" && redis-cli -p 7000 -a airflow-redis-pass cluster nodes',
    dag=dag,
)

Set DAG file ownership

Ensure the Airflow user owns the DAG file with proper read permissions.

sudo chown airflow:airflow /opt/airflow/dags/test_ha_dag.py
sudo chmod 644 /opt/airflow/dags/test_ha_dag.py

Configure monitoring and alerting

Install monitoring dependencies

Install Prometheus exporters and monitoring tools for comprehensive Airflow cluster monitoring.

# Debian / Ubuntu
sudo apt install -y prometheus-node-exporter prometheus-redis-exporter

# RHEL family: install node_exporter from the prebuilt release binary
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xzf node_exporter-1.7.0.linux-amd64.tar.gz
sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/

# Both: Python monitoring libraries in the Airflow virtualenv
sudo -u airflow /opt/airflow/venv/bin/pip install prometheus-client psutil

Configure Airflow metrics collection

Enable StatsD metrics emission in the Airflow configuration to monitor DAG performance and task execution. Airflow core has no built-in Prometheus endpoint; pair these settings with a statsd_exporter that translates StatsD datagrams into metrics Prometheus can scrape.

sudo -u airflow tee -a /opt/airflow/config/airflow.cfg << 'EOF'

[metrics]
statsd_on = True
statsd_host = localhost
statsd_port = 9125
statsd_prefix = airflow
EOF
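With statsd_on enabled, Airflow emits plain StatsD lines over UDP ("name:value|type"). The wire format is simple enough to verify with a stdlib-only sketch (the metric name below is a made-up example, not one Airflow emits):

```python
import socket

def send_statsd(name: str, value: int, metric_type: str = "c",
                host: str = "localhost", port: int = 9125) -> bytes:
    """Send one StatsD datagram and return the payload that was sent."""
    payload = f"{name}:{value}|{metric_type}".encode()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(payload, (host, port))
    sock.close()
    return payload

# Example: a hypothetical counter sent to the statsd_port configured above.
send_statsd("airflow.custom_check", 1)
```

Because StatsD is fire-and-forget UDP, a missing listener never blocks the sender; that is also why misconfigured statsd_host/statsd_port fail silently.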

Create health check script

Create automated health check script to monitor all Airflow components and Redis cluster status.

#!/bin/bash
# /opt/airflow/scripts/health_check.sh — Airflow HA health check
set -e

echo "=== Airflow High Availability Health Check ==="
echo "Timestamp: $(date)"
echo

# Check Redis cluster health
echo "Checking Redis cluster status..."
for port in 7000 7001 7002; do
    if redis-cli -p "$port" -a airflow-redis-pass ping > /dev/null 2>&1; then
        echo "✓ Redis node $port: OK"
    else
        echo "✗ Redis node $port: FAILED"
        exit 1
    fi
done

# Show Redis cluster topology
echo
echo "Redis cluster topology:"
redis-cli -p 7000 -a airflow-redis-pass cluster nodes 2>/dev/null | grep -E "master|slave" || echo "Cluster not properly configured"

# Check PostgreSQL connection
echo
echo "Checking PostgreSQL connection..."
if PGPASSWORD=airflow-db-pass psql -h localhost -U airflow -d airflow -c "SELECT 1;" > /dev/null 2>&1; then
    echo "✓ PostgreSQL: OK"
else
    echo "✗ PostgreSQL: FAILED"
    exit 1
fi

# Check Airflow services
echo
echo "Checking Airflow services..."
services=("airflow-webserver" "airflow-scheduler" "airflow-worker" "airflow-flower")
for service in "${services[@]}"; do
    if systemctl is-active --quiet "$service"; then
        echo "✓ $service: OK"
    else
        echo "✗ $service: FAILED"
        systemctl status "$service" --no-pager
    fi
done

# Check web interfaces
echo
echo "Checking web interfaces..."
if curl -s http://localhost:8080/health > /dev/null 2>&1; then
    echo "✓ Airflow webserver: OK"
else
    echo "✗ Airflow webserver: FAILED"
fi
if curl -s http://localhost:5555 > /dev/null 2>&1; then
    echo "✓ Celery Flower: OK"
else
    echo "✗ Celery Flower: FAILED"
fi

echo
echo "=== Health Check Complete ==="
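The webserver's /health endpoint returns JSON describing the metadatabase and scheduler. A stdlib sketch that evaluates such a response (the payload below is a sample of the typical Airflow 2.x shape; fields may vary by version — fetch the live one with `curl -s http://localhost:8080/health`):

```python
import json

# Sample of a typical Airflow 2.x /health response.
SAMPLE = """{
  "metadatabase": {"status": "healthy"},
  "scheduler": {"status": "healthy",
                "latest_scheduler_heartbeat": "2026-04-23T10:00:00+00:00"}
}"""

def is_healthy(payload: str) -> bool:
    """True only if every reported component has status 'healthy'."""
    health = json.loads(payload)
    return all(comp.get("status") == "healthy" for comp in health.values())

print(is_healthy(SAMPLE))
```

A check like this is handy in cron or an external monitor, since the endpoint returns HTTP 200 even when the scheduler heartbeat is stale; you must inspect the body.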

Set health check script permissions

Make the health check script executable and set proper ownership for the airflow user.

sudo mkdir -p /opt/airflow/scripts
# Save the script above as /opt/airflow/scripts/health_check.sh, then:
sudo chown airflow:airflow /opt/airflow/scripts/health_check.sh
sudo chmod 755 /opt/airflow/scripts/health_check.sh

Configure log rotation

Set up log rotation for Airflow and Redis logs to prevent disk space issues in production.

# /etc/logrotate.d/airflow
# copytruncate rotates logs without asking the services to reopen their log files
/var/log/airflow/*.log {
    daily
    missingok
    rotate 30
    compress
    notifempty
    copytruncate
}

# /etc/logrotate.d/redis-cluster
/var/log/redis/*.log {
    daily
    missingok
    rotate 30
    compress
    notifempty
    copytruncate
}

Verify your setup

Run comprehensive checks to verify your high availability Airflow deployment is working correctly.

# Check all services status
sudo systemctl status airflow-webserver airflow-scheduler airflow-worker airflow-flower
sudo systemctl status redis-cluster-7000 redis-cluster-7001 redis-cluster-7002

# Run the health check
sudo -u airflow /opt/airflow/scripts/health_check.sh

# Check Redis cluster status
redis-cli -p 7000 -a airflow-redis-pass cluster info
redis-cli -p 7000 -a airflow-redis-pass cluster nodes

# Test the database connection
sudo -u airflow AIRFLOW_CONFIG=/opt/airflow/config/airflow.cfg /opt/airflow/venv/bin/airflow db check

# List DAGs
sudo -u airflow AIRFLOW_CONFIG=/opt/airflow/config/airflow.cfg /opt/airflow/venv/bin/airflow dags list

# Check worker status
curl -s http://localhost:5555/api/workers | python3 -m json.tool

Access the Airflow web interface at http://your-server-ip:8080 and Flower monitoring at http://your-server-ip:5555. You can also check our guide on monitoring with Prometheus and Grafana for comprehensive observability.

Common issues

Symptom | Cause | Fix
Celery workers not connecting | Redis authentication failure | Verify the Redis password in /opt/airflow/config/airflow.cfg matches the cluster config
DAGs not appearing | Scheduler not running or DAG syntax error | Run sudo systemctl restart airflow-scheduler and check /var/log/airflow/scheduler.log
Tasks stuck in queued state | No available workers or connection issues | Check worker status with curl http://localhost:5555/api/workers
Redis cluster nodes disconnected | Network issues or node failure | Check cluster status with redis-cli -p 7000 -a airflow-redis-pass cluster nodes and restart failed nodes
Database connection errors | PostgreSQL authentication or network issues | Test the connection with psql -h localhost -U airflow -d airflow
High memory usage | Too many concurrent workers or tasks | Reduce worker_concurrency and parallelism in airflow.cfg

Next steps

Running this in production?

Want this handled for you? Running this at scale adds a second layer of work: capacity planning, failover drills, cost control, and on-call. Our managed platform covers monitoring, backups and 24/7 response by default.


Need help?

Don't want to manage this yourself?

We handle managed devops services for businesses that depend on uptime. From initial setup to ongoing operations.