Optimize Apache Airflow performance through advanced connection pooling, resource tuning, and Celery executor configuration. Learn to scale workers, configure database pools, and implement comprehensive monitoring for production workflows.
Prerequisites
- Apache Airflow 2.7+ installed
- PostgreSQL 12+ database
- Redis 6+ server
- Root or sudo access
- 4GB+ RAM available
What this solves
Apache Airflow performance degrades under heavy workloads when default configurations are used in production environments. This tutorial optimizes Airflow performance through connection pooling, worker resource allocation, Celery executor tuning, and monitoring setup to handle thousands of concurrent tasks efficiently.
Step-by-step configuration
Install performance monitoring tools
Install system monitoring tools to track resource usage during optimization.
sudo apt update
sudo apt install -y htop iotop sysstat postgresql-client redis-tools
sudo systemctl enable sysstat
Configure PostgreSQL connection pooling
Optimize the PostgreSQL backend for Airflow by adding these settings to postgresql.conf (commonly /etc/postgresql/&lt;version&gt;/main/postgresql.conf on Debian-based systems), then restart PostgreSQL.
max_connections = 200
shared_buffers = 256MB
effective_cache_size = 1GB
work_mem = 4MB
maintenance_work_mem = 64MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
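These values assume a host near the 4 GB prerequisite. As a quick sanity check (a rough heuristic, not an exact model), worst-case backend memory is approximately shared_buffers plus work_mem for every allowed connection, and it should sit well under physical RAM:

```shell
# Rough worst-case memory footprint (MB) for the settings above.
shared_buffers_mb=256
work_mem_mb=4
max_connections=200
worst_case_mb=$((shared_buffers_mb + work_mem_mb * max_connections))
echo "worst case: ~${worst_case_mb} MB of 4096 MB RAM"
```

Sorts and hashes can each use up to work_mem, so a complex query may consume several multiples of it; the estimate above is a floor, not a ceiling.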
Configure Airflow database connection pool
Update airflow.cfg to optimize database connection pooling and reduce connection overhead.
[database]
sql_alchemy_pool_size = 20
sql_alchemy_max_overflow = 40
sql_alchemy_pool_recycle = 3600
sql_alchemy_pool_pre_ping = True
sql_alchemy_engine_args = {"pool_reset_on_return": "commit"}
[core]
parallelism = 32
max_active_runs_per_dag = 16
max_active_tasks_per_dag = 16
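A sketch of why these numbers fit together: each long-running Airflow component (scheduler, webserver, and every Celery worker host) can open up to pool_size + max_overflow database connections, and the sum across components must stay under PostgreSQL's max_connections (200 above). Assuming a scheduler, a webserver, and one worker host:

```shell
# Connection budget check against PostgreSQL max_connections = 200.
pool_size=20
max_overflow=40
components=3                           # scheduler + webserver + 1 worker host
per_component=$((pool_size + max_overflow))
total=$((per_component * components))
echo "worst case: ${total}/200 connections"
```

Add 60 to the budget for each additional worker host before raising pool sizes further.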
Configure Celery executor settings
Optimize the Celery executor for distributed task processing and worker management. Note that several of the options below (result_expires, worker_max_tasks_per_child, worker_send_task_events, worker_disable_rate_limits) are Celery-native settings; depending on your Airflow version they may need to be supplied through a custom celery_config_options class rather than read directly from airflow.cfg.
[celery]
worker_concurrency = 4
worker_prefetch_multiplier = 1
task_always_eager = False
result_backend = db+postgresql://airflow:password@localhost/airflow
broker_url = redis://localhost:6379/0
celery_app_name = airflow.providers.celery.executors.celery_executor
worker_enable_remote_control = True
worker_send_task_events = True
task_send_sent_event = True
result_expires = 3600
worker_max_tasks_per_child = 1000
worker_disable_rate_limits = True
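worker_concurrency caps task slots per worker, so total cluster capacity is workers × worker_concurrency, and raising [core] parallelism beyond that capacity has no effect. A quick check, assuming the eight-worker maximum used by the scaling script later in this tutorial:

```shell
# Cluster task capacity under the settings above.
workers=8                              # MAX_WORKERS in the scaling script
worker_concurrency=4
slots=$((workers * worker_concurrency))
echo "${slots} concurrent task slots"  # matches [core] parallelism = 32
```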
Configure Redis for Celery broker
Optimize the Redis configuration (/etc/redis/redis.conf on Debian-based systems) for Celery message-broker performance and reliability.
maxmemory 2gb
# noeviction: eviction policies such as allkeys-lru can silently drop queued tasks
maxmemory-policy noeviction
save 900 1
save 300 10
save 60 10000
tcp-keepalive 60
timeout 0
maxclients 10000
tcp-backlog 511
databases 16
sudo systemctl restart redis-server
sudo systemctl enable redis-server
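If this Redis instance serves only as the Celery broker (an assumption; adjust if it holds other data), the RDB snapshot schedule above can be replaced with an append-only file for lighter, more durable persistence:

```
# Broker-only alternative: disable RDB snapshots, enable AOF
save ""
appendonly yes
appendfsync everysec
```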
Configure worker resource limits
Create a systemd unit that sets resource limits for Airflow workers to prevent runaway memory use and ensure stable operation. Save it as /etc/systemd/system/airflow-worker@.service (a template unit) so that numbered instances such as airflow-worker@1 and airflow-worker@2 can be started individually by the scaling script later in this tutorial.
[Unit]
Description=Airflow Worker
After=network.target postgresql.service redis.service
Wants=postgresql.service redis.service
[Service]
Environment="AIRFLOW_HOME=/opt/airflow"
User=airflow
Group=airflow
Type=simple
ExecStart=/opt/airflow/venv/bin/airflow celery worker
Restart=always
RestartSec=5s
PrivateTmp=true
MemoryMax=2G
CPUQuota=200%
TasksMax=4096
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Configure scheduler performance settings
Optimize the Airflow scheduler for better task scheduling and reduced latency.
[scheduler]
min_file_process_interval = 30
dag_dir_list_interval = 300
print_stats_interval = 30
scheduler_heartbeat_sec = 5
scheduler_health_check_threshold = 30
orphaned_tasks_check_interval = 300
child_process_timeout = 60
scheduler_zombie_task_threshold = 300
max_tis_per_query = 512
parsing_processes = 2
use_row_level_locking = True
allow_trigger_in_future = False
Configure webserver performance
Optimize Airflow webserver configuration for better UI performance and concurrent user handling.
[webserver]
workers = 4
worker_class = sync
web_server_worker_timeout = 120
web_server_master_timeout = 120
worker_refresh_batch_size = 1
worker_refresh_interval = 30
reload_on_plugin_change = False
update_fab_perms = True
warn_deployment_exposure = True
Configure task execution optimization
Set task-level performance settings to reduce overhead and improve execution speed.
[core]
execute_tasks_new_python_interpreter = False
task_runner = StandardTaskRunner
default_task_retries = 0
dagbag_import_timeout = 30
dag_file_processor_timeout = 50
killed_task_cleanup_time = 60
[logging]
remote_logging = False
logging_level = INFO
fab_logging_level = WARN
log_filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
log_processor_filename_template = {{ filename }}.log
Create worker scaling script
Implement automatic worker scaling based on queue depth. Save the script below as /opt/airflow/scripts/scale_workers.sh.
#!/bin/bash
# Scale Airflow Celery workers (systemd template units airflow-worker@N)
# based on the depth of the default "celery" queue in Redis.
AIRFLOW_HOME="/opt/airflow"
MIN_WORKERS=2
MAX_WORKERS=8
QUEUE_THRESHOLD=10

get_queue_length() {
    # Celery's default queue is a Redis list named "celery"
    redis-cli -h localhost -p 6379 llen celery
}

get_active_workers() {
    systemctl list-units --type=service --state=active | grep -c "airflow-worker@"
}

scale_up() {
    local current_workers=$1
    local new_worker_id=$((current_workers + 1))
    sudo systemctl start "airflow-worker@${new_worker_id}"
    echo "$(date): scaled up to ${new_worker_id} workers"
}

scale_down() {
    local current_workers=$1
    # Stop the highest-numbered instance
    sudo systemctl stop "airflow-worker@${current_workers}"
    echo "$(date): scaled down to $((current_workers - 1)) workers"
}

queue_length=$(get_queue_length)
current_workers=$(get_active_workers)

if [ "$queue_length" -gt "$QUEUE_THRESHOLD" ] && [ "$current_workers" -lt "$MAX_WORKERS" ]; then
    scale_up "$current_workers"
elif [ "$queue_length" -eq 0 ] && [ "$current_workers" -gt "$MIN_WORKERS" ]; then
    scale_down "$current_workers"
fi
chmod 755 /opt/airflow/scripts/scale_workers.sh
chown airflow:airflow /opt/airflow/scripts/scale_workers.sh
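To run the scaler on a schedule, an /etc/cron.d entry like the following can invoke it every two minutes (the interval and log path are suggestions; this also assumes the airflow user may run the systemctl commands via passwordless sudo):

```
# /etc/cron.d/airflow-worker-scaling
*/2 * * * * airflow /opt/airflow/scripts/scale_workers.sh >> /var/log/airflow/scale_workers.log 2>&1
```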
Configure Prometheus metrics
Set up Prometheus metrics collection for comprehensive Airflow monitoring.
[metrics]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
statsd_allow_list = scheduler,executor,dagrun,taskinstance,pool
statsd_datadog_enabled = False
statsd_datadog_tags = False
sudo apt install -y prometheus-statsd-exporter
sudo systemctl enable --now prometheus-statsd-exporter
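Out of the box, prometheus-statsd-exporter maps Airflow's dotted StatsD names generically; a small mapping file makes them easier to query. A sketch (the file path is a suggestion, and the airflow_task_failures_total name matches the alert rules in the next step; verify the raw metric names your Airflow version emits before relying on them):

```yaml
# /etc/prometheus/statsd-exporter-mapping.yml (pass via --statsd.mapping-config)
mappings:
  - match: "airflow.ti_failures"
    name: "airflow_task_failures_total"
  - match: "airflow.dagrun.duration.success.*"
    name: "airflow_dagrun_duration_success"
    labels:
      dag_id: "$1"
```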
Configure alerting rules
Create Prometheus alerting rules for critical Airflow performance metrics.
groups:
- name: airflow
rules:
- alert: AirflowSchedulerDown
expr: up{job="airflow-scheduler"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Airflow scheduler is down"
description: "Airflow scheduler has been down for more than 1 minute"
- alert: AirflowHighTaskFailureRate
expr: rate(airflow_task_failures_total[5m]) > 0.1
for: 5m
labels:
severity: warning
annotations:
summary: "High task failure rate in Airflow"
description: "Task failure rate is {{ $value }} per second"
- alert: AirflowQueueBacklog
expr: airflow_executor_queued_tasks > 100
for: 5m
labels:
severity: warning
annotations:
summary: "Airflow queue backlog is high"
description: "Queue has {{ $value }} tasks waiting"
- alert: AirflowDatabaseConnections
expr: airflow_database_connections > 80
for: 2m
labels:
severity: warning
annotations:
summary: "High database connection usage"
description: "Database connection pool usage is at {{ $value }}%"
Create monitoring dashboard configuration
Configure Grafana dashboard for Airflow performance monitoring and alerting.
{
"dashboard": {
"title": "Airflow Performance Monitor",
"panels": [
{
"title": "Task Success Rate",
"type": "stat",
"targets": [{
"expr": "rate(airflow_task_successes_total[5m]) / rate(airflow_task_total[5m]) * 100"
}]
},
{
"title": "Active Tasks",
"type": "graph",
"targets": [{
"expr": "airflow_executor_running_tasks"
}]
},
{
"title": "Queue Depth",
"type": "graph",
"targets": [{
"expr": "airflow_executor_queued_tasks"
}]
},
{
"title": "Database Connections",
"type": "graph",
"targets": [{
"expr": "airflow_database_connections"
}]
}
]
}
}
Apply configurations and restart services
Restart all Airflow services to apply the performance optimizations.
sudo systemctl restart postgresql
sudo systemctl restart redis-server
sudo systemctl daemon-reload
sudo systemctl restart airflow-scheduler
sudo systemctl restart airflow-webserver
sudo systemctl restart airflow-worker
Verify your setup
Check that all services are running with optimized configurations and monitor performance metrics.
sudo systemctl status airflow-scheduler airflow-webserver airflow-worker
redis-cli ping
psql -h localhost -U airflow -d airflow -c "SELECT COUNT(*) FROM dag;"
Check connection pool usage
psql -h localhost -U postgres -d postgres -c "SELECT count(*) as connections, usename FROM pg_stat_activity GROUP BY usename;"
Monitor Celery worker performance
airflow celery inspect active
airflow celery inspect stats
Check queue status
redis-cli llen celery
Monitor system resources
htop
iostat -x 1 5
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Tasks stuck in queued state | Worker resource exhaustion | Increase worker memory limits or reduce concurrency |
| Database connection errors | Pool size too small | Increase sql_alchemy_pool_size and max_overflow |
| High memory usage | Worker memory leaks | Lower worker_max_tasks_per_child (e.g. from 1000 to 500) |
| Slow DAG processing | Insufficient parsing processes | Increase parsing_processes to CPU count |
| Redis connection timeout | Network or memory issues | Check Redis logs and increase maxmemory |
| Scheduler lag | Too many concurrent DAGs | Reduce max_active_runs per DAG |
Next steps
- Set up Apache Airflow high availability with CeleryExecutor and Redis clustering
- Configure Apache Airflow monitoring with Prometheus alerts and Grafana dashboards
- Monitor Kubernetes clusters with Prometheus and Grafana for container orchestration insights
- Configure Airflow DAG security and isolation with RBAC policies
- Implement custom Airflow operators and sensors for advanced workflow automation