Configure Apache Airflow performance optimization with connection pooling and resource tuning

Advanced · 45 min · Apr 14, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Optimize Apache Airflow performance through advanced connection pooling, resource tuning, and Celery executor configuration. Learn to scale workers, configure database pools, and implement comprehensive monitoring for production workflows.

Prerequisites

  • Apache Airflow 2.7+ installed
  • PostgreSQL 12+ database
  • Redis 6+ server
  • Root or sudo access
  • 4GB+ RAM available

What this solves

Apache Airflow performance degrades under heavy workloads when default configurations are used in production environments. This tutorial optimizes Airflow performance through connection pooling, worker resource allocation, Celery executor tuning, and monitoring setup to handle thousands of concurrent tasks efficiently.

Step-by-step configuration

Install performance monitoring tools

Install system monitoring tools to track resource usage during optimization.

# Debian/Ubuntu
sudo apt update
sudo apt install -y htop iotop sysstat postgresql-client redis-tools
sudo systemctl enable sysstat

# RHEL family (AlmaLinux 9 / Rocky Linux 9)
sudo dnf update -y
sudo dnf install -y htop iotop sysstat postgresql redis
sudo systemctl enable sysstat

Configure PostgreSQL connection pooling

Optimize the PostgreSQL backend for Airflow by adding the following connection and memory settings to postgresql.conf (its location varies by distribution and PostgreSQL version), then restart PostgreSQL.

max_connections = 200
shared_buffers = 256MB
effective_cache_size = 1GB
work_mem = 4MB
maintenance_work_mem = 64MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
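The fixed values above suit a host around the 4 GB prerequisite. As a rough rule of thumb (a common heuristic, not an official PostgreSQL formula), shared_buffers is often sized at ~25% of RAM and effective_cache_size at ~50%. A small sketch to derive both on Linux:

```shell
# Derive rough shared_buffers / effective_cache_size suggestions from
# /proc/meminfo (Linux only; starting points to tune, not hard rules).
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
shared_buffers_mb=$(( total_kb / 1024 / 4 ))   # ~25% of RAM
effective_cache_mb=$(( total_kb / 1024 / 2 ))  # ~50% of RAM
echo "shared_buffers = ${shared_buffers_mb}MB"
echo "effective_cache_size = ${effective_cache_mb}MB"
```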

Configure Airflow database connection pool

Update Airflow configuration to optimize database connection pooling and reduce connection overhead. Note that dag_concurrency and a bare max_active_runs are legacy names; since Airflow 2.2 the options are max_active_tasks_per_dag and max_active_runs_per_dag.

[database]
sql_alchemy_pool_size = 20
sql_alchemy_max_overflow = 40
sql_alchemy_pool_recycle = 3600
sql_alchemy_pool_pre_ping = True
sql_alchemy_engine_args = {"pool_reset_on_return": "commit"}

[core]
parallelism = 32
max_active_runs_per_dag = 16
max_active_tasks_per_dag = 16
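A quick sanity check on the pool numbers: each Airflow process (scheduler, webserver, every worker) maintains its own SQLAlchemy pool, so peak database connections can reach roughly processes × (pool_size + max_overflow). Keep that total under PostgreSQL's max_connections (200 in the earlier step). A sketch with an assumed process count:

```shell
# Back-of-envelope connection budget. The process count is an example;
# substitute your actual scheduler/webserver/worker count.
POOL_SIZE=20
MAX_OVERFLOW=40
PROCESSES=3   # e.g. scheduler + webserver + one Celery worker
PEAK=$(( PROCESSES * (POOL_SIZE + MAX_OVERFLOW) ))
echo "peak database connections: $PEAK"
```

With three processes this already budgets 180 of the 200 available connections, which is why adding workers usually means revisiting max_connections as well.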

Configure Celery executor settings

Optimize Celery executor configuration for distributed task processing and worker management.

[celery]
worker_concurrency = 4
worker_prefetch_multiplier = 1
task_always_eager = False
result_backend = db+postgresql://airflow:password@localhost/airflow
broker_url = redis://localhost:6379/0
celery_app_name = airflow.providers.celery.executors.celery_executor
worker_enable_remote_control = True
worker_send_task_events = True
task_send_sent_event = True
result_expires = 3600
worker_max_tasks_per_child = 1000
worker_disable_rate_limits = True
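worker_concurrency = 4 is a conservative default. A common heuristic (an assumption to validate under real load, not a Celery rule) is to start at the CPU count for CPU-bound tasks and roughly double it for I/O-bound ones:

```shell
# Suggest starting worker_concurrency values from the CPU count.
cpus=$(nproc)
echo "worker_concurrency starting point (CPU-bound tasks): $cpus"
echo "worker_concurrency starting point (I/O-bound tasks): $(( cpus * 2 ))"
```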

Configure Redis for Celery broker

Optimize Redis for the Celery message broker by adding the following to redis.conf (its location varies by distribution), then restart the service. Note the service is named redis-server on Debian/Ubuntu and redis on RHEL-family systems.

maxmemory 2gb
maxmemory-policy allkeys-lru
save 900 1
save 300 10
save 60 10000
tcp-keepalive 60
timeout 0
maxclients 10000
tcp-backlog 511
databases 16

sudo systemctl restart redis-server
sudo systemctl enable redis-server

Configure worker resource limits

Set resource limits for Airflow workers to prevent memory leaks and ensure stable operation. Save the unit as a systemd template (/etc/systemd/system/airflow-worker@.service) so multiple instances (airflow-worker@1, airflow-worker@2, …) can share it; the scaling script later in this tutorial relies on that template name.

[Unit]
Description=Airflow Worker
After=network.target postgresql.service redis.service
Wants=postgresql.service redis.service

[Service]
Environment="AIRFLOW_HOME=/opt/airflow"
User=airflow
Group=airflow
Type=simple
ExecStart=/opt/airflow/venv/bin/airflow celery worker
Restart=always
RestartSec=5s
PrivateTmp=true
MemoryMax=2G
CPUQuota=200%
TasksMax=4096
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
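On hosts with more headroom you can raise the limits without editing the base unit by using a systemd drop-in (the override values here are illustrative, not recommendations):

```
# /etc/systemd/system/airflow-worker@.service.d/override.conf
[Service]
MemoryMax=4G
CPUQuota=400%
```

After creating the drop-in, run `sudo systemctl daemon-reload` for it to take effect.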

Configure scheduler performance settings

Optimize the Airflow scheduler for better task scheduling and reduced latency.

[scheduler]
min_file_process_interval = 30
dag_dir_list_interval = 300
print_stats_interval = 30
scheduler_heartbeat_sec = 5
scheduler_health_check_threshold = 30
orphaned_tasks_check_interval = 300
child_process_timeout = 60
scheduler_zombie_task_threshold = 300
max_tis_per_query = 512
parsing_processes = 2
use_row_level_locking = True
allow_trigger_in_future = False
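To see what min_file_process_interval and parsing_processes mean in practice, here is a small sketch of the per-process parsing load (the DAG file count is an assumption; substitute your own):

```shell
# Each parsing process re-parses roughly DAG_FILES / PROCESSES files every
# min_file_process_interval seconds. If average parse time per file exceeds
# INTERVAL * PROCESSES / DAG_FILES seconds, the scheduler falls behind.
DAG_FILES=200
INTERVAL=30      # min_file_process_interval
PROCESSES=2      # parsing_processes
PER_PROC=$(( DAG_FILES / PROCESSES ))
echo "each parsing process handles ~$PER_PROC files per ${INTERVAL}s window"
```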

Configure webserver performance

Optimize Airflow webserver configuration for better UI performance and concurrent user handling.

[webserver]
workers = 4
worker_class = sync
worker_connections = 1000
web_server_worker_timeout = 120
web_server_master_timeout = 120
worker_refresh_batch_size = 1
worker_refresh_interval = 30
reload_on_plugin_change = False
update_fab_perms = True
warn_deployment_exposure = True
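workers = 4 is a fixed choice; the common gunicorn sizing heuristic (a rule of thumb, not an Airflow requirement) is 2 × CPU cores + 1:

```shell
# Suggest a webserver worker count from the CPU count.
cpus=$(nproc)
workers=$(( 2 * cpus + 1 ))
echo "suggested webserver workers = $workers"
```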

Configure task execution optimization

Set task-level performance settings to reduce overhead and improve execution speed.

[core]
execute_tasks_new_python_interpreter = False
fork_process_for_task_execution = True
task_runner = StandardTaskRunner
default_task_retries = 0
max_active_runs_per_dag = 16
dagbag_import_timeout = 30
dag_file_processor_timeout = 50
killed_task_cleanup_time = 60

[logging]
remote_logging = False
logging_level = INFO
fab_logging_level = WARN
log_filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
log_processor_filename_template = {{ filename }}.log

Create worker scaling script

Implement automatic worker scaling based on queue depth and system resources.

#!/bin/bash
# /opt/airflow/scripts/scale_workers.sh
# Scales systemd template workers (airflow-worker@N) based on Celery queue depth.

AIRFLOW_HOME="/opt/airflow"
MIN_WORKERS=2
MAX_WORKERS=8
QUEUE_THRESHOLD=10

get_queue_length() {
    redis-cli -h localhost -p 6379 llen celery
}

get_active_workers() {
    systemctl list-units --type=service --state=active | grep -c "airflow-worker@"
}

scale_up() {
    local current_workers=$1
    local new_worker_id=$((current_workers + 1))

    if [ "$current_workers" -lt "$MAX_WORKERS" ]; then
        sudo systemctl start "airflow-worker@$new_worker_id"
        echo "$(date): Scaled up to $new_worker_id workers"
    fi
}

scale_down() {
    local current_workers=$1

    if [ "$current_workers" -gt "$MIN_WORKERS" ]; then
        sudo systemctl stop "airflow-worker@$current_workers"
        echo "$(date): Scaled down to $((current_workers - 1)) workers"
    fi
}

queue_length=$(get_queue_length)
current_workers=$(get_active_workers)

if [ "$queue_length" -gt "$QUEUE_THRESHOLD" ] && [ "$current_workers" -lt "$MAX_WORKERS" ]; then
    scale_up "$current_workers"
elif [ "$queue_length" -eq 0 ] && [ "$current_workers" -gt "$MIN_WORKERS" ]; then
    scale_down "$current_workers"
fi

Make the script executable:

chmod 755 /opt/airflow/scripts/scale_workers.sh
chown airflow:airflow /opt/airflow/scripts/scale_workers.sh
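To run the scaler periodically, schedule it from cron. The two-minute interval and log path below are suggestions, not requirements; add the line with `crontab -e` as a user allowed to run the systemctl commands in the script:

```
*/2 * * * * /opt/airflow/scripts/scale_workers.sh >> /var/log/airflow/scale_workers.log 2>&1
```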

Configure Prometheus metrics

Set up Prometheus metrics collection for comprehensive Airflow monitoring.

[metrics]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
statsd_allow_list = scheduler,executor,dagrun,taskinstance,pool
statsd_datadog_enabled = False
statsd_datadog_tags = False
# Debian/Ubuntu
sudo apt install -y prometheus-statsd-exporter
sudo systemctl enable --now prometheus-statsd-exporter

# RHEL family (no packaged exporter; build from source, then create a
# systemd unit for /usr/local/bin/statsd_exporter before enabling it)
sudo dnf install -y golang
go install github.com/prometheus/statsd_exporter@latest
sudo cp ~/go/bin/statsd_exporter /usr/local/bin/
sudo systemctl enable --now statsd-exporter
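statsd_exporter translates Airflow's dot-delimited StatsD names into Prometheus metrics; a mapping file makes the resulting names stable and turns name segments into labels. An illustrative fragment (the pattern matches Airflow's dagrun.duration.success.<dag_id> metric; verify the names your Airflow version actually emits):

```yaml
# mapping.yml, passed to statsd_exporter via --statsd.mapping-config
mappings:
  - match: "airflow.dagrun.duration.success.*"
    name: "airflow_dagrun_duration_success"
    labels:
      dag_id: "$1"
```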

Configure alerting rules

Create Prometheus alerting rules for critical Airflow performance metrics. The metric names below depend on your statsd-exporter mapping; adjust them to match the names your exporter actually publishes.

groups:
  - name: airflow
    rules:
      - alert: AirflowSchedulerDown
        expr: up{job="airflow-scheduler"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Airflow scheduler is down"
          description: "Airflow scheduler has been down for more than 1 minute"
      
      - alert: AirflowHighTaskFailureRate
        expr: rate(airflow_task_failures_total[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High task failure rate in Airflow"
          description: "Task failure rate is {{ $value }} per second"
      
      - alert: AirflowQueueBacklog
        expr: airflow_executor_queued_tasks > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Airflow queue backlog is high"
          description: "Queue has {{ $value }} tasks waiting"
      
      - alert: AirflowDatabaseConnections
        expr: airflow_database_connections > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High database connection usage"
          description: "Database connection pool usage is at {{ $value }}%"

Create monitoring dashboard configuration

Configure Grafana dashboard for Airflow performance monitoring and alerting.

{
  "dashboard": {
    "title": "Airflow Performance Monitor",
    "panels": [
      {
        "title": "Task Success Rate",
        "type": "stat",
        "targets": [{
          "expr": "rate(airflow_task_successes_total[5m]) / rate(airflow_task_total[5m]) * 100"
        }]
      },
      {
        "title": "Active Tasks",
        "type": "graph",
        "targets": [{
          "expr": "airflow_executor_running_tasks"
        }]
      },
      {
        "title": "Queue Depth",
        "type": "graph",
        "targets": [{
          "expr": "airflow_executor_queued_tasks"
        }]
      },
      {
        "title": "Database Connections",
        "type": "graph",
        "targets": [{
          "expr": "airflow_database_connections"
        }]
      }
    ]
  }
}

Apply configurations and restart services

Restart all Airflow services to apply the performance optimizations.

sudo systemctl restart postgresql
sudo systemctl restart redis-server
sudo systemctl daemon-reload
sudo systemctl restart airflow-scheduler
sudo systemctl restart airflow-webserver
sudo systemctl restart airflow-worker

Verify your setup

Check that all services are running with optimized configurations and monitor performance metrics.

sudo systemctl status airflow-scheduler airflow-webserver airflow-worker
redis-cli ping
psql -h localhost -U airflow -d airflow -c "SELECT COUNT(*) FROM dag;"

Check connection pool usage

psql -h localhost -U postgres -d postgres -c "SELECT count(*) as connections, usename FROM pg_stat_activity GROUP BY usename;"

Monitor Celery worker performance

airflow celery inspect active
airflow celery inspect stats

Check queue status

redis-cli llen celery

Monitor system resources

htop
iostat -x 1 5
Note: Monitor the system for 24-48 hours to ensure stable performance under production load. Adjust worker counts and resource limits based on actual usage patterns.

Common issues

| Symptom | Cause | Fix |
|---|---|---|
| Tasks stuck in queued state | Worker resource exhaustion | Increase worker memory limits or reduce concurrency |
| Database connection errors | Pool size too small | Increase sql_alchemy_pool_size and max_overflow |
| High memory usage | Worker memory leaks | Reduce worker_max_tasks_per_child to 500-1000 |
| Slow DAG processing | Insufficient parsing processes | Increase parsing_processes to CPU count |
| Redis connection timeout | Network or memory issues | Check Redis logs and increase maxmemory |
| Scheduler lag | Too many concurrent DAGs | Reduce max_active_runs_per_dag |
