Optimize Apache Airflow performance through advanced connection pooling, resource tuning, and Celery executor configuration. Learn to scale workers, configure database pools, and implement comprehensive monitoring for production workflows.
Prerequisites
- Apache Airflow 2.7+ installed
- PostgreSQL 12+ database
- Redis 6+ server
- Root or sudo access
- 4GB+ RAM available
What this solves
Apache Airflow performance degrades under heavy workloads when default configurations are used in production environments. This tutorial optimizes Airflow performance through connection pooling, worker resource allocation, Celery executor tuning, and monitoring setup to handle thousands of concurrent tasks efficiently.
Step-by-step configuration
Install performance monitoring tools
Install system monitoring tools to track resource usage during optimization.
sudo apt update
sudo apt install -y htop iotop sysstat postgresql-client redis-tools
sudo systemctl enable sysstat
Configure PostgreSQL connection pooling
Optimize the PostgreSQL backend for Airflow by adding these settings to postgresql.conf (commonly /etc/postgresql/&lt;version&gt;/main/postgresql.conf on Debian-based systems), then restart PostgreSQL.
max_connections = 200
shared_buffers = 256MB
effective_cache_size = 1GB
work_mem = 4MB
maintenance_work_mem = 64MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
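These values assume a host near the 4 GB prerequisite. As a quick sanity check (a rough heuristic, not an exact model), worst-case backend memory is approximately shared_buffers plus work_mem for every allowed connection, and it should sit well under physical RAM:

```shell
# Rough worst-case memory footprint (MB) for the settings above.
shared_buffers_mb=256
work_mem_mb=4
max_connections=200
worst_case_mb=$((shared_buffers_mb + work_mem_mb * max_connections))
echo "worst case: ~${worst_case_mb} MB of 4096 MB RAM"
```

Sorts and hashes can each use up to work_mem, so a complex query may consume several multiples of it; the estimate above is a floor, not a ceiling.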
Configure Airflow database connection pool
Update airflow.cfg to optimize database connection pooling and reduce connection overhead.
[database]
sql_alchemy_pool_size = 20
sql_alchemy_max_overflow = 40
sql_alchemy_pool_recycle = 3600
sql_alchemy_pool_pre_ping = True
sql_alchemy_engine_args = {"pool_reset_on_return": "commit"}
[core]
parallelism = 32
max_active_runs_per_dag = 16
max_active_tasks_per_dag = 16
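A sketch of why these numbers fit together: each long-running Airflow component (scheduler, webserver, and every Celery worker host) can open up to pool_size + max_overflow database connections, and the sum across components must stay under PostgreSQL's max_connections (200 above). Assuming a scheduler, a webserver, and one worker host:

```shell
# Connection budget check against PostgreSQL max_connections = 200.
pool_size=20
max_overflow=40
components=3                           # scheduler + webserver + 1 worker host
per_component=$((pool_size + max_overflow))
total=$((per_component * components))
echo "worst case: ${total}/200 connections"
```

Add 60 to the budget for each additional worker host before raising pool sizes further.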
Configure Celery executor settings
Optimize the Celery executor for distributed task processing and worker management. Note that several of the options below (result_expires, worker_max_tasks_per_child, worker_send_task_events, worker_disable_rate_limits) are Celery-native settings; depending on your Airflow version they may need to be supplied through a custom celery_config_options class rather than read directly from airflow.cfg.
[celery]
worker_concurrency = 4
worker_prefetch_multiplier = 1
task_always_eager = False
result_backend = db+postgresql://airflow:password@localhost/airflow
broker_url = redis://localhost:6379/0
celery_app_name = airflow.providers.celery.executors.celery_executor
worker_enable_remote_control = True
worker_send_task_events = True
task_send_sent_event = True
result_expires = 3600
worker_max_tasks_per_child = 1000
worker_disable_rate_limits = True
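worker_concurrency caps task slots per worker, so total cluster capacity is workers × worker_concurrency, and raising [core] parallelism beyond that capacity has no effect. A quick check, assuming the eight-worker maximum used by the scaling script later in this tutorial:

```shell
# Cluster task capacity under the settings above.
workers=8                              # MAX_WORKERS in the scaling script
worker_concurrency=4
slots=$((workers * worker_concurrency))
echo "${slots} concurrent task slots"  # matches [core] parallelism = 32
```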
Configure Redis for Celery broker
Optimize the Redis configuration (/etc/redis/redis.conf on Debian-based systems) for Celery message-broker performance and reliability.
maxmemory 2gb
# noeviction: eviction policies such as allkeys-lru can silently drop queued tasks
maxmemory-policy noeviction
save 900 1
save 300 10
save 60 10000
tcp-keepalive 60
timeout 0
maxclients 10000
tcp-backlog 511
databases 16
sudo systemctl restart redis-server
sudo systemctl enable redis-server
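If this Redis instance serves only as the Celery broker (an assumption; adjust if it holds other data), the RDB snapshot schedule above can be replaced with an append-only file for lighter, more durable persistence:

```
# Broker-only alternative: disable RDB snapshots, enable AOF
save ""
appendonly yes
appendfsync everysec
```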
Configure worker resource limits
Create a systemd unit that sets resource limits for Airflow workers to prevent runaway memory use and ensure stable operation. Save it as /etc/systemd/system/airflow-worker@.service (a template unit) so that numbered instances such as airflow-worker@1 and airflow-worker@2 can be started individually by the scaling script later in this tutorial.
[Unit]
Description=Airflow Worker
After=network.target postgresql.service redis.service
Wants=postgresql.service redis.service
[Service]
Environment="AIRFLOW_HOME=/opt/airflow"
User=airflow
Group=airflow
Type=simple
ExecStart=/opt/airflow/venv/bin/airflow celery worker
Restart=always
RestartSec=5s
PrivateTmp=true
MemoryMax=2G
CPUQuota=200%
TasksMax=4096
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Configure scheduler performance settings
Optimize the Airflow scheduler for better task scheduling and reduced latency.
[scheduler]
min_file_process_interval = 30
dag_dir_list_interval = 300
print_stats_interval = 30
scheduler_heartbeat_sec = 5
scheduler_health_check_threshold = 30
orphaned_tasks_check_interval = 300
child_process_timeout = 60
scheduler_zombie_task_threshold = 300
max_tis_per_query = 512
parsing_processes = 2
use_row_level_locking = True
allow_trigger_in_future = False
Configure webserver performance
Optimize Airflow webserver configuration for better UI performance and concurrent user handling.
[webserver]
workers = 4
worker_class = sync
web_server_worker_timeout = 120
web_server_master_timeout = 120
worker_refresh_batch_size = 1
worker_refresh_interval = 30
reload_on_plugin_change = False
update_fab_perms = True
warn_deployment_exposure = True
Configure task execution optimization
Set task-level performance settings to reduce overhead and improve execution speed.
[core]
execute_tasks_new_python_interpreter = False
task_runner = StandardTaskRunner
default_task_retries = 0
dagbag_import_timeout = 30
dag_file_processor_timeout = 50
killed_task_cleanup_time = 60
[logging]
remote_logging = False
logging_level = INFO
fab_logging_level = WARN
log_filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
log_processor_filename_template = {{ filename }}.log
Create worker scaling script
Implement automatic worker scaling based on queue depth. Save the script below as /opt/airflow/scripts/scale_workers.sh.
#!/bin/bash
# Scale Airflow Celery workers (systemd template units airflow-worker@N)
# based on the depth of the default "celery" queue in Redis.
AIRFLOW_HOME="/opt/airflow"
MIN_WORKERS=2
MAX_WORKERS=8
QUEUE_THRESHOLD=10

get_queue_length() {
    # Celery's default queue is a Redis list named "celery"
    redis-cli -h localhost -p 6379 llen celery
}

get_active_workers() {
    systemctl list-units --type=service --state=active | grep -c "airflow-worker@"
}

scale_up() {
    local current_workers=$1
    local new_worker_id=$((current_workers + 1))
    sudo systemctl start "airflow-worker@${new_worker_id}"
    echo "$(date): scaled up to ${new_worker_id} workers"
}

scale_down() {
    local current_workers=$1
    # Stop the highest-numbered instance
    sudo systemctl stop "airflow-worker@${current_workers}"
    echo "$(date): scaled down to $((current_workers - 1)) workers"
}

queue_length=$(get_queue_length)
current_workers=$(get_active_workers)

if [ "$queue_length" -gt "$QUEUE_THRESHOLD" ] && [ "$current_workers" -lt "$MAX_WORKERS" ]; then
    scale_up "$current_workers"
elif [ "$queue_length" -eq 0 ] && [ "$current_workers" -gt "$MIN_WORKERS" ]; then
    scale_down "$current_workers"
fi
chmod 755 /opt/airflow/scripts/scale_workers.sh
chown airflow:airflow /opt/airflow/scripts/scale_workers.sh
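To run the scaler on a schedule, an /etc/cron.d entry like the following can invoke it every two minutes (the interval and log path are suggestions; this also assumes the airflow user may run the systemctl commands via passwordless sudo):

```
# /etc/cron.d/airflow-worker-scaling
*/2 * * * * airflow /opt/airflow/scripts/scale_workers.sh >> /var/log/airflow/scale_workers.log 2>&1
```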
Configure Prometheus metrics
Set up Prometheus metrics collection for comprehensive Airflow monitoring.
[metrics]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
statsd_allow_list = scheduler,executor,dagrun,taskinstance,pool
statsd_datadog_enabled = False
statsd_datadog_tags = False
sudo apt install -y prometheus-statsd-exporter
sudo systemctl enable --now prometheus-statsd-exporter
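Out of the box, prometheus-statsd-exporter maps Airflow's dotted StatsD names generically; a small mapping file makes them easier to query. A sketch (the file path is a suggestion, and the airflow_task_failures_total name matches the alert rules in the next step; verify the raw metric names your Airflow version emits before relying on them):

```yaml
# /etc/prometheus/statsd-exporter-mapping.yml (pass via --statsd.mapping-config)
mappings:
  - match: "airflow.ti_failures"
    name: "airflow_task_failures_total"
  - match: "airflow.dagrun.duration.success.*"
    name: "airflow_dagrun_duration_success"
    labels:
      dag_id: "$1"
```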
Configure alerting rules
Create Prometheus alerting rules for critical Airflow performance metrics.
groups:
- name: airflow
rules:
- alert: AirflowSchedulerDown
expr: up{job="airflow-scheduler"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Airflow scheduler is down"
description: "Airflow scheduler has been down for more than 1 minute"
- alert: AirflowHighTaskFailureRate
expr: rate(airflow_task_failures_total[5m]) > 0.1
for: 5m
labels:
severity: warning
annotations:
summary: "High task failure rate in Airflow"
description: "Task failure rate is {{ $value }} per second"
- alert: AirflowQueueBacklog
expr: airflow_executor_queued_tasks > 100
for: 5m
labels:
severity: warning
annotations:
summary: "Airflow queue backlog is high"
description: "Queue has {{ $value }} tasks waiting"
- alert: AirflowDatabaseConnections
expr: airflow_database_connections > 80
for: 2m
labels:
severity: warning
annotations:
summary: "High database connection usage"
description: "Database connection pool usage is at {{ $value }}%"
Create monitoring dashboard configuration
Configure Grafana dashboard for Airflow performance monitoring and alerting.
{
"dashboard": {
"title": "Airflow Performance Monitor",
"panels": [
{
"title": "Task Success Rate",
"type": "stat",
"targets": [{
"expr": "rate(airflow_task_successes_total[5m]) / rate(airflow_task_total[5m]) * 100"
}]
},
{
"title": "Active Tasks",
"type": "graph",
"targets": [{
"expr": "airflow_executor_running_tasks"
}]
},
{
"title": "Queue Depth",
"type": "graph",
"targets": [{
"expr": "airflow_executor_queued_tasks"
}]
},
{
"title": "Database Connections",
"type": "graph",
"targets": [{
"expr": "airflow_database_connections"
}]
}
]
}
}
Apply configurations and restart services
Restart all Airflow services to apply the performance optimizations.
sudo systemctl restart postgresql
sudo systemctl restart redis-server
sudo systemctl daemon-reload
sudo systemctl restart airflow-scheduler
sudo systemctl restart airflow-webserver
sudo systemctl restart airflow-worker
Verify your setup
Check that all services are running with optimized configurations and monitor performance metrics.
sudo systemctl status airflow-scheduler airflow-webserver airflow-worker
redis-cli ping
psql -h localhost -U airflow -d airflow -c "SELECT COUNT(*) FROM dag;"
Check connection pool usage
psql -h localhost -U postgres -d postgres -c "SELECT count(*) as connections, usename FROM pg_stat_activity GROUP BY usename;"
Monitor Celery worker performance
airflow celery inspect active
airflow celery inspect stats
Check queue status
redis-cli llen celery
Monitor system resources
htop
iostat -x 1 5
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Tasks stuck in queued state | Worker resource exhaustion | Increase worker memory limits or reduce concurrency |
| Database connection errors | Pool size too small | Increase sql_alchemy_pool_size and max_overflow |
| High memory usage | Worker memory leaks | Lower worker_max_tasks_per_child (e.g. from 1000 to 500) |
| Slow DAG processing | Insufficient parsing processes | Increase parsing_processes to CPU count |
| Redis connection timeout | Network or memory issues | Check Redis logs and increase maxmemory |
| Scheduler lag | Too many concurrent DAGs | Reduce max_active_runs per DAG |
Next steps
- Set up Apache Airflow high availability with CeleryExecutor and Redis clustering
- Configure Apache Airflow monitoring with Prometheus alerts and Grafana dashboards
- Monitor Kubernetes clusters with Prometheus and Grafana for container orchestration insights
- Configure Airflow DAG security and isolation with RBAC policies
- Implement custom Airflow operators and sensors for advanced workflow automation