Configure Apache Airflow DAG security and secrets management with RBAC policies and encryption

Advanced · 45 min · Apr 22, 2026
Ubuntu 24.04 · Debian 12 · AlmaLinux 9 · Rocky Linux 9

Implement comprehensive security for Apache Airflow DAGs using role-based access control, HashiCorp Vault integration, and encrypted secrets management. Configure granular permissions, audit logging, and isolation policies for production workflows.

Prerequisites

  • Existing Apache Airflow installation
  • PostgreSQL database
  • Root or sudo access
  • Basic understanding of RBAC concepts
  • HashiCorp Vault knowledge helpful

What this solves

Apache Airflow DAGs often handle sensitive data and credentials, requiring proper security controls to prevent unauthorized access and data breaches. This tutorial configures comprehensive DAG security with RBAC policies, encrypted secrets management through HashiCorp Vault, and audit logging to meet enterprise security requirements.

You need an existing Apache Airflow installation with a PostgreSQL backend. If you don't have this setup, follow our Apache Airflow installation guide first.

Note: This tutorial assumes Airflow runs as the airflow user with its installation directory at /opt/airflow.

Step-by-step configuration

Install required dependencies

Install the necessary Python packages for Vault integration and enhanced security features. The HashiCorp provider package supplies the Vault secrets backend configured later in this guide.

sudo -u airflow pip install "apache-airflow[password,ldap]"
sudo -u airflow pip install apache-airflow-providers-hashicorp hvac cryptography
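
A quick sanity check that the libraries landed in the environment the airflow user actually runs (assumes the Airflow 2.x providers CLI):

sudo -u airflow python -c "import hvac, cryptography; print(hvac.__version__, cryptography.__version__)"
sudo -u airflow airflow providers list | grep -i hashicorp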

Configure RBAC authentication

Enable role-based access control in the Airflow configuration (/opt/airflow/airflow.cfg) to manage user permissions granularly. The authenticate, auth_backend, and rbac options shown below apply to Airflow 1.10; on Airflow 2.x RBAC is always enabled and webserver authentication is configured through webserver_config.py, so those three lines can be omitted there.

[webserver]
authenticate = True
auth_backend = airflow.contrib.auth.backends.password_auth
rbac = True
expose_config = False
web_server_ssl_cert = /opt/airflow/ssl/airflow.crt
web_server_ssl_key = /opt/airflow/ssl/airflow.key
base_url = https://example.com:8080

[core]
fernet_key = your_32_character_fernet_key_here
security = kerberos
default_timezone = utc
max_active_runs_per_dag = 3
parallelism = 16
dag_concurrency = 8

Generate Fernet encryption key

Create a secure Fernet key for encrypting connection passwords and sensitive data in the database.

FERNET_KEY=$(sudo -u airflow python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")
echo "Generated Fernet key: ${FERNET_KEY}"
sudo sed -i "s|your_32_character_fernet_key_here|${FERNET_KEY}|" /opt/airflow/airflow.cfg
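
To confirm the key was written correctly, read it back through Airflow itself (a quick check that assumes an Airflow 2.1+ CLI providing the config get-value subcommand):

sudo -u airflow airflow config get-value core fernet_key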

Create SSL certificates

Generate SSL certificates for secure HTTPS communication with the Airflow webserver.

sudo mkdir -p /opt/airflow/ssl
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout /opt/airflow/ssl/airflow.key \
  -out /opt/airflow/ssl/airflow.crt \
  -subj "/C=US/ST=State/L=City/O=Organization/CN=example.com"
sudo chown -R airflow:airflow /opt/airflow/ssl
sudo chmod 600 /opt/airflow/ssl/airflow.key
sudo chmod 644 /opt/airflow/ssl/airflow.crt
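
Inspect the generated certificate to confirm its subject and validity window before pointing the webserver at it:

openssl x509 -in /opt/airflow/ssl/airflow.crt -noout -subject -dates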

Configure HashiCorp Vault integration

Set up Vault integration for secure secrets management. First, install and configure Vault if not already done.

# Ubuntu / Debian
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install -y vault

# AlmaLinux / Rocky Linux
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo
sudo yum install -y vault
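
Confirm the binary installed correctly before configuring the server:

vault --version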

Configure Vault for Airflow secrets

Configure the Vault server and create a secrets mount for Airflow to retrieve sensitive configuration. The packaged systemd service reads /etc/vault.d/vault.hcl; replace its contents with the following:

ui = true
# Keep memory locking enabled so secrets are never swapped to disk
disable_mlock = false

storage "file" {
  path = "/opt/vault/data"
}

listener "tcp" {
  address     = "127.0.0.1:8200"
  tls_disable = 1
}

api_addr = "http://127.0.0.1:8200"
cluster_addr = "https://127.0.0.1:8201"

Create the data directory, start Vault, and initialise it:

sudo mkdir -p /opt/vault/data
sudo chown -R vault:vault /opt/vault
sudo systemctl enable --now vault
export VAULT_ADDR='http://127.0.0.1:8200'
vault operator init -key-shares=5 -key-threshold=3
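
vault operator init prints five unseal key shares and an initial root token; store them offline. Vault starts sealed, so unseal it, authenticate, and enable a KV secrets engine at the mount point the Airflow backend in the next step expects. This is a minimal sketch assuming a KV version 2 engine mounted at airflow, which matches the backend_kwargs below:

# Repeat three times, pasting a different unseal key share each time
vault operator unseal
# Authenticate with the initial root token printed by 'vault operator init'
vault login
# Mount a KV v2 secrets engine where Airflow will look up secrets
vault secrets enable -path=airflow -version=2 kv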

Enable secrets backend in Airflow

Configure Airflow to use Vault as the secrets backend for retrieving database connections and variables. Add the following to /opt/airflow/airflow.cfg:

[secrets]
backend = airflow.providers.hashicorp.secrets.vault.VaultBackend
backend_kwargs = {"connections_path": "connections", "variables_path": "variables", "config_path": "config", "url": "http://127.0.0.1:8200", "token": "your_vault_token_here", "mount_point": "airflow"}
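
Because the Vault token now lives in airflow.cfg, make sure the file is readable only by the airflow user:

sudo chown airflow:airflow /opt/airflow/airflow.cfg
sudo chmod 600 /opt/airflow/airflow.cfg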

Create custom RBAC roles

Define custom roles with specific permissions for different user types and DAG access levels.

# Save as /opt/airflow/custom_security_manager.py (path assumed) and register it
# via webserver_config.py as shown after this listing.
from airflow.www.security import AirflowSecurityManager


class CustomSecurityManager(AirflowSecurityManager):
    def __init__(self, appbuilder):
        super().__init__(appbuilder)
        # Define custom roles
        self.create_custom_roles()
        # Define custom roles
        self.create_custom_roles()
    
    def create_custom_roles(self):
        # Create DAG Developer role
        dag_developer_perms = [
            ('can_read', 'DAG Runs'),
            ('can_create', 'DAG Runs'),
            ('can_edit', 'DAG Runs'),
            ('can_read', 'Task Instances'),
            ('can_read', 'Logs'),
            ('can_read', 'ImportError'),
            ('can_read', 'DAG'),
            ('can_read', 'Task Reschedule'),
        ]
        
        # Create DAG Viewer role  
        dag_viewer_perms = [
            ('can_read', 'DAG Runs'),
            ('can_read', 'Task Instances'),
            ('can_read', 'Logs'),
            ('can_read', 'DAG'),
        ]
        
        # Create Production Operator role
        prod_operator_perms = [
            ('can_read', 'DAG Runs'),
            ('can_create', 'DAG Runs'), 
            ('can_edit', 'DAG Runs'),
            ('can_delete', 'DAG Runs'),
            ('can_read', 'Task Instances'),
            ('can_edit', 'Task Instances'),
            ('can_read', 'Logs'),
            ('can_read', 'DAG'),
            ('can_edit', 'DAG'),
        ]
        
        self._create_role_if_not_exists('DAG_Developer', dag_developer_perms)
        self._create_role_if_not_exists('DAG_Viewer', dag_viewer_perms)
        self._create_role_if_not_exists('Production_Operator', prod_operator_perms)
    
    def _create_role_if_not_exists(self, role_name, permissions):
        role = self.find_role(role_name)
        if not role:
            role = self.add_role(role_name)
            for perm_name, view_name in permissions:
                pvm = self.find_permission_view_menu(perm_name, view_name)
                if pvm:
                    self.add_permission_role(role, pvm)
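
The class above only takes effect if the webserver loads it. A minimal sketch of registering it, assuming Airflow 2.x, that the code is saved as /opt/airflow/custom_security_manager.py, and that this directory is importable by the webserver; SECURITY_MANAGER_CLASS in webserver_config.py is the Flask AppBuilder hook used for this:

cat <<'EOF' | sudo -u airflow tee -a /opt/airflow/webserver_config.py
from custom_security_manager import CustomSecurityManager

# Use the custom security manager with the additional roles
SECURITY_MANAGER_CLASS = CustomSecurityManager
EOF
sudo systemctl restart airflow-webserver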

Configure DAG-level access control

Implement DAG-level permissions to restrict access to specific workflows based on user roles and tags.

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.models import Variable
from airflow.utils.dates import days_ago

# DAG with security classifications

default_args = {
    'owner': 'data-team',
    'depends_on_past': False,
    'start_date': days_ago(1),
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

# Production DAG with restricted access
dag_production = DAG(
    'production_etl_pipeline',
    default_args=default_args,
    description='Production ETL pipeline with restricted access',
    schedule_interval=timedelta(hours=6),
    access_control={
        'Production_Operator': {'can_read', 'can_edit', 'can_create'},
        'Admin': {'can_read', 'can_edit', 'can_delete', 'can_create'},
    },
    tags=['production', 'sensitive', 'finance'],
    catchup=False,
    max_active_runs=1,
    is_paused_upon_creation=True,
)

# Development DAG with broader access
dag_development = DAG(
    'development_test_pipeline',
    default_args=default_args,
    description='Development pipeline for testing',
    schedule_interval=None,
    access_control={
        'DAG_Developer': {'can_read', 'can_edit', 'can_create'},
        'DAG_Viewer': {'can_read'},
        'Production_Operator': {'can_read', 'can_edit'},
        'Admin': {'can_read', 'can_edit', 'can_delete', 'can_create'},
    },
    tags=['development', 'testing'],
    catchup=False,
)


def get_sensitive_data():
    # Retrieve encrypted connection from Vault
    db_password = Variable.get("database_password", deserialize_json=False)
    api_key = Variable.get("external_api_key", deserialize_json=False)
    return f"Retrieved secure credentials: {len(db_password)} chars"


# Production tasks
secure_task = PythonOperator(
    task_id='fetch_secure_data',
    python_callable=get_sensitive_data,
    dag=dag_production,
    pool='sensitive_pool',
)

data_processing = BashOperator(
    task_id='process_sensitive_data',
    bash_command='echo "Processing data with security controls"',
    dag=dag_production,
    pool='sensitive_pool',
)

secure_task >> data_processing

# Development tasks
test_task = BashOperator(
    task_id='test_basic_functionality',
    bash_command='echo "Running development tests"',
    dag=dag_development,
)
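
After dropping the file into the dags folder, confirm it parses and is picked up (the production DAG stays paused because is_paused_upon_creation=True):

sudo -u airflow airflow dags list-import-errors
sudo -u airflow airflow dags list | grep -E 'production_etl_pipeline|development_test_pipeline'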

Configure resource pools for isolation

Create resource pools to isolate sensitive DAGs and limit concurrent execution of critical workflows.

sudo -u airflow airflow pools set sensitive_pool 2 "Pool for sensitive production DAGs"
sudo -u airflow airflow pools set development_pool 5 "Pool for development and testing DAGs"
sudo -u airflow airflow pools set default_pool 16 "Default pool for general DAGs"
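
Confirm the pools exist with the intended slot counts:

sudo -u airflow airflow pools list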

Enable comprehensive audit logging

Configure detailed audit logging to track all user actions, DAG executions, and security events.

[logging]
logging_level = INFO
fab_logging_level = WARN
logging_config_class = airflow.config_templates.airflow_local_settings.DEFAULT_LOGGING_CONFIG
remote_logging = True
remote_log_conn_id = aws_s3_logs
remote_base_log_folder = s3://airflow-logs/
encrypt_s3_logs = True

[webserver]
audit_view_excluded_events = login,logout
audit_view_include_ids = True

[core]
sql_alchemy_conn = postgresql+psycopg2://airflow:password@localhost:5432/airflow
load_examples = False
dag_dir_list_interval = 300
dag_discovery_safe_mode = True
default_task_retries = 1
killed_task_cleanup_time = 60
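
Remote S3 logging only works if the Amazon provider is installed and the aws_s3_logs connection referenced above exists; a sketch with placeholder AWS credentials and region (you can also store this connection in Vault like the others):

sudo -u airflow pip install apache-airflow-providers-amazon
sudo -u airflow airflow connections add aws_s3_logs \
  --conn-type aws \
  --conn-login AKIAXXXXXXXXXXXXXXXX \
  --conn-password 'your_secret_access_key' \
  --conn-extra '{"region_name": "us-east-1"}'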

Configure database connection encryption

Ensure database connections use SSL encryption and store credentials securely in Vault.

# Store database connection in Vault
vault kv put airflow/connections/postgres_default \
  conn_type=postgres \
  host=localhost \
  schema=airflow \
  login=airflow \
  password=secure_password_here \
  port=5432 \
  extra='{"sslmode": "require", "sslcert": "/opt/airflow/ssl/client.crt", "sslkey": "/opt/airflow/ssl/client.key", "sslrootcert": "/opt/airflow/ssl/ca.crt"}'

# Store API keys and sensitive variables
vault kv put airflow/variables/database_password value=secure_db_password
vault kv put airflow/variables/external_api_key value=api_key_12345
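
Read the entries back to confirm the paths line up with what the Airflow backend will request (connections/<conn_id> and variables/<key> under the airflow mount):

vault kv get airflow/connections/postgres_default
vault kv get airflow/variables/database_password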

Create user accounts with RBAC

Set up user accounts with appropriate role assignments for different access levels.

# Create admin user
sudo -u airflow airflow users create \
  --username admin \
  --firstname Admin \
  --lastname User \
  --role Admin \
  --email admin@example.com \
  --password secure_admin_password

# Create production operator
sudo -u airflow airflow users create \
  --username prodops \
  --firstname Production \
  --lastname Operator \
  --role Production_Operator \
  --email prodops@example.com \
  --password secure_prod_password

# Create DAG developer
sudo -u airflow airflow users create \
  --username devuser \
  --firstname DAG \
  --lastname Developer \
  --role DAG_Developer \
  --email devuser@example.com \
  --password secure_dev_password

# Create read-only viewer
sudo -u airflow airflow users create \
  --username viewer \
  --firstname DAG \
  --lastname Viewer \
  --role DAG_Viewer \
  --email viewer@example.com \
  --password secure_viewer_password
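
Confirm the accounts and their role assignments:

sudo -u airflow airflow users list
sudo -u airflow airflow roles list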

Configure firewall rules

Set up firewall rules to restrict access to Airflow services and allow only necessary connections.

# Ubuntu / Debian (ufw)
sudo ufw allow 8080/tcp comment "Airflow Webserver HTTPS"
sudo ufw allow from 10.0.0.0/8 to any port 8793 comment "Airflow Worker"
sudo ufw allow from 127.0.0.1 to any port 8200 comment "Vault Local"
sudo ufw reload

# AlmaLinux / Rocky Linux (firewalld; firewall-cmd has no comment option)
sudo firewall-cmd --permanent --add-port=8080/tcp
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.0/8" port port="8793" protocol="tcp" accept'
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="127.0.0.1" port port="8200" protocol="tcp" accept'
sudo firewall-cmd --reload
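
Check the active rule set on whichever firewall your distribution uses:

sudo ufw status verbose          # Ubuntu / Debian
sudo firewall-cmd --list-all     # AlmaLinux / Rocky Linux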

Start secured Airflow services

Restart Airflow services with the new security configuration and verify all components are running correctly.

sudo systemctl restart airflow-webserver
sudo systemctl restart airflow-scheduler
sudo systemctl restart airflow-worker
sudo systemctl status airflow-webserver airflow-scheduler

Verify your setup

Test the security configuration by verifying RBAC permissions, SSL encryption, and secrets management.

# Check Airflow is running with SSL
curl -k https://localhost:8080/health

Verify Vault integration

sudo -u airflow airflow connections list
sudo -u airflow airflow variables list

Test RBAC by logging in with different user roles

Navigate to https://localhost:8080 in your browser

Verify audit logging

sudo tail -f /opt/airflow/logs/dag_processor_manager/dag_processor_manager.log
sudo tail -f /var/log/syslog | grep airflow    # /var/log/messages on AlmaLinux/Rocky

Configure monitoring and alerting

Set up monitoring for security events and failed authentication attempts. This integrates with existing monitoring solutions like Prometheus for comprehensive observability.

import logging
from airflow.models import DagRun, TaskInstance
from airflow.utils.email import send_email
from datetime import datetime, timedelta

# Configure security event logging
security_logger = logging.getLogger('airflow.security')
handler = logging.FileHandler('/opt/airflow/logs/security.log')
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
security_logger.addHandler(handler)
security_logger.setLevel(logging.INFO)


def log_security_event(event_type, user, details):
    """Log security events for audit trail"""
    security_logger.info(f"SECURITY_EVENT: {event_type} | USER: {user} | DETAILS: {details}")


def check_failed_logins():
    """Monitor for excessive failed login attempts"""
    # Implementation depends on your authentication backend
    # This is a basic example for database-backed authentication
    pass


def monitor_dag_modifications():
    """Alert on unauthorized DAG modifications"""
    # Check for DAG file modifications outside of approved processes
    pass
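
Once the security logger is in place you can watch the audit trail directly, for example:

sudo tail -f /opt/airflow/logs/security.log | grep --line-buffered SECURITY_EVENT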

Common issues

Symptom | Cause | Fix
RBAC permissions not working | Custom security manager not loaded | Ensure AIRFLOW__WEBSERVER__RBAC=True and restart the webserver
Vault connection failed | Token expired or wrong mount point | Check the token with: vault token lookup
SSL certificate errors | Self-signed certificate not trusted | Add the certificate to the system trust store or use a proper CA-signed cert
DAG access denied | User role doesn't have required permissions | Check role assignment: airflow users list
Fernet key encryption errors | Key changed after connections were created | Regenerate connections or use the same key across environments
Pool slots exhausted | Too many concurrent tasks in sensitive pool | Increase pool size: airflow pools set sensitive_pool 4
Security Warning: Never store secrets directly in DAG files or Airflow configuration. Always use Vault or another secure secrets backend. Regularly rotate Fernet keys and database passwords to maintain security.
