Set up an Apache Airflow cluster with the Kubernetes Executor for auto-scaling workflows

Advanced · 45 min · Apr 08, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Deploy production-grade Apache Airflow with Kubernetes Executor for dynamic workflow scaling. Configure PostgreSQL backend, RBAC authentication, and auto-scaling policies with Prometheus monitoring integration.

Prerequisites

  • At least 8GB RAM
  • 4 CPU cores
  • 50GB disk space
  • Root or sudo access
  • Basic Kubernetes knowledge

What this solves

Apache Airflow with Kubernetes Executor provides dynamic scaling for data workflows by creating pods on-demand for each task execution. This setup eliminates resource waste from idle workers while handling variable workloads efficiently. You get built-in fault tolerance, resource isolation per task, and seamless integration with existing Kubernetes infrastructure for production data pipeline orchestration.

Step-by-step installation

Update system packages

Start by updating your package manager to ensure you get the latest versions of all dependencies.

# Debian / Ubuntu
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl wget gnupg software-properties-common

# AlmaLinux / Rocky Linux
sudo dnf update -y
sudo dnf install -y curl wget gnupg2 yum-utils

Install Docker and container runtime

Kubernetes requires a container runtime. Install Docker as the container engine for running Airflow worker pods.

# Debian / Ubuntu
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io
sudo systemctl enable --now docker
sudo usermod -aG docker $USER

# AlmaLinux / Rocky Linux
sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf install -y docker-ce docker-ce-cli containerd.io
sudo systemctl enable --now docker
sudo usermod -aG docker $USER
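The usermod change only affects sessions started after it runs. A quick check, using standard tools, of whether your current shell already has the docker group:

```shell
# Check whether the current session's groups include 'docker'.
# 'usermod -aG' does not affect already-running sessions.
if id -nG "$USER" | grep -qw docker; then
  echo "docker group active"
else
  echo "log out and back in, or run: newgrp docker"
fi
```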

Install Kubernetes cluster

Install kubeadm, kubelet, and kubectl for managing the Kubernetes cluster that will host Airflow workers.

# Debian / Ubuntu (on AlmaLinux/Rocky, use the RPM repository from pkgs.k8s.io instead)
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.31/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.31/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt update
sudo apt install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

# kubeadm requires swap to be off and these kernel modules and sysctls
sudo swapoff -a
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system

Initialize Kubernetes cluster

Initialize the Kubernetes control plane and configure the cluster for Airflow deployment.

sudo kubeadm init --pod-network-cidr=10.244.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Install Calico CNI network plugin

Install Calico for pod networking and network policy enforcement in the Kubernetes cluster.

kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.1/manifests/tigera-operator.yaml
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.1/manifests/custom-resources.yaml
# On a single-node cluster, allow workloads to schedule on the control plane
kubectl taint nodes --all node-role.kubernetes.io/control-plane-

Install PostgreSQL for Airflow metadata

Deploy PostgreSQL as the Airflow metadata database backend with persistent storage. Save the manifest below as airflow-postgres.yaml; it targets the airflow namespace, so create that namespace first with kubectl create namespace airflow.

apiVersion: v1
kind: Secret
metadata:
  name: postgres-secret
  namespace: airflow
type: Opaque
data:
  postgres-password: YWlyZmxvd3Bhc3N3b3Jk  # airflowpassword base64 encoded
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
  namespace: airflow
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  namespace: airflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:17
        env:
        - name: POSTGRES_DB
          value: "airflow"
        - name: POSTGRES_USER
          value: "airflow"
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: postgres-password
        # Use a subdirectory so initdb is not tripped up by the volume's lost+found
        - name: PGDATA
          value: "/var/lib/postgresql/data/pgdata"
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: postgres-storage
          mountPath: /var/lib/postgresql/data
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
      volumes:
      - name: postgres-storage
        persistentVolumeClaim:
          claimName: postgres-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: postgres-service
  namespace: airflow
spec:
  selector:
    app: postgres
  ports:
  - port: 5432
    targetPort: 5432
  type: ClusterIP
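The Secret stores the password base64-encoded. If you change the password, regenerate the value yourself; a small sketch (printf avoids the trailing newline that echo would sneak into the encoding):

```shell
# Base64-encode a PostgreSQL password for the Kubernetes Secret.
PASSWORD="airflowpassword"                  # replace with your own
ENCODED=$(printf '%s' "$PASSWORD" | base64)
echo "$ENCODED"                             # -> YWlyZmxvd3Bhc3N3b3Jk for the default
printf '%s' "$ENCODED" | base64 -d          # round-trip check
```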

Create Airflow namespace and RBAC

Create the airflow namespace, then configure role-based access control for the Airflow components. Save the manifest below as airflow-rbac.yaml.

kubectl create namespace airflow

apiVersion: v1
kind: ServiceAccount
metadata:
  name: airflow
  namespace: airflow
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: airflow-cluster-role
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log", "pods/exec"]
  verbs: ["create", "get", "list", "watch", "delete", "patch"]
- apiGroups: [""]
  resources: ["secrets", "configmaps"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: airflow-cluster-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: airflow-cluster-role
subjects:
- kind: ServiceAccount
  name: airflow
  namespace: airflow

Deploy PostgreSQL and RBAC configuration

Apply the PostgreSQL deployment and RBAC configuration to the Kubernetes cluster.

kubectl apply -f airflow-postgres.yaml
kubectl apply -f airflow-rbac.yaml
kubectl wait --for=condition=ready pod -l app=postgres -n airflow --timeout=300s

Create Airflow configuration

Configure Airflow to use the Kubernetes Executor with the PostgreSQL backend and appropriate resource limits. Save the ConfigMap below as airflow-config.yaml.

apiVersion: v1
kind: ConfigMap
metadata:
  name: airflow-config
  namespace: airflow
data:
  airflow.cfg: |
    [core]
    executor = KubernetesExecutor
    load_examples = False
    dags_are_paused_at_creation = False
    parallelism = 64
    max_active_tasks_per_dag = 32
    max_active_runs_per_dag = 16

    [database]
    sql_alchemy_conn = postgresql+psycopg2://airflow:airflowpassword@postgres-service.airflow.svc.cluster.local:5432/airflow

    [webserver]
    base_url = http://localhost:8080
    web_server_port = 8080
    workers = 4
    worker_refresh_batch_size = 1
    worker_refresh_interval = 6000

    [scheduler]
    dag_dir_list_interval = 60
    catchup_by_default = False
    max_tis_per_query = 512

    [kubernetes_executor]
    namespace = airflow
    worker_container_repository = apache/airflow
    worker_container_tag = 2.10.2-python3.11
    delete_worker_pods = True
    delete_worker_pods_on_failure = False
    worker_pods_creation_batch_size = 1
    pod_template_file = /opt/airflow/pod_template.yaml
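The sql_alchemy_conn URI is built from fixed parts. If your database name, host, or credentials differ, assemble it accordingly; a sketch with the values used throughout this guide:

```shell
# Build the Airflow metadata DB URI from its components.
DB_USER="airflow"
DB_PASS="airflowpassword"
DB_HOST="postgres-service.airflow.svc.cluster.local"
DB_PORT="5432"
DB_NAME="airflow"
CONN="postgresql+psycopg2://${DB_USER}:${DB_PASS}@${DB_HOST}:${DB_PORT}/${DB_NAME}"
echo "$CONN"
```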

Create Airflow pod template

Define the pod template for worker pods with resource limits and a security context. Save the ConfigMap below as pod-template.yaml.

apiVersion: v1
kind: ConfigMap
metadata:
  name: airflow-pod-template
  namespace: airflow
data:
  pod_template.yaml: |
    apiVersion: v1
    kind: Pod
    metadata:
      name: airflow-worker-template
      namespace: airflow
    spec:
      serviceAccountName: airflow
      restartPolicy: Never
      containers:
      - name: base
        image: apache/airflow:2.10.2-python3.11
        imagePullPolicy: IfNotPresent
        env:
        - name: AIRFLOW__CORE__EXECUTOR
          value: "LocalExecutor"
        - name: AIRFLOW__DATABASE__SQL_ALCHEMY_CONN
          value: "postgresql+psycopg2://airflow:airflowpassword@postgres-service.airflow.svc.cluster.local:5432/airflow"
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        securityContext:
          runAsUser: 50000
          runAsGroup: 50000

Deploy Airflow webserver

Deploy the Airflow webserver with the Kubernetes Executor configuration. Save the manifest below as airflow-webserver.yaml.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow-webserver
  namespace: airflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: airflow-webserver
  template:
    metadata:
      labels:
        app: airflow-webserver
    spec:
      serviceAccountName: airflow
      initContainers:
      - name: airflow-init
        image: apache/airflow:2.10.2-python3.11
        command: ["airflow", "db", "migrate"]
        env:
        - name: AIRFLOW__DATABASE__SQL_ALCHEMY_CONN
          value: "postgresql+psycopg2://airflow:airflowpassword@postgres-service.airflow.svc.cluster.local:5432/airflow"
        volumeMounts:
        - name: airflow-config
          mountPath: /opt/airflow/airflow.cfg
          subPath: airflow.cfg
      containers:
      - name: airflow-webserver
        image: apache/airflow:2.10.2-python3.11
        command: ["airflow", "webserver"]
        ports:
        - containerPort: 8080
        env:
        - name: AIRFLOW__DATABASE__SQL_ALCHEMY_CONN
          value: "postgresql+psycopg2://airflow:airflowpassword@postgres-service.airflow.svc.cluster.local:5432/airflow"
        volumeMounts:
        - name: airflow-config
          mountPath: /opt/airflow/airflow.cfg
          subPath: airflow.cfg
        - name: pod-template
          mountPath: /opt/airflow/pod_template.yaml
          subPath: pod_template.yaml
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "4Gi"
            cpu: "2000m"
      volumes:
      - name: airflow-config
        configMap:
          name: airflow-config
      - name: pod-template
        configMap:
          name: airflow-pod-template
---
apiVersion: v1
kind: Service
metadata:
  name: airflow-webserver-service
  namespace: airflow
  labels:
    app: airflow-webserver
spec:
  selector:
    app: airflow-webserver
  ports:
  - name: web
    port: 8080
    targetPort: 8080
  type: LoadBalancer

Deploy Airflow scheduler

Deploy the Airflow scheduler, which creates Kubernetes pods for task execution. Save the manifest below as airflow-scheduler.yaml.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow-scheduler
  namespace: airflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: airflow-scheduler
  template:
    metadata:
      labels:
        app: airflow-scheduler
    spec:
      serviceAccountName: airflow
      containers:
      - name: airflow-scheduler
        image: apache/airflow:2.10.2-python3.11
        command: ["airflow", "scheduler"]
        env:
        - name: AIRFLOW__DATABASE__SQL_ALCHEMY_CONN
          value: "postgresql+psycopg2://airflow:airflowpassword@postgres-service.airflow.svc.cluster.local:5432/airflow"
        volumeMounts:
        - name: airflow-config
          mountPath: /opt/airflow/airflow.cfg
          subPath: airflow.cfg
        - name: pod-template
          mountPath: /opt/airflow/pod_template.yaml
          subPath: pod_template.yaml
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "4Gi"
            cpu: "2000m"
      volumes:
      - name: airflow-config
        configMap:
          name: airflow-config
      - name: pod-template
        configMap:
          name: airflow-pod-template

Deploy all Airflow components

Apply all Airflow configurations and wait for the components to be ready.

kubectl apply -f airflow-config.yaml
kubectl apply -f pod-template.yaml
kubectl apply -f airflow-webserver.yaml
kubectl apply -f airflow-scheduler.yaml
kubectl wait --for=condition=ready pod -l app=airflow-webserver -n airflow --timeout=600s
kubectl wait --for=condition=ready pod -l app=airflow-scheduler -n airflow --timeout=600s

Configure horizontal pod autoscaler

Install the metrics server, then set up horizontal pod autoscaling for the Airflow webserver based on CPU and memory usage. Save the HorizontalPodAutoscaler manifest below as airflow-hpa.yaml before applying it.

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: airflow-webserver-hpa
  namespace: airflow
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: airflow-webserver
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
kubectl apply -f airflow-hpa.yaml
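HPA v2 also accepts a behavior stanza to damp flapping replica counts; an optional fragment you could append under spec in airflow-hpa.yaml:

```yaml
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min of low load before scaling in
```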

Install Prometheus for monitoring

Deploy the Prometheus Operator to monitor Airflow metrics and Kubernetes cluster performance, then save the ServiceMonitor below (for example as airflow-servicemonitor.yaml) and apply it. Note that Airflow 2 emits metrics over StatsD rather than serving a Prometheus endpoint itself, so this ServiceMonitor assumes a statsd-exporter (or similar) is exposing metrics at the path and port shown; adjust both to wherever your exporter actually listens.

kubectl create namespace monitoring
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/bundle.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: airflow-metrics
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: airflow-webserver
  endpoints:
  - port: web
    path: /admin/metrics
    interval: 30s
  namespaceSelector:
    matchNames:
    - airflow

Create admin user

Create an admin user for accessing the Airflow web interface.

kubectl exec -it deployment/airflow-webserver -n airflow -- airflow users create \
    --username admin \
    --firstname Admin \
    --lastname User \
    --role Admin \
    --email admin@example.com \
    --password admin123
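Rather than the hard-coded admin123, you can generate a throwaway strong password first; a sketch using /dev/urandom:

```shell
# Generate a 20-character alphanumeric password for the admin user.
ADMIN_PASS=$(tr -dc 'A-Za-z0-9' </dev/urandom | head -c 20)
echo "$ADMIN_PASS"
```

Then pass it to the create command with --password "$ADMIN_PASS" instead of the example value.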

Verify your setup

Check that all Airflow components are running and the web interface is accessible.

kubectl get pods -n airflow
kubectl get services -n airflow
kubectl logs deployment/airflow-scheduler -n airflow --tail=50
kubectl port-forward service/airflow-webserver-service 8080:8080 -n airflow

Open your browser to http://localhost:8080 and log in with username admin and password admin123; change this default password before exposing the service to anyone else. You can find more details on configuring PostgreSQL streaming replication in our PostgreSQL high availability tutorial.

Configure monitoring and alerting

Enable Airflow metrics endpoint

Configure Airflow to emit StatsD metrics for Prometheus scraping. Because airflow.cfg is mounted via subPath, running pods do not pick up ConfigMap changes automatically, so restart the webserver and scheduler deployments after patching.

kubectl patch configmap airflow-config -n airflow --type merge -p '{
  "data": {
    "airflow.cfg": "[core]\nexecutor = KubernetesExecutor\nload_examples = False\ndags_are_paused_at_creation = False\nparallelism = 64\nmax_active_tasks_per_dag = 32\nmax_active_runs_per_dag = 16\n\n[database]\nsql_alchemy_conn = postgresql+psycopg2://airflow:airflowpassword@postgres-service.airflow.svc.cluster.local:5432/airflow\n\n[webserver]\nbase_url = http://localhost:8080\nweb_server_port = 8080\nworkers = 4\nworker_refresh_batch_size = 1\nworker_refresh_interval = 6000\nexpose_config = True\n\n[scheduler]\ndag_dir_list_interval = 60\ncatchup_by_default = False\nmax_tis_per_query = 512\n\n[metrics]\nstatsd_on = True\nstatsd_host = localhost\nstatsd_port = 8125\nstatsd_prefix = airflow\n\n[kubernetes_executor]\nnamespace = airflow\nworker_container_repository = apache/airflow\nworker_container_tag = 2.10.2-python3.11\ndelete_worker_pods = True\ndelete_worker_pods_on_failure = False\nworker_pods_creation_batch_size = 1\npod_template_file = /opt/airflow/pod_template.yaml"
  }
}'
kubectl rollout restart deployment/airflow-webserver deployment/airflow-scheduler -n airflow

For comprehensive monitoring setup, refer to our Airflow Prometheus monitoring tutorial and Prometheus and Grafana monitoring stack guide.

Common issues

| Symptom | Cause | Fix |
|---|---|---|
| Pods stuck in Pending state | Insufficient cluster resources | kubectl describe node to check resources; scale nodes |
| Worker pods fail to start | RBAC permissions missing | Verify the service account and cluster role binding exist |
| Database connection errors | PostgreSQL not ready | kubectl logs deployment/postgres -n airflow to check the database |
| Scheduler not creating worker pods | Pod template configuration error | Check pod template syntax with kubectl apply --dry-run=client |
| High memory usage on workers | Resource limits too high | Adjust memory requests/limits in the pod template |
| Tasks failing with permission errors | Security context restrictions | Review and adjust runAsUser in the pod template |
