Configure Spark on Kubernetes with cluster autoscaling for dynamic workloads

Advanced 45 min Jun 05, 2026 191 views
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Deploy Apache Spark 3.5 on Kubernetes with automatic cluster scaling, dynamic resource allocation, and comprehensive monitoring for production data processing workloads.

Prerequisites

  • Kubernetes cluster v1.26+
  • kubectl with admin access
  • Helm 3.8+
  • 8GB RAM per node minimum

What this solves

This tutorial helps you deploy Apache Spark 3.5 on Kubernetes with automatic scaling capabilities for data processing workloads. You'll configure cluster autoscaling to handle varying computational demands, implement dynamic resource allocation for Spark executors, and set up comprehensive monitoring with Prometheus and Grafana.

Prerequisites

  • Kubernetes cluster (v1.26+) with at least 3 worker nodes
  • kubectl configured with cluster admin access
  • Helm 3.8+ installed
  • At least 8GB RAM and 4 CPU cores per node

Step-by-step installation

Update system packages

Start by updating your system packages to ensure you have the latest versions.

sudo apt update && sudo apt upgrade -y
sudo apt install -y curl wget unzip
sudo dnf update -y
sudo dnf install -y curl wget unzip

Install Kubernetes cluster components

Install kubectl and verify your cluster connection before proceeding with Spark deployment.

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client

Install Helm package manager

Helm will manage the Spark Operator and other Kubernetes components required for our setup.

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

Create dedicated namespace

Create a dedicated namespace for Spark workloads to isolate resources and apply security policies.

kubectl create namespace spark-system
kubectl create namespace spark-jobs
kubectl create namespace monitoring

Configure cluster autoscaler

Deploy the Kubernetes cluster autoscaler to automatically scale nodes based on resource demands.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  selector:
    matchLabels:
      app: cluster-autoscaler
  replicas: 1
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      priorityClassName: system-cluster-critical
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
        fsGroup: 65534
      serviceAccountName: cluster-autoscaler
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0
        name: cluster-autoscaler
        resources:
          limits:
            cpu: 100m
            memory: 600Mi
          requests:
            cpu: 100m
            memory: 600Mi
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/spark-cluster
        - --balance-similar-node-groups
        - --skip-nodes-with-system-pods=false
        - --scale-down-enabled=true
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        env:
        - name: AWS_REGION
          value: us-west-2
        volumeMounts:
        - name: ssl-certs
          mountPath: /etc/ssl/certs/ca-certificates.crt
          readOnly: true
        imagePullPolicy: "Always"
      volumes:
      - name: ssl-certs
        hostPath:
          path: "/etc/ssl/certs/ca-bundle.crt"
      nodeSelector:
        kubernetes.io/os: linux

Create cluster autoscaler service account

Configure RBAC permissions for the cluster autoscaler to manage node groups.

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
  name: cluster-autoscaler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-autoscaler
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
rules:
  • apiGroups: [""]
resources: ["events", "endpoints"] verbs: ["create", "patch"]
  • apiGroups: [""]
resources: ["pods/eviction"] verbs: ["create"]
  • apiGroups: [""]
resources: ["pods/status"] verbs: ["update"]
  • apiGroups: [""]
resources: ["endpoints"] resourceNames: ["cluster-autoscaler"] verbs: ["get", "update"]
  • apiGroups: [""]
resources: ["nodes"] verbs: ["watch", "list", "get", "update"]
  • apiGroups: [""]
resources: ["pods", "services", "replicationcontrollers", "persistentvolumeclaims", "persistentvolumes"] verbs: ["watch", "list", "get"]
  • apiGroups: ["extensions"]
resources: ["replicasets", "daemonsets"] verbs: ["watch", "list", "get"]
  • apiGroups: ["policy"]
resources: ["poddisruptionbudgets"] verbs: ["watch", "list"]
  • apiGroups: ["apps"]
resources: ["statefulsets", "replicasets", "daemonsets"] verbs: ["watch", "list", "get"]
  • apiGroups: ["storage.k8s.io"]
resources: ["storageclasses", "csinodes"] verbs: ["watch", "list", "get"]
  • apiGroups: ["batch", "extensions"]
resources: ["jobs"] verbs: ["get", "list", "watch", "patch"]
  • apiGroups: ["coordination.k8s.io"]
resources: ["leases"] verbs: ["create"]
  • apiGroups: ["coordination.k8s.io"]
resourceNames: ["cluster-autoscaler"] resources: ["leases"] verbs: ["get", "update"] --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: cluster-autoscaler labels: k8s-addon: cluster-autoscaler.addons.k8s.io k8s-app: cluster-autoscaler roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: cluster-autoscaler subjects:
  • kind: ServiceAccount
name: cluster-autoscaler namespace: kube-system

Deploy cluster autoscaler

Apply the cluster autoscaler configuration to enable automatic node scaling.

kubectl apply -f /tmp/cluster-autoscaler-rbac.yaml
kubectl apply -f /tmp/cluster-autoscaler.yaml
kubectl -n kube-system logs -l app=cluster-autoscaler

Install Spark Operator

Deploy the Spark Operator using Helm to manage Spark applications on Kubernetes.

helm install spark-operator spark-operator/spark-operator \
  --namespace spark-system \
  --set webhook.enable=true \
  --set resources.limits.cpu=200m \
  --set resources.limits.memory=512Mi \
  --set sparkJobNamespace=spark-jobs \
  --set enableBatchScheduler=true \
  --set volcano.enabled=false \
  --wait

Configure Spark service account

Create service accounts and RBAC permissions for Spark driver and executor pods.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-driver
  namespace: spark-jobs
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: spark-jobs
  name: spark-role
rules:
  • apiGroups: [""]
resources: ["pods"] verbs: ["*"]
  • apiGroups: [""]
resources: ["services"] verbs: ["*"]
  • apiGroups: [""]
resources: ["configmaps"] verbs: ["*"]
  • apiGroups: [""]
resources: ["persistentvolumeclaims"] verbs: ["*"] --- apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: name: spark-role-binding namespace: spark-jobs subjects:
  • kind: ServiceAccount
name: spark-driver namespace: spark-jobs roleRef: kind: Role name: spark-role apiGroup: rbac.authorization.k8s.io
kubectl apply -f /tmp/spark-rbac.yaml

Create Spark configuration

Configure Spark with dynamic allocation and Kubernetes-specific settings for optimal performance.

apiVersion: v1
kind: ConfigMap
metadata:
  name: spark-config
  namespace: spark-jobs
data:
  spark-defaults.conf: |
    # Dynamic allocation settings
    spark.dynamicAllocation.enabled=true
    spark.dynamicAllocation.shuffleTracking.enabled=true
    spark.dynamicAllocation.minExecutors=1
    spark.dynamicAllocation.maxExecutors=50
    spark.dynamicAllocation.initialExecutors=2
    spark.dynamicAllocation.executorIdleTimeout=30s
    spark.dynamicAllocation.cachedExecutorIdleTimeout=60s
    spark.dynamicAllocation.schedulerBacklogTimeout=1s
    
    # Kubernetes settings
    spark.kubernetes.executor.deleteOnTermination=true
    spark.kubernetes.driver.deleteOnTermination=true
    spark.kubernetes.executor.podNamePrefix=spark-exec
    
    # Performance tuning
    spark.sql.adaptive.enabled=true
    spark.sql.adaptive.coalescePartitions.enabled=true
    spark.sql.adaptive.skewJoin.enabled=true
    spark.serializer=org.apache.spark.serializer.KryoSerializer
    
    # Resource management
    spark.kubernetes.executor.request.cores=0.5
    spark.kubernetes.executor.limit.cores=2
    spark.executor.memory=2g
    spark.driver.memory=1g
    
    # Monitoring
    spark.ui.prometheus.enabled=true
    spark.sql.streaming.metricsEnabled=true
    spark.eventLog.enabled=true
    spark.eventLog.dir=/tmp/spark-events
kubectl apply -f /tmp/spark-config.yaml

Deploy Horizontal Pod Autoscaler

Configure HPA to automatically scale Spark application pods based on CPU and memory usage.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: spark-driver-hpa
  namespace: spark-jobs
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spark-driver
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 2
        periodSeconds: 60
kubectl apply -f /tmp/spark-hpa.yaml

Install Prometheus for monitoring

Deploy Prometheus to collect metrics from Spark applications and Kubernetes cluster.

helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set grafana.enabled=true \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
  --set grafana.persistence.enabled=true \
  --set grafana.persistence.size=10Gi \
  --set alertmanager.persistence.enabled=true \
  --wait

Configure Spark metrics collection

Set up Prometheus ServiceMonitor to collect metrics from Spark applications.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: spark-metrics
  namespace: monitoring
  labels:
    app: spark
    release: prometheus
spec:
  namespaceSelector:
    matchNames:
    - spark-jobs
  selector:
    matchLabels:
      spark-role: driver
  endpoints:
  - port: metrics
    path: /metrics
    interval: 30s
    scrapeTimeout: 10s
---
apiVersion: v1
kind: Service
metadata:
  name: spark-driver-metrics
  namespace: spark-jobs
  labels:
    spark-role: driver
spec:
  ports:
  - name: metrics
    port: 4040
    targetPort: 4040
  selector:
    spark-role: driver
kubectl apply -f /tmp/spark-servicemonitor.yaml

Create sample Spark application

Deploy a test Spark application to verify the dynamic scaling configuration.

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark-jobs
spec:
  type: Scala
  mode: cluster
  image: apache/spark:3.5.0
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar
  arguments:
  - "1000"
  sparkVersion: 3.5.0
  restartPolicy:
    type: Never
  driver:
    serviceAccount: spark-driver
    cores: 1
    coreLimit: 1200m
    memory: 1g
    labels:
      spark-role: driver
      version: 3.5.0
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "4040"
      prometheus.io/path: "/metrics"
  executor:
    cores: 1
    instances: 2
    memory: 1g
    labels:
      spark-role: executor
      version: 3.5.0
  dynamicAllocation:
    enabled: true
    initialExecutors: 2
    minExecutors: 1
    maxExecutors: 10
    targetExecutors: 5
  monitoring:
    exposeDriverMetrics: true
    exposeExecutorMetrics: true
    prometheus:
      jmxExporterJar: "/opt/spark/jars/jmx_prometheus_javaagent-0.17.2.jar"
      port: 8090
kubectl apply -f /tmp/spark-pi-app.yaml

Configure Grafana dashboards

Import pre-built Grafana dashboards for Spark and Kubernetes monitoring.

# Get Grafana admin password
kubectl get secret -n monitoring prometheus-grafana -o jsonpath="{.data.admin-password}" | base64 --decode
echo

Port forward to access Grafana

kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80 & echo "Grafana accessible at http://localhost:3000"

Configure dynamic allocation policies

Create resource quotas

Set namespace-level resource quotas to prevent resource exhaustion during scaling events.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: spark-quota
  namespace: spark-jobs
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    limits.cpu: "200"
    limits.memory: 400Gi
    persistentvolumeclaims: "10"
    pods: "100"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: spark-limits
  namespace: spark-jobs
spec:
  limits:
  - default:
      cpu: 2
      memory: 4Gi
    defaultRequest:
      cpu: 100m
      memory: 512Mi
    type: Container
  - max:
      cpu: 8
      memory: 16Gi
    min:
      cpu: 100m
      memory: 128Mi
    type: Container
kubectl apply -f /tmp/resource-quota.yaml

Configure pod disruption budgets

Set pod disruption budgets to maintain application availability during node scaling events.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: spark-driver-pdb
  namespace: spark-jobs
spec:
  minAvailable: 1
  selector:
    matchLabels:
      spark-role: driver
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: spark-executor-pdb
  namespace: spark-jobs
spec:
  maxUnavailable: 50%
  selector:
    matchLabels:
      spark-role: executor
kubectl apply -f /tmp/pod-disruption-budget.yaml

Production optimization and security

Enable network policies

Implement network segmentation for enhanced security between Spark components.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: spark-network-policy
  namespace: spark-jobs
spec:
  podSelector:
    matchLabels:
      app: spark
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: spark-jobs
    - namespaceSelector:
        matchLabels:
          name: monitoring
    ports:
    - protocol: TCP
      port: 4040
    - protocol: TCP
      port: 7077
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          name: spark-jobs
  - to: []
    ports:
    - protocol: TCP
      port: 53
    - protocol: UDP
      port: 53
  - to: []
    ports:
    - protocol: TCP
      port: 443
    - protocol: TCP
      port: 80
kubectl apply -f /tmp/network-policy.yaml

Configure security contexts

Apply security contexts and pod security standards to harden Spark deployments.

apiVersion: v1
kind: Namespace
metadata:
  name: spark-jobs
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
apiVersion: v1
kind: SecurityContextConstraints
metadata:
  name: spark-scc
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowPrivilegedContainer: false
allowedCapabilities: []
defaultAddCapabilities: []
requiredDropCapabilities:
  • ALL
fsGroup: type: RunAsAny runAsUser: type: MustRunAsNonRoot seLinuxContext: type: MustRunAs supplementalGroups: type: RunAsAny volumes:
  • configMap
  • downwardAPI
  • emptyDir
  • persistentVolumeClaim
  • projected
  • secret
kubectl apply -f /tmp/pod-security-policy.yaml

Verify your setup

Check that all components are running correctly and scaling policies are active.

# Verify cluster autoscaler
kubectl -n kube-system get pods -l app=cluster-autoscaler
kubectl -n kube-system logs -l app=cluster-autoscaler --tail=50

Check Spark Operator

kubectl -n spark-system get pods kubectl -n spark-system logs -l app.kubernetes.io/name=spark-operator

Verify Spark application

kubectl -n spark-jobs get sparkapplications kubectl -n spark-jobs describe sparkapplication spark-pi

Check HPA status

kubectl -n spark-jobs get hpa kubectl top nodes kubectl top pods -n spark-jobs

Verify monitoring

kubectl -n monitoring get pods curl -k http://localhost:3000

Common issues

SymptomCauseFix
Spark applications stuck in pending Insufficient cluster resources Check node capacity: kubectl describe nodes
Executors not scaling dynamically Dynamic allocation disabled Verify config: kubectl -n spark-jobs get configmap spark-config -o yaml
Cluster autoscaler not adding nodes Missing node group tags Check AWS ASG tags include cluster-autoscaler annotations
Metrics not appearing in Grafana ServiceMonitor misconfigured Check targets in Prometheus UI: kubectl port-forward -n monitoring svc/prometheus-operated 9090
Pod disruption during scaling PodDisruptionBudget too restrictive Review PDB settings: kubectl -n spark-jobs get pdb

Next steps

Running this in production?

Want this handled for you? Running Spark at scale adds a second layer of work: capacity planning, failover drills, cost control, and on-call. See how we run infrastructure like this for European teams.

Automated install script

Run this to automate the entire setup

Need help?

Don't want to manage this yourself?

We handle managed devops services for businesses that depend on uptime. From initial setup to ongoing operations.