Configure Kubernetes cluster autoscaler with mixed instance types for cost optimization

Advanced · 45 min · Apr 26, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up Kubernetes cluster autoscaler 1.30 with mixed instance types and spot instances to automatically scale nodes based on demand while minimizing infrastructure costs through intelligent instance selection and workload optimization.

Prerequisites

  • Kubernetes cluster 1.28+
  • kubectl with admin access
  • Cloud provider with autoscaling groups
  • Prometheus for monitoring (optional)

What this solves

The Kubernetes cluster autoscaler automatically adjusts your cluster size by adding or removing nodes based on pod scheduling demands. By configuring mixed instance types with spot instances, you can reduce infrastructure costs by up to 70% while maintaining application availability. This tutorial shows you how to deploy cluster autoscaler 1.30 with cost optimization policies for production workloads.
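To see where the savings come from, here is a back-of-the-envelope blended-cost sketch in Python; the prices, the 70% spot discount, and the 20% on-demand share are illustrative placeholders, not real provider rates.

```python
def blended_hourly_cost(nodes, on_demand_price, spot_discount, on_demand_fraction):
    """Total hourly cost when a fraction of nodes run on-demand and the
    rest run as spot instances at a discounted rate."""
    on_demand_nodes = nodes * on_demand_fraction
    spot_nodes = nodes - on_demand_nodes
    spot_price = on_demand_price * (1 - spot_discount)
    return on_demand_nodes * on_demand_price + spot_nodes * spot_price

# 10 nodes, $0.10/h on-demand (hypothetical), 70% spot discount, 20% kept on-demand
cost = blended_hourly_cost(10, 0.10, 0.70, 0.20)
all_on_demand = 10 * 0.10
savings = 1 - cost / all_on_demand   # fraction saved versus an all-on-demand fleet
```

Even with a fifth of the fleet on-demand for stability, the blended bill drops by more than half in this example.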

Prerequisites and preparation

Verify cluster requirements

Ensure your Kubernetes cluster is running version 1.28 or later and you have admin access.

kubectl version
kubectl auth can-i create nodes

Install required tools

Install Helm 3 for managing the cluster autoscaler deployment. On Debian-based systems (Ubuntu 24.04, Debian 12), use the apt repository:

curl https://baltocdn.com/helm/signing.asc | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg > /dev/null
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/helm.gpg] https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt update
sudo apt install -y helm

On RPM-based systems (AlmaLinux 9, Rocky Linux 9), use the official install script instead:

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh

Step-by-step configuration

Create autoscaler namespace

Create a dedicated namespace for the cluster autoscaler components.

kubectl create namespace cluster-autoscaler

Configure RBAC permissions

Create the service account and RBAC rules required for the cluster autoscaler to manage nodes.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-autoscaler
  namespace: cluster-autoscaler
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-autoscaler
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
rules:
  - apiGroups: [""]
    resources: ["events", "endpoints"]
    verbs: ["create", "patch"]
  - apiGroups: [""]
    resources: ["pods/eviction"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["pods/status"]
    verbs: ["update"]
  - apiGroups: [""]
    resources: ["endpoints"]
    resourceNames: ["cluster-autoscaler"]
    verbs: ["get", "update"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["watch", "list", "get", "update"]
  - apiGroups: [""]
    resources: ["namespaces", "pods", "services", "replicationcontrollers", "persistentvolumeclaims", "persistentvolumes"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["policy"]
    resources: ["poddisruptionbudgets"]
    verbs: ["watch", "list"]
  - apiGroups: ["apps"]
    resources: ["statefulsets", "replicasets", "daemonsets"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses", "csinodes", "csidrivers", "csistoragecapacities"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch", "patch"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["create"]
  - apiGroups: ["coordination.k8s.io"]
    resourceNames: ["cluster-autoscaler"]
    resources: ["leases"]
    verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cluster-autoscaler
  namespace: cluster-autoscaler
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["create","list","watch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["cluster-autoscaler-status", "cluster-autoscaler-priority-expander"]
    verbs: ["delete", "get", "update", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-autoscaler
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-autoscaler
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: cluster-autoscaler
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cluster-autoscaler
  namespace: cluster-autoscaler
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cluster-autoscaler
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: cluster-autoscaler
Save the manifests as cluster-autoscaler-rbac.yaml, then apply them:

kubectl apply -f cluster-autoscaler-rbac.yaml

Configure mixed instance node groups

Create node groups with different instance types and spot instances for cost optimization. Note that the cluster autoscaler does not create node groups itself: the definitions below describe the autoscaling groups you create at your cloud provider, which the autoscaler then discovers through the k8s.io/cluster-autoscaler/enabled tag. This example defines three node groups: general purpose, compute optimized, and memory optimized.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-nodegroups
  namespace: cluster-autoscaler
data:
  nodegroups.yaml: |
    nodeGroups:
    - name: general-purpose
      minSize: 1
      maxSize: 10
      desiredCapacity: 2
      instanceTypes:
        - t3.medium
        - t3.large
        - t3.xlarge
      spotAllocationStrategy: diversified
      spotInstancePools: 3
      onDemandPercentageAboveBaseCapacity: 20
      labels:
        node.kubernetes.io/instance-type: general
        node.kubernetes.io/capacity-type: mixed
      taints: []
      tags:
        k8s.io/cluster-autoscaler/enabled: "true"
        k8s.io/cluster-autoscaler/node-template/label/workload-type: general
    - name: compute-optimized
      minSize: 0
      maxSize: 5
      desiredCapacity: 0
      instanceTypes:
        - c5.large
        - c5.xlarge
        - c5.2xlarge
      spotAllocationStrategy: capacity-optimized
      spotInstancePools: 2
      onDemandPercentageAboveBaseCapacity: 0
      labels:
        node.kubernetes.io/instance-type: compute
        node.kubernetes.io/capacity-type: spot
      taints:
        - key: workload-type
          value: compute
          effect: NoSchedule
      tags:
        k8s.io/cluster-autoscaler/enabled: "true"
        k8s.io/cluster-autoscaler/node-template/label/workload-type: compute
    - name: memory-optimized
      minSize: 0
      maxSize: 3
      desiredCapacity: 0
      instanceTypes:
        - r5.large
        - r5.xlarge
        - r5.2xlarge
      spotAllocationStrategy: capacity-optimized
      spotInstancePools: 2
      onDemandPercentageAboveBaseCapacity: 30
      labels:
        node.kubernetes.io/instance-type: memory
        node.kubernetes.io/capacity-type: mixed
      taints:
        - key: workload-type
          value: memory
          effect: NoSchedule
      tags:
        k8s.io/cluster-autoscaler/enabled: "true"
        k8s.io/cluster-autoscaler/node-template/label/workload-type: memory
Save the manifest as mixed-nodegroups.yaml, then apply it:

kubectl apply -f mixed-nodegroups.yaml
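As a rough illustration of how onDemandPercentageAboveBaseCapacity splits capacity: the base capacity is always on-demand, and the given percentage of everything above it is on-demand too. The rounding direction below is an assumption for illustration; check your provider's documented behavior.

```python
import math

def capacity_split(total, base_on_demand, pct_above_base):
    """Split a desired capacity into (on_demand, spot) counts following the
    AWS mixed-instances policy semantics: the first `base_on_demand`
    instances are on-demand, then `pct_above_base` percent of the remainder
    (rounded up here) is also on-demand; the rest runs as spot."""
    above = max(total - base_on_demand, 0)
    od_above = math.ceil(above * pct_above_base / 100)
    on_demand = min(base_on_demand, total) + od_above
    return on_demand, total - on_demand
```

For the general-purpose group above (no base capacity, 20% above base), 10 desired nodes split into 2 on-demand and 8 spot; the compute-optimized group at 0% runs entirely on spot.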

Deploy cluster autoscaler with cost optimization

Deploy the cluster autoscaler 1.30 with configurations optimized for mixed instance types and cost savings.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: cluster-autoscaler
  labels:
    app: cluster-autoscaler
spec:
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '8085'
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0
        name: cluster-autoscaler
        resources:
          limits:
            cpu: 100m
            memory: 600Mi
          requests:
            cpu: 100m
            memory: 600Mi
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=priority,least-waste
        - --namespace=cluster-autoscaler
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/$(CLUSTER_NAME)
        - --balance-similar-node-groups
        - --skip-nodes-with-system-pods=false
        - --scale-down-enabled=true
        - --scale-down-delay-after-add=10m
        - --scale-down-delay-after-delete=10s
        - --scale-down-delay-after-failure=3m
        - --scale-down-unneeded-time=10m
        - --scale-down-utilization-threshold=0.5
        - --max-node-provision-time=15m
        - --max-empty-bulk-delete=10
        - --max-nodes-total=100
        - --cores-total=0:1000
        - --memory-total=0:1000
        - --new-pod-scale-up-delay=0s
        - --max-bulk-soft-taint-count=10
        - --max-bulk-soft-taint-time=3s
        env:
        - name: CLUSTER_NAME
          value: YOUR_CLUSTER_NAME
        - name: AWS_REGION
          value: us-west-2
        volumeMounts:
        - name: ssl-certs
          mountPath: /etc/ssl/certs/ca-certificates.crt
          readOnly: true
        imagePullPolicy: Always
      volumes:
      - name: ssl-certs
        hostPath:
          path: /etc/ssl/certs/ca-certificates.crt
      nodeSelector:
        kubernetes.io/os: linux
Save the manifest as cluster-autoscaler-deployment.yaml, then apply it:

kubectl apply -f cluster-autoscaler-deployment.yaml
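The scale-down flags above combine roughly like this simplified check. The real autoscaler also considers pod evictability, PDBs, affinity, and local storage, so treat this only as a mental model of --scale-down-utilization-threshold and --scale-down-unneeded-time.

```python
def is_scale_down_candidate(cpu_requested, cpu_allocatable,
                            mem_requested, mem_allocatable,
                            unneeded_minutes,
                            threshold=0.5, unneeded_time=10):
    """A node qualifies for scale-down when both CPU and memory request
    utilization stay below the threshold for at least the unneeded time."""
    utilization = max(cpu_requested / cpu_allocatable,
                      mem_requested / mem_allocatable)
    return utilization < threshold and unneeded_minutes >= unneeded_time
```

A node with 400m of 2000m CPU and 1Gi of 4Gi memory requested (25% utilization) becomes a candidate after 10 minutes; a node at 75% CPU never does.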

Configure cost optimization policies

Create priority-based scaling policies that prefer cost-effective instances and implement pod disruption budgets.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: cluster-autoscaler
data:
  priorities: |
    10:
      - .*spot.*
    5:
      - .*t3.*
    1:
      - .*c5.*
      - .*r5.*
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cluster-autoscaler-pdb
  namespace: cluster-autoscaler
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
---
# Note: do not pre-create the cluster-autoscaler-status ConfigMap. The
# autoscaler creates and manages that ConfigMap itself for status reporting.
# The values below mirror the tuning flags set on the deployment and are
# stored under a separate name purely as a documented reference.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-tuning-reference
  namespace: cluster-autoscaler
data:
  config: |
    {
      "scale_down_gpu_utilization_threshold": 0.5,
      "scale_down_utilization_threshold": 0.5,
      "skip_nodes_with_local_storage": false,
      "skip_nodes_with_system_pods": false,
      "max_node_provision_time": "15m",
      "node_startup_timeout": "15m",
      "node_deletion_delay_timeout": "2m",
      "scan_interval": "10s",
      "expendable_pods_priority_cutoff": -10,
      "ignore_daemon_sets_utilization": false,
      "max_bulk_soft_taint_count": 10,
      "max_bulk_soft_taint_time": "3s"
    }
Save the manifests as cost-optimization-config.yaml, then apply them:

kubectl apply -f cost-optimization-config.yaml
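The priority expander picks among candidate node groups by regex tier, highest number first. A small sketch of that selection logic (the node-group names here are invented for illustration):

```python
import re

def pick_node_groups(candidates, priorities):
    """Mimic the priority expander: among candidate node-group names, return
    those matching a regex in the highest-priority tier with any match."""
    for priority in sorted(priorities, reverse=True):
        matched = [name for name in candidates
                   if any(re.search(p, name) for p in priorities[priority])]
        if matched:
            return matched
    return candidates  # no tier matched; fall through to the next expander

priorities = {10: [".*spot.*"], 5: [".*t3.*"], 1: [".*c5.*", ".*r5.*"]}
groups = ["general-t3-ondemand", "compute-c5-spot", "memory-r5-ondemand"]
```

Here the spot-backed group wins outright because tier 10 matches it; the t3 group is only preferred when no spot-capable group can satisfy the pending pods.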

Set up monitoring and alerting

Configure monitoring for the cluster autoscaler with Prometheus metrics and cost tracking. This integrates with existing Prometheus monitoring setups.

apiVersion: v1
kind: Service
metadata:
  name: cluster-autoscaler-metrics
  namespace: cluster-autoscaler
  labels:
    app: cluster-autoscaler
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '8085'
    prometheus.io/path: '/metrics'
spec:
  ports:
  - port: 8085
    targetPort: 8085
    protocol: TCP
    name: http-metrics
  selector:
    app: cluster-autoscaler
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cluster-autoscaler
  namespace: cluster-autoscaler
  labels:
    app: cluster-autoscaler
spec:
  selector:
    matchLabels:
      app: cluster-autoscaler
  endpoints:
  - port: http-metrics
    interval: 30s
    path: /metrics
Save the manifests as autoscaler-monitoring.yaml, then apply them:

kubectl apply -f autoscaler-monitoring.yaml

Configure workload-specific scheduling

Create node selectors and tolerations for different workload types to optimize resource allocation and costs.

apiVersion: v1
kind: ConfigMap
metadata:
  name: workload-scheduling-config
  namespace: default
data:
  general-workload.yaml: |
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: general-app
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: general-app
      template:
        metadata:
          labels:
            app: general-app
        spec:
          nodeSelector:
            workload-type: general
            node.kubernetes.io/capacity-type: mixed
          tolerations:
          - key: "node.kubernetes.io/capacity-type"
            operator: "Equal"
            value: "spot"
            effect: "NoSchedule"
          containers:
          - name: app
            image: nginx:latest
            resources:
              requests:
                cpu: 100m
                memory: 128Mi
              limits:
                cpu: 200m
                memory: 256Mi
  compute-workload.yaml: |
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: compute-intensive-app
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: compute-app
      template:
        metadata:
          labels:
            app: compute-app
        spec:
          nodeSelector:
            workload-type: compute
          tolerations:
          - key: "workload-type"
            operator: "Equal"
            value: "compute"
            effect: "NoSchedule"
          - key: "node.kubernetes.io/capacity-type"
            operator: "Equal"
            value: "spot"
            effect: "NoSchedule"
          containers:
          - name: app
            image: nginx:latest
            resources:
              requests:
                cpu: 500m
                memory: 512Mi
              limits:
                cpu: 1000m
                memory: 1Gi
Save the manifest as workload-scheduling.yaml, then apply it:

kubectl apply -f workload-scheduling.yaml
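Scheduling onto the tainted compute and spot nodes works because the pods' tolerations cover every taint on the node. A simplified version of that scheduler check (Equal-operator, NoSchedule taints only; the real scheduler also handles Exists, NoExecute, and effects):

```python
def tolerates(taints, tolerations):
    """Return True when every taint on the node is matched by some
    toleration on the pod (key and value must both match)."""
    for taint in taints:
        if not any(t["key"] == taint["key"] and t.get("value") == taint.get("value")
                   for t in tolerations):
            return False
    return True

compute_taints = [{"key": "workload-type", "value": "compute"}]
pod_tolerations = [{"key": "workload-type", "value": "compute"},
                   {"key": "node.kubernetes.io/capacity-type", "value": "spot"}]
```

The compute-intensive deployment above carries both tolerations, so it lands on the tainted compute nodes; a pod with no tolerations stays off them.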

Verify your setup

Check that the cluster autoscaler is running and monitoring your node groups correctly.

# Check autoscaler pod status
kubectl get pods -n cluster-autoscaler
kubectl logs -n cluster-autoscaler deployment/cluster-autoscaler

Verify node groups are discovered

kubectl get nodes --show-labels | grep capacity-type

Check autoscaler events

kubectl get events -n cluster-autoscaler --sort-by=.metadata.creationTimestamp

Monitor scaling metrics

kubectl top nodes
kubectl describe configmap cluster-autoscaler-status -n cluster-autoscaler

Pro tip: scale-up decisions happen within the 10-second scan interval, but scale-down honors the configured delays, so removing unneeded nodes takes 10-15 minutes. Be patient when testing scaling behavior.

Configure advanced cost optimization

Implement predictive scaling

The upstream cluster autoscaler does not implement predictive scaling natively. The ConfigMap below documents a scaling policy, including schedule-based node bounds, that a custom controller or scheduled jobs can act on alongside vertical pod autoscaler integration.

apiVersion: v1
kind: ConfigMap
metadata:
  name: predictive-scaling-config
  namespace: cluster-autoscaler
data:
  config.yaml: |
    predictiveScaling:
      enabled: true
      lookbackWindow: 7d
      predictionWindow: 1h
      scalingBuffer: 0.1
      costOptimization:
        spotInstancePreference: 0.7
        instanceTypeMixing: true
        preemptionTolerance: 0.3
      scheduleBasedScaling:
        - name: business-hours
          schedule: "0 8 * * 1-5"
          minNodes: 5
          maxNodes: 20
        - name: off-hours
          schedule: "0 18 * * 1-5"
          minNodes: 2
          maxNodes: 10
        - name: weekend
          schedule: "0 0 * * 0,6"
          minNodes: 1
          maxNodes: 5
Save the manifest as predictive-scaling.yaml, then apply it:

kubectl apply -f predictive-scaling.yaml
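The schedule windows above translate into per-time-of-day node bounds. A minimal evaluator a custom controller might use, assuming the cron day field uses 0=Sunday and business hours run 08:00-17:59 on weekdays:

```python
def scaling_window(hour, weekday):
    """Return the min/max node bounds active at the given hour (0-23) and
    weekday (0=Sunday ... 6=Saturday), matching the schedule above."""
    if weekday in (0, 6):                 # weekend window
        return {"min": 1, "max": 5}
    if 8 <= hour < 18:                    # business hours, Mon-Fri
        return {"min": 5, "max": 20}
    return {"min": 2, "max": 10}          # off hours, Mon-Fri
```

A controller would feed these bounds to the node groups (for example by updating ASG min/max sizes) so the autoscaler operates inside the active window.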

Set up cost alerts and budgets

Configure alerts for cost thresholds and automated responses to prevent budget overruns.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cost-monitoring-alerts
  namespace: cluster-autoscaler
data:
  alerts.yaml: |
    groups:
    - name: cluster-autoscaler-cost
      rules:
      - alert: HighClusterCost
        expr: increase(cluster_autoscaler_nodes_count[1h]) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Cluster scaling rapidly - check for cost impact"
          description: "Node count increased by {{ $value }} in the last hour"
      - alert: SpotInstanceFailureRate
        expr: rate(cluster_autoscaler_failed_scale_ups_total[5m]) > 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High spot instance failure rate"
          description: "Spot instance provisioning failing at {{ $value }} rate"
      - alert: NodeUtilizationLow
        expr: avg(cluster_autoscaler_node_utilization) < 0.3
        for: 15m
        labels:
          severity: info
        annotations:
          summary: "Low cluster utilization - consider scaling down"
          description: "Average node utilization is {{ $value }}"
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cost-optimization-report
  namespace: cluster-autoscaler
spec:
  schedule: "0 9 * * 1"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cluster-autoscaler
          containers:
          - name: cost-reporter
            # alpine/curl does not ship kubectl; use an image that does.
            # kubectl top nodes additionally requires metrics-server.
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - |
              echo "Weekly Cost Optimization Report"
              kubectl top nodes
              kubectl get nodes -l node.kubernetes.io/capacity-type=spot --no-headers | wc -l
              kubectl get nodes -l node.kubernetes.io/capacity-type=on-demand --no-headers | wc -l
          restartPolicy: OnFailure
Save the manifests as cost-alerts.yaml, then apply them:

kubectl apply -f cost-alerts.yaml
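The HighClusterCost rule fires when the node count grows by more than 10 within an hour. Evaluated offline on sample data, the underlying check is simply:

```python
def fires(node_count_samples, threshold=10):
    """Offline equivalent of increase(cluster_autoscaler_nodes_count[1h]) > threshold:
    does the node count grow by more than `threshold` across the window?"""
    return node_count_samples[-1] - node_count_samples[0] > threshold

hourly_node_counts = [12, 14, 19, 25]   # one hour of 15-minute samples
```

Growing from 12 to 25 nodes in an hour (an increase of 13) trips the alert; a drift of one or two nodes does not.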

Test scaling behavior

Deploy a test workload to verify the autoscaler correctly provisions mixed instance types based on demand.

# Create a test deployment that will trigger scaling
kubectl create deployment test-scale --image=nginx:latest
kubectl scale deployment test-scale --replicas=20

Add resource requests to trigger node scaling

kubectl patch deployment test-scale -p '{"spec":{"template":{"spec":{"containers":[{"name":"nginx","resources":{"requests":{"cpu":"500m","memory":"1Gi"}}}]}}}}'
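Before watching the cluster, you can estimate how many extra nodes this workload should force. A back-of-the-envelope count for 20 replicas at 500m CPU / 1Gi each on t3.medium-sized nodes (2 vCPU, 4 GiB; real allocatable is lower after system reservations, so expect the autoscaler to add one or two more):

```python
import math

def nodes_needed(replicas, cpu_request, mem_request, node_cpu, node_mem):
    """Node count needed to fit all replicas, taking the tighter of the
    CPU and memory constraints; ignores daemonset overhead."""
    by_cpu = math.ceil(replicas * cpu_request / node_cpu)
    by_mem = math.ceil(replicas * mem_request / node_mem)
    return max(by_cpu, by_mem)

# 20 replicas x (0.5 CPU, 1 GiB) on 2-vCPU / 4-GiB nodes
needed = nodes_needed(20, 0.5, 1, 2, 4)
```

Roughly five such nodes are required, so on a small cluster the patch above should trigger visible scale-up within a minute or two.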

Watch the scaling events

watch kubectl get nodes
watch kubectl get pods -o wide

Check autoscaler decisions

kubectl logs -n cluster-autoscaler deployment/cluster-autoscaler --tail=50

Clean up test deployment

kubectl delete deployment test-scale

Common issues

Symptom: Pods stuck in Pending state
Cause: Node group max size reached or instance type unavailable
Fix: Check kubectl describe pod, then raise maxSize or add more instance types

Symptom: Autoscaler not scaling down
Cause: Utilization threshold set too strictly, or system pods blocking eviction
Fix: Raise --scale-down-utilization-threshold or set --skip-nodes-with-system-pods=false

Symptom: High costs despite spot instances
Cause: Falling back to on-demand because spot capacity is unavailable
Fix: Increase spotInstancePools and diversify instance types across availability zones

Symptom: Frequent node replacements
Cause: Spot instance interruptions with poor handling
Fix: Configure pod disruption budgets and increase onDemandPercentageAboveBaseCapacity

Symptom: Slow scaling response
Cause: Conservative delay settings
Fix: Lower --new-pod-scale-up-delay and --scale-down-delay-after-add (the scan interval already defaults to 10s)

Symptom: Mixed instance selection not working
Cause: Priority expander not configured correctly
Fix: Verify the cluster-autoscaler-priority-expander ConfigMap and the --expander order

Monitor and optimize costs

Set up comprehensive monitoring to track cost savings and scaling efficiency.

# Monitor cost metrics
kubectl get nodes -l node.kubernetes.io/capacity-type=spot -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.node\.kubernetes\.io/instance-type}{"\n"}{end}'

Check node utilization

kubectl top nodes --sort-by=cpu

Monitor autoscaler metrics via Prometheus (port-forward the metrics port first)

kubectl port-forward -n cluster-autoscaler deployment/cluster-autoscaler 8085:8085 &
curl -s http://localhost:8085/metrics | grep cluster_autoscaler

Generate cost report

kubectl get nodes -o custom-columns=NAME:.metadata.name,INSTANCE-TYPE:.metadata.labels.node\.kubernetes\.io/instance-type,CAPACITY-TYPE:.metadata.labels.node\.kubernetes\.io/capacity-type,ZONE:.metadata.labels.topology\.kubernetes\.io/zone
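To turn that node inventory into a cost figure, multiply each node's hourly rate by the hours in a month. A sketch with hypothetical prices; substitute your provider's actual rates.

```python
# Hypothetical hourly prices keyed by (instance_type, capacity_type);
# these are illustrative numbers, not real AWS rates.
PRICES = {("t3.medium", "on-demand"): 0.0416,
          ("t3.medium", "spot"): 0.0125,
          ("c5.large", "spot"): 0.0320}

def monthly_cost(fleet, hours=730):
    """Estimated monthly bill for a list of (instance_type, capacity_type)
    pairs, assuming every node runs the full month."""
    return sum(PRICES[node] for node in fleet) * hours

fleet = [("t3.medium", "on-demand"),
         ("t3.medium", "spot"), ("t3.medium", "spot"),
         ("c5.large", "spot")]
```

Feeding the custom-columns output above into a table like this gives a quick weekly savings estimate without a full cost-management tool.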

This setup integrates well with custom metrics autoscaling for application-specific scaling and resource quotas for better resource management.

Next steps

Running this in production?

Want this handled for you? Running cluster autoscaling at scale adds complexity: capacity planning, cost monitoring, failover testing, and 24/7 incident response when scaling decisions go wrong. See how we run infrastructure like this for European teams with automatic cost optimization and expert monitoring.
