Set up Kubernetes custom metrics autoscaling with Prometheus adapter for application-specific scaling

Advanced · 45 min · Apr 25, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Configure Prometheus adapter to expose custom application metrics to Kubernetes Horizontal Pod Autoscaler for intelligent scaling based on business metrics like queue depth, response time, and user load instead of basic CPU/memory usage.

What this solves

Kubernetes Horizontal Pod Autoscaler (HPA) by default only scales based on CPU and memory usage, which often doesn't reflect your application's actual load patterns. This tutorial shows you how to configure Prometheus adapter to expose custom application metrics to HPA, enabling intelligent scaling based on business-relevant metrics like queue depth, active connections, or request latency.

Prerequisites

  • A running Kubernetes cluster with kubectl access
  • Helm 3 installed on your system
  • Prometheus server deployed in your cluster
  • Applications already exposing metrics to Prometheus

Step-by-step configuration

Install the Prometheus adapter with Helm

The Prometheus adapter translates Prometheus metrics into the Kubernetes custom metrics API format that HPA can consume.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Create the Prometheus adapter configuration

This configuration defines which Prometheus metrics to expose and how to format them for HPA consumption. Save it as prometheus-adapter-values.yaml:

prometheus:
  url: http://prometheus-server.monitoring.svc.cluster.local
  port: 80

rules:
  custom:
  - seriesQuery: 'http_requests_per_second{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    name:
      matches: "^(.*)"
      as: "http_requests_per_second"
    metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
  - seriesQuery: 'nginx_ingress_controller_requests{namespace!="",ingress!=""}'
    resources:
      overrides:
        namespace:
          resource: namespace
        ingress:
          resource: ingress
    name:
      matches: "^(.*)"
      as: "nginx_requests_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
  - seriesQuery: 'rabbitmq_queue_messages{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    name:
      matches: "^(.*)"
      as: "rabbitmq_queue_depth"
    metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'

resourceRules:
  cpu:
    containerQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>,container!="POD",container!="",pod!=""}[3m])) by (<<.GroupBy>>)
    nodeQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>,id='/'}[3m])) by (<<.GroupBy>>)
    resources:
      overrides:
        instance:
          resource: node
        namespace:
          resource: namespace
        pod:
          resource: pod
    containerLabel: container
  memory:
    containerQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>,container!="POD",container!="",pod!=""}) by (<<.GroupBy>>)
    nodeQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>,id='/'}) by (<<.GroupBy>>)
    resources:
      overrides:
        instance:
          resource: node
        namespace:
          resource: namespace
        pod:
          resource: pod
    containerLabel: container
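To see what the adapter actually sends to Prometheus, it helps to expand one template by hand. For the first rule, with a hypothetical pod sample-app-abc123 in the default namespace, the metricsQuery expands to roughly:

```promql
sum(http_requests_per_second{namespace="default",pod="sample-app-abc123"}) by (pod)
```

<<.LabelMatchers>> becomes the namespace and pod selectors for the object HPA is asking about, and <<.GroupBy>> becomes the label (here pod) the results are grouped by.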

Deploy the Prometheus adapter

Install the adapter with your custom configuration to enable metric translation.

helm install prometheus-adapter prometheus-community/prometheus-adapter \
  -n monitoring --create-namespace \
  -f prometheus-adapter-values.yaml

Create a sample application with custom metrics

Deploy a test application that exposes metrics Prometheus can scrape for HPA scaling decisions.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: sample-app
        image: nginx:alpine
        ports:
        - containerPort: 80
        - containerPort: 8080
        volumeMounts:
        - name: metrics-config
          mountPath: /etc/nginx/conf.d
        command: ["/bin/sh"]
        args:
          - -c
          - |
            # Start nginx in the background
            nginx -g "daemon off;" &
            # Minimal metrics endpoint: rebuild the HTTP response each loop
            # and serve one scrape at a time with BusyBox nc
            while true; do
              REQUESTS=$(shuf -i 10-100 -n 1)
              {
                printf 'HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nConnection: close\r\n\r\n'
                echo "# TYPE http_requests_per_second gauge"
                echo "http_requests_per_second{pod=\"$HOSTNAME\",namespace=\"default\"} $REQUESTS"
              } > /tmp/response
              nc -l -p 8080 < /tmp/response
            done
      volumes:
      - name: metrics-config
        configMap:
          name: nginx-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
  namespace: default
data:
  default.conf: |
    server {
        listen 80;
        location / {
            return 200 "Sample App Running";
            add_header Content-Type text/plain;
        }
    }
---
apiVersion: v1
kind: Service
metadata:
  name: sample-app
  namespace: default
  labels:
    app: sample-app
spec:
  ports:
  - port: 80
    targetPort: 80
    name: http
  - port: 8080
    targetPort: 8080
    name: metrics
  selector:
    app: sample-app
Save the manifests above as sample-app.yaml, then apply them:

kubectl apply -f sample-app.yaml
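For reference, the response body the sample app serves on port 8080 follows the Prometheus exposition format. A standalone sketch of that payload (the pod name and value here are made up):

```shell
# Hypothetical pod name and request rate, for illustration only
HOSTNAME=sample-app-abc123
REQUESTS=42
# Exposition format: a TYPE comment line followed by labeled samples
printf '# TYPE http_requests_per_second gauge\n'
printf 'http_requests_per_second{pod="%s",namespace="default"} %s\n' "$HOSTNAME" "$REQUESTS"
```

Prometheus parses the label set from the braces and the sample value after the closing brace.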

Configure Prometheus to scrape the application metrics

Ensure Prometheus discovers and scrapes your application's custom metrics through service annotations.

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    scrape_configs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
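The third relabel rule is the least obvious: it rewrites the scrape address so Prometheus uses the annotated metrics port instead of the discovered container port. The same rewrite, sketched with sed on an assumed joined source value (Prometheus joins source_labels with ";" before matching):

```shell
# "<pod-ip>:<discovered-port>;<annotation-port>" -> "<pod-ip>:<annotation-port>"
joined='10.42.0.7:80;8080'
# sed equivalent of regex ([^:]+)(?::\d+)?;(\d+) replacing with host:annotation-port
rewritten=$(echo "$joined" | sed -E 's/^([^:]+)(:[0-9]+)?;([0-9]+)$/\1:\3/')
echo "$rewritten"   # prints "10.42.0.7:8080"
```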
# Save the ConfigMap above as prometheus-scrape-config.yaml, then apply it
kubectl apply -f prometheus-scrape-config.yaml

# If Prometheus was deployed with the prometheus-community Helm chart, put the
# scrape configuration under serverFiles."prometheus.yml" in your chart values
# file instead, then upgrade the release:
helm upgrade prometheus prometheus-community/prometheus \
  -n monitoring \
  -f prometheus-values.yaml

Create a Horizontal Pod Autoscaler with custom metrics

Configure HPA to scale based on your custom application metrics instead of CPU or memory.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-app-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "30"
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 2
        periodSeconds: 60
      selectPolicy: Max
Save the manifest as custom-metrics-hpa.yaml, then apply it:

kubectl apply -f custom-metrics-hpa.yaml
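Behind the target values above sits HPA's core formula: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A quick sketch with assumed numbers:

```shell
# Assumed observed state, for illustration only
current_replicas=2
current_value=90   # observed average http_requests_per_second per pod
target_value=30    # averageValue from the HPA spec
# Integer ceiling of current_replicas * current_value / target_value
desired=$(( (current_replicas * current_value + target_value - 1) / target_value ))
echo "desired replicas: $desired"   # prints "desired replicas: 6"
```

With the per-pod rate at three times the target, HPA wants three times the replicas, capped by maxReplicas and the scaleUp behavior policies.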

Test autoscaling under sustained load

Generate load to trigger scaling based on your custom metrics and verify HPA responds appropriately.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: load-generator
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: load-generator
  template:
    metadata:
      labels:
        app: load-generator
    spec:
      containers:
      - name: load-generator
        image: busybox
        command: ["/bin/sh"]
        args:
          - -c
          - |
            while true; do
              for i in $(seq 1 10); do
                wget -qO- http://sample-app.default.svc.cluster.local/ &
              done
              sleep 1
            done
---
apiVersion: batch/v1
kind: Job
metadata:
  name: load-burst
  namespace: default
spec:
  template:
    spec:
      containers:
      - name: load-burst
        image: curlimages/curl
        command: ["/bin/sh"]
        args:
          - -c
          - |
            # Send a sustained burst of requests for about five minutes
            # to push the per-pod request rate above the HPA target
            for i in $(seq 1 300); do
              curl -s -o /dev/null http://sample-app.default.svc.cluster.local/
              sleep 1
            done
      restartPolicy: OnFailure
Save both manifests as load-generator.yaml, then apply them:

kubectl apply -f load-generator.yaml

Verify your setup

Check that the Prometheus adapter is exposing your custom metrics and HPA can access them.

# Verify Prometheus adapter is running
kubectl get pods -n monitoring | grep prometheus-adapter

Check available custom metrics

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .

Verify specific metric values

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second" | jq .
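If the adapter is serving the metric, the response is a MetricValueList. A trimmed, illustrative example of its shape (pod name, timestamp, and value are made up):

```json
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "default",
        "name": "sample-app-abc123",
        "apiVersion": "/v1"
      },
      "metricName": "http_requests_per_second",
      "timestamp": "2026-04-25T10:00:00Z",
      "value": "42"
    }
  ]
}
```

An empty items array means the metric exists but the adapter's query currently returns no series for any pod.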

Monitor HPA status and scaling events

kubectl get hpa sample-app-hpa -w
kubectl describe hpa sample-app-hpa

Check current replica count

kubectl get deployment sample-app

View HPA events

kubectl get events --field-selector involvedObject.name=sample-app-hpa

Advanced configuration options

Multiple metric types configuration

Configure HPA to use multiple custom metrics for more sophisticated scaling decisions.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: multi-metric-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "50"
  - type: Object
    object:
      metric:
        name: nginx_requests_per_second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: sample-app-ingress
      target:
        type: Value
        value: "100"
  - type: External
    external:
      metric:
        name: rabbitmq_queue_depth
        selector:
          matchLabels:
            queue: workqueue
      target:
        type: AverageValue
        averageValue: "10"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600
      policies:
      - type: Pods
        value: 2
        periodSeconds: 120
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60

Prometheus adapter query optimization

Fine-tune metric queries for better performance and accuracy in high-traffic environments.

rules:
  custom:
  - seriesQuery: 'http_request_duration_seconds_bucket{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    name:
      matches: "^(.*)_bucket$"
      as: "http_request_latency_p99"
    metricsQuery: 'histogram_quantile(0.99, sum(rate(<<.Series>>{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>, le))'
  - seriesQuery: 'application_active_connections{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    name:
      matches: "^(.*)"
      as: "active_connections"
    metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
  - seriesQuery: 'custom_business_metric{namespace!="",service!=""}'
    resources:
      overrides:
        namespace:
          resource: namespace
        service:
          resource: service
    name:
      matches: "^(.*)"
      as: "business_load_index"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
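As with the earlier rules, expanding the latency template by hand shows what the adapter queries. With a hypothetical pod sample-app-abc123 in the default namespace, the p99 rule becomes roughly:

```promql
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{namespace="default",pod="sample-app-abc123"}[5m])) by (pod, le))
```

Keeping le in the by clause is what lets histogram_quantile see the bucket boundaries; dropping it would collapse the histogram and break the quantile calculation.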

Production considerations

Note: For production deployments, consider implementing metric validation, fallback scaling policies, and comprehensive monitoring of the autoscaling system itself.

Scaling policy best practices

  • Set appropriate stabilization windows to prevent flapping
  • Use multiple metrics with different scaling behaviors
  • Implement gradual scale-down policies to maintain service stability
  • Monitor metric availability and implement fallback to CPU/memory scaling
  • Test scaling behavior under various load patterns
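The scale-down policy numbers translate directly into pods removed per period. For example, with the 10%-per-60s policy from the first HPA and an assumed current replica count:

```shell
# Assumed current state, for illustration only
replicas=10
policy_percent=10
period_seconds=60
# Pods the policy allows removing in one period (integer floor)
max_removed=$(( replicas * policy_percent / 100 ))
echo "pods removable per ${period_seconds}s: $max_removed"
```

At 10 replicas this removes at most one pod per minute, which is the gradual scale-down the bullet list above recommends.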

Metric reliability considerations

If you enforce Kubernetes resource quotas and limits, make sure your custom metrics stay available under resource pressure. Consider a metric backup strategy, and monitor the Prometheus adapter's own resource usage.

For comprehensive infrastructure monitoring beyond just autoscaling metrics, you might want to explore monitoring your entire Kubernetes cluster with Prometheus Operator.

Common issues

Symptom | Cause | Fix
------- | ----- | ---
HPA shows "unknown" for custom metrics | Prometheus adapter not configured correctly | Check adapter logs: kubectl logs -n monitoring deployment/prometheus-adapter
Metrics API returns empty results | Query doesn't match any series | Verify the query in the Prometheus UI; check label matching
Scaling is too aggressive | Missing stabilization windows | Add a behavior section with appropriate stabilization settings
Custom metrics not appearing | Prometheus not scraping the application | Verify service annotations and Prometheus targets
HPA scales down immediately | Metric returns zero during scale-up | Implement metric smoothing or minimum value thresholds
Adapter fails to start | Invalid Prometheus URL or configuration | Check the adapter config and Prometheus service connectivity
