Implement Kubernetes monitoring with Prometheus and Helm charts for comprehensive cluster observability

Intermediate · 45 min · Apr 03, 2026
Applies to: Ubuntu 24.04, Debian 12, AlmaLinux 9, Rocky Linux 9

Deploy a production-ready Prometheus monitoring stack on Kubernetes using Helm: the kube-prometheus-stack chart, ServiceMonitors for automatic metrics collection, alerting rules for proactive detection, and Grafana dashboards for cluster-wide observability.

Prerequisites

  • Kubernetes cluster with admin access
  • kubectl configured
  • At least 8GB RAM and 4 CPU cores available (a quick capacity check follows this list)
  • Storage provisioner configured
  • Internet access for Helm charts
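If you are unsure about available capacity, the check below prints each node's allocatable CPU and memory; kubectl top nodes additionally requires metrics-server to be installed in the cluster.

kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory
kubectl top nodes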

What this solves

Kubernetes clusters generate massive amounts of metrics from nodes, pods, services, and applications that need centralized monitoring and alerting. This tutorial shows you how to deploy a complete Prometheus monitoring stack using Helm charts for production-grade cluster observability. You'll configure ServiceMonitors to automatically discover and scrape metrics, set up alerting rules for proactive issue detection, and establish a foundation for comprehensive Kubernetes monitoring.

Step-by-step installation

Update system packages and install prerequisites

Start by updating your system and installing the packages this guide uses.

On Ubuntu or Debian:

sudo apt update && sudo apt upgrade -y
sudo apt install -y curl wget git

On AlmaLinux or Rocky Linux:

sudo dnf update -y
sudo dnf install -y curl wget git

Verify Kubernetes cluster access

Ensure your kubectl is configured and you have admin access to your Kubernetes cluster.

kubectl cluster-info
kubectl get nodes
kubectl auth can-i '*' '*' --all-namespaces
Warning: You need cluster-admin privileges to install cluster-wide monitoring components. Ensure you're using the correct kubeconfig context.
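To confirm you are on the intended context before proceeding (the context name below is a placeholder for your own):

kubectl config current-context
kubectl config get-contexts
kubectl config use-context your-cluster-context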

Install Helm 3

Download and install Helm 3 for Kubernetes package management. Skip this step if you already have Helm 3 installed.

curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version --short

For detailed Helm configuration with security features, see our comprehensive Helm 3 setup guide.
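If you prefer to inspect the script first or pin a specific Helm release instead of piping straight to bash, the installer script accepts a --version flag (the version shown is only an example; check the Helm releases page for the latest):

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod +x get_helm.sh
./get_helm.sh --version v3.16.2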

Add Prometheus Community Helm repository

Add the official Prometheus Community Helm repository that contains the kube-prometheus-stack chart.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm search repo prometheus-community/kube-prometheus-stack
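Before writing your own overrides, it can help to dump the chart's default values for reference:

helm show values prometheus-community/kube-prometheus-stack > default-values.yaml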

Create monitoring namespace

Create a dedicated namespace for your monitoring stack to isolate monitoring components.

kubectl create namespace monitoring
kubectl label namespace monitoring name=monitoring
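Confirm the namespace exists and carries the label:

kubectl get namespace monitoring --show-labels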

Create Prometheus values configuration

Create a custom values file named prometheus-values.yaml (referenced by the install command later) to configure Prometheus, Grafana, and AlertManager for your environment. Adjust the storageClassName entries to match your cluster's storage provisioner, and replace the example Grafana password.

prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
    resources:
      limits:
        cpu: 2000m
        memory: 4Gi
      requests:
        cpu: 1000m
        memory: 2Gi
    additionalScrapeConfigs:
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true

grafana:
  enabled: true
  adminPassword: "SecureAdminPass123!"
  persistence:
    enabled: true
    storageClassName: standard
    size: 10Gi
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 250m
      memory: 512Mi
  grafana.ini:
    security:
      disable_gravatar: true
    users:
      allow_sign_up: false
    auth.anonymous:
      enabled: false

alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
    resources:
      limits:
        cpu: 200m
        memory: 512Mi
      requests:
        cpu: 100m
        memory: 256Mi

kubeStateMetrics:
  enabled: true

nodeExporter:
  enabled: true

kubelet:
  enabled: true
  serviceMonitor:
    interval: 30s
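Before installing, a dry-run render is a cheap way to catch YAML or schema mistakes in the values file:

helm template prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml > /dev/null && echo "values render cleanly"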

Deploy Prometheus monitoring stack

Install the complete Prometheus stack using Helm with your custom configuration.

helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml \
  --version 65.1.1

kubectl --namespace monitoring get pods -l "release=prometheus"
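The stack can take a few minutes to start; waiting on pod readiness (using the same label selector as above) avoids checking too early:

kubectl wait --namespace monitoring \
  --for=condition=Ready pods \
  -l "release=prometheus" --timeout=300s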

Configure service exposure

Create NodePort services so you can reach the Prometheus, Grafana, and AlertManager UIs (use type: LoadBalancer instead if your cluster supports it). Save the following manifest as monitoring-services.yaml.

---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-server-nodeport
  namespace: monitoring
spec:
  type: NodePort
  ports:
    - port: 9090
      targetPort: 9090
      nodePort: 30090
  selector:
    app.kubernetes.io/name: prometheus
    prometheus: prometheus-kube-prometheus-prometheus
---
apiVersion: v1
kind: Service
metadata:
  name: grafana-nodeport
  namespace: monitoring
spec:
  type: NodePort
  ports:
    - port: 80
      targetPort: 3000
      nodePort: 30091
  selector:
    app.kubernetes.io/name: grafana
---
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-nodeport
  namespace: monitoring
spec:
  type: NodePort
  ports:
    - port: 9093
      targetPort: 9093
      nodePort: 30092
  selector:
    app.kubernetes.io/name: alertmanager
Apply the manifest:

kubectl apply -f monitoring-services.yaml
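Verify the services and note the assigned node ports:

kubectl get svc -n monitoring | grep nodeport

If your nodes sit behind a firewall, ports 30090-30092 must be reachable from wherever you browse.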

Create custom ServiceMonitor for application monitoring

Configure a ServiceMonitor to automatically discover and scrape metrics from your applications. Save the following as app-servicemonitor.yaml.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: webapp-metrics
  namespace: monitoring
  labels:
    app: webapp
    release: prometheus
spec:
  selector:
    matchLabels:
      app: webapp
      metrics: enabled
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
    scrapeTimeout: 10s
  namespaceSelector:
    any: true
Apply it:

kubectl apply -f app-servicemonitor.yaml
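For this ServiceMonitor to discover anything, a Service must exist whose labels match the selector and which exposes a port named metrics. A minimal hypothetical example follows; the webapp name, namespace, and port numbers are placeholders for your own application:

apiVersion: v1
kind: Service
metadata:
  name: webapp
  namespace: default          # any namespace works, since namespaceSelector.any is true
  labels:
    app: webapp               # must match spec.selector.matchLabels above
    metrics: enabled
spec:
  selector:
    app: webapp               # pod selector for your application's pods
  ports:
    - name: metrics           # must match the endpoint port name in the ServiceMonitor
      port: 8080
      targetPort: 8080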

Configure alerting rules

Create PrometheusRule custom resources to define alerting conditions for cluster and application monitoring. Save the following as cluster-alerts.yaml.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-monitoring-rules
  namespace: monitoring
  labels:
    app: kube-prometheus-stack
    release: prometheus
spec:
  groups:
  - name: cluster.rules
    rules:
    - alert: NodeDown
      expr: up{job="node-exporter"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Node {{ $labels.instance }} is down"
        description: "Node {{ $labels.instance }} has been down for more than 5 minutes."
    
    - alert: HighCPUUsage
      expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "High CPU usage on {{ $labels.instance }}"
        description: "CPU usage is above 80% for more than 10 minutes on {{ $labels.instance }}."
    
    - alert: HighMemoryUsage
      expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "High memory usage on {{ $labels.instance }}"
        description: "Memory usage is above 90% for more than 10 minutes on {{ $labels.instance }}."
    
    - alert: PodCrashLooping
      expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping"
        description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been restarting frequently."
    
    - alert: PersistentVolumeUsageHigh
      expr: 100 * (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 85
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "PV usage high on {{ $labels.persistentvolumeclaim }}"
        description: "Persistent Volume {{ $labels.persistentvolumeclaim }} usage is above 85%."

  - name: kubernetes.rules
    rules:
    - alert: KubernetesAPIServerDown
      expr: up{job="kubernetes-apiservers"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Kubernetes API server is down"
        description: "Kubernetes API server has been down for more than 5 minutes."
    
    - alert: KubeletDown
      expr: up{job="kubernetes-nodes"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Kubelet on {{ $labels.instance }} is down"
        description: "Kubelet on node {{ $labels.instance }} has been down for more than 5 minutes."
Apply the rules:

kubectl apply -f cluster-alerts.yaml
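After the operator reloads Prometheus (typically within a minute), confirm the rule groups are loaded via the rules API:

kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 &
sleep 3
curl -s http://localhost:9090/api/v1/rules | jq '.data.groups[].name'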

Configure AlertManager for notifications

Update the AlertManager configuration to send notifications via email, Slack, or other channels. Save the following as alertmanager-config.yaml and replace the SMTP host, credentials, and recipient addresses with your own. Note that this overwrites the operator-managed secret, and a later helm upgrade can revert it; for a durable setup, the same configuration can instead go under alertmanager.config in prometheus-values.yaml.

apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-prometheus-kube-prometheus-alertmanager
  namespace: monitoring
type: Opaque
stringData:
  alertmanager.yml: |
    global:
      smtp_smarthost: 'smtp.example.com:587'
      smtp_from: 'alerts@example.com'
      smtp_auth_username: 'alerts@example.com'
      smtp_auth_password: 'smtp-password-here'
    
    route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'web.hook'
      routes:
      - match:
          severity: critical
        receiver: 'critical-alerts'
      - match:
          severity: warning
        receiver: 'warning-alerts'
    
    receivers:
    - name: 'web.hook'
      email_configs:
      - to: 'admin@example.com'
        subject: '[ALERT] {{ .GroupLabels.alertname }}'
        body: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
    
    - name: 'critical-alerts'
      email_configs:
      - to: 'critical-alerts@example.com'
        subject: '[CRITICAL] {{ .GroupLabels.alertname }}'
        body: |
          CRITICAL ALERT
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
    
    - name: 'warning-alerts'
      email_configs:
      - to: 'warnings@example.com'
        subject: '[WARNING] {{ .GroupLabels.alertname }}'
        body: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
Apply the configuration:

kubectl apply -f alertmanager-config.yaml
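To verify routing end to end, you can push a synthetic alert directly into AlertManager's v2 API (the alert name and labels are arbitrary test values):

kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-alertmanager 9093:9093 &
sleep 3
curl -s -XPOST http://localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{"labels":{"alertname":"TestAlert","severity":"warning"},"annotations":{"summary":"Synthetic test alert"}}]'

A matching notification should arrive at the warning-alerts receiver within the configured group_wait.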

Verify your setup

Check that all components are running and accessible, then verify metrics collection and alerting functionality.

# Check all monitoring pods are running
kubectl get pods -n monitoring

Verify Prometheus targets are being scraped

kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 &
sleep 3
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.health != "up") | .labels'

Check ServiceMonitors are discovered

kubectl get servicemonitors -n monitoring

Verify PrometheusRules are loaded

kubectl get prometheusrules -n monitoring

Test Grafana access

kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80 &
sleep 3
curl -s http://admin:SecureAdminPass123!@localhost:3000/api/health

Check AlertManager configuration

kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-alertmanager 9093:9093 &
sleep 3
curl -s http://localhost:9093/api/v2/status

Access your monitoring interfaces:

  • Prometheus: http://your-node-ip:30090
  • Grafana: http://your-node-ip:30091 (admin/SecureAdminPass123!)
  • AlertManager: http://your-node-ip:30092

Common issues

Symptom | Cause | Fix
Pods stuck in Pending | Insufficient cluster resources | Reduce resource requests in values.yaml or add more nodes
ServiceMonitor not discovering targets | Label selector mismatch | Verify service labels match the ServiceMonitor selector
Persistent volumes not provisioning | Missing StorageClass | Create a default StorageClass or specify an existing one
Alerts not firing | PrometheusRule labels missing | Ensure the PrometheusRule has labels matching the Prometheus rule selector
Grafana dashboards missing data | Prometheus datasource misconfigured | Check the datasource URL points to the prometheus-operated service
High memory usage on Prometheus | Too many metrics or long retention | Reduce the retention period or tighten resource limits
