Implement Kubernetes monitoring with Prometheus and Helm charts for comprehensive cluster observability

Intermediate · 45 min · Apr 03, 2026
Applies to: Ubuntu 24.04, Debian 12, AlmaLinux 9, Rocky Linux 9

Deploy a production-ready Prometheus monitoring stack on Kubernetes using Helm: the kube-prometheus-stack chart, ServiceMonitors for automatic metrics collection, alerting rules for proactive detection, and Grafana dashboards for cluster-wide observability.

Prerequisites

  • Kubernetes cluster with admin access
  • kubectl configured
  • At least 8GB RAM and 4 CPU cores available (a quick capacity check follows this list)
  • Storage provisioner configured
  • Internet access for Helm charts
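If you are unsure about available capacity, the check below prints each node's allocatable CPU and memory; kubectl top nodes additionally requires metrics-server to be installed in the cluster.

kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory
kubectl top nodes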

What this solves

Kubernetes clusters generate massive amounts of metrics from nodes, pods, services, and applications that need centralized monitoring and alerting. This tutorial shows you how to deploy a complete Prometheus monitoring stack using Helm charts for production-grade cluster observability. You'll configure ServiceMonitors to automatically discover and scrape metrics, set up alerting rules for proactive issue detection, and establish a foundation for comprehensive Kubernetes monitoring.

Step-by-step installation

Update system packages and install prerequisites

Start by updating your system and installing the packages this guide uses.

On Ubuntu or Debian:

sudo apt update && sudo apt upgrade -y
sudo apt install -y curl wget git

On AlmaLinux or Rocky Linux:

sudo dnf update -y
sudo dnf install -y curl wget git

Verify Kubernetes cluster access

Ensure your kubectl is configured and you have admin access to your Kubernetes cluster.

kubectl cluster-info
kubectl get nodes
kubectl auth can-i '*' '*' --all-namespaces
Warning: You need cluster-admin privileges to install cluster-wide monitoring components. Ensure you're using the correct kubeconfig context.
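To confirm you are on the intended context before proceeding (the context name below is a placeholder for your own):

kubectl config current-context
kubectl config get-contexts
kubectl config use-context your-cluster-context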

Install Helm 3

Download and install Helm 3 for Kubernetes package management. Skip this step if you already have Helm 3 installed.

curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version --short

For detailed Helm configuration with security features, see our comprehensive Helm 3 setup guide.
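If you prefer to inspect the script first or pin a specific Helm release instead of piping straight to bash, the installer script accepts a --version flag (the version shown is only an example; check the Helm releases page for the latest):

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod +x get_helm.sh
./get_helm.sh --version v3.16.2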

Add Prometheus Community Helm repository

Add the official Prometheus Community Helm repository that contains the kube-prometheus-stack chart.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm search repo prometheus-community/kube-prometheus-stack
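Before writing your own overrides, it can help to dump the chart's default values for reference:

helm show values prometheus-community/kube-prometheus-stack > default-values.yaml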

Create monitoring namespace

Create a dedicated namespace for your monitoring stack to isolate monitoring components.

kubectl create namespace monitoring
kubectl label namespace monitoring name=monitoring
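Confirm the namespace exists and carries the label:

kubectl get namespace monitoring --show-labels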

Create Prometheus values configuration

Create a custom values file named prometheus-values.yaml (referenced by the install command later) to configure Prometheus, Grafana, and AlertManager for your environment. Adjust the storageClassName entries to match your cluster's storage provisioner, and replace the example Grafana password.

prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
    resources:
      limits:
        cpu: 2000m
        memory: 4Gi
      requests:
        cpu: 1000m
        memory: 2Gi
    additionalScrapeConfigs:
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true

grafana:
  enabled: true
  adminPassword: "SecureAdminPass123!"
  persistence:
    enabled: true
    storageClassName: standard
    size: 10Gi
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 250m
      memory: 512Mi
  grafana.ini:
    security:
      disable_gravatar: true
    users:
      allow_sign_up: false
    auth.anonymous:
      enabled: false

alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
    resources:
      limits:
        cpu: 200m
        memory: 512Mi
      requests:
        cpu: 100m
        memory: 256Mi

kubeStateMetrics:
  enabled: true

nodeExporter:
  enabled: true

kubelet:
  enabled: true
  serviceMonitor:
    interval: 30s
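Before installing, a dry-run render is a cheap way to catch YAML or schema mistakes in the values file:

helm template prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml > /dev/null && echo "values render cleanly"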

Deploy Prometheus monitoring stack

Install the complete Prometheus stack using Helm with your custom configuration.

helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml \
  --version 65.1.1

kubectl --namespace monitoring get pods -l "release=prometheus"
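The stack can take a few minutes to start; waiting on pod readiness (using the same label selector as above) avoids checking too early:

kubectl wait --namespace monitoring \
  --for=condition=Ready pods \
  -l "release=prometheus" --timeout=300s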

Configure service exposure

Create NodePort services so you can reach the Prometheus, Grafana, and AlertManager UIs (use type: LoadBalancer instead if your cluster supports it). Save the following manifest as monitoring-services.yaml.

---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-server-nodeport
  namespace: monitoring
spec:
  type: NodePort
  ports:
    - port: 9090
      targetPort: 9090
      nodePort: 30090
  selector:
    app.kubernetes.io/name: prometheus
    prometheus: prometheus-kube-prometheus-prometheus
---
apiVersion: v1
kind: Service
metadata:
  name: grafana-nodeport
  namespace: monitoring
spec:
  type: NodePort
  ports:
    - port: 80
      targetPort: 3000
      nodePort: 30091
  selector:
    app.kubernetes.io/name: grafana
---
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-nodeport
  namespace: monitoring
spec:
  type: NodePort
  ports:
    - port: 9093
      targetPort: 9093
      nodePort: 30092
  selector:
    app.kubernetes.io/name: alertmanager
Apply the manifest:

kubectl apply -f monitoring-services.yaml
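Verify the services and note the assigned node ports:

kubectl get svc -n monitoring | grep nodeport

If your nodes sit behind a firewall, ports 30090-30092 must be reachable from wherever you browse.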

Create custom ServiceMonitor for application monitoring

Configure a ServiceMonitor to automatically discover and scrape metrics from your applications. Save the following as app-servicemonitor.yaml.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: webapp-metrics
  namespace: monitoring
  labels:
    app: webapp
    release: prometheus
spec:
  selector:
    matchLabels:
      app: webapp
      metrics: enabled
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
    scrapeTimeout: 10s
  namespaceSelector:
    any: true
Apply it:

kubectl apply -f app-servicemonitor.yaml
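For this ServiceMonitor to discover anything, a Service must exist whose labels match the selector and which exposes a port named metrics. A minimal hypothetical example follows; the webapp name, namespace, and port numbers are placeholders for your own application:

apiVersion: v1
kind: Service
metadata:
  name: webapp
  namespace: default          # any namespace works, since namespaceSelector.any is true
  labels:
    app: webapp               # must match spec.selector.matchLabels above
    metrics: enabled
spec:
  selector:
    app: webapp               # pod selector for your application's pods
  ports:
    - name: metrics           # must match the endpoint port name in the ServiceMonitor
      port: 8080
      targetPort: 8080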

Configure alerting rules

Create PrometheusRule custom resources to define alerting conditions for cluster and application monitoring. Save the following as cluster-alerts.yaml.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-monitoring-rules
  namespace: monitoring
  labels:
    app: kube-prometheus-stack
    release: prometheus
spec:
  groups:
  - name: cluster.rules
    rules:
    - alert: NodeDown
      expr: up{job="node-exporter"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Node {{ $labels.instance }} is down"
        description: "Node {{ $labels.instance }} has been down for more than 5 minutes."
    
    - alert: HighCPUUsage
      expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "High CPU usage on {{ $labels.instance }}"
        description: "CPU usage is above 80% for more than 10 minutes on {{ $labels.instance }}."
    
    - alert: HighMemoryUsage
      expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "High memory usage on {{ $labels.instance }}"
        description: "Memory usage is above 90% for more than 10 minutes on {{ $labels.instance }}."
    
    - alert: PodCrashLooping
      expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping"
        description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been restarting frequently."
    
    - alert: PersistentVolumeUsageHigh
      expr: 100 * (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 85
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "PV usage high on {{ $labels.persistentvolumeclaim }}"
        description: "Persistent Volume {{ $labels.persistentvolumeclaim }} usage is above 85%."

  - name: kubernetes.rules
    rules:
    - alert: KubernetesAPIServerDown
      expr: up{job="kubernetes-apiservers"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Kubernetes API server is down"
        description: "Kubernetes API server has been down for more than 5 minutes."
    
    - alert: KubeletDown
      expr: up{job="kubernetes-nodes"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Kubelet on {{ $labels.instance }} is down"
        description: "Kubelet on node {{ $labels.instance }} has been down for more than 5 minutes."
Apply the rules:

kubectl apply -f cluster-alerts.yaml
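After the operator reloads Prometheus (typically within a minute), confirm the rule groups are loaded via the rules API:

kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 &
sleep 3
curl -s http://localhost:9090/api/v1/rules | jq '.data.groups[].name'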

Configure AlertManager for notifications

Update the AlertManager configuration to send notifications via email, Slack, or other channels. Save the following as alertmanager-config.yaml and replace the SMTP host, credentials, and recipient addresses with your own. Note that this overwrites the operator-managed secret, and a later helm upgrade can revert it; for a durable setup, the same configuration can instead go under alertmanager.config in prometheus-values.yaml.

apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-prometheus-kube-prometheus-alertmanager
  namespace: monitoring
type: Opaque
stringData:
  alertmanager.yml: |
    global:
      smtp_smarthost: 'smtp.example.com:587'
      smtp_from: 'alerts@example.com'
      smtp_auth_username: 'alerts@example.com'
      smtp_auth_password: 'smtp-password-here'
    
    route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'web.hook'
      routes:
      - match:
          severity: critical
        receiver: 'critical-alerts'
      - match:
          severity: warning
        receiver: 'warning-alerts'
    
    receivers:
    - name: 'web.hook'
      email_configs:
      - to: 'admin@example.com'
        subject: '[ALERT] {{ .GroupLabels.alertname }}'
        body: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
    
    - name: 'critical-alerts'
      email_configs:
      - to: 'critical-alerts@example.com'
        subject: '[CRITICAL] {{ .GroupLabels.alertname }}'
        body: |
          CRITICAL ALERT
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
    
    - name: 'warning-alerts'
      email_configs:
      - to: 'warnings@example.com'
        subject: '[WARNING] {{ .GroupLabels.alertname }}'
        body: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
Apply the configuration:

kubectl apply -f alertmanager-config.yaml
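To verify routing end to end, you can push a synthetic alert directly into AlertManager's v2 API (the alert name and labels are arbitrary test values):

kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-alertmanager 9093:9093 &
sleep 3
curl -s -XPOST http://localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{"labels":{"alertname":"TestAlert","severity":"warning"},"annotations":{"summary":"Synthetic test alert"}}]'

A matching notification should arrive at the warning-alerts receiver within the configured group_wait.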

Verify your setup

Check that all components are running and accessible, then verify metrics collection and alerting functionality.

# Check all monitoring pods are running
kubectl get pods -n monitoring

Verify Prometheus targets are being scraped

kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 &
sleep 3
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.health != "up") | .labels'

Check ServiceMonitors are discovered

kubectl get servicemonitors -n monitoring

Verify PrometheusRules are loaded

kubectl get prometheusrules -n monitoring

Test Grafana access

kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80 &
sleep 3
curl -s http://admin:SecureAdminPass123!@localhost:3000/api/health

Check AlertManager configuration

kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-alertmanager 9093:9093 &
sleep 3
curl -s http://localhost:9093/api/v2/status

Access your monitoring interfaces:

  • Prometheus: http://your-node-ip:30090
  • Grafana: http://your-node-ip:30091 (admin/SecureAdminPass123!)
  • AlertManager: http://your-node-ip:30092

Common issues

Symptom | Cause | Fix
Pods stuck in Pending | Insufficient cluster resources | Reduce resource requests in values.yaml or add more nodes
ServiceMonitor not discovering targets | Label selector mismatch | Verify service labels match the ServiceMonitor selector
Persistent volumes not provisioning | Missing StorageClass | Create a default StorageClass or specify an existing one
Alerts not firing | PrometheusRule labels missing | Ensure the PrometheusRule has labels matching the Prometheus rule selector
Grafana dashboards missing data | Prometheus datasource misconfigured | Check the datasource URL points to the prometheus-operated service
High memory usage on Prometheus | Too many metrics or long retention | Reduce the retention period or tighten resource limits
