Monitor Kubernetes clusters with Prometheus and Grafana for container orchestration insights

Intermediate 45 min Apr 11, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up comprehensive Kubernetes monitoring using the Prometheus Operator and Grafana with persistent storage, RBAC, ServiceMonitors, and custom dashboards for complete cluster observability.

Prerequisites

  • Running Kubernetes cluster with kubectl access
  • Helm 3 installed
  • Persistent storage available in cluster
  • Basic understanding of Kubernetes resources

What this solves

Kubernetes clusters emit a large volume of metrics from nodes, pods, services, and applications, and those metrics are essential for keeping the cluster healthy and performant. Without monitoring, you have no visibility into resource usage, performance bottlenecks, or impending failures. This tutorial shows how to deploy the Prometheus Operator with Grafana to collect, store, and visualize Kubernetes metrics, using ServiceMonitor and PodMonitor resources to discover and scrape your workloads automatically.

Step-by-step installation

Install kubectl and Helm

Install kubectl to interact with your Kubernetes cluster and Helm to deploy the monitoring stack.

# Ubuntu 24.04 / Debian 12
sudo apt update
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# AlmaLinux 9 / Rocky Linux 9
sudo dnf update -y
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
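
Before moving on, it can help to confirm that both tools are actually on the PATH. The following is a minimal sketch; the helper name missing_tools is hypothetical and is not part of kubectl or Helm.

```shell
# missing_tools: print the names of any commands that are not on PATH.
# The helper name is illustrative only.
missing_tools() (
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  # Strip the leading space before printing.
  echo "${missing# }"
)

missing=$(missing_tools kubectl helm)
if [ -n "$missing" ]; then
  echo "Not installed: $missing"
else
  # Both tools were found, so print their client versions.
  kubectl version --client
  helm version --short
fi
```

An empty result from missing_tools means every listed tool was found.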

Create monitoring namespace

Create a dedicated namespace for all monitoring components to keep them organized and apply consistent policies.

kubectl create namespace monitoring

Add Prometheus community Helm repository

Add the official Prometheus community Helm repository that contains the kube-prometheus-stack chart.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Create Grafana configuration

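Create a values file to configure Grafana with persistent storage, RBAC, and custom settings. These keys sit under the chart's grafana: section. To apply them, merge them into the grafana: block of prometheus-values.yaml in the next step, since that is the only values file passed to helm install.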

grafana:
  adminPassword: "secure-admin-password-123"
  persistence:
    enabled: true
    size: 10Gi
    storageClassName: "standard"
  serviceAccount:
    create: true
    name: grafana
  rbac:
    create: true
  service:
    type: ClusterIP
    port: 80  # Service port; the port-forward and Ingress later in this guide target port 80
  ingress:
    enabled: false
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
      - name: 'default'
        orgId: 1
        folder: ''
        type: file
        disableDeletion: false
        editable: true
        options:
          path: /var/lib/grafana/dashboards/default
  datasources:
    datasources.yaml:
      apiVersion: 1
      datasources:
      - name: Prometheus
        type: prometheus
        url: http://monitoring-kube-prometheus-prometheus:9090  # Service created by the Helm release named "monitoring"
        access: proxy
        isDefault: true

Create Prometheus Operator values

Configure the complete monitoring stack, including the Prometheus Operator, Alertmanager, and the exporters, and save the values as prometheus-values.yaml.

prometheus:
  prometheusSpec:
    retention: "15d"
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: "standard"
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
    resources:
      requests:
        memory: "2Gi"
        cpu: "1000m"
      limits:
        memory: "4Gi"
        cpu: "2000m"
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
    ruleSelectorNilUsesHelmValues: false
    serviceMonitorSelector: {}
    podMonitorSelector: {}
    ruleSelector: {}

alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: "standard"
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
    resources:
      requests:
        memory: "256Mi"
        cpu: "100m"
      limits:
        memory: "512Mi"
        cpu: "200m"

grafana:
  adminPassword: "secure-admin-password-123"
  persistence:
    enabled: true
    size: 10Gi
    storageClassName: "standard"
  resources:
    requests:
      memory: "256Mi"
      cpu: "100m"
    limits:
      memory: "512Mi"
      cpu: "200m"

# nodeExporter.enabled is the chart's toggle for the node exporter.
nodeExporter:
  enabled: true

kubeStateMetrics:
  enabled: true
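
The values above hard-code the storage class and the Grafana admin password. One way to keep environment-specific settings out of the main file is a small override file generated from shell variables. This is a sketch under assumptions: the function name render_overrides and the variables STORAGE_CLASS and GRAFANA_PASSWORD are hypothetical, and only the YAML keys come from the values shown above.

```shell
# render_overrides STORAGE_CLASS PASSWORD
# Prints a values fragment that overrides the storage class and the Grafana
# admin password. Note: the password is not escaped, so avoid double quotes in it.
render_overrides() {
  cat <<EOF
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: "$1"
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: "$1"
grafana:
  adminPassword: "$2"
  persistence:
    storageClassName: "$1"
EOF
}

# Fall back to the tutorial's values when the variables are unset.
render_overrides "${STORAGE_CLASS:-standard}" \
  "${GRAFANA_PASSWORD:-secure-admin-password-123}"
```

Redirect the output to a file and pass it to Helm with a second --values flag after prometheus-values.yaml. Helm merges values files from left to right, so the later file takes precedence.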

Deploy the monitoring stack

Install the complete kube-prometheus-stack using Helm with your custom configuration.

helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml \
  --wait \
  --timeout 600s

Create ServiceMonitor for custom applications

Create a ServiceMonitor resource to automatically discover and monitor custom applications that expose metrics.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-metrics
  namespace: monitoring
  labels:
    app: custom-app
spec:
  selector:
    matchLabels:
      app: custom-app
  endpoints:
  - port: metrics
    path: /metrics
    interval: 30s
    scrapeTimeout: 10s
  namespaceSelector:
    matchNames:
    - default
    - production

Save the manifest as app-servicemonitor.yaml and apply it:

kubectl apply -f app-servicemonitor.yaml
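
The ServiceMonitor selects Services by label, and port: metrics refers to a named port on the Service rather than a port number. A sketch of a Service that this ServiceMonitor would match follows. The Service name custom-app, the port number 8080, and the helper render_service are assumptions for illustration.

```shell
# render_service NAMESPACE
# Prints a Service manifest carrying the label and the named port that the
# ServiceMonitor selects. The name and port number are hypothetical.
render_service() {
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: custom-app
  namespace: $1
  labels:
    app: custom-app        # matched by spec.selector.matchLabels
spec:
  selector:
    app: custom-app
  ports:
  - name: metrics          # matched by endpoints[].port
    port: 8080
    targetPort: 8080
EOF
}

render_service default
```

Piping the output to kubectl apply -f - would create the Service. The namespace has to be one of those listed under namespaceSelector.matchNames.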

Create PodMonitor for pod-level monitoring

Create a PodMonitor to directly monitor pods that expose metrics without requiring a service.

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: pod-metrics
  namespace: monitoring
  labels:
    app: pod-monitor
spec:
  selector:
    matchLabels:
      metrics: "enabled"
  podMetricsEndpoints:
  - port: metrics
    path: /metrics
    interval: 30s
    scrapeTimeout: 10s
  namespaceSelector:
    matchNames:
    - default
    - production

Save the manifest as pod-monitor.yaml and apply it:

kubectl apply -f pod-monitor.yaml

Set up port forwarding for Grafana

Create port forwarding to access Grafana dashboard from your local machine.

kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80 &

Set up port forwarding for Prometheus

Create port forwarding to access Prometheus web UI for query testing and configuration verification.

kubectl port-forward -n monitoring svc/monitoring-kube-prometheus-prometheus 9090:9090 &
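
Because both port-forwards run in the background, a command issued immediately afterwards can race the tunnel. A small retry loop is one way to wait for it. The helper name retry is hypothetical; /-/ready is the Prometheus readiness endpoint and /api/health is the Grafana health endpoint.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Sleep only when another attempt is still to come.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (needs the port-forwards to be running):
#   retry 10 1 curl -fsS http://localhost:9090/-/ready
#   retry 10 1 curl -fsS http://localhost:3000/api/health
```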

Create custom Kubernetes dashboard

Import a comprehensive Kubernetes monitoring dashboard into Grafana with cluster overview metrics.

{
  "dashboard": {
    "id": null,
    "title": "Kubernetes Cluster Overview",
    "tags": ["kubernetes", "cluster"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Cluster CPU Usage",
        "type": "stat",
        "targets": [
          {
            "expr": "(1 - avg(rate(node_cpu_seconds_total{mode=\"idle\"}[5m]))) * 100",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "percent",
            "thresholds": {
              "steps": [
                {"color": "green", "value": null},
                {"color": "yellow", "value": 70},
                {"color": "red", "value": 90}
              ]
            }
          }
        },
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
      },
      {
        "id": 2,
        "title": "Memory Usage",
        "type": "stat",
        "targets": [
          {
            "expr": "(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "percent",
            "thresholds": {
              "steps": [
                {"color": "green", "value": null},
                {"color": "yellow", "value": 80},
                {"color": "red", "value": 95}
              ]
            }
          }
        },
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
      },
      {
        "id": 3,
        "title": "Pod Count by Namespace",
        "type": "table",
        "targets": [
          {
            "expr": "count(kube_pod_info) by (namespace)",
            "refId": "A",
            "format": "table"
          }
        ],
        "gridPos": {"h": 8, "w": 24, "x": 0, "y": 8}
      }
    ],
    "time": {
      "from": "now-6h",
      "to": "now"
    },
    "refresh": "30s"
  }
}
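
The JSON above is wrapped in a top-level "dashboard" key, which is the shape the Grafana HTTP API accepts at POST /api/dashboards/db. Below is a minimal sketch of importing it through the Grafana port-forward. It assumes the JSON is saved as k8s-dashboard.json (a hypothetical file name), and the helper import_dashboard and the GRAFANA_PASSWORD variable are likewise illustrative.

```shell
# import_dashboard FILE [GRAFANA_URL]
# Posts a dashboard JSON file to the Grafana HTTP API using basic auth.
import_dashboard() {
  file=$1
  url=${2:-http://localhost:3000}
  if [ ! -r "$file" ]; then
    echo "cannot read dashboard file: $file" >&2
    return 1
  fi
  curl -fsS -X POST \
    -H "Content-Type: application/json" \
    -u "admin:${GRAFANA_PASSWORD:-secure-admin-password-123}" \
    --data-binary "@$file" \
    "$url/api/dashboards/db"
}

# Example (needs the Grafana port-forward to be running):
#   import_dashboard k8s-dashboard.json
```

The dashboard can also be pasted into the Grafana UI import dialog; in that case only the inner object under "dashboard" is needed.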

Configure Prometheus alerting rules

Create custom alerting rules for important Kubernetes metrics and resource thresholds.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubernetes-alerts
  namespace: monitoring
  labels:
    app: kube-prometheus-stack
spec:
  groups:
  - name: kubernetes.rules
    rules:
    - alert: KubernetesPodCrashLooping
      expr: rate(kube_pod_container_status_restarts_total[1h]) * 60 * 5 > 0
      for: 15m
      labels:
        severity: critical
      annotations:
        summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping"
        description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting at a rate of {{ $value }} restarts per 5 minutes, averaged over the last hour"
    
    - alert: KubernetesNodeNotReady
      expr: kube_node_status_condition{condition="Ready",status="true"} == 0
      for: 10m
      labels:
        severity: critical
      annotations:
        summary: "Kubernetes Node {{ $labels.node }} is not ready"
        description: "Node {{ $labels.node }} has been not ready for more than 10 minutes"
    
    - alert: KubernetesPodMemoryUsage
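      # The "> 0" filter drops containers without a memory limit, which would otherwise divide by zero.
      expr: (container_memory_working_set_bytes{container!=""} / (container_spec_memory_limit_bytes{container!=""} > 0)) * 100 > 90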
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} high memory usage"
        description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} memory usage is above 90%"

Save the manifest as k8s-alerts.yaml and apply it:

kubectl apply -f k8s-alerts.yaml
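
The crash-loop expression converts a per-second rate into restarts per five minutes: rate() over [1h] yields restarts per second averaged across the hour, and multiplying by 60 * 5 scales that to a 300-second window. The arithmetic can be checked by hand. The restart count of 3 below is an arbitrary example value and the helper name restarts_per_5m is hypothetical.

```shell
# restarts_per_5m RESTARTS_IN_LAST_HOUR
# Mirrors rate(...[1h]) * 60 * 5 for a counter that grew by the given amount.
restarts_per_5m() {
  awk -v n="$1" 'BEGIN { printf "%.2f\n", n / 3600 * 60 * 5 }'
}

restarts_per_5m 3    # prints 0.25
```

Any non-zero result satisfies the > 0 comparison, so even a single restart within the hour keeps the condition true. Raising the threshold is one option if that proves too sensitive.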

Configure RBAC for monitoring

Create monitoring service account

Create a dedicated service account with proper RBAC permissions for the monitoring stack.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-monitoring
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-monitoring
rules:
- apiGroups: [""]
  resources: ["nodes", "nodes/metrics", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["extensions"]
  resources: ["ingresses"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-monitoring
subjects:
- kind: ServiceAccount
  name: prometheus-monitoring
  namespace: monitoring

Save the manifest as monitoring-rbac.yaml and apply it:

kubectl apply -f monitoring-rbac.yaml
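
Whether the binding grants what is expected can be checked with kubectl auth can-i and impersonation. The sketch below builds the user name under which a ServiceAccount authenticates (system:serviceaccount:<namespace>:<name>). The helper name sa_user is hypothetical, and the check assumes your own account is allowed to impersonate.

```shell
# sa_user NAMESPACE NAME
# Prints the user name under which a ServiceAccount authenticates.
sa_user() {
  echo "system:serviceaccount:$1:$2"
}

subject=$(sa_user monitoring prometheus-monitoring)

# Only query the cluster when kubectl is available.
if command -v kubectl >/dev/null 2>&1; then
  # Each command prints "yes" or "no"; a "no" exits non-zero, hence "|| true".
  kubectl auth can-i list pods --all-namespaces --as="$subject" || true
  kubectl auth can-i list nodes --as="$subject" || true
fi
```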

Create ingress for external access

Set up ingress resources to access Grafana and Prometheus from outside the cluster with proper TLS.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
  - hosts:
    - grafana.example.com
    secretName: grafana-tls
  rules:
  - host: grafana.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: monitoring-grafana
            port:
              number: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
  - hosts:
    - prometheus.example.com
    secretName: prometheus-tls
  rules:
  - host: prometheus.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: monitoring-kube-prometheus-prometheus
            port:
              number: 9090

Save the manifest as monitoring-ingress.yaml and apply it:

kubectl apply -f monitoring-ingress.yaml

Verify your setup

Check that all monitoring components are running and collecting metrics properly.

kubectl get pods -n monitoring
kubectl get servicemonitors -n monitoring
kubectl get podmonitors -n monitoring
kubectl get prometheusrules -n monitoring

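Verify Prometheus targets are being discovered. Skip the port-forward line if the one started earlier is still running; the trailing & keeps it in the background so that curl can run:

kubectl port-forward -n monitoring svc/monitoring-kube-prometheus-prometheus 9090:9090 &
sleep 2
curl http://localhost:9090/api/v1/targets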

Test Grafana access and verify that it reports healthy. As above, skip the port-forward line if one is already running:

kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80 &
sleep 2
curl -u admin:secure-admin-password-123 http://localhost:3000/api/health

Check that metrics are being collected:

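# promtool ships in the Prometheus image, not the operator image, and needs the server URL.
kubectl exec -n monitoring prometheus-monitoring-kube-prometheus-prometheus-0 -c prometheus -- \
  promtool query instant http://localhost:9090 'up'
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus -c prometheus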
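
The targets endpoint returns a large JSON document, so a quick tally of healthy and unhealthy targets is often easier to read. Below is a rough sketch that counts the "health" fields with grep. It assumes the compact JSON that the Prometheus API emits, and the helper name count_health is hypothetical. A JSON-aware tool would be more robust.

```shell
# count_health STATE
# Reads /api/v1/targets JSON on stdin and counts targets whose health is STATE
# (up, down, or unknown).
count_health() {
  grep -o "\"health\":\"$1\"" | wc -l | tr -d ' '
}

# Hand-written sample in the same shape as the API response.
sample='{"status":"success","data":{"activeTargets":[{"health":"up"},{"health":"down"},{"health":"up"}]}}'
printf '%s' "$sample" | count_health up      # prints 2

# Against a live server (needs the Prometheus port-forward):
#   curl -s http://localhost:9090/api/v1/targets | count_health down
```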

Common issues

Symptom | Cause | Fix
Prometheus pods stuck in Pending | Insufficient storage or resources | Check PVC status with kubectl get pvc -n monitoring and verify that the storage class exists
Grafana shows no data source | Prometheus service URL incorrect | Verify the service name with kubectl get svc -n monitoring and update the datasource URL
ServiceMonitor not discovering targets | Label selectors don't match | Check that the Service labels match the ServiceMonitor selector with kubectl describe servicemonitor
High memory usage in Prometheus | Too many metrics or long retention | Reduce the retention period or the number of scraped series, or adjust the resource limits in prometheus-values.yaml
Alertmanager not sending alerts | Missing or incorrect configuration | Check the Alertmanager config with kubectl get secret -n monitoring alertmanager-monitoring-kube-prometheus-alertmanager -o yaml
Node exporter metrics missing | DaemonSet not deployed on all nodes | Check the node selector and tolerations with kubectl get daemonset -n monitoring
