Monitor a Kubernetes cluster with Prometheus Operator for comprehensive observability

Intermediate 45 min Apr 22, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up complete cluster monitoring using Prometheus Operator with automated metrics collection, custom dashboards, and intelligent alerting for production Kubernetes environments.

Prerequisites

  • Kubernetes cluster with kubectl access
  • Helm 3 installed
  • At least 8GB RAM available in cluster
  • Storage class configured for persistent volumes

What this solves

Prometheus Operator simplifies monitoring Kubernetes clusters by automating the deployment and management of Prometheus, Grafana, and Alertmanager. It provides custom resource definitions (CRDs) that make it easy to configure monitoring for your applications and infrastructure without manually managing configuration files.
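For contrast, this is the kind of hand-maintained scrape configuration the operator replaces. With CRDs, an equivalent scrape job is generated automatically and kept in sync as pods move (the target address below is a hypothetical pod IP, purely for illustration):

```yaml
# prometheus.yml fragment maintained by hand - brittle, because pod IPs change
scrape_configs:
  - job_name: nginx
    scrape_interval: 30s
    metrics_path: /metrics
    static_configs:
      - targets: ['10.244.1.12:9113']   # hypothetical pod IP; breaks on reschedule
```

With the operator, the ServiceMonitor and PodMonitor resources created later in this guide express the same intent declaratively, and Prometheus discovers targets through the Kubernetes API instead of static addresses.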

Step-by-step installation

Update system packages and install prerequisites

Start by ensuring your system is up to date and install required tools for Kubernetes management.

# Debian / Ubuntu
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl wget git

# RHEL family (AlmaLinux / Rocky Linux)
sudo dnf update -y
sudo dnf install -y curl wget git

Install Helm package manager

Helm will help us deploy Prometheus Operator and manage its configuration easily.

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version

Add Prometheus Community Helm repository

Add the official repository that contains the kube-prometheus-stack chart.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Create monitoring namespace

Create a dedicated namespace for all monitoring components to keep them organized.

kubectl create namespace monitoring
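If you manage cluster resources declaratively, the same namespace can be expressed as a manifest. A sketch; the Pod Security admission label is only needed on clusters that enforce the restricted profile by default, since node-exporter requires host access:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    # node-exporter uses hostNetwork and hostPath mounts; relax Pod Security
    # admission for this namespace if your cluster enforces "restricted"
    pod-security.kubernetes.io/enforce: privileged
```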

Create Prometheus Operator values file

Configure the installation with custom settings for production use. Save the following as prometheus-values.yaml; the install command in a later step references this filename.

prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
    resources:
      requests:
        memory: 2Gi
        cpu: 500m
      limits:
        memory: 4Gi
        cpu: 2

grafana:
  adminPassword: SecureGrafanaPass123!
  persistence:
    enabled: true
    storageClassName: standard
    size: 10Gi
  resources:
    requests:
      memory: 256Mi
      cpu: 100m
    limits:
      memory: 512Mi
      cpu: 200m

alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
    resources:
      requests:
        memory: 256Mi
        cpu: 100m
      limits:
        memory: 512Mi
        cpu: 200m

kubeStateMetrics:
  enabled: true

nodeExporter:
  enabled: true

prometheus-node-exporter:
  hostRootFsMount:
    enabled: false
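A note on the plaintext adminPassword above: the Grafana subchart can instead read credentials from a pre-created Secret, which keeps the password out of version control. A hedged sketch (the grafana.admin.* keys follow the upstream Grafana chart's values; verify them against your chart version, and create the referenced Secret yourself):

```yaml
grafana:
  admin:
    existingSecret: grafana-admin-credentials   # hypothetical secret name
    userKey: admin-user
    passwordKey: admin-password
```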

Install Prometheus Operator with Helm

Deploy the complete monitoring stack using the configuration file we created.

helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml \
  --version 65.1.0

Verify Prometheus Operator installation

Check that all monitoring components are running properly.

kubectl get pods -n monitoring
kubectl get svc -n monitoring

Create ServiceMonitor for application monitoring

Configure automatic discovery and scraping of application metrics using ServiceMonitor CRD.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx-service-monitor
  namespace: monitoring
  labels:
    app: nginx
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app: nginx
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
  namespaceSelector:
    matchNames:
    - default
    - production
Save the manifest as servicemonitor-example.yaml and apply it:

kubectl apply -f servicemonitor-example.yaml
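For the ServiceMonitor to discover anything, a backing Service must exist. Discovery hinges on two things: the Service labels must satisfy spec.selector.matchLabels, and the endpoint port is referenced by its name ("metrics"), not its number. A hypothetical Service for an nginx exporter (the port number is an assumption; adjust to your exporter):

```shell
# write a Service whose labels and named port line up with the ServiceMonitor
cat > nginx-metrics-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: nginx-metrics
  namespace: default
  labels:
    app: nginx            # matched by the ServiceMonitor's selector
spec:
  selector:
    app: nginx
  ports:
  - name: metrics         # referenced by endpoints[].port in the ServiceMonitor
    port: 9113            # common nginx-exporter port (adjust for your setup)
    targetPort: 9113
EOF
echo "wrote nginx-metrics-service.yaml"
```

Apply it with kubectl apply -f nginx-metrics-service.yaml. If the target never shows up on the Prometheus targets page, a label or port-name mismatch between this Service and the ServiceMonitor is the usual cause.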

Create PodMonitor for pod-level monitoring

Set up direct pod monitoring for applications that expose metrics on specific ports.

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: app-pod-monitor
  namespace: monitoring
  labels:
    app: myapp
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app: myapp
  podMetricsEndpoints:
  - port: metrics
    interval: 30s
    path: /metrics
  namespaceSelector:
    matchNames:
    - default
    - production
Save the manifest as podmonitor-example.yaml and apply it:

kubectl apply -f podmonitor-example.yaml
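The PodMonitor selects pods directly, so the pod template needs matching labels and a named container port. A sketch of a Deployment this PodMonitor would pick up (the image and port number are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: default
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp          # matched by the PodMonitor's selector
    spec:
      containers:
      - name: myapp
        image: registry.example.com/myapp:1.0   # hypothetical image
        ports:
        - name: metrics     # referenced by podMetricsEndpoints[].port
          containerPort: 8080
```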

Configure Grafana access

Set up port forwarding to access Grafana dashboard locally.

kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80

Configure custom Grafana dashboard

Create a custom dashboard for cluster overview with key metrics.

{
  "dashboard": {
    "id": null,
    "title": "Kubernetes Cluster Overview",
    "tags": ["kubernetes", "cluster"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "CPU Usage by Node",
        "type": "graph",
        "targets": [
          {
            "expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
            "legendFormat": "{{instance}}"
          }
        ]
      },
      {
        "id": 2,
        "title": "Memory Usage by Node",
        "type": "graph",
        "targets": [
          {
            "expr": "(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100",
            "legendFormat": "{{instance}}"
          }
        ]
      },
      {
        "id": 3,
        "title": "Pod Count by Namespace",
        "type": "stat",
        "targets": [
          {
            "expr": "count by (namespace) (kube_pod_info)",
            "legendFormat": "{{namespace}}"
          }
        ]
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "30s"
  }
}
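Rather than pasting JSON into the UI, the chart's Grafana deployment runs a dashboard sidecar that automatically loads any ConfigMap carrying the grafana_dashboard label (assuming the chart's default sidecar settings). A sketch, with a minimal stand-in payload; substitute the full dashboard JSON above:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-overview-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"   # picked up by the chart's dashboard sidecar
data:
  cluster-overview.json: |
    { "title": "Kubernetes Cluster Overview", "panels": [] }
```

Dashboards provisioned this way survive Grafana pod restarts and can be version-controlled alongside the rest of your manifests.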

Create Alertmanager configuration

Set up email notifications and routing for critical cluster events. Note that the chart manages the secret named below, so a manual change like this will be reverted on the next helm upgrade; for a durable setup, put the same configuration under alertmanager.config in your values file instead.

apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-kube-prometheus-stack-alertmanager
  namespace: monitoring
stringData:
  alertmanager.yml: |
    global:
      smtp_smarthost: 'mail.example.com:587'
      smtp_from: 'alerts@example.com'
      smtp_auth_username: 'alerts@example.com'
      smtp_auth_password: 'your-email-password'
    
    route:
      group_by: ['alertname', 'instance']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'web.hook'
      routes:
      - match:
          severity: critical
        receiver: email-critical
      - match:
          severity: warning
        receiver: email-warning
    
    receivers:
    - name: 'web.hook'
      webhook_configs:
      - url: 'http://127.0.0.1:5001/'
    - name: 'email-critical'
      email_configs:
      - to: 'devops@example.com'
        subject: '[CRITICAL] {{ .GroupLabels.alertname }}'
        body: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
    - name: 'email-warning'
      email_configs:
      - to: 'monitoring@example.com'
        subject: '[WARNING] {{ .GroupLabels.alertname }}'
        body: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
Save the manifest as alertmanager-config.yaml and apply it:

kubectl apply -f alertmanager-config.yaml

Create custom PrometheusRule for alerting

Define specific alerting rules for your cluster monitoring needs.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-cluster-alerts
  namespace: monitoring
  labels:
    app: kube-prometheus-stack
    release: kube-prometheus-stack
spec:
  groups:
  - name: cluster.rules
    rules:
    - alert: HighCPUUsage
      expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High CPU usage on {{ $labels.instance }}"
        description: "CPU usage is above 80% for more than 5 minutes on {{ $labels.instance }}"
    
    - alert: HighMemoryUsage
      expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High memory usage on {{ $labels.instance }}"
        description: "Memory usage is above 85% for more than 5 minutes on {{ $labels.instance }}"
    
    - alert: PodCrashLooping
      expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Pod {{ $labels.pod }} is crash looping"
        description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} is restarting frequently"
    
    - alert: NodeNotReady
      expr: kube_node_status_condition{condition="Ready",status="true"} == 0
      for: 10m
      labels:
        severity: critical
      annotations:
        summary: "Node {{ $labels.node }} is not ready"
        description: "Node {{ $labels.node }} has been not ready for more than 10 minutes"
Save the manifest as custom-alerts.yaml and apply it:

kubectl apply -f custom-alerts.yaml
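The operator only loads PrometheusRule objects whose labels match the ruleSelector on the Prometheus resource; with the default chart values that effectively means the release label must be present (this is the "Alerts not firing" entry under Common issues below). A quick pre-apply sanity check, sketched here against a trimmed copy of the rule metadata; run the same grep against your real custom-alerts.yaml:

```shell
# trimmed metadata copy for illustration only
cat > /tmp/custom-alerts-check.yaml <<'EOF'
metadata:
  name: custom-cluster-alerts
  labels:
    app: kube-prometheus-stack
    release: kube-prometheus-stack
EOF

if grep -q 'release: kube-prometheus-stack' /tmp/custom-alerts-check.yaml; then
  echo "release label present - rule will be selected"
else
  echo "release label missing - rule will be silently ignored"
fi
```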

Configure persistent storage for metrics

Set up a storage class for long-term metrics retention. Note that the no-provisioner class below does not create volumes dynamically; it is intended for manually created local volumes, so use your cloud provider's dynamic provisioner instead if one is available.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prometheus-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
Save the manifest as storage-class.yaml and apply it:

kubectl apply -f storage-class.yaml
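Because kubernetes.io/no-provisioner never creates volumes on its own, each PersistentVolume must be created by hand and pinned to a node. A sketch of a matching local volume; the path and node name are placeholders for your environment:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-pv-0
spec:
  capacity:
    storage: 50Gi                      # sized to match the Prometheus claim
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: prometheus-storage
  local:
    path: /mnt/prometheus-data         # placeholder path on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: ["worker-1"]         # placeholder node name
```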

Configure Grafana dashboards for Kubernetes cluster visualization

Access Grafana interface

Open your browser and navigate to the Grafana interface using the port forward we set up earlier.

Access Grafana at http://localhost:3000

Username: admin
Password: SecureGrafanaPass123! (the adminPassword set in the values file)

Import pre-built Kubernetes dashboards

Import community dashboards for comprehensive cluster visualization. These provide immediate insights into cluster health.

Popular dashboard IDs to import:

  • 315 - Kubernetes cluster monitoring
  • 8588 - 1 Node Exporter for Prometheus Dashboard
  • 7249 - Kubernetes Cluster
  • 6417 - Kubernetes cluster overview

Create custom dashboard for application metrics

Build a specialized dashboard for monitoring your specific applications and services running in the cluster.

{
  "dashboard": {
    "title": "Application Metrics Dashboard",
    "panels": [
      {
        "title": "HTTP Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(http_requests_total[5m])) by (service)",
            "legendFormat": "{{service}}"
          }
        ]
      },
      {
        "title": "HTTP Request Duration",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))",
            "legendFormat": "95th percentile - {{service}}"
          }
        ]
      },
      {
        "title": "Database Connection Pool",
        "type": "stat",
        "targets": [
          {
            "expr": "database_connections_active / database_connections_max * 100",
            "legendFormat": "Pool Usage %"
          }
        ]
      }
    ]
  }
}

Set up Alertmanager rules and notifications

Configure Slack notifications

Set up Slack webhook integration for real-time alerts to your team channels. Because this secret uses its own name rather than the chart-managed one, point the operator at it via alertmanagerSpec.configSecret in your values file; otherwise the configuration will never be loaded.

apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-slack-config
  namespace: monitoring
stringData:
  alertmanager.yml: |
    global:
      slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
    
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'slack-notifications'
      routes:
      - match:
          severity: critical
        receiver: 'slack-critical'
    
    receivers:
    - name: 'slack-notifications'
      slack_configs:
      - channel: '#monitoring'
        title: 'Kubernetes Alert'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
    
    - name: 'slack-critical'
      slack_configs:
      - channel: '#critical-alerts'
        title: 'CRITICAL: Kubernetes Alert'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}\nDescription: {{ .Annotations.description }}{{ end }}'
        color: 'danger'
Save the manifest as alertmanager-slack.yaml and apply it:

kubectl apply -f alertmanager-slack.yaml

Test alerting configuration

Verify that alerts are properly configured and firing when conditions are met.

# Check Alertmanager status
kubectl port-forward -n monitoring svc/kube-prometheus-stack-alertmanager 9093:9093

Access the Alertmanager UI at http://localhost:9093 to review active alerts and verify the routing configuration.

Create runbook annotations

Add detailed runbook links and troubleshooting steps to your alerting rules.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: runbook-alerts
  namespace: monitoring
  labels:
    app: kube-prometheus-stack
    release: kube-prometheus-stack
spec:
  groups:
  - name: runbook.rules
    rules:
    - alert: KubernetesPodNotReady
      expr: kube_pod_status_ready{condition="false"} == 1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.pod }} not ready in namespace {{ $labels.namespace }}"
        description: "Pod has been in a non-ready state for more than 5 minutes"
        runbook_url: "https://runbooks.example.com/kubernetes/pod-not-ready"
        action: "Check pod logs: kubectl logs {{ $labels.pod }} -n {{ $labels.namespace }}"
    
    - alert: KubernetesNodeDiskPressure
      expr: kube_node_status_condition{condition="DiskPressure",status="true"} == 1
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "Node {{ $labels.node }} has disk pressure"
        description: "Node is experiencing disk pressure which may affect pod scheduling"
        runbook_url: "https://runbooks.example.com/kubernetes/node-disk-pressure"
        action: "Check disk usage: kubectl describe node {{ $labels.node }}"
Save the manifest as runbook-alerts.yaml and apply it:

kubectl apply -f runbook-alerts.yaml

Verify your setup

# Check all monitoring components
kubectl get pods -n monitoring
kubectl get svc -n monitoring
kubectl get servicemonitors -n monitoring
kubectl get podmonitors -n monitoring
kubectl get prometheusrules -n monitoring

Verify Prometheus targets

kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090

Access http://localhost:9090/targets

Check Grafana dashboards

kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80

Access http://localhost:3000

Verify Alertmanager

kubectl port-forward -n monitoring svc/kube-prometheus-stack-alertmanager 9093:9093

Access http://localhost:9093

Common issues

Symptom                                 | Cause                                  | Fix
Pods stuck in Pending state             | Insufficient cluster resources         | Reduce resource requests or add more nodes
ServiceMonitor not discovering targets  | Label selector mismatch                | Verify labels match between ServiceMonitor and Service
Grafana dashboard shows no data         | Prometheus not scraping metrics        | Check Prometheus targets page for scraping errors
Alerts not firing                       | PrometheusRule labels missing          | Ensure PrometheusRule has correct release label
Persistent volumes not mounting         | StorageClass not available             | Create appropriate StorageClass or use existing one
High memory usage in Prometheus         | Too many time series or long retention | Reduce retention period or add resource limits
