Deploy a production-ready Prometheus monitoring stack on Kubernetes using the kube-prometheus-stack Helm chart, with ServiceMonitors for automatic metrics collection, alerting rules for proactive notification, and comprehensive cluster observability.
Prerequisites
- Kubernetes cluster with admin access
- kubectl configured
- At least 8GB RAM and 4 CPU cores available
- Storage provisioner configured
- Internet access for Helm charts
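A quick sanity check of the storage and capacity prerequisites before you start (this assumes kubectl is already pointed at the target cluster):
kubectl get storageclass
kubectl describe nodes | grep -A 5 "Allocatable:"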
What this solves
Kubernetes clusters generate large volumes of metrics from nodes, pods, services, and applications, all of which need centralized collection and alerting. This tutorial shows you how to deploy a complete Prometheus monitoring stack with Helm for production-grade cluster observability. You'll configure ServiceMonitors to automatically discover and scrape metrics, set up alerting rules for proactive issue detection, and establish a foundation for comprehensive Kubernetes monitoring.
Step-by-step installation
Update system packages and install prerequisites
Start by updating the machine you run kubectl from and installing the required packages (Debian/Ubuntu commands shown; the automated script at the end of this guide also supports RHEL-family distributions).
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl wget git
Verify Kubernetes cluster access
Ensure your kubectl is configured and you have admin access to your Kubernetes cluster.
kubectl cluster-info
kubectl get nodes
kubectl auth can-i '*' '*' --all-namespaces
Install Helm 3
Download and install Helm 3 for Kubernetes package management. Skip this step if you already have Helm 3 installed.
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version --short
For detailed Helm configuration with security features, see our comprehensive Helm 3 setup guide.
Add Prometheus Community Helm repository
Add the official Prometheus Community Helm repository that contains the kube-prometheus-stack chart.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm search repo prometheus-community/kube-prometheus-stack
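The --version pin used later in this guide (65.1.1) was current when this was written; to list the chart versions available to you now:
helm search repo prometheus-community/kube-prometheus-stack --versions | head -5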
Create monitoring namespace
Create a dedicated namespace for your monitoring stack to isolate monitoring components.
kubectl create namespace monitoring
kubectl label namespace monitoring name=monitoring
Create Prometheus values configuration
Create a custom values file to configure Prometheus, Grafana, and AlertManager for your environment. Save the following as prometheus-values.yaml, and replace the Grafana adminPassword with a secret of your own before deploying.
prometheus:
prometheusSpec:
retention: 30d
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: standard
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 50Gi
resources:
limits:
cpu: 2000m
memory: 4Gi
requests:
cpu: 1000m
memory: 2Gi
additionalScrapeConfigs:
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
grafana:
enabled: true
adminPassword: "SecureAdminPass123!"
persistence:
enabled: true
storageClassName: standard
size: 10Gi
resources:
limits:
cpu: 500m
memory: 1Gi
requests:
cpu: 250m
memory: 512Mi
grafana.ini:
security:
disable_gravatar: true
users:
allow_sign_up: false
auth.anonymous:
enabled: false
alertmanager:
alertmanagerSpec:
storage:
volumeClaimTemplate:
spec:
storageClassName: standard
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi
resources:
limits:
cpu: 200m
memory: 512Mi
requests:
cpu: 100m
memory: 256Mi
kubeStateMetrics:
enabled: true
nodeExporter:
enabled: true
kubelet:
enabled: true
serviceMonitor:
interval: 30s
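Before installing, you can render the chart locally to catch indentation or values mistakes early; helm template is a dry run and changes nothing in the cluster:
helm template prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml > /dev/null && echo "prometheus-values.yaml renders cleanly"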
Deploy Prometheus monitoring stack
Install the complete Prometheus stack using Helm with your custom configuration.
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--values prometheus-values.yaml \
--version 65.1.1
kubectl --namespace monitoring get pods -l "release=prometheus"
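The stack pulls several images and can take a few minutes to become ready. One way to block until the pods are up; the field selector skips the chart's short-lived admission-webhook job pods, which complete and never report Ready, and the 10-minute timeout is an assumption to adjust for your cluster:
kubectl wait --namespace monitoring \
  --for=condition=Ready pods -l "release=prometheus" \
  --field-selector=status.phase!=Succeeded \
  --timeout=600s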
Configure service exposure
Create NodePort services to expose the Prometheus, Grafana, and AlertManager UIs (on a cloud provider you can use type: LoadBalancer instead). Save the following manifests as monitoring-services.yaml.
---
apiVersion: v1
kind: Service
metadata:
name: prometheus-server-nodeport
namespace: monitoring
spec:
type: NodePort
ports:
- port: 9090
targetPort: 9090
nodePort: 30090
selector:
app.kubernetes.io/name: prometheus
prometheus: prometheus-kube-prometheus-prometheus
---
apiVersion: v1
kind: Service
metadata:
name: grafana-nodeport
namespace: monitoring
spec:
type: NodePort
ports:
- port: 80
targetPort: 3000
nodePort: 30091
selector:
app.kubernetes.io/name: grafana
---
apiVersion: v1
kind: Service
metadata:
name: alertmanager-nodeport
namespace: monitoring
spec:
type: NodePort
ports:
- port: 9093
targetPort: 9093
nodePort: 30092
selector:
app.kubernetes.io/name: alertmanager
kubectl apply -f monitoring-services.yaml
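Confirm the three NodePort services exist and note their assigned ports:
kubectl get svc -n monitoring prometheus-server-nodeport grafana-nodeport alertmanager-nodeport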
Create custom ServiceMonitor for application monitoring
Configure a ServiceMonitor to automatically discover and scrape metrics from your applications. Save the following as app-servicemonitor.yaml; the release: prometheus label matters, because by default the chart's Prometheus only selects ServiceMonitors that carry the release label.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: webapp-metrics
namespace: monitoring
labels:
app: webapp
release: prometheus
spec:
selector:
matchLabels:
app: webapp
metrics: enabled
endpoints:
- port: metrics
interval: 30s
path: /metrics
scrapeTimeout: 10s
namespaceSelector:
any: true
kubectl apply -f app-servicemonitor.yaml
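For this ServiceMonitor to find anything, the target Service must carry the app: webapp and metrics: enabled labels and expose a port named metrics. As a hypothetical example (the webapp names and port 8080 are placeholders for your own application):
apiVersion: v1
kind: Service
metadata:
  name: webapp
  namespace: default
  labels:
    app: webapp
    metrics: enabled
spec:
  selector:
    app: webapp
  ports:
    - name: metrics   # must match the endpoint port name in the ServiceMonitor
      port: 8080
      targetPort: 8080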
Configure alerting rules
Create PrometheusRule custom resources to define alerting conditions for cluster and application monitoring. Save the following as cluster-alerts.yaml; as with ServiceMonitors, the release: prometheus label is what lets the operator pick the rules up by default.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-monitoring-rules
  namespace: monitoring
  labels:
    app: kube-prometheus-stack
    release: prometheus
spec:
  groups:
    - name: cluster.rules
      rules:
        - alert: NodeDown
          expr: up{job="node-exporter"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.instance }} is down"
            description: "Node {{ $labels.instance }} has been down for more than 5 minutes."
        - alert: HighCPUUsage
          expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "High CPU usage on {{ $labels.instance }}"
            description: "CPU usage is above 80% for more than 10 minutes on {{ $labels.instance }}."
        - alert: HighMemoryUsage
          expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "High memory usage on {{ $labels.instance }}"
            description: "Memory usage is above 90% for more than 10 minutes on {{ $labels.instance }}."
        - alert: PodCrashLooping
          expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping"
            description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been restarting frequently."
        - alert: PersistentVolumeUsageHigh
          expr: 100 * (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 85
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "PV usage high on {{ $labels.persistentvolumeclaim }}"
            description: "Persistent Volume {{ $labels.persistentvolumeclaim }} usage is above 85%."
    - name: kubernetes.rules
      rules:
        - alert: KubernetesAPIServerDown
          expr: up{job="apiserver"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Kubernetes API server is down"
            description: "Kubernetes API server has been down for more than 5 minutes."
        - alert: KubeletDown
          expr: up{job="kubelet", metrics_path="/metrics"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Kubelet on {{ $labels.instance }} is down"
            description: "Kubelet on node {{ $labels.instance }} has been down for more than 5 minutes."
kubectl apply -f cluster-alerts.yaml
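You can confirm Prometheus loaded both rule groups through its HTTP API (this assumes jq is installed locally; give the port-forward a moment to establish):
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 &
sleep 2
curl -s http://localhost:9090/api/v1/rules | jq '.data.groups[].name'
kill %1
The output should include cluster.rules and kubernetes.rules.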
Configure AlertManager for notifications
Update the AlertManager configuration to send notifications via email, Slack, or other channels, and save it as alertmanager-config.yaml. The operator reads this Secret and expects the configuration under the alertmanager.yaml key. Be aware that the chart manages this Secret, so a kubectl apply will be reverted on the next helm upgrade; for a durable setup, move the same configuration into the chart's alertmanager.config value in prometheus-values.yaml.
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-prometheus-kube-prometheus-alertmanager
  namespace: monitoring
type: Opaque
stringData:
  alertmanager.yaml: |
    global:
      smtp_smarthost: 'smtp.example.com:587'
      smtp_from: 'alerts@example.com'
      smtp_auth_username: 'alerts@example.com'
      smtp_auth_password: 'smtp-password-here'
    route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 10s
      group_interval: 5m
      repeat_interval: 1h
      receiver: 'web.hook'
      routes:
        - match:
            severity: critical
          receiver: 'critical-alerts'
        - match:
            severity: warning
          receiver: 'warning-alerts'
    receivers:
      - name: 'web.hook'
        email_configs:
          - to: 'admin@example.com'
            subject: '[ALERT] {{ .GroupLabels.alertname }}'
            body: |
              {{ range .Alerts }}
              Alert: {{ .Annotations.summary }}
              Description: {{ .Annotations.description }}
              {{ end }}
      - name: 'critical-alerts'
        email_configs:
          - to: 'critical-alerts@example.com'
            subject: '[CRITICAL] {{ .GroupLabels.alertname }}'
            body: |
              CRITICAL ALERT
              {{ range .Alerts }}
              Alert: {{ .Annotations.summary }}
              Description: {{ .Annotations.description }}
              {{ end }}
      - name: 'warning-alerts'
        email_configs:
          - to: 'warnings@example.com'
            subject: '[WARNING] {{ .GroupLabels.alertname }}'
            body: |
              {{ range .Alerts }}
              Alert: {{ .Annotations.summary }}
              Description: {{ .Annotations.description }}
              {{ end }}
kubectl apply -f alertmanager-config.yaml
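To confirm AlertManager actually reloaded the new configuration, watch the config-reloader sidecar; the pod and container names below assume the release name prometheus and a recent prometheus-operator, so adjust them if yours differ:
kubectl logs -n monitoring alertmanager-prometheus-kube-prometheus-alertmanager-0 -c config-reloader --tail=20
Errors here usually point to a syntax problem in alertmanager.yaml.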
Verify your setup
Check that all components are running and accessible, then verify metrics collection and alerting functionality.
Check that all monitoring pods are running:
kubectl get pods -n monitoring
Verify Prometheus targets are being scraped (the jq filter prints any target that is not healthy, so empty output is good):
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 &
sleep 2
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.health != "up") | .labels'
Check that ServiceMonitors are discovered:
kubectl get servicemonitors -n monitoring
Verify PrometheusRules are loaded:
kubectl get prometheusrules -n monitoring
Test Grafana access:
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80 &
sleep 2
curl -s -u admin:'SecureAdminPass123!' http://localhost:3000/api/health
Check the AlertManager status (recent AlertManager releases removed the v1 API, so query v2):
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-alertmanager 9093:9093 &
sleep 2
curl -s http://localhost:9093/api/v2/status
Access your monitoring interfaces:
- Prometheus: http://your-node-ip:30090
- Grafana: http://your-node-ip:30091 (admin/SecureAdminPass123!)
- AlertManager: http://your-node-ip:30092
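With the AlertManager port-forward from the verification step still open, you can push a synthetic alert through the v2 API to exercise the routing tree end to end; the TestAlert name and labels are arbitrary placeholders:
curl -XPOST http://localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{"labels":{"alertname":"TestAlert","severity":"warning"},"annotations":{"summary":"Synthetic test alert","description":"Safe to ignore."}}]'
If routing works, the alert appears in the AlertManager UI and an email goes to the warning-alerts receiver.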
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Pods stuck in Pending | Insufficient cluster resources | Reduce resource requests in values.yaml or add more nodes |
| ServiceMonitor not discovering targets | Label selector mismatch | Verify service labels match ServiceMonitor selector |
| Persistent volumes not provisioning | Missing StorageClass | Create default StorageClass or specify existing one |
| Alerts not firing | PrometheusRule labels missing | Ensure PrometheusRule has correct labels matching Prometheus selector |
| Grafana dashboards missing data | Prometheus datasource misconfigured | Check datasource URL points to prometheus-operated service |
| High memory usage on Prometheus | Too many metrics or long retention | Reduce retention period or add resource limits |
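For the ServiceMonitor symptom in particular, comparing the selector with the actual Service labels usually pinpoints the mismatch (webapp-metrics and the labels below refer to the example ServiceMonitor from this guide):
kubectl -n monitoring get servicemonitor webapp-metrics -o jsonpath='{.spec.selector.matchLabels}{"\n"}'
kubectl get svc --all-namespaces -l app=webapp,metrics=enabled
If the second command returns nothing, no Service carries the labels the ServiceMonitor selects on.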
Next steps
- Monitor Docker containers with Prometheus and Grafana for additional container monitoring
- Monitor Istio service mesh with Prometheus and Grafana for service mesh observability
- Configure Prometheus long-term storage with Thanos for scalable metrics retention
- Implement custom Prometheus exporters for application metrics to monitor your applications
- Configure advanced Grafana dashboards and alerting for enhanced visualization
Automated install script
Run this script to automate the entire setup:
#!/usr/bin/env bash
set -euo pipefail
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# Default values
PROMETHEUS_NAMESPACE="monitoring"
GRAFANA_PASSWORD="SecureAdminPass123!"
STORAGE_CLASS="standard"
PROMETHEUS_STORAGE="50Gi"
GRAFANA_STORAGE="10Gi"
ALERTMANAGER_STORAGE="10Gi"
# Usage function
usage() {
echo "Usage: $0 [OPTIONS]"
echo "Options:"
echo " -n, --namespace NAME Monitoring namespace (default: monitoring)"
echo " -p, --password PASS Grafana admin password (default: SecureAdminPass123!)"
echo " -s, --storage-class SC Storage class (default: standard)"
echo " -h, --help Show this help message"
exit 1
}
# Parse command line arguments
while [[ $# -gt 0 ]]; do
case $1 in
-n|--namespace)
PROMETHEUS_NAMESPACE="$2"
shift 2
;;
-p|--password)
GRAFANA_PASSWORD="$2"
shift 2
;;
-s|--storage-class)
STORAGE_CLASS="$2"
shift 2
;;
-h|--help)
usage
;;
*)
echo -e "${RED}Error: Unknown option $1${NC}"
usage
;;
esac
done
# Error handling
cleanup() {
echo -e "${RED}Installation failed. Cleaning up...${NC}"
kubectl delete namespace "$PROMETHEUS_NAMESPACE" --ignore-not-found=true
helm repo remove prometheus-community 2>/dev/null || true
rm -f /tmp/prometheus-values.yaml
exit 1
}
trap cleanup ERR
log_success() {
echo -e "${GREEN}✓ $1${NC}"
}
log_warning() {
echo -e "${YELLOW}⚠ $1${NC}"
}
log_error() {
echo -e "${RED}✗ $1${NC}"
}
# Detect OS distribution
if [ -f /etc/os-release ]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_UPDATE="apt update"
PKG_INSTALL="apt install -y"
PKG_UPGRADE="apt upgrade -y"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_UPDATE="dnf check-update || true"
PKG_INSTALL="dnf install -y"
PKG_UPGRADE="dnf update -y"
;;
amzn)
PKG_MGR="yum"
PKG_UPDATE="yum check-update || true"
PKG_INSTALL="yum install -y"
PKG_UPGRADE="yum update -y"
;;
*)
log_error "Unsupported distribution: $ID"
exit 1
;;
esac
else
log_error "Cannot detect OS distribution"
exit 1
fi
echo "[1/8] Checking prerequisites..."
# Check if running as root or with sudo
if [[ $EUID -eq 0 ]]; then
SUDO=""
else
if ! command -v sudo &> /dev/null; then
log_error "This script requires sudo privileges"
exit 1
fi
SUDO="sudo"
fi
echo "[2/8] Updating system packages..."
$SUDO $PKG_UPDATE || true  # dnf/yum check-update exits non-zero when updates are available
$SUDO $PKG_UPGRADE
$SUDO $PKG_INSTALL curl wget git
log_success "System packages updated"
echo "[3/8] Verifying Kubernetes cluster access..."
if ! command -v kubectl &> /dev/null; then
log_error "kubectl not found. Please install kubectl first"
exit 1
fi
kubectl cluster-info > /dev/null 2>&1 || {
log_error "Cannot connect to Kubernetes cluster"
exit 1
}
kubectl get nodes > /dev/null 2>&1 || {
log_error "Cannot list cluster nodes"
exit 1
}
if ! kubectl auth can-i '*' '*' --all-namespaces > /dev/null 2>&1; then
log_warning "Limited cluster permissions detected. Some features may not work"
fi
log_success "Kubernetes cluster access verified"
echo "[4/8] Installing Helm 3..."
if command -v helm &> /dev/null; then
HELM_VERSION=$(helm version --short 2>/dev/null | grep -o 'v[0-9]\+' | head -1)
if [[ "$HELM_VERSION" == "v3" ]]; then
log_success "Helm 3 already installed"
else
log_warning "Helm 2 detected, installing Helm 3..."
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
fi
else
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
fi
helm version --short > /dev/null 2>&1 || {
log_error "Helm installation failed"
exit 1
}
log_success "Helm 3 installed successfully"
echo "[5/8] Adding Prometheus Community Helm repository..."
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm search repo prometheus-community/kube-prometheus-stack > /dev/null
log_success "Prometheus Community Helm repository added"
echo "[6/8] Creating monitoring namespace..."
kubectl create namespace "$PROMETHEUS_NAMESPACE" --dry-run=client -o yaml | kubectl apply -f -
kubectl label namespace "$PROMETHEUS_NAMESPACE" name="$PROMETHEUS_NAMESPACE" --overwrite
log_success "Monitoring namespace created: $PROMETHEUS_NAMESPACE"
echo "[7/8] Creating Prometheus configuration..."
cat > /tmp/prometheus-values.yaml << EOF
prometheus:
prometheusSpec:
retention: 30d
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: $STORAGE_CLASS
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: $PROMETHEUS_STORAGE
resources:
limits:
cpu: 2000m
memory: 4Gi
requests:
cpu: 1000m
memory: 2Gi
grafana:
enabled: true
adminPassword: "$GRAFANA_PASSWORD"
persistence:
enabled: true
storageClassName: $STORAGE_CLASS
size: $GRAFANA_STORAGE
resources:
limits:
cpu: 500m
memory: 1Gi
requests:
cpu: 250m
memory: 512Mi
grafana.ini:
security:
disable_gravatar: true
users:
allow_sign_up: false
auth.anonymous:
enabled: false
alertmanager:
alertmanagerSpec:
storage:
volumeClaimTemplate:
spec:
storageClassName: $STORAGE_CLASS
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: $ALERTMANAGER_STORAGE
resources:
limits:
cpu: 200m
memory: 512Mi
requests:
cpu: 100m
memory: 256Mi
kubeStateMetrics:
enabled: true
nodeExporter:
enabled: true
kubelet:
enabled: true
serviceMonitor:
interval: 30s
EOF
chmod 644 /tmp/prometheus-values.yaml
log_success "Prometheus configuration created"
echo "[8/8] Deploying Prometheus monitoring stack..."
helm upgrade --install prometheus prometheus-community/kube-prometheus-stack \
--namespace "$PROMETHEUS_NAMESPACE" \
--values /tmp/prometheus-values.yaml \
--wait --timeout=10m
log_success "Prometheus monitoring stack deployed successfully"
echo ""
echo "Verifying installation..."
kubectl get pods -n "$PROMETHEUS_NAMESPACE" --no-headers | while read -r pod status ready restarts age; do
if [[ "$status" == "Running" ]]; then
log_success "Pod $pod is running"
else
log_warning "Pod $pod status: $status"
fi
done
echo ""
log_success "Kubernetes monitoring with Prometheus installed successfully!"
echo ""
echo "Access URLs:"
echo "• Prometheus: kubectl port-forward -n $PROMETHEUS_NAMESPACE svc/prometheus-kube-prometheus-prometheus 9090:9090"
echo "• Grafana: kubectl port-forward -n $PROMETHEUS_NAMESPACE svc/prometheus-grafana 3000:80"
echo "• AlertManager: kubectl port-forward -n $PROMETHEUS_NAMESPACE svc/prometheus-kube-prometheus-alertmanager 9093:9093"
echo ""
echo "Grafana credentials:"
echo "• Username: admin"
echo "• Password: $GRAFANA_PASSWORD"
echo ""
echo "Clean up temporary files..."
rm -f /tmp/prometheus-values.yaml
Review the script before running. Execute with: bash install.sh