Monitor Kubernetes network policies with Prometheus and Grafana for enhanced cluster security

Advanced 45 min May 29, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up comprehensive monitoring for Kubernetes network policies using Prometheus and Grafana. Configure CNI metrics collection, create security dashboards, and implement alerting for policy violations and traffic anomalies.

Prerequisites

  • Running Kubernetes cluster with CNI (Cilium or Calico)
  • kubectl configured with cluster admin access
  • Helm 3 installed
  • Basic understanding of Kubernetes networking

What this solves

Kubernetes network policies control traffic between pods, but the API gives you little visibility into how they are enforced or when they block traffic. This tutorial uses Prometheus to collect CNI metrics from Cilium or Calico, Grafana dashboards to visualize policy verdicts, and alerting rules to flag spikes in denied traffic and agent failures. The result is a near-real-time view of allowed and denied connections that you can use to judge policy effectiveness and spot potential threats.

Step-by-step configuration

Install Prometheus Operator

Deploy the Prometheus Operator to manage Prometheus instances and monitoring resources in your cluster.

kubectl create namespace monitoring

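Add the Prometheus community Helm repository and install the kube-prometheus-stack chart, which bundles the operator, Prometheus, Alertmanager, and Grafana. The three selector flags let Prometheus pick up ServiceMonitor, PodMonitor, and PrometheusRule objects that do not carry the Helm release label:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-operator prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
  --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false \
  --set prometheus.prometheusSpec.ruleSelectorNilUsesHelmValues=false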
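
The later steps create ServiceMonitor and PrometheusRule objects, which the API server only accepts once the chart's custom resource definitions are registered. A minimal pre-flight sketch follows; the function name missing_crds is hypothetical, and the CRD names are the ones the Prometheus Operator installs.

```shell
# Hypothetical helper: print each required Prometheus Operator CRD that is
# not registered in the cluster. Prints nothing when all are present.
missing_crds() {
  for crd in servicemonitors prometheusrules prometheuses alertmanagers; do
    if ! kubectl get crd "${crd}.monitoring.coreos.com" >/dev/null 2>&1; then
      echo "${crd}.monitoring.coreos.com"
    fi
  done
}

# Usage (requires cluster access):
#   missing_crds    # no output means the CRDs are ready
```

If the function prints any names, wait for the Helm install to finish before continuing.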

Configure Cilium metrics (if using Cilium)

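Enable Cilium metrics collection for network policy monitoring. Save the following as cilium-metrics.yaml. The keys assume Cilium reads its settings from the cilium-config ConfigMap; key names and value formats vary between Cilium versions, so compare them with the output of kubectl get configmap cilium-config -n kube-system -o yaml before applying. If Cilium was installed with Helm, setting the chart values prometheus.enabled, operator.prometheus.enabled, and hubble.metrics.enabled is the safer route, because a later helm upgrade overwrites manual ConfigMap edits: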

apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  enable-prometheus-serve: "true"
  prometheus-serve-addr: ":9962"
  operator-prometheus-serve-addr: ":9963"
  enable-policy-verdict: "true"
  monitor-aggregation: medium
  enable-hubble: "true"
  hubble-metrics-server: ":9965"
  hubble-metrics: "dns,drop,tcp,flow,icmp,http"

Apply the configuration and restart Cilium:

kubectl apply -f cilium-metrics.yaml
kubectl rollout restart daemonset/cilium -n kube-system

Configure Calico metrics (if using Calico)

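For Calico deployments, enable Felix metrics to monitor network policy enforcement. Save the following as calico-metrics.yaml. These ConfigMap keys only take effect if your calico-node DaemonSet maps them to FELIX_* environment variables; the Calico documentation instead enables metrics by setting prometheusMetricsEnabled: true in the default FelixConfiguration resource, so check which mechanism your installation uses: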

apiVersion: v1
kind: ConfigMap
metadata:
  name: calico-config
  namespace: kube-system
data:
  calico_backend: "bird"
  cluster_type: "k8s,bgp"
  felix_prometheusmetricsenabled: "true"
  felix_prometheusmetricsport: "9091"
  felix_reportingintervalsecs: "0"
  felix_reportingttlsecs: "0"

Apply the configuration:

kubectl apply -f calico-metrics.yaml
kubectl rollout restart daemonset/calico-node -n kube-system
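
Whichever CNI you use, it is worth confirming that the agent actually exposes metrics before wiring up Prometheus. The sketch below counts sample lines in Prometheus text-format output by metric-name prefix; count_series is a hypothetical helper, and the usage lines assume you have already port-forwarded one agent pod to the port configured above.

```shell
# Hypothetical helper: count sample lines in Prometheus text-format output
# (read from stdin) whose metric name starts with the given prefix.
# Comment lines (# HELP / # TYPE) and blank lines are ignored.
count_series() {
  prefix="$1"
  awk -v p="$prefix" '
    /^#/ || NF == 0 { next }     # skip metadata and blank lines
    index($0, p) == 1 { n++ }    # sample line starts with the prefix
    END { print n + 0 }          # print 0 when nothing matched
  '
}

# Usage, assuming a port-forward to one agent pod is already running:
#   curl -s http://localhost:9962/metrics | count_series cilium_policy   # Cilium
#   curl -s http://localhost:9091/metrics | count_series felix_          # Calico
```

A count of 0 suggests the metrics endpoint is not enabled or the port differs from the one configured.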

Create ServiceMonitor for CNI metrics

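Configure Prometheus to scrape CNI metrics by creating ServiceMonitor resources. A ServiceMonitor selects Services (not pods) by label, and because these ServiceMonitors live in the monitoring namespace, each one needs a namespaceSelector pointing at kube-system, where the CNI runs. The label and port names below must match the metrics Services in your cluster. For Cilium:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cilium-agent
  namespace: monitoring
spec:
  # Look for matching Services in kube-system, not in monitoring
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: cilium
  endpoints:
  # Must equal the port name on the Service
  - port: prometheus
    interval: 30s
    path: /metrics
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hubble-metrics
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: hubble-metrics
  endpoints:
  - port: hubble-metrics
    interval: 30s
    path: /metrics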

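For Calico, create this ServiceMonitor. It assumes a Service in kube-system labeled k8s-app: calico-node that exposes the Felix metrics port (9091) under the name calico-metrics:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: calico-felix
  namespace: monitoring
spec:
  # Look for matching Services in kube-system, not in monitoring
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: calico-node
  endpoints:
  # Must equal the port name on the Service
  - port: calico-metrics
    interval: 30s
    path: /metrics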

Apply the appropriate ServiceMonitor:

kubectl apply -f cilium-servicemonitor.yaml

Or, for Calico:

kubectl apply -f calico-servicemonitor.yaml
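
A ServiceMonitor only produces scrape targets when a Service matches its label selector and exposes a port with the referenced name. The sketch below lists matching Services and their port names so you can compare them against the manifests above; svc_port_names is a hypothetical helper.

```shell
# Hypothetical helper: list Services matching a label selector in a namespace,
# with the names of their ports, so they can be compared against the
# ServiceMonitor's selector and "port" field.
svc_port_names() {
  namespace="$1"
  selector="$2"
  kubectl get svc -n "$namespace" -l "$selector" \
    -o jsonpath='{range .items[*]}{.metadata.name}{": "}{range .spec.ports[*]}{.name}{" "}{end}{"\n"}{end}'
}

# Usage (requires cluster access):
#   svc_port_names kube-system k8s-app=cilium
#   svc_port_names kube-system k8s-app=calico-node
```

Empty output means no Service carries that label, in which case Prometheus has nothing to scrape until you create one or adjust the selector.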

Create Grafana dashboards

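Import a network policy dashboard to visualize CNI metrics. The Grafana sidecar deployed by the chart loads any ConfigMap labeled grafana_dashboard. Save the following as network-policy-dashboard.yaml (the queries use Cilium metrics):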

apiVersion: v1
kind: ConfigMap
metadata:
  name: network-policy-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  network-policy.json: |
    {
      "id": null,
      "title": "Network Policy Monitoring",
      "tags": ["kubernetes", "network", "security"],
      "timezone": "browser",
      "panels": [
        {
          "id": 1,
          "title": "Policy Verdicts",
          "type": "stat",
          "targets": [
            {
              "expr": "sum(rate(cilium_policy_verdict_total[5m])) by (verdict)",
              "legendFormat": "{{verdict}}"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              }
            }
          },
          "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
        },
        {
          "id": 2,
          "title": "Denied Connections by Source",
          "type": "table",
          "targets": [
            {
              "expr": "topk(10, sum(rate(cilium_policy_verdict_total{verdict=\"DENIED\"}[5m])) by (source, destination))",
              "format": "table",
              "instant": true
            }
          ],
          "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
        }
      ],
      "time": {
        "from": "now-1h",
        "to": "now"
      },
      "refresh": "30s"
    }

Apply the dashboard:

kubectl apply -f network-policy-dashboard.yaml

Configure alerting rules

Create Prometheus alerting rules for network policy violations and security events:

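apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: network-policy-alerts
  namespace: monitoring
  labels:
    # Matches the chart's default rule selector; assumes the release name "prometheus-operator"
    release: prometheus-operator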
spec:
  groups:
  - name: network-policy.rules
    interval: 30s
    rules:
    - alert: HighPolicyViolationRate
      expr: rate(cilium_policy_verdict_total{verdict="DENIED"}[5m]) > 10
      for: 2m
      labels:
        severity: warning
        component: network-policy
      annotations:
        summary: "High network policy violation rate detected"
        description: "Network policy is denying {{ $value }} connections per second for {{ $labels.source }} to {{ $labels.destination }}"
    
    - alert: SuspiciousTrafficPattern
      expr: |
        (
          rate(cilium_policy_verdict_total{verdict="DENIED"}[10m]) >
          rate(cilium_policy_verdict_total{verdict="DENIED"}[10m] offset 1d) * 3
        )
      for: 5m
      labels:
        severity: critical
        component: network-policy
      annotations:
        summary: "Suspicious traffic pattern detected"
        description: "Denied connection rate has increased 3x compared to the same time yesterday"
    
    - alert: NetworkPolicyEngineDown
      expr: up{job="cilium-agent"} == 0
      for: 1m
      labels:
        severity: critical
        component: network-policy
      annotations:
        summary: "CNI policy engine is down"
        description: "Cilium agent on {{ $labels.instance }} has been down for more than 1 minute"
    
    - alert: PolicyLoadErrors
      expr: rate(cilium_policy_regeneration_total{outcome="fail"}[5m]) > 0
      for: 2m
      labels:
        severity: warning
        component: network-policy
      annotations:
        summary: "Network policy load errors"
        description: "Policy regeneration failures detected on {{ $labels.instance }}"
    
    - alert: UnexpectedAllowedConnections
      expr: |
        rate(cilium_policy_verdict_total{verdict="ALLOWED"}[5m]) > 
        quantile_over_time(0.95, rate(cilium_policy_verdict_total{verdict="ALLOWED"}[5m])[7d:1h]) * 1.5
      for: 10m
      labels:
        severity: warning
        component: network-policy
      annotations:
        summary: "Unusual allowed connection volume"
        description: "Allowed connections are 50% higher than the 95th percentile over the last week"

Apply the alerting rules:

kubectl apply -f network-policy-alerts.yaml
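
The SuspiciousTrafficPattern rule compares the current denial rate with a baseline multiplied by 3. The arithmetic can be sketched as below to reason about when it fires; exceeds_baseline is a hypothetical helper, and the rates are made-up numbers rather than measurements.

```shell
# Hypothetical helper: succeed (exit 0) when the current rate is more than
# `factor` times the baseline rate, mirroring `current > baseline * 3`.
exceeds_baseline() {
  current="$1"; baseline="$2"; factor="${3:-3}"
  awk -v c="$current" -v b="$baseline" -v f="$factor" \
    'BEGIN { exit !(c > b * f) }'
}

# Worked example: a baseline of 2 denials/s gives a threshold of 6 denials/s.
exceeds_baseline 7 2 && echo "7/s would fire"
exceeds_baseline 5 2 || echo "5/s would not fire"
```

Note that a baseline of 0 makes any nonzero denial rate exceed the threshold, so quiet workloads may need an additional absolute floor in the expression to avoid noisy alerts.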

Configure Alertmanager for notifications

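Set up Alertmanager to send notifications for network policy alerts. The example below uses email; Slack and other receivers follow the same routing structure. The Prometheus Operator reads the configuration from a Secret named alertmanager-<Alertmanager resource name> under the key alertmanager.yaml. The name below assumes the Helm release name used earlier, so confirm it with kubectl get alertmanager -n monitoring. Save the following as alertmanager-config.yaml:

apiVersion: v1
kind: Secret
metadata:
  # Assumes the release name "prometheus-operator"; a helm upgrade may overwrite this Secret
  name: alertmanager-prometheus-operator-kube-p-alertmanager
  namespace: monitoring
stringData:
  alertmanager.yaml: |
    global:
      # Placeholder values; replace with your SMTP relay and sender
      smtp_smarthost: 'localhost:587'
      smtp_from: 'alerts@example.com'

    route:
      group_by: ['alertname', 'component']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'network-security-team'
      routes:
      - match:
          component: network-policy
        receiver: 'network-security-team'
        group_wait: 10s
        repeat_interval: 5m

    receivers:
    - name: 'network-security-team'
      email_configs:
      - to: 'security@example.com'
        # The subject is set as a mail header; the plain-text body goes in "text"
        headers:
          Subject: 'Network Policy Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}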

Apply the Alertmanager configuration:

kubectl apply -f alertmanager-config.yaml

Deploy sample network policies for testing

Create test network policies and two small workloads to generate metrics and validate the monitoring setup. Save the following as test-network-policies.yaml:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-frontend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: nginx:alpine
        ports:
        - containerPort: 80
---
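apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: backend
        # The unprivileged nginx image listens on 8080, matching the policy port
        image: nginxinc/nginx-unprivileged:alpine
        ports:
        - containerPort: 8080
---
# Service so that the name test-backend resolves in the verification step
apiVersion: v1
kind: Service
metadata:
  name: test-backend
spec:
  selector:
    app: backend
  ports:
  - port: 8080
    targetPort: 8080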

Deploy the test resources:

kubectl apply -f test-network-policies.yaml

Access Grafana dashboard

Port-forward to access Grafana and view your network policy dashboards:

kubectl port-forward -n monitoring svc/prometheus-operator-grafana 3000:80

Access Grafana at http://localhost:3000. The default credentials are admin/prom-operator. Navigate to the "Network Policy Monitoring" dashboard to view metrics.

Verify your setup

Check that all monitoring components are running correctly:

# Verify Prometheus is scraping CNI metrics
kubectl port-forward -n monitoring svc/prometheus-operator-kube-p-prometheus 9090:9090

Access Prometheus at http://localhost:9090 and query for CNI metrics:

cilium_policy_verdict_total

or for Calico:

felix_cluster_num_workload_endpoints
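
The same check can be scripted against the Prometheus HTTP API instead of the web UI. This is a rough sketch: prom_query and result_count are hypothetical helpers, PROM_URL is an assumed variable that defaults to the port-forward address above, and the count is a simple text match rather than a real JSON parse.

```shell
# Hypothetical helper: run an instant query against the Prometheus HTTP API.
# PROM_URL is an assumption; it matches the port-forward shown above.
PROM_URL="${PROM_URL:-http://localhost:9090}"
prom_query() {
  curl -s --get "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Hypothetical helper: count the series in an instant-query response read
# from stdin by counting "metric" objects in the JSON.
result_count() {
  grep -o '"metric":' | wc -l | tr -d ' '
}

# Usage (requires the port-forward to be running):
#   prom_query 'cilium_policy_verdict_total' | result_count
#   prom_query 'felix_cluster_num_workload_endpoints' | result_count
```

A result of 0 means Prometheus has no series for that metric yet; a JSON-aware tool such as jq would give a more robust count if it is available.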

Test network policy enforcement by generating traffic:

# Allowed: the frontend pod matches the allow-frontend-to-backend policy (should print the nginx welcome page)
kubectl exec deploy/test-frontend -- wget -qO- -T 5 test-backend:8080

# Denied: a pod without the app=frontend label (should time out after 5 seconds)
kubectl run test-pod --image=nginx:alpine --rm -it --restart=Never -- wget -qO- -T 5 test-backend:8080

Check that alerts are configured correctly:

kubectl get prometheusrules -n monitoring
kubectl describe prometheusrule network-policy-alerts -n monitoring
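
The kubectl commands above only show that the PrometheusRule object exists; they do not prove that Prometheus loaded it. One way to check is the /api/v1/rules endpoint of the Prometheus HTTP API. The sketch below assumes the port-forward to localhost:9090 from the previous step, and rule_loaded is a hypothetical helper that does a plain text match on the response.

```shell
# Hypothetical helper: succeed when the /api/v1/rules response on stdin
# contains a rule or group with the given name.
rule_loaded() {
  grep -q "\"name\":[[:space:]]*\"$1\""
}

# Usage, assuming Prometheus is port-forwarded to localhost:9090:
#   curl -s http://localhost:9090/api/v1/rules | rule_loaded HighPolicyViolationRate \
#     && echo "rule loaded" || echo "rule missing"
```

If the rule is missing, the usual suspect is the rule selector on the Prometheus resource not matching the labels on the PrometheusRule.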

Advanced configuration

Configure custom metrics retention

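Set up longer retention for network policy metrics to track security trends over time. Save the following as prometheus-retention.yaml. The metadata.name must match the Prometheus resource the chart created (list it with kubectl get prometheus -n monitoring); the name below assumes the release name prometheus-operator used earlier. Because Helm manages this resource, a later helm upgrade reverts manual changes unless the same settings are also passed as chart values (prometheus.prometheusSpec.retention and prometheus.prometheusSpec.retentionSize):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-operator-kube-p-prometheus
  namespace: monitoring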
spec:
  retention: "90d"
  retentionSize: "50GB"
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi

Apply the retention configuration:

kubectl apply -f prometheus-retention.yaml
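
To judge whether 90 days fits in the requested volume, the Prometheus storage documentation suggests estimating disk use as retention seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample. A minimal sketch of that arithmetic follows; estimate_disk_gib is a hypothetical helper, and the 5000 samples/s figure is an assumed ingestion rate, not a measurement.

```shell
# Hypothetical helper: estimate TSDB disk usage in GiB.
#   $1 = retention in days, $2 = ingested samples per second,
#   $3 = bytes per sample (defaults to 2, the upper end of the 1-2 byte range)
estimate_disk_gib() {
  days="$1"; samples_per_sec="$2"; bytes_per_sample="${3:-2}"
  awk -v d="$days" -v s="$samples_per_sec" -v b="$bytes_per_sample" \
    'BEGIN { printf "%.1f\n", d * 86400 * s * b / (1024 * 1024 * 1024) }'
}

# Example: 90 days at an assumed 5000 samples/s
estimate_disk_gib 90 5000   # prints 72.4
```

At that assumed rate the estimate is above the 50GB retentionSize, so size-based retention would start deleting data before 90 days have passed. You can read your real ingestion rate from rate(prometheus_tsdb_head_samples_appended_total[5m]) and adjust retentionSize and the volume request accordingly.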

Create security compliance dashboard

Build a compliance-focused dashboard for security audits. The first panel counts NetworkPolicy objects per namespace using kube_networkpolicy_created from kube-state-metrics (assumed to be running, as it is in kube-prometheus-stack). Save the following as compliance-dashboard.yaml and apply it with kubectl apply -f, as with the first dashboard:

apiVersion: v1
kind: ConfigMap
metadata:
  name: compliance-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  compliance.json: |
    {
      "title": "Network Security Compliance",
      "panels": [
        {
          "title": "Policy Coverage by Namespace",
          "type": "piechart",
          "targets": [{
            "expr": "count by (namespace) (kube_networkpolicy_created)"
          }]
        },
        {
          "title": "Monthly Violation Trends",
          "type": "timeseries",
          "targets": [{
            "expr": "increase(cilium_policy_verdict_total{verdict=\"DENIED\"}[30d])"
          }]
        }
      ]
    }

Common issues

| Symptom | Cause | Fix |
| --- | --- | --- |
| No CNI metrics in Prometheus | ServiceMonitor not detecting the CNI metrics Service | Check that the Service labels and namespace match the ServiceMonitor selector |
| Grafana dashboard shows no data | Wrong metrics queries for CNI type | Use cilium_ metrics for Cilium, felix_ for Calico |
| Alerts not firing for policy violations | Metrics path or port misconfigured | Verify CNI metrics endpoint with kubectl port-forward |
| High memory usage in Prometheus | Too many high-cardinality metrics | Adjust metric collection interval and retention |
| Missing policy verdict metrics | CNI not configured to expose verdicts | Enable policy verdict logging in CNI configuration |
