Monitor Kubernetes Network Policies with Prometheus

Set up comprehensive monitoring for Kubernetes network policies using Prometheus and Grafana. Configure CNI metrics collection, create security dashboards, and implement alerting for policy violations and traffic anomalies.

Prerequisites

A running Kubernetes cluster with Cilium or Calico as the CNI plugin
kubectl configured with cluster admin access
Helm 3 installed
Basic understanding of Kubernetes networking and network policies

What this solves

Kubernetes network policies control traffic between pods but offer little visibility into how they are enforced or when they are violated. This tutorial uses Prometheus to collect CNI metrics from Cilium or Calico, Grafana dashboards to visualize policy activity, and alerting rules to flag likely violations. The result is near-real-time insight into allowed and denied connections, policy effectiveness, and potential security threats.

Step-by-step configuration

Install Prometheus Operator

Deploy the Prometheus Operator to manage Prometheus instances and monitoring resources in your cluster.

kubectl create namespace monitoring

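Add the Prometheus community Helm repository and install the kube-prometheus-stack chart, which bundles the Prometheus Operator with Prometheus, Alertmanager, and Grafana. The two --set flags let Prometheus pick up ServiceMonitor and PodMonitor resources that do not carry the Helm release label: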

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-operator prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
  --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false

Configure Cilium metrics (if using Cilium)

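Enable Cilium metrics collection for network policy monitoring. Save the following ConfigMap as cilium-metrics.yaml to expose policy enforcement metrics. If Cilium was installed with Helm, the chart also manages cilium-config, so prefer setting the equivalent chart values; otherwise a later upgrade may overwrite these keys: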

apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  enable-prometheus-serve: "true"
  prometheus-serve-addr: ":9962"
  operator-prometheus-serve-addr: ":9963"
  enable-policy-verdict: "true"
  monitor-aggregation: medium
  enable-hubble: "true"
  hubble-metrics-server: ":9965"
  # In the ConfigMap this is a space-separated list
  hubble-metrics: "dns drop tcp flow icmp http"

Apply the configuration and restart Cilium:

kubectl apply -f cilium-metrics.yaml
kubectl rollout restart daemonset/cilium -n kube-system
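
Before wiring up Prometheus, it can help to confirm that the agent actually serves policy metrics on port 9962. The helper below is a minimal sketch and not part of Cilium: `metric_names` is a hypothetical function name, and the only assumption is that the endpoint returns the standard Prometheus text format. It reads that text on stdin and prints the distinct metric names that start with a given prefix.

```shell
# List the distinct metric names with a given prefix from Prometheus
# text-format output read on stdin.
# metric_names is a hypothetical helper, not a Cilium or Prometheus command.
metric_names() {
  prefix="$1"
  # Drop "# HELP" / "# TYPE" comment lines, strip labels and sample values,
  # keep only names starting with the prefix, then de-duplicate.
  grep -v '^#' | sed -E 's/[{ ].*$//' | grep "^${prefix}" | sort -u
}
```

To use it, forward port 9962 from one of the Cilium agent pods with kubectl port-forward, then run `curl -s http://localhost:9962/metrics | metric_names cilium_policy`. If a metric referenced later in this tutorial does not appear in the output, adapt the dashboard and alert queries to the names your Cilium version exposes.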

Configure Calico metrics (if using Calico)

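For Calico deployments, enable Felix metrics to monitor network policy enforcement. Save the following as calico-metrics.yaml. Felix normally reads these settings from the FelixConfiguration resource (prometheusMetricsEnabled, prometheusMetricsPort) or from FELIX_* environment variables on calico-node, so confirm that your installation maps these ConfigMap keys into the DaemonSet: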

apiVersion: v1
kind: ConfigMap
metadata:
  name: calico-config
  namespace: kube-system
data:
  calico_backend: "bird"
  cluster_type: "k8s,bgp"
  felix_prometheusmetricsenabled: "true"
  felix_prometheusmetricsport: "9091"
  felix_reportingintervalsecs: "0"
  felix_reportingttlsecs: "0"

Apply the configuration:

kubectl apply -f calico-metrics.yaml
kubectl rollout restart daemonset/calico-node -n kube-system

Create ServiceMonitor for CNI metrics

Configure Prometheus to scrape CNI metrics by creating appropriate ServiceMonitor resources. For Cilium:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cilium-agent
  namespace: monitoring
spec:
  # Cilium Services live in kube-system; without a namespaceSelector a
  # ServiceMonitor only matches Services in its own namespace.
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: cilium
  endpoints:
  - port: prometheus
    interval: 30s
    path: /metrics
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hubble-metrics
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: hubble-metrics
  endpoints:
  - port: hubble-metrics
    interval: 30s
    path: /metrics

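For Calico, create this ServiceMonitor. It assumes a Service in kube-system labeled k8s-app: calico-node that exposes the Felix metrics port under the name calico-metrics; create that Service first if your installation does not provide one:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: calico-felix
  namespace: monitoring
spec:
  # Without a namespaceSelector a ServiceMonitor only matches Services
  # in its own namespace.
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: calico-node
  endpoints:
  - port: calico-metrics
    interval: 30s
    path: /metrics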

Apply the appropriate ServiceMonitor:

kubectl apply -f cilium-servicemonitor.yaml
# OR for Calico:
kubectl apply -f calico-servicemonitor.yaml

Create Grafana dashboards

Import network policy monitoring dashboards to visualize CNI metrics. Create a ConfigMap with dashboard JSON:

apiVersion: v1
kind: ConfigMap
metadata:
  name: network-policy-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  network-policy.json: |
    {
      "id": null,
      "title": "Network Policy Monitoring",
      "tags": ["kubernetes", "network", "security"],
      "timezone": "browser",
      "panels": [
        {
          "id": 1,
          "title": "Policy Verdicts",
          "type": "stat",
          "targets": [
            {
              "expr": "sum(rate(cilium_policy_verdict_total[5m])) by (verdict)",
              "legendFormat": "{{verdict}}"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              }
            }
          },
          "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
        },
        {
          "id": 2,
          "title": "Denied Connections by Source",
          "type": "table",
          "targets": [
            {
              "expr": "topk(10, sum(rate(cilium_policy_verdict_total{verdict=\"DENIED\"}[5m])) by (source, destination))",
              "format": "table",
              "instant": true
            }
          ],
          "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
        }
      ],
      "time": {
        "from": "now-1h",
        "to": "now"
      },
      "refresh": "30s"
    }

Apply the dashboard:

kubectl apply -f network-policy-dashboard.yaml

Configure alerting rules

Create Prometheus alerting rules for network policy violations and security events:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
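metadata:
  name: network-policy-alerts
  namespace: monitoring
  labels:
    # The chart's default rule selector matches on the Helm release name;
    # adjust this value if your release is named differently.
    release: prometheus-operator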
spec:
  groups:
  - name: network-policy.rules
    interval: 30s
    rules:
    - alert: HighPolicyViolationRate
      expr: rate(cilium_policy_verdict_total{verdict="DENIED"}[5m]) > 10
      for: 2m
      labels:
        severity: warning
        component: network-policy
      annotations:
        summary: "High network policy violation rate detected"
        description: "Network policy is denying {{ $value }} connections per second for {{ $labels.source }} to {{ $labels.destination }}"
    
    - alert: SuspiciousTrafficPattern
      expr: |
        (
          rate(cilium_policy_verdict_total{verdict="DENIED"}[10m]) > 
          rate(cilium_policy_verdict_total{verdict="DENIED"}[1h] offset 1h) * 3
        )
      for: 5m
      labels:
        severity: critical
        component: network-policy
      annotations:
        summary: "Suspicious traffic pattern detected"
        description: "Denied connection rate over the last 10 minutes is more than 3x the hourly average measured one hour earlier"
    
    - alert: NetworkPolicyEngineDown
      expr: up{job="cilium-agent"} == 0
      for: 1m
      labels:
        severity: critical
        component: network-policy
      annotations:
        summary: "CNI policy engine is down"
        description: "Cilium agent on {{ $labels.instance }} has been down for more than 1 minute"
    
    - alert: PolicyLoadErrors
      expr: rate(cilium_policy_regeneration_total{outcome="fail"}[5m]) > 0
      for: 2m
      labels:
        severity: warning
        component: network-policy
      annotations:
        summary: "Network policy load errors"
        description: "Policy regeneration failures detected on {{ $labels.instance }}"
    
    - alert: UnexpectedAllowedConnections
      expr: |
        rate(cilium_policy_verdict_total{verdict="ALLOWED"}[5m]) > 
        quantile_over_time(0.95, rate(cilium_policy_verdict_total{verdict="ALLOWED"}[5m])[7d:1h]) * 1.5
      for: 10m
      labels:
        severity: warning
        component: network-policy
      annotations:
        summary: "Unusual allowed connection volume"
        description: "Allowed connections are 50% higher than the 95th percentile over the last week"

Apply the alerting rules:

kubectl apply -f network-policy-alerts.yaml

Configure Alertmanager for notifications

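Set up Alertmanager to send notifications for network policy alerts. The example below uses email; other receivers such as Slack follow the same pattern. The Prometheus Operator reads the configuration from a Secret named alertmanager-<Alertmanager resource name> under the key alertmanager.yaml, so check the resource name with kubectl get alertmanager -n monitoring and adjust metadata.name to match. For Helm-managed installs, setting alertmanager.config in the chart values is the more durable option. Save the following as alertmanager-config.yaml:

apiVersion: v1
kind: Secret
metadata:
  # Rename to alertmanager-<Alertmanager resource name> for your install
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |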
    global:
      smtp_smarthost: 'localhost:587'
      smtp_from: 'alerts@example.com'
    
    route:
      group_by: ['alertname', 'component']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'network-security-team'
      routes:
      - match:
          component: network-policy
        receiver: 'network-security-team'
        group_wait: 10s
        repeat_interval: 5m
    
    receivers:
    - name: 'network-security-team'
      email_configs:
      - to: 'security@example.com'
        # The subject is set through headers; the plain-text body uses "text"
        headers:
          Subject: 'Network Policy Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}

Apply the Alertmanager configuration:

kubectl apply -f alertmanager-config.yaml

Deploy sample network policies for testing

Create test network policies to generate metrics and validate monitoring setup:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      # nginx in the test backend listens on port 80 by default
      port: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-frontend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: nginx:alpine
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: backend
        image: nginx:alpine
        ports:
        - containerPort: 80

Deploy the test resources:

kubectl apply -f test-network-policies.yaml

Access Grafana dashboard

Port-forward to access Grafana and view your network policy dashboards:

kubectl port-forward -n monitoring svc/prometheus-operator-grafana 3000:80

Access Grafana at http://localhost:3000. The default credentials are admin/prom-operator. Navigate to the "Network Policy Monitoring" dashboard to view metrics.

Verify your setup

Check that all monitoring components are running correctly:

# Verify Prometheus is scraping CNI metrics
kubectl port-forward -n monitoring svc/prometheus-operator-kube-p-prometheus 9090:9090

Access Prometheus at http://localhost:9090 and query for CNI metrics:

cilium_policy_verdict_total
# or for Calico:
felix_cluster_num_workload_endpoints

Test network policy enforcement by generating traffic:

# Get pod names
kubectl get pods -l app=frontend -o jsonpath='{.items[0].metadata.name}'

# Test allowed connection (should work)
kubectl exec -it
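
One way to exercise both test policies is a small probe helper. This is a sketch under stated assumptions: `probe_backend` is a hypothetical function name, the client image ships BusyBox wget (true for nginx:alpine), and the target is passed as a pod IP and port because the test manifests do not define a Service.

```shell
# Probe a backend from the first pod matching a label selector.
# Prints "allowed" if the HTTP request succeeds and "blocked" otherwise.
# probe_backend is a hypothetical helper name used for illustration.
probe_backend() {
  selector="$1"   # label selector of the client pod, e.g. app=frontend
  target="$2"     # backend pod IP and port, e.g. 10.0.1.23:80
  pod="$(kubectl get pods -l "$selector" -o jsonpath='{.items[0].metadata.name}')"
  # -T 3 gives up after three seconds, which is how a policy drop shows up.
  if kubectl exec "$pod" -- wget -q -T 3 -O /dev/null "http://${target}"; then
    echo "allowed"
  else
    echo "blocked"
  fi
}
```

Look up the backend address with `kubectl get pods -l app=backend -o jsonpath='{.items[0].status.podIP}'`, then call `probe_backend app=frontend "<backend-ip>:<port>"` using the port the policy allows. Running the same probe from a pod that does not carry the app=frontend label should report "blocked" and produce denied verdicts in the metrics. Note that a closed port on the backend also reports "blocked", so make sure the backend actually listens on the port you test.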

Check that alerts are configured correctly:

kubectl get prometheusrules -n monitoring
kubectl describe prometheusrule network-policy-alerts -n monitoring
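
The PrometheusRule object existing in the cluster does not guarantee that Prometheus loaded it. A quick check is to compare the alert names against the rules Prometheus reports on its /api/v1/rules HTTP endpoint. The helper below is a minimal sketch: `missing_alerts` is a hypothetical function name, and it matches names with grep rather than parsing the JSON properly.

```shell
# Print every expected alert name that does not appear in the
# /api/v1/rules JSON read from stdin; print nothing if all are present.
# missing_alerts is a hypothetical helper name used for illustration.
missing_alerts() {
  rules_json="$(cat)"
  for name in "$@"; do
    # Each rule object carries its alert name as "name":"<alert>".
    printf '%s' "$rules_json" \
      | grep -Eq "\"name\":[[:space:]]*\"${name}\"" || echo "$name"
  done
}
```

With the Prometheus port-forward on port 9090 still running, `curl -s http://localhost:9090/api/v1/rules | missing_alerts HighPolicyViolationRate SuspiciousTrafficPattern NetworkPolicyEngineDown PolicyLoadErrors UnexpectedAllowedConnections` should print nothing. Any name it prints was not loaded, which usually points to a rule selector or label mismatch.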

Advanced configuration

Configure custom metrics retention

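Set up longer retention for network policy metrics to track security trends over time. The metadata.name must match the existing Prometheus resource (check with kubectl get prometheus -n monitoring); otherwise the operator starts a second Prometheus instance. The name below assumes the Helm release name prometheus-operator used earlier:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-operator-kube-p-prometheus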
  namespace: monitoring
spec:
  retention: "90d"
  retentionSize: "50GB"
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi

Apply the retention configuration:

kubectl apply -f prometheus-retention.yaml
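
To sanity-check the volume size against a 90-day retention, the Prometheus documentation gives a rough formula: needed disk space = retention time in seconds x ingested samples per second x bytes per sample, with roughly 1 to 2 bytes per sample on average. The helper below is a sketch of that arithmetic; `estimate_tsdb_gib` is a hypothetical function name, and the default of 2 bytes per sample is a deliberately pessimistic assumption.

```shell
# Estimate TSDB disk usage in GiB.
# Arguments: retention in days, ingested samples per second,
# and optionally bytes per sample (defaults to 2).
# estimate_tsdb_gib is a hypothetical helper name used for illustration.
estimate_tsdb_gib() {
  LC_ALL=C awk -v days="$1" -v sps="$2" -v bps="${3:-2}" \
    'BEGIN { printf "%.1f\n", days * 86400 * sps * bps / 1073741824 }'
}
```

The ingestion rate can be read from Prometheus itself with the query `rate(prometheus_tsdb_head_samples_appended_total[1h])`. For example, `estimate_tsdb_gib 90 5000` estimates the space needed for 90 days at 5000 samples per second; if the result exceeds retentionSize, Prometheus will drop the oldest data before the 90 days are reached.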

Create security compliance dashboard

Build a compliance-focused dashboard for security audits:

apiVersion: v1
kind: ConfigMap
metadata:
  name: compliance-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  compliance.json: |
    {
      "title": "Network Security Compliance",
      "panels": [
        {
          "title": "Policy Coverage by Namespace",
          "type": "piechart",
          "targets": [{
            "expr": "count by (namespace) (kube_networkpolicy_created)"
          }]
        },
        {
          "title": "Monthly Violation Trends",
          "type": "timeseries",
          "targets": [{
            "expr": "increase(cilium_policy_verdict_total{verdict=\"DENIED\"}[30d])"
          }]
        }
      ]
    }

Common issues

Symptom	Cause	Fix
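No CNI metrics in Prometheus	ServiceMonitor not selecting the CNI metrics Service	Check that the Service labels, port name, and namespace match the ServiceMonitor selector and namespaceSelector
Grafana dashboard shows no data	Queries written for the other CNI	Use cilium_ metrics for Cilium, felix_ for Calico
Alerts not firing for policy violations	Rule not loaded by Prometheus, or metrics path or port misconfigured	Confirm the PrometheusRule is selected by Prometheus and verify the CNI metrics endpoint with `kubectl port-forward`
High memory usage in Prometheus	Too many high-cardinality series	Reduce label cardinality first, then adjust the scrape interval and retention
Missing policy verdict metrics	CNI not configured to expose verdicts, or metric names differ in your CNI version	Enable policy verdict reporting in the CNI configuration and check the metric names on the /metrics endpoint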

Next steps

Configure Kubernetes RBAC with service accounts and role bindings to secure monitoring access
Implement Kubernetes network policies for pod-to-pod security to expand your policy coverage
Configure Cilium BGP peering with MetalLB integration for advanced networking
Set up automated compliance scanning for network policies
Implement network policy testing and validation framework
