Set up comprehensive monitoring for Kubernetes network policies using Prometheus and Grafana. Configure CNI metrics collection, create security dashboards, and implement alerting for policy violations and traffic anomalies.
Prerequisites
- Running Kubernetes cluster with CNI (Cilium or Calico)
- kubectl configured with cluster admin access
- Helm 3 installed
- Basic understanding of Kubernetes networking
What this solves
Kubernetes network policies control traffic between pods but provide limited visibility into enforcement and violations. This tutorial sets up comprehensive monitoring using Prometheus to collect CNI metrics from Cilium or Calico, Grafana dashboards for network policy visualization, and alerting rules for security violations. You'll gain real-time insights into allowed and denied connections, policy effectiveness, and potential security threats.
Step-by-step configuration
Install Prometheus Operator
Deploy the Prometheus Operator to manage Prometheus instances and monitoring resources in your cluster.
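kubectl create namespace monitoring
Add the Prometheus community Helm repository and install the kube-prometheus-stack chart. The chart bundles the operator with Prometheus, Alertmanager, Grafana, and kube-state-metrics. The two --set flags let Prometheus pick up ServiceMonitors and PodMonitors that lack the chart's release label, and the monitors created later in this tutorial depend on that: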
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-operator prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
--set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false
Configure Cilium metrics (if using Cilium)
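Enable Cilium metrics collection for network policy monitoring. The keys below turn on the agent, operator, and Hubble metrics endpoints. Merge them into the existing cilium-config ConfigMap and do not remove its other settings. If Cilium is managed by Helm, set the equivalent chart values instead (for example prometheus.enabled, operator.prometheus.enabled, hubble.enabled, and hubble.metrics.enabled). The next helm upgrade rewrites this ConfigMap, and the chart can also create the cilium-agent and hubble-metrics Services that the ServiceMonitors below select.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  enable-prometheus-serve: "true"
  prometheus-serve-addr: ":9962"
  operator-prometheus-serve-addr: ":9963"
  enable-policy-verdict: "true"
  monitor-aggregation: medium
  enable-hubble: "true"
  hubble-metrics-server: ":9965"
  # space-separated, matching how the Cilium Helm chart renders this key
  hubble-metrics: "dns drop tcp flow icmp http"
Merge the keys into the live ConfigMap and restart Cilium so the agents pick them up:
kubectl -n kube-system patch configmap cilium-config --type merge --patch-file cilium-metrics.yaml
kubectl rollout restart daemonset/cilium -n kube-system
Configure Calico metrics (if using Calico)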
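For Calico deployments, enable Felix metrics to monitor network policy enforcement. Felix reads its settings from environment variables, its config file, and FelixConfiguration resources. It does not read felix_* keys from the calico-config ConfigMap. Turn on its Prometheus endpoint (port 9091 by default) in the default FelixConfiguration:
kubectl patch felixconfiguration default --type merge \
--patch '{"spec":{"prometheusMetricsEnabled": true}}'
Felix serves metrics on each node, but no Service exposes them. Create a headless Service for the ServiceMonitor to select. If Calico was installed with the Tigera operator, use the calico-system namespace instead of kube-system:
apiVersion: v1
kind: Service
metadata:
  name: felix-metrics-svc
  namespace: kube-system
  labels:
    k8s-app: calico-node
spec:
  clusterIP: None
  selector:
    k8s-app: calico-node
  ports:
  - name: calico-metrics
    port: 9091
    targetPort: 9091
Apply the Service:
kubectl apply -f calico-metrics.yaml
Create ServiceMonitor for CNI metrics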
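Configure Prometheus to scrape CNI metrics by creating ServiceMonitor resources. A ServiceMonitor selects Services (not pods) by their labels and scrapes the Service port named in each endpoint. By default it only searches its own namespace, so a ServiceMonitor in monitoring needs a namespaceSelector to reach CNI Services in kube-system. For Cilium (verify the labels and port names with kubectl -n kube-system get svc --show-labels):
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cilium-agent
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: cilium
  endpoints:
  - port: metrics        # port name on the cilium-agent Service created by the Cilium Helm chart
    interval: 30s
    path: /metrics
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hubble-metrics
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: hubble    # label on the hubble-metrics Service created by the Cilium Helm chart
  endpoints:
  - port: hubble-metrics
    interval: 30s
    path: /metrics
For Calico, create this ServiceMonitor: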
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: calico-felix
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
    - kube-system        # use calico-system for operator-based Calico installs
  selector:
    matchLabels:
      k8s-app: calico-node
  endpoints:
  - port: calico-metrics
    interval: 30s
    path: /metrics
Apply the appropriate ServiceMonitor:
kubectl apply -f cilium-servicemonitor.yaml
OR for Calico:
kubectl apply -f calico-servicemonitor.yaml
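When Prometheus shows no CNI targets, the cause is often one of the checks a ServiceMonitor performs. The Python sketch below is a simplified model of that selection. It is not the Prometheus Operator's code. It checks the namespace first, then the Service's labels, then the named port. The function produces_targets and the sample objects are hypothetical and mirror the Cilium agent resources. The real operator also handles matchExpressions, namespaceSelector.any, targetPort, and other fields that this sketch ignores.

```python
"""Simplified model of how a ServiceMonitor turns Services into scrape targets.

Illustrative only: the Prometheus Operator also honours matchExpressions,
namespaceSelector.any, targetPort and more, which this sketch ignores.
"""
from typing import Any

Obj = dict[str, Any]


def produces_targets(monitor: Obj, service: Obj) -> bool:
    """Return True if `monitor` would yield scrape targets from `service`."""
    spec = monitor["spec"]
    svc_meta = service["metadata"]

    # 1. Namespace: without matchNames, only the monitor's own namespace is searched.
    namespaces = spec.get("namespaceSelector", {}).get(
        "matchNames", [monitor["metadata"]["namespace"]]
    )
    if svc_meta["namespace"] not in namespaces:
        return False

    # 2. Labels: every matchLabels pair must appear on the Service itself (not its pods).
    svc_labels = svc_meta.get("labels", {})
    wanted = spec["selector"].get("matchLabels", {})
    if any(svc_labels.get(key) != value for key, value in wanted.items()):
        return False

    # 3. Port: each endpoint names a Service port; at least one must exist.
    port_names = {port.get("name") for port in service["spec"]["ports"]}
    return any(endpoint.get("port") in port_names for endpoint in spec["endpoints"])


# Hypothetical objects mirroring the Cilium agent Service and ServiceMonitor.
agent_service = {
    "metadata": {"name": "cilium-agent", "namespace": "kube-system",
                 "labels": {"k8s-app": "cilium"}},
    "spec": {"ports": [{"name": "metrics", "port": 9962}]},
}
agent_monitor = {
    "metadata": {"name": "cilium-agent", "namespace": "monitoring"},
    "spec": {
        "namespaceSelector": {"matchNames": ["kube-system"]},
        "selector": {"matchLabels": {"k8s-app": "cilium"}},
        "endpoints": [{"port": "metrics", "interval": "30s"}],
    },
}
# The same monitor without namespaceSelector only searches "monitoring".
monitor_without_ns = {
    "metadata": agent_monitor["metadata"],
    "spec": {k: v for k, v in agent_monitor["spec"].items() if k != "namespaceSelector"},
}

print(produces_targets(agent_monitor, agent_service))       # True
print(produces_targets(monitor_without_ns, agent_service))  # False
```

Running it prints True and then False. Without namespaceSelector, the monitor looks only in monitoring, and no CNI Service lives there.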
Create Grafana dashboards
Import network policy monitoring dashboards to visualize CNI metrics. The Grafana sidecar in kube-prometheus-stack loads ConfigMaps labeled grafana_dashboard: "1". It expects the plain dashboard model, not the {"dashboard": ...} wrapper that Grafana's HTTP API uses. The queries assume a cilium_policy_verdict_total counter with verdict, source, and destination labels. Metric names vary across Cilium versions and Hubble settings, so confirm what your agents export in the Prometheus UI and adjust the expressions if needed. Create a ConfigMap with the dashboard JSON:
apiVersion: v1
kind: ConfigMap
metadata:
  name: network-policy-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  network-policy.json: |
    {
      "id": null,
      "title": "Network Policy Monitoring",
      "tags": ["kubernetes", "network", "security"],
      "timezone": "browser",
      "panels": [
        {
          "id": 1,
          "title": "Policy Verdicts",
          "type": "stat",
          "targets": [
            {
              "expr": "sum(rate(cilium_policy_verdict_total[5m])) by (verdict)",
              "legendFormat": "{{verdict}}"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              }
            }
          },
          "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
        },
        {
          "id": 2,
          "title": "Denied Connections by Source",
          "type": "table",
          "targets": [
            {
              "expr": "topk(10, sum(rate(cilium_policy_verdict_total{verdict=\"DENIED\"}[5m])) by (source, destination))",
              "format": "table",
              "instant": true
            }
          ],
          "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
        }
      ],
      "time": {
        "from": "now-1h",
        "to": "now"
      },
      "refresh": "30s"
    }
Apply the dashboard:
kubectl apply -f network-policy-dashboard.yaml
Configure alerting rules
Create Prometheus alerting rules for network policy violations and security events:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: network-policy-alerts
  namespace: monitoring
spec:
  groups:
  - name: network-policy.rules
    interval: 30s
    rules:
    # Verdict rules assume a cilium_policy_verdict_total counter; adjust to what your CNI exports
    - alert: HighPolicyViolationRate
      expr: rate(cilium_policy_verdict_total{verdict="DENIED"}[5m]) > 10
      for: 2m
      labels:
        severity: warning
        component: network-policy
      annotations:
        summary: "High network policy violation rate detected"
        description: "Network policy is denying {{ $value }} connections per second for {{ $labels.source }} to {{ $labels.destination }}"
    - alert: SuspiciousTrafficPattern
      # Compares the last 10 minutes with the hour ending 24 hours earlier
      expr: |
        (
          rate(cilium_policy_verdict_total{verdict="DENIED"}[10m]) >
          rate(cilium_policy_verdict_total{verdict="DENIED"}[1h] offset 1d) * 3
        )
      for: 5m
      labels:
        severity: critical
        component: network-policy
      annotations:
        summary: "Suspicious traffic pattern detected"
        description: "Denied connection rate has increased 3x compared to the same time yesterday"
    - alert: NetworkPolicyEngineDown
      expr: up{job="cilium-agent"} == 0
      for: 1m
      labels:
        severity: critical
        component: network-policy
      annotations:
        summary: "CNI policy engine is down"
        description: "Cilium agent on {{ $labels.instance }} has been down for more than 1 minute"
    - alert: PolicyLoadErrors
      # Cilium records failed endpoint (and policy) regenerations in this counter's outcome label;
      # verify the metric name for your Cilium version
      expr: rate(cilium_endpoint_regenerations_total{outcome="fail"}[5m]) > 0
      for: 2m
      labels:
        severity: warning
        component: network-policy
      annotations:
        summary: "Network policy load errors"
        description: "Policy regeneration failures detected on {{ $labels.instance }}"
    - alert: UnexpectedAllowedConnections
      expr: |
        rate(cilium_policy_verdict_total{verdict="ALLOWED"}[5m]) >
        quantile_over_time(0.95, rate(cilium_policy_verdict_total{verdict="ALLOWED"}[5m])[7d:1h]) * 1.5
      for: 10m
      labels:
        severity: warning
        component: network-policy
      annotations:
        summary: "Unusual allowed connection volume"
        description: "Allowed connections are 50% higher than the 95th percentile over the last week"
Apply the alerting rules:
kubectl apply -f network-policy-alerts.yaml
Configure Alertmanager for notifications
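Set up Alertmanager to send notifications for network policy alerts. kube-prometheus-stack reads its Alertmanager configuration from a Secret that the chart manages. Store your own configuration in a Secret under the alertmanager.yaml key, then point the chart at it with the alertmanager.alertmanagerSpec.configSecret value. This example sends email. Slack notifications use a slack_configs receiver in the same place:
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |
    global:
      smtp_smarthost: 'localhost:587'
      smtp_from: 'alerts@example.com'
    route:
      group_by: ['alertname', 'component']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'network-security-team'
      routes:
      - match:
          component: network-policy
        receiver: 'network-security-team'
        group_wait: 10s
        repeat_interval: 5m
    receivers:
    - name: 'network-security-team'
      email_configs:
      - to: 'security@example.com'
        # email_config has no subject or body fields: set the Subject header and the text body
        headers:
          Subject: 'Network Policy Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
Apply the Alertmanager configuration and tell the chart to use this Secret:
kubectl apply -f alertmanager-config.yaml
helm upgrade prometheus-operator prometheus-community/kube-prometheus-stack \
--namespace monitoring --reuse-values \
--set alertmanager.alertmanagerSpec.configSecret=alertmanager-main
Deploy sample network policies for testing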
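Create test network policies and workloads to generate metrics and validate the monitoring setup. The deny-all-ingress policy blocks all ingress to pods in the default namespace, so use a test cluster or change the namespace. Because nginx:alpine listens on port 80, the backend policy allows TCP port 80 (NetworkPolicy ports refer to the pod's port). The test-backend Service maps port 8080 to it for the connectivity tests:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 80            # pod port, not the Service port
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-frontend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: nginx:alpine
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: backend
        image: nginx:alpine
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: test-backend
  namespace: default
spec:
  selector:
    app: backend
  ports:
  - protocol: TCP
    port: 8080            # port used by the connectivity tests
    targetPort: 80        # nginx listens on 80
Deploy the test resources:
kubectl apply -f test-network-policies.yaml
Access Grafana dashboard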
Port-forward to access Grafana and view your network policy dashboards:
kubectl port-forward -n monitoring svc/prometheus-operator-grafana 3000:80
Access Grafana at http://localhost:3000. The default credentials are admin/prom-operator. Navigate to the "Network Policy Monitoring" dashboard to view metrics.
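You can also confirm that the Grafana sidecar loaded the dashboard ConfigMap without using the UI. Query Grafana's /api/search HTTP endpoint through the same port-forward. The Python sketch below assumes the port-forward above and the chart's default admin/prom-operator credentials. The names search_dashboards and has_dashboard are illustrative helpers, and the sketch uses only the standard library.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Any


def search_dashboards(base_url: str, user: str, password: str, query: str) -> list[dict[str, Any]]:
    """Call Grafana's /api/search with basic auth and return the decoded JSON list."""
    params = urllib.parse.urlencode({"query": query})
    url = f"{base_url}/api/search?{params}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def has_dashboard(results: list[dict[str, Any]], title: str) -> bool:
    """True if the search results contain a dashboard (type "dash-db", not a folder) with this title."""
    return any(item.get("type") == "dash-db" and item.get("title") == title for item in results)
```

With the port-forward running, has_dashboard(search_dashboards("http://localhost:3000", "admin", "prom-operator", "Network Policy"), "Network Policy Monitoring") should return True once the sidecar has picked up the ConfigMap. If it returns False, check the ConfigMap label and the Grafana sidecar's logs.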
Verify your setup
Check that all monitoring components are running correctly:
# Verify Prometheus is scraping CNI metrics
kubectl port-forward -n monitoring svc/prometheus-operator-kube-p-prometheus 9090:9090
Access Prometheus at http://localhost:9090 and check the Status > Targets page to confirm that the CNI targets are UP. Then query for CNI metrics:
cilium_policy_verdict_total
or for Calico:
felix_cluster_num_workload_endpoints
Test network policy enforcement by generating traffic:
# Get the frontend pod name
FRONTEND_POD=$(kubectl get pods -l app=frontend -o jsonpath='{.items[0].metadata.name}')
# Test allowed connection (should succeed)
kubectl exec -it "$FRONTEND_POD" -- wget -qO- -T 5 test-backend:8080
# Test denied connection from an unlabeled pod (should time out)
kubectl run test-pod --image=nginx:alpine --rm -it --restart=Never -- wget -qO- -T 5 test-backend:8080
Check that alerts are configured correctly:
kubectl get prometheusrules -n monitoring
kubectl describe prometheusrule network-policy-alerts -n monitoring
Advanced configuration
Configure custom metrics retention
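Set up longer retention to track security trends over time. Retention applies to every metric this Prometheus instance stores, not only network policy metrics. The manifest below updates the Prometheus resource that the chart created for the prometheus-operator release. Helm owns that resource, so record the same settings in your chart values (prometheus.prometheusSpec.retention, retentionSize, and storageSpec). Otherwise a later helm upgrade can overwrite them:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-operator-kube-p-prometheus
  namespace: monitoring
spec:
  retention: "90d"
  retentionSize: "50GB"
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
Apply the retention configuration:
kubectl apply -f prometheus-retention.yaml
Create security compliance dashboard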
Build a compliance-focused dashboard for security audits. The coverage panel counts NetworkPolicy objects per namespace using kube-state-metrics, which kube-prometheus-stack installs. kube-state-metrics exports one kube_networkpolicy_created series per policy:
apiVersion: v1
kind: ConfigMap
metadata:
  name: compliance-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  compliance.json: |
    {
      "title": "Network Security Compliance",
      "panels": [
        {
          "title": "Policy Coverage by Namespace",
          "type": "piechart",
          "targets": [{
            "expr": "count by (namespace) (kube_networkpolicy_created)"
          }]
        },
        {
          "title": "Monthly Violation Trends",
          "type": "timeseries",
          "targets": [{
            "expr": "increase(cilium_policy_verdict_total{verdict=\"DENIED\"}[30d])"
          }]
        }
      ]
    }
Apply the dashboard:
kubectl apply -f compliance-dashboard.yaml
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| No CNI metrics in Prometheus | ServiceMonitor not selecting the CNI metrics Service | Check that the Service's labels and port name match the ServiceMonitor and that its namespaceSelector includes the CNI namespace |
| Grafana dashboard shows no data | Wrong metrics queries for CNI type | Use cilium_ metrics for Cilium, felix_ for Calico |
| Alerts not firing for policy violations | Metrics path or port misconfigured | Verify CNI metrics endpoint with kubectl port-forward |
| High memory usage in Prometheus | Too many high-cardinality series, such as per-pod source and destination labels | Drop unneeded series with metricRelabelings, enable fewer Hubble metrics, and shorten retention |
| Missing policy verdict metrics | CNI not configured to expose verdicts | Enable policy verdict logging in CNI configuration |
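High memory usage is usually a cardinality problem, because every distinct combination of label values is a separate series. A quick upper bound is the product of the number of distinct values of each label. The Python sketch below does that arithmetic. The label counts (2 verdicts, 200 workloads or 10 namespaces, 20 nodes) are made-up illustration values, not measurements. Real series counts are lower because not every combination occurs.

```python
from math import prod


def max_series(label_cardinality: dict[str, int]) -> int:
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_cardinality.values())


# Hypothetical label counts for a verdict metric, not measurements.
per_workload = {"verdict": 2, "source": 200, "destination": 200, "instance": 20}
per_namespace = {"verdict": 2, "source": 10, "destination": 10, "instance": 20}

print(max_series(per_workload))   # 1600000
print(max_series(per_namespace))  # 4000
```

In this example, reporting source and destination per namespace rather than per workload cuts the bound by a factor of 400. Changing the scrape interval alters the number of samples per series, not the number of series, so it does little for this kind of memory growth.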
Next steps
- Configure Kubernetes RBAC with service accounts and role bindings to secure monitoring access
- Implement Kubernetes network policies for pod-to-pod security to expand your policy coverage
- Configure Cilium BGP peering with MetalLB integration for advanced networking
- Set up automated compliance scanning for network policies
- Implement a network policy testing and validation framework