Set up comprehensive ArgoCD monitoring with Prometheus metrics collection, custom service discovery, and Grafana dashboards. Configure alerting rules for deployment failures and performance issues to maintain GitOps visibility.
Prerequisites
- Kubernetes cluster with kubectl access
- ArgoCD installed and running
- Helm 3.x installed
- Basic understanding of Kubernetes RBAC
What this solves
ArgoCD deployments need monitoring to track application sync status, deployment failures, and cluster performance. This tutorial configures Prometheus to collect ArgoCD metrics with service discovery, sets up RBAC permissions for metrics exposure, creates Grafana dashboards for visualization, and implements alerting rules for proactive issue detection.
Prerequisites
You need a running Kubernetes cluster with ArgoCD installed and kubectl access. This tutorial assumes you have basic familiarity with Kubernetes manifests and RBAC concepts. For comprehensive cluster monitoring setup, see our guide on monitoring Kubernetes clusters with Prometheus Operator.
Step-by-step configuration
Add the Prometheus Helm repository
Add the prometheus-community Helm repository, which hosts the kube-prometheus-stack chart installed in a later step.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
Create Prometheus configuration
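Save the following Helm values as prometheus-values.yaml. They let Prometheus pick up every ServiceMonitor and PrometheusRule in the cluster, set a 30-second scrape and evaluation interval, keep 15 days of data, and add scrape jobs that discover the ArgoCD metrics Services in the argocd namespace.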
prometheus:
  prometheusSpec:
    # Select ServiceMonitors and PrometheusRules from all namespaces,
    # not only those labelled by this Helm release
    serviceMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelector: {}
    ruleSelectorNilUsesHelmValues: false
    ruleSelector: {}
    scrapeInterval: 30s
    evaluationInterval: 30s
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: default
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
    additionalScrapeConfigs:
      # Application controller metrics (Service argocd-metrics)
      - job_name: 'argocd-metrics'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names:
                - argocd
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            action: keep
            regex: argocd-metrics
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            action: keep
            regex: metrics
      # API server metrics (Service argocd-server-metrics)
      - job_name: 'argocd-server-metrics'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names:
                - argocd
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            action: keep
            regex: argocd-server-metrics
      # Repo server metrics; the Service also exposes a non-metrics port,
      # so keep only the port named "metrics"
      - job_name: 'argocd-repo-server-metrics'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names:
                - argocd
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            action: keep
            regex: argocd-repo-server
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            action: keep
            regex: metrics
Deploy Prometheus with ArgoCD monitoring
Install the Prometheus stack with the ArgoCD-specific configuration.
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--values prometheus-values.yaml \
--set grafana.adminPassword='your-secure-password'
Configure ArgoCD metrics exposure
Enable metrics endpoints in the ArgoCD components so Prometheus can scrape them. Save the following ConfigMap as argocd-metrics-config.yaml.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  server.metrics.enabled: "true"
  controller.metrics.enabled: "true"
  reposerver.metrics.enabled: "true"
  applicationsetcontroller.metrics.enabled: "true"
Apply ArgoCD metrics configuration
Update ArgoCD configuration to expose metrics endpoints.
kubectl apply -f argocd-metrics-config.yaml
kubectl rollout restart deployment argocd-server -n argocd
kubectl rollout restart statefulset argocd-application-controller -n argocd
kubectl rollout restart deployment argocd-repo-server -n argocd
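The restarts return immediately, so it can help to wait until every workload is ready before checking for metrics. The helper below is a minimal sketch; the function name `wait_for_argocd_rollouts` is hypothetical, and the workload kinds assume the upstream ArgoCD install manifests, where the application controller runs as a StatefulSet.

```shell
# Sketch: block until each restarted ArgoCD workload has finished rolling out.
# Assumption: workload names and kinds match the upstream install manifests.
wait_for_argocd_rollouts() {
  ns="${1:-argocd}"
  for workload in \
    deployment/argocd-server \
    statefulset/argocd-application-controller \
    deployment/argocd-repo-server
  do
    # Stop at the first workload that does not become ready in time
    kubectl rollout status "$workload" -n "$ns" --timeout=180s || return 1
  done
}
```

Call it as `wait_for_argocd_rollouts argocd`; a non-zero exit status means one of the workloads did not become ready within the timeout.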
Create RBAC permissions for metrics
Grant the Prometheus service account read access to the Kubernetes objects it needs for service discovery and scraping. Save the following manifest as argocd-prometheus-rbac.yaml.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-argocd
rules:
  # Objects Prometheus reads during Kubernetes service discovery
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  # Ingress moved from the removed "extensions" group to networking.k8s.io
  - apiGroups: ["networking.k8s.io"]
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-argocd
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-argocd
subjects:
  - kind: ServiceAccount
    name: prometheus-kube-prometheus-prometheus
    namespace: monitoring
Apply RBAC configuration
Create the necessary permissions for Prometheus to scrape ArgoCD metrics.
kubectl apply -f argocd-prometheus-rbac.yaml
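To confirm the binding took effect, `kubectl auth can-i` can impersonate the Prometheus service account. This is a sketch rather than a required step; the function name `check_prometheus_rbac` is hypothetical, and the service account name is the one used in the ClusterRoleBinding, which assumes the Helm release is called `prometheus`.

```shell
# Sketch: ask the API server whether the Prometheus service account
# may list the objects used for target discovery in the argocd namespace.
# Assumption: the service account name matches the ClusterRoleBinding subject.
check_prometheus_rbac() {
  sa="system:serviceaccount:monitoring:prometheus-kube-prometheus-prometheus"
  for resource in services endpoints pods; do
    printf '%s: ' "$resource"
    # Prints "yes" or "no" for each resource
    kubectl auth can-i list "$resource" --as="$sa" -n argocd
  done
}
```

Running `check_prometheus_rbac` should print one line per resource; any `no` points at a missing rule or a service account name that differs from the binding.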
Create ArgoCD ServiceMonitor
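Define one ServiceMonitor per ArgoCD metrics Service so that Prometheus Operator discovers the targets automatically. Each selector must match the labels on a Service, not on the pods behind it. Save the following manifest as argocd-servicemonitors.yaml.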
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-server-metrics
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-server-metrics
    app.kubernetes.io/part-of: argocd
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-server-metrics
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-application-controller
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-application-controller
    app.kubernetes.io/part-of: argocd
spec:
  selector:
    matchLabels:
      # The application controller's metrics Service is named argocd-metrics
      app.kubernetes.io/name: argocd-metrics
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-repo-server
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-repo-server
    app.kubernetes.io/part-of: argocd
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-repo-server
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
Deploy ServiceMonitor resources
Apply the ServiceMonitor configurations for Prometheus Operator to discover ArgoCD metrics automatically.
kubectl apply -f argocd-servicemonitors.yaml
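A ServiceMonitor that selects no Service is silently ignored, so it is worth checking what each selector matches. The helper below is a minimal sketch; `services_for_selector` is a hypothetical name, and it assumes the ServiceMonitors select on the `app.kubernetes.io/name` label in the argocd namespace.

```shell
# Sketch: for each label value, list the Services in the argocd namespace
# that carry app.kubernetes.io/name=<value>.
# An empty list under a heading means that selector matches nothing.
services_for_selector() {
  for name in "$@"; do
    echo "== app.kubernetes.io/name=$name"
    kubectl get service -n argocd -l "app.kubernetes.io/name=$name" -o name
  done
}
```

Pass the label values used in your ServiceMonitor selectors, for example `services_for_selector argocd-server-metrics argocd-repo-server`.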
Create ArgoCD alerting rules
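Define Prometheus alerting rules for unhealthy or out-of-sync applications, failed syncs, unreachable ArgoCD components, and controller resource pressure. Save the following manifest as argocd-alerts.yaml.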
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-alerts
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd
    app.kubernetes.io/part-of: argocd
spec:
  groups:
    - name: argocd.rules
      rules:
        # Current ArgoCD releases expose health and sync state as labels
        # on argocd_app_info rather than as separate metrics
        - alert: ArgoCDAppHealthDegraded
          expr: argocd_app_info{health_status!="Healthy"} == 1
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "ArgoCD Application {{ $labels.name }} health is degraded"
            description: "ArgoCD Application {{ $labels.name }} in namespace {{ $labels.namespace }} has been in {{ $labels.health_status }} state for more than 15 minutes."
        # argocd_app_sync_total is a counter, so alert on recent increases
        # instead of its absolute value
        - alert: ArgoCDAppSyncFailed
          expr: increase(argocd_app_sync_total{phase="Failed"}[10m]) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "ArgoCD Application {{ $labels.name }} sync failed"
            description: "ArgoCD Application {{ $labels.name }} sync has failed. Check the application status and logs."
        - alert: ArgoCDAppOutOfSync
          expr: argocd_app_info{sync_status!="Synced"} == 1
          for: 30m
          labels:
            severity: warning
          annotations:
            summary: "ArgoCD Application {{ $labels.name }} is out of sync"
            description: "ArgoCD Application {{ $labels.name }} has been out of sync for more than 30 minutes. Current status: {{ $labels.sync_status }}."
        - alert: ArgoCDRepoServerDown
          expr: up{job="argocd-repo-server"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "ArgoCD Repository Server is down"
            description: "ArgoCD Repository Server has been down for more than 5 minutes."
        - alert: ArgoCDServerDown
          expr: up{job="argocd-server-metrics"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "ArgoCD Server is down"
            description: "ArgoCD Server has been down for more than 5 minutes."
        # The "> 0" filter skips containers without a memory limit,
        # which would otherwise divide by zero and always fire
        - alert: ArgoCDControllerHighMemory
          expr: (container_memory_working_set_bytes{container="argocd-application-controller"} / (container_spec_memory_limit_bytes{container="argocd-application-controller"} > 0)) * 100 > 90
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "ArgoCD Controller high memory usage"
            description: "ArgoCD Application Controller memory usage is above 90% of its limit for more than 15 minutes."
        - alert: ArgoCDControllerHighCPU
          expr: rate(container_cpu_usage_seconds_total{container="argocd-application-controller"}[5m]) * 100 > 80
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "ArgoCD Controller high CPU usage"
            description: "ArgoCD Application Controller CPU usage is above 80% of one CPU core for more than 15 minutes."
Apply alerting rules
Deploy the PrometheusRule resource to enable ArgoCD-specific alerting.
kubectl apply -f argocd-alerts.yaml
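Creating the PrometheusRule does not by itself prove that Prometheus loaded it. One way to check is to read the `/api/v1/rules` endpoint and look for the alert names. The filter below is a sketch: `argocd_alert_names` is a hypothetical helper, it assumes the API returns compact JSON without spaces around colons, and it assumes Prometheus is reachable on localhost:9090 through a port-forward.

```shell
# Sketch: read a Prometheus /api/v1/rules JSON response on stdin and
# print the distinct rule names that start with "ArgoCD".
# Assumption: compact JSON, i.e. "name":"..." with no spaces.
argocd_alert_names() {
  grep -o '"name":"ArgoCD[A-Za-z0-9]*"' | cut -d'"' -f4 | sort -u
}
```

With the port-forward running, `curl -s http://localhost:9090/api/v1/rules | argocd_alert_names` should list each alert defined in the rule file; an empty result suggests the rule was not selected by Prometheus.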
Access Grafana dashboard
Get the Grafana admin password and access the dashboard to configure ArgoCD visualizations.
kubectl get secret prometheus-grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 -d
echo
kubectl port-forward service/prometheus-grafana 3000:80 -n monitoring
Import ArgoCD Grafana dashboard
Create a comprehensive ArgoCD dashboard configuration for Grafana.
{
  "dashboard": {
    "id": null,
    "title": "ArgoCD Metrics",
    "tags": ["argocd", "gitops"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Application Health Status",
        "type": "stat",
        "targets": [
          {
            "expr": "sum by (health_status) (argocd_app_info)",
            "legendFormat": "{{ health_status }}"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
      },
      {
        "id": 2,
        "title": "Application Sync Status",
        "type": "piechart",
        "targets": [
          {
            "expr": "sum by (sync_status) (argocd_app_info)",
            "legendFormat": "{{ sync_status }}"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
      },
      {
        "id": 3,
        "title": "Repository Server Performance",
        "type": "timeseries",
        "targets": [
          {
            "expr": "rate(argocd_git_request_duration_seconds_sum[5m]) / rate(argocd_git_request_duration_seconds_count[5m])",
            "legendFormat": "Average Git Request Duration"
          }
        ],
        "gridPos": {"h": 8, "w": 24, "x": 0, "y": 8}
      },
      {
        "id": 4,
        "title": "Controller Memory Usage",
        "type": "timeseries",
        "targets": [
          {
            "expr": "container_memory_working_set_bytes{container=\"argocd-application-controller\"} / 1024 / 1024",
            "legendFormat": "Controller Memory (MB)"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 16}
      },
      {
        "id": 5,
        "title": "Controller CPU Usage",
        "type": "timeseries",
        "targets": [
          {
            "expr": "rate(container_cpu_usage_seconds_total{container=\"argocd-application-controller\"}[5m]) * 100",
            "legendFormat": "Controller CPU (%)"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 16}
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "30s"
  }
}
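The JSON above is wrapped in a top-level `dashboard` key, which is the shape Grafana's HTTP API accepts on `POST /api/dashboards/db`. The following sketch posts it from the command line. The function name `import_argocd_dashboard`, the file name `argocd-dashboard.json`, and the `GRAFANA_PASSWORD` variable are assumptions for illustration; it also assumes Grafana is port-forwarded to localhost:3000 and that the `admin` user is in use.

```shell
# Sketch: upload a dashboard definition through the Grafana HTTP API.
# Assumptions: the JSON is saved as argocd-dashboard.json (hypothetical name),
# GRAFANA_PASSWORD holds the admin password, Grafana listens on localhost:3000.
import_argocd_dashboard() {
  file="${1:-argocd-dashboard.json}"
  curl -sS -u "admin:${GRAFANA_PASSWORD}" \
    -H 'Content-Type: application/json' \
    -X POST "http://localhost:3000/api/dashboards/db" \
    --data-binary "@${file}"
}
```

Grafana answers with a small JSON document describing the saved dashboard; an error status in that response usually points at authentication or at a malformed payload.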
Verify your monitoring setup
Check that Prometheus is successfully scraping ArgoCD metrics and alerts are configured properly.
# Check Prometheus targets
kubectl port-forward service/prometheus-kube-prometheus-prometheus 9090:9090 -n monitoring &
# Verify ArgoCD metrics are being collected (the URL is quoted so the shell does not interpret "?")
curl -s 'http://localhost:9090/api/v1/query?query=argocd_app_info'
# Check ServiceMonitor discovery
kubectl get servicemonitor -n argocd
# Verify alerting rules are loaded
kubectl get prometheusrule -n argocd
Access Grafana at http://localhost:3000 and import the ArgoCD dashboard. You should see application health status, sync status, and performance metrics.
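Queries with spaces, braces, or parentheses need URL encoding before they can be sent to the Prometheus HTTP API. A small wrapper around `curl --data-urlencode` handles that. This is a sketch: `prom_query` is a hypothetical helper name, and it assumes the port-forward to localhost:9090 from the verification commands is still running.

```shell
# Sketch: run an instant PromQL query against the port-forwarded Prometheus.
# -G turns the URL-encoded data into a query string on a GET request.
prom_query() {
  curl -sG "http://localhost:9090/api/v1/query" --data-urlencode "query=$1"
}
```

For example, `prom_query 'count by (sync_status, health_status) (argocd_app_info)'` returns one series per combination of sync and health state, which is a quick way to compare Prometheus data with what the ArgoCD UI shows.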
Configure alerting notifications
Set up Alertmanager to send notifications when ArgoCD issues occur. For detailed alert configuration with Slack and email integration, see our Prometheus Alertmanager notifications guide.
Create Alertmanager configuration
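Configure notification channels for ArgoCD alerts. Save the following Secret as alertmanager-config.yaml and replace the SMTP settings and recipient addresses with your own.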
apiVersion: v1
kind: Secret
metadata:
  # Prometheus Operator reads the Secret named alertmanager-<Alertmanager resource name>;
  # this name assumes the Helm release is called "prometheus"
  name: alertmanager-prometheus-kube-prometheus-alertmanager
  namespace: monitoring
stringData:
  # The operator expects the key to be named alertmanager.yaml
  alertmanager.yaml: |
    global:
      smtp_smarthost: 'smtp.gmail.com:587'
      smtp_from: 'alerts@example.com'
      smtp_auth_username: 'alerts@example.com'
      smtp_auth_password: 'your-app-password'
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'web.hook'
      routes:
        - match:
            alertname: ArgoCDAppSyncFailed
          receiver: 'critical-alerts'
        - match:
            severity: warning
          receiver: 'warning-alerts'
    receivers:
      - name: 'web.hook'
        webhook_configs:
          - url: 'http://127.0.0.1:5001/'
      - name: 'critical-alerts'
        email_configs:
          - to: 'devops-team@example.com'
            # email_configs sets the subject through headers and the body through text
            headers:
              Subject: 'ArgoCD Critical Alert: {{ .GroupLabels.alertname }}'
            text: |
              {{ range .Alerts }}
              Alert: {{ .Annotations.summary }}
              Description: {{ .Annotations.description }}
              {{ end }}
      - name: 'warning-alerts'
        email_configs:
          - to: 'devops-alerts@example.com'
            headers:
              Subject: 'ArgoCD Warning: {{ .GroupLabels.alertname }}'
            text: |
              {{ range .Alerts }}
              Alert: {{ .Annotations.summary }}
              Description: {{ .Annotations.description }}
              {{ end }}
Apply Alertmanager configuration
Update the Alertmanager configuration with ArgoCD-specific routing rules.
kubectl apply -f alertmanager-config.yaml
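Routing trees are easy to get subtly wrong, so it can be useful to ask Alertmanager's own CLI which receiver a given label set would reach. The wrapper below is a sketch around `amtool config routes test`. The function name `test_alert_route` is hypothetical, and it assumes `amtool` is installed locally and that the Alertmanager configuration (the content under the Secret's data key, not the Secret manifest itself) has been saved to a separate file.

```shell
# Sketch: print the receiver(s) that an alert with the given labels would reach.
# Usage: test_alert_route <config-file> label=value [label=value ...]
# Assumption: <config-file> contains the plain Alertmanager configuration.
test_alert_route() {
  config="$1"
  shift
  amtool config routes test --config.file="$config" "$@"
}
```

For instance, `test_alert_route alertmanager.yaml alertname=ArgoCDRepoServerDown severity=critical` shows where a critical alert other than ArgoCDAppSyncFailed ends up. With the routes defined in this tutorial, such an alert matches neither child route, so it is handled by the default receiver.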
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| No ArgoCD metrics in Prometheus | ServiceMonitor not discovered | Check that the ServiceMonitor selectors match the Service labels and that Prometheus selects the ServiceMonitors: `kubectl get servicemonitor -n argocd -o yaml` |
| ArgoCD targets show as down | Metrics endpoints not enabled | Verify the ArgoCD ConfigMap has metrics enabled and restart the ArgoCD workloads |
| RBAC permission denied errors | Missing Prometheus permissions | Apply the RBAC configuration: `kubectl apply -f argocd-prometheus-rbac.yaml` |
| Grafana shows no data | Incorrect dashboard queries | Check that Prometheus has ArgoCD metrics: `curl -s localhost:9090/api/v1/label/__name__/values \| grep argocd` |
| Alerts not firing | PrometheusRule not loaded | Verify the rule labels match the Prometheus ruleSelector: `kubectl get prometheusrule -n argocd -o yaml` |
Next steps
- Set up ArgoCD Image Updater for automatic deployments
- Configure ArgoCD notifications for Slack and Teams
- Implement ArgoCD multi-cluster management
- Configure ArgoCD backup and disaster recovery