Monitor Caddy and Consul integration with Prometheus and Grafana dashboards

Intermediate | 45 min | Jun 03, 2026
Ubuntu 24.04, Debian 12, AlmaLinux 9, Rocky Linux 9

Set up comprehensive monitoring for Caddy reverse proxy and Consul service discovery with Prometheus metrics collection and Grafana dashboards for performance insights and alerting.

Prerequisites

  • Root access to the server
  • Caddy and Consul already installed and running
  • Basic familiarity with systemd services
  • SMTP server for alert notifications

What this solves

When running Caddy as a reverse proxy with Consul for service discovery, you need visibility into both systems to maintain performance and catch issues early. This tutorial configures Prometheus to scrape the built-in metrics endpoints of both services and creates Grafana dashboards to monitor proxy metrics, service health, and cluster status in near real time.

Step-by-step configuration

Update system packages

Start by updating your package manager to ensure you get the latest versions of all components.

# Ubuntu / Debian
sudo apt update && sudo apt upgrade -y

# AlmaLinux / Rocky Linux
sudo dnf update -y

Install Prometheus

Download and install Prometheus to collect metrics from Caddy and Consul. We'll create a dedicated user for security.

sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xvf prometheus-2.45.0.linux-amd64.tar.gz
sudo cp prometheus-2.45.0.linux-amd64/prometheus /usr/local/bin/
sudo cp prometheus-2.45.0.linux-amd64/promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
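
Before copying binaries out of a downloaded archive, it is worth checking the archive against its published SHA-256 digest. The sketch below is a minimal illustration, assuming the release page offers a sha256sums.txt file next to the tarball; verify_sha256 is a hypothetical helper name, not part of Prometheus.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
verify_sha256() {
  local file="$1" expected="$2" actual
  actual="$(sha256sum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}

# Example use (assumes sha256sums.txt was downloaded into the same directory):
#   expected="$(grep 'prometheus-2.45.0.linux-amd64.tar.gz' sha256sums.txt | awk '{print $1}')"
#   verify_sha256 prometheus-2.45.0.linux-amd64.tar.gz "$expected"
```

The function returns a non-zero status on mismatch, so it can gate the later cp commands with `&&`.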

Configure Prometheus for Caddy and Consul

Create the main Prometheus configuration file, /etc/prometheus/prometheus.yml, with scrape targets for both Caddy metrics and Consul endpoints.

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "caddy_rules.yml"
  - "consul_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'caddy'
    static_configs:
      - targets: ['localhost:2019']
    metrics_path: '/metrics'
    scrape_interval: 30s

  - job_name: 'consul'
    static_configs:
      - targets: ['localhost:8500']
    metrics_path: '/v1/agent/metrics'
    params:
      format: ['prometheus']
    scrape_interval: 30s

  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'localhost:8500'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_service]
        target_label: service
      - source_labels: [__meta_consul_node]
        target_label: node

Then give the prometheus user ownership of the file:

sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml

Configure Caddy to expose metrics

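Enable Caddy's built-in metrics by adding the metrics option to the servers block in the global options of your Caddyfile. Prometheus scrapes them from the admin API endpoint on port 2019. Note that admin :2019 listens on all interfaces; Caddy's default is localhost:2019, which is enough when Prometheus runs on the same host.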

{
  admin :2019
  servers {
    metrics
  }
}

example.com {
  reverse_proxy consul.service.consul:8080 {
    health_uri /health
    health_interval 10s
    health_timeout 5s
  }
  
  log {
    output file /var/log/caddy/access.log {
      roll_size 100mb
      roll_keep 5
    }
    format json
  }
}

Reload Caddy to apply the change:

sudo systemctl reload caddy
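
A syntax check before each reload keeps a typo in the Caddyfile from reaching the running proxy. The following is a minimal sketch built on the caddy validate subcommand; safe_reload_caddy is a hypothetical wrapper, and the default path /etc/caddy/Caddyfile is an assumption that may not match your install.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: validate a Caddyfile and reload Caddy only if it parses.
# The default path is an assumption; pass your own as the first argument.
safe_reload_caddy() {
  local caddyfile="${1:-/etc/caddy/Caddyfile}"
  if caddy validate --config "$caddyfile" --adapter caddyfile; then
    sudo systemctl reload caddy
  else
    echo "Caddyfile invalid, reload skipped: $caddyfile" >&2
    return 1
  fi
}

# Example use:
#   safe_reload_caddy /etc/caddy/Caddyfile
```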

Enable Consul metrics

Configure Consul to expose Prometheus metrics through its HTTP API by adding a telemetry block to the agent configuration, for example in /etc/consul.d/metrics.json.

{
  "telemetry": {
    "prometheus_retention_time": "30s",
    "disable_hostname": false
  },
  "ports": {
    "grpc": 8502
  },
  "connect": {
    "enabled": true
  }
}

Restart Consul so the telemetry settings take effect:

sudo systemctl restart consul

Create alerting rules for Caddy

Define Prometheus alerting rules to monitor Caddy performance and availability issues. Save them as /etc/prometheus/caddy_rules.yml.

groups:
  - name: caddy.rules
    rules:
      - alert: CaddyDown
        expr: up{job="caddy"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Caddy server is down"
          description: "Caddy has been down for more than 1 minute"

      - alert: CaddyHighRequestLatency
        expr: histogram_quantile(0.95, rate(caddy_http_request_duration_seconds_bucket[5m])) > 0.5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High request latency on Caddy"
          description: "95th percentile latency is {{ $value }}s"

      # Caddy puts the response status in the "code" label of the
      # request-duration histogram; caddy_http_requests_total has no status label.
      - alert: CaddyHighErrorRate
        expr: sum(rate(caddy_http_request_duration_seconds_count{code=~"5.."}[5m])) / sum(rate(caddy_http_request_duration_seconds_count[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High error rate on Caddy"
          description: "Error rate is {{ $value | humanizePercentage }}"

      - alert: CaddyUpstreamDown
        expr: caddy_reverse_proxy_upstreams_healthy == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Caddy upstream is down"
          description: "No healthy upstreams available for {{ $labels.upstream }}"

      - alert: CaddyHighMemoryUsage
        expr: process_resident_memory_bytes{job="caddy"} / 1024 / 1024 > 500
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Caddy high memory usage"
          description: "Caddy memory usage is {{ $value }}MB"

Create alerting rules for Consul

Set up Consul-specific alerts for cluster health, service discovery issues, and performance problems. Save them as /etc/prometheus/consul_rules.yml.

groups:
  - name: consul.rules
    rules:
      - alert: ConsulDown
        expr: up{job="consul"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Consul agent is down"
          description: "Consul agent has been down for more than 1 minute"

      - alert: ConsulLeaderMissing
        expr: consul_raft_leader == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Consul cluster has no leader"
          description: "Consul cluster is without a leader"

      - alert: ConsulHighMemoryUsage
        expr: consul_runtime_alloc_bytes / 1024 / 1024 > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Consul high memory usage"
          description: "Consul memory usage is {{ $value }}MB"

      - alert: ConsulServiceUnhealthy
        expr: consul_health_service_query_count{status!="passing"} > 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Consul service health check failing"
          description: "Service {{ $labels.service }} health check is failing"

      - alert: ConsulNodeUnhealthy
        expr: consul_health_node_query_count{status!="passing"} > 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Consul node health check failing"
          description: "Node {{ $labels.node }} health check is failing"

      - alert: ConsulRaftLogGrowth
        expr: increase(consul_raft_commitIndex[1h]) > 10000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High Consul raft log growth"
          description: "Raft log has grown by {{ $value }} entries in the last hour"

Set correct permissions for Prometheus files

Ensure Prometheus can read all configuration files by setting appropriate ownership and permissions.

sudo chown prometheus:prometheus /etc/prometheus/caddy_rules.yml
sudo chown prometheus:prometheus /etc/prometheus/consul_rules.yml
sudo chmod 644 /etc/prometheus/*.yml
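
With the main configuration and both rule files in place, promtool can validate them before Prometheus is started. The sketch below wraps the promtool check config and promtool check rules subcommands; check_prometheus_files is a hypothetical helper name, and the file names are the ones used in this tutorial.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate prometheus.yml and both rule files with promtool.
check_prometheus_files() {
  local dir="${1:-/etc/prometheus}"
  promtool check config "$dir/prometheus.yml" || return 1
  promtool check rules "$dir/caddy_rules.yml" "$dir/consul_rules.yml" || return 1
  echo "all Prometheus files passed validation"
}

# Example use:
#   check_prometheus_files /etc/prometheus
```

A YAML indentation error in a rule file is reported here with the file name, which is easier to act on than a failed service start.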

Create Prometheus systemd service

Set up Prometheus as a systemd service for automatic startup and management. Save the unit file as /etc/systemd/system/prometheus.service.

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.listen-address=0.0.0.0:9090 \
    --web.enable-lifecycle \
    --storage.tsdb.retention.time=30d

[Install]
WantedBy=multi-user.target

Reload systemd and start Prometheus:

sudo systemctl daemon-reload
sudo systemctl enable --now prometheus
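
Because the unit passes --web.enable-lifecycle, Prometheus accepts an HTTP POST to /-/reload and answers readiness probes on /-/ready. The sketch below polls the readiness endpoint after a start and triggers a configuration reload without a restart; the function names, the retry count, and the one-second pause are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Poll the readiness endpoint until it answers or the attempts run out.
wait_for_prometheus() {
  local base="${1:-http://localhost:9090}" attempts="${2:-10}" i
  for ((i = 1; i <= attempts; i++)); do
    if curl -fsS "$base/-/ready" >/dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep 1
  done
  echo "not ready after $attempts attempts" >&2
  return 1
}

# Ask a running Prometheus to re-read its configuration and rule files.
# This only works when --web.enable-lifecycle is set, as in the unit above.
reload_prometheus() {
  curl -fsS -X POST "${1:-http://localhost:9090}/-/reload"
}

# Example use:
#   wait_for_prometheus http://localhost:9090 15 && reload_prometheus
```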

Install Grafana

Add the Grafana repository and install Grafana to visualize the collected metrics.

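# Ubuntu / Debian (apt-key is deprecated, so the key goes into a dedicated keyring)
sudo apt install -y software-properties-common wget gnupg
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana

# AlmaLinux / Rocky Linux
sudo dnf install -y wget
sudo tee /etc/yum.repos.d/grafana.repo <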

Configure Grafana data source

Provision the Prometheus data source automatically by placing a data source definition in Grafana's provisioning directory, /etc/grafana/provisioning/datasources.

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
    editable: true

Create Caddy dashboard configuration

Set up a Grafana dashboard for Caddy reverse proxy metrics and performance. Save it as a JSON file in /etc/grafana/provisioning/dashboards, the directory the dashboard provider reads from.

{
  "dashboard": {
    "id": null,
    "title": "Caddy Reverse Proxy Monitoring",
    "tags": ["caddy", "proxy"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "HTTP Requests per Second",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(caddy_http_requests_total[5m])",
            "legendFormat": "{{server}} {{handler}}"
          }
        ],
        "yAxes": [
          {
            "label": "requests/sec"
          }
        ],
        "gridPos": {
          "h": 8,
          "w": 12,
          "x": 0,
          "y": 0
        }
      },
      {
        "id": 2,
        "title": "Response Time Percentiles",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.50, rate(caddy_http_request_duration_seconds_bucket[5m]))",
            "legendFormat": "50th percentile"
          },
          {
            "expr": "histogram_quantile(0.95, rate(caddy_http_request_duration_seconds_bucket[5m]))",
            "legendFormat": "95th percentile"
          },
          {
            "expr": "histogram_quantile(0.99, rate(caddy_http_request_duration_seconds_bucket[5m]))",
            "legendFormat": "99th percentile"
          }
        ],
        "yAxes": [
          {
            "label": "seconds"
          }
        ],
        "gridPos": {
          "h": 8,
          "w": 12,
          "x": 12,
          "y": 0
        }
      },
      {
        "id": 3,
        "title": "HTTP Status Codes",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(caddy_http_request_duration_seconds_count{code=~\"2..\"}[5m]))",
            "legendFormat": "2xx Success"
          },
          {
            "expr": "sum(rate(caddy_http_request_duration_seconds_count{code=~\"4..\"}[5m]))",
            "legendFormat": "4xx Client Error"
          },
          {
            "expr": "sum(rate(caddy_http_request_duration_seconds_count{code=~\"5..\"}[5m]))",
            "legendFormat": "5xx Server Error"
          }
        ],
        "gridPos": {
          "h": 8,
          "w": 12,
          "x": 0,
          "y": 8
        }
      },
      {
        "id": 4,
        "title": "Upstream Health Status",
        "type": "stat",
        "targets": [
          {
            "expr": "caddy_reverse_proxy_upstreams_healthy",
            "legendFormat": "{{upstream}}"
          }
        ],
        "gridPos": {
          "h": 8,
          "w": 12,
          "x": 12,
          "y": 8
        }
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "30s"
  }
}

Create Consul dashboard configuration

Build a Grafana dashboard for monitoring Consul cluster health and service discovery metrics. Save it as a JSON file in /etc/grafana/provisioning/dashboards as well.

{
  "dashboard": {
    "id": null,
    "title": "Consul Cluster Monitoring",
    "tags": ["consul", "service-discovery"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Consul Nodes Status",
        "type": "stat",
        "targets": [
          {
            "expr": "consul_serf_lan_members",
            "legendFormat": "Active Nodes"
          }
        ],
        "gridPos": {
          "h": 4,
          "w": 6,
          "x": 0,
          "y": 0
        }
      },
      {
        "id": 2,
        "title": "Raft Leader Status",
        "type": "stat",
        "targets": [
          {
            "expr": "consul_raft_leader",
            "legendFormat": "Has Leader"
          }
        ],
        "gridPos": {
          "h": 4,
          "w": 6,
          "x": 6,
          "y": 0
        }
      },
      {
        "id": 3,
        "title": "Service Health Checks",
        "type": "graph",
        "targets": [
          {
            "expr": "consul_health_service_query_count{status=\"passing\"}",
            "legendFormat": "Passing"
          },
          {
            "expr": "consul_health_service_query_count{status=\"warning\"}",
            "legendFormat": "Warning"
          },
          {
            "expr": "consul_health_service_query_count{status=\"critical\"}",
            "legendFormat": "Critical"
          }
        ],
        "gridPos": {
          "h": 8,
          "w": 12,
          "x": 0,
          "y": 4
        }
      },
      {
        "id": 4,
        "title": "Memory Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "consul_runtime_alloc_bytes / 1024 / 1024",
            "legendFormat": "Allocated Memory (MB)"
          },
          {
            "expr": "consul_runtime_sys_bytes / 1024 / 1024",
            "legendFormat": "System Memory (MB)"
          }
        ],
        "yAxes": [
          {
            "label": "MB"
          }
        ],
        "gridPos": {
          "h": 8,
          "w": 12,
          "x": 12,
          "y": 4
        }
      },
      {
        "id": 5,
        "title": "Raft Transactions",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(consul_raft_apply[5m])",
            "legendFormat": "Apply Rate"
          },
          {
            "expr": "consul_raft_commitIndex",
            "legendFormat": "Commit Index"
          }
        ],
        "gridPos": {
          "h": 8,
          "w": 24,
          "x": 0,
          "y": 12
        }
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "30s"
  }
}

Create dashboard provisioning configuration

Set up Grafana to load the dashboard files automatically on startup by adding a provider definition under /etc/grafana/provisioning/dashboards.

apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /etc/grafana/provisioning/dashboards
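
Grafana's file provisioning may expect each JSON file to be the bare dashboard model, whereas a top-level "dashboard" key is the shape used by the HTTP API. If the provisioned dashboards do not appear, unwrapping that key is one thing to try. A minimal sketch using jq follows; unwrap_dashboard and the file name caddy.json are hypothetical.

```shell
#!/usr/bin/env bash
# Print the bare dashboard model: unwrap a top-level "dashboard" key if present,
# otherwise pass the document through unchanged. Requires jq.
unwrap_dashboard() {
  jq 'if type == "object" and has("dashboard") then .dashboard else . end' "$1"
}

# Example use (caddy.json is a hypothetical file name):
#   unwrap_dashboard caddy.json | sudo tee /etc/grafana/provisioning/dashboards/caddy.json > /dev/null
```

Read from a copy outside the provisioning directory, as in the example, so the input file is not truncated while jq is still reading it.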

Install and configure Alertmanager

Set up Alertmanager to handle alerts from Prometheus and send notifications for critical issues.

cd /tmp
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar xvf alertmanager-0.26.0.linux-amd64.tar.gz
sudo cp alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
sudo cp alertmanager-0.26.0.linux-amd64/amtool /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false alertmanager
sudo mkdir /etc/alertmanager
sudo mkdir /var/lib/alertmanager
sudo chown alertmanager:alertmanager /etc/alertmanager
sudo chown alertmanager:alertmanager /var/lib/alertmanager
sudo chown alertmanager:alertmanager /usr/local/bin/alertmanager
sudo chown alertmanager:alertmanager /usr/local/bin/amtool

Configure Alertmanager notifications

Set up Alertmanager to send email notifications for alerts from both Caddy and Consul monitoring. Save the configuration as /etc/alertmanager/alertmanager.yml.

global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'your_smtp_password'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'

receivers:
  - name: 'web.hook'
    email_configs:
      - to: 'admin@example.com'
        # email_configs has no subject or body keys: the subject is set under
        # headers and the plain-text body under text.
        headers:
          Subject: '[ALERT] {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Labels: {{ .Labels }}
          {{ end }}

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

Then give the alertmanager user ownership of the file:

sudo chown alertmanager:alertmanager /etc/alertmanager/alertmanager.yml
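
Alertmanager rejects configuration files with unknown keys or broken indentation, so a check before the first start saves a failed unit. The sketch below wraps the amtool check-config subcommand; check_alertmanager_config is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate an Alertmanager configuration file with amtool.
check_alertmanager_config() {
  local config="${1:-/etc/alertmanager/alertmanager.yml}"
  if amtool check-config "$config"; then
    echo "valid: $config"
  else
    echo "invalid: $config" >&2
    return 1
  fi
}

# Example use:
#   check_alertmanager_config /etc/alertmanager/alertmanager.yml
```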

Create Alertmanager systemd service

Set up Alertmanager as a systemd service for automatic startup and process management. Save the unit file as /etc/systemd/system/alertmanager.service.

[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file /etc/alertmanager/alertmanager.yml \
    --storage.path /var/lib/alertmanager/ \
    --web.listen-address=0.0.0.0:9093 \
    --web.external-url=http://localhost:9093

[Install]
WantedBy=multi-user.target

Reload systemd and start Alertmanager:

sudo systemctl daemon-reload
sudo systemctl enable --now alertmanager
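
Once Alertmanager is running, a synthetic alert is a quick way to confirm that routing and SMTP delivery work end to end. The sketch below uses amtool alert add; the alert name PipelineTest, the annotation text, and the function name send_test_alert are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: push a synthetic warning alert into Alertmanager.
send_test_alert() {
  local url="${1:-http://localhost:9093}"
  amtool alert add PipelineTest severity=warning \
    --annotation=summary="Test alert from amtool" \
    --annotation=description="Safe to ignore" \
    --alertmanager.url="$url"
}

# Example use:
#   send_test_alert http://localhost:9093
```

If the message does not arrive at the configured address, the Alertmanager journal (journalctl -u alertmanager) is the first place to look for SMTP errors.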

Start and enable Grafana

Start Grafana and enable it at boot so the dashboards are available.

sudo systemctl enable --now grafana-server
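
Grafana reports its own state on the /api/health endpoint, whose JSON body includes a "database" field. The sketch below checks that field from a script; grafana_db_ok is a hypothetical helper that reads the response body on standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when the /api/health body on stdin reports
# "database": "ok", fail otherwise.
grafana_db_ok() {
  grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# Example use:
#   curl -fsS http://localhost:3000/api/health | grafana_db_ok && echo "Grafana is healthy"
```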

Configure firewall rules

Open the ports needed to reach the Prometheus, Grafana, and Alertmanager web interfaces, plus the Caddy admin endpoint.

# Ubuntu / Debian (ufw)
sudo ufw allow 9090/tcp comment 'Prometheus'
sudo ufw allow 3000/tcp comment 'Grafana'
sudo ufw allow 9093/tcp comment 'Alertmanager'
sudo ufw allow 2019/tcp comment 'Caddy Admin'

# AlmaLinux / Rocky Linux (firewalld)
sudo firewall-cmd --permanent --add-port=9090/tcp
sudo firewall-cmd --permanent --add-port=3000/tcp
sudo firewall-cmd --permanent --add-port=9093/tcp
sudo firewall-cmd --permanent --add-port=2019/tcp
sudo firewall-cmd --reload

Security Note: In production, restrict these ports to specific IP ranges or use a VPN. Never expose monitoring interfaces to the public internet without authentication.
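
Restricting the ports to a known source range, as the note recommends, can be expressed with ufw's "allow from" form. The sketch below only prints the commands so they can be reviewed before running; print_ufw_rules is a hypothetical helper, and 203.0.113.0/24 is a documentation address range standing in for your own network.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) ufw rules that admit one CIDR range
# to the Prometheus, Grafana, and Alertmanager ports.
print_ufw_rules() {
  local cidr="$1" port
  for port in 9090 3000 9093; do
    echo "sudo ufw allow from $cidr to any port $port proto tcp"
  done
}

# Example use (replace the range with your own):
#   print_ufw_rules 203.0.113.0/24
```

The Caddy admin port is left out of the sketch on purpose: when Prometheus runs on the same host, it can reach port 2019 over localhost without any firewall rule.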

Verify your setup

Check that all components are running and collecting metrics.

# Check service status
sudo systemctl status prometheus grafana-server alertmanager

# Verify Prometheus is scraping targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[].health'

# Test Caddy metrics endpoint
curl -s http://localhost:2019/metrics | grep caddy_http_requests_total

# Test Consul metrics endpoint
curl -s 'http://localhost:8500/v1/agent/metrics?format=prometheus' | grep consul_

# Check Grafana is accessible
curl -I http://localhost:3000

Access Grafana at http://your_server_ip:3000 with default credentials admin/admin. The dashboards should automatically load and display metrics from both Caddy and Consul. You can find more details on securing web servers in our Caddy SSL certificates tutorial.
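
The target check in the verification commands prints one health value per target, which does not say which target is failing. The sketch below filters the same /api/v1/targets response down to the unhealthy entries; unhealthy_targets is a hypothetical helper that reads the JSON on standard input and requires jq.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every active target whose health is not "up",
# one per line, as "<job> <scrape URL> <health>". Reads the JSON from stdin.
unhealthy_targets() {
  jq -r '.data.activeTargets[]
         | select(.health != "up")
         | "\(.labels.job) \(.scrapeUrl) \(.health)"'
}

# Example use:
#   curl -s http://localhost:9090/api/v1/targets | unhealthy_targets
```

Empty output means every target is being scraped successfully.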

Common issues

Symptom | Cause | Fix
Prometheus shows targets as down | Services not exposing metrics | Check service configs and restart: sudo systemctl restart caddy consul
No data in Grafana dashboards | Data source not configured | Verify Prometheus data source: curl -s 'http://localhost:9090/api/v1/query?query=up'
Caddy metrics endpoint 404 | Admin API not enabled | Add admin :2019 to Caddyfile global block and reload
Consul metrics return empty | Telemetry not configured | Add telemetry config to /etc/consul.d/metrics.json and restart
Alertmanager not sending emails | SMTP configuration incorrect | Test with: /usr/local/bin/amtool check-config /etc/alertmanager/alertmanager.yml
Dashboard panels show no data | Metric names changed | Check available metrics: curl -s http://localhost:9090/api/v1/label/__name__/values
