Set up comprehensive monitoring for HashiCorp Consul using Prometheus metrics collection and Grafana dashboards. Configure telemetry export, alerting rules, and visualization for service discovery health and performance.
Prerequisites
- Running Consul cluster
- Root or sudo access
- Network access between monitoring components
- Basic understanding of Prometheus and Grafana
What this solves
Consul service discovery requires comprehensive monitoring to ensure healthy service registration, discovery performance, and cluster stability. This tutorial configures Prometheus to collect Consul telemetry metrics and creates Grafana dashboards for visualizing service health, cluster performance, and automated alerting on critical issues.
You need a running Consul cluster with basic configuration. If you haven't set up Consul yet, follow our Consul installation guide first. You'll also need Prometheus and Grafana installed on your monitoring server.
Step-by-step configuration
Install monitoring stack components
Install Prometheus and Grafana on your monitoring server to collect and visualize Consul metrics. Note that the prometheus package is in the default Debian/Ubuntu repositories, but the grafana package typically requires adding Grafana's own APT repository first.
sudo apt update
sudo apt install -y prometheus grafana curl wget
sudo systemctl enable prometheus grafana-server
Configure Consul telemetry
Enable Consul's built-in telemetry and metrics export to make data available for Prometheus scraping.
telemetry {
  prometheus_retention_time = "30s"
  disable_hostname          = true
  metrics_prefix            = "consul"
  # statsd, statsite, dogstatsd, and circonus sinks are optional; configure
  # them only if you actually run those collectors alongside Prometheus
}
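Before restarting the agent, it's worth validating the merged configuration. The helper below is a small sketch: the /etc/consul.d path is an assumption (adjust to your layout), and it skips gracefully when the consul binary isn't on PATH.

```shell
# Validate the Consul config directory before a restart (sketch; path assumed).
validate_consul_config() {
  local dir="${1:-/etc/consul.d}"
  if ! command -v consul >/dev/null 2>&1; then
    echo "consul binary not found; skipping validation"
    return 0
  fi
  # consul validate parses every .hcl/.json file in the directory
  consul validate "$dir"
}

validate_consul_config /etc/consul.d
```

A failed validation here is much cheaper than a failed restart of a production agent.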
Enable Consul HTTP API metrics endpoint
Configure Consul to expose metrics through its HTTP API endpoint for Prometheus collection.
ports {
  http = 8500
}

ui_config {
  enabled          = true
  metrics_provider = "prometheus"
  metrics_proxy {
    base_url = "http://localhost:9090"
  }
}

connect {
  enabled = true
}

log_level = "INFO"
log_json  = true
Restart Consul with telemetry configuration
Apply the telemetry configuration by restarting Consul and verify the metrics endpoint is accessible.
sudo systemctl restart consul
sudo systemctl status consul
Test metrics endpoint
curl -s "http://localhost:8500/v1/agent/metrics?format=prometheus" | head -20
Configure Prometheus to scrape Consul metrics
Add Consul as a scrape target in Prometheus configuration to collect telemetry data every 15 seconds.
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "/etc/prometheus/rules/consul_alerts.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'consul'
    static_configs:
      - targets: ['localhost:8500']
    metrics_path: '/v1/agent/metrics'
    params:
      format: ['prometheus']
    scrape_interval: 15s
    scrape_timeout: 10s

  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'localhost:8500'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_service]
        target_label: service
      - source_labels: [__meta_consul_node]
        target_label: node
Create Consul alerting rules
Define Prometheus alerting rules for critical Consul events like node failures, service health issues, and cluster problems.
sudo mkdir -p /etc/prometheus/rules
groups:
  - name: consul_alerts
    rules:
      # Note: consul_health_* and consul_raft_leader are exposed by the
      # separate prometheus/consul_exporter; the agent's /v1/agent/metrics
      # endpoint provides the consul_runtime_* and consul_rpc_* series. Run
      # the exporter alongside Prometheus if you want the health-based alerts.
      - alert: ConsulNodeDown
        expr: consul_health_node_status{status="critical"} == 1
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "Consul node {{ $labels.node }} is down"
          description: "Consul node {{ $labels.node }} has been down for more than 30 seconds"
      - alert: ConsulServiceUnhealthy
        expr: consul_health_service_status{status="critical"} == 1
        for: 60s
        labels:
          severity: warning
        annotations:
          summary: "Consul service {{ $labels.service_name }} is unhealthy"
          description: "Service {{ $labels.service_name }} on node {{ $labels.node }} has been unhealthy for more than 1 minute"
      - alert: ConsulNoLeader
        expr: consul_raft_leader != 1
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "Consul cluster has no leader"
          description: "Consul cluster has been without a leader for more than 10 seconds"
      - alert: ConsulHighMemoryUsage
        expr: consul_runtime_alloc_bytes / consul_runtime_sys_bytes > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Consul memory usage is high"
          description: "Consul is using more than 80% of allocated memory"
      - alert: ConsulRPCErrors
        expr: rate(consul_rpc_request_error[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High Consul RPC error rate"
          description: "Consul RPC error rate is {{ $value }} errors per second"
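Before restarting Prometheus, lint both the main config and the rules file. This sketch assumes the Debian package paths used above and skips gracefully when promtool isn't installed.

```shell
# Lint the Prometheus config and the Consul alert rules (sketch; paths assumed).
check_prometheus_files() {
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not installed; skipping checks"
    return 0
  fi
  promtool check config /etc/prometheus/prometheus.yml
  promtool check rules /etc/prometheus/rules/consul_alerts.yml
}

check_prometheus_files
```

promtool ships with the prometheus package, so it should already be present on the monitoring server.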
Restart Prometheus with new configuration
Apply the Prometheus configuration and verify it's successfully scraping Consul metrics.
sudo systemctl restart prometheus
sudo systemctl status prometheus
Check if Consul target is being scraped
curl -s http://localhost:9090/api/v1/targets | grep consul
Configure Grafana data source
Add Prometheus as a data source in Grafana to enable dashboard creation for Consul metrics.
sudo systemctl start grafana-server
sudo systemctl status grafana-server
Access Grafana at http://your-server-ip:3000 (default credentials admin/admin), then add a Prometheus data source pointing at http://localhost:9090.
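If you prefer configuration as code over clicking through the UI, Grafana can also provision the data source from a YAML file at startup. The helper function name below is made up for this sketch; the YAML itself follows Grafana's standard datasource provisioning schema, and the target directory assumes the stock package layout.

```shell
# Write a Grafana datasource provisioning file (helper name is hypothetical).
write_prometheus_datasource() {
  cat > "$1" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF
}

# Stock package location; Grafana reads provisioning files on startup.
[ -d /etc/grafana/provisioning/datasources ] &&
  write_prometheus_datasource /etc/grafana/provisioning/datasources/prometheus.yml || true
```

After writing the file, restart grafana-server; the data source appears without any manual setup.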
Create Consul overview dashboard
Import a comprehensive Grafana dashboard for Consul monitoring with panels for cluster health, service discovery, and performance metrics.
{
  "dashboard": {
    "id": null,
    "title": "Consul Service Discovery Monitoring",
    "tags": ["consul", "service-discovery"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Consul Cluster Status",
        "type": "stat",
        "targets": [
          {
            "expr": "consul_raft_leader",
            "legendFormat": "Leader Status"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
      },
      {
        "id": 2,
        "title": "Active Services",
        "type": "stat",
        "targets": [
          {
            "expr": "count(consul_catalog_service_query_count)",
            "legendFormat": "Services"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
      },
      {
        "id": 3,
        "title": "Service Health Status",
        "type": "graph",
        "targets": [
          {
            "expr": "consul_health_service_status",
            "legendFormat": "{{ service_name }}"
          }
        ],
        "gridPos": {"h": 8, "w": 24, "x": 0, "y": 8}
      },
      {
        "id": 4,
        "title": "RPC Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(consul_rpc_request[5m])",
            "legendFormat": "RPC Requests/sec"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 16}
      },
      {
        "id": 5,
        "title": "Memory Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "consul_runtime_alloc_bytes",
            "legendFormat": "Allocated Memory"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 16}
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "5s"
  }
}
Import dashboard into Grafana
Save the dashboard JSON above to /tmp/consul_dashboard.json, then use the Grafana API to import it programmatically.
curl -X POST \
http://admin:admin@localhost:3000/api/dashboards/db \
-H 'Content-Type: application/json' \
-d @/tmp/consul_dashboard.json
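To confirm the import landed, query Grafana's search API. The small parser below extracts dashboard titles from the JSON response using only grep and sed, so it works without jq.

```shell
# Pull "title" fields out of a Grafana /api/search JSON response.
extract_titles() {
  grep -o '"title":"[^"]*"' | sed 's/^"title":"//; s/"$//'
}

# Expect the imported dashboard title to appear (default admin credentials assumed):
# curl -s "http://admin:admin@localhost:3000/api/search?query=Consul" | extract_titles
```

If nothing prints, check the import response for an error message rather than assuming success, since the API returns 200 for some validation failures.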
Configure Grafana alerting
Set up Grafana alert notifications to send alerts via email or Slack when Consul issues occur. Note that the legacy [alerting] engine was removed in Grafana 9, so only unified alerting and SMTP need to be configured in grafana.ini:
[smtp]
enabled = true
host = localhost:587
user = alerts@example.com
password = your-smtp-password
skip_verify = false
from_address = consul-alerts@example.com
from_name = Consul Monitoring

[unified_alerting]
enabled = true
Create service discovery health checks
Configure additional monitoring for service registration and discovery performance metrics.
service {
  name = "consul-monitoring"
  port = 8500
  tags = ["monitoring", "metrics"]

  check {
    name     = "Consul HTTP API"
    http     = "http://localhost:8500/v1/status/leader"
    interval = "10s"
    timeout  = "3s"
  }

  check {
    name     = "Consul Metrics Endpoint"
    http     = "http://localhost:8500/v1/agent/metrics?format=prometheus"
    interval = "30s"
    timeout  = "5s"
  }
}
Restart services and verify monitoring
Apply all configuration changes and verify the complete monitoring pipeline is working correctly.
sudo systemctl restart consul prometheus grafana-server
sudo systemctl status consul prometheus grafana-server
Verify metrics collection
curl -s "http://localhost:9090/api/v1/query?query=consul_runtime_alloc_bytes" | grep -o '"value":\[.*\]'
Verify your monitoring setup
Test that all monitoring components are working correctly and collecting Consul metrics.
# Check Consul metrics endpoint
curl -s "http://localhost:8500/v1/agent/metrics?format=prometheus" | grep consul_raft | head -5
Verify Prometheus is scraping Consul
curl -s "http://localhost:9090/api/v1/targets" | grep consul | head -5
Test Grafana API
curl -s http://admin:admin@localhost:3000/api/health
Check active services in Consul
curl -s http://localhost:8500/v1/catalog/services | jq .
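Both Prometheus and Grafana can take a few seconds to come up after a restart, so one-shot checks like the ones above can be flaky. A small retry loop avoids that; this helper is a sketch, and the retry count and URLs are assumptions to adjust for your environment.

```shell
# Poll a URL until it responds with HTTP 2xx/3xx or the retries run out.
wait_for_url() {
  local url="$1" tries="${2:-30}"
  local i
  for (( i = 1; i <= tries; i++ )); do
    if curl -sf --max-time 2 "$url" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example: block until Prometheus reports ready before querying it.
# wait_for_url http://localhost:9090/-/ready 30 && echo "Prometheus is up"
```

The /-/ready endpoint is Prometheus's built-in readiness probe; Grafana's equivalent is /api/health.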
Key metrics to monitor
| Metric | Description | Prometheus Query |
|---|---|---|
| Leader Status | Whether node is cluster leader | consul_raft_leader |
| Service Health | Health status of registered services | consul_health_service_status |
| Node Health | Health status of cluster nodes | consul_health_node_status |
| RPC Requests | Rate of RPC requests | rate(consul_rpc_request[5m]) |
| Memory Usage | Current memory allocation | consul_runtime_alloc_bytes |
| KV Store Operations | Key-value store transaction rate | rate(consul_kvs_apply_count[5m]) |
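Each query in the table can be run ad hoc against Prometheus's instant-query API. The helper below just builds the URL, percent-encoding the characters these PromQL expressions use; the function name is made up for this sketch.

```shell
# Build a Prometheus instant-query URL (minimal encoding for the queries above).
prom_query_url() {
  local base="$1" q="$2"
  q="${q// /%20}"
  q="${q//\[/%5B}"
  q="${q//\]/%5D}"
  q="${q//\{/%7B}"
  q="${q//\}/%7D}"
  printf '%s/api/v1/query?query=%s\n' "$base" "$q"
}

# Example:
# curl -s "$(prom_query_url http://localhost:9090 'rate(consul_rpc_request[5m])')"
```

For anything beyond these characters, use curl's --data-urlencode with a GET request instead of hand-rolling the encoding.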
Common monitoring issues
| Symptom | Cause | Fix |
|---|---|---|
| No metrics in Prometheus | Consul telemetry not enabled | Add telemetry block to Consul config and restart |
| Consul target shows as down | Metrics endpoint not accessible | Check the consul job targets and metrics_path in prometheus.yml |
| Grafana shows no data | Prometheus data source not configured | Add http://localhost:9090 as Prometheus data source |
| Alerts not firing | Alert rules syntax error | Validate YAML syntax in consul_alerts.yml |
| High memory usage alerts | Normal for large service catalogs | Adjust alert thresholds based on environment |
Performance optimization tips
For production environments, consider these optimizations to reduce monitoring overhead while maintaining visibility:
Adjust scrape intervals for performance
Reduce monitoring overhead by tuning scrape intervals based on your requirements.
# For high-traffic environments
scrape_configs:
  - job_name: 'consul'
    static_configs:
      - targets: ['localhost:8500']
    scrape_interval: 30s  # Increased from 15s
    scrape_timeout: 15s   # Increased timeout
This monitoring setup provides comprehensive visibility into your Consul service discovery infrastructure. You can extend it by adding custom metrics for specific services or integrating with your existing monitoring stack. For additional security, consider implementing Consul ACL security to protect metrics endpoints.
Next steps
- Configure Consul backup and disaster recovery with automated snapshots
- Set up Caddy with Consul service discovery for dynamic load balancing
- Configure Prometheus long-term storage with Thanos for unlimited data retention
- Monitor HAProxy and Consul with Prometheus and Grafana dashboards
- Configure Consul Connect service mesh monitoring with distributed tracing
Automated install script
Run this script to automate the entire setup:
#!/usr/bin/env bash
set -euo pipefail
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'
# Default values
CONSUL_HOST="${1:-localhost}"
GRAFANA_ADMIN_PASS="${2:-admin}"
# Usage message
usage() {
  echo "Usage: $0 [consul_host] [grafana_admin_password]"
  echo "  consul_host: Consul server hostname/IP (default: localhost)"
  echo "  grafana_admin_password: Grafana admin password (default: admin)"
  exit 1
}
# Error handling
cleanup() {
  echo -e "${RED}[ERROR] Installation failed. Cleaning up...${NC}"
  systemctl stop prometheus grafana-server 2>/dev/null || true
  rm -f /tmp/consul_monitoring_* 2>/dev/null || true
}
trap cleanup ERR
# Check if running as root or with sudo
if [[ $EUID -ne 0 ]]; then
  echo -e "${RED}[ERROR] This script must be run as root or with sudo${NC}"
  exit 1
fi
# Auto-detect distribution
if [ -f /etc/os-release ]; then
  . /etc/os-release
  case "$ID" in
    ubuntu|debian)
      PKG_MGR="apt"
      PKG_UPDATE="apt update"
      PKG_INSTALL="apt install -y"
      ;;
    almalinux|rocky|centos|rhel|ol|fedora)
      PKG_MGR="dnf"
      PKG_UPDATE="dnf update -y"
      PKG_INSTALL="dnf install -y"
      ;;
    amzn)
      PKG_MGR="yum"
      PKG_UPDATE="yum update -y"
      PKG_INSTALL="yum install -y"
      ;;
    *)
      echo -e "${RED}[ERROR] Unsupported distribution: $ID${NC}"
      exit 1
      ;;
  esac
else
  echo -e "${RED}[ERROR] Cannot detect distribution${NC}"
  exit 1
fi

# Config paths are the same across supported distributions
PROM_CONFIG="/etc/prometheus/prometheus.yml"
PROM_RULES_DIR="/etc/prometheus/rules"
GRAFANA_CONFIG="/etc/grafana/grafana.ini"
GRAFANA_USER="grafana"
echo -e "${BLUE}[INFO] Installing Consul monitoring with Prometheus and Grafana on $PRETTY_NAME${NC}"
# Step 1: Update packages and install monitoring stack
echo -e "${GREEN}[1/8] Installing monitoring stack components...${NC}"
$PKG_UPDATE
# Note: the grafana package may require the vendor repository (and EPEL on
# RHEL-family distros) if it is not available in the default repos
$PKG_INSTALL prometheus grafana curl wget
# Step 2: Configure Prometheus
echo -e "${GREEN}[2/8] Configuring Prometheus...${NC}"
mkdir -p "$PROM_RULES_DIR"
chown prometheus:prometheus "$PROM_RULES_DIR"
chmod 755 "$PROM_RULES_DIR"
cat > "$PROM_CONFIG" << 'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "/etc/prometheus/rules/consul_alerts.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'consul'
    static_configs:
      - targets: ['CONSUL_HOST:8500']
    metrics_path: '/v1/agent/metrics'
    params:
      format: ['prometheus']
    scrape_interval: 15s
    scrape_timeout: 10s

  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'CONSUL_HOST:8500'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_service]
        target_label: service
      - source_labels: [__meta_consul_node]
        target_label: node
EOF
sed -i "s/CONSUL_HOST/$CONSUL_HOST/g" "$PROM_CONFIG"
chown prometheus:prometheus "$PROM_CONFIG"
chmod 644 "$PROM_CONFIG"
# Step 3: Create Consul alerting rules
echo -e "${GREEN}[3/8] Creating Consul alerting rules...${NC}"
cat > "$PROM_RULES_DIR/consul_alerts.yml" << 'EOF'
groups:
  - name: consul_alerts
    rules:
      # Note: consul_health_* and consul_raft_leader are exposed by the
      # separate prometheus/consul_exporter, not by the agent's metrics
      # endpoint; deploy the exporter if you want these alerts to fire.
      - alert: ConsulNodeDown
        expr: consul_health_node_status{status="critical"} == 1
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "Consul node {{ $labels.node }} is down"
          description: "Consul node {{ $labels.node }} has been down for more than 30 seconds"
      - alert: ConsulServiceUnhealthy
        expr: consul_health_service_status{status="critical"} == 1
        for: 60s
        labels:
          severity: warning
        annotations:
          summary: "Consul service {{ $labels.service_name }} is unhealthy"
          description: "Service {{ $labels.service_name }} on node {{ $labels.node }} has been unhealthy for more than 60 seconds"
      - alert: ConsulNoLeader
        expr: consul_raft_leader != 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Consul cluster has no leader"
          description: "Consul cluster has been without a leader for more than 1 minute"
      - alert: ConsulHighMemoryUsage
        expr: consul_runtime_alloc_bytes / consul_runtime_sys_bytes > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage in Consul"
          description: "Consul memory usage is above 90%"
EOF
chown prometheus:prometheus "$PROM_RULES_DIR/consul_alerts.yml"
chmod 644 "$PROM_RULES_DIR/consul_alerts.yml"
# Step 4: Configure Grafana
echo -e "${GREEN}[4/8] Configuring Grafana...${NC}"
sed -i "s/;admin_password = admin/admin_password = $GRAFANA_ADMIN_PASS/" "$GRAFANA_CONFIG"
chown root:$GRAFANA_USER "$GRAFANA_CONFIG"
chmod 640 "$GRAFANA_CONFIG"
# Step 5: Configure firewall
echo -e "${GREEN}[5/8] Configuring firewall...${NC}"
if command -v firewall-cmd >/dev/null 2>&1; then
  systemctl start firewalld 2>/dev/null || true
  firewall-cmd --permanent --add-port=9090/tcp --quiet || true
  firewall-cmd --permanent --add-port=3000/tcp --quiet || true
  firewall-cmd --reload --quiet || true
elif command -v ufw >/dev/null 2>&1; then
  ufw allow 9090/tcp >/dev/null 2>&1 || true
  ufw allow 3000/tcp >/dev/null 2>&1 || true
fi
# Step 6: Start and enable services
echo -e "${GREEN}[6/8] Starting monitoring services...${NC}"
systemctl enable prometheus grafana-server
systemctl restart prometheus
systemctl restart grafana-server
# Wait for services to start
sleep 5
# Step 7: Verify services are running
echo -e "${GREEN}[7/8] Verifying services...${NC}"
if ! systemctl is-active --quiet prometheus; then
  echo -e "${RED}[ERROR] Prometheus failed to start${NC}"
  exit 1
fi
if ! systemctl is-active --quiet grafana-server; then
  echo -e "${RED}[ERROR] Grafana failed to start${NC}"
  exit 1
fi
# Step 8: Test connectivity and add Prometheus datasource to Grafana
echo -e "${GREEN}[8/8] Configuring Grafana datasource...${NC}"
sleep 10
# Wait for Grafana to be fully ready
for i in {1..30}; do
  if curl -s http://localhost:3000/api/health >/dev/null 2>&1; then
    break
  fi
  sleep 2
done
# Add Prometheus as datasource
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Prometheus",
    "type": "prometheus",
    "url": "http://localhost:9090",
    "access": "proxy",
    "isDefault": true
  }' \
  "http://admin:${GRAFANA_ADMIN_PASS}@localhost:3000/api/datasources" >/dev/null || true
echo -e "${GREEN}[SUCCESS] Consul monitoring setup completed!${NC}"
echo -e "${BLUE}Services configured:${NC}"
echo " - Prometheus: http://localhost:9090"
echo " - Grafana: http://localhost:3000 (admin/$GRAFANA_ADMIN_PASS)"
echo ""
echo -e "${YELLOW}[INFO] Next steps:${NC}"
echo "1. Configure Consul telemetry in your consul.hcl:"
echo " telemetry {"
echo " prometheus_retention_time = \"30s\""
echo " disable_hostname = true"
echo " }"
echo "2. Restart Consul: systemctl restart consul"
echo "3. Test metrics: curl http://$CONSUL_HOST:8500/v1/agent/metrics?format=prometheus"
echo "4. Import Consul dashboards in Grafana"
Save the script as install.sh and review it before running. Since it must run as root, execute with: sudo bash install.sh [consul_host] [grafana_admin_password]