Monitor Consul with Prometheus and Grafana for service discovery observability

Intermediate 35 min Apr 08, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up comprehensive monitoring for HashiCorp Consul using Prometheus metrics collection and Grafana dashboards. Configure telemetry export, alerting rules, and visualization for service discovery health and performance.

Prerequisites

  • Running Consul cluster
  • Root or sudo access
  • Network access between monitoring components
  • Basic understanding of Prometheus and Grafana

What this solves

Consul service discovery requires comprehensive monitoring to ensure healthy service registration, discovery performance, and cluster stability. This tutorial configures Prometheus to collect Consul telemetry metrics and creates Grafana dashboards for visualizing service health, cluster performance, and automated alerting on critical issues.

You need a running Consul cluster with basic configuration; if you haven't set one up yet, follow our Consul installation guide first. You'll also need Prometheus and Grafana installed on your monitoring server, which the first step below covers.

Step-by-step configuration

Install monitoring stack components

Install Prometheus and Grafana on your monitoring server to collect and visualize Consul metrics.

# Debian / Ubuntu (the grafana package assumes Grafana's apt repository
# is already configured; prometheus ships in the distro repos)
sudo apt update
sudo apt install -y prometheus grafana curl wget
sudo systemctl enable prometheus grafana-server

# AlmaLinux / Rocky Linux (prometheus and grafana are not in the default
# repos; enable EPEL and the Grafana yum repo first, or install from upstream)
sudo dnf makecache
sudo dnf install -y prometheus grafana curl wget
sudo systemctl enable prometheus grafana-server

Configure Consul telemetry

Enable Consul's built-in telemetry so metrics are available for Prometheus scraping. Add this block to your Consul configuration (for example /etc/consul.d/telemetry.hcl). The statsd, statsite, circonus, and dogstatsd sinks are only needed if you actually export to those systems, so they are left out here.

telemetry {
  # Must be longer than the Prometheus scrape_interval configured below (15s)
  prometheus_retention_time = "30s"
  disable_hostname = true
  metrics_prefix = "consul"
}
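On busy clusters the full metric set can be large. Consul's telemetry stanza supports prefix-based filtering; a hedged sketch (the prefixes shown are illustrative, so check which metric families you actually chart before excluding anything):

```hcl
telemetry {
  prometheus_retention_time = "30s"
  disable_hostname = true

  # Prefixes starting with "-" are excluded, "+" are included;
  # filter_default decides what happens to everything unmatched.
  filter_default = true
  prefix_filter = ["-consul.fsm", "-consul.txn"]
}
```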

Enable Consul HTTP API metrics endpoint

Configure Consul to expose metrics through its HTTP API for Prometheus collection. The metrics_proxy block additionally lets the Consul web UI fetch and display service metrics from Prometheus.

ports {
  http = 8500
}

ui_config {
  enabled = true
  metrics_provider = "prometheus"
  metrics_proxy {
    base_url = "http://localhost:9090"
  }
}

connect {
  enabled = true
}

log_level = "INFO"
log_json = true

Restart Consul with telemetry configuration

Apply the telemetry configuration by restarting Consul and verify the metrics endpoint is accessible.

sudo systemctl restart consul
sudo systemctl status consul

Test metrics endpoint

curl -s http://localhost:8500/v1/agent/metrics?format=prometheus | head -20
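The endpoint returns Prometheus text exposition format: one sample per line, metric name first. A quick way to gauge how much the agent exposes is to count distinct metric names; the helper below is a sketch that works on any Prometheus-format input (the sample lines are illustrative, in practice pipe the curl output above into it):

```shell
# Count distinct metric names in Prometheus text format read from stdin.
# Comment lines (# HELP / # TYPE) don't match the leading-name pattern.
count_metric_families() {
  grep -oE '^[a-zA-Z_][a-zA-Z0-9_]*' | sort -u | wc -l
}

# Illustrative sample; for real data use:
#   curl -s 'http://localhost:8500/v1/agent/metrics?format=prometheus' | count_metric_families
printf '%s\n' \
  'consul_raft_apply 12' \
  'consul_raft_apply 14' \
  'consul_runtime_alloc_bytes 1.2e+07' \
  | count_metric_families
```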

Configure Prometheus to scrape Consul metrics

Add Consul as a scrape target in Prometheus configuration to collect telemetry data every 15 seconds.

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "/etc/prometheus/rules/consul_alerts.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'consul'
    static_configs:
      - targets: ['localhost:8500']
    metrics_path: '/v1/agent/metrics'
    params:
      format: ['prometheus']
    scrape_interval: 15s
    scrape_timeout: 10s

  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'localhost:8500'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_service]
        target_label: service
      - source_labels: [__meta_consul_node]
        target_label: node
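Note that the consul-services job above tries to scrape every registered service at its registered address and port, and most services don't serve Prometheus metrics there. One common convention, sketched below, is to register metric-exposing services with a dedicated tag and keep only those (the "metrics" tag name is an assumption; match it to your own registration convention):

```yaml
  - job_name: 'consul-services-metrics'
    consul_sd_configs:
      - server: 'localhost:8500'
    relabel_configs:
      # __meta_consul_tags is the tag list joined and wrapped with commas,
      # e.g. ",monitoring,metrics," - keep only services tagged "metrics"
      - source_labels: [__meta_consul_tags]
        regex: '.*,metrics,.*'
        action: keep
      - source_labels: [__meta_consul_service]
        target_label: service
```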

Create Consul alerting rules

Define Prometheus alerting rules for critical Consul events like node failures, service health issues, and cluster problems.

sudo mkdir -p /etc/prometheus/rules

Then create /etc/prometheus/rules/consul_alerts.yml with the following rules:
groups:
  # Metric names can differ between Consul versions; confirm each one
  # against your agent's /v1/agent/metrics?format=prometheus output.
  - name: consul_alerts
    rules:
      - alert: ConsulNodeDown
        expr: consul_health_node_status != 1
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "Consul node {{ $labels.node }} is down"
          description: "Consul node {{ $labels.node }} has been down for more than 30 seconds"

      - alert: ConsulServiceUnhealthy
        expr: consul_health_service_status != 1
        for: 60s
        labels:
          severity: warning
        annotations:
          summary: "Consul service {{ $labels.service_name }} is unhealthy"
          description: "Service {{ $labels.service_name }} on node {{ $labels.node }} has been unhealthy for more than 1 minute"

      - alert: ConsulNoLeader
        # Aggregate with max() so this fires only when no server reports
        # leadership, rather than on every follower
        expr: max(consul_raft_leader) != 1
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "Consul cluster has no leader"
          description: "Consul cluster is without a leader for more than 10 seconds"

      - alert: ConsulHighMemoryUsage
        expr: consul_runtime_alloc_bytes / consul_runtime_sys_bytes > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Consul memory usage is high"
          description: "Consul is using more than 80% of allocated memory"

      - alert: ConsulRPCErrors
        expr: rate(consul_rpc_request_errors_total[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High Consul RPC error rate"
          description: "Consul RPC error rate is {{ $value }} errors per second"
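The prometheus.yml above sends alerts to alertmanager:9093, but Alertmanager itself still needs a route and a receiver. A minimal sketch of /etc/alertmanager/alertmanager.yml, assuming email delivery through the same SMTP relay used later for Grafana (all addresses here are placeholders):

```yaml
route:
  receiver: default
  group_by: ['alertname', 'node']
  group_wait: 30s
  repeat_interval: 4h

receivers:
  - name: default
    email_configs:
      - to: ops@example.com
        from: consul-alerts@example.com
        smarthost: localhost:587
```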

Restart Prometheus with new configuration

Validate the configuration first with promtool check config /etc/prometheus/prometheus.yml (this also checks the referenced rule files), then apply it and verify Prometheus is successfully scraping Consul metrics.

sudo systemctl restart prometheus
sudo systemctl status prometheus

Check if Consul target is being scraped

curl -s http://localhost:9090/api/v1/targets | grep consul

Configure Grafana data source

Add Prometheus as a data source in Grafana to enable dashboard creation for Consul metrics.

sudo systemctl start grafana-server
sudo systemctl status grafana-server

Access Grafana at http://your-server-ip:3000 (admin/admin), then add Prometheus data source at http://localhost:9090.
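If you prefer configuration files over clicking through the UI, Grafana can also provision the data source at startup. A minimal sketch (drop it under /etc/grafana/provisioning/datasources/ and restart grafana-server):

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
```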

Create Consul overview dashboard

Import a comprehensive Grafana dashboard for Consul monitoring with panels for cluster health, service discovery, and performance metrics.

{
  "dashboard": {
    "id": null,
    "title": "Consul Service Discovery Monitoring",
    "tags": ["consul", "service-discovery"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Consul Cluster Status",
        "type": "stat",
        "targets": [
          {
            "expr": "consul_raft_leader",
            "legendFormat": "Leader Status"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
      },
      {
        "id": 2,
        "title": "Active Services",
        "type": "stat",
        "targets": [
          {
            "expr": "count(consul_catalog_service_query_count)",
            "legendFormat": "Services"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
      },
      {
        "id": 3,
        "title": "Service Health Status",
        "type": "graph",
        "targets": [
          {
            "expr": "consul_health_service_status",
            "legendFormat": "{{ service_name }}"
          }
        ],
        "gridPos": {"h": 8, "w": 24, "x": 0, "y": 8}
      },
      {
        "id": 4,
        "title": "RPC Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(consul_rpc_request_total[5m])",
            "legendFormat": "RPC Requests/sec"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 16}
      },
      {
        "id": 5,
        "title": "Memory Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "consul_runtime_alloc_bytes",
            "legendFormat": "Allocated Memory"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 16}
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "5s"
  }
}

Import dashboard into Grafana

Save the dashboard JSON above as /tmp/consul_dashboard.json, then use the Grafana API to import it programmatically.

curl -X POST \
  http://admin:admin@localhost:3000/api/dashboards/db \
  -H 'Content-Type: application/json' \
  -d @/tmp/consul_dashboard.json

Configure Grafana alerting

Set up Grafana alert notifications to send alerts via email or Slack when Consul issues occur. Edit /etc/grafana/grafana.ini and restart grafana-server afterwards.

[smtp]
enabled = true
host = localhost:587
user = alerts@example.com
password = your-smtp-password
skip_verify = false
from_address = consul-alerts@example.com
from_name = Consul Monitoring

# Unified alerting is the only alerting engine in Grafana 9+;
# do not enable the legacy [alerting] section alongside it.
[unified_alerting]
enabled = true

Create service discovery health checks

Configure additional monitoring for service registration and discovery performance metrics.

service {
  name = "consul-monitoring"
  port = 8500
  tags = ["monitoring", "metrics"]
  
  check {
    name = "Consul HTTP API"
    http = "http://localhost:8500/v1/status/leader"
    interval = "10s"
    timeout = "3s"
  }
  
  check {
    name = "Consul Metrics Endpoint"
    http = "http://localhost:8500/v1/agent/metrics?format=prometheus"
    interval = "30s"
    timeout = "5s"
  }
}

Restart services and verify monitoring

Check the Consul configuration with consul validate /etc/consul.d/, then apply all changes and verify the complete monitoring pipeline is working correctly.

sudo systemctl restart consul prometheus grafana-server
sudo systemctl status consul prometheus grafana-server

Verify metrics collection

curl -s "http://localhost:9090/api/v1/query?query=consul_raft_leader" | jq '.data.result[].value'

Verify your monitoring setup

Test that all monitoring components are working correctly and collecting Consul metrics.

# Check Consul metrics endpoint
curl -s http://localhost:8500/v1/agent/metrics?format=prometheus | grep consul_raft_leader

Verify Prometheus is scraping Consul

curl -s "http://localhost:9090/api/v1/targets" | grep consul | head -5

Test Grafana API

curl -s http://admin:admin@localhost:3000/api/health

Check active services in Consul

curl -s http://localhost:8500/v1/catalog/services | jq .

Note: It may take 2-3 minutes for metrics to appear in Prometheus and Grafana after configuration changes. Use the Grafana Explore feature to test queries before creating dashboards.

Key metrics to monitor

  • Leader Status: whether the node is cluster leader. Query: consul_raft_leader
  • Service Health: health status of registered services. Query: consul_health_service_status
  • Node Health: health status of cluster nodes. Query: consul_health_node_status
  • RPC Requests: rate of RPC requests. Query: rate(consul_rpc_request_total[5m])
  • Memory Usage: current memory allocation. Query: consul_runtime_alloc_bytes
  • KV Store Operations: key-value store transaction rate. Query: rate(consul_kvs_apply_count[5m])

Common monitoring issues

  • No metrics in Prometheus: Consul telemetry not enabled. Add the telemetry block to the Consul config and restart.
  • Consul target shows as down: metrics endpoint not reachable. Check the consul job's targets, metrics_path, and params in prometheus.yml, and that port 8500 is open to the Prometheus host.
  • Grafana shows no data: Prometheus data source not configured. Add http://localhost:9090 as a Prometheus data source.
  • Alerts not firing: alert rule syntax error. Validate consul_alerts.yml with promtool check rules.
  • High memory usage alerts: normal for large service catalogs. Adjust alert thresholds to your environment.

Performance optimization tips

For production environments, consider these optimizations to reduce monitoring overhead while maintaining visibility:

Adjust scrape intervals for performance

Reduce monitoring overhead by tuning scrape intervals based on your requirements.

# For high-traffic environments
scrape_configs:
  - job_name: 'consul'
    static_configs:
      - targets: ['localhost:8500']
    scrape_interval: 30s  # Increased from 15s
    scrape_timeout: 15s   # Increased timeout
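Scrape-time filtering also helps: metric_relabel_configs runs after the scrape and before storage, so dropped series never reach the TSDB. A hedged example (the regex is illustrative; inspect your own /v1/agent/metrics output before deciding what to drop):

```yaml
scrape_configs:
  - job_name: 'consul'
    static_configs:
      - targets: ['localhost:8500']
    metrics_path: '/v1/agent/metrics'
    params:
      format: ['prometheus']
    metric_relabel_configs:
      # Drop series you never chart to reduce storage and cardinality
      - source_labels: [__name__]
        regex: 'consul_fsm_.*'
        action: drop
```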

This monitoring setup provides comprehensive visibility into your Consul service discovery infrastructure. You can extend it by adding custom metrics for specific services or integrating with your existing monitoring stack. For additional security, consider implementing Consul ACL security to protect metrics endpoints.


#consul #prometheus #grafana #monitoring #service-discovery
