Configure Consul Connect service mesh monitoring with distributed tracing

Advanced 45 min Jun 15, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up comprehensive monitoring for Consul Connect service mesh with Prometheus metrics, Grafana dashboards, Jaeger distributed tracing, and Envoy proxy observability for production-grade service mesh operations.

Prerequisites

  • Running Consul cluster with Connect enabled
  • At least 8GB RAM for monitoring stack
  • Root or sudo access
  • Basic understanding of service mesh concepts

What this solves

Consul Connect service mesh provides secure service-to-service communication, but operating it reliably requires deep observability into service health, proxy performance, and request flows. This tutorial configures comprehensive monitoring with Prometheus metrics collection, Grafana dashboards for service mesh visualization, and distributed tracing with Jaeger and OpenTelemetry to track requests across your entire service topology.

Prerequisites

You need a running Consul cluster with Connect enabled and at least two services configured to communicate through the service mesh. This tutorial builds on existing Consul Connect infrastructure to add monitoring capabilities.

Update system packages

Start by updating your package manager and installing required dependencies for monitoring components.

# Ubuntu / Debian
sudo apt update && sudo apt upgrade -y
sudo apt install -y wget curl unzip jq

# AlmaLinux / Rocky Linux
sudo dnf update -y
sudo dnf install -y wget curl unzip jq

Configure Consul metrics collection

Enable Consul telemetry

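Configure Consul to export metrics in Prometheus format. Add the following to /etc/consul.d/telemetry.hcl on each agent; prometheus_retention_time must be greater than zero for the Prometheus endpoint to return data.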

telemetry {
  prometheus_retention_time = "24h"
  disable_hostname = true
  metrics_prefix = "consul"
}

connect {
  enabled = true
}

ports {
  grpc = 8502
  http = 8500
}

Configure Connect proxy metrics

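Enable metrics endpoints on the Envoy proxies managed by Consul Connect. Proxy settings are not part of the agent's connect block; they belong in a proxy-defaults config entry that applies to every sidecar.

# Save as proxy-defaults.hcl and apply with: consul config write proxy-defaults.hcl
Kind = "proxy-defaults"
Name = "global"

Config {
  # Envoy metrics in Prometheus format (scraped at /metrics)
  envoy_prometheus_bind_addr = "0.0.0.0:9102"
  # Envoy stats endpoints (/stats and /stats/prometheus)
  envoy_stats_bind_addr = "0.0.0.0:9103"
}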

Restart Consul services

Apply the new configuration by restarting Consul on all cluster nodes.

sudo systemctl restart consul
sudo systemctl status consul
curl -s 'http://localhost:8500/v1/agent/metrics?format=prometheus' | head -20
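
Looking at the first 20 lines only shows that the endpoint answers. To confirm that Consul's own metrics are exported, you can count the metric families that carry the consul prefix. This is a minimal sketch: count_metric_families is a hypothetical helper name, and it assumes the Prometheus text format, in which each metric family is announced by a "# TYPE" line.

```shell
# Hypothetical helper: count metric families whose name starts with a prefix.
# Reads Prometheus text exposition format on stdin.
count_metric_families() {
  local prefix="$1"
  awk -v p="$prefix" '
    $1 == "#" && $2 == "TYPE" && index($3, p) == 1 { n++ }
    END { print n + 0 }
  '
}

# Example (not run here): a result above zero suggests telemetry is enabled
# curl -s 'http://localhost:8500/v1/agent/metrics?format=prometheus' | count_metric_families consul_
```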

Install and configure Prometheus

Install Prometheus

Download and install Prometheus 2.45.0 for metrics collection and storage.

wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xvfz prometheus-2.45.0.linux-amd64.tar.gz
sudo mv prometheus-2.45.0.linux-amd64/prometheus /usr/local/bin/
sudo mv prometheus-2.45.0.linux-amd64/promtool /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false prometheus
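sudo mkdir -p /etc/prometheus /var/lib/prometheus
# The systemd unit below points at these console directories
sudo cp -r prometheus-2.45.0.linux-amd64/consoles prometheus-2.45.0.linux-amd64/console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus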

Configure Prometheus for Consul metrics

Set up Prometheus to scrape the local Consul agent and to discover Connect proxies through Consul service discovery. Save the following as /etc/prometheus/prometheus.yml.

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'consul'
    static_configs:
      - targets: ['localhost:8500']
    metrics_path: /v1/agent/metrics
    params:
      format: ['prometheus']
    scrape_interval: 5s

  - job_name: 'consul-connect-proxies'
    consul_sd_configs:
      - server: 'localhost:8500'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_service_metadata_proxy_type]
        regex: connect-proxy
        action: keep
      # Use the node address as the scrape host; the service port is not a host
      - source_labels: [__meta_consul_address]
        target_label: __address__
        regex: (.*)
        replacement: ${1}:9102
      - source_labels: [__meta_consul_service]
        target_label: service
      - source_labels: [__meta_consul_node]
        target_label: node
    metrics_path: /metrics
    scrape_interval: 5s

  - job_name: 'envoy-admin'
    consul_sd_configs:
      - server: 'localhost:8500'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_service_metadata_proxy_type]
        regex: connect-proxy
        action: keep
      # Use the node address as the scrape host; the service port is not a host
      - source_labels: [__meta_consul_address]
        target_label: __address__
        regex: (.*)
        replacement: ${1}:9103
      - source_labels: [__meta_consul_service]
        target_label: service
      - source_labels: [__meta_consul_node]
        target_label: node
    metrics_path: /stats/prometheus
    scrape_interval: 10s
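
A relabel rule reads the value of its source label, matches it against regex, and writes the expanded replacement into target_label. The sketch below mimics that substitution with sed to show what ends up in __address__ for two possible source values. relabel_address is a hypothetical helper for illustration only; it is not part of Prometheus.

```shell
# Hypothetical helper mimicking: regex (.*) with replacement ${1}:<port>
# $1 = value of the source label, $2 = metrics port
relabel_address() {
  printf '%s\n' "$1" | sed -E "s/^(.*)\$/\\1:$2/"
}

relabel_address 10.0.0.5 9102   # node address as source: prints 10.0.0.5:9102
relabel_address 21000 9102      # service port as source: prints 21000:9102, which is not a reachable target
```

The second call shows why the source label has to hold a host or IP address: the rule only appends the port, it does not look up where the proxy runs.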

Create Prometheus systemd service

Run Prometheus as a systemd service under the dedicated prometheus user, with 30 days of data retention. Save the unit as /etc/systemd/system/prometheus.service.

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.listen-address=0.0.0.0:9090 \
    --web.enable-lifecycle \
    --storage.tsdb.retention.time=30d

[Install]
WantedBy=multi-user.target

Start Prometheus

Enable and start the Prometheus service, then verify it can scrape Consul metrics.

sudo chown -R prometheus:prometheus /etc/prometheus/
sudo systemctl daemon-reload
sudo systemctl enable --now prometheus
sudo systemctl status prometheus
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'
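
Prometheus can take a few seconds after startup before it answers API requests, so a scripted check may race the service. A small retry wrapper avoids that. This is a sketch: retry is a hypothetical helper name, and the example relies on the /-/ready readiness endpoint that Prometheus exposes.

```shell
# Hypothetical helper: run a command until it succeeds.
# $1 = maximum attempts, $2 = delay between attempts in seconds, rest = command
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only if another attempt follows
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (not run here): wait for Prometheus before querying its targets
# retry 10 3 curl -sf http://localhost:9090/-/ready
```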

Install and configure Grafana dashboards

Install Grafana

Install Grafana for creating comprehensive service mesh monitoring dashboards.

# Ubuntu / Debian
sudo apt install -y apt-transport-https software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana

# AlmaLinux / Rocky Linux
sudo tee /etc/yum.repos.d/grafana.repo<

Configure Grafana data source

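Add Prometheus as a data source so Grafana can visualize Consul Connect metrics. Grafana loads data source definitions from /etc/grafana/provisioning/datasources/ at startup; save the following there, for example as prometheus.yaml.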

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
    editable: true

Create Consul Connect dashboard

Deploy a comprehensive dashboard for monitoring Consul Connect service mesh metrics.

{
  "dashboard": {
    "id": null,
    "title": "Consul Connect Service Mesh",
    "tags": ["consul", "connect", "service-mesh"],
    "timezone": "browser",
    "panels": [
      {
        "title": "Service Health",
        "type": "stat",
        "targets": [
          {
            "expr": "consul_health_service_query_tag{status=\"passing\"}",
            "legendFormat": "Healthy Services"
          }
        ],
        "gridPos": {"h": 8, "w": 6, "x": 0, "y": 0}
      },
      {
        "title": "Proxy Connections",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(envoy_cluster_upstream_cx_connect_total[5m])",
            "legendFormat": "{{service}} - {{cluster_name}}"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 6, "y": 0}
      },
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(envoy_http_inbound_0_0_0_0_20000_http_requests_total[5m])",
            "legendFormat": "{{service}} - Requests/sec"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 8}
      },
      {
        "title": "Response Times",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(envoy_http_inbound_0_0_0_0_20000_http_request_duration_milliseconds_bucket[5m]))",
            "legendFormat": "{{service}} - 95th percentile"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 8}
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "10s"
  }
}
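
The top-level dashboard key matches the payload shape of Grafana's HTTP API for creating dashboards (POST /api/dashboards/db), so one way to deploy this JSON is to post it once Grafana is running. The sketch below assumes the JSON is saved as consul-connect.json (a hypothetical file name) and that the default admin/admin credentials are still in place. import_dashboard, GRAFANA_URL, and GRAFANA_AUTH are illustrative names.

```shell
# Illustrative defaults; override through the environment if needed
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"
GRAFANA_AUTH="${GRAFANA_AUTH:-admin:admin}"

# Hypothetical helper: post a dashboard JSON file to the Grafana HTTP API.
# $1 = path to a file containing {"dashboard": {...}}
import_dashboard() {
  local file="$1"
  curl -s -u "$GRAFANA_AUTH" \
    -H "Content-Type: application/json" \
    -X POST "$GRAFANA_URL/api/dashboards/db" \
    --data-binary "@$file"
}

# Example (not run here):
# import_dashboard consul-connect.json
```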

Start Grafana

Enable and start Grafana, then access the dashboard to verify service mesh metrics visualization.

sudo chown -R grafana:grafana /var/lib/grafana/
sudo systemctl enable --now grafana-server
sudo systemctl status grafana-server
echo "Grafana available at http://localhost:3000 (admin/admin)"

Configure distributed tracing with Jaeger

Install Jaeger

Install Jaeger for distributed tracing across your Consul Connect service mesh.

wget https://github.com/jaegertracing/jaeger/releases/download/v1.47.0/jaeger-1.47.0-linux-amd64.tar.gz
tar -xzf jaeger-1.47.0-linux-amd64.tar.gz
sudo mv jaeger-1.47.0-linux-amd64/jaeger-all-in-one /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false jaeger
sudo mkdir -p /var/lib/jaeger
sudo chown jaeger:jaeger /var/lib/jaeger

Configure Jaeger service

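Run Jaeger all-in-one as a systemd service (the commands below expect a unit named jaeger.service). This configuration keeps up to 50,000 traces in memory, so traces are lost on restart; it suits evaluation rather than long-term retention.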

[Unit]
Description=Jaeger Tracing
After=network.target

[Service]
User=jaeger
Group=jaeger
Type=simple
ExecStart=/usr/local/bin/jaeger-all-in-one \
    --collector.grpc-server.host-port=:14250 \
    --collector.http-server.host-port=:14268 \
    --query.host-port=:16686 \
    --memory.max-traces=50000 \
    --log-level=info
Restart=always

[Install]
WantedBy=multi-user.target

Configure Envoy tracing

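Enable distributed tracing by configuring the Envoy proxies to emit Zipkin-format spans. The spans go to the Zipkin receiver on port 9411 that the OpenTelemetry Collector (configured in the next section) exposes, and the collector forwards them to Jaeger. Jaeger's port 14268 accepts Jaeger Thrift rather than Zipkin JSON, so it cannot serve as the /api/v2/spans endpoint.

# Apply with: consul config write proxy-defaults.hcl
# proxy-defaults has a single global entry: keep the envoy_prometheus_bind_addr
# and envoy_stats_bind_addr settings from the metrics step in this same file.
Kind = "proxy-defaults"
Name = "global"

Config {
  # Envoy tracer definition, passed to Envoy as a JSON string
  envoy_tracing_json = <<EOF
{
  "http": {
    "name": "envoy.tracers.zipkin",
    "typedConfig": {
      "@type": "type.googleapis.com/envoy.config.trace.v3.ZipkinConfig",
      "collector_cluster": "jaeger_collector",
      "collector_endpoint_version": "HTTP_JSON",
      "collector_endpoint": "/api/v2/spans",
      "shared_span_context": false
    }
  }
}
EOF

  # Static cluster the tracer sends spans to
  envoy_extra_static_clusters_json = <<EOF
{
  "name": "jaeger_collector",
  "connect_timeout": "1s",
  "type": "STRICT_DNS",
  "lb_policy": "ROUND_ROBIN",
  "load_assignment": {
    "cluster_name": "jaeger_collector",
    "endpoints": [{
      "lb_endpoints": [{
        "endpoint": {
          "address": {
            "socket_address": { "address": "127.0.0.1", "port_value": 9411 }
          }
        }
      }]
    }]
  }
}
EOF
}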

Start Jaeger and restart Connect proxies

Start the Jaeger service and restart your Connect proxies to enable tracing.

sudo systemctl daemon-reload
sudo systemctl enable --now jaeger
sudo systemctl status jaeger

# Restart Consul to pick up tracing configuration
sudo systemctl restart consul

# Restart any existing Connect proxies; Envoy reads its tracing settings only at startup
consul connect envoy -sidecar-for web-service &
echo "Jaeger UI available at http://localhost:16686"

Configure OpenTelemetry integration

Install OpenTelemetry Collector

Deploy the OpenTelemetry Collector to provide advanced telemetry processing and export capabilities.

wget https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.80.0/otelcol_0.80.0_linux_amd64.tar.gz
tar -xzf otelcol_0.80.0_linux_amd64.tar.gz
sudo mv otelcol /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false otelcol
sudo mkdir -p /etc/otelcol
sudo chown otelcol:otelcol /etc/otelcol

Configure OpenTelemetry for service mesh

Set up the collector to receive traces from Envoy and export them to Jaeger, and to re-export Envoy metrics for Prometheus. Save the following as /etc/otelcol/config.yaml.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  
  zipkin:
    endpoint: 0.0.0.0:9411
    
  prometheus:
    config:
      scrape_configs:
        - job_name: 'envoy-metrics'
          static_configs:
            - targets: ['localhost:9102']

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  
  attributes:
    actions:
      - key: service.name
        action: upsert
        from_attribute: service_name
      - key: service.version
        action: upsert
        from_attribute: service_version

exporters:
  jaeger:
    # gRPC endpoint of the Jaeger collector: host:port without a URL scheme
    endpoint: localhost:14250
    tls:
      insecure: true
  
  prometheus:
    endpoint: "0.0.0.0:8889"
    
  logging:
    loglevel: debug

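extensions:
  # Serves the health endpoint on port 13133 used in the verification step
  health_check:
    endpoint: 0.0.0.0:13133

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp, zipkin]
      processors: [batch, attributes]
      exporters: [jaeger, logging]

    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [prometheus, logging]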

Start OpenTelemetry Collector

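Create a systemd service for the OpenTelemetry Collector (the commands below expect a unit named otelcol.service), then fix ownership of the configuration directory and start the service.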

[Unit]
Description=OpenTelemetry Collector
After=network.target

[Service]
User=otelcol
Group=otelcol
Type=simple
ExecStart=/usr/local/bin/otelcol --config=/etc/otelcol/config.yaml
Restart=always

[Install]
WantedBy=multi-user.target

sudo chown -R otelcol:otelcol /etc/otelcol/
sudo systemctl daemon-reload
sudo systemctl enable --now otelcol
sudo systemctl status otelcol
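
To check the trace pipeline independently of Envoy, you can post a synthetic span to the collector's Zipkin receiver on port 9411 and then look for it in the Jaeger UI. This is a sketch under assumptions: zipkin_test_span is a hypothetical helper, "smoke-test" is an arbitrary span name, and the payload carries only the basic Zipkin v2 fields (traceId, id, name, timestamp, duration, localEndpoint).

```shell
# Hypothetical helper: print a one-element Zipkin v2 span list as JSON.
# $1 = service name to report the span under
zipkin_test_span() {
  local service="$1" trace_id span_id ts
  # 8 random bytes rendered as 16 lowercase hex characters
  trace_id=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')
  span_id=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')
  # Zipkin timestamps are in microseconds since the epoch
  ts="$(date +%s)000000"
  printf '[{"traceId":"%s","id":"%s","name":"smoke-test","timestamp":%s,"duration":1000,"localEndpoint":{"serviceName":"%s"}}]\n' \
    "$trace_id" "$span_id" "$ts" "$service"
}

# Example (not run here): send the span to the Zipkin receiver
# zipkin_test_span pipeline-check | curl -s -X POST -H 'Content-Type: application/json' \
#   --data-binary @- http://localhost:9411/api/v2/spans
```

If the pipeline works, a service with the name you passed should show up in the Jaeger UI shortly afterwards.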

Monitor Envoy proxy metrics

Configure enhanced Envoy metrics

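Extend the Envoy metrics with custom histogram buckets, and set the connection limits and outlier detection whose state the circuit breaker metrics report. Stats settings belong in the proxy-defaults config entry; connection limits and outlier detection are per-service upstream settings in a service-defaults entry. Apply each file with consul config write.

# proxy-defaults.hcl
# proxy-defaults has a single global entry: keep any tracing settings in this file as well.
Kind = "proxy-defaults"
Name = "global"

Config {
  envoy_prometheus_bind_addr = "0.0.0.0:9102"
  envoy_stats_bind_addr      = "0.0.0.0:9103"

  # Custom histogram buckets (milliseconds) for inbound HTTP stats
  envoy_stats_config_json = <<EOF
{
  "histogram_bucket_settings": [
    {
      "match": { "prefix": "http.inbound" },
      "buckets": [0.5, 1, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000]
    }
  ]
}
EOF
}

# service-defaults.hcl (web-service is the example service used in this tutorial)
Kind = "service-defaults"
Name = "web-service"

UpstreamConfig {
  Defaults {
    # Circuit breaker thresholds for connections to upstreams
    Limits {
      MaxConnections        = 1024
      MaxPendingRequests    = 256
      MaxConcurrentRequests = 1024
    }
    # Outlier detection: eject an upstream host after 3 consecutive 5xx responses
    PassiveHealthCheck {
      Interval           = "30s"
      MaxFailures        = 3
      BaseEjectionTime   = "30s"
      MaxEjectionPercent = 50
    }
  }
}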

Create Envoy proxy dashboard

Deploy a specialized dashboard for monitoring Envoy proxy performance and health metrics.

{
  "dashboard": {
    "id": null,
    "title": "Envoy Proxy Metrics",
    "tags": ["envoy", "proxy", "consul-connect"],
    "panels": [
      {
        "title": "Connection Pool Status",
        "type": "graph",
        "targets": [
          {
            "expr": "envoy_cluster_upstream_cx_active",
            "legendFormat": "{{cluster_name}} - Active Connections"
          },
          {
            "expr": "envoy_cluster_upstream_cx_overflow",
            "legendFormat": "{{cluster_name}} - Overflow"
          }
        ]
      },
      {
        "title": "Circuit Breaker Status",
        "type": "stat",
        "targets": [
          {
            "expr": "envoy_cluster_circuit_breakers_default_cx_open",
            "legendFormat": "{{cluster_name}} - Circuit Open"
          }
        ]
      },
      {
        "title": "Request Success Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(envoy_http_inbound_0_0_0_0_20000_http_requests_total{response_code!~\"5..\"}[5m]) / rate(envoy_http_inbound_0_0_0_0_20000_http_requests_total[5m]) * 100",
            "legendFormat": "{{service}} - Success Rate %"
          }
        ]
      },
      {
        "title": "Outlier Detection Events",
        "type": "graph",
        "targets": [
          {
            "expr": "envoy_cluster_outlier_detection_ejections_active",
            "legendFormat": "{{cluster_name}} - Ejections"
          }
        ]
      }
    ]
  }
}
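
Panel expressions such as the success-rate ratio contain braces and quotes that have to be URL-encoded when sent to the Prometheus HTTP API. Before relying on a panel, it can help to run its expression directly and see whether it returns any series. The sketch below lets curl do the encoding; prom_query and PROM_URL are hypothetical names.

```shell
# Illustrative default; override through the environment if needed
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Hypothetical helper: run an instant query against the Prometheus HTTP API.
# curl -G turns the --data-urlencode value into an encoded query string.
prom_query() {
  curl -s -G "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Example (not run here): count the series a panel expression returns
# prom_query 'envoy_cluster_upstream_cx_active' | jq '.data.result | length'
```

An empty result for an expression usually means the metric name or a label in the panel does not match what the proxies actually export.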

Restart services for enhanced metrics

Apply the enhanced Envoy configuration by restarting Consul and any running proxies.

sudo systemctl restart consul
sudo systemctl restart grafana-server

# Verify metrics endpoints are responding
curl -s http://localhost:9102/metrics | grep envoy_cluster | head -5
curl -s http://localhost:9103/stats | grep circuit_breakers | head -5
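
Envoy's /stats endpoint prints one "name: value" pair per line, and the circuit breaker gauges (for example cluster.NAME.circuit_breakers.default.cx_open) are 1 while a breaker is open. Rather than reading the grep output by eye, you can filter for the open ones. open_breakers is a hypothetical helper name used for this sketch.

```shell
# Hypothetical helper: list circuit breaker gauges that are currently open.
# Reads Envoy /stats text ("name: value" per line) on stdin.
open_breakers() {
  awk -F': ' '$1 ~ /\.circuit_breakers\..*_open$/ && $2 + 0 > 0 { print $1 }'
}

# Example (not run here): no output means no breaker is open
# curl -s http://localhost:9103/stats | open_breakers
```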

Verify your setup

Test the complete monitoring stack by generating some service mesh traffic and verifying that metrics and traces appear in your monitoring systems.

# Check Prometheus targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.health == "up") | .labels.job'

# Verify Consul metrics
curl -s 'http://localhost:9090/api/v1/query?query=consul_raft_leader' | jq '.data.result[0].value'

# Check Envoy proxy metrics
curl -s 'http://localhost:9090/api/v1/query?query=envoy_cluster_upstream_cx_active' | jq '.data.result[].metric'

# Generate test traffic through service mesh
for i in {1..10}; do curl -s http://web-service.service.consul/health; sleep 1; done

# Verify traces in Jaeger (quote the URL, otherwise the shell treats & as a background operator)
curl -s 'http://localhost:16686/api/traces?service=web-service&limit=1' | jq '.data[0].traceID'

# Check OpenTelemetry Collector health
curl -s http://localhost:13133/

# Verify Grafana can query data
curl -s -u admin:admin 'http://localhost:3000/api/datasources/proxy/1/api/v1/query?query=up'

Note: If some endpoints return errors, check that all services are running and that firewall rules allow the required ports (8500, 9090, 3000, 16686, 9102, 9103).
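
To narrow down which of those ports is unreachable, a quick TCP probe per port can help. The sketch below uses bash's /dev/tcp redirection, so it assumes bash rather than a plain POSIX shell; port_open and report_ports are hypothetical helper names.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Relies on bash's /dev/tcp pseudo-device; runs in a subshell so the
# descriptor is closed again when the check finishes.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Hypothetical helper: print "<port> open" or "<port> closed" for each port.
# $1 = host, remaining arguments = ports to probe
report_ports() {
  local host="$1" port
  shift
  for port in "$@"; do
    if port_open "$host" "$port"; then
      echo "$port open"
    else
      echo "$port closed"
    fi
  done
}

# Example (not run here): probe the ports listed in the note above
# report_ports localhost 8500 9090 3000 16686 9102 9103
```

A port reported as closed from the local host points at a stopped service; a port that is open locally but closed from another machine points at a firewall rule or a bind address.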

Common issues

| Symptom | Cause | Fix |
|---|---|---|
| No Consul metrics in Prometheus | Telemetry not enabled in Consul config | Add telemetry block to /etc/consul.d/telemetry.hcl and restart Consul |
| Envoy proxy targets not discovered | Consul service discovery misconfigured | Verify consul_sd_configs in Prometheus and check service metadata |
| No traces appearing in Jaeger | Envoy tracing not configured properly | Check envoy_tracing_json configuration and restart Connect proxies |
| Grafana dashboards show no data | Prometheus data source not configured | Verify Prometheus URL in Grafana data source configuration |
| High memory usage on Prometheus | Too many high-cardinality metrics | Add metric_relabel_configs to drop unnecessary labels |
| OpenTelemetry Collector not receiving data | Receiver endpoints not accessible | Check firewall rules for ports 4317, 4318, 9411 |
