Configure Consul Connect Service Mesh Monitoring

Set up monitoring for a Consul Connect service mesh: Prometheus metrics, Grafana dashboards, Jaeger distributed tracing, and Envoy proxy observability.

Prerequisites

Running Consul cluster with Connect enabled
At least 8GB RAM for monitoring stack
Root or sudo access
Basic understanding of service mesh concepts

What this solves

Consul Connect provides secure service-to-service communication, but operating it reliably requires visibility into service health, proxy performance, and request flows. This tutorial configures Prometheus metrics collection, Grafana dashboards for the mesh, and distributed tracing with Jaeger and OpenTelemetry so you can follow requests across your service topology.

Prerequisites

You need a running Consul cluster with Connect enabled and at least two services configured to communicate through the service mesh. This tutorial builds on existing Consul Connect infrastructure to add monitoring capabilities.

Update system packages

Start by updating your package manager and installing required dependencies for monitoring components.

# Debian/Ubuntu
sudo apt update && sudo apt upgrade -y
sudo apt install -y wget curl unzip jq

# RHEL, AlmaLinux, Rocky Linux, Fedora
sudo dnf update -y
sudo dnf install -y wget curl unzip jq

Configure Consul metrics collection

Enable Consul telemetry

Configure Consul to expose its metrics in Prometheus format. Add the following to the agent configuration, for example in /etc/consul.d/telemetry.hcl.

telemetry {
  prometheus_retention_time = "24h"
  disable_hostname = true
  metrics_prefix = "consul"
}

connect {
  enabled = true
}

ports {
  grpc = 8502
  http = 8500
}

Configure Connect proxy metrics

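Expose Prometheus metrics on every Envoy sidecar managed by Consul Connect. Proxy defaults are a config entry rather than an agent setting, so save the following as proxy-defaults.hcl and apply it with consul config write proxy-defaults.hcl.

Kind = "proxy-defaults"
Name = "global"

Config {
  # Serves Envoy metrics at /metrics for Prometheus
  envoy_prometheus_bind_addr = "0.0.0.0:9102"
  # Exposes the Envoy /stats endpoints
  envoy_stats_bind_addr = "0.0.0.0:9103"
}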

Restart Consul services

Apply the new configuration by restarting Consul on all cluster nodes.

sudo systemctl restart consul
sudo systemctl status consul
curl -s 'http://localhost:8500/v1/agent/metrics?format=prometheus' | head -20
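
Rather than reading the first 20 lines by eye, the output can be counted by metric prefix to confirm that Consul-prefixed samples are present. This is a minimal sketch, assuming awk is available; the count_samples helper and the sample text (including its HELP line) are illustrative and not part of Consul.

```shell
#!/usr/bin/env bash
# count_samples PREFIX: count non-comment sample lines on stdin whose
# metric name begins with PREFIX.
count_samples() {
  awk -v p="$1" '!/^#/ && index($0, p) == 1 { n++ } END { print n + 0 }'
}

# Made-up excerpt of Prometheus text-format output.
sample='# HELP consul_raft_apply Number of Raft transactions
# TYPE consul_raft_apply counter
consul_raft_apply 42
consul_runtime_num_goroutines 118
go_gc_duration_seconds_count 7'

printf '%s\n' "$sample" | count_samples consul_

# Against a live agent (not run here):
#   curl -s 'http://localhost:8500/v1/agent/metrics?format=prometheus' | count_samples consul_
```

A result of 0 against a live agent would suggest the telemetry block was not loaded.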

Install and configure Prometheus

Install Prometheus

Download and install a pinned Prometheus release (2.45.0 in this tutorial) for metrics collection and storage.

wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xvfz prometheus-2.45.0.linux-amd64.tar.gz
sudo mv prometheus-2.45.0.linux-amd64/prometheus /usr/local/bin/
sudo mv prometheus-2.45.0.linux-amd64/promtool /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus /var/lib/prometheus

Configure Prometheus for Consul metrics

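Set up Prometheus to scrape the Consul agent and to discover Connect proxies through Consul service discovery. Save the following as /etc/prometheus/prometheus.yml. The keep rules match only services registered with a proxy_type = "connect-proxy" service meta key.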

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'consul'
    static_configs:
      - targets: ['localhost:8500']
    metrics_path: /v1/agent/metrics
    params:
      format: ['prometheus']
    scrape_interval: 5s

  - job_name: 'consul-connect-proxies'
    consul_sd_configs:
      - server: 'localhost:8500'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_service_metadata_proxy_type]
        regex: connect-proxy
        action: keep
      # Build the scrape target from the node address, not the service port
      - source_labels: [__meta_consul_address]
        target_label: __address__
        regex: (.*)
        replacement: ${1}:9102
      - source_labels: [__meta_consul_service]
        target_label: service
      - source_labels: [__meta_consul_node]
        target_label: node
    metrics_path: /metrics
    scrape_interval: 5s

  - job_name: 'envoy-admin'
    consul_sd_configs:
      - server: 'localhost:8500'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_service_metadata_proxy_type]
        regex: connect-proxy
        action: keep
      # Build the scrape target from the node address, not the service port
      - source_labels: [__meta_consul_address]
        target_label: __address__
        regex: (.*)
        replacement: ${1}:9103
      - source_labels: [__meta_consul_service]
        target_label: service
      - source_labels: [__meta_consul_node]
        target_label: node
    metrics_path: /stats/prometheus
    scrape_interval: 10s
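
The address relabel rules match an anchored regex against the source label value and write the expanded replacement into __address__. The sketch below imitates that rewrite with sed in place of the Prometheus regex engine, to show what target a given source value yields. The relabel_address helper and the address 10.0.1.15 are made up for illustration.

```shell
#!/usr/bin/env bash
# relabel_address VALUE PORT: imitate a relabel rule with regex (.*) and
# replacement ${1}:PORT. Prometheus anchors the regex, as ^(.*)$ does here.
relabel_address() {
  printf '%s\n' "$1" | sed -E "s/^(.*)\$/\\1:$2/"
}

relabel_address 10.0.1.15 9102   # metrics listener target
relabel_address 10.0.1.15 9103   # stats listener target
```

Feeding a port number instead of a host address into the same rule produces a value such as 21000:9102, which is not a usable scrape target.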

Create Prometheus systemd service

Run Prometheus as a systemd service under the dedicated prometheus user. Save the following as /etc/systemd/system/prometheus.service.

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.listen-address=0.0.0.0:9090 \
    --web.enable-lifecycle \
    --storage.tsdb.retention.time=30d

[Install]
WantedBy=multi-user.target

Start Prometheus

Enable and start the Prometheus service, then verify it can scrape Consul metrics.

sudo chown -R prometheus:prometheus /etc/prometheus/
sudo systemctl daemon-reload
sudo systemctl enable --now prometheus
sudo systemctl status prometheus
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'

Install and configure Grafana dashboards

Install Grafana

Install Grafana for creating comprehensive service mesh monitoring dashboards.

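# Debian/Ubuntu (apt-key is deprecated, so store the key in a dedicated keyring)
sudo apt install -y apt-transport-https wget gnupg
wget -q -O - https://packages.grafana.com/gpg.key | gpg --dearmor | sudo tee /usr/share/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/grafana.gpg] https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana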

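# RHEL, AlmaLinux, Rocky Linux, Fedora (repo definition as used in the install script)
sudo tee /etc/yum.repos.d/grafana.repo << 'EOF'
[grafana]
name=grafana
baseurl=https://packages.grafana.com/oss/rpm
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://packages.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
EOF
sudo dnf install -y grafana
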
Configure Grafana data source

Add Prometheus as a data source so Grafana can visualize Consul Connect metrics. Save the following as /etc/grafana/provisioning/datasources/prometheus.yml.

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
    editable: true



Create Consul Connect dashboard

Deploy a dashboard for Consul Connect service mesh metrics. The JSON uses the {"dashboard": ...} wrapper that Grafana's dashboard HTTP API (POST /api/dashboards/db) expects.

{
  "dashboard": {
    "id": null,
    "title": "Consul Connect Service Mesh",
    "tags": ["consul", "connect", "service-mesh"],
    "timezone": "browser",
    "panels": [
      {
        "title": "Service Health",
        "type": "stat",
        "targets": [
          {
            "expr": "consul_health_service_query_tag{status=\"passing\"}",
            "legendFormat": "Healthy Services"
          }
        ],
        "gridPos": {"h": 8, "w": 6, "x": 0, "y": 0}
      },
      {
        "title": "Proxy Connections",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(envoy_cluster_upstream_cx_connect_total[5m])",
            "legendFormat": "{{service}} - {{cluster_name}}"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 6, "y": 0}
      },
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(envoy_http_inbound_0_0_0_0_20000_http_requests_total[5m])",
            "legendFormat": "{{service}} - Requests/sec"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 8}
      },
      {
        "title": "Response Times",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(envoy_http_inbound_0_0_0_0_20000_http_request_duration_milliseconds_bucket[5m]))",
            "legendFormat": "{{service}} - 95th percentile"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 8}
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "10s"
  }
}
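
The Response Times panel relies on histogram_quantile, which estimates a percentile by linear interpolation inside the bucket where the target rank falls. The sketch below imitates that calculation in awk on made-up cumulative bucket counts. It is a simplified illustration, not the Prometheus implementation, and the estimate_quantile name and the numbers are assumptions.

```shell
#!/usr/bin/env bash
# estimate_quantile Q: read "upper_bound cumulative_count" pairs (ascending,
# ending with +Inf) on stdin and print the interpolated Q-quantile.
estimate_quantile() {
  awk -v q="$1" '
    { le[NR] = $1; count[NR] = $2; inf[NR] = ($1 ~ /Inf/) }
    END {
      if (NR == 0 || count[NR] == 0) { print "n/a"; exit }
      rank = q * count[NR]
      # Find the first bucket whose cumulative count reaches the rank
      for (i = 1; i < NR; i++) if (count[i] >= rank) break
      # A rank in the +Inf bucket is reported as the highest finite bound
      if (inf[i]) { print (i > 1 ? le[i - 1] : "n/a"); exit }
      lo = (i > 1) ? le[i - 1] : 0
      below = (i > 1) ? count[i - 1] : 0
      if (count[i] == below) { print lo; exit }
      printf "%.2f\n", lo + (le[i] - lo) * (rank - below) / (count[i] - below)
    }'
}

# Made-up latency buckets in milliseconds: 50 requests took <= 10 ms,
# 90 took <= 50 ms, and all 100 took <= 100 ms.
printf '%s\n' '10 50' '50 90' '100 100' '+Inf 100' | estimate_quantile 0.95
```

With these buckets the 95th request falls halfway through the 50 to 100 ms bucket, so the estimate is 75 ms. Coarser buckets make the estimate less precise, which is why bucket boundaries matter for latency panels.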



Start Grafana

Enable and start Grafana, then open the dashboard to confirm that service mesh metrics appear.

sudo chown -R grafana:grafana /var/lib/grafana/
sudo systemctl enable --now grafana-server
sudo systemctl status grafana-server
echo "Grafana available at http://localhost:3000 (admin/admin)"


Configure distributed tracing with Jaeger


Install Jaeger

Install Jaeger for distributed tracing across your Consul Connect service mesh.

wget https://github.com/jaegertracing/jaeger/releases/download/v1.47.0/jaeger-1.47.0-linux-amd64.tar.gz
tar -xzf jaeger-1.47.0-linux-amd64.tar.gz
sudo mv jaeger-1.47.0-linux-amd64/jaeger-all-in-one /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false jaeger
sudo mkdir -p /var/lib/jaeger
sudo chown jaeger:jaeger /var/lib/jaeger



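Configure Jaeger service

Run Jaeger all-in-one as a systemd service to collect traces from the mesh. Save the following as /etc/systemd/system/jaeger.service. This unit keeps up to 50,000 traces in memory, which suits testing but not long-term retention.
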
[Unit]
Description=Jaeger Tracing
After=network.target

[Service]
User=jaeger
Group=jaeger
Type=simple
ExecStart=/usr/local/bin/jaeger-all-in-one \
    --collector.grpc-server.host-port=:14250 \
    --collector.http-server.host-port=:14268 \
    --query.host-port=:16686 \
    --memory.max-traces=50000 \
    --log-level=info
Restart=always

[Install]
WantedBy=multi-user.target



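Configure Envoy tracing

Enable distributed tracing by having Envoy sidecars report spans in Zipkin format. This belongs in the global proxy-defaults config entry: save it as proxy-defaults.hcl and apply it with consul config write proxy-defaults.hcl. The JSON values are plain strings passed through to Envoy.

Kind = "proxy-defaults"
Name = "global"

Config {
  # Tracing applies to HTTP services only; set the protocol per service
  # in service-defaults instead if some of your services are plain TCP
  protocol = "http"

  # Keep the metrics listeners from the earlier step; this entry replaces it
  envoy_prometheus_bind_addr = "0.0.0.0:9102"
  envoy_stats_bind_addr = "0.0.0.0:9103"

  envoy_tracing_json = <<EOF
{
  "http": {
    "name": "envoy.tracers.zipkin",
    "typed_config": {
      "@type": "type.googleapis.com/envoy.config.trace.v3.ZipkinConfig",
      "collector_cluster": "jaeger_collector",
      "collector_endpoint_version": "HTTP_JSON",
      "collector_endpoint": "/api/v2/spans",
      "shared_span_context": false
    }
  }
}
EOF

  # Port 9411 is the Zipkin-compatible listener (the OpenTelemetry Collector's
  # zipkin receiver configured below). Jaeger's 14268 accepts Thrift only.
  envoy_extra_static_clusters_json = <<EOF
{
  "name": "jaeger_collector",
  "connect_timeout": "1s",
  "type": "STRICT_DNS",
  "lb_policy": "ROUND_ROBIN",
  "load_assignment": {
    "cluster_name": "jaeger_collector",
    "endpoints": [{
      "lb_endpoints": [{
        "endpoint": {
          "address": {
            "socket_address": { "address": "127.0.0.1", "port_value": 9411 }
          }
        }
      }]
    }]
  }
}
EOF
}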



Start Jaeger and restart Connect proxies

Start the Jaeger service, then restart your Connect proxies so they pick up the tracing configuration.

sudo systemctl daemon-reload
sudo systemctl enable --now jaeger
sudo systemctl status jaeger

# Restart Consul to pick up tracing configuration
sudo systemctl restart consul

# Restart any existing Connect proxies (Envoy sidecar for the example web-service)
consul connect envoy -sidecar-for web-service &
echo "Jaeger UI available at http://localhost:16686"


Configure OpenTelemetry integration


Install OpenTelemetry Collector

Deploy the OpenTelemetry Collector to process telemetry and export it to the backends.

wget https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.80.0/otelcol_0.80.0_linux_amd64.tar.gz
tar -xzf otelcol_0.80.0_linux_amd64.tar.gz
sudo mv otelcol /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false otelcol
sudo mkdir -p /etc/otelcol
sudo chown otelcol:otelcol /etc/otelcol



Configure OpenTelemetry for service mesh

Set up the collector to receive traces from Envoy, export them to Jaeger, and re-export Envoy metrics for Prometheus. Save the following as /etc/otelcol/config.yaml.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  
  zipkin:
    endpoint: 0.0.0.0:9411
    
  prometheus:
    config:
      scrape_configs:
        - job_name: 'envoy-metrics'
          static_configs:
            - targets: ['localhost:9102']

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  
  attributes:
    actions:
      - key: service.name
        action: upsert
        from_attribute: service_name
      - key: service.version
        action: upsert
        from_attribute: service_version

exporters:
  jaeger:
    # gRPC endpoint: host:port without a URL scheme
    endpoint: localhost:14250
    tls:
      insecure: true
  
  prometheus:
    endpoint: "0.0.0.0:8889"
    
  logging:
    loglevel: debug

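# Serves the health endpoint on port 13133 used in the verification step
extensions:
  health_check:
    endpoint: 0.0.0.0:13133

service:
  extensions: [health_check]
  pipelines: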
    traces:
      receivers: [otlp, zipkin]
      processors: [batch, attributes]
      exporters: [jaeger, logging]
    
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [prometheus, logging]



Start OpenTelemetry Collector

Create a systemd service for the OpenTelemetry Collector at /etc/systemd/system/otelcol.service, then start it.

[Unit]
Description=OpenTelemetry Collector
After=network.target

[Service]
User=otelcol
Group=otelcol
Type=simple
ExecStart=/usr/local/bin/otelcol --config=/etc/otelcol/config.yaml
Restart=always

[Install]
WantedBy=multi-user.target

sudo chown -R otelcol:otelcol /etc/otelcol/
sudo systemctl daemon-reload
sudo systemctl enable --now otelcol
sudo systemctl status otelcol
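
Once the collector is running, its zipkin receiver can be smoke-tested with a hand-built span before any Envoy traffic flows. The sketch below only builds a one-span payload in Zipkin v2 JSON; the zipkin_span helper, the fixed trace and span IDs, and the span name are placeholders chosen for illustration.

```shell
#!/usr/bin/env bash
# zipkin_span NAME SERVICE TIMESTAMP_US: print a one-span Zipkin v2 JSON array.
# The trace and span IDs are fixed placeholder values for a smoke test.
zipkin_span() {
  local name=$1 service=$2 ts_us=$3
  printf '[{"traceId":"%s","id":"%s","name":"%s","timestamp":%s,"duration":%s,"localEndpoint":{"serviceName":"%s"}}]\n' \
    "5af7183fb1d4cf5f" "6b221d5bc9e6496c" "$name" "$ts_us" "1000" "$service"
}

# Print a payload with a fixed timestamp (microseconds since the epoch).
zipkin_span smoke-test web-service 1700000000000000

# To send a span stamped with the current time to the receiver (not run here):
#   zipkin_span smoke-test web-service "$(( $(date +%s) * 1000000 ))" |
#     curl -s -X POST -H 'Content-Type: application/json' \
#       --data-binary @- http://localhost:9411/api/v2/spans
```

If the pipeline is wired correctly, the span should show up in the Jaeger UI under the web-service service, and the collector's logging exporter should print it.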


Monitor Envoy proxy metrics


Configure enhanced Envoy metrics

Extend the Envoy metrics with custom latency histogram buckets, and set the connection limits and outlier detection that drive the circuit breaker metrics. Histogram settings go in the global proxy-defaults config entry; limits and passive health checks are set per service in a service-defaults entry. Apply each file with consul config write.

# proxy-defaults.hcl (keep any tracing settings from the previous step here too)
Kind = "proxy-defaults"
Name = "global"

Config {
  envoy_prometheus_bind_addr = "0.0.0.0:9102"
  envoy_stats_bind_addr = "0.0.0.0:9103"

  # Custom buckets (milliseconds) for inbound HTTP histograms
  envoy_stats_config_json = <<EOF
{
  "histogram_bucket_settings": [
    {
      "match": { "prefix": "http.inbound" },
      "buckets": [0.5, 1, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000]
    }
  ]
}
EOF
}

# web-service-defaults.hcl: circuit breaker limits and outlier detection
# for the upstreams of the example web-service
Kind = "service-defaults"
Name = "web-service"

UpstreamConfig {
  Defaults {
    Limits {
      MaxConnections = 1024
      MaxPendingRequests = 256
      MaxConcurrentRequests = 1024
    }
    PassiveHealthCheck {
      Interval = "30s"
      MaxFailures = 3
      # On Consul versions that support them:
      # BaseEjectionTime = "30s"
      # MaxEjectionPercent = 50
    }
  }
}



Create Envoy proxy dashboard

Deploy a second dashboard focused on Envoy proxy performance and health metrics.

{
  "dashboard": {
    "id": null,
    "title": "Envoy Proxy Metrics",
    "tags": ["envoy", "proxy", "consul-connect"],
    "panels": [
      {
        "title": "Connection Pool Status",
        "type": "graph",
        "targets": [
          {
            "expr": "envoy_cluster_upstream_cx_active",
            "legendFormat": "{{cluster_name}} - Active Connections"
          },
          {
            "expr": "envoy_cluster_upstream_cx_overflow",
            "legendFormat": "{{cluster_name}} - Overflow"
          }
        ]
      },
      {
        "title": "Circuit Breaker Status",
        "type": "stat",
        "targets": [
          {
            "expr": "envoy_cluster_circuit_breakers_default_cx_open",
            "legendFormat": "{{cluster_name}} - Circuit Open"
          }
        ]
      },
      {
        "title": "Request Success Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "sum by (service) (rate(envoy_http_inbound_0_0_0_0_20000_http_requests_total{response_code!~\"5..\"}[5m])) / sum by (service) (rate(envoy_http_inbound_0_0_0_0_20000_http_requests_total[5m])) * 100",
            "legendFormat": "{{service}} - Success Rate %"
          }
        ]
      },
      {
        "title": "Outlier Detection Events",
        "type": "graph",
        "targets": [
          {
            "expr": "envoy_cluster_outlier_detection_ejections_active",
            "legendFormat": "{{cluster_name}} - Ejections"
          }
        ]
      }
    ]
  }
}
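
The Request Success Rate panel divides the non-5xx request rate by the total request rate. The same arithmetic can be tried on raw counter samples, as in the sketch below. It works on cumulative counts rather than per-second rates, and the success_rate helper, the demo_http_requests_total metric name, and the values are all made up for illustration.

```shell
#!/usr/bin/env bash
# success_rate: read Prometheus text-format samples on stdin and print the
# percentage of requests whose response_code label is not 5xx.
success_rate() {
  awk '
    /^#/ || NF < 2 { next }          # skip comments and blank lines
    {
      value = $NF                    # the sample value is the last field
      total += value
      if ($0 ~ /response_code="5[0-9][0-9]"/) errors += value
    }
    END {
      if (total == 0) { print "n/a"; exit }
      printf "%.2f\n", (total - errors) * 100 / total
    }'
}

# Made-up samples standing in for the counters of one proxy.
sample='# TYPE demo_http_requests_total counter
demo_http_requests_total{response_code="200"} 900
demo_http_requests_total{response_code="404"} 50
demo_http_requests_total{response_code="503"} 50'

printf '%s\n' "$sample" | success_rate
```

Here 950 of 1000 requests are non-5xx, so the result is 95.00. Note that 4xx responses count as successes under this definition, as they do in the panel query.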



Restart services for enhanced metrics

Apply the enhanced Envoy configuration by restarting Consul and any running proxies.

sudo systemctl restart consul
sudo systemctl restart grafana-server

# Verify metrics endpoints are responding
curl -s http://localhost:9102/metrics | grep envoy_cluster | head -5
curl -s http://localhost:9103/stats | grep circuit_breakers | head -5
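
The /stats output is plain "name: value" text, so open circuit breakers can be listed per cluster instead of scanning the grep output. A minimal sketch follows, assuming that text format; the open_breakers helper and the cluster names db and api are illustrative.

```shell
#!/usr/bin/env bash
# open_breakers: read Envoy /stats text on stdin and print each cluster that
# has at least one circuit breaker gauge ending in _open set to 1.
open_breakers() {
  awk -F': ' '
    $1 ~ /^cluster\..*\.circuit_breakers\.[a-z]+\.[a-z_]+_open$/ && $2 == 1 {
      name = $1
      sub(/^cluster\./, "", name)                 # drop the leading scope
      sub(/\.circuit_breakers\..*$/, "", name)    # drop the stat suffix
      print name
    }' | sort -u
}

# Made-up excerpt of /stats output.
stats='cluster.db.circuit_breakers.default.cx_open: 0
cluster.db.circuit_breakers.default.rq_open: 0
cluster.api.circuit_breakers.default.cx_open: 1
cluster.api.circuit_breakers.default.rq_pending_open: 1
cluster.api.upstream_cx_active: 12'

printf '%s\n' "$stats" | open_breakers

# Against a live sidecar (not run here):
#   curl -s http://localhost:9103/stats | open_breakers
```

Empty output means no breaker is currently open on that proxy.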


Verify your setup

Test the complete monitoring stack by generating some service mesh traffic and confirming that metrics and traces appear in each component.

# Check Prometheus targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.health == "up") | .labels.job'

# Verify Consul metrics
curl -s 'http://localhost:9090/api/v1/query?query=consul_raft_leader' | jq '.data.result[0].value'

# Check Envoy proxy metrics
curl -s 'http://localhost:9090/api/v1/query?query=envoy_cluster_upstream_cx_active' | jq '.data.result[].metric'

# Generate test traffic through service mesh
for i in {1..10}; do curl -s http://web-service.service.consul/health; sleep 1; done

# Verify traces in Jaeger (the URL must be quoted, or the shell treats & as a background operator)
curl -s 'http://localhost:16686/api/traces?service=web-service&limit=1' | jq '.data[0].traceID'

# Check OpenTelemetry Collector health
curl -s http://localhost:13133/

# Verify Grafana can query data
curl -s -u admin:admin 'http://localhost:3000/api/datasources/proxy/1/api/v1/query?query=up'

Note: If some endpoints return errors, check that all services are running and that firewall rules allow the required ports (8500, 9090, 3000, 16686, 9102, 9103).
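
A quick way to act on this note is to probe each port from the monitoring host. The sketch below uses bash's /dev/tcp pseudo-device together with the coreutils timeout command; the port_open helper is illustrative, and it checks reachability from localhost only, not from remote machines.

```shell
#!/usr/bin/env bash
# port_open HOST PORT: succeed if a TCP connection can be opened within 1 second.
port_open() {
  timeout 1 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Ports listed in the note above.
for port in 8500 9090 3000 16686 9102 9103; do
  if port_open 127.0.0.1 "$port"; then
    echo "port $port: open"
  else
    echo "port $port: closed"
  fi
done
```

A port reported as closed locally points at a stopped service; a port that is open locally but unreachable from another host points at a firewall rule.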

Common issues



Symptom	Cause	Fix
No Consul metrics in Prometheus	Telemetry not enabled in Consul config	Add telemetry block to /etc/consul.d/telemetry.hcl and restart Consul
Envoy proxy targets not discovered	Consul service discovery misconfigured	Verify consul_sd_configs in Prometheus and check service metadata
No traces appearing in Jaeger	Envoy tracing not configured properly	Check envoy_tracing_json configuration and restart Connect proxies
Grafana dashboards show no data	Prometheus data source not configured	Verify Prometheus URL in Grafana data source configuration
High memory usage on Prometheus	Too many high-cardinality metrics	Add metric_relabel_configs to drop unnecessary labels
OpenTelemetry Collector not receiving data	Receiver endpoints not accessible	Check firewall rules for ports 4317, 4318, 9411



Next steps

Configure Jaeger distributed tracing on Kubernetes cluster with Helm charts and Elasticsearch backend for Kubernetes-based service mesh deployments
Configure OpenTelemetry custom metrics for application monitoring with Prometheus and Grafana for application-level observability
Implement Consul multi-datacenter replication with WAN federation for multi-region service mesh monitoring
Configure Consul Connect service mesh security policies with intentions and ACLs
Setup Consul Connect traffic management with service routing and load balancing





    
            
            
                
                    
                        
                            
                        
                        
Automated install script

Run this to automate the entire setup. Save it as install.sh.

#!/usr/bin/env bash
set -euo pipefail

# Consul Connect service mesh monitoring install script
# Configures Prometheus, Grafana, and Jaeger for comprehensive monitoring

readonly SCRIPT_NAME=$(basename "$0")
readonly PROMETHEUS_VERSION="2.45.0"
readonly GRAFANA_VERSION="10.0.3"
readonly JAEGER_VERSION="1.47.0"

# Colors
readonly RED='\033[0;31m'
readonly GREEN='\033[0;32m'
readonly YELLOW='\033[1;33m'
readonly NC='\033[0m'

# Cleanup on failure
cleanup() {
    echo -e "${RED}[ERROR] Installation failed. Cleaning up...${NC}"
    systemctl stop prometheus grafana-server jaeger 2>/dev/null || true
    rm -rf /tmp/consul-monitoring-install
}
trap cleanup ERR

# Utility functions
log_info() { echo -e "${GREEN}$1${NC}"; }
log_warn() { echo -e "${YELLOW}$1${NC}"; }
log_error() { echo -e "${RED}$1${NC}"; exit 1; }

usage() {
    cat << EOF
Usage: $SCRIPT_NAME [OPTIONS]
Install Consul Connect service mesh monitoring stack

Options:
    -c, --consul-addr    Consul server address (default: localhost:8500)
    -h, --help          Show this help message

Example:
    $SCRIPT_NAME --consul-addr 10.0.1.100:8500
EOF
    exit 1
}

# Parse arguments
CONSUL_ADDR="localhost:8500"
while [[ $# -gt 0 ]]; do
    case $1 in
        -c|--consul-addr) [[ $# -ge 2 ]] || log_error "Option $1 requires an address"; CONSUL_ADDR="$2"; shift 2 ;;
        -h|--help) usage ;;
        *) log_error "Unknown option: $1. Use -h for help." ;;
    esac
done

# Check prerequisites
[[ $EUID -eq 0 ]] || log_error "This script must be run as root"
command -v systemctl >/dev/null || log_error "systemd is required"

# Detect distribution
if [ -f /etc/os-release ]; then
    . /etc/os-release
    case "$ID" in
        ubuntu|debian) PKG_MGR="apt"; PKG_UPDATE="apt update"; PKG_INSTALL="apt install -y" ;;
        almalinux|rocky|centos|rhel|ol|fedora) PKG_MGR="dnf"; PKG_UPDATE="dnf update -y"; PKG_INSTALL="dnf install -y" ;;
        amzn) PKG_MGR="yum"; PKG_UPDATE="yum update -y"; PKG_INSTALL="yum install -y" ;;
        *) log_error "Unsupported distribution: $ID" ;;
    esac
else
    log_error "Cannot detect distribution"
fi

# Create working directory
mkdir -p /tmp/consul-monitoring-install
cd /tmp/consul-monitoring-install

log_info "[1/8] Updating system packages..."
$PKG_UPDATE
$PKG_INSTALL wget curl unzip jq

log_info "[2/8] Verifying Consul connectivity..."
curl -sf "http://${CONSUL_ADDR}/v1/status/leader" >/dev/null || log_error "Cannot connect to Consul at $CONSUL_ADDR"

log_info "[3/8] Installing Prometheus..."
wget -q "https://github.com/prometheus/prometheus/releases/download/v${PROMETHEUS_VERSION}/prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz"
tar xzf "prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz"
install -o root -g root -m 755 "prometheus-${PROMETHEUS_VERSION}.linux-amd64/prometheus" /usr/local/bin/
install -o root -g root -m 755 "prometheus-${PROMETHEUS_VERSION}.linux-amd64/promtool" /usr/local/bin/

# Create prometheus user and directories
useradd --system --no-create-home --shell /bin/false prometheus || true
mkdir -p /etc/prometheus /var/lib/prometheus
chown prometheus:prometheus /var/lib/prometheus
chmod 750 /var/lib/prometheus

# Configure Prometheus
cat > /etc/prometheus/prometheus.yml << 'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'consul'
    static_configs:
      - targets: ['CONSUL_ADDR']
    metrics_path: /v1/agent/metrics
    params:
      format: ['prometheus']
    scrape_interval: 5s

  - job_name: 'consul-connect-proxies'
    consul_sd_configs:
      - server: 'CONSUL_ADDR'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_service_metadata_proxy_type]
        regex: connect-proxy
        action: keep
      - source_labels: [__meta_consul_service_address, __meta_consul_service_port]
        target_label: __address__
        regex: (.*);(.*)
        replacement: ${1}:9102
      - source_labels: [__meta_consul_service]
        target_label: service
    metrics_path: /metrics
    scrape_interval: 5s
EOF
sed -i "s/CONSUL_ADDR/${CONSUL_ADDR}/g" /etc/prometheus/prometheus.yml
chown prometheus:prometheus /etc/prometheus/prometheus.yml
chmod 644 /etc/prometheus/prometheus.yml

# Create Prometheus systemd service
cat > /etc/systemd/system/prometheus.service << 'EOF'
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/
After=network.target

[Service]
User=prometheus
Group=prometheus
Type=simple
Restart=on-failure
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.listen-address=0.0.0.0:9090 \
  --web.external-url=

[Install]
WantedBy=multi-user.target
EOF

log_info "[4/8] Installing Grafana..."
if [[ "$PKG_MGR" == "apt" ]]; then
    wget -q -O - https://packages.grafana.com/gpg.key | gpg --dearmor > /usr/share/keyrings/grafana.gpg
    echo "deb [signed-by=/usr/share/keyrings/grafana.gpg] https://packages.grafana.com/oss/deb stable main" > /etc/apt/sources.list.d/grafana.list
    apt update
    $PKG_INSTALL grafana
else
    cat > /etc/yum.repos.d/grafana.repo << 'EOF'
[grafana]
name=grafana
baseurl=https://packages.grafana.com/oss/rpm
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://packages.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
EOF
    $PKG_INSTALL grafana
fi

log_info "[5/8] Installing Jaeger..."
wget -q "https://github.com/jaegertracing/jaeger/releases/download/v${JAEGER_VERSION}/jaeger-${JAEGER_VERSION}-linux-amd64.tar.gz"
tar xzf "jaeger-${JAEGER_VERSION}-linux-amd64.tar.gz"
install -o root -g root -m 755 "jaeger-${JAEGER_VERSION}-linux-amd64/jaeger-all-in-one" /usr/local/bin/

# Create jaeger user and directories
useradd --system --no-create-home --shell /bin/false jaeger || true
mkdir -p /var/lib/jaeger
chown jaeger:jaeger /var/lib/jaeger
chmod 750 /var/lib/jaeger

# Create Jaeger systemd service
cat > /etc/systemd/system/jaeger.service << 'EOF'
[Unit]
Description=Jaeger all-in-one
Documentation=https://www.jaegertracing.io/
After=network.target

[Service]
User=jaeger
Group=jaeger
Type=simple
Restart=on-failure
ExecStart=/usr/local/bin/jaeger-all-in-one \
  --collector.grpc-tls.enabled=false \
  --collector.http-tls.enabled=false

[Install]
WantedBy=multi-user.target
EOF

log_info "[6/8] Configuring Grafana datasources..."
mkdir -p /etc/grafana/provisioning/datasources
cat > /etc/grafana/provisioning/datasources/prometheus.yml << 'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
  - name: Jaeger
    type: jaeger
    access: proxy
    url: http://localhost:16686
EOF
chown grafana:grafana /etc/grafana/provisioning/datasources/prometheus.yml

log_info "[7/8] Starting services..."
systemctl daemon-reload
systemctl enable prometheus grafana-server jaeger
systemctl start prometheus
systemctl start grafana-server
systemctl start jaeger

# Configure firewall
if command -v firewall-cmd >/dev/null 2>&1 && systemctl is-active firewalld >/dev/null 2>&1; then
    firewall-cmd --permanent --add-port=3000/tcp  # Grafana
    firewall-cmd --permanent --add-port=9090/tcp  # Prometheus
    firewall-cmd --permanent --add-port=16686/tcp # Jaeger UI
    firewall-cmd --reload
elif command -v ufw >/dev/null 2>&1; then
    ufw allow 3000/tcp
    ufw allow 9090/tcp
    ufw allow 16686/tcp
fi

log_info "[8/8] Verifying installation..."
sleep 10

# Verify services
for service in prometheus grafana-server jaeger; do
    if ! systemctl is-active "$service" >/dev/null; then
        log_error "$service is not running"
    fi
done

# Verify endpoints
endpoints=(
    "localhost:9090/-/healthy"
    "localhost:3000/api/health"
    "localhost:16686/"
)

for endpoint in "${endpoints[@]}"; do
    if ! curl -sf "http://$endpoint" >/dev/null; then
        log_warn "Warning: $endpoint not responding"
    fi
done

# Cleanup
rm -rf /tmp/consul-monitoring-install

log_info "✅ Consul Connect monitoring stack installed successfully!"
echo ""
echo "Access URLs:"
echo "  Grafana:    http://$(hostname -I | awk '{print $1}'):3000 (admin/admin)"
echo "  Prometheus: http://$(hostname -I | awk '{print $1}'):9090"
echo "  Jaeger:     http://$(hostname -I | awk '{print $1}'):16686"
echo ""
echo "Next steps:"
echo "1. Configure Consul agents with telemetry enabled"
echo "2. Enable Connect proxy metrics on your services"
echo "3. Import Consul Connect dashboards in Grafana"
echo "4. Configure OpenTelemetry in your applications"
                    
Review the script before running. It must run as root, so execute it with: sudo bash install.sh
                
            
        
    
    
            