Set Up Jaeger Sampling Strategies for High-Volume Production

Configure advanced Jaeger sampling strategies to efficiently capture traces in high-traffic production environments while controlling storage costs and maintaining observability.

Prerequisites

Existing Jaeger installation
Prometheus for metrics
Root or sudo access
Basic understanding of distributed tracing

What this solves

In high-volume production environments, tracing every request creates overwhelming data volumes and storage costs. Jaeger sampling strategies help you capture meaningful traces while controlling resource usage. This tutorial shows you how to implement adaptive sampling, per-service policies, and remote sampling configuration for production-scale distributed tracing.

Prerequisites

You need a running Jaeger deployment with Elasticsearch or another storage backend. If you don't have this yet, follow our Jaeger Kubernetes deployment guide.

Understanding sampling strategies

Jaeger supports several sampling strategies that determine which traces to collect:

Strategy Type	Use Case	Configuration
Const	All-or-nothing sampling	Sample every trace (param 1) or none (param 0)
Probabilistic	Random sampling at a fixed rate	Sample a fixed fraction of traces, decided from the trace ID
RateLimiting	Maximum traces per second	Cap at N traces/second per client instance
Adaptive	Dynamic adjustment	Collector adjusts probabilities based on traffic patterns
PerService	Service-specific rules	Different rates per service and operation
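
To make the difference between the probabilistic and rate-limiting rules concrete, here is a simplified Python sketch of both decisions. It illustrates the idea only: the names `probabilistic_decision` and `RateLimiter` are hypothetical, and Jaeger's client libraries implement these samplers with their own details.

```python
def probabilistic_decision(trace_id: int, rate: float) -> bool:
    """Sample when a 64-bit trace ID falls below rate * 2**64.

    Deriving the decision from the trace ID means every service that sees
    the same trace reaches the same answer.
    """
    return trace_id < int(rate * (1 << 64))


class RateLimiter:
    """Token bucket: allows at most `max_per_second` traces per second on average."""

    def __init__(self, max_per_second: float, now: float = 0.0):
        self.rate = max_per_second
        self.capacity = max(max_per_second, 1.0)
        self.tokens = self.capacity
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never beyond capacity
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A probabilistic sampler scales with traffic (10% of 10,000 requests/s is still 1,000 traces/s), while a rate limiter gives a fixed ceiling regardless of traffic.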

Step-by-step configuration

Create sampling strategies configuration

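Create a JSON configuration file, for example /etc/jaeger/sampling_strategies.json, that defines your sampling strategies. This file tells Jaeger how to sample traces for different services and operations. The file-based format accepts two strategy types, probabilistic and ratelimiting. Per-service entries go under service_strategies, and per-operation overrides are nested inside a service entry as operation_strategies, which support probabilistic sampling only. Each entry has exactly one type, so a probability cannot be combined with a traces-per-second cap in the same entry; to cap a service, give it a ratelimiting strategy instead. Unrecognized keys are typically ignored rather than rejected, so a misspelled key can disable a rule without any error. In the example below, api-gateway is listed at the default rate so that its order endpoint can be raised to 0.8.

{
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.1
  },
  "service_strategies": [
    {
      "service": "frontend-service",
      "type": "probabilistic",
      "param": 0.5,
      "operation_strategies": [
        {
          "operation": "GET /health",
          "type": "probabilistic",
          "param": 0.001
        }
      ]
    },
    {
      "service": "payment-service",
      "type": "probabilistic",
      "param": 1.0
    },
    {
      "service": "logging-service",
      "type": "probabilistic",
      "param": 0.01
    },
    {
      "service": "health-check",
      "type": "probabilistic",
      "param": 0.001
    },
    {
      "service": "api-gateway",
      "type": "probabilistic",
      "param": 0.1,
      "operation_strategies": [
        {
          "operation": "POST /api/orders",
          "type": "probabilistic",
          "param": 0.8
        }
      ]
    }
  ]
}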
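
The lookup order implied by such a file (operation override, then service entry, then default) can be sketched in Python. This is an illustrative approximation, not the collector's code: `resolve_strategy` and `EXAMPLE` are hypothetical names, and the sketch assumes the `service_strategies` / nested `operation_strategies` layout documented for Jaeger's file-based strategies.

```python
from __future__ import annotations


def resolve_strategy(strategies: dict, service: str, operation: str) -> tuple[str, float]:
    """Return (type, param) for a service/operation pair using exact name matches."""
    default = strategies.get("default_strategy", {"type": "probabilistic", "param": 0.001})
    by_service = {s["service"]: s for s in strategies.get("service_strategies", [])}
    chosen = by_service.get(service, default)
    # An operation-level override wins over the service-level rate
    for op in chosen.get("operation_strategies", []):
        if op["operation"] == operation:
            return op["type"], op["param"]
    return chosen["type"], chosen["param"]


# Hypothetical sample data for illustration
EXAMPLE = {
    "default_strategy": {"type": "probabilistic", "param": 0.1},
    "service_strategies": [
        {
            "service": "frontend-service",
            "type": "probabilistic",
            "param": 0.5,
            "operation_strategies": [
                {"operation": "GET /health", "type": "probabilistic", "param": 0.001}
            ],
        }
    ],
}

print(resolve_strategy(EXAMPLE, "frontend-service", "GET /health"))
print(resolve_strategy(EXAMPLE, "frontend-service", "GET /cart"))
print(resolve_strategy(EXAMPLE, "unknown-service", "anything"))
```

Running a candidate file through a resolver like this before deploying it is a quick way to confirm that each service ends up with the rate you intended.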

Configure Jaeger Collector with sampling strategies

Update your Jaeger Collector configuration to use the sampling strategies file. This enables remote sampling where the collector serves sampling decisions to clients.

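sampling:
  strategies-file: /etc/jaeger/sampling_strategies.json
  strategies-reload-interval: 30s

# Keys mirror the collector's CLI flags, e.g. --collector.http-server.host-port
collector:
  http-server:
    host-port: ":14268"
  grpc-server:
    host-port: ":14250"

# NOTE: these batch settings resemble OpenTelemetry Collector options; a Jaeger v1
# collector may not recognize them. Verify against the flags of your version.
processors:
  batch:
    timeout: 1s
    send-batch-size: 1024
    send-batch-max-size: 2048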

Set up adaptive sampling with volume control

Create a configuration that controls trace volume per service according to traffic and service importance. The strategies file is static and does not accept an adaptive type, so volume control here uses ratelimiting for noisy services and probabilistic rates elsewhere. Jaeger's adaptive sampling is a separate collector mode in which the collector recomputes probabilities toward a target rate; check the documentation for your Jaeger version for the flags and storage backends it requires.

{
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.1
  },
  "service_strategies": [
    {
      "service": "user-service",
      "type": "probabilistic",
      "param": 0.3,
      "operation_strategies": [
        {
          "operation": "login",
          "type": "probabilistic",
          "param": 0.8
        },
        {
          "operation": "register",
          "type": "probabilistic",
          "param": 1.0
        }
      ]
    },
    {
      "service": "database-service",
      "type": "ratelimiting",
      "param": 50
    },
    {
      "service": "cache-service",
      "type": "probabilistic",
      "param": 0.05
    }
  ]
}

Configure environment-specific sampling

Create different sampling configurations for development, staging, and production environments. The first file below is intended as a production profile with a low default rate; the second is a development profile that samples every trace.

{
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.01
  },
  "service_strategies": [
    {
      "service": "critical-payment-service",
      "type": "probabilistic",
      "param": 0.5
    },
    {
      "service": "user-analytics",
      "type": "probabilistic",
      "param": 0.001
    }
  ]
}

{
  "default_strategy": {
    "type": "probabilistic",
    "param": 1.0
  },
  "service_strategies": [
    {
      "service": "test-service",
      "type": "probabilistic",
      "param": 1.0
    }
  ]
}
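
Before rolling out a profile, it helps to estimate how many traces per second it will produce. The sketch below multiplies assumed request rates by the configured probabilities. The request rates are made-up numbers for illustration, `expected_traces_per_second` is a hypothetical helper, and operation-level overrides are ignored for simplicity.

```python
def expected_traces_per_second(strategies: dict, request_rates: dict) -> dict:
    """Estimate sampled traces/s per service from root-request rates."""
    by_service = {s["service"]: s for s in strategies.get("service_strategies", [])}
    default = strategies["default_strategy"]
    estimate = {}
    for service, rps in request_rates.items():
        strategy = by_service.get(service, default)
        if strategy["type"] == "probabilistic":
            estimate[service] = rps * strategy["param"]
        elif strategy["type"] == "ratelimiting":
            # The limit applies per client instance; one instance is assumed here
            estimate[service] = min(rps, strategy["param"])
        else:
            raise ValueError(f"unsupported strategy type: {strategy['type']}")
    return estimate


PRODUCTION = {
    "default_strategy": {"type": "probabilistic", "param": 0.01},
    "service_strategies": [
        {"service": "critical-payment-service", "type": "probabilistic", "param": 0.5},
        {"service": "user-analytics", "type": "probabilistic", "param": 0.001},
    ],
}

# Assumed request rates (requests per second), not measurements
RATES = {"critical-payment-service": 200, "user-analytics": 5000, "other-service": 1000}

for service, tps in expected_traces_per_second(PRODUCTION, RATES).items():
    print(f"{service}: ~{tps:.1f} traces/s")
```

Multiplying the estimate by your average trace size gives a rough storage ingest rate for each environment.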

Enable remote sampling in Jaeger Collector

Configure the Jaeger Collector to serve sampling strategies to client applications over HTTP.

sudo systemctl stop jaeger-collector

[Unit]
Description=Jaeger Collector
After=network.target

[Service]
Type=simple
User=jaeger
Group=jaeger
ExecStart=/usr/local/bin/jaeger-collector \
  --config-file=/etc/jaeger/collector.yaml \
  --sampling.strategies-file=/etc/jaeger/production_sampling.json \
  --sampling.strategies-reload-interval=60s \
  --collector.http-server.host-port=:14268 \
  --collector.grpc-server.host-port=:14250
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl start jaeger-collector
sudo systemctl status jaeger-collector

Configure client applications for remote sampling

Update your application configuration to fetch sampling strategies from the Jaeger Collector instead of using local configuration.

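package main

import (
    "io"
    "log"
    "time"

    opentracing "github.com/opentracing/opentracing-go"
    jaeger "github.com/uber/jaeger-client-go"
    "github.com/uber/jaeger-client-go/config"
)

// Note: jaeger-client-go is deprecated upstream in favor of the OpenTelemetry SDKs.

// initJaeger builds a tracer that polls the collector for its sampling strategy.
// The caller owns the returned closer and must close it on shutdown.
func initJaeger() (opentracing.Tracer, io.Closer, error) {
    cfg := config.Configuration{
        ServiceName: "my-service",
        Sampler: &config.SamplerConfig{
            Type:                    jaeger.SamplerTypeRemote,
            Param:                   0.1, // initial sampling rate until the first poll succeeds
            SamplingServerURL:       "http://jaeger-collector:14268/api/sampling",
            SamplingRefreshInterval: 60 * time.Second, // time.Duration; a bare 60 means 60ns
        },
        Reporter: &config.ReporterConfig{
            LocalAgentHostPort: "jaeger-agent:6831",
        },
    }
    return cfg.NewTracer()
}

func main() {
    tracer, closer, err := initJaeger()
    if err != nil {
        log.Fatalf("init tracer: %v", err)
    }
    defer closer.Close() // flush buffered spans when the program exits
    opentracing.SetGlobalTracer(tracer)
}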
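
To see what a client would receive, you can query the collector's sampling endpoint yourself. The Python sketch below fetches and summarizes the response for one service. The response field names (`probabilisticSampling`, `rateLimitingSampling`, `operationSampling`) reflect the sampling API as commonly documented and should be treated as assumptions to verify against your collector; `fetch_strategy` and `summarize_strategy` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def fetch_strategy(base_url: str, service: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/api/sampling?service=<service> and decode the JSON body."""
    url = f"{base_url}/api/sampling?" + urllib.parse.urlencode({"service": service})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_strategy(response: dict) -> str:
    """Turn a sampling response into a one-line human-readable summary."""
    parts = []
    prob = response.get("probabilisticSampling")
    if prob is not None:
        parts.append(f"probabilistic rate={prob.get('samplingRate', 0)}")
    limit = response.get("rateLimitingSampling")
    if limit is not None:
        parts.append(f"rate limit={limit.get('maxTracesPerSecond', 0)}/s")
    per_op = (response.get("operationSampling") or {}).get("perOperationStrategies") or []
    for op in per_op:
        rate = (op.get("probabilisticSampling") or {}).get("samplingRate", 0)
        parts.append(f"{op['operation']}={rate}")
    return ", ".join(parts) or "no strategy returned"
```

For example, `summarize_strategy(fetch_strategy("http://jaeger-collector:14268", "my-service"))` prints the strategy that the Go client above would poll.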

Set up sampling strategy monitoring

Create a monitoring script to track sampling effectiveness and adjust strategies based on metrics.

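#!/bin/bash
# Print the strategy served for one service plus collector throughput counters.

SERVICE="${1:-frontend-service}"   # the sampling endpoint requires a service name
SAMPLING_URL="http://localhost:14268/api/sampling?service=${SERVICE}"
METRICS_URL="http://localhost:14269/metrics"

# Check current sampling strategy
echo "Current sampling strategy for ${SERVICE}:"
curl -s "$SAMPLING_URL" | jq .

# Fetch metrics once so all figures come from the same scrape
METRICS=$(curl -s "$METRICS_URL")

# Sum a counter across all label sets, skipping "# HELP" / "# TYPE" lines.
# Metric names vary between Jaeger versions; adjust them to match your /metrics output.
sum_metric() {
    echo "$METRICS" | awk -v m="$1" '$0 !~ /^#/ && index($1, m) == 1 {s += $NF} END {printf "%.0f", s}'
}

RECEIVED=$(sum_metric jaeger_collector_traces_received_total)
SAVED=$(sum_metric jaeger_collector_spans_saved_total)

# printf interprets \n; plain echo in bash does not
printf '\nTrace volume (traces received): %s\n' "$RECEIVED"
printf 'Storage usage (spans saved): %s\n' "$SAVED"

# Sampling happens in the clients, so this is spans saved per received trace,
# not a sampling rate
if [ "$RECEIVED" -gt 0 ]; then
    RATIO=$(echo "scale=2; $SAVED / $RECEIVED" | bc)
    printf '\nSpans saved per received trace: %s\n' "$RATIO"
fi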

sudo chmod +x /usr/local/bin/monitor-sampling.sh
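
Counters on the /metrics endpoint are split across label sets, so a reliable total has to sum every matching series and skip the comment lines. A minimal Python sketch of that parsing step follows. The sample text and its label names are invented for illustration, and the parser assumes lines carry no trailing timestamp.

```python
def sum_metric(exposition: str, name: str) -> float:
    """Sum all series of one metric in Prometheus text exposition format."""
    total = 0.0
    for line in exposition.splitlines():
        line = line.strip()
        # "# HELP" and "# TYPE" lines mention the metric name but hold no value
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        metric_name = series.split("{", 1)[0]
        if metric_name == name:
            total += float(value)
    return total


# Invented sample; real label names depend on your Jaeger version
SAMPLE = """\
# HELP jaeger_collector_traces_received_total Traces received by the collector
# TYPE jaeger_collector_traces_received_total counter
jaeger_collector_traces_received_total{format="jaeger",svc="a"} 120
jaeger_collector_traces_received_total{format="jaeger",svc="b"} 3.5e+02
jaeger_collector_spans_received_total{svc="a"} 999
"""

print(sum_metric(SAMPLE, "jaeger_collector_traces_received_total"))
```

Because these values are cumulative counters, a rate needs two readings taken some interval apart; a single reading only shows the total since the collector started.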

Create automated sampling adjustment script

Implement a script that automatically adjusts sampling rates based on system load and storage capacity.

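#!/usr/bin/env python3
import json
import logging
import os
import tempfile
import time

import requests

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

METRIC = "jaeger_collector_traces_received_total"


class SamplingAdjuster:
    def __init__(self, metrics_url, strategies_file):
        self.metrics_url = metrics_url
        self.strategies_file = strategies_file
        self.last_total = None
        self.last_time = None

    def get_total(self):
        """Sum the received-traces counter across all label sets."""
        response = requests.get(self.metrics_url, timeout=10)
        response.raise_for_status()
        total = 0.0
        for line in response.text.splitlines():
            # startswith skips "# HELP" / "# TYPE" lines that also name the metric
            if line.startswith(METRIC):
                total += float(line.split()[-1])
        return total

    def get_current_load(self):
        """Return traces per second since the previous call, or None if unknown."""
        try:
            total, now = self.get_total(), time.monotonic()
        except (requests.RequestException, ValueError) as e:
            logger.error(f"Failed to get metrics: {e}")
            return None
        previous_total, previous_time = self.last_total, self.last_time
        self.last_total, self.last_time = total, now
        # No rate on the first sample or after a counter reset (collector restart)
        if previous_total is None or total < previous_total:
            return None
        return (total - previous_total) / (now - previous_time)

    def adjust_sampling_rate(self, current_load):
        """Adjust the default probabilistic rate based on load."""
        with open(self.strategies_file, "r") as f:
            strategies = json.load(f)

        # The load is measured after sampling, so lowering the rate also lowers
        # the measured load; expect some oscillation around the thresholds.
        if current_load > 10000:  # High load
            rate = 0.01
        elif current_load > 1000:  # Medium load
            rate = 0.05
        else:  # Low load
            rate = 0.1
        strategies["default_strategy"]["param"] = rate

        # Write to a temp file and rename so the collector never reads a partial file
        directory = os.path.dirname(self.strategies_file)
        with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
            json.dump(strategies, tmp, indent=2)
        os.chmod(tmp.name, 0o644)  # temp files default to 0600; keep it readable
        os.replace(tmp.name, self.strategies_file)

        logger.info(f"Load {current_load:.1f} traces/s, default rate set to {rate}")


def main():
    adjuster = SamplingAdjuster(
        metrics_url="http://localhost:14269/metrics",  # collector admin port
        strategies_file="/etc/jaeger/production_sampling.json",
    )

    while True:
        load = adjuster.get_current_load()
        if load is not None:
            adjuster.adjust_sampling_rate(load)
        time.sleep(300)  # Check every 5 minutes


if __name__ == "__main__":
    main()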

sudo chmod +x /usr/local/bin/adjust-sampling.py

Set up sampling strategy validation

Create a validation script to ensure sampling configurations are working correctly.

#!/bin/bash

JAEGER_COLLECTOR="http://localhost:14268"
JAEGER_QUERY="http://localhost:16686"
SERVICE="${1:-frontend-service}"   # both endpoints require a service name

echo "Validating Jaeger sampling configuration for ${SERVICE}..."

# Test sampling endpoint (one request: body to a temp file, status code to stdout)
echo "1. Testing sampling endpoint:"
BODY_FILE=$(mktemp)
trap 'rm -f "$BODY_FILE"' EXIT
HTTP_CODE=$(curl -s -o "$BODY_FILE" -w "%{http_code}" "$JAEGER_COLLECTOR/api/sampling?service=$SERVICE")

if [ "$HTTP_CODE" = "200" ]; then
    echo "✓ Sampling endpoint accessible"
else
    echo "✗ Sampling endpoint failed (HTTP $HTTP_CODE)"
    exit 1
fi

# Validate JSON structure
printf '\n2. Validating sampling strategy JSON:\n'
if jq . "$BODY_FILE" > /dev/null 2>&1; then
    echo "✓ Valid JSON structure"
else
    echo "✗ Invalid JSON structure"
    exit 1
fi

# Check for required fields. The endpoint returns the strategy resolved for one
# service, not the contents of the strategies file, so look for a strategy body.
printf '\n3. Checking required fields:\n'
if jq -e 'has("probabilisticSampling") or has("rateLimitingSampling") or has("operationSampling")' "$BODY_FILE" > /dev/null 2>&1; then
    echo "✓ Strategy returned for $SERVICE"
else
    echo "✗ Response contains no sampling strategy"
fi

# Test trace collection
printf '\n4. Testing trace collection:\n'
TRACE_COUNT=$(curl -s "$JAEGER_QUERY/api/traces?service=$SERVICE&limit=1" | jq -r '.data | length' 2>/dev/null)
if [ "${TRACE_COUNT:-0}" -gt 0 ] 2>/dev/null; then
    echo "✓ Traces are being collected"
else
    echo "! No recent traces found (this may be normal)"
fi

printf '\nSampling validation complete.\n'

sudo chmod +x /usr/local/bin/validate-sampling.sh

Configure per-service sampling policies

Create service-tier based sampling

Implement different sampling rates based on service criticality and business importance. Each entry carries a single strategy type, so the tiers below are expressed as probabilities; use a ratelimiting entry for any service that needs a hard traces-per-second ceiling instead.

{
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.1
  },
  "service_strategies": [
    {
      "service": "tier1-payment-gateway",
      "type": "probabilistic",
      "param": 0.8,
      "operation_strategies": [
        {
          "operation": "process_payment",
          "type": "probabilistic",
          "param": 1.0
        },
        {
          "operation": "refund_payment",
          "type": "probabilistic",
          "param": 1.0
        }
      ]
    },
    {
      "service": "tier2-user-service",
      "type": "probabilistic",
      "param": 0.3
    },
    {
      "service": "tier3-analytics",
      "type": "probabilistic",
      "param": 0.05
    },
    {
      "service": "tier4-background-jobs",
      "type": "probabilistic",
      "param": 0.01
    }
  ]
}
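
With more than a handful of services, maintaining tiers by hand becomes error-prone. One option is to generate the file from a service-to-tier mapping, as in the Python sketch below. The tier rates repeat the values used in this tutorial; `TIER_RATES` and `build_strategies` are hypothetical names, and the output assumes the `service_strategies` layout of Jaeger's file-based strategies.

```python
import json

# Sampling probability per tier, taken from the tiers in this tutorial
TIER_RATES = {1: 0.8, 2: 0.3, 3: 0.05, 4: 0.01}


def build_strategies(service_tiers: dict, default_rate: float = 0.1,
                     always_sample: dict = None) -> dict:
    """Build a strategies document from {service: tier}.

    always_sample maps a service to operations that should be sampled at 1.0.
    """
    always_sample = always_sample or {}
    entries = []
    for service in sorted(service_tiers):
        entry = {
            "service": service,
            "type": "probabilistic",
            "param": TIER_RATES[service_tiers[service]],
        }
        operations = always_sample.get(service, [])
        if operations:
            entry["operation_strategies"] = [
                {"operation": op, "type": "probabilistic", "param": 1.0}
                for op in operations
            ]
        entries.append(entry)
    return {
        "default_strategy": {"type": "probabilistic", "param": default_rate},
        "service_strategies": entries,
    }


doc = build_strategies(
    {"tier1-payment-gateway": 1, "tier4-background-jobs": 4},
    always_sample={"tier1-payment-gateway": ["process_payment", "refund_payment"]},
)
print(json.dumps(doc, indent=2))
```

Keeping the tier mapping in version control and generating the JSON from it makes rate changes reviewable and avoids typos in hand-edited keys.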

Set up error-based sampling boost

Configure higher sampling rates for services and operations that are known to be failing, to improve debugging. These strategies are head-based: the sampling decision is made when a trace starts, before any error has occurred, so the file cannot select traces by outcome. Service and operation names are also matched exactly, so wildcard entries such as "*error*" have no effect. List the affected services and operations by name instead, and lower the rates again once the incident is resolved.

{
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.1
  },
  "service_strategies": [
    {
      "service": "error-prone-service",
      "type": "probabilistic",
      "param": 0.5,
      "operation_strategies": [
        {
          "operation": "failing_endpoint",
          "type": "probabilistic",
          "param": 1.0
        }
      ]
    }
  ]
}

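Set up remote sampling with Jaeger Collector

Configure collector for high availability

Set up multiple Jaeger Collectors behind a load balancer for sampling strategy distribution. Deploy the same configuration and the same strategies file to every collector so that each one serves identical strategies.

sampling:
  strategies-file: /etc/jaeger/production_sampling.json
  strategies-reload-interval: 30s

# Keys mirror the collector's CLI flags, e.g. --collector.http-server.host-port
collector:
  http-server:
    host-port: 0.0.0.0:14268
  grpc-server:
    host-port: 0.0.0.0:14250

# Jaeger v1 selects the storage backend with the SPAN_STORAGE_TYPE=elasticsearch
# environment variable; the keys below mirror the --es.* flags.
es:
  server-urls: http://elasticsearch-1:9200,http://elasticsearch-2:9200
  index-prefix: jaeger

# NOTE: these batch settings resemble OpenTelemetry Collector options; a Jaeger v1
# collector may not recognize them. Verify against the flags of your version.
processors:
  batch:
    timeout: 1s
    send-batch-size: 2048
    send-batch-max-size: 4096

# NOTE: Prometheus metrics storage is typically configured on jaeger-query
# (METRICS_STORAGE_TYPE=prometheus) rather than on the collector.
metrics-storage:
  type: prometheus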

Create sampling strategy hot reload

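Implement a system to update sampling strategies without restarting the collector. With --sampling.strategies-reload-interval set, the collector re-reads the strategies file on that interval, so the script only needs to validate a candidate file, install it atomically, and wait for the next reload. Do not send SIGHUP for this: a Go process that does not handle SIGHUP exits when it receives one, and the Jaeger collector does not document a SIGHUP reload.

#!/bin/bash
# Usage: reload-sampling.sh <new-strategies-file> [service]

NEW_FILE="$1"
SERVICE="${2:-frontend-service}"   # service used for the post-reload check
STRATEGIES_FILE="/etc/jaeger/production_sampling.json"
BACKUP_DIR="/var/backups/jaeger"
RELOAD_INTERVAL=60   # seconds; must match --sampling.strategies-reload-interval

if [ -z "$NEW_FILE" ]; then
    echo "Usage: $0 <new-strategies-file> [service]"
    exit 1
fi

# Validate new configuration before touching the live file
echo "Validating new sampling configuration..."
if ! jq . "$NEW_FILE" > /dev/null 2>&1; then
    echo "Error: Invalid JSON in $NEW_FILE"
    exit 1
fi

# Create backup of the current file
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
sudo mkdir -p "$BACKUP_DIR"
sudo cp "$STRATEGIES_FILE" "$BACKUP_DIR/sampling_strategies_$TIMESTAMP.json"

# Install atomically: copy next to the target, then rename over it
sudo cp "$NEW_FILE" "$STRATEGIES_FILE.tmp"
sudo mv "$STRATEGIES_FILE.tmp" "$STRATEGIES_FILE"

# The collector polls the file; no signal or restart is needed
echo "Waiting ${RELOAD_INTERVAL}s for the collector to pick up the change..."
sleep "$RELOAD_INTERVAL"

# Verify reload (capture the body first so a failed request is not masked by jq)
echo "Verifying configuration reload..."
if RESPONSE=$(curl -sf "http://localhost:14268/api/sampling?service=$SERVICE") \
    && echo "$RESPONSE" | jq -e . > /dev/null 2>&1; then
    echo "✓ Collector is serving sampling strategies"
else
    echo "✗ Failed to fetch sampling strategies"
    exit 1
fi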

sudo chmod +x /usr/local/bin/reload-sampling.sh
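
If strategies are generated by a program rather than edited by hand, the same validate-then-swap pattern can be applied in Python. The sketch below checks the strategy types and writes the file with an atomic rename, so a collector polling the file never sees a half-written document. `validate` and `install_strategies` are hypothetical helpers, and the allowed types assume the file-based format accepts only probabilistic and ratelimiting.

```python
import json
import os
import tempfile

ALLOWED_TYPES = {"probabilistic", "ratelimiting"}


def validate(strategies: dict) -> None:
    """Raise ValueError if any entry has an unsupported type or an invalid param."""
    entries = [strategies.get("default_strategy")] + list(strategies.get("service_strategies", []))
    for entry in entries:
        if not entry:
            continue
        if entry.get("type") not in ALLOWED_TYPES:
            raise ValueError(f"unsupported strategy type: {entry.get('type')!r}")
        param = entry.get("param")
        if not isinstance(param, (int, float)) or param < 0:
            raise ValueError(f"invalid param: {param!r}")
        if entry["type"] == "probabilistic" and param > 1:
            raise ValueError(f"probability above 1: {param!r}")


def install_strategies(strategies: dict, path: str) -> None:
    """Validate, write to a temp file in the same directory, then rename over path."""
    validate(strategies)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(strategies, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; keep the file readable
        os.replace(tmp_path, path)  # atomic on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The temp file is created in the target directory on purpose: a rename is only atomic when source and destination are on the same filesystem.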

Monitor and optimize sampling performance

Set up Prometheus metrics collection

Configure Prometheus to scrape Jaeger metrics for sampling analysis. This helps you monitor sampling effectiveness and storage impact.

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'jaeger-collector'
    static_configs:
      - targets: ['localhost:14269']
    scrape_interval: 10s
    metrics_path: /metrics
    
  - job_name: 'jaeger-agent'
    static_configs:
      - targets: ['localhost:14271']
    scrape_interval: 30s
    
  - job_name: 'jaeger-query'
    static_configs:
      - targets: ['localhost:16687']
    scrape_interval: 30s

Create sampling performance dashboard

Set up a Grafana dashboard to visualize sampling metrics and trace volumes.
