Configure OpenTelemetry sampling strategies for high-traffic applications

Intermediate · 25 min · Apr 07, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Learn how to implement probabilistic, tail-based, and adaptive sampling strategies in OpenTelemetry to optimize distributed tracing performance and reduce storage costs in high-traffic production environments.

Prerequisites

  • Root or sudo access
  • At least 2GB RAM available
  • Basic understanding of distributed tracing concepts

What this solves

OpenTelemetry sampling strategies help you control the volume of trace data collected from your applications, reducing storage costs and performance overhead while preserving observability insights. This tutorial shows you how to configure different sampling strategies, including probabilistic, tail-based, and adaptive sampling, for high-traffic applications that generate millions of traces daily.

Step-by-step configuration

Install OpenTelemetry Collector

Download and install the OpenTelemetry Collector Contrib distribution, which will handle trace sampling and forwarding. The contrib build is required here because the sampling components used below (probabilistic_sampler, tail_sampling, groupbytrace, and the jaeger receiver) ship in contrib, not in the core collector.

curl -LO https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.91.0/otelcol-contrib_0.91.0_linux_amd64.tar.gz
tar -xzf otelcol-contrib_0.91.0_linux_amd64.tar.gz
sudo mv otelcol-contrib /usr/local/bin/
sudo chmod +x /usr/local/bin/otelcol-contrib
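
Confirm the binary is installed and on your PATH; this prints the collector version and build information.

otelcol-contrib --version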

Create OpenTelemetry user and directories

Create a dedicated user for running the collector and set up required directories with proper permissions.

sudo useradd --system --shell /bin/false otel
sudo mkdir -p /etc/otelcol /var/log/otelcol /var/lib/otelcol
sudo chown -R otel:otel /etc/otelcol /var/log/otelcol /var/lib/otelcol
sudo chmod 755 /etc/otelcol /var/log/otelcol /var/lib/otelcol

Configure probabilistic sampling

Set up probabilistic sampling, which keeps a fixed percentage of traces chosen by hashing the trace ID, so every collector instance makes the same decision for a given trace. This is ideal for simple, consistent sampling across all services.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268

processors:
  probabilistic_sampler:
    sampling_percentage: 10.0
  batch:
    timeout: 1s
    send_batch_size: 1024
    send_batch_max_size: 2048
  memory_limiter:
    limit_mib: 512
    spike_limit_mib: 128
    check_interval: 5s

exporters:
  otlp/jaeger:
    # Jaeger accepts OTLP natively; the dedicated jaeger exporter was removed
    # from collector releases in v0.86.0. Point this at your Jaeger host's
    # OTLP gRPC port (it cannot share 4317 with this collector on one host).
    endpoint: jaeger-host:4317
    tls:
      insecure: true
  logging:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp, jaeger]
      processors: [memory_limiter, probabilistic_sampler, batch]
      exporters: [otlp/jaeger, logging]
  telemetry:
    logs:
      level: "info"
      development: false
      sampling:
        initial: 5
        thereafter: 200
      output_paths:
        - "/var/log/otelcol/collector.log"
      error_output_paths:
        - "/var/log/otelcol/collector-error.log"

Configure tail-based sampling

Implement tail-based sampling for more intelligent trace selection based on errors, latency, and other criteria.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  tail_sampling:
    decision_wait: 30s
    num_traces: 100000
    expected_new_traces_per_sec: 1000
    policies:
      - name: errors_policy
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: latency_policy
        type: latency
        latency:
          threshold_ms: 5000
      - name: probabilistic_policy
        type: probabilistic
        probabilistic:
          sampling_percentage: 5.0
      - name: rate_limiting_policy
        type: rate_limiting
        rate_limiting:
          spans_per_second: 100
  batch:
    timeout: 1s
    send_batch_size: 1024
    send_batch_max_size: 2048
  memory_limiter:
    limit_mib: 1024
    spike_limit_mib: 256
    check_interval: 5s

exporters:
  otlp/jaeger:
    # Jaeger's OTLP gRPC endpoint; the dedicated jaeger exporter no longer exists
    endpoint: jaeger-host:4317
    tls:
      insecure: true
  logging:
    verbosity: normal

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp/jaeger, logging]
  telemetry:
    logs:
      level: "info"
      output_paths:
        - "/var/log/otelcol/collector.log"

Configure layered policies for adaptive-style sampling

Approximate adaptive sampling by layering tail-sampling policies: errors and slow traces are always kept, named critical services get guaranteed coverage, a rate limit caps span throughput, and a low probabilistic rate provides a baseline. The collector has no fully adaptive sampler built in, so truly dynamic rates require an external control loop (or a mechanism such as Jaeger remote sampling) that adjusts this configuration.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  groupbytrace:
    wait_duration: 10s
    num_traces: 100000
    num_workers: 4
  tail_sampling:
    decision_wait: 30s
    num_traces: 100000
    expected_new_traces_per_sec: 2000
    policies:
      - name: always_sample_errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: high_latency_traces
        type: latency
        latency:
          threshold_ms: 2000
      - name: service_name_policy
        type: string_attribute
        string_attribute:
          key: service.name
          values: ["critical-service", "payment-service"]
          enabled_regex_matching: false
          invert_match: false
      - name: adaptive_rate_policy
        type: rate_limiting
        rate_limiting:
          spans_per_second: 200
      - name: probabilistic_fallback
        type: probabilistic
        probabilistic:
          sampling_percentage: 1.0
  batch:
    timeout: 1s
    send_batch_size: 2048
    send_batch_max_size: 4096
  memory_limiter:
    limit_mib: 2048
    spike_limit_mib: 512
    check_interval: 5s

exporters:
  otlp/jaeger:
    # Jaeger's OTLP gRPC endpoint; collector self-metrics for Prometheus are
    # already exposed via service::telemetry::metrics on :8888 below
    endpoint: jaeger-host:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, groupbytrace, tail_sampling, batch]
      exporters: [otlp/jaeger]
  telemetry:
    logs:
      level: "info"
    metrics:
      level: "detailed"
      address: "0.0.0.0:8888"

Create systemd service file

Configure the OpenTelemetry Collector as a systemd service with hardening options. Save the unit below as /etc/systemd/system/otelcol.service.

[Unit]
Description=OpenTelemetry Collector
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=otel
Group=otel
ExecStart=/usr/local/bin/otelcol-contrib --config=/etc/otelcol/otelcol-adaptive.yaml
Restart=always
RestartSec=10
Environment=OTEL_LOG_LEVEL=info
WorkingDirectory=/var/lib/otelcol
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/log/otelcol /var/lib/otelcol
NoNewPrivileges=true
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
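
Assuming the unit was saved as /etc/systemd/system/otelcol.service, lint it before enabling:

sudo systemd-analyze verify /etc/systemd/system/otelcol.service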

Configure log rotation

Set up log rotation to prevent collector logs from consuming too much disk space.

/var/log/otelcol/*.log {
    daily
    missingok
    rotate 30
    compress
    delaycompress
    notifempty
    create 644 otel otel
    # the collector does not reopen its log files on a signal, so copy the
    # file and truncate it in place instead of reloading the service
    copytruncate
}
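
Save the snippet as /etc/logrotate.d/otelcol, then dry-run it to confirm the rules parse without rotating anything:

sudo logrotate -d /etc/logrotate.d/otelcol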

Start and enable the service

Enable and start the OpenTelemetry Collector service with your chosen sampling configuration.

sudo systemctl daemon-reload
sudo systemctl enable otelcol
sudo systemctl start otelcol
sudo systemctl status otelcol

Configure application instrumentation

Update your application to send traces to the OpenTelemetry Collector with appropriate service name and attributes.

export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_SERVICE_NAME="your-service-name"
export OTEL_RESOURCE_ATTRIBUTES="service.version=1.0.0,deployment.environment=production"
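
If you also want head sampling in the SDK (decided at trace start, before spans ever leave the application), the standard OpenTelemetry environment variables configure it. Note that the effective rate multiplies with the collector's probabilistic_sampler percentage.

export OTEL_TRACES_SAMPLER="parentbased_traceidratio"
export OTEL_TRACES_SAMPLER_ARG="0.25"   # sample 25% of new root traces; children follow the parent's decision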

Set up monitoring and alerting

Configure Prometheus to scrape the collector's self-telemetry metrics (exposed on port 8888) so you can monitor sampling effectiveness. Add this job under scrape_configs in prometheus.yml.

  - job_name: 'otelcol'
    static_configs:
      - targets: ['localhost:8888']
    scrape_interval: 30s
    metrics_path: /metrics
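
An alert on those metrics catches a collector that is silently shedding data. This is a sketch: the dropped-spans counter name varies across collector versions, so confirm the exact name in your /metrics output first.

groups:
  - name: otelcol
    rules:
      - alert: OtelColDroppingSpans
        # metric name is an assumption; check /metrics for your version
        expr: rate(otelcol_processor_dropped_spans_total[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: OpenTelemetry Collector is dropping spans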

Understanding OpenTelemetry sampling concepts

OpenTelemetry supports multiple sampling strategies, each with specific use cases. Probabilistic (head) sampling keeps a fixed percentage of traces, with the decision made when the trace starts; deterministic variants derive that decision from the trace ID, so every component independently agrees on which traces to keep. Tail-based sampling analyzes complete traces before deciding whether to keep them, allowing more intelligent decisions based on errors, latency, or custom attributes.

The sampling decision propagates through your distributed system via trace context headers. When a service makes a sampling decision, downstream services inherit that decision, ensuring complete traces are either fully sampled or fully dropped. This prevents incomplete traces that would be difficult to analyze.
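
Concretely, the sampling decision travels in the W3C traceparent header; the final trace-flags byte is 01 when the upstream service sampled the trace (the IDs below are the example values from the W3C spec).

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
#            version-trace_id-parent_span_id-trace_flags (01 = sampled)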

Sampling Type | Use Case                                  | Performance Impact          | Decision Point
Probabilistic | Consistent percentage across all services | Low CPU, immediate decision | At trace start
Tail-based    | Sample based on trace characteristics     | Higher memory usage         | After trace completion
Adaptive      | Dynamic adjustment based on traffic       | Moderate CPU overhead       | Real-time adjustment

Monitor and optimize sampling performance

Use the collector's built-in metrics to monitor sampling effectiveness. Exact metric names vary by collector version, so inspect the /metrics output and look for the otelcol_processor_* counters covering sampled spans, dropped spans, and tail-sampling policy decisions. These show how much data you are shedding and which policies are most active.

Monitor memory usage closely when using tail-based sampling, as it buffers traces until sampling decisions are made. Adjust the num_traces and decision_wait parameters based on your traffic patterns and available resources. For applications with very high trace volume, consider running multiple collector instances behind a trace-aware load balancer, as sketched below.
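
Contrib's loadbalancing exporter can run in a thin front-tier collector and route spans by trace ID, so every span of a trace lands on the same tail-sampling instance. A minimal sketch with placeholder hostnames:

exporters:
  loadbalancing:
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      static:
        hostnames:
          - collector-1:4317
          - collector-2:4317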

Note: Tail-based sampling requires complete traces to make decisions, so ensure your decision_wait parameter is longer than your longest expected trace duration.

Verify your setup

sudo systemctl status otelcol
curl -s http://localhost:8888/metrics | grep sampling
journalctl -u otelcol -f --lines=20

Check that your application is sending traces and that sampling decisions are being logged:

tail -f /var/log/otelcol/collector.log | grep -E "(sampled|dropped)"
curl -s http://localhost:8888/metrics | grep -E "(sampled_spans|dropped_spans)"

Common issues

Symptom | Cause | Fix
High memory usage | Too many traces buffered for tail sampling | Reduce the num_traces or decision_wait parameters
Incomplete traces in backend | decision_wait too short for long traces | Increase decision_wait to cover your longest trace duration
No sampling decisions logged | Wrong processor order in pipeline | Place sampling processors before the batch processor
Service won't start | Configuration syntax error | Run otelcol-contrib validate --config=/etc/otelcol/otelcol-adaptive.yaml
Traces not being forwarded | Exporter connectivity issues | Check network connectivity and the exporter endpoint configuration
