Learn how to implement probabilistic, deterministic, and adaptive sampling strategies in OpenTelemetry to optimize distributed tracing performance and reduce storage costs in high-traffic production environments.
Prerequisites
- Root or sudo access
- At least 2GB RAM available
- Basic understanding of distributed tracing concepts
What this solves
OpenTelemetry sampling strategies help you control the volume of trace data collected from your applications, reducing storage costs and performance overhead while maintaining observability insights. This tutorial shows you how to configure different sampling strategies including probabilistic, deterministic, and adaptive sampling for high-traffic applications that generate millions of traces daily.
Step-by-step configuration
Install OpenTelemetry Collector
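Download and install the OpenTelemetry Collector, which handles trace sampling and forwarding. The commands below fetch the contrib distribution and install it as otelcol, because the tail_sampling and groupbytrace processors configured later in this tutorial are not part of the core otelcol build.
curl -fLO https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.91.0/otelcol-contrib_0.91.0_linux_amd64.tar.gz
tar -xzf otelcol-contrib_0.91.0_linux_amd64.tar.gz
sudo mv otelcol-contrib /usr/local/bin/otelcol
sudo chmod +x /usr/local/bin/otelcol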
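To install a different version or CPU architecture, the download URL can be assembled from its parts. This is a minimal sketch that assumes release archives keep the naming pattern of the URL above; the function names otel_arch and otel_release_url are illustrative, not part of any OpenTelemetry tooling.

```shell
# Map `uname -m` output to the architecture suffix used in release file names.
otel_arch() {
  case "$1" in
    x86_64)        echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Build the download URL for a distribution name, version, and architecture.
otel_release_url() {
  local dist=$1 version=$2 arch=$3
  echo "https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v${version}/${dist}_${version}_linux_${arch}.tar.gz"
}

# Example: URL for this machine
# otel_release_url otelcol 0.91.0 "$(otel_arch "$(uname -m)")"
```

Passing the distribution name as an argument keeps the helper usable for both the core and the contrib archives.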
Create OpenTelemetry user and directories
Create a dedicated user for running the collector and set up required directories with proper permissions.
sudo useradd --system --shell /bin/false otel
sudo mkdir -p /etc/otelcol /var/log/otelcol /var/lib/otelcol
sudo chown -R otel:otel /etc/otelcol /var/log/otelcol /var/lib/otelcol
sudo chmod 755 /etc/otelcol /var/log/otelcol /var/lib/otelcol
Configure probabilistic sampling
Set up probabilistic sampling which randomly samples a percentage of traces. This is ideal for consistent sampling across all services.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
processors:
  probabilistic_sampler:
    sampling_percentage: 10.0
  batch:
    timeout: 1s
    send_batch_size: 1024
    send_batch_max_size: 2048
  memory_limiter:
    limit_mib: 512
    spike_limit_mib: 128
    check_interval: 5s
exporters:
  # Jaeger ingests OTLP natively, and the dedicated jaeger exporter is no longer
  # shipped in recent collector releases. jaeger-backend:4317 is a placeholder for
  # your Jaeger OTLP gRPC address; it must not be a port this collector listens on,
  # or the collector would export traces back to itself.
  otlp/jaeger:
    endpoint: jaeger-backend:4317
    tls:
      insecure: true
  debug:
    verbosity: detailed
service:
  pipelines:
    traces:
      receivers: [otlp, jaeger]
      processors: [memory_limiter, probabilistic_sampler, batch]
      exporters: [otlp/jaeger, debug]
  telemetry:
    logs:
      level: "info"
      development: false
      sampling:
        initial: 5
        thereafter: 200
      output_paths:
        - "/var/log/otelcol/collector.log"
      error_output_paths:
        - "/var/log/otelcol/collector-error.log"
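A percentage-based sampler can make its decision from the trace ID alone, which is why every span of a trace receives the same verdict. The sketch below illustrates the idea with a CRC32 from cksum; this is an assumption chosen for illustration and is not the hash the collector itself uses. The function name keep_trace is hypothetical.

```shell
# keep_trace TRACE_ID PERCENT -> succeeds when the trace falls inside the sampled share
keep_trace() {
  local trace_id=$1 percent=$2 bucket threshold
  # Stable bucket in [0, 9999] derived from the trace ID (illustrative hash only)
  bucket=$(( $(printf '%s' "$trace_id" | cksum | cut -d' ' -f1) % 10000 ))
  # Convert the percentage to hundredths of a percent so 10.0 becomes 1000
  threshold=$(awk -v p="$percent" 'BEGIN { printf "%d", p * 100 + 0.5 }')
  [ "$bucket" -lt "$threshold" ]
}

# Example: the same ID always yields the same decision
# keep_trace 4bf92f3577b34da6a3ce929d0e0e4736 10.0 && echo keep || echo drop
```

Because the bucket depends only on the ID, two collectors running the same function agree on which traces to keep without coordinating.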
Configure tail-based sampling
Implement tail-based sampling for more intelligent trace selection based on errors, latency, and other criteria.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  tail_sampling:
    decision_wait: 30s
    num_traces: 100000
    expected_new_traces_per_sec: 1000
    policies:
      - name: errors_policy
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: latency_policy
        type: latency
        latency:
          threshold_ms: 5000
      - name: probabilistic_policy
        type: probabilistic
        probabilistic:
          sampling_percentage: 5.0
      - name: rate_limiting_policy
        type: rate_limiting
        rate_limiting:
          spans_per_second: 100
  batch:
    timeout: 1s
    send_batch_size: 1024
    send_batch_max_size: 2048
  memory_limiter:
    limit_mib: 1024
    spike_limit_mib: 256
    check_interval: 5s
exporters:
  # jaeger-backend:4317 is a placeholder for your Jaeger OTLP gRPC address.
  # The dedicated jaeger exporter is no longer shipped in recent collector releases.
  otlp/jaeger:
    endpoint: jaeger-backend:4317
    tls:
      insecure: true
  debug:
    verbosity: normal
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp/jaeger, debug]
  telemetry:
    logs:
      level: "info"
      output_paths:
        - "/var/log/otelcol/collector.log"
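The tail sampler holds every trace for decision_wait before deciding, so the number of traces in memory at steady state is roughly the arrival rate multiplied by the wait. The sketch below applies that rule of thumb to check num_traces; the function names are hypothetical and the estimate ignores bursts.

```shell
# traces_in_flight RATE_PER_SEC DECISION_WAIT_SEC -> traces buffered at steady state
traces_in_flight() {
  echo $(( $1 * $2 ))
}

# check_num_traces NUM_TRACES RATE_PER_SEC DECISION_WAIT_SEC
# Succeeds when num_traces leaves room for every trace still awaiting a decision.
check_num_traces() {
  local needed
  needed=$(traces_in_flight "$2" "$3")
  if [ "$1" -ge "$needed" ]; then
    echo "ok: num_traces=$1 covers ~$needed traces in flight"
  else
    echo "too small: num_traces=$1, need at least ~$needed" >&2
    return 1
  fi
}

# Values from the configuration above: 1000 traces/s held for 30s
check_num_traces 100000 1000 30
```

With the values above the buffer needs about 30000 slots, so num_traces: 100000 leaves headroom for traffic spikes.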
Configure adaptive sampling with remote configuration
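This configuration approximates adaptive sampling with layered tail-sampling policies rather than rates that retune themselves: errors, slow traces, and critical services are always kept, a rate limit caps throughput during traffic spikes, and a low probabilistic rate covers everything else. Save it as /etc/otelcol/otelcol-adaptive.yaml, the path the systemd unit in the next step loads.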
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  groupbytrace:
    wait_duration: 10s
    num_traces: 100000
    num_workers: 4
  tail_sampling:
    decision_wait: 30s
    num_traces: 100000
    expected_new_traces_per_sec: 2000
    policies:
      - name: always_sample_errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: high_latency_traces
        type: latency
        latency:
          threshold_ms: 2000
      - name: service_name_policy
        type: string_attribute
        string_attribute:
          key: service.name
          values: ["critical-service", "payment-service"]
          enabled_regex_matching: false
          invert_match: false
      - name: adaptive_rate_policy
        type: rate_limiting
        rate_limiting:
          spans_per_second: 200
      - name: probabilistic_fallback
        type: probabilistic
        probabilistic:
          sampling_percentage: 1.0
  batch:
    timeout: 1s
    send_batch_size: 2048
    send_batch_max_size: 4096
  memory_limiter:
    limit_mib: 2048
    spike_limit_mib: 512
    check_interval: 5s
exporters:
  # jaeger-backend:4317 is a placeholder for your Jaeger OTLP gRPC address.
  # The dedicated jaeger exporter is no longer shipped in recent collector releases.
  otlp/jaeger:
    endpoint: jaeger-backend:4317
    tls:
      insecure: true
  # Defined but not referenced by any pipeline below.
  prometheus:
    endpoint: "0.0.0.0:8889"
    const_labels:
      collector: "otelcol-adaptive"
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, groupbytrace, tail_sampling, batch]
      exporters: [otlp/jaeger]
  telemetry:
    logs:
      level: "info"
    metrics:
      level: "detailed"
      address: "0.0.0.0:8888"
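The core calculation behind an adaptive rate is simple: choose the percentage that brings the observed traffic down to a target throughput. The sketch below shows that arithmetic only; it is not a collector feature, and the function name and the idea of feeding its result into sampling_percentage are assumptions for illustration.

```shell
# adaptive_percentage TARGET_TPS OBSERVED_TPS -> sampling percentage in [0, 100]
adaptive_percentage() {
  awk -v target="$1" -v observed="$2" 'BEGIN {
    # With no traffic there is nothing to shed, so keep everything
    if (observed <= 0) { print "100.00"; exit }
    pct = target / observed * 100
    if (pct > 100) pct = 100
    printf "%.2f\n", pct
  }'
}

# Example: keep about 50 traces/s out of 2000 traces/s observed
adaptive_percentage 50 2000
```

A scheduled job could recompute this value from recent traffic and rewrite the probabilistic policy, at the cost of restarting or reloading the collector on each change.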
Create systemd service file
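Configure the OpenTelemetry Collector as a systemd service with hardened security settings. Save the unit as /etc/systemd/system/otelcol.service and point --config at the configuration file you chose; the example loads the adaptive configuration.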
[Unit]
Description=OpenTelemetry Collector
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=otel
Group=otel
ExecStart=/usr/local/bin/otelcol --config=/etc/otelcol/otelcol-adaptive.yaml
Restart=always
RestartSec=10
Environment=OTEL_LOG_LEVEL=info
WorkingDirectory=/var/lib/otelcol
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/log/otelcol /var/lib/otelcol
NoNewPrivileges=true
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Configure log rotation
Set up log rotation to prevent collector logs from consuming too much disk space.
/var/log/otelcol/*.log {
    daily
    missingok
    rotate 30
    compress
    delaycompress
    notifempty
    # Truncate the live file in place. The systemd unit defines no reload action,
    # so "systemctl reload otelcol" in a postrotate hook would fail.
    copytruncate
}
Start and enable the service
Enable and start the OpenTelemetry Collector service with your chosen sampling configuration.
sudo systemctl daemon-reload
sudo systemctl enable otelcol
sudo systemctl start otelcol
sudo systemctl status otelcol
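The collector can take a moment to bind its ports after systemctl start returns, so scripted setups usually poll before sending traffic. A minimal sketch of such a poll follows; wait_for is a hypothetical helper, not a systemd command.

```shell
# wait_for ATTEMPTS COMMAND... -> retry COMMAND once per second until it succeeds
wait_for() {
  local attempts=$1 i=1
  shift
  while ! "$@"; do
    # Give up once the attempt budget is spent
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep 1
  done
}

# Example: allow the service up to 15 seconds to become active
# wait_for 15 systemctl is-active --quiet otelcol && echo "collector is up"
```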
Configure application instrumentation
Update your application to send traces to the OpenTelemetry Collector with appropriate service name and attributes.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_SERVICE_NAME="your-service-name"
export OTEL_RESOURCE_ATTRIBUTES="service.version=1.0.0,deployment.environment=production"
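Sampling can also start in the application itself, before spans reach the collector. The OpenTelemetry SDKs read the sampler from the standard OTEL_TRACES_SAMPLER variables; the 0.25 ratio below is an example value, not a recommendation.

```shell
# Head sampling in the SDK: keep about 25% of new traces and follow the
# caller's decision for requests that already carry trace context.
export OTEL_TRACES_SAMPLER="parentbased_traceidratio"
export OTEL_TRACES_SAMPLER_ARG="0.25"
```

Rates multiply across stages: an SDK ratio of 0.25 combined with a collector sampling_percentage of 10.0 keeps about 2.5% of traces overall.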
Set up monitoring and alerting
Configure Prometheus to scrape collector metrics for monitoring sampling effectiveness.
- job_name: 'otelcol'
  static_configs:
    - targets: ['localhost:8888']
  scrape_interval: 30s
  metrics_path: /metrics
Understanding OpenTelemetry sampling concepts
OpenTelemetry supports multiple sampling strategies, each with specific use cases. Probabilistic sampling keeps a fixed percentage of traces. Deterministic sampling derives the decision from the trace ID, so every component that sees the same trace reaches the same verdict. Tail-based sampling analyzes complete traces before deciding whether to keep them, which allows decisions based on errors, latency, or custom attributes.
With head sampling in the SDK, the decision propagates through your distributed system via trace context headers: the W3C traceparent header carries a sampled flag. When a service makes a sampling decision, downstream services that use a parent-based sampler inherit it, so a trace is either fully sampled or fully dropped. This prevents incomplete traces that would be difficult to analyze. Decisions made later in the collector are not propagated this way; they only affect what is exported.
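The propagated decision is visible in the last field of the W3C traceparent header, whose lowest bit is the sampled flag. The sketch below reads that flag; traceparent_sampled is a hypothetical helper that checks field lengths only, not the full header grammar.

```shell
# traceparent_sampled HEADER -> succeeds when the sampled flag is set,
# returns 1 when it is clear and 2 when the header is malformed
traceparent_sampled() {
  local version trace_id parent_id flags
  # Header layout: version-traceid-parentid-flags, all lowercase hex
  IFS=- read -r version trace_id parent_id flags <<EOF
$1
EOF
  [ ${#trace_id} -eq 32 ] && [ ${#parent_id} -eq 16 ] && [ ${#flags} -eq 2 ] || {
    echo "malformed traceparent: $1" >&2
    return 2
  }
  # Bit 0 of the flags byte is the sampled flag
  [ $(( 0x$flags & 1 )) -eq 1 ]
}

# Example
# traceparent_sampled 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01 && echo sampled
```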
| Sampling Type | Use Case | Performance Impact | Decision Point |
|---|---|---|---|
| Probabilistic | Consistent percentage across all services | Low CPU, immediate decision | At trace start |
| Tail-based | Sample based on trace characteristics | Higher memory usage | After trace completion |
| Adaptive | Dynamic adjustment based on traffic | Moderate CPU overhead | Real-time adjustment |
Monitor and optimize sampling performance
Use the collector's built-in metrics to monitor sampling effectiveness. Key metrics include otelcol_processor_sampled_spans_total, otelcol_processor_dropped_spans_total, and otelcol_processor_tail_sampling_policy_decision_total. These show how much data you are discarding and which policies are most active. Exact metric names differ between collector versions and processors, so confirm them against the output of the metrics endpoint before building dashboards or alerts.
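The share of data kept can be computed directly from the metrics endpoint. The sketch below sums a counter across its label sets and turns two totals into a percentage; the helper names are hypothetical, it assumes samples carry no trailing timestamp, and the metric names in the example are the ones listed above, which should be checked against your collector's output.

```shell
# metric_sum NAME < prometheus_text -> sum of all samples of the metric NAME
metric_sum() {
  awk -v name="$1" '
    /^#/ { next }                        # skip HELP and TYPE comments
    {
      metric = $1
      sub(/\{.*/, "", metric)            # strip the label set
      if (metric == name) total += $NF   # sample value is the last field
    }
    END { printf "%d\n", total }
  '
}

# kept_percentage SAMPLED DROPPED -> share of spans kept, in percent
kept_percentage() {
  awk -v s="$1" -v d="$2" 'BEGIN {
    if (s + d == 0) print "n/a"; else printf "%.1f\n", s / (s + d) * 100
  }'
}

# Example against a running collector:
# metrics=$(curl -s http://localhost:8888/metrics)
# sampled=$(printf '%s\n' "$metrics" | metric_sum otelcol_processor_sampled_spans_total)
# dropped=$(printf '%s\n' "$metrics" | metric_sum otelcol_processor_dropped_spans_total)
# kept_percentage "$sampled" "$dropped"
```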
Monitor memory usage closely when using tail-based sampling, as it requires buffering traces until sampling decisions are made. Adjust num_traces and decision_wait parameters based on your traffic patterns and available resources. For applications with high trace volume, consider implementing multiple collector instances with load balancing.
Note: Make sure the decision_wait parameter is longer than your longest expected trace duration.
Verify your setup
sudo systemctl status otelcol
curl -s http://localhost:8888/metrics | grep sampling
journalctl -u otelcol -f --lines=20
Check that your application is sending traces and that sampling decisions are being logged:
tail -f /var/log/otelcol/collector.log | grep -E "(sampled|dropped)"
curl -s http://localhost:8888/metrics | grep -E "(sampled_spans|dropped_spans)"
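If no application is instrumented yet, a hand-built span is enough to exercise the pipeline. The sketch below prints a minimal OTLP/JSON body with random IDs; otlp_test_payload and the smoke-test names are made up for this example, and the commented curl call assumes the OTLP/HTTP receiver on port 4318 from the configurations above.

```shell
# otlp_test_payload SERVICE_NAME -> OTLP/JSON body holding one one-second span
otlp_test_payload() {
  local trace_id span_id start end
  # 16 random bytes for the trace ID and 8 for the span ID, hex encoded
  trace_id=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
  span_id=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')
  end=$(date +%s)
  start=$((end - 1))
  cat <<EOF
{"resourceSpans":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"$1"}}]},
"scopeSpans":[{"scope":{"name":"smoke-test"},"spans":[{"traceId":"$trace_id","spanId":"$span_id",
"name":"smoke-test-span","kind":1,"startTimeUnixNano":"${start}000000000","endTimeUnixNano":"${end}000000000"}]}]}]}
EOF
}

# Send one span to the collector:
# otlp_test_payload smoke-test | curl -s -X POST -H 'Content-Type: application/json' \
#   --data-binary @- http://localhost:4318/v1/traces
```

A single trace may be discarded by the sampler, so send a few dozen before concluding that nothing reaches the backend.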
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| High memory usage | Too many traces buffered for tail sampling | Reduce num_traces or decision_wait parameters |
| Incomplete traces in backend | decision_wait too short for long traces | Increase decision_wait to match your longest trace duration |
| No sampling decisions logged | Wrong processor order in pipeline | Ensure sampling processors come before batch processor |
| Service won't start | Configuration syntax error | Run otelcol validate --config=/path/to/config to check the file |
| Traces not being forwarded | Exporter connectivity issues | Check network connectivity and exporter endpoint configuration |
Next steps
- Configure Jaeger data retention policies and automated archiving with Elasticsearch backend
- Install and configure Jaeger for distributed tracing with Elasticsearch backend
- Configure OpenTelemetry custom metrics for application monitoring
- Implement OpenTelemetry distributed context propagation across microservices
Automated install script
Run this to automate the entire setup
#!/usr/bin/env bash
set -euo pipefail
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Default configuration
OTEL_VERSION="0.91.0"
SAMPLING_TYPE="probabilistic"
SAMPLING_PERCENTAGE="10.0"
# Usage function; the optional argument is the exit status (default 1)
usage() {
    echo "Usage: $0 [OPTIONS]"
    echo "Options:"
    echo "  -t, --type        Sampling type: probabilistic or tail (default: probabilistic)"
    echo "  -p, --percentage  Sampling percentage for probabilistic sampling (default: 10.0)"
    echo "  -v, --version     OpenTelemetry Collector version (default: 0.91.0)"
    echo "  -h, --help        Show this help message"
    exit "${1:-1}"
}
# Parse command line arguments; ${2:?} stops with a message when a value is missing
while [[ $# -gt 0 ]]; do
    case $1 in
        -t|--type)
            SAMPLING_TYPE="${2:?--type requires a value}"
            shift 2
            ;;
        -p|--percentage)
            SAMPLING_PERCENTAGE="${2:?--percentage requires a value}"
            shift 2
            ;;
        -v|--version)
            OTEL_VERSION="${2:?--version requires a value}"
            shift 2
            ;;
        -h|--help)
            usage 0
            ;;
        *)
            echo -e "${RED}Unknown option: $1${NC}"
            usage
            ;;
    esac
done
# Validate sampling type
if [[ "$SAMPLING_TYPE" != "probabilistic" && "$SAMPLING_TYPE" != "tail" ]]; then
echo -e "${RED}Error: Sampling type must be 'probabilistic' or 'tail'${NC}"
exit 1
fi
# Error handling
cleanup() {
echo -e "${RED}Installation failed. Cleaning up...${NC}"
sudo rm -f /tmp/otelcol_${OTEL_VERSION}_linux_amd64.tar.gz
sudo systemctl stop otelcol 2>/dev/null || true
sudo systemctl disable otelcol 2>/dev/null || true
}
trap cleanup ERR
# Check if running as root or with sudo
if [[ $EUID -eq 0 ]]; then
echo -e "${YELLOW}Warning: Running as root${NC}"
elif ! sudo -n true 2>/dev/null; then
echo -e "${RED}Error: This script requires sudo privileges${NC}"
exit 1
fi
# Detect distribution
echo -e "${BLUE}[1/8] Detecting distribution...${NC}"
if [ -f /etc/os-release ]; then
    . /etc/os-release
    case "$ID" in
        ubuntu|debian)
            PKG_MGR="apt"
            PKG_INSTALL="apt install -y"
            PKG_UPDATE="apt update"
            ;;
        almalinux|rocky|centos|rhel|ol|fedora)
            PKG_MGR="dnf"
            PKG_INSTALL="dnf install -y"
            # "check-update || true" does not work when stored in a variable, and
            # check-update exits 100 when updates exist; makecache refreshes metadata
            PKG_UPDATE="dnf makecache"
            ;;
        amzn)
            PKG_MGR="yum"
            PKG_INSTALL="yum install -y"
            PKG_UPDATE="yum makecache"
            ;;
        *)
            echo -e "${RED}Unsupported distribution: $ID${NC}"
            exit 1
            ;;
    esac
    echo -e "${GREEN}Detected: $PRETTY_NAME${NC}"
else
    echo -e "${RED}Error: Cannot detect distribution${NC}"
    exit 1
fi
# Update package manager
echo -e "${BLUE}[2/8] Updating package manager...${NC}"
sudo $PKG_UPDATE
# Install required packages
echo -e "${BLUE}[3/8] Installing required packages...${NC}"
sudo $PKG_INSTALL curl tar
# Download and install OpenTelemetry Collector
# The contrib build is used because tail_sampling is not part of the core build
echo -e "${BLUE}[4/8] Downloading OpenTelemetry Collector v${OTEL_VERSION}...${NC}"
cd /tmp
# -f makes curl fail on HTTP errors; the archive is saved under the name the cleanup steps remove
curl -fL -o "otelcol_${OTEL_VERSION}_linux_amd64.tar.gz" "https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v${OTEL_VERSION}/otelcol-contrib_${OTEL_VERSION}_linux_amd64.tar.gz"
echo -e "${BLUE}[5/8] Installing OpenTelemetry Collector...${NC}"
tar -xzf "otelcol_${OTEL_VERSION}_linux_amd64.tar.gz"
sudo mv otelcol-contrib /usr/local/bin/otelcol
sudo chmod 755 /usr/local/bin/otelcol
sudo chown root:root /usr/local/bin/otelcol
# Create OpenTelemetry user and directories
echo -e "${BLUE}[6/8] Creating OpenTelemetry user and directories...${NC}"
sudo useradd --system --shell /bin/false --home-dir /var/lib/otelcol otel 2>/dev/null || echo "User otel already exists"
sudo mkdir -p /etc/otelcol /var/log/otelcol /var/lib/otelcol
sudo chown -R otel:otel /etc/otelcol /var/log/otelcol /var/lib/otelcol
sudo chmod 755 /etc/otelcol /var/log/otelcol /var/lib/otelcol
# Create configuration file based on sampling type
echo -e "${BLUE}[7/8] Creating OpenTelemetry configuration...${NC}"
if [[ "$SAMPLING_TYPE" == "probabilistic" ]]; then
sudo tee /etc/otelcol/config.yaml > /dev/null <<EOF
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
processors:
  probabilistic_sampler:
    sampling_percentage: ${SAMPLING_PERCENTAGE}
  batch:
    timeout: 1s
    send_batch_size: 1024
    send_batch_max_size: 2048
  memory_limiter:
    limit_mib: 512
    spike_limit_mib: 128
    check_interval: 5s
exporters:
  debug:
    verbosity: normal
service:
  pipelines:
    traces:
      receivers: [otlp, jaeger]
      processors: [memory_limiter, probabilistic_sampler, batch]
      exporters: [debug]
  telemetry:
    logs:
      level: "info"
      development: false
      output_paths:
        - "/var/log/otelcol/collector.log"
      error_output_paths:
        - "/var/log/otelcol/collector-error.log"
EOF
else
sudo tee /etc/otelcol/config.yaml > /dev/null <<EOF
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  tail_sampling:
    decision_wait: 30s
    num_traces: 100000
    expected_new_traces_per_sec: 1000
    policies:
      - name: errors_policy
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: latency_policy
        type: latency
        latency:
          threshold_ms: 5000
      - name: probabilistic_policy
        type: probabilistic
        probabilistic:
          sampling_percentage: ${SAMPLING_PERCENTAGE}
      - name: rate_limiting_policy
        type: rate_limiting
        rate_limiting:
          spans_per_second: 100
  batch:
    timeout: 1s
    send_batch_size: 1024
    send_batch_max_size: 2048
  memory_limiter:
    limit_mib: 1024
    spike_limit_mib: 256
    check_interval: 5s
exporters:
  debug:
    verbosity: normal
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [debug]
  telemetry:
    logs:
      level: "info"
      development: false
      output_paths:
        - "/var/log/otelcol/collector.log"
      error_output_paths:
        - "/var/log/otelcol/collector-error.log"
EOF
fi
sudo chown otel:otel /etc/otelcol/config.yaml
sudo chmod 644 /etc/otelcol/config.yaml
# Create systemd service
sudo tee /etc/systemd/system/otelcol.service > /dev/null <<EOF
[Unit]
Description=OpenTelemetry Collector
After=network.target
[Service]
Type=simple
User=otel
Group=otel
ExecStart=/usr/local/bin/otelcol --config=/etc/otelcol/config.yaml
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
EOF
# Configure firewall based on distribution
if command -v ufw >/dev/null 2>&1; then
sudo ufw allow 4317/tcp comment "OpenTelemetry GRPC"
sudo ufw allow 4318/tcp comment "OpenTelemetry HTTP"
elif command -v firewall-cmd >/dev/null 2>&1; then
sudo firewall-cmd --permanent --add-port=4317/tcp
sudo firewall-cmd --permanent --add-port=4318/tcp
sudo firewall-cmd --reload
fi
# Start and enable the service
echo -e "${BLUE}[8/8] Starting OpenTelemetry Collector service...${NC}"
sudo systemctl daemon-reload
sudo systemctl enable otelcol
sudo systemctl start otelcol
# Cleanup
sudo rm -f /tmp/otelcol_${OTEL_VERSION}_linux_amd64.tar.gz
# Verification
echo -e "${BLUE}Verifying installation...${NC}"
sleep 3
if sudo systemctl is-active --quiet otelcol; then
echo -e "${GREEN}✓ OpenTelemetry Collector is running${NC}"
else
echo -e "${RED}✗ OpenTelemetry Collector failed to start${NC}"
sudo systemctl status otelcol
exit 1
fi
if netstat -tuln 2>/dev/null | grep -q ":4317\|:4318" || ss -tuln 2>/dev/null | grep -q ":4317\|:4318"; then
echo -e "${GREEN}✓ OpenTelemetry Collector is listening on configured ports${NC}"
else
echo -e "${YELLOW}⚠ Warning: Cannot verify port status${NC}"
fi
echo -e "${GREEN}OpenTelemetry Collector installation completed successfully!${NC}"
echo -e "${BLUE}Configuration:${NC}"
echo " - Type: $SAMPLING_TYPE sampling"
echo " - Percentage: $SAMPLING_PERCENTAGE%"
echo " - Config file: /etc/otelcol/config.yaml"
echo " - Log files: /var/log/otelcol/"
echo " - Service: systemctl status otelcol"
Review the script before running. Execute with: bash install.sh