Configure end-to-end distributed tracing across Node.js and Python microservices using OpenTelemetry instrumentation and Jaeger backend with Elasticsearch for centralized trace collection and analysis.
Prerequisites
- Root or sudo access
- 4GB+ RAM available
- A Linux distribution with Docker support (commands below assume Ubuntu/Debian)
- Node.js and Python development knowledge
What this solves
Distributed tracing helps you monitor request flows across multiple microservices, identify performance bottlenecks, and debug complex service interactions. This tutorial sets up OpenTelemetry instrumentation for Node.js and Python applications with Jaeger as the tracing backend, enabling you to visualize request paths, measure latencies, and troubleshoot issues in your microservice architecture.
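Conceptually, a distributed trace is a tree of timed spans, and finding a bottleneck means finding the span that contributes the most *self* time (its duration minus its children's). A minimal plain-Python illustration — the span names and timings here are invented for the example:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Span:
    name: str       # operation name, e.g. "GET /process/42"
    service: str    # which microservice emitted the span
    start_ms: float # start time relative to the trace root
    end_ms: float   # end time relative to the trace root
    parent: Optional[str] = None  # parent span name, None for the root

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms

# A toy trace: Node.js handles the request and calls the Python service
trace = [
    Span("GET /process/42", "nodejs-microservice", 0, 320),
    Span("GET /analyze/42", "python-microservice", 80, 300, parent="GET /process/42"),
    Span("detailed-analysis", "python-microservice", 120, 280, parent="GET /analyze/42"),
]

def self_time_ms(span: Span, spans: List[Span]) -> float:
    # Time spent in this span itself, excluding its direct children
    child_total = sum(s.duration_ms for s in spans if s.parent == span.name)
    return span.duration_ms - child_total

# The span with the largest self time is the likely bottleneck
bottleneck = max(trace, key=lambda s: self_time_ms(s, trace))
print(bottleneck.name)  # → detailed-analysis
```

This is exactly the kind of analysis the Jaeger UI performs visually when it renders a trace as a waterfall of spans.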
Step-by-step installation
Update system packages
Start by updating your system packages to ensure you get the latest versions of all dependencies.
sudo apt update && sudo apt upgrade -y
Install Docker and Docker Compose
Install Docker to run Jaeger and Elasticsearch containers for the tracing backend infrastructure.
sudo apt install -y apt-transport-https ca-certificates curl gnupg lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
Start Docker service
Enable and start the Docker service, then add your user to the docker group.
sudo systemctl enable --now docker
sudo usermod -aG docker $USER
newgrp docker
Create Jaeger with Elasticsearch configuration
Create the following docker-compose.yml to run Jaeger with Elasticsearch as the storage backend for better scalability and data retention.
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    container_name: jaeger-elasticsearch
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - "9200:9200"
    volumes:
      - elasticsearch_data:/usr/share/elasticsearch/data
    networks:
      - jaeger-net
  jaeger-collector:
    image: jaegertracing/jaeger-collector:1.51
    container_name: jaeger-collector
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch:9200
      - ES_NUM_SHARDS=1
      - ES_NUM_REPLICAS=0
      - COLLECTOR_OTLP_ENABLED=true  # make sure the OTLP receiver (4317/4318) is on
    ports:
      - "14269:14269"
      - "14268:14268"
      - "14250:14250"
      - "4317:4317"
      - "4318:4318"
    depends_on:
      - elasticsearch
    networks:
      - jaeger-net
  jaeger-query:
    image: jaegertracing/jaeger-query:1.51
    container_name: jaeger-query
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch:9200
    ports:
      - "16686:16686"
      - "16687:16687"
    depends_on:
      - elasticsearch
    networks:
      - jaeger-net
  jaeger-agent:
    image: jaegertracing/jaeger-agent:1.51
    container_name: jaeger-agent
    command: ["--reporter.grpc.host-port=jaeger-collector:14250"]
    ports:
      - "5775:5775/udp"
      - "6831:6831/udp"
      - "6832:6832/udp"
      - "5778:5778"
    depends_on:
      - jaeger-collector
    networks:
      - jaeger-net
volumes:
  elasticsearch_data:
networks:
  jaeger-net:
    driver: bridge
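As a quick reference, the host ports published by the Compose file map to these functions (standard Jaeger/Elasticsearch defaults, captured here as a Python lookup table for illustration):

```python
# Host ports published by the Compose file above and what each one serves
JAEGER_PORTS = {
    9200:  "Elasticsearch REST API",
    14269: "Collector admin/health endpoint",
    14268: "Collector jaeger.thrift over HTTP",
    14250: "Collector gRPC (used by jaeger-agent)",
    4317:  "OTLP over gRPC",
    4318:  "OTLP over HTTP (used by both services in this tutorial)",
    5775:  "Agent UDP, zipkin.thrift (legacy)",
    6831:  "Agent UDP, jaeger.thrift compact",
    6832:  "Agent UDP, jaeger.thrift binary",
    5778:  "Agent config/sampling HTTP",
    16686: "Jaeger query UI/API",
    16687: "Query admin/health endpoint",
}

def describe(port: int) -> str:
    return JAEGER_PORTS.get(port, "not published by this Compose file")

print(describe(4318))
```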
Create the tracing directory and start services
Create the directory structure, save the Compose file above as /opt/tracing/docker-compose.yml, and start the Jaeger infrastructure with the Elasticsearch backend.
sudo mkdir -p /opt/tracing
sudo chown $USER:$USER /opt/tracing
cd /opt/tracing
docker compose up -d
Install Node.js and npm
Install Node.js runtime and npm package manager for the Node.js microservice development.
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs
Install Python and pip
Install Python runtime and pip package manager for the Python microservice development.
sudo apt install -y python3 python3-pip python3-venv
Create Node.js microservice with OpenTelemetry
Set up a sample Node.js Express application with OpenTelemetry auto-instrumentation for distributed tracing.
mkdir -p ~/microservices/nodejs-service
cd ~/microservices/nodejs-service
npm init -y
Install Node.js OpenTelemetry dependencies
Install the required OpenTelemetry packages for automatic instrumentation and OTLP export.
npm install express axios
npm install @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-trace-otlp-http
Create OpenTelemetry configuration for Node.js
Configure OpenTelemetry initialization with automatic instrumentation and an OTLP exporter that ships spans to the Jaeger collector. Save this file as tracing.js in the service directory.
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');

// Configure the OTLP HTTP exporter to send spans to the Jaeger collector
const traceExporter = new OTLPTraceExporter({
  url: 'http://localhost:4318/v1/traces',
});

// Initialize the SDK
const sdk = new NodeSDK({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'nodejs-microservice',
    [SemanticResourceAttributes.SERVICE_VERSION]: '1.0.0',
  }),
  traceExporter,
  instrumentations: [getNodeAutoInstrumentations()],
});

// Start the SDK
sdk.start();
console.log('OpenTelemetry started successfully');

// Graceful shutdown
process.on('SIGTERM', () => {
  sdk.shutdown()
    .then(() => console.log('Tracing terminated'))
    .catch((error) => console.log('Error terminating tracing', error))
    .finally(() => process.exit(0));
});
Create Node.js application
Build an Express application that demonstrates distributed tracing with HTTP requests to other services. Save this file as app.js.
require('./tracing'); // Initialize tracing first
const express = require('express');
const axios = require('axios');
const { trace, context, SpanStatusCode } = require('@opentelemetry/api');

const app = express();
const PORT = process.env.PORT || 3000;

// Get tracer
const tracer = trace.getTracer('nodejs-microservice');

app.use(express.json());

// Health check endpoint
app.get('/health', (req, res) => {
  res.json({ status: 'healthy', service: 'nodejs-microservice' });
});

// Main endpoint that calls the Python service
app.get('/process/:id', async (req, res) => {
  const span = tracer.startSpan('process-request');
  try {
    const { id } = req.params;
    span.setAttributes({
      'request.id': id,
      'service.operation': 'process-request'
    });
    // Simulate some processing
    await new Promise(resolve => setTimeout(resolve, Math.random() * 100));
    // Call Python microservice
    const pythonResponse = await axios.get(`http://localhost:8000/analyze/${id}`, {
      headers: {
        'x-trace-id': span.spanContext().traceId
      }
    });
    const result = {
      id: id,
      timestamp: new Date().toISOString(),
      nodeData: {
        processed: true,
        processingTime: Math.random() * 100
      },
      pythonData: pythonResponse.data
    };
    span.setAttributes({
      'response.status': 'success',
      'response.size': JSON.stringify(result).length
    });
    res.json(result);
  } catch (error) {
    span.recordException(error);
    span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
    res.status(500).json({
      error: 'Processing failed',
      message: error.message
    });
  } finally {
    span.end();
  }
});

// Batch processing endpoint
app.post('/batch', async (req, res) => {
  const span = tracer.startSpan('batch-process');
  try {
    const { items } = req.body;
    span.setAttributes({
      'batch.size': items.length,
      'service.operation': 'batch-process'
    });
    const results = [];
    for (const item of items) {
      // Create a child span whose parent is the batch span
      const childSpan = tracer.startSpan(
        'process-item',
        undefined,
        trace.setSpan(context.active(), span)
      );
      childSpan.setAttributes({ 'item.id': item.id });
      // Simulate processing
      await new Promise(resolve => setTimeout(resolve, Math.random() * 50));
      results.push({
        id: item.id,
        processed: true,
        timestamp: new Date().toISOString()
      });
      childSpan.end();
    }
    res.json({ results, total: results.length });
  } catch (error) {
    span.recordException(error);
    span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
    res.status(500).json({ error: 'Batch processing failed' });
  } finally {
    span.end();
  }
});

app.listen(PORT, () => {
  console.log(`Node.js microservice running on port ${PORT}`);
});
Create Python virtual environment
Set up a Python virtual environment for the Python microservice with isolated dependencies.
mkdir -p ~/microservices/python-service
cd ~/microservices/python-service
python3 -m venv venv
source venv/bin/activate
Install Python OpenTelemetry dependencies
Install the required OpenTelemetry packages for Python along with FastAPI for the web framework.
pip install fastapi uvicorn requests
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp-proto-http
pip install opentelemetry-instrumentation-fastapi opentelemetry-instrumentation-requests opentelemetry-instrumentation-logging
Create Python OpenTelemetry configuration
Configure OpenTelemetry initialization for the Python application with automatic instrumentation. Save this file as tracing.py in the service directory.
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.resources import Resource
from opentelemetry.semconv.resource import ResourceAttributes
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.instrumentation.logging import LoggingInstrumentor
import logging

def init_tracing():
    # Create a resource describing this service
    resource = Resource(attributes={
        ResourceAttributes.SERVICE_NAME: "python-microservice",
        ResourceAttributes.SERVICE_VERSION: "1.0.0",
    })
    # Set tracer provider
    trace.set_tracer_provider(TracerProvider(resource=resource))
    # Create an OTLP exporter targeting the Jaeger collector
    otlp_exporter = OTLPSpanExporter(
        endpoint="http://localhost:4318/v1/traces",
    )
    # Batch spans before export and register the processor
    span_processor = BatchSpanProcessor(otlp_exporter)
    trace.get_tracer_provider().add_span_processor(span_processor)
    # Auto-instrument outbound HTTP requests and logging
    RequestsInstrumentor().instrument()
    LoggingInstrumentor().instrument(set_logging_format=True)
    # Configure logging
    logging.basicConfig(level=logging.INFO)
    print("OpenTelemetry initialized for Python service")

def get_tracer():
    return trace.get_tracer(__name__)
Create Python FastAPI application
Build a FastAPI application with OpenTelemetry instrumentation that processes requests and demonstrates trace correlation. Save this file as app.py.
from fastapi import FastAPI, HTTPException, Request
from pydantic import BaseModel
import asyncio
import random
import time
import logging
from typing import List
from datetime import datetime, timezone

# Import tracing configuration
from tracing import init_tracing, get_tracer
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.trace import Status, StatusCode

# Initialize tracing before the app is created
init_tracing()

# Create FastAPI app
app = FastAPI(title="Python Microservice", version="1.0.0")

# Instrument FastAPI
FastAPIInstrumentor.instrument_app(app)

# Get tracer
tracer = get_tracer()
logger = logging.getLogger(__name__)

class AnalysisResult(BaseModel):
    id: str
    analysis_type: str
    score: float
    metadata: dict
    processing_time_ms: float
    timestamp: str

@app.get("/health")
async def health_check():
    return {"status": "healthy", "service": "python-microservice"}

@app.get("/analyze/{item_id}", response_model=AnalysisResult)
async def analyze_item(item_id: str, request: Request):
    with tracer.start_as_current_span("analyze-item") as span:
        try:
            start_time = time.time()
            # Extract the upstream trace id from headers if available
            trace_id = request.headers.get("x-trace-id")
            if trace_id:
                span.set_attribute("parent.trace_id", trace_id)
            span.set_attributes({
                "item.id": item_id,
                "service.operation": "analyze-item",
                "analysis.type": "sentiment"
            })
            logger.info(f"Starting analysis for item {item_id}")
            # Simulate different types of analysis
            analysis_types = ["sentiment", "classification", "similarity", "anomaly"]
            analysis_type = random.choice(analysis_types)
            # Simulate processing time
            processing_delay = random.uniform(0.1, 0.5)
            await asyncio.sleep(processing_delay)
            # Simulate analysis score
            score = random.uniform(0.1, 0.99)
            # Create a child span for detailed analysis
            with tracer.start_as_current_span("detailed-analysis") as detail_span:
                detail_span.set_attributes({
                    "analysis.algorithm": f"{analysis_type}-v2",
                    "analysis.score": score,
                    "analysis.confidence": random.uniform(0.7, 0.95)
                })
                # Simulate detailed processing
                await asyncio.sleep(random.uniform(0.05, 0.15))
            end_time = time.time()
            processing_time = (end_time - start_time) * 1000
            result = AnalysisResult(
                id=item_id,
                analysis_type=analysis_type,
                score=score,
                metadata={
                    "algorithm_version": "2.1",
                    "confidence": random.uniform(0.7, 0.95),
                    "features_extracted": random.randint(10, 50)
                },
                processing_time_ms=processing_time,
                timestamp=datetime.now(timezone.utc).isoformat()
            )
            span.set_attributes({
                "response.status": "success",
                "processing.time_ms": processing_time,
                "analysis.score": score
            })
            span.set_status(Status(StatusCode.OK))
            logger.info(f"Completed analysis for item {item_id}")
            return result
        except Exception as e:
            span.record_exception(e)
            span.set_status(Status(StatusCode.ERROR, str(e)))
            logger.error(f"Error analyzing item {item_id}: {str(e)}")
            raise HTTPException(status_code=500, detail=f"Analysis failed: {str(e)}")

@app.post("/batch-analyze")
async def batch_analyze(items: List[str]):
    with tracer.start_as_current_span("batch-analyze") as span:
        try:
            span.set_attributes({
                "batch.size": len(items),
                "service.operation": "batch-analyze"
            })
            results = []
            for item_id in items:
                with tracer.start_as_current_span("batch-item-analysis") as item_span:
                    item_span.set_attribute("item.id", item_id)
                    # Simulate processing
                    await asyncio.sleep(random.uniform(0.05, 0.2))
                    result = {
                        "id": item_id,
                        "status": "processed",
                        "score": random.uniform(0.1, 0.99),
                        "timestamp": datetime.now(timezone.utc).isoformat()
                    }
                    results.append(result)
            span.set_attributes({
                "response.status": "success",
                "batch.processed_count": len(results)
            })
            return {
                "results": results,
                "total_processed": len(results),
                "processing_timestamp": datetime.now(timezone.utc).isoformat()
            }
        except Exception as e:
            span.record_exception(e)
            span.set_status(Status(StatusCode.ERROR, str(e)))
            raise HTTPException(status_code=500, detail=f"Batch analysis failed: {str(e)}")

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
Create service startup scripts
Create a convenience script to start both microservices with the proper environment. Save it as ~/microservices/start-services.sh.
#!/bin/bash
# Start Python service
echo "Starting Python microservice..."
cd ~/microservices/python-service
source venv/bin/activate
uvicorn app:app --host 0.0.0.0 --port 8000 --reload &
PYTHON_PID=$!

# Wait a moment for the Python service to start
sleep 3

# Start Node.js service
echo "Starting Node.js microservice..."
cd ~/microservices/nodejs-service
node app.js &
NODE_PID=$!

echo "Services started:"
echo "Python service PID: $PYTHON_PID (http://localhost:8000)"
echo "Node.js service PID: $NODE_PID (http://localhost:3000)"
echo "Jaeger UI: http://localhost:16686"
echo ""
echo "To stop services, run: kill $PYTHON_PID $NODE_PID"

# Wait for user input to stop services
read -p "Press Enter to stop all services..."
echo "Stopping services..."
kill $PYTHON_PID $NODE_PID
echo "Services stopped."
Make script executable and test services
Make the startup script executable, then create a test script (saved as ~/microservices/test-traces.sh) to generate sample traces.
chmod +x ~/microservices/start-services.sh
#!/bin/bash
echo "Generating test traces..."

# Test individual endpoints
echo "Testing Node.js health endpoint..."
curl -s http://localhost:3000/health | jq .
echo "Testing Python health endpoint..."
curl -s http://localhost:8000/health | jq .

# Test distributed trace
echo "Testing distributed trace (Node.js -> Python)..."
curl -s http://localhost:3000/process/test-123 | jq .

# Test batch processing
echo "Testing batch processing..."
curl -s -X POST http://localhost:3000/batch \
  -H "Content-Type: application/json" \
  -d '{"items": [{"id": "batch-1"}, {"id": "batch-2"}, {"id": "batch-3"}]}' | jq .

# Test Python direct analysis
echo "Testing Python analysis endpoint..."
curl -s http://localhost:8000/analyze/direct-test | jq .

echo "Test traces generated. Check the Jaeger UI at http://localhost:16686"
chmod +x ~/microservices/test-traces.sh
Configure trace context propagation
Ensure that trace context is propagated between services by configuring HTTP headers and correlation IDs. The middleware that follows implements the W3C Trace Context standard and related best practices as shown in our OpenTelemetry distributed context propagation guide. Save it as middleware.js in the Node.js service directory.
const { trace, propagation, context } = require('@opentelemetry/api');

// Middleware to extract and inject trace context
const traceContextMiddleware = (req, res, next) => {
  // Extract trace context from incoming headers
  const parentContext = propagation.extract(context.active(), req.headers);
  // Continue processing in the extracted context
  context.with(parentContext, () => {
    const span = trace.getActiveSpan();
    if (span) {
      // Add common attributes
      span.setAttributes({
        'http.method': req.method,
        'http.url': req.url,
        'http.user_agent': req.get('User-Agent') || 'unknown',
        'service.name': 'nodejs-microservice'
      });
    }
    // Inject trace context into response headers for downstream services
    const activeContext = context.active();
    const carrier = {};
    propagation.inject(activeContext, carrier);
    // Set response headers for trace context
    Object.keys(carrier).forEach(key => {
      res.set(key, carrier[key]);
    });
    next();
  });
};

module.exports = { traceContextMiddleware };
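Under the hood, propagation.inject writes a W3C traceparent header of the form version-traceid-spanid-flags. A small Python sketch of building and parsing that header (field layout per the W3C Trace Context specification):

```python
import re

def build_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    # version 00, 16-byte hex trace id, 8-byte hex span id,
    # trace flags (01 = sampled, 00 = not sampled)
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"

def parse_traceparent(header: str) -> dict:
    m = re.fullmatch(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})", header)
    if not m:
        raise ValueError(f"malformed traceparent: {header}")
    version, trace_id, span_id, flags = m.groups()
    return {
        "version": version,
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": int(flags, 16) & 0x01 == 1,
    }

# Example trace/span ids taken from the W3C spec's own examples
header = build_traceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
print(parse_traceparent(header)["sampled"])  # → True
```

This is the header you should see on outbound requests once the middleware is in place; if it is missing, context is not being propagated.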
Update Node.js application with context propagation
Modify the Node.js application (app.js) to use the trace context middleware and propagate context to downstream services.
require('./tracing'); // Initialize tracing first
const express = require('express');
const axios = require('axios');
const { trace, propagation, context, SpanStatusCode } = require('@opentelemetry/api');
const { traceContextMiddleware } = require('./middleware');

const app = express();
const PORT = process.env.PORT || 3000;

// Get tracer
const tracer = trace.getTracer('nodejs-microservice');

app.use(express.json());
app.use(traceContextMiddleware);

// Enhanced process endpoint with better context propagation
app.get('/process/:id', async (req, res) => {
  const span = tracer.startSpan('process-request');
  try {
    const { id } = req.params;
    span.setAttributes({
      'request.id': id,
      'service.operation': 'process-request',
      'http.method': req.method,
      'http.route': '/process/:id'
    });
    // Simulate some processing
    await new Promise(resolve => setTimeout(resolve, Math.random() * 100));
    // Inject trace context (with this span as parent) into the outgoing headers
    const headers = {};
    propagation.inject(trace.setSpan(context.active(), span), headers);
    // Call Python microservice with proper context propagation
    const pythonResponse = await axios.get(`http://localhost:8000/analyze/${id}`, {
      headers: {
        ...headers,
        'Content-Type': 'application/json'
      }
    });
    const result = {
      id: id,
      timestamp: new Date().toISOString(),
      traceId: span.spanContext().traceId,
      nodeData: {
        processed: true,
        processingTime: Math.random() * 100
      },
      pythonData: pythonResponse.data
    };
    span.setAttributes({
      'response.status': 'success',
      'response.size': JSON.stringify(result).length,
      'downstream.service': 'python-microservice'
    });
    res.json(result);
  } catch (error) {
    span.recordException(error);
    span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
    res.status(500).json({
      error: 'Processing failed',
      message: error.message,
      traceId: span.spanContext().traceId
    });
  } finally {
    span.end();
  }
});

// Health endpoint
app.get('/health', (req, res) => {
  res.json({ status: 'healthy', service: 'nodejs-microservice' });
});

// Other endpoints remain the same...
app.listen(PORT, () => {
  console.log(`Node.js microservice running on port ${PORT}`);
});
Configure sampling strategies
Set up sampling to manage trace volume in production environments, as detailed in our OpenTelemetry sampling strategies tutorial. Save the following as sampling-strategies.json, mount it into the jaeger-collector container, and pass it to the collector with the --sampling.strategies-file flag.
{
  "service_strategies": [
    {
      "service": "nodejs-microservice",
      "type": "probabilistic",
      "param": 0.2,
      "operation_strategies": [
        {
          "operation": "process-request",
          "type": "probabilistic",
          "param": 0.5
        }
      ]
    },
    {
      "service": "python-microservice",
      "type": "probabilistic",
      "param": 0.15,
      "operation_strategies": [
        {
          "operation": "analyze-item",
          "type": "probabilistic",
          "param": 0.3
        }
      ]
    }
  ],
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.1
  }
}
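Probabilistic sampling makes a deterministic decision from the trace ID, so every service participating in a trace agrees on whether it is kept. A simplified sketch of the idea in Python (not the exact algorithm Jaeger or the OpenTelemetry SDKs use):

```python
TRACE_ID_BITS = 64  # decide from the low 64 bits of the 128-bit trace id

def should_sample(trace_id_hex: str, ratio: float) -> bool:
    # Keep a trace iff its (uniformly distributed) low 64 bits fall below
    # ratio * 2**64 -- deterministic, so all services make the same call.
    threshold = int(ratio * (1 << TRACE_ID_BITS))
    low_bits = int(trace_id_hex, 16) & ((1 << TRACE_ID_BITS) - 1)
    return low_bits < threshold

tid = "4bf92f3577b34da6a3ce929d0e0e4736"
print(should_sample(tid, 1.0), should_sample(tid, 0.0))  # → True False
```

Because the decision depends only on the trace ID and the ratio, raising or lowering param in the strategies file smoothly scales trace volume without breaking trace completeness.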
Verify your setup
Test the distributed tracing setup by running the services and generating sample traces.
# Check that the Jaeger containers are running
docker ps | grep jaeger

# Verify Elasticsearch is healthy
curl -s http://localhost:9200/_cluster/health | jq .

# Check Jaeger UI accessibility
curl -s http://localhost:16686/api/services

# Start the microservices
~/microservices/start-services.sh
Open a new terminal and generate test traces:
# Generate sample traces
~/microservices/test-traces.sh

# Check trace data in Jaeger
echo "Open http://localhost:16686 in your browser to view traces"
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| No traces in Jaeger UI | OTLP endpoint not accessible | Check docker ps and verify port 4318 is accessible |
| Services can't connect to Jaeger | Firewall blocking ports | Open ports 4317, 4318, 14268, 16686 in firewall |
| Elasticsearch connection failed | Memory limits too low | Increase Docker memory limits or adjust ES_JAVA_OPTS |
| Trace context not propagated | Missing propagation headers | Verify W3C Trace Context headers are being sent |
| High trace volume impacts performance | Sampling rate too high | Adjust sampling configuration to reduce trace volume |
| Python service import errors | Virtual environment not activated | source venv/bin/activate before running |
Next steps
- Set up Jaeger high availability clustering with load balancing and failover
- Configure Jaeger data retention policies and automated archiving with Elasticsearch backend
- Monitor Kubernetes clusters with Prometheus and Grafana for container orchestration insights
- Implement Jaeger security with TLS encryption and authentication for distributed tracing
- Set up custom OpenTelemetry instrumentation and metrics collection
- Configure distributed tracing for Golang microservices with OpenTelemetry
Automated install script
Save the following script as install.sh to automate the entire setup.
#!/usr/bin/env bash
set -euo pipefail
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Default values
INSTALL_DIR="/opt/tracing"
SERVICE_USER="tracing"
# Usage function
usage() {
echo "Usage: $0 [OPTIONS]"
echo "Options:"
echo " -d, --dir DIR Installation directory (default: /opt/tracing)"
echo " -h, --help Show this help message"
exit 1
}
# Parse arguments
while [[ $# -gt 0 ]]; do
case $1 in
-d|--dir)
INSTALL_DIR="$2"
shift 2
;;
-h|--help)
usage
;;
*)
echo -e "${RED}Unknown option: $1${NC}"
usage
;;
esac
done
# Log function
log() {
echo -e "${GREEN}[INFO]${NC} $1"
}
warn() {
echo -e "${YELLOW}[WARN]${NC} $1"
}
error() {
echo -e "${RED}[ERROR]${NC} $1"
}
# Cleanup function for rollback
cleanup() {
error "Installation failed. Rolling back changes..."
if systemctl is-active --quiet docker 2>/dev/null; then
cd "$INSTALL_DIR" 2>/dev/null && docker compose down 2>/dev/null || true
fi
if id "$SERVICE_USER" &>/dev/null; then
userdel -r "$SERVICE_USER" 2>/dev/null || true
fi
rm -rf "$INSTALL_DIR" 2>/dev/null || true
exit 1
}
trap cleanup ERR
# Check if running as root or with sudo
# Check if running as root or with sudo
if [[ $EUID -ne 0 ]]; then
  error "This script must be run as root or with sudo"
  exit 1
fi

# Auto-detect distribution
if [ -f /etc/os-release ]; then
  . /etc/os-release
  case "$ID" in
    ubuntu|debian)
      PKG_MGR="apt"
      PKG_INSTALL="apt install -y"
      PKG_UPDATE="apt update && apt upgrade -y"
      ;;
    almalinux|rocky|centos|rhel|ol)
      PKG_MGR="dnf"
      PKG_INSTALL="dnf install -y"
      PKG_UPDATE="dnf update -y"
      ;;
    fedora)
      PKG_MGR="dnf"
      PKG_INSTALL="dnf install -y"
      PKG_UPDATE="dnf update -y"
      ;;
    amzn)
      PKG_MGR="yum"
      PKG_INSTALL="yum install -y"
      PKG_UPDATE="yum update -y"
      ;;
    *)
      error "Unsupported distribution: $ID"
      exit 1
      ;;
  esac
else
  error "Cannot detect distribution. /etc/os-release not found."
  exit 1
fi
log "Detected distribution: $ID"
echo -e "${BLUE}[1/8]${NC} Updating system packages..."
eval "$PKG_UPDATE"
echo -e "${BLUE}[2/8]${NC} Installing Docker and Docker Compose..."
if [[ "$PKG_MGR" == "apt" ]]; then
  $PKG_INSTALL apt-transport-https ca-certificates curl gnupg lsb-release
  curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
  echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null
  apt update
  $PKG_INSTALL docker-ce docker-ce-cli containerd.io docker-compose-plugin
else
  $PKG_INSTALL yum-utils curl
  yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
  $PKG_INSTALL docker-ce docker-ce-cli containerd.io docker-compose-plugin
fi
echo -e "${BLUE}[3/8]${NC} Starting Docker service..."
systemctl enable --now docker
echo -e "${BLUE}[4/8]${NC} Creating service user and directories..."
useradd -r -s /bin/false "$SERVICE_USER" 2>/dev/null || true
usermod -aG docker "$SERVICE_USER"
mkdir -p "$INSTALL_DIR"
chown "$SERVICE_USER":"$SERVICE_USER" "$INSTALL_DIR"
chmod 755 "$INSTALL_DIR"
echo -e "${BLUE}[5/8]${NC} Creating Docker Compose configuration..."
cat > "$INSTALL_DIR/docker-compose.yml" << 'EOF'
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    container_name: jaeger-elasticsearch
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - "9200:9200"
    volumes:
      - elasticsearch_data:/usr/share/elasticsearch/data
    networks:
      - jaeger-net
  jaeger-collector:
    image: jaegertracing/jaeger-collector:1.51
    container_name: jaeger-collector
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch:9200
      - ES_NUM_SHARDS=1
      - ES_NUM_REPLICAS=0
      - COLLECTOR_OTLP_ENABLED=true
    ports:
      - "14269:14269"
      - "14268:14268"
      - "14250:14250"
      - "4317:4317"
      - "4318:4318"
    depends_on:
      - elasticsearch
    networks:
      - jaeger-net
  jaeger-query:
    image: jaegertracing/jaeger-query:1.51
    container_name: jaeger-query
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch:9200
    ports:
      - "16686:16686"
      - "16687:16687"
    depends_on:
      - elasticsearch
    networks:
      - jaeger-net
  jaeger-agent:
    image: jaegertracing/jaeger-agent:1.51
    container_name: jaeger-agent
    command: ["--reporter.grpc.host-port=jaeger-collector:14250"]
    ports:
      - "5775:5775/udp"
      - "6831:6831/udp"
      - "6832:6832/udp"
      - "5778:5778"
    depends_on:
      - jaeger-collector
    networks:
      - jaeger-net
volumes:
  elasticsearch_data:
networks:
  jaeger-net:
    driver: bridge
EOF
chown "$SERVICE_USER":"$SERVICE_USER" "$INSTALL_DIR/docker-compose.yml"
chmod 644 "$INSTALL_DIR/docker-compose.yml"
echo -e "${BLUE}[6/8]${NC} Installing Node.js and Python..."
if [[ "$PKG_MGR" == "apt" ]]; then
  curl -fsSL https://deb.nodesource.com/setup_20.x | bash -
  $PKG_INSTALL nodejs python3 python3-pip python3-venv
else
  $PKG_INSTALL nodejs npm python3 python3-pip
fi
echo -e "${BLUE}[7/8]${NC} Starting Jaeger services..."
cd "$INSTALL_DIR"
sudo -u "$SERVICE_USER" docker compose up -d
echo -e "${BLUE}[8/8]${NC} Configuring firewall (if active)..."
if systemctl is-active --quiet firewalld 2>/dev/null; then
  firewall-cmd --permanent --add-port=16686/tcp --add-port=4317/tcp --add-port=4318/tcp
  firewall-cmd --reload
  log "Firewall rules added for Jaeger UI (16686) and OTLP endpoints (4317, 4318)"
elif systemctl is-active --quiet ufw 2>/dev/null; then
  ufw allow 16686/tcp
  ufw allow 4317/tcp
  ufw allow 4318/tcp
  log "UFW rules added for Jaeger UI (16686) and OTLP endpoints (4317, 4318)"
fi
echo -e "${BLUE}Verifying installation...${NC}"
sleep 10
# Verify services are running
if ! docker ps | grep -q jaeger-elasticsearch; then
  error "Elasticsearch container is not running"
  exit 1
fi
if ! docker ps | grep -q jaeger-collector; then
  error "Jaeger collector container is not running"
  exit 1
fi

# Test connectivity
if ! curl -s http://localhost:9200/_cluster/health >/dev/null; then
  warn "Elasticsearch may not be ready yet (this is normal on first start)"
fi
if curl -s http://localhost:16686 >/dev/null; then
  log "Jaeger UI is accessible at http://localhost:16686"
else
  warn "Jaeger UI may not be ready yet"
fi
log "Installation completed successfully!"
log "Services installed:"
log " - Jaeger UI: http://localhost:16686"
log " - OTLP gRPC endpoint: http://localhost:4317"
log " - OTLP HTTP endpoint: http://localhost:4318"
log " - Installation directory: $INSTALL_DIR"
Review the script before running. Execute with: sudo bash install.sh