Configure OpenTelemetry custom metrics for application monitoring with Prometheus and Grafana

Intermediate 45 min Jun 14, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up OpenTelemetry SDK to collect custom application metrics, export them to Prometheus for storage, and visualize performance data in Grafana dashboards with automated alerting.

Prerequisites

  • Root or sudo access
  • At least 2GB RAM
  • Python 3.8+ (for examples)
  • Basic understanding of Prometheus and Grafana
  • Network connectivity for package downloads

What this solves

OpenTelemetry custom metrics give you detailed insights into your application's performance beyond basic system metrics. You can track business-specific metrics like user sign-ups, order completion rates, or API response times. This tutorial shows you how to instrument applications with OpenTelemetry, send metrics to Prometheus, and build Grafana dashboards for monitoring and alerting.

Step-by-step installation

Install OpenTelemetry Collector

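The OpenTelemetry Collector receives metrics from your applications and exposes them for Prometheus to scrape. Download and install the collector binary. The commands below pin release v0.91.0, so check the project's releases page if you need a newer version.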

wget https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.91.0/otelcol_0.91.0_linux_amd64.tar.gz
tar -xzf otelcol_0.91.0_linux_amd64.tar.gz
sudo mv otelcol /usr/local/bin/
sudo chmod +x /usr/local/bin/otelcol

Create collector configuration

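Configure the collector to receive OTLP metrics and expose them in Prometheus format. Save this configuration as /etc/otelcol-config.yaml, the path the systemd unit in the next step reads. It accepts OTLP over gRPC on port 4317 and over HTTP on port 4318, and serves Prometheus metrics on port 8889.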

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s
    limit_mib: 512

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    namespace: "app"
    const_labels:
      environment: "production"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
  telemetry:
    logs:
      level: info
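
The namespace setting in the exporter is why the queries later in this tutorial all start with app_. The sketch below approximates that naming under a simplified rule: sanitize the instrument name, then prefix the namespace. The helper prometheus_name is hypothetical and not part of any library, and the real exporter also normalizes unit and type suffixes, which this sketch leaves out.

```python
import re


def prometheus_name(namespace: str, otel_name: str) -> str:
    """Approximate the exported metric name (simplified assumption)."""
    # Prometheus metric names may only contain letters, digits, underscores and colons.
    sanitized = re.sub(r"[^a-zA-Z0-9_:]", "_", otel_name)
    return f"{namespace}_{sanitized}" if namespace else sanitized


# The counter created by the example application in this tutorial.
print(prometheus_name("app", "http_requests_total"))  # app_http_requests_total
```

Checking the name this way before writing alert rules helps avoid queries that silently match nothing.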

Create systemd service for collector

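Set up the collector as a systemd service for automatic startup and management. Save the unit below as /etc/systemd/system/otelcol.service. On AlmaLinux and Rocky Linux the unprivileged group is named nobody, so use Group=nobody there.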

[Unit]
Description=OpenTelemetry Collector
After=network.target

[Service]
Type=simple
User=nobody
Group=nogroup
ExecStart=/usr/local/bin/otelcol --config=/etc/otelcol-config.yaml
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

Start OpenTelemetry Collector

Enable and start the collector service to begin accepting metrics from your applications.

sudo systemctl daemon-reload
sudo systemctl enable --now otelcol
sudo systemctl status otelcol

Install Prometheus

Install Prometheus to scrape metrics from the OpenTelemetry Collector and store them for querying.

# Ubuntu / Debian
sudo apt update
sudo apt install -y prometheus

# AlmaLinux / Rocky Linux
sudo dnf install -y epel-release
sudo dnf install -y golang-github-prometheus

Configure Prometheus to scrape OpenTelemetry metrics

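Add the OpenTelemetry Collector as a scrape target in the Prometheus configuration, typically /etc/prometheus/prometheus.yml. This tells Prometheus to collect metrics from the collector's Prometheus endpoint on port 8889. The alertmanagers and node-exporter entries assume Alertmanager and node_exporter are running on this host; remove them if they are not.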

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "/etc/prometheus/rules/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'otel-collector'
    static_configs:
      - targets: ['localhost:8889']
    scrape_interval: 10s
    metrics_path: /metrics

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']

Create Prometheus alerting rules

Set up alerting rules for custom metrics to notify you when application performance degrades.

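sudo mkdir -p /etc/prometheus/rules

Save the rules below as a .yml file in that directory so the rule_files pattern in the Prometheus configuration picks them up.

groups: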
  - name: application_metrics
    rules:
      - alert: HighErrorRate
        expr: rate(app_http_requests_total{status=~"5.."}[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }} errors per second"
      
      - alert: SlowResponseTime
        expr: histogram_quantile(0.95, rate(app_http_request_duration_seconds_bucket[5m])) > 1.0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Slow response times detected"
          description: "95th percentile response time is {{ $value }} seconds"
      
      - alert: LowThroughput
        expr: rate(app_http_requests_total[5m]) < 1.0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Low request throughput"
          description: "Request rate is {{ $value }} requests per second"
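
The HighErrorRate and LowThroughput rules both depend on rate(), which turns an ever-increasing counter into a per-second value. The following is a minimal sketch of that idea, assuming a simplified calculation without the window-edge extrapolation Prometheus applies; simple_rate and the sample values are illustrative only.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase across the window, tolerating counter resets."""
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the process restarted, so the new value is all increase.
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# 30 errors over a 300 second window.
window = [(0.0, 100.0), (150.0, 115.0), (300.0, 130.0)]
print(simple_rate(window))  # 0.1
```

A result of 0.1 sits exactly on the HighErrorRate threshold, and because the rule uses a strict greater-than comparison it would not yet fire.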

Start Prometheus

Enable and start Prometheus to begin collecting metrics from the OpenTelemetry Collector.

sudo systemctl enable --now prometheus
sudo systemctl status prometheus
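
Once Prometheus is running, its /api/v1/targets endpoint reports whether each scrape target is healthy. The sketch below summarizes such a response that has already been parsed into a dictionary. The function name unhealthy_targets and the sample payload are illustrative assumptions, and only the data.activeTargets, labels.job, health and lastError fields are read.

```python
from typing import Any, Dict, List


def unhealthy_targets(response: Dict[str, Any]) -> List[str]:
    """Return one line per active target whose health is not 'up'."""
    problems = []
    for target in response.get("data", {}).get("activeTargets", []):
        if target.get("health") != "up":
            job = target.get("labels", {}).get("job", "unknown")
            error = target.get("lastError") or "no error reported"
            problems.append(f"{job}: {error}")
    return problems


# Illustrative payload, not captured from a real server.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "prometheus"}, "health": "up", "lastError": ""},
            {"labels": {"job": "otel-collector"}, "health": "down",
             "lastError": "connection refused"},
        ]
    },
}
print(unhealthy_targets(sample))  # ['otel-collector: connection refused']
```

An empty list means every configured target, including the collector on port 8889, is being scraped successfully.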

Install Grafana

Install Grafana to create dashboards and visualizations for your OpenTelemetry metrics.

# Ubuntu / Debian
sudo apt install -y apt-transport-https wget gnupg
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana

# AlmaLinux / Rocky Linux
sudo tee /etc/yum.repos.d/grafana.repo << 'EOF'
[grafana]
name=grafana
baseurl=https://rpm.grafana.com
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
EOF
sudo dnf install -y grafana

Configure Grafana data source

Add Prometheus as a data source in Grafana to query your OpenTelemetry metrics.

sudo systemctl enable --now grafana-server
sudo systemctl status grafana-server

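Access Grafana at http://your-server:3000 and log in with username admin and password admin; Grafana prompts you to change the password on first login. Navigate to Connections > Data sources (Configuration > Data Sources in older Grafana versions) and add Prometheus with URL http://localhost:9090.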

Install OpenTelemetry SDK in your application

Add OpenTelemetry instrumentation to your application. This example shows Python implementation with custom metrics.

pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp

Save the following as app_metrics.py:

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
import time
import random

# Configure OpenTelemetry
metric_exporter = OTLPMetricExporter(
    endpoint="http://localhost:4318/v1/metrics",
    headers={}
)
metric_reader = PeriodicExportingMetricReader(
    exporter=metric_exporter,
    export_interval_millis=5000
)
metrics.set_meter_provider(MeterProvider(metric_readers=[metric_reader]))
meter = metrics.get_meter("app_metrics", "1.0.0")

# Create custom metrics
request_counter = meter.create_counter(
    name="http_requests_total",
    description="Total number of HTTP requests",
    unit="1"
)
response_time_histogram = meter.create_histogram(
    name="http_request_duration_seconds",
    description="HTTP request duration in seconds",
    unit="s"
)
active_connections_gauge = meter.create_up_down_counter(
    name="active_connections",
    description="Number of active connections",
    unit="1"
)

# Example usage
def handle_request(endpoint, status_code):
    # Simulate request processing
    processing_time = random.uniform(0.1, 2.0)
    time.sleep(processing_time)

    # Record metrics
    request_counter.add(1, {"endpoint": endpoint, "status": str(status_code)})
    response_time_histogram.record(processing_time, {"endpoint": endpoint})
    return f"Processed {endpoint} in {processing_time:.2f}s"

# Simulate application traffic
if __name__ == "__main__":
    endpoints = ["/api/users", "/api/orders", "/api/products"]
    for i in range(100):
        endpoint = random.choice(endpoints)
        status = random.choices([200, 404, 500], weights=[85, 10, 5])[0]
        active_connections_gauge.add(1)
        result = handle_request(endpoint, status)
        active_connections_gauge.add(-1)
        print(f"Request {i+1}: {result}")
        time.sleep(0.1)
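
The example application records a simulated processing time. In a real handler you would usually measure elapsed wall-clock time instead. One possible pattern is a small context manager, sketched below. The names timed and RecordingHistogram are hypothetical; RecordingHistogram is a stand-in exposing the same record(amount, attributes) call shape as the histogram instrument, so the sketch runs without the SDK installed.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List, Tuple


class RecordingHistogram:
    """Stand-in with the record(amount, attributes) shape of a histogram instrument."""

    def __init__(self) -> None:
        self.points: List[Tuple[float, Dict[str, str]]] = []

    def record(self, amount: float, attributes: Dict[str, str]) -> None:
        self.points.append((amount, attributes))


@contextmanager
def timed(histogram, attributes: Dict[str, str]) -> Iterator[None]:
    """Record the elapsed seconds of the wrapped block, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        histogram.record(time.perf_counter() - start, attributes)


histogram = RecordingHistogram()
with timed(histogram, {"endpoint": "/api/users"}):
    time.sleep(0.01)  # placeholder for real request handling
print(len(histogram.points))  # 1
```

With the SDK in place, passing response_time_histogram instead of the stand-in should work the same way, since only the record call is used.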

Create Grafana dashboard

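Save the following dashboard definition as otel_dashboard.json. It is wrapped in a top-level dashboard object, which is the payload format the Grafana HTTP API expects when the file is imported later in this tutorial. The dashboard defines panels for request rate, error rate, 95th percentile response time, and active connections.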

{
  "dashboard": {
    "id": null,
    "title": "OpenTelemetry Application Metrics",
    "tags": ["opentelemetry", "monitoring"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Request Rate",
        "type": "stat",
        "targets": [
          {
            "expr": "rate(app_http_requests_total[5m])",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "reqps",
            "min": 0
          }
        },
        "gridPos": {"h": 8, "w": 6, "x": 0, "y": 0}
      },
      {
        "id": 2,
        "title": "Error Rate",
        "type": "stat",
        "targets": [
          {
            "expr": "sum(rate(app_http_requests_total{status=~\"5..\"}[5m])) / sum(rate(app_http_requests_total[5m]))",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "percentunit",
            "min": 0,
            "max": 1
          }
        },
        "gridPos": {"h": 8, "w": 6, "x": 6, "y": 0}
      },
      {
        "id": 3,
        "title": "Response Time (95th percentile)",
        "type": "stat",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(app_http_request_duration_seconds_bucket[5m]))",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "s",
            "min": 0
          }
        },
        "gridPos": {"h": 8, "w": 6, "x": 12, "y": 0}
      },
      {
        "id": 4,
        "title": "Active Connections",
        "type": "stat",
        "targets": [
          {
            "expr": "app_active_connections",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "short",
            "min": 0
          }
        },
        "gridPos": {"h": 8, "w": 6, "x": 18, "y": 0}
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "10s"
  }
}
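
The response time panel relies on histogram_quantile, which estimates a percentile from cumulative bucket counts rather than from raw observations. The sketch below shows the general idea using linear interpolation inside the matching bucket. It is a simplified approximation of the PromQL function, and the bucket bounds and counts are made up for illustration.

```python
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate a quantile from (upper bound, cumulative count) pairs.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf") or count == lower_count:
                # No finite upper edge (or empty bucket): report the lower edge.
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Illustrative request durations: 60 under 0.5s, 90 under 1s, all 100 under 2s.
buckets = [(0.5, 60.0), (1.0, 90.0), (2.0, 100.0), (float("inf"), 100.0)]
print(histogram_quantile(0.95, buckets))
```

For these counts the estimate lands halfway through the 1s to 2s bucket, which would exceed the 1.0 second threshold used by the SlowResponseTime rule. The accuracy of the estimate depends on how well the bucket boundaries match your latency range.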

Configure firewall rules

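Open the ports that other hosts need to reach: 3000 for Grafana, 9090 for Prometheus, 4317 and 4318 for OTLP traffic from applications, and 8889 for the collector's Prometheus endpoint. Components that talk to each other on the same host over localhost do not need a firewall rule, so skip any port that is only used locally.

# Ubuntu / Debian (ufw)
sudo ufw allow 3000/tcp
sudo ufw allow 9090/tcp
sudo ufw allow 4317/tcp
sudo ufw allow 4318/tcp
sudo ufw allow 8889/tcp

# AlmaLinux / Rocky Linux (firewalld)
sudo firewall-cmd --permanent --add-port=3000/tcp
sudo firewall-cmd --permanent --add-port=9090/tcp
sudo firewall-cmd --permanent --add-port=4317/tcp
sudo firewall-cmd --permanent --add-port=4318/tcp
sudo firewall-cmd --permanent --add-port=8889/tcp
sudo firewall-cmd --reload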
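
After changing firewall rules, it can help to confirm that each service port actually accepts TCP connections. The following sketch uses only the standard library. The port_open helper is hypothetical, and the port list simply mirrors the ports used in this tutorial.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports used in this tutorial; change the host to test from another machine.
    services = {
        "Grafana": 3000,
        "Prometheus": 9090,
        "OTLP gRPC": 4317,
        "OTLP HTTP": 4318,
        "Collector Prometheus endpoint": 8889,
    }
    for name, port in services.items():
        state = "open" if port_open("localhost", port) else "closed"
        print(f"{name} ({port}): {state}")
```

A successful connection only shows that something is listening and reachable; it does not confirm that the service behind the port is healthy.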

Configure custom metrics collection

Add business metrics to your application

Extend your application with business-specific metrics like user signups, revenue, or feature usage. These metrics provide insights into application performance from a business perspective.

# Additional business metrics
user_signups = meter.create_counter(
    name="user_signups_total",
    description="Total number of user signups",
    unit="1"
)

order_value_histogram = meter.create_histogram(
    name="order_value_dollars",
    description="Order value in dollars",
    unit="USD"
)

feature_usage = meter.create_counter(
    name="feature_usage_total",
    description="Feature usage by type",
    unit="1"
)

# Example business event tracking
def track_user_signup(user_type, source):
    user_signups.add(1, {
        "user_type": user_type,
        "source": source
    })

def track_order(order_value, product_category):
    order_value_histogram.record(order_value, {
        "category": product_category
    })

def track_feature_use(feature_name, user_tier):
    feature_usage.add(1, {
        "feature": feature_name,
        "tier": user_tier
    })
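
Every distinct combination of attribute values becomes its own time series in Prometheus, so free-form values such as a raw referrer string can inflate storage quickly. One way to keep this bounded is to map attribute values onto a fixed set before recording them. The sketch below illustrates that idea; the allowed values and helper names are hypothetical and should be replaced with ones that fit your application.

```python
from typing import Dict, FrozenSet

# Hypothetical allow-lists; adjust to the values your application really uses.
ALLOWED_SOURCES: FrozenSet[str] = frozenset({"organic", "ads", "referral"})
ALLOWED_USER_TYPES: FrozenSet[str] = frozenset({"free", "pro", "enterprise"})


def bounded(value: str, allowed: FrozenSet[str], fallback: str = "other") -> str:
    """Normalize a value and collapse anything unexpected into one fallback label."""
    normalized = value.strip().lower()
    return normalized if normalized in allowed else fallback


def signup_attributes(user_type: str, source: str) -> Dict[str, str]:
    """Build the attribute dictionary for a signup event with bounded cardinality."""
    return {
        "user_type": bounded(user_type, ALLOWED_USER_TYPES),
        "source": bounded(source, ALLOWED_SOURCES),
    }


print(signup_attributes("Pro", "newsletter-2024-spring"))
# {'user_type': 'pro', 'source': 'other'}
```

The resulting dictionary can be passed as the attributes argument when recording a signup, which caps the number of series at the product of the allow-list sizes plus the fallback.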

Set up metric sampling and filtering

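Configure the collector to filter out irrelevant metrics before export, which reduces storage costs and improves query performance. The probabilistic_sampler processor works on traces and logs rather than metrics, so it is defined below but deliberately left out of the metrics pipeline; to reduce metric volume, rely on filtering and on keeping label cardinality low.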

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
  
  filter/drop_debug:
    metrics:
      exclude:
        match_type: regexp
        metric_names:
          - ".*_debug.*"
          - ".*_test.*"
  
  probabilistic_sampler:
    sampling_percentage: 10

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    namespace: "app"
    const_labels:
      environment: "production"
      version: "1.0.0"
    
    metric_expiration: 180s
    enable_open_metrics: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, filter/drop_debug, batch]
      exporters: [prometheus]
  telemetry:
    logs:
      level: info
    metrics:
      address: 0.0.0.0:8888
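
Before deploying an exclude filter, it is worth checking which metric names it would drop. The sketch below applies the same kind of regular expressions to a list of names, assuming the intended patterns are .*_debug.* and .*_test.* and that matching is unanchored. The metric names are invented for illustration.

```python
import re
from typing import Iterable, List

# Assumed to mirror the exclude patterns in the filter/drop_debug processor.
EXCLUDE_PATTERNS = [re.compile(r".*_debug.*"), re.compile(r".*_test.*")]


def kept_metrics(names: Iterable[str]) -> List[str]:
    """Return the names that none of the exclude patterns match."""
    return [
        name for name in names
        if not any(pattern.search(name) for pattern in EXCLUDE_PATTERNS)
    ]


names = [
    "http_requests_total",
    "cache_debug_hits",
    "checkout_test_orders",
    "user_signups_total",
]
print(kept_metrics(names))  # ['http_requests_total', 'user_signups_total']
```

The filter sees the names your application emits, before the exporter adds the app_ namespace prefix, so test against the original instrument names.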

Set up Grafana dashboards and alerting

Create alerting rules in Grafana

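Set up Grafana alerts that trigger notifications when metrics exceed thresholds, so you learn about problems before users report them. Treat the request below as a template: Grafana versions with unified alerting expose rule creation through the provisioning API at /api/v1/provisioning/alert-rules, which additionally expects a folderUID, a ruleGroup, and a datasourceUid on each query.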

curl -X POST http://admin:admin@localhost:3000/api/alert-rules \
  -H "Content-Type: application/json" \
  -d '{
    "title": "High Error Rate Alert",
    "condition": "B",
    "data": [
      {
        "refId": "A",
        "queryType": "",
        "relativeTimeRange": {
          "from": 300,
          "to": 0
        },
        "model": {
          "expr": "rate(app_http_requests_total{status=~\"5..\"}[5m])",
          "refId": "A"
        }
      },
      {
        "refId": "B",
        "queryType": "",
        "model": {
          "conditions": [
            {
              "evaluator": {
                "params": [0.1],
                "type": "gt"
              },
              "operator": {
                "type": "and"
              },
              "query": {
                "params": ["A"]
              },
              "reducer": {
                "params": [],
                "type": "avg"
              },
              "type": "query"
            }
          ],
          "refId": "B"
        }
      }
    ],
    "intervalSeconds": 60,
    "noDataState": "NoData",
    "execErrState": "Alerting",
    "for": "2m"
  }'
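
The rule above combines three settings: an avg reducer, a gt evaluator with threshold 0.1, and a for duration of 2m with a 60 second evaluation interval. The sketch below walks through how those interact, assuming a simplified state machine in which the alert stays pending until the condition has held for the full duration. The firing_states helper and the input values are illustrative only.

```python
from statistics import mean
from typing import List, Optional, Sequence


def firing_states(
    evaluations: Sequence[Sequence[float]],
    threshold: float,
    interval_s: int,
    for_s: int,
) -> List[str]:
    """Return the alert state after each evaluation cycle."""
    states: List[str] = []
    held_for: Optional[int] = None  # seconds the condition has held; None when normal
    for series_values in evaluations:
        # Reduce the query result with avg, then compare with the threshold (gt).
        breached = bool(series_values) and mean(series_values) > threshold
        if not breached:
            held_for = None
            states.append("Normal")
            continue
        held_for = 0 if held_for is None else held_for + interval_s
        states.append("Alerting" if held_for >= for_s else "Pending")
    return states


# One list of query values per 60 second evaluation.
history = [[0.05], [0.2], [0.3], [0.25], [0.02]]
print(firing_states(history, threshold=0.1, interval_s=60, for_s=120))
# ['Normal', 'Pending', 'Pending', 'Alerting', 'Normal']
```

The pending phase is what prevents a single short spike from paging anyone, at the cost of a delay equal to the for duration.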

Configure notification channels

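Set up Slack or email notifications so your team hears about alerts when application issues occur. The /api/alert-notifications endpoint used below belongs to Grafana's legacy alerting; with unified alerting the equivalent is a contact point, created under Alerting > Contact points or through /api/v1/provisioning/contact-points.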

curl -X POST http://admin:admin@localhost:3000/api/alert-notifications \
  -H "Content-Type: application/json" \
  -d '{
    "name": "slack-alerts",
    "type": "slack",
    "settings": {
      "url": "https://hooks.slack.com/services/YOUR/WEBHOOK/URL",
      "username": "Grafana",
      "channel": "#alerts",
      "title": "Application Alert",
      "text": "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ .Annotations.description }}{{ end }}"
    }
  }'

Import pre-built dashboard

Import the dashboard JSON shown earlier, saved as otel_dashboard.json in the current directory, through the Grafana HTTP API.

curl -X POST http://admin:admin@localhost:3000/api/dashboards/db \
  -H "Content-Type: application/json" \
  -d '@otel_dashboard.json'

Verify your setup

# Check OpenTelemetry Collector status
sudo systemctl status otelcol

# Send test metrics from the example application
python3 app_metrics.py

# Verify the collector is exposing the received metrics
curl -s http://localhost:8889/metrics | grep app_

# Check Prometheus targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.job=="otel-collector")'

# Query metrics in Prometheus
curl -G http://localhost:9090/api/v1/query --data-urlencode 'query=app_http_requests_total'

# Check Grafana data source
curl http://admin:admin@localhost:3000/api/datasources
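
The grep against the collector endpoint shows raw lines in the Prometheus text format. For a quick summary you can total the samples per metric name, as in the sketch below. It assumes lines without trailing timestamps, and the sample text is invented for illustration rather than captured from a running collector.

```python
from typing import Dict

# Illustrative exposition text, shaped like the output of the collector endpoint.
SAMPLE = """# HELP app_http_requests_total Total number of HTTP requests
# TYPE app_http_requests_total counter
app_http_requests_total{endpoint="/api/users",status="200"} 42
app_http_requests_total{endpoint="/api/orders",status="500"} 3
otelcol_process_uptime 12.5
"""


def sum_by_metric(text: str, prefix: str = "app_") -> Dict[str, float]:
    """Sum sample values per metric name for names starting with the prefix."""
    totals: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        if name.startswith(prefix):
            totals[name] = totals.get(name, 0.0) + float(value)
    return totals


print(sum_by_metric(SAMPLE))  # {'app_http_requests_total': 45.0}
```

To run it against a live collector, read the response body from http://localhost:8889/metrics and pass it in place of SAMPLE.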

Common issues

Symptom | Cause | Fix
Metrics not appearing in Prometheus | Collector endpoint misconfigured | Check /etc/otelcol-config.yaml endpoint settings and firewall rules
High memory usage in collector | No memory limits configured | Add memory_limiter processor with appropriate limit_mib
Missing metrics in Grafana | Wrong Prometheus URL | Verify data source URL is http://localhost:9090
Alerts not triggering | Alert rule query syntax error | Test queries in Prometheus before adding to Grafana alerts
Application metrics not exported | OTLP exporter endpoint wrong | Ensure application sends to http://localhost:4318/v1/metrics
