Implement Node.js application monitoring with Prometheus metrics and Grafana dashboards

Intermediate 45 min Jun 08, 2026 20 views
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up comprehensive Node.js application monitoring using Prometheus metrics collection with the prom-client library and create custom Grafana dashboards for performance insights and alerting.

Prerequisites

  • Node.js 16+ installed
  • sudo access
  • 8GB+ RAM recommended

What this solves

Node.js applications in production need real-time monitoring to track performance metrics, identify bottlenecks, and respond to issues before they impact users. This tutorial sets up Prometheus metrics collection directly in your Node.js application using the prom-client library, then creates comprehensive Grafana dashboards to visualize application performance, HTTP request patterns, and custom business metrics with automated alerting.

Step-by-step implementation

Install Prometheus server

Start by installing Prometheus to collect and store metrics from your Node.js applications.

# Ubuntu / Debian
sudo apt update
sudo apt install -y prometheus prometheus-node-exporter
sudo systemctl enable --now prometheus
sudo systemctl enable --now prometheus-node-exporter

# AlmaLinux / Rocky Linux
sudo dnf install -y epel-release
sudo dnf install -y prometheus2 node_exporter
sudo systemctl enable --now prometheus
sudo systemctl enable --now node_exporter

Configure Prometheus scraping

Edit /etc/prometheus/prometheus.yml so that Prometheus scrapes its own metrics, the node exporter, and the /metrics endpoints of your Node.js application.

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "/etc/prometheus/nodejs_alerts.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']

  - job_name: 'nodejs-app'
    static_configs:
      - targets: ['localhost:3000']
    metrics_path: '/metrics'
    scrape_interval: 10s

  - job_name: 'nodejs-app-custom'
    static_configs:
      - targets: ['localhost:3001', 'localhost:3002']
    metrics_path: '/metrics'
    scrape_interval: 10s
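
Each scrape is an HTTP GET against the target's metrics_path, and the response is plain text in the Prometheus exposition format. As a rough illustration of what Prometheus reads from the nodejs-app job, the sketch below parses a small subset of that format. The helper name parseMetrics and the sample text are assumptions made for this example, and the parser ignores details such as timestamps and special values like +Inf.

```javascript
// Hypothetical helper: parse a small subset of the Prometheus text exposition format.
// It handles "name value" and "name{label="x",...} value" lines and skips comments.
function parseMetrics(text) {
  const samples = [];
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue; // blank, HELP and TYPE lines
    const match = line.match(/^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/);
    if (!match) continue;
    const [, name, labelText, valueText] = match;
    const labels = {};
    if (labelText) {
      const labelPattern = /([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"/g;
      for (const m of labelText.matchAll(labelPattern)) {
        labels[m[1]] = m[2];
      }
    }
    samples.push({ name, labels, value: Number(valueText) });
  }
  return samples;
}

// Sample payload shaped like the output of the /metrics endpoint (values are made up).
const exampleScrape = [
  '# HELP http_requests_total Total number of HTTP requests',
  '# TYPE http_requests_total counter',
  'http_requests_total{method="GET",route="/",status_code="200"} 42',
  'nodejs_active_connections 3',
].join('\n');

console.log(parseMetrics(exampleScrape));
```

Every distinct combination of metric name and label values in this output becomes one time series in Prometheus, which is why label choices matter later on.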

Install Grafana for visualization

Install Grafana to create dashboards and alerts for your Node.js metrics.

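# Ubuntu / Debian (apt-key is deprecated, so the key goes into a dedicated keyring)
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://packages.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana
sudo systemctl enable --now grafana-server

# AlmaLinux / Rocky Linux
sudo dnf install -y https://dl.grafana.com/oss/release/grafana-10.2.0-1.x86_64.rpm
sudo systemctl enable --now grafana-server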

Create Node.js application with metrics

Set up a sample Node.js application with comprehensive Prometheus metrics collection.

mkdir ~/nodejs-monitoring && cd ~/nodejs-monitoring
npm init -y
npm install express prom-client response-time

// app.js
const express = require('express');
const client = require('prom-client');
const responseTime = require('response-time');

const app = express();
const port = process.env.PORT || 3000;

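// Enable collection of default metrics (process, heap, event loop, GC).
// Current prom-client releases gather these at scrape time, so no timeout option is needed.
client.collectDefaultMetrics({
  gcDurationBuckets: [0.001, 0.01, 0.1, 1, 2, 5]
});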

// Custom metrics
const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_ms',
  help: 'Duration of HTTP requests in ms',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.1, 5, 15, 50, 100, 500, 1000]
});

const httpRequestsTotal = new client.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code']
});

const activeConnections = new client.Gauge({
  name: 'nodejs_active_connections',
  help: 'Number of active connections',
});

const businessMetric = new client.Gauge({
  name: 'business_orders_processed',
  help: 'Number of orders processed',
});

const errorRate = new client.Counter({
  name: 'nodejs_errors_total',
  help: 'Total number of errors',
  labelNames: ['type', 'endpoint']
});

// Middleware to collect HTTP metrics
app.use(responseTime((req, res, time) => {
  const route = req.route ? req.route.path : req.path;
  const labels = {
    method: req.method,
    route: route,
    status_code: res.statusCode
  };
  
  httpRequestDuration.observe(labels, time);
  httpRequestsTotal.inc(labels);
}));

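// Middleware to track in-flight requests.
// 'close' fires for completed and aborted responses alike, so the gauge
// does not drift upward when a client disconnects early.
app.use((req, res, next) => {
  activeConnections.inc();
  res.on('close', () => {
    activeConnections.dec();
  });
  next();
});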

// Sample routes
app.get('/', (req, res) => {
  res.json({ message: 'Node.js monitoring demo', timestamp: new Date().toISOString() });
});

app.get('/api/users', (req, res) => {
  // Simulate processing time
  setTimeout(() => {
    res.json({ users: ['alice', 'bob', 'charlie'], count: 3 });
  }, Math.random() * 100);
});

app.get('/api/orders', (req, res) => {
  // Simulate business metric
  businessMetric.inc(Math.floor(Math.random() * 5) + 1);
  res.json({ orders: ['order1', 'order2'], processed: true });
});

app.get('/api/error', (req, res) => {
  // Simulate error for testing
  errorRate.inc({ type: 'validation', endpoint: '/api/error' });
  res.status(500).json({ error: 'Simulated error for testing' });
});

app.get('/health', (req, res) => {
  res.status(200).json({
    status: 'healthy',
    uptime: process.uptime(),
    memory: process.memoryUsage(),
    timestamp: new Date().toISOString()
  });
});

// Metrics endpoint for Prometheus
app.get('/metrics', async (req, res) => {
  try {
    res.set('Content-Type', client.register.contentType);
    const metrics = await client.register.metrics();
    res.send(metrics);
  } catch (error) {
    res.status(500).send(error.message);
  }
});

// Graceful shutdown
process.on('SIGTERM', () => {
  console.log('SIGTERM received, shutting down gracefully');
  server.close(() => {
    console.log('Process terminated');
    process.exit(0);
  });
});

const server = app.listen(port, () => {
  console.log(`Node.js monitoring app listening on port ${port}`);
  console.log(`Metrics available at http://localhost:${port}/metrics`);
  console.log(`Health check at http://localhost:${port}/health`);
});

module.exports = app;
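
The bucket list passed to the histogram determines how precisely latency can be reported later. As a simplified model (not prom-client's internal code), the sketch below shows how one observation is counted in every bucket whose upper bound is at or above the observed value, which is the cumulative form that appears on the /metrics endpoint. The names createHistogram and observe are assumptions for this example.

```javascript
// Illustrative model of a cumulative histogram, using the same bucket
// boundaries (in milliseconds) as http_request_duration_ms.
function createHistogram(upperBounds) {
  const bounds = [...upperBounds].sort((a, b) => a - b);
  // One counter per bucket, plus a final slot for the implicit +Inf bucket.
  return { bounds, counts: new Array(bounds.length + 1).fill(0), sum: 0 };
}

function observe(histogram, value) {
  histogram.sum += value;
  histogram.bounds.forEach((bound, i) => {
    if (value <= bound) histogram.counts[i] += 1;
  });
  histogram.counts[histogram.bounds.length] += 1; // +Inf counts every observation
}

const demoHistogram = createHistogram([0.1, 5, 15, 50, 100, 500, 1000]);
observe(demoHistogram, 120); // a 120 ms request
console.log(demoHistogram.counts); // [0, 0, 0, 0, 0, 1, 1, 1]
```

A 120 ms request only registers from the 500 ms bucket upward, so with these boundaries every request between 100 ms and 500 ms looks the same. If that range matters for your service, consider adding boundaries such as 200 or 300.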

Create production-ready package.json

Configure proper Node.js application scripts and dependencies for production deployment.

{
  "name": "nodejs-prometheus-monitoring",
  "version": "1.0.0",
  "description": "Node.js application with Prometheus metrics",
  "main": "app.js",
  "scripts": {
    "start": "node app.js",
    "dev": "nodemon app.js",
    "test": "jest",
    "healthcheck": "curl -f http://localhost:3000/health || exit 1"
  },
  "dependencies": {
    "express": "^4.18.2",
    "prom-client": "^14.2.0",
    "response-time": "^2.3.2"
  },
  "devDependencies": {
    "nodemon": "^3.0.1",
    "jest": "^29.0.0"
  },
  "keywords": ["nodejs", "prometheus", "monitoring", "metrics"],
  "author": "DevOps Team",
  "license": "MIT"
}

Create systemd service for Node.js app

Save the following unit as /etc/systemd/system/nodejs-monitoring.service to run your Node.js application under systemd with automatic restarts, resource limits, and basic sandboxing.

[Unit]
Description=Node.js Monitoring Application
After=network.target
Requires=network.target

[Service]
Type=simple
User=nodejs
Group=nodejs
WorkingDirectory=/home/nodejs/nodejs-monitoring
ExecStart=/usr/bin/node app.js
Restart=always
RestartSec=5
Environment=NODE_ENV=production
Environment=PORT=3000
StandardOutput=journal
StandardError=journal
SyslogIdentifier=nodejs-monitoring

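# Resource limits
LimitNOFILE=65536
MemoryMax=512M

# Security settings
NoNewPrivileges=true
ProtectSystem=strict
# The app lives under /home, so ProtectHome=true would hide its working directory
ProtectHome=read-only
# The leading "-" lets the unit start even if the logs directory does not exist yet
ReadWritePaths=-/home/nodejs/nodejs-monitoring/logs

[Install]
WantedBy=multi-user.target
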
sudo useradd -r -s /bin/false nodejs
sudo mkdir -p /home/nodejs
sudo cp -r ~/nodejs-monitoring /home/nodejs/
sudo chown -R nodejs:nodejs /home/nodejs/nodejs-monitoring
sudo systemctl daemon-reload
sudo systemctl enable --now nodejs-monitoring

Configure Prometheus alerting rules

Create /etc/prometheus/nodejs_alerts.yml, the file referenced under rule_files in the Prometheus configuration, with threshold-based alerting rules for the Node.js application.

groups:
  - name: nodejs_alerts
    rules:
      - alert: NodejsHighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_ms_bucket[5m])) > 500
        for: 2m
        labels:
          severity: warning
          service: nodejs
        annotations:
          summary: "High response time detected"
          description: "95th percentile response time is {{ $value }}ms for {{ $labels.instance }}"

      - alert: NodejsHighErrorRate
        # Aggregate both sides per instance; dividing the raw series would only
        # match 5xx series against themselves and always yield 1.
        expr: sum by (instance) (rate(http_requests_total{status_code=~"5.."}[5m])) / sum by (instance) (rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
          service: nodejs
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value | humanizePercentage }} for {{ $labels.instance }}"

      - alert: NodejsHighMemoryUsage
        # prom-client exposes heap usage as nodejs_heap_size_used_bytes / nodejs_heap_size_total_bytes
        expr: nodejs_heap_size_used_bytes / nodejs_heap_size_total_bytes > 0.85
        for: 5m
        labels:
          severity: warning
          service: nodejs
        annotations:
          summary: "High memory usage"
          description: "Memory usage is {{ $value | humanizePercentage }} for {{ $labels.instance }}"

      - alert: NodejsApplicationDown
        expr: up{job="nodejs-app"} == 0
        for: 1m
        labels:
          severity: critical
          service: nodejs
        annotations:
          summary: "Node.js application is down"
          description: "Node.js application on {{ $labels.instance }} is not responding"

      - alert: NodejsHighActiveConnections
        expr: nodejs_active_connections > 100
        for: 5m
        labels:
          severity: warning
          service: nodejs
        annotations:
          summary: "High number of active connections"
          description: "{{ $value }} active connections on {{ $labels.instance }}"

      - alert: NodejsEventLoopLag
        expr: nodejs_eventloop_lag_seconds > 0.1
        for: 3m
        labels:
          severity: warning
          service: nodejs
        annotations:
          summary: "Event loop lag detected"
          description: "Event loop lag is {{ $value }}s on {{ $labels.instance }}"

      - alert: NodejsGCDuration
        expr: rate(nodejs_gc_duration_seconds_sum[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
          service: nodejs
        annotations:
          summary: "High garbage collection time"
          description: "GC duration is {{ $value }}s/sec on {{ $labels.instance }}"
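
The response-time rule relies on histogram_quantile, which estimates a percentile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The sketch below follows that idea for plain counts. Prometheus applies it to per-second rates rather than raw counts, and the function name and sample numbers here are assumptions for illustration.

```javascript
// Simplified estimate of a quantile from cumulative histogram buckets.
// buckets: [{ le: upperBound, count: cumulativeCount }], sorted by le,
// with the last entry being the +Inf bucket. Assumes 0 < q <= 1.
function histogramQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  const idx = buckets.findIndex((b) => b.count >= rank);
  // If the rank falls into +Inf, the best available answer is the highest finite bound.
  if (idx === buckets.length - 1) return buckets[buckets.length - 2].le;
  const lower = idx === 0 ? 0 : buckets[idx - 1].le;
  const previousCount = idx === 0 ? 0 : buckets[idx - 1].count;
  const inBucket = buckets[idx].count - previousCount;
  return lower + (buckets[idx].le - lower) * ((rank - previousCount) / inBucket);
}

// Made-up distribution: 100 requests, 10 of them between 100 ms and 500 ms.
const exampleBuckets = [
  { le: 50, count: 50 },
  { le: 100, count: 90 },
  { le: 500, count: 100 },
  { le: Infinity, count: 100 },
];

console.log(histogramQuantile(0.95, exampleBuckets));
```

Because the estimate is interpolated, its accuracy near the 500 ms alert threshold depends on how close the bucket boundaries are to that threshold.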

Install and configure Alertmanager

Set up Alertmanager to handle notifications from Prometheus alerts.

# Ubuntu / Debian
sudo apt install -y prometheus-alertmanager

# AlmaLinux / Rocky Linux
sudo dnf install -y alertmanager

global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alerts@example.com'

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'nodejs-alerts'
  routes:
  - match:
      severity: critical
    receiver: 'critical-alerts'
    continue: true

# Alertmanager email receivers set the subject through headers and the body through text
receivers:
  - name: 'nodejs-alerts'
    email_configs:
      - to: 'devops@example.com'
        headers:
          Subject: 'Node.js Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Instance: {{ .Labels.instance }}
          Severity: {{ .Labels.severity }}
          {{ end }}
  - name: 'critical-alerts'
    email_configs:
      - to: 'oncall@example.com'
        headers:
          Subject: 'CRITICAL: {{ .GroupLabels.alertname }}'
        text: |
          CRITICAL ALERT TRIGGERED
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Instance: {{ .Labels.instance }}
          Time: {{ .StartsAt }}
          {{ end }}

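# Ubuntu / Debian (the packaged unit is named prometheus-alertmanager)
sudo systemctl enable --now prometheus-alertmanager
# AlmaLinux / Rocky Linux
sudo systemctl enable --now alertmanager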
sudo systemctl restart prometheus
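
The route tree in the Alertmanager configuration decides which receiver handles each alert. The sketch below is a simplified model of that matching, assuming exact label matches only. It is not Alertmanager code, and resolveReceivers is a name invented for this example. It shows that an alert matching a child route is handled by that child, and that the parent's receiver is used only when no child matches.

```javascript
// Simplified model of Alertmanager route resolution (exact "match" only).
function resolveReceivers(route, labels) {
  const matched = [];
  for (const child of route.routes || []) {
    const isMatch = Object.entries(child.match || {}).every(
      ([key, value]) => labels[key] === value
    );
    if (!isMatch) continue;
    // A child without its own receiver inherits the parent's.
    matched.push(...resolveReceivers({ receiver: route.receiver, ...child }, labels));
    if (!child.continue) break; // continue: true keeps checking later siblings
  }
  return matched.length > 0 ? matched : [route.receiver];
}

// Mirrors the route section of the configuration in this step.
const exampleRoute = {
  receiver: 'nodejs-alerts',
  routes: [
    { match: { severity: 'critical' }, receiver: 'critical-alerts', continue: true },
  ],
};

console.log(resolveReceivers(exampleRoute, { severity: 'critical', service: 'nodejs' }));
console.log(resolveReceivers(exampleRoute, { severity: 'warning', service: 'nodejs' }));
```

Under this model a critical alert reaches only critical-alerts. If critical alerts should also go to devops@example.com, one option is a second child route that matches them and points at nodejs-alerts.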

Create comprehensive Grafana dashboards

Register Prometheus as a Grafana data source, save the dashboard definition below as nodejs_dashboard.json, and import it through the Grafana HTTP API.

curl -X POST http://admin:admin@localhost:3000/api/datasources \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Prometheus",
    "type": "prometheus",
    "url": "http://localhost:9090",
    "access": "proxy",
    "isDefault": true
  }'
{
  "dashboard": {
    "id": null,
    "title": "Node.js Application Monitoring",
    "tags": ["nodejs", "prometheus", "monitoring"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "HTTP Request Rate",
        "type": "stat",
        "targets": [
          {
            "expr": "rate(http_requests_total[5m])",
            "legendFormat": "Requests/sec"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
      },
      {
        "id": 2,
        "title": "Response Time Percentiles",
        "type": "timeseries",
        "targets": [
          {
            "expr": "histogram_quantile(0.50, rate(http_request_duration_ms_bucket[5m]))",
            "legendFormat": "50th percentile"
          },
          {
            "expr": "histogram_quantile(0.95, rate(http_request_duration_ms_bucket[5m]))",
            "legendFormat": "95th percentile"
          },
          {
            "expr": "histogram_quantile(0.99, rate(http_request_duration_ms_bucket[5m]))",
            "legendFormat": "99th percentile"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
      },
      {
        "id": 3,
        "title": "Memory Usage",
        "type": "timeseries",
        "targets": [
          {
            "expr": "nodejs_heap_size_used_bytes",
            "legendFormat": "Heap Used"
          },
          {
            "expr": "nodejs_heap_size_total_bytes",
            "legendFormat": "Heap Total"
          },
          {
            "expr": "nodejs_external_memory_bytes",
            "legendFormat": "External"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 8}
      },
      {
        "id": 4,
        "title": "Error Rate by Endpoint",
        "type": "timeseries",
        "targets": [
          {
            "expr": "sum by (route) (rate(http_requests_total{status_code=~\"5..\"}[5m]))",
            "legendFormat": "{{route}}"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 8}
      },
      {
        "id": 5,
        "title": "Active Connections",
        "type": "stat",
        "targets": [
          {
            "expr": "nodejs_active_connections",
            "legendFormat": "Active Connections"
          }
        ],
        "gridPos": {"h": 4, "w": 6, "x": 0, "y": 16}
      },
      {
        "id": 6,
        "title": "Event Loop Lag",
        "type": "stat",
        "targets": [
          {
            "expr": "nodejs_eventloop_lag_seconds",
            "legendFormat": "Event Loop Lag (s)"
          }
        ],
        "gridPos": {"h": 4, "w": 6, "x": 6, "y": 16}
      },
      {
        "id": 7,
        "title": "GC Duration",
        "type": "timeseries",
        "targets": [
          {
            "expr": "sum by (kind) (rate(nodejs_gc_duration_seconds_sum[5m]))",
            "legendFormat": "{{kind}}"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 16}
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "10s"
  }
}
curl -X POST http://admin:admin@localhost:3000/api/dashboards/db \
  -H "Content-Type: application/json" \
  -d @nodejs_dashboard.json
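
Several panels wrap a counter in rate(...[5m]). Counters only ever increase (or reset to zero on restart), so the useful number is how fast they grow. The sketch below is a simplified version of that calculation over a window of samples. It handles counter resets but leaves out the extrapolation Prometheus performs at the window edges, and the name simpleRate and the sample values are assumptions for this example.

```javascript
// Simplified per-second rate of a counter over a window of samples.
// samples: [{ t: secondsSinceStart, v: counterValue }], ordered by time.
function simpleRate(samples) {
  if (samples.length < 2) return NaN;
  let increase = 0;
  for (let i = 1; i < samples.length; i++) {
    const delta = samples[i].v - samples[i - 1].v;
    // A drop means the process restarted and the counter began again from zero.
    increase += delta >= 0 ? delta : samples[i].v;
  }
  const elapsed = samples[samples.length - 1].t - samples[0].t;
  return increase / elapsed;
}

// Made-up http_requests_total samples, 10 s apart, with a restart before t=20.
const exampleSamples = [
  { t: 0, v: 100 },
  { t: 10, v: 150 },
  { t: 20, v: 10 },
  { t: 30, v: 40 },
];

console.log(simpleRate(exampleSamples)); // 3 requests per second
```

This is also why application restarts do not produce negative spikes in the request-rate panel.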

Set up advanced custom metrics

Add business-specific metrics and advanced monitoring patterns to your Node.js application.

const client = require('prom-client');

// Business metrics
const userRegistrations = new client.Counter({
  name: 'user_registrations_total',
  help: 'Total number of user registrations',
  labelNames: ['source', 'country']
});

const orderValue = new client.Histogram({
  name: 'order_value_dollars',
  help: 'Order value in dollars',
  labelNames: ['category', 'payment_method'],
  buckets: [10, 25, 50, 100, 250, 500, 1000, 2500]
});

const activeUsers = new client.Gauge({
  name: 'active_users_current',
  help: 'Current number of active users',
  labelNames: ['session_type']
});

const databaseConnections = new client.Gauge({
  name: 'database_connections_active',
  help: 'Active database connections',
  labelNames: ['database', 'type']
});

const cacheHitRate = new client.Gauge({
  name: 'cache_hit_rate',
  help: 'Cache hit ratio between 0 and 1',
  labelNames: ['cache_type']
});

// Queue metrics
const queueSize = new client.Gauge({
  name: 'queue_size',
  help: 'Current queue size',
  labelNames: ['queue_name']
});

const jobProcessingTime = new client.Histogram({
  name: 'job_processing_duration_seconds',
  help: 'Job processing duration',
  labelNames: ['job_type', 'status'],
  buckets: [0.1, 0.5, 1, 2, 5, 10, 30]
});

// Response size histogram, registered once at module load.
// Creating it inside the request handler would throw on the second request,
// because prom-client rejects duplicate metric names.
const responseSizeHistogram = new client.Histogram({
  name: 'http_response_size_bytes',
  help: 'Size of HTTP responses',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [100, 1000, 5000, 10000, 50000, 100000]
});

// Custom middleware for detailed request tracking
const requestTracker = (req, res, next) => {
  const originalSend = res.send;

  res.send = function (data) {
    // Express re-enters send() with a string after serializing objects,
    // so only strings and Buffers are measured to avoid double counting.
    if (typeof data === 'string' || Buffer.isBuffer(data)) {
      responseSizeHistogram.observe(
        {
          method: req.method,
          route: req.route ? req.route.path : 'unknown',
          status_code: res.statusCode
        },
        Buffer.byteLength(data)
      );
    }

    return originalSend.call(this, data);
  };

  next();
};

// Function to simulate business metrics
const updateBusinessMetrics = () => {
  // Simulate user activity
  activeUsers.set({ session_type: 'web' }, Math.floor(Math.random() * 100) + 50);
  activeUsers.set({ session_type: 'mobile' }, Math.floor(Math.random() * 50) + 20);
  
  // Simulate cache performance
  cacheHitRate.set({ cache_type: 'redis' }, Math.random() * 0.3 + 0.7); // 70-100%
  cacheHitRate.set({ cache_type: 'memory' }, Math.random() * 0.2 + 0.8); // 80-100%
  
  // Simulate database connections
  databaseConnections.set({ database: 'primary', type: 'read' }, Math.floor(Math.random() * 10) + 5);
  databaseConnections.set({ database: 'primary', type: 'write' }, Math.floor(Math.random() * 5) + 2);
  
  // Simulate queue sizes
  queueSize.set({ queue_name: 'email' }, Math.floor(Math.random() * 20));
  queueSize.set({ queue_name: 'analytics' }, Math.floor(Math.random() * 50));
};

// Update business metrics every 30 seconds
setInterval(updateBusinessMetrics, 30000);

module.exports = {
  userRegistrations,
  orderValue,
  activeUsers,
  databaseConnections,
  cacheHitRate,
  queueSize,
  jobProcessingTime,
  requestTracker
};
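
The module defines jobProcessingTime but does not show how a duration reaches it. One possible pattern, sketched below with a hypothetical timeJob helper, measures a synchronous job and passes the labels and the elapsed seconds to a callback. The callback keeps the sketch free of dependencies; with prom-client it could be (labels, seconds) => jobProcessingTime.observe(labels, seconds).

```javascript
// Hypothetical helper: time a synchronous job and report the outcome.
// record(labels, seconds) receives the same label names as jobProcessingTime.
function timeJob(jobType, job, record) {
  const start = process.hrtime.bigint();
  let status = 'success';
  try {
    return job();
  } catch (err) {
    status = 'failed';
    throw err; // the caller still sees the original error
  } finally {
    const seconds = Number(process.hrtime.bigint() - start) / 1e9;
    record({ job_type: jobType, status }, seconds);
  }
}

// Stand-in for a histogram: collect observations in an array.
const recordedJobs = [];
timeJob('email', () => 'sent', (labels, seconds) => recordedJobs.push({ labels, seconds }));
console.log(recordedJobs);
```

An asynchronous variant would await the job inside the try block. Recording in finally means failed jobs are timed as well, which keeps the status label meaningful.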

Configure Grafana alerting

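Set up Grafana notification channels for Slack and email. These calls use the legacy /api/alert-notifications endpoint, which is only available when legacy alerting is enabled; with Grafana's unified alerting, create equivalent contact points instead.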

# Create notification channel for Slack
curl -X POST http://admin:admin@localhost:3000/api/alert-notifications \
  -H "Content-Type: application/json" \
  -d '{
    "name": "slack-nodejs-alerts",
    "type": "slack",
    "settings": {
      "url": "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK",
      "channel": "#nodejs-alerts",
      "username": "Grafana",
      "title": "Node.js Alert",
      "text": "Alert: {{ range .Alerts }}{{ .Annotations.summary }}{{ end }}"
    }
  }'

# Create email notification channel
curl -X POST http://admin:admin@localhost:3000/api/alert-notifications \
  -H "Content-Type: application/json" \
  -d '{
    "name": "email-nodejs-alerts",
    "type": "email",
    "settings": {
      "addresses": "devops@example.com;oncall@example.com",
      "subject": "Node.js Application Alert"
    }
  }'

Verify your setup

Test your Node.js monitoring stack to ensure all components are working correctly.

# Check application is running and exposing metrics
curl http://localhost:3000/health
curl http://localhost:3000/metrics | head -20

# Verify Prometheus is scraping metrics (quote the query so the shell leaves the braces alone)
curl -G http://localhost:9090/api/v1/query --data-urlencode 'query=up{job="nodejs-app"}'

# Check Grafana dashboard access
curl -I http://localhost:3000

# Test alerting by triggering an error
curl http://localhost:3000/api/error

# Check service status
sudo systemctl status nodejs-monitoring
sudo systemctl status prometheus
sudo systemctl status grafana-server
sudo systemctl status alertmanager

# View application logs
sudo journalctl -u nodejs-monitoring -f
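Note: Grafana and the sample app both default to port 3000, so they cannot run on the same host unchanged. Either set http_port in the [server] section of /etc/grafana/grafana.ini or start the app with a different PORT, then adjust the URLs and the nodejs-app scrape target to match. The URLs used in this tutorial are Grafana at http://localhost:3000 (admin/admin), Prometheus at http://localhost:9090, and the Node.js app metrics at http://localhost:3000/metrics.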

Load testing your monitoring

Generate traffic to test your monitoring setup and verify metrics collection works under load.

# Install Apache Bench for load testing
sudo apt install -y apache2-utils

# Generate load on different endpoints
ab -n 1000 -c 10 http://localhost:3000/
ab -n 500 -c 5 http://localhost:3000/api/users
ab -n 200 -c 3 http://localhost:3000/api/orders

# Test error endpoint to trigger alerts
for i in {1..10}; do curl http://localhost:3000/api/error; done

Common issues

Symptom | Cause | Fix
Metrics endpoint returns 404 | Route not configured properly | Check the /metrics route in app.js and restart the service
Prometheus shows target down | Application not running on the expected port | Check sudo systemctl status nodejs-monitoring and the port in the scrape target
Grafana shows "No data" in panels | Prometheus data source not configured | Add a Prometheus data source pointing at http://localhost:9090
High memory usage from metrics | Too many label values (high cardinality) | Limit label values, for example by not using raw URL paths as the route label
Alerts not firing | Rule file not loaded by Prometheus | Run promtool check rules /etc/prometheus/nodejs_alerts.yml and restart Prometheus
Permission denied writing logs | Service user lacks write access | sudo chown nodejs:nodejs /home/nodejs/nodejs-monitoring/logs
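
High cardinality usually comes from label values that are unbounded, such as request paths that contain IDs. One way to keep the route label small is to collapse variable path segments before using the path as a label value. The sketch below is one possible approach; normalizeRoute and its two patterns (numeric IDs and UUIDs) are assumptions that should be adapted to your own URL scheme.

```javascript
// Hypothetical helper: collapse variable path segments so that
// /api/users/1 and /api/users/2 share a single label value.
function normalizeRoute(path) {
  const uuidPattern = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
  return path
    .split('?')[0] // drop the query string
    .split('/')
    .map((segment) => {
      if (/^\d+$/.test(segment)) return ':id';
      if (uuidPattern.test(segment)) return ':uuid';
      return segment;
    })
    .join('/');
}

console.log(normalizeRoute('/api/users/123?page=2')); // /api/users/:id
console.log(normalizeRoute('/api/orders/9f1c2d3e-4b5a-4c6d-8e7f-0a1b2c3d4e5f'));
```

In an Express app, a value like this could replace req.path wherever no matched route is available.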
