Configure NTP monitoring with Grafana dashboards and Prometheus alerting

Intermediate 25 min Jun 06, 2026 112 views
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up comprehensive time synchronization monitoring using chrony, Prometheus node exporter, and custom Grafana dashboards with alerting for time drift and NTP service failures.

Prerequisites

  • Prometheus server installed
  • Grafana server installed
  • Sudo access
  • Network connectivity to NTP servers

What this solves

Accurate time synchronization is critical for distributed systems, logging, security protocols, and compliance requirements. This tutorial sets up monitoring for your NTP service using chrony, collects time drift metrics with Prometheus, and creates Grafana dashboards with alerting for time synchronization issues.

Step-by-step configuration

Install and configure chrony NTP service

Start by installing chrony, a modern NTP implementation that provides better accuracy and faster synchronization than traditional ntpd.

# Ubuntu 24.04 / Debian 12
sudo apt update
sudo apt install -y chrony

# AlmaLinux 9 / Rocky Linux 9
sudo dnf install -y chrony
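
If the same script has to run on all four distributions, the package manager can be chosen from /etc/os-release. The following is a minimal sketch, not part of the original tutorial: the function name chrony_install_cmd is hypothetical, and it only prints the command rather than running it.

```shell
# Print the chrony install command for the distribution described by an
# os-release file (defaults to /etc/os-release). Pass an absolute path.
# The body runs in a subshell, so sourcing the file cannot leak variables.
chrony_install_cmd() (
  os_release="${1:-/etc/os-release}"
  [ -r "$os_release" ] || { echo "cannot read $os_release" >&2; exit 2; }
  . "$os_release"
  case " ${ID:-} ${ID_LIKE:-} " in
    *" debian "*|*" ubuntu "*) echo "sudo apt install -y chrony" ;;
    *" rhel "*|*" fedora "*)   echo "sudo dnf install -y chrony" ;;
    *) echo "unsupported distribution: ${ID:-unknown}" >&2; exit 1 ;;
  esac
)

# Usage: review the printed command, then run it
# chrony_install_cmd
```

AlmaLinux and Rocky Linux are matched through their ID_LIKE field, which lists rhel and fedora.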

Configure chrony with monitoring-friendly settings

Edit the chrony configuration file (/etc/chrony/chrony.conf on Ubuntu and Debian, /etc/chrony.conf on AlmaLinux and Rocky Linux) to use reliable NTP servers and enable statistics logging for monitoring.

# Use public NTP servers from the pool.ntp.org project
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
server 2.pool.ntp.org iburst
server 3.pool.ntp.org iburst

# Record the rate at which the system clock gains/loses time
driftfile /var/lib/chrony/drift

# Allow the system clock to be stepped in the first three updates
makestep 1.0 3

# Enable kernel synchronization of the real-time clock (RTC)
rtcsync

# Enable hardware timestamping on all interfaces that support it
hwtimestamp *

# Increase the minimum number of selectable sources required to adjust the system clock
minsources 2

# Allow NTP client access from local network
allow 192.168.0.0/16
allow 10.0.0.0/8
allow 172.16.0.0/12

# Serve time even if not synchronized to a time source
local stratum 10

# Enable statistics logging for monitoring
log statistics measurements tracking tempcomp
logdir /var/log/chrony
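
With minsources 2, chronyd will not adjust the clock unless at least two sources are selectable, so the file needs to list at least that many. A rough sanity check could look like the sketch below. The function name check_minsources is hypothetical, and the count is only a lower bound because a single pool line can resolve to several sources.

```shell
# Count server/pool lines in a chrony config and compare with minsources.
# Prints a summary and returns non-zero when fewer sources are listed
# than minsources requires. Commented-out lines are ignored.
check_minsources() {
  awk '
    $1 == "server" || $1 == "pool" { sources++ }
    $1 == "minsources"             { required = $2 }
    END {
      if (required == "") required = 1   # chrony default when the directive is absent
      printf "sources=%d minsources=%d\n", sources, required
      exit (sources >= required ? 0 : 1)
    }
  ' "$1"
}

# Usage (path depends on the distribution):
# check_minsources /etc/chrony/chrony.conf
```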

Create chrony log directory and set permissions

Create the log directory for chrony statistics and set appropriate permissions.

sudo mkdir -p /var/log/chrony
# On Ubuntu/Debian the chrony account is named _chrony, so use _chrony:_chrony there
sudo chown chrony:chrony /var/log/chrony
sudo chmod 755 /var/log/chrony

Enable and start chrony service

Start the chrony service and enable it to start automatically on boot.

# The unit is named chrony on Ubuntu/Debian and chronyd on AlmaLinux/Rocky Linux
sudo systemctl enable chrony
sudo systemctl start chrony
sudo systemctl status chrony

Install Prometheus node exporter

Install the Prometheus node exporter, which collects system metrics including time synchronization data.

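# Ubuntu 24.04 / Debian 12
sudo apt install -y prometheus-node-exporter

# AlmaLinux 9 / Rocky Linux 9: nodejs-exporter is not the node exporter package,
# and the base repositories do not ship one. Install node_exporter from an
# additional repository or from the upstream release archive.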

Download and install NTP exporter

Install a dedicated NTP exporter to collect detailed chrony metrics for Prometheus.

cd /tmp
wget https://github.com/sapcc/ntp_exporter/releases/download/v1.1.0/ntp_exporter-1.1.0.linux-amd64.tar.gz
tar -xzf ntp_exporter-1.1.0.linux-amd64.tar.gz
sudo mv ntp_exporter-1.1.0.linux-amd64/ntp_exporter /usr/local/bin/
sudo chmod +x /usr/local/bin/ntp_exporter
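
Since the binary ends up in /usr/local/bin and runs as a service, it is worth comparing the download against a known SHA-256 digest before moving it into place. This is a sketch under assumptions: verify_sha256 is a hypothetical helper, and the expected digest has to come from a source you trust (whether the project publishes checksums for this release is not confirmed here).

```shell
# Succeeds only when the file's SHA-256 digest equals the expected hex string.
verify_sha256() (
  file="$1"
  expected="$2"
  actual=$(sha256sum "$file" 2>/dev/null | awk '{print $1}')
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
)

# Usage (EXPECTED_DIGEST is a placeholder you must supply):
# verify_sha256 /tmp/ntp_exporter-1.1.0.linux-amd64.tar.gz "$EXPECTED_DIGEST" \
#   || echo "checksum mismatch, do not install" >&2
```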

Create NTP exporter systemd service

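Create a systemd unit file at /etc/systemd/system/ntp-exporter.service to run the NTP exporter as a service. The flags on the ExecStart line depend on the exporter release: the upstream sapcc/ntp_exporter README documents an -ntp.server flag for choosing the server to measure against, so run /usr/local/bin/ntp_exporter -h and adjust the command line if -chrony.address is rejected.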

[Unit]
Description=NTP Exporter for Prometheus
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/usr/local/bin/ntp_exporter -chrony.address unix:///var/run/chrony/chronyd.sock
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Create prometheus user and configure permissions

Create a dedicated user for the NTP exporter and configure access to chrony socket.

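# Skip useradd if the Prometheus installation already created this account
id -u prometheus >/dev/null 2>&1 || sudo useradd --no-create-home --shell /bin/false prometheus
# The group is _chrony on Ubuntu/Debian and chrony on AlmaLinux/Rocky Linux
sudo usermod -a -G chrony prometheus
# /var/run is recreated at boot and chronyd recreates its socket on restart,
# so these two mode changes may need to be reapplied afterwards
sudo chmod 755 /var/run/chrony
sudo chmod 666 /var/run/chrony/chronyd.sock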

Start NTP exporter service

Enable and start the NTP exporter service.

sudo systemctl daemon-reload
sudo systemctl enable ntp-exporter
sudo systemctl start ntp-exporter
sudo systemctl status ntp-exporter
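
Once the service is running, its /metrics endpoint can be inspected directly, which separates exporter problems from Prometheus scrape problems. The helper below is a minimal sketch, not part of the exporter: metric_values is a hypothetical name, and it assumes samples carry no trailing timestamp (it prints the last field of each matching line).

```shell
# Read Prometheus text-format metrics on stdin and print the value of every
# sample whose metric name equals $1, whatever its labels are.
metric_values() {
  awk -v name="$1" '
    /^#/ { next }                                     # skip HELP/TYPE comments
    { metric = $1; sub(/[{].*/, "", metric) }         # strip the label set
    metric == name { print $NF }
  '
}

# Usage:
# curl -s http://localhost:9559/metrics | metric_values ntp_drift_seconds
```

An empty result means the exporter is reachable but does not expose that metric name, which is worth knowing before writing alert rules against it.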

Configure Prometheus to scrape NTP metrics

Add the NTP exporter to your Prometheus configuration (typically /etc/prometheus/prometheus.yml). This assumes you have Prometheus already installed following our Prometheus and Grafana setup guide.

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "/etc/prometheus/rules/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']
    scrape_interval: 5s

  - job_name: 'ntp-exporter'
    static_configs:
      - targets: ['localhost:9559']
    scrape_interval: 30s
    metrics_path: /metrics

Create NTP alerting rules

Create Prometheus alerting rules to detect time synchronization issues.

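sudo mkdir -p /etc/prometheus/rules

Save the following rules in a .yml file inside /etc/prometheus/rules/ (for example ntp_alerts.yml) so that the rule_files pattern in the Prometheus configuration matches it.

groups: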
  - name: ntp_alerts
    rules:
    - alert: NTPDrift
      expr: abs(ntp_drift_seconds) > 0.5
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "NTP time drift detected on {{ $labels.instance }}"
        description: "System clock drift is {{ $value }}s on {{ $labels.instance }}, which exceeds the 0.5s threshold."

    - alert: NTPHighDrift
      expr: abs(ntp_drift_seconds) > 2.0
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "High NTP time drift on {{ $labels.instance }}"
        description: "System clock drift is {{ $value }}s on {{ $labels.instance }}, which exceeds the critical 2.0s threshold."

    - alert: NTPNotSynchronized
      expr: ntp_stratum >= 16
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "NTP not synchronized on {{ $labels.instance }}"
        description: "NTP stratum is {{ $value }} on {{ $labels.instance }}, indicating no time synchronization."

    - alert: NTPServiceDown
      expr: up{job="ntp-exporter"} == 0
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "NTP exporter is down on {{ $labels.instance }}"
        description: "NTP exporter has been down for more than 2 minutes on {{ $labels.instance }}."

    - alert: ChronydDown
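      # Needs node exporter started with --collector.systemd; the unit is chrony.service on Ubuntu/Debian
      expr: node_systemd_unit_state{name=~"chronyd?[.]service",state="active"} != 1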
      for: 3m
      labels:
        severity: critical
      annotations:
        summary: "Chronyd service is not running on {{ $labels.instance }}"
        description: "Chronyd service is not in active state on {{ $labels.instance }}."

    - alert: NTPSourcesLow
      expr: ntp_source_count < 2
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Low number of NTP sources on {{ $labels.instance }}"
        description: "Only {{ $value }} NTP sources available on {{ $labels.instance }}, recommend at least 2 sources."

    - alert: NTPRootDelay
      expr: ntp_root_delay_seconds > 0.1
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "High NTP root delay on {{ $labels.instance }}"
        description: "NTP root delay is {{ $value }}s on {{ $labels.instance }}, indicating potential network issues."
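
The two drift rules form a ladder: an absolute drift above 0.5 s is a warning and above 2.0 s is critical. The sketch below mirrors only those thresholds so a reading from chronyc or the exporter can be classified by hand. The function name drift_severity is hypothetical, and the for: durations of the rules are deliberately ignored.

```shell
# Classify a drift value in seconds using the same thresholds as the
# NTPDrift (0.5 s) and NTPHighDrift (2.0 s) rules. The sign is ignored.
drift_severity() {
  awk -v d="$1" 'BEGIN {
    if (d < 0) d = -d
    if (d > 2.0)      print "critical"
    else if (d > 0.5) print "warning"
    else              print "ok"
  }'
}

# Usage:
# drift_severity -0.73
```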

Restart Prometheus to load new configuration

Restart Prometheus to apply the new scrape configuration and alerting rules.
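
Prometheus refuses to start with an invalid configuration, so it can be worth validating the files before the restart. The sketch below wraps promtool, which ships with Prometheus. The function name check_prometheus_files is hypothetical, and the default paths are assumptions based on the paths used in this tutorial.

```shell
# Validate the main config and every rule file before restarting Prometheus.
# Exit codes: 0 = all checks passed, 1 = a check failed, 2 = promtool missing.
check_prometheus_files() (
  config="${1:-/etc/prometheus/prometheus.yml}"
  rules_dir="${2:-/etc/prometheus/rules}"
  command -v promtool >/dev/null 2>&1 || { echo "promtool not found in PATH" >&2; exit 2; }
  promtool check config "$config" || exit 1
  for f in "$rules_dir"/*.yml; do
    [ -e "$f" ] || continue            # the glob matched nothing
    promtool check rules "$f" || exit 1
  done
)

# Usage:
# check_prometheus_files && sudo systemctl restart prometheus
```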

sudo systemctl restart prometheus
sudo systemctl status prometheus

Configure Grafana data source

If you haven't already configured Prometheus as a data source in Grafana, add it now. Access your Grafana instance and add Prometheus as a data source pointing to http://localhost:9090.

Create NTP monitoring dashboard

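Create a Grafana dashboard for NTP monitoring from the JSON below. The outer "dashboard" key is the envelope used by Grafana's HTTP API (POST /api/dashboards/db); if you import through the web UI instead, paste only the object inside it.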

{
  "dashboard": {
    "id": null,
    "title": "NTP Time Synchronization Monitoring",
    "tags": ["ntp", "time", "chrony"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Time Drift",
        "type": "stat",
        "targets": [
          {
            "expr": "ntp_drift_seconds",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "s",
            "thresholds": {
              "steps": [
                {"color": "green", "value": null},
                {"color": "yellow", "value": 0.1},
                {"color": "red", "value": 0.5}
              ]
            }
          }
        },
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
      },
      {
        "id": 2,
        "title": "NTP Stratum",
        "type": "stat",
        "targets": [
          {
            "expr": "ntp_stratum",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "thresholds": {
              "steps": [
                {"color": "green", "value": null},
                {"color": "yellow", "value": 8},
                {"color": "red", "value": 15}
              ]
            }
          }
        },
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
      },
      {
        "id": 3,
        "title": "Time Drift Over Time",
        "type": "timeseries",
        "targets": [
          {
            "expr": "ntp_drift_seconds",
            "refId": "A",
            "legendFormat": "Time Drift"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "s"
          }
        },
        "gridPos": {"h": 8, "w": 24, "x": 0, "y": 8}
      },
      {
        "id": 4,
        "title": "NTP Sources",
        "type": "stat",
        "targets": [
          {
            "expr": "ntp_source_count",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "thresholds": {
              "steps": [
                {"color": "red", "value": null},
                {"color": "yellow", "value": 1},
                {"color": "green", "value": 2}
              ]
            }
          }
        },
        "gridPos": {"h": 8, "w": 8, "x": 0, "y": 16}
      },
      {
        "id": 5,
        "title": "Root Delay",
        "type": "stat",
        "targets": [
          {
            "expr": "ntp_root_delay_seconds",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "s",
            "thresholds": {
              "steps": [
                {"color": "green", "value": null},
                {"color": "yellow", "value": 0.05},
                {"color": "red", "value": 0.1}
              ]
            }
          }
        },
        "gridPos": {"h": 8, "w": 8, "x": 8, "y": 16}
      },
      {
        "id": 6,
        "title": "Chronyd Service Status",
        "type": "stat",
        "targets": [
          {
            "expr": "node_systemd_unit_state{name=~\"chronyd?[.]service\",state=\"active\"}",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "mappings": [
              {"options": {"0": {"text": "DOWN"}, "1": {"text": "UP"}}, "type": "value"}
            ],
            "thresholds": {
              "steps": [
                {"color": "red", "value": null},
                {"color": "green", "value": 1}
              ]
            }
          }
        },
        "gridPos": {"h": 8, "w": 8, "x": 16, "y": 16}
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "30s"
  }
}

Configure Alertmanager for NTP alerts

Configure Alertmanager (typically in /etc/alertmanager/alertmanager.yml) to handle NTP alerts. This example shows email notifications, but you can adapt it for Slack or other channels following our Alertmanager webhook guide.

global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'your-email-password'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'ntp-alerts'

receivers:
  - name: 'ntp-alerts'
    email_configs:
      - to: 'admin@example.com'
        headers:
          Subject: 'NTP Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Instance: {{ .Labels.instance }}
          Severity: {{ .Labels.severity }}
          {{ end }}

Restart Alertmanager

Restart Alertmanager to apply the new configuration.

sudo systemctl restart alertmanager
sudo systemctl status alertmanager
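
To confirm the route and receiver work end to end without waiting for real drift, a synthetic alert can be posted to Alertmanager's v2 API at /api/v2/alerts. This is a minimal sketch: ntp_test_alert_payload is a hypothetical helper, the label values are illustrative, and the instance argument is inserted without JSON escaping, so keep it to plain host:port strings.

```shell
# Print a one-element JSON array describing a test alert that carries the
# same alertname and severity labels as the NTPDrift rule.
ntp_test_alert_payload() (
  instance="${1:-localhost:9559}"
  printf '[{"labels":{"alertname":"NTPDrift","severity":"warning","instance":"%s"},"annotations":{"summary":"Test alert for NTP routing","description":"Synthetic alert, safe to ignore"}}]\n' "$instance"
)

# Usage: post the payload to the Alertmanager instance on port 9093
# ntp_test_alert_payload | curl -s -X POST -H 'Content-Type: application/json' \
#   --data-binary @- http://localhost:9093/api/v2/alerts
```

Because the payload sets no end time, Alertmanager treats the alert as resolved once its configured resolve timeout passes without the alert being sent again.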

Verify your setup

Check that all components are working correctly:

# Verify chrony is synchronizing
chronyc sources -v
chronyc tracking

# Check NTP exporter metrics
curl -s http://localhost:9559/metrics | grep ntp_

# Verify Prometheus is scraping NTP metrics
curl -s 'http://localhost:9090/api/v1/query?query=ntp_drift_seconds'

# Check service statuses
sudo systemctl status chrony
sudo systemctl status ntp-exporter
sudo systemctl status prometheus
sudo systemctl status alertmanager

Note: It may take a few minutes for chrony to fully synchronize after initial startup. Run chronyc tracking to see the current synchronization status.
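
For scripted checks, the human-readable output of chronyc tracking can be reduced to the three fields that matter most here. The sketch below is one way to do it. The function name tracking_summary and the sign convention (negative means the system clock is slow of NTP time) are choices made for this example, not chrony conventions.

```shell
# Read `chronyc tracking` output on stdin and print offset, stratum and leap
# status on one line. Returns non-zero when the leap status is missing or
# reads "Not synchronised".
tracking_summary() {
  awk -F' *: ' '
    $1 == "System time" {
      split($2, f, " ")                                # value, "seconds", slow|fast
      offset = (f[3] == "slow" ? "-" f[1] : f[1])
    }
    $1 == "Stratum"     { stratum = $2 }
    $1 == "Leap status" { leap = $2 }
    END {
      printf "offset=%s stratum=%s leap=%s\n", offset, stratum, leap
      exit ((leap == "" || leap == "Not synchronised") ? 1 : 0)
    }
  '
}

# Usage:
# chronyc tracking | tracking_summary || echo "clock is not synchronized" >&2
```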

Common issues

Symptom | Cause | Fix
NTP exporter fails to start | Cannot access chrony socket | Run sudo usermod -a -G chrony prometheus and restart the service
No metrics in Prometheus | Incorrect scrape configuration | Verify targets in the Prometheus UI and check that the exporter is running on port 9559
High time drift alerts | Network issues or bad NTP sources | Check chronyc sources and consider changing NTP pool servers
Chronyd not synchronizing | Firewall blocking NTP traffic | Allow UDP port 123: sudo ufw allow 123/udp
Dashboard shows no data | Grafana data source misconfigured | Verify the Prometheus data source URL and test the connection
Alertmanager not sending emails | SMTP configuration issues | Test SMTP settings and check Alertmanager logs: journalctl -u alertmanager
