Configure NTP Monitoring with Grafana & Prometheus

Set up comprehensive time synchronization monitoring using chrony, Prometheus node exporter, and custom Grafana dashboards with alerting for time drift and NTP service failures.

Prerequisites

Prometheus server installed
Grafana server installed
Sudo access
Network connectivity to NTP servers

What this solves

Accurate time synchronization is critical for distributed systems, logging, security protocols, and compliance requirements. This tutorial sets up monitoring for your NTP service using chrony, collects time drift metrics with Prometheus, and creates Grafana dashboards with alerting for time synchronization issues.

Step-by-step configuration

Install and configure chrony NTP service

Start by installing chrony, a modern NTP implementation that provides better accuracy and faster synchronization than traditional ntpd.

# Debian/Ubuntu
sudo apt update
sudo apt install -y chrony

# RHEL-based systems and Fedora
sudo dnf install -y chrony

Configure chrony with monitoring-friendly settings

Edit the chrony configuration file (/etc/chrony/chrony.conf on Debian/Ubuntu, /etc/chrony.conf on RHEL-based systems) to use reliable NTP servers and enable statistics logging for monitoring.

# Use public NTP servers from the pool.ntp.org project
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
server 2.pool.ntp.org iburst
server 3.pool.ntp.org iburst

# Record the rate at which the system clock gains/loses time
driftfile /var/lib/chrony/drift

# Allow the system clock to be stepped in the first three updates if its offset is larger than 1 second
makestep 1.0 3

# Enable kernel synchronization of the real-time clock (RTC)
rtcsync

# Enable hardware timestamping on all interfaces that support it
hwtimestamp *

# Increase the minimum number of selectable sources required to adjust the system clock
minsources 2

# Allow NTP client access from local network
allow 192.168.0.0/16
allow 10.0.0.0/8
allow 172.16.0.0/12

# Serve time even if not synchronized to a time source
local stratum 10

# Enable statistics logging for monitoring
log statistics measurements tracking tempcomp
logdir /var/log/chrony
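
Because minsources 2 keeps chrony from adjusting the clock until at least two sources are selectable, it is worth checking that the file configures enough of them. The following is a minimal sketch, not part of chrony itself: check_minsources is a hypothetical helper name, it only counts configured sources (not whether they are reachable), and it assumes each pool line contributes up to four sources (chrony's default maxsources) and that minsources defaults to 1 when absent.

```shell
# Usage: check_minsources /path/to/chrony.conf
# Prints the number of configured sources and the minsources value;
# exits non-zero when fewer sources are configured than minsources requires.
check_minsources() {
  awk '
    $1 == "server" || $1 == "peer" { sources++ }
    $1 == "pool"                   { sources += 4 }  # assumed default maxsources for a pool
    $1 == "minsources"             { min = $2 }
    END {
      if (min == "") min = 1                         # chrony default when the directive is unset
      printf "sources=%d minsources=%d\n", sources, min
      exit (sources >= min ? 0 : 1)
    }' "$1"
}
```

With the configuration above, the helper should report four sources against a minimum of two. Commented-out directives are ignored because their first field is not a directive name.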

Create chrony log directory and set permissions

Create the log directory for chrony statistics and set appropriate permissions.

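sudo mkdir -p /var/log/chrony
# The daemon account is _chrony on Debian/Ubuntu and chrony on RHEL-based systems
CHRONY_USER=$(id -un _chrony 2>/dev/null || echo chrony)
sudo chown "$CHRONY_USER:$CHRONY_USER" /var/log/chrony
sudo chmod 755 /var/log/chrony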

Enable and start chrony service

Start the chrony service and enable it to start automatically on boot.

# The unit is named chrony on Debian/Ubuntu; on RHEL-based systems use chronyd instead
sudo systemctl enable chrony
sudo systemctl start chrony
sudo systemctl status chrony

Install Prometheus node exporter

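Install the Prometheus node exporter, which collects system metrics including time synchronization data. Its systemd collector is disabled by default; the node_systemd_unit_state metric used later for the chronyd service check is only exported when node exporter runs with --collector.systemd.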

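# Debian/Ubuntu
sudo apt install -y prometheus-node-exporter

# Fedora, or RHEL-based systems with EPEL enabled (confirm the package name with: dnf search node-exporter)
sudo dnf install -y golang-github-prometheus-node-exporter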

Download and install NTP exporter

Install a dedicated NTP exporter to collect detailed chrony metrics for Prometheus.

cd /tmp
wget https://github.com/sapcc/ntp_exporter/releases/download/v1.1.0/ntp_exporter-1.1.0.linux-amd64.tar.gz
tar -xzf ntp_exporter-1.1.0.linux-amd64.tar.gz
sudo mv ntp_exporter-1.1.0.linux-amd64/ntp_exporter /usr/local/bin/
sudo chmod +x /usr/local/bin/ntp_exporter

Create NTP exporter systemd service

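Create a systemd unit file at /etc/systemd/system/ntp-exporter.service to run the NTP exporter as a service. Exporter flags vary between releases, so confirm the ExecStart options below against the exporter's built-in help for the version you installed.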

[Unit]
Description=NTP Exporter for Prometheus
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/usr/local/bin/ntp_exporter -chrony.address unix:///var/run/chrony/chronyd.sock
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Create prometheus user and configure permissions

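Create a dedicated user for the NTP exporter and give it access to the chrony socket.

# Create the account only if the Prometheus installation has not already created it
id prometheus >/dev/null 2>&1 || sudo useradd --system --no-create-home --shell /bin/false prometheus
# The chrony group is _chrony on Debian/Ubuntu and chrony on RHEL-based systems
sudo usermod -a -G "$(id -gn _chrony 2>/dev/null || echo chrony)" prometheus
# /var/run is a tmpfs, so these mode changes are lost on reboot; 666 makes the socket world-writable
sudo chmod 755 /var/run/chrony
sudo chmod 666 /var/run/chrony/chronyd.sock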

Start NTP exporter service

Enable and start the NTP exporter service.

sudo systemctl daemon-reload
sudo systemctl enable ntp-exporter
sudo systemctl start ntp-exporter
sudo systemctl status ntp-exporter

Configure Prometheus to scrape NTP metrics

Add the NTP exporter to your Prometheus configuration (typically /etc/prometheus/prometheus.yml). This assumes you have Prometheus already installed following our Prometheus and Grafana setup guide.

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "/etc/prometheus/rules/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']
    scrape_interval: 5s

  - job_name: 'ntp-exporter'
    static_configs:
      - targets: ['localhost:9559']
    scrape_interval: 30s
    metrics_path: /metrics
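
To confirm that the endpoint behind the ntp-exporter job actually exposes the series the alert rules depend on, you can pull a single metric out of the exporter's text output. The sketch below is one way to do that with awk; metric_values is a hypothetical helper name, and it assumes samples carry no trailing timestamp, which is typical for exporter output.

```shell
# Usage: curl -s http://localhost:9559/metrics | metric_values ntp_drift_seconds
# Reads Prometheus text-format metrics on stdin and prints the value of
# every sample whose metric name equals $1.
metric_values() {
  awk -v name="$1" '
    /^#/ { next }                 # skip HELP and TYPE comment lines
    NF >= 2 {
      metric = $1
      sub(/[{].*/, "", metric)    # drop the label set, if any
      if (metric == name) print $NF
    }'
}
```

An empty result means the exporter does not expose that metric name, in which case any alert rule or dashboard panel built on it will stay silent.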

Create NTP alerting rules

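Create Prometheus alerting rules to detect time synchronization issues. Save them in a .yml file under /etc/prometheus/rules/ (for example ntp_alerts.yml, a name chosen here for illustration) so that the rule_files glob in the Prometheus configuration picks them up.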

sudo mkdir -p /etc/prometheus/rules

groups:
  - name: ntp_alerts
    rules:
    - alert: NTPDrift
      expr: abs(ntp_drift_seconds) > 0.5
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "NTP time drift detected on {{ $labels.instance }}"
        description: "System clock drift is {{ $value }}s on {{ $labels.instance }}, which exceeds the 0.5s threshold."

    - alert: NTPHighDrift
      expr: abs(ntp_drift_seconds) > 2.0
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "High NTP time drift on {{ $labels.instance }}"
        description: "System clock drift is {{ $value }}s on {{ $labels.instance }}, which exceeds the critical 2.0s threshold."

    - alert: NTPNotSynchronized
      expr: ntp_stratum >= 16
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "NTP not synchronized on {{ $labels.instance }}"
        description: "NTP stratum is {{ $value }} on {{ $labels.instance }}, indicating no time synchronization."

    - alert: NTPServiceDown
      expr: up{job="ntp-exporter"} == 0
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "NTP exporter is down on {{ $labels.instance }}"
        description: "NTP exporter has been down for more than 2 minutes on {{ $labels.instance }}."

    - alert: ChronydDown
      expr: node_systemd_unit_state{name=~"chronyd?.service",state="active"} != 1
      for: 3m
      labels:
        severity: critical
      annotations:
        summary: "Chronyd service is not running on {{ $labels.instance }}"
        description: "Chronyd service is not in active state on {{ $labels.instance }}."

    - alert: NTPSourcesLow
      expr: ntp_source_count < 2
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Low number of NTP sources on {{ $labels.instance }}"
        description: "Only {{ $value }} NTP sources available on {{ $labels.instance }}, recommend at least 2 sources."

    - alert: NTPRootDelay
      expr: ntp_root_delay_seconds > 0.1
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "High NTP root delay on {{ $labels.instance }}"
        description: "NTP root delay is {{ $value }}s on {{ $labels.instance }}, indicating potential network issues."
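
The two drift rules above amount to a threshold ladder on the absolute drift. As a rough illustration that ignores the for: hold durations, the same classification can be reproduced in a small shell helper to sanity-check a value by hand; drift_severity is a hypothetical name, not something Prometheus provides.

```shell
# Usage: drift_severity <drift-in-seconds>
# Mirrors the NTPDrift (> 0.5 s) and NTPHighDrift (> 2.0 s) expressions:
# prints "critical", "warning" or "ok" for the given drift value.
drift_severity() {
  awk -v d="$1" 'BEGIN {
    d = d + 0                       # force numeric interpretation
    a = (d < 0) ? -d : d            # abs(), as in the alert expressions
    if (a > 2.0)      print "critical"
    else if (a > 0.5) print "warning"
    else              print "ok"
  }'
}
```

For example, drift_severity -0.7 prints warning, because the rules compare the absolute value and a clock that is behind counts the same as one that is ahead.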

Restart Prometheus to load new configuration

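Restart Prometheus to apply the new scrape configuration and alerting rules. Running promtool check config /etc/prometheus/prometheus.yml first catches syntax errors in both the configuration and the rule files it references.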

sudo systemctl restart prometheus
sudo systemctl status prometheus

Configure Grafana data source

If you haven't already configured Prometheus as a data source in Grafana, add it now. Access your Grafana instance and add Prometheus as a data source pointing to http://localhost:9090.

Create NTP monitoring dashboard

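Create a Grafana dashboard for NTP monitoring from the JSON below. The outer "dashboard" key is the wrapper used by Grafana's HTTP API (POST /api/dashboards/db); if you import through the UI instead, paste only the object inside "dashboard".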

{
  "dashboard": {
    "id": null,
    "title": "NTP Time Synchronization Monitoring",
    "tags": ["ntp", "time", "chrony"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Time Drift",
        "type": "stat",
        "targets": [
          {
            "expr": "abs(ntp_drift_seconds)",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "s",
            "thresholds": {
              "steps": [
                {"color": "green", "value": null},
                {"color": "yellow", "value": 0.1},
                {"color": "red", "value": 0.5}
              ]
            }
          }
        },
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
      },
      {
        "id": 2,
        "title": "NTP Stratum",
        "type": "stat",
        "targets": [
          {
            "expr": "ntp_stratum",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "thresholds": {
              "steps": [
                {"color": "green", "value": null},
                {"color": "yellow", "value": 8},
                {"color": "red", "value": 15}
              ]
            }
          }
        },
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
      },
      {
        "id": 3,
        "title": "Time Drift Over Time",
        "type": "timeseries",
        "targets": [
          {
            "expr": "ntp_drift_seconds",
            "refId": "A",
            "legendFormat": "Time Drift"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "s"
          }
        },
        "gridPos": {"h": 8, "w": 24, "x": 0, "y": 8}
      },
      {
        "id": 4,
        "title": "NTP Sources",
        "type": "stat",
        "targets": [
          {
            "expr": "ntp_source_count",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "thresholds": {
              "steps": [
                {"color": "red", "value": null},
                {"color": "yellow", "value": 1},
                {"color": "green", "value": 2}
              ]
            }
          }
        },
        "gridPos": {"h": 8, "w": 8, "x": 0, "y": 16}
      },
      {
        "id": 5,
        "title": "Root Delay",
        "type": "stat",
        "targets": [
          {
            "expr": "ntp_root_delay_seconds",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "s",
            "thresholds": {
              "steps": [
                {"color": "green", "value": null},
                {"color": "yellow", "value": 0.05},
                {"color": "red", "value": 0.1}
              ]
            }
          }
        },
        "gridPos": {"h": 8, "w": 8, "x": 8, "y": 16}
      },
      {
        "id": 6,
        "title": "Chronyd Service Status",
        "type": "stat",
        "targets": [
          {
            "expr": "node_systemd_unit_state{name=~\"chronyd?.service\",state=\"active\"}",
            "refId": "A"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "mappings": [
              {"options": {"0": {"text": "DOWN"}, "1": {"text": "UP"}}, "type": "value"}
            ],
            "thresholds": {
              "steps": [
                {"color": "red", "value": null},
                {"color": "green", "value": 1}
              ]
            }
          }
        },
        "gridPos": {"h": 8, "w": 8, "x": 16, "y": 16}
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "30s"
  }
}

Configure Alertmanager for NTP alerts

Configure Alertmanager to handle NTP alerts. This example shows email notifications, but you can adapt it for Slack or other channels following our Alertmanager webhook guide.

global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'your-email-password'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'ntp-alerts'

receivers:
  - name: 'ntp-alerts'
    email_configs:
      - to: 'admin@example.com'
        headers:
          Subject: 'NTP Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Instance: {{ .Labels.instance }}
          Severity: {{ .Labels.severity }}
          {{ end }}

Restart Alertmanager

Restart Alertmanager to apply the new configuration.

sudo systemctl restart alertmanager
sudo systemctl status alertmanager

Verify your setup

Check that all components are working correctly:

# Verify chrony is synchronizing
chronyc sources -v
chronyc tracking

# Check NTP exporter metrics
curl -s http://localhost:9559/metrics | grep '^ntp_'

# Verify Prometheus is scraping NTP metrics
curl -s 'http://localhost:9090/api/v1/query?query=ntp_drift_seconds'

# Check service statuses
sudo systemctl status chrony   # chronyd on RHEL-based systems
sudo systemctl status ntp-exporter
sudo systemctl status prometheus
sudo systemctl status alertmanager

Note: It may take a few minutes for chrony to fully synchronize after initial startup. Check chronyc tracking to see the current synchronization status.
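
For scripting, the System time line of chronyc tracking can be turned into a signed number. This is a small sketch that assumes the standard output layout, in which that line reads "System time : <seconds> seconds slow|fast of NTP time"; tracking_offset is a hypothetical helper name.

```shell
# Usage: chronyc tracking | tracking_offset
# Prints the system clock offset in seconds: negative when the system
# clock is behind NTP time ("slow"), positive when it is ahead ("fast").
tracking_offset() {
  awk '/^System time/ {
    # Fields: System time : <value> seconds <slow|fast> of NTP time
    printf "%.9f\n", ($6 == "slow" ? -$4 : $4)
  }'
}
```

Offsets of a few microseconds to a few milliseconds are normal once chrony has settled; the helper prints nothing if the System time line is missing from the input.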

Common issues

Symptom	Cause	Fix
NTP exporter fails to start	Cannot access chrony socket	`sudo usermod -a -G chrony prometheus` (the group is `_chrony` on Debian/Ubuntu) and restart service
No metrics in Prometheus	Incorrect scrape configuration	Verify targets in Prometheus UI and check exporter is running on port 9559
High time drift alerts	Network issues or bad NTP sources	Check `chronyc sources` and consider changing NTP pool servers
Chronyd not synchronizing	Firewall blocking NTP traffic	Allow UDP port 123: `sudo ufw allow 123/udp`
Dashboard shows no data	Grafana data source misconfigured	Verify Prometheus data source URL and test connection
Alertmanager not sending emails	SMTP configuration issues	Test SMTP settings and check Alertmanager logs: `journalctl -u alertmanager`
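
When judging whether a source is bad, the Reach column of chronyc sources is a useful signal: it is an octal register covering the last eight polls, so 377 means all eight were answered. The helper below, a sketch with the hypothetical name reach_count, converts that octal value into a count of successful polls.

```shell
# Usage: reach_count 377
# Converts the octal Reach value shown by `chronyc sources` into the
# number of successful polls among the last eight.
reach_count() {
  n=$(printf '%d' "0$1" 2>/dev/null) || return 1   # leading 0 makes printf parse octal
  count=0
  while [ "$n" -gt 0 ]; do
    count=$((count + (n & 1)))                     # add the lowest bit
    n=$((n >> 1))                                  # shift to the next poll
  done
  echo "$count"
}
```

A source that stays well below 8 after several minutes is being polled without consistent replies, which points at the network path or the server rather than at the local clock.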
