Monitor system time drift with Prometheus and Grafana alerts

Intermediate 45 min May 01, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up comprehensive time synchronization monitoring with Prometheus node exporter metrics, Grafana dashboards, and automated alerting to prevent system clock drift issues in production environments.

Prerequisites

  • Root access to target servers
  • Basic knowledge of Prometheus and Grafana
  • Understanding of NTP and time synchronization concepts
  • Network access to NTP servers (UDP port 123)
  • Prometheus and Grafana already installed as the prometheus and grafana-server systemd services

What this solves

System time drift can cause authentication failures, log correlation issues, and database consistency problems in distributed systems. This tutorial shows you how to monitor time synchronization health across your infrastructure using Prometheus metrics and Grafana alerts, with automatic notifications when clocks drift beyond acceptable thresholds.

Step-by-step configuration

Install and configure Prometheus node exporter

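Node exporter's timex collector, enabled by default on Linux, exposes the kernel clock state as node_timex_offset_seconds and node_timex_sync_status, which the alert rules and dashboard in this tutorial rely on. Install it first to start collecting time metrics.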

wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
sudo cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false node_exporter
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
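The commands above hard-code the amd64 build. If you install on mixed hardware, a small helper can pick the right archive and check it before use. This is a sketch that assumes releases keep the project-version.linux-arch.tar.gz naming seen in the URLs above and publish a sha256sums.txt file alongside the archives, as recent Prometheus releases do; the function names are hypothetical.

```shell
# Map `uname -m` output to the architecture suffix used in release archive names.
release_arch() {
  case "$1" in
    x86_64)  echo amd64 ;;
    aarch64) echo arm64 ;;
    armv7l)  echo armv7 ;;
    *)       echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Build a release download URL, assuming the naming pattern used in the commands above.
release_url() {
  local project=$1 version=$2 arch=$3
  echo "https://github.com/prometheus/${project}/releases/download/v${version}/${project}-${version}.linux-${arch}.tar.gz"
}

# Download the archive for this host and verify it against the release's sha256sums.txt.
fetch_verified() {
  local project=$1 version=$2 arch url file
  arch=$(release_arch "$(uname -m)") || return 1
  url=$(release_url "$project" "$version" "$arch")
  file=${url##*/}
  curl -fsSLO "$url" || return 1
  # sha256sums.txt lists "<hash>  <file>" for every archive in the release.
  curl -fsSL "${url%/*}/sha256sums.txt" | grep " ${file}\$" | sha256sum -c -
}
```

On an x86_64 host, fetch_verified node_exporter 1.7.0 downloads the same archive as the wget command above and exits non-zero if the checksum does not match.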

Create systemd service for node exporter

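Configure node exporter to run as a system service by saving the unit below as /etc/systemd/system/node_exporter.service. The timex and time collectors are enabled by default on Linux; --collector.ntp additionally queries the local NTP server (127.0.0.1 by default) for node_ntp_* metrics such as node_ntp_stratum, and --collector.systemd is optional for this tutorial.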

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter --collector.systemd --collector.ntp --collector.time
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
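If you provision hosts with scripts rather than an editor, the same unit can be written with a heredoc. The sketch below takes the destination path as a parameter (an assumption made so it can be tried against a scratch file first) and defaults to the location used above; the function name is hypothetical.

```shell
# Write the node exporter unit shown above to the given path
# (defaults to the conventional location for locally added units).
write_node_exporter_unit() {
  local target=${1:-/etc/systemd/system/node_exporter.service}
  cat > "$target" <<'EOF'
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter --collector.systemd --collector.ntp --collector.time
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
}
```

For example, write_node_exporter_unit /tmp/node_exporter.service followed by sudo install -m 0644 /tmp/node_exporter.service /etc/systemd/system/ keeps the root-owned step to a single command; systemd-analyze verify on the file can flag syntax errors before you reload systemd.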

Enable and start node exporter

Reload systemd, enable and start the service, and confirm it is running. Node exporter serves its metrics on port 9100.

sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
sudo systemctl status node_exporter
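systemctl status only shows that the process is running. To confirm the time series the alert rules depend on are actually exported, you can check the /metrics output for them. A minimal sketch that reads Prometheus exposition text on stdin, so it works with curl or a saved file; the function name is hypothetical.

```shell
# Succeed if Prometheus exposition text on stdin contains both timex series
# used by the alert rules; print the missing ones to stderr otherwise.
has_time_metrics() {
  local metrics name missing=""
  metrics=$(cat)
  for name in node_timex_offset_seconds node_timex_sync_status; do
    # Match a sample line starting with the metric name (skips # HELP / # TYPE lines).
    printf '%s\n' "$metrics" | grep -q "^${name}[ {]" || missing="$missing $name"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
}
```

Usage: curl -s localhost:9100/metrics | has_time_metrics && echo "time metrics OK".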

Install and configure chrony for NTP

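Install chrony to keep the system clock synchronized. Node exporter reports the kernel clock state that chrony maintains, and its ntp collector can query chrony directly.

# Ubuntu 24.04 / Debian 12
sudo apt update
sudo apt install -y chrony

# AlmaLinux 9 / Rocky Linux 9
sudo dnf install -y chrony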
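The two install commands differ only by distribution family. When scripting the setup, a helper can choose between them from the standard /etc/os-release file. This is a sketch: it relies on the ID and ID_LIKE fields (for example ID_LIKE="rhel centos fedora" on AlmaLinux and Rocky), and the function name is hypothetical.

```shell
# Print the chrony install command for the distribution described by an
# os-release file (defaults to /etc/os-release).
chrony_install_cmd() {
  local file=${1:-/etc/os-release}
  # Source in a subshell so ID/ID_LIKE do not leak into the caller.
  (
    . "$file"
    case " $ID $ID_LIKE " in
      *" debian "*|*" ubuntu "*) echo "apt-get update && apt-get install -y chrony" ;;
      *" rhel "*|*" fedora "*|*" centos "*) echo "dnf install -y chrony" ;;
      *) echo "unsupported distribution: $ID" >&2; exit 1 ;;
    esac
  )
}
```

Usage: sudo sh -c "$(chrony_install_cmd)".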

Configure chrony with monitoring settings

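Enable statistics and detailed logging for better time drift monitoring and troubleshooting. The configuration file is /etc/chrony/chrony.conf on Ubuntu and Debian, and /etc/chrony.conf on AlmaLinux and Rocky Linux.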

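# Public NTP servers
pool 2.pool.ntp.org iburst
pool 1.pool.ntp.org iburst
pool 0.pool.ntp.org iburst

# Record statistics
driftfile /var/lib/chrony/chrony.drift
dumpdir /var/lib/chrony
logdir /var/log/chrony
log statistics measurements tracking

# Only use frequency estimates whose error bound is below 100 ppm (default 1000)
maxupdateskew 100.0

# Enable command port for monitoring with chronyc (localhost only)
cmdport 323
cmdallow 127.0.0.1

# Answer NTP queries from localhost so node exporter's ntp collector gets a reply
allow 127.0.0.1

# Step clock if offset is larger than 1 second (during the first 3 updates only)
makestep 1.0 3

# Enable RTC synchronization
rtcsync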
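The makestep 1.0 3 line means chronyd steps the clock instead of slewing it when the correction exceeds 1.0 second, but only during the first three clock updates after it starts; later corrections are always slewed. The function below models only that documented rule so the two parameters are easy to reason about; it is an illustration, not chrony's code, and its name is hypothetical.

```shell
# Model chrony's documented makestep rule: step when |offset| exceeds THRESHOLD
# and the clock update number is within LIMIT (a negative LIMIT means no limit);
# otherwise the offset is corrected gradually (slewed).
# Usage: makestep_action OFFSET_SECONDS UPDATE_NUMBER [THRESHOLD] [LIMIT]
makestep_action() {
  awk -v off="$1" -v n="$2" -v thr="${3:-1.0}" -v lim="${4:-3}" 'BEGIN {
    if (off < 0) off = -off
    if (off > thr && (lim < 0 || n <= lim)) print "step"; else print "slew"
  }'
}
```

With the values above, makestep_action 2.5 1 prints step, while makestep_action 2.5 4 prints slew because the third update has already passed.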

Start chrony service

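Enable chrony and restart it so it loads the new configuration. The package may already have started chronyd with its default settings, in which case enable --now alone would not apply your changes. The unit is named chrony on Ubuntu/Debian and chronyd on AlmaLinux/Rocky Linux.

# Ubuntu 24.04 / Debian 12
sudo systemctl enable chrony
sudo systemctl restart chrony
sudo systemctl status chrony

# AlmaLinux 9 / Rocky Linux 9
sudo systemctl enable chronyd
sudo systemctl restart chronyd
sudo systemctl status chronyd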
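A running service is not yet a synchronized one: until chronyd has selected a source, chronyc tracking reports "Leap status : Not synchronised". A minimal sketch of a check that reads that output on stdin; the function name is hypothetical.

```shell
# Succeed if `chronyc tracking` output on stdin reports a synchronised clock.
# While chronyd has no usable source it prints "Leap status : Not synchronised".
chrony_synced() {
  awk -F' *: *' '
    $1 == "Leap status" { found = 1; status = $2 }
    END { exit !(found && status != "Not synchronised") }
  '
}
```

Usage: chronyc tracking | chrony_synced && echo synchronised. chronyc also provides a built-in waitsync command that blocks until the clock is synchronised, which may be simpler in boot scripts.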

Configure Prometheus to scrape time metrics

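Add the node exporter target to your Prometheus configuration (often /etc/prometheus/prometheus.yml, depending on how Prometheus was installed) to collect time-related metrics. Paths under rule_files are resolved relative to the directory containing this file.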

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "time_drift_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']
    scrape_interval: 10s
    metrics_path: /metrics
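The scrape config above only watches the Prometheus host itself. To cover other servers, each one needs node exporter running with port 9100 reachable from Prometheus, plus an entry in targets. A small generator, sketched here with a hypothetical function name, prints a targets line in the same inline style and indentation as above.

```shell
# Print a Prometheus static_configs "targets" line for the given hosts,
# appending node exporter's default port 9100 to each.
node_targets_line() {
  local out="" host
  for host in "$@"; do
    out="${out:+$out, }'${host}:9100'"
  done
  printf "      - targets: [%s]\n" "$out"
}
```

For example, node_targets_line web1.example.com db1.example.com (placeholder host names) prints a line listing both hosts on port 9100; running promtool check config on the edited file before restarting Prometheus catches indentation mistakes.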

Create Prometheus alerting rules for time drift

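Define alert rules that trigger when system clocks drift beyond acceptable thresholds or NTP synchronization fails, and save them as time_drift_rules.yml next to prometheus.yml. The TimeServerUnreachable rule uses node_ntp_stratum, which is only exported when node exporter runs with --collector.ntp; in NTP, stratum 16 denotes an unsynchronized server.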

groups:
  - name: time_drift_alerts
    rules:
    - alert: ClockDriftHigh
      expr: abs(node_timex_offset_seconds) > 0.05
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "System clock drift detected on {{ $labels.instance }}"
        description: "Clock offset is {{ $value }}s, exceeding 50ms threshold"
    
    - alert: ClockDriftCritical
      expr: abs(node_timex_offset_seconds) > 0.5
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "Critical clock drift on {{ $labels.instance }}"
        description: "Clock offset is {{ $value }}s, exceeding 500ms threshold"
    
    - alert: NTPSyncLost
      expr: node_timex_sync_status != 1
      for: 3m
      labels:
        severity: critical
      annotations:
        summary: "NTP synchronization lost on {{ $labels.instance }}"
        description: "System clock is not synchronized with NTP servers"
    
    - alert: TimeServerUnreachable
      expr: node_ntp_stratum == 16
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "NTP servers unreachable on {{ $labels.instance }}"
        description: "System cannot reach configured NTP servers"
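The two drift rules apply fixed thresholds to the absolute offset: above 0.05 s (50 ms) is a warning, above 0.5 s (500 ms) is critical, and each condition must hold for its for: duration before the alert fires. The sketch below reproduces only the threshold arithmetic, ignoring the for: timing, so you can check which alert a given offset would raise; the function name is hypothetical.

```shell
# Classify a clock offset (in seconds, either sign) against the rule thresholds:
# > 0.5 s -> critical, > 0.05 s -> warning, otherwise ok.
drift_severity() {
  awk -v off="$1" 'BEGIN {
    if (off < 0) off = -off
    if (off > 0.5) print "critical"
    else if (off > 0.05) print "warning"
    else print "ok"
  }'
}
```

Above 500 ms both ClockDriftHigh and ClockDriftCritical match; the function reports only the higher severity. To validate the real rule file before Prometheus loads it, run promtool check rules time_drift_rules.yml.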

Install Alertmanager for notifications

Set up Alertmanager to handle time drift alerts and send notifications via email or Slack.

wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar xvfz alertmanager-0.26.0.linux-amd64.tar.gz
sudo cp alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
sudo cp alertmanager-0.26.0.linux-amd64/amtool /usr/local/bin/
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
sudo useradd --no-create-home --shell /bin/false alertmanager
sudo chown -R alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager

Configure Alertmanager for time drift notifications

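Set up notification channels and routing for time drift alerts with appropriate escalation. Save this as /etc/alertmanager/alertmanager.yml, the path the service unit passes to --config.file.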

global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alerts@example.com'

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'time-drift-alerts'
  routes:
  - match:
      severity: critical
    receiver: 'critical-alerts'
    repeat_interval: 15m

receivers:
- name: 'time-drift-alerts'
  email_configs:
  - to: 'ops-team@example.com'
    headers:
      Subject: 'Time Drift Alert: {{ .GroupLabels.alertname }}'
    text: |
      {{ range .Alerts }}
      Alert: {{ .Annotations.summary }}
      Description: {{ .Annotations.description }}
      Instance: {{ .Labels.instance }}
      Severity: {{ .Labels.severity }}
      {{ end }}
- name: 'critical-alerts'
  email_configs:
  - to: 'critical-ops@example.com'
    headers:
      Subject: 'CRITICAL: Time Drift Alert'
    text: |
      {{ range .Alerts }}
      CRITICAL TIME DRIFT DETECTED
      Alert: {{ .Annotations.summary }}
      Description: {{ .Annotations.description }}
      Instance: {{ .Labels.instance }}
      {{ end }}
  slack_configs:
  - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
    channel: '#alerts'
    title: 'Critical Time Drift Alert'
    text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'

Alertmanager's email_configs has no subject or body fields; the subject is set through headers and the plain-text body through text.
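To exercise the routing and receivers without waiting for real drift, you can post a synthetic alert to Alertmanager's v2 API (POST /api/v2/alerts, which accepts a JSON array of alerts with labels and annotations). The sketch below builds and sends such a payload; the function names and the default instance value are made up for the test, and the arguments are assumed to contain no characters that need JSON escaping.

```shell
# Build a one-alert JSON array for Alertmanager's POST /api/v2/alerts endpoint.
test_alert_payload() {
  local name=$1 severity=$2 instance=${3:-test-host:9100}
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"},' \
    "$name" "$severity" "$instance"
  printf '"annotations":{"summary":"Test alert for %s","description":"Synthetic alert sent by hand"}}]\n' \
    "$instance"
}

# Send the payload to a local Alertmanager.
send_test_alert() {
  test_alert_payload "$@" |
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data-binary @- http://localhost:9093/api/v2/alerts
}
```

After validating the file with amtool check-config /etc/alertmanager/alertmanager.yml and starting the service, send_test_alert ClockDriftCritical critical should be delivered through the critical-alerts receiver.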

Create Alertmanager systemd service

Configure Alertmanager to run as a system service for reliable alert handling by saving this unit as /etc/systemd/system/alertmanager.service.

[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
WorkingDirectory=/etc/alertmanager
ExecStart=/usr/local/bin/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --storage.path=/var/lib/alertmanager
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Create Grafana dashboard for time drift visualization

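Save the following dashboard definition as /tmp/time_drift_dashboard.json, the file the import command reads. It shows the current clock offset, the NTP synchronization status, and the offset trend over time.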

{
  "dashboard": {
    "id": null,
    "uid": "time-drift",
    "title": "System Time Drift Monitoring",
    "tags": ["time", "ntp", "monitoring"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Clock Offset",
        "type": "stat",
        "targets": [
          {
            "expr": "node_timex_offset_seconds * 1000",
            "legendFormat": "Offset (ms)"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "thresholds": {
              "steps": [
                {"color": "green", "value": null},
                {"color": "yellow", "value": 50},
                {"color": "red", "value": 500}
              ]
            }
          }
        },
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
      },
      {
        "id": 2,
        "title": "NTP Synchronization Status",
        "type": "stat",
        "targets": [
          {
            "expr": "node_timex_sync_status",
            "legendFormat": "Sync Status"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "mappings": [
              {"type": "value", "options": {"0": {"text": "Not Synced", "color": "red"}}},
              {"type": "value", "options": {"1": {"text": "Synced", "color": "green"}}}
            ]
          }
        },
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
      },
      {
        "id": 3,
        "title": "Clock Offset Over Time",
        "type": "timeseries",
        "targets": [
          {
            "expr": "node_timex_offset_seconds * 1000",
            "legendFormat": "Clock Offset (ms)"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "ms",
            "custom": {"axisLabel": "Milliseconds"}
          }
        },
        "gridPos": {"h": 9, "w": 24, "x": 0, "y": 8}
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "30s"
  },
  "overwrite": true
}

Import dashboard into Grafana

Use the Grafana API to import the time drift monitoring dashboard. The example uses Grafana's default admin:admin credentials; replace them with your own.

curl -X POST \
  http://admin:admin@localhost:3000/api/dashboards/db \
  -H 'Content-Type: application/json' \
  -d @/tmp/time_drift_dashboard.json
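Embedding admin:admin in the URL works on a fresh install but leaves credentials in shell history. Grafana also accepts a service account token as a Bearer token. A sketch of the same import that reads the token from a GRAFANA_TOKEN environment variable (a name chosen here, not one Grafana defines); the function name is hypothetical.

```shell
# POST a dashboard JSON file to Grafana's /api/dashboards/db endpoint,
# authenticating with a service account token from $GRAFANA_TOKEN.
# Usage: grafana_import FILE [GRAFANA_URL]
grafana_import() {
  local file=$1 url=${2:-http://localhost:3000}
  if [ -z "${GRAFANA_TOKEN:-}" ]; then
    echo "GRAFANA_TOKEN is not set" >&2
    return 1
  fi
  curl -fsS -X POST "${url}/api/dashboards/db" \
    -H "Authorization: Bearer ${GRAFANA_TOKEN}" \
    -H 'Content-Type: application/json' \
    --data-binary "@${file}"
}
```

Usage: GRAFANA_TOKEN=... grafana_import /tmp/time_drift_dashboard.json. The -f flag makes curl exit non-zero on HTTP errors such as an invalid token, so the call can be chained in scripts.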

Start all services

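Enable and start all monitoring services to begin time drift detection. Reload systemd first so it picks up the alertmanager unit, and restart Prometheus rather than only enabling it, so a running instance loads the new scrape configuration and rule file.

sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl restart prometheus
sudo systemctl enable --now alertmanager
sudo systemctl enable --now grafana-server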
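systemctl can return before the HTTP endpoints are ready to answer. Prometheus and Alertmanager expose /-/ready and Grafana exposes /api/health, so a short polling loop can wait for all three before you run the checks that follow. A sketch; the retry count and delay defaults are arbitrary, and the function name is hypothetical.

```shell
# Poll a URL until it answers with a 2xx status or the attempts run out.
# Usage: wait_ready URL [ATTEMPTS] [DELAY_SECONDS]
wait_ready() {
  local url=$1 attempts=${2:-30} delay=${3:-2} i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS -o /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $attempts attempts: $url" >&2
  return 1
}
```

Usage: for u in http://localhost:9090/-/ready http://localhost:9093/-/ready http://localhost:3000/api/health; do wait_ready "$u"; done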

Configure alert escalation policies

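Escalate critical drift to PagerDuty with a shorter repeat interval. Add the route below as the first entry under route.routes in /etc/alertmanager/alertmanager.yml, and add the receiver to the receivers list. Every receiver a route names must be defined, or Alertmanager rejects the configuration, and PagerDuty expects Events API payloads rather than Alertmanager's generic webhook format, so the receiver uses Alertmanager's built-in pagerduty_configs. continue: true lets matching alerts also reach the existing severity: critical route, so the email and Slack notifications still go out.

route:
  routes:
  - match:
      alertname: ClockDriftCritical
    receiver: 'critical-escalation'
    group_wait: 0s
    repeat_interval: 5m
    continue: true

receivers:
- name: 'critical-escalation'
  pagerduty_configs:
  - routing_key: 'YOUR-PAGERDUTY-INTEGRATION-KEY'
    send_resolved: true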

Verify your setup

Check that all components are running and collecting time metrics properly.

# Verify node exporter is exposing time metrics
curl -s localhost:9100/metrics | grep -E "(timex_offset|timex_sync)"

# Check chrony synchronization status
chronyc tracking
chronyc sources -v

# Verify Prometheus is scraping metrics
curl -s "localhost:9090/api/v1/query?query=node_timex_offset_seconds"

# List the loaded alert rules (requires jq)
curl -s "localhost:9090/api/v1/rules" | jq '.data.groups[].rules[].name'

# Check Alertmanager status (the v1 API is deprecated; use v2)
curl -s localhost:9093/api/v2/status | jq

# Verify the Grafana dashboard (uid set in the dashboard JSON)
curl -s -u admin:admin "localhost:3000/api/dashboards/uid/time-drift"

Note: Time drift monitoring requires at least 5-10 minutes of data collection before accurate trends appear in Grafana dashboards.
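chronyc tracking prints the current offset as text, for example "System time : 0.000012345 seconds slow of NTP time". To compare it with the alert thresholds, a sketch that converts that line into a signed number of seconds. The sign convention (positive when the local clock is fast, negative when slow) is this helper's own choice, and the function name is hypothetical.

```shell
# Convert the "System time" line of `chronyc tracking` (stdin) into signed
# seconds: positive if the system clock is fast, negative if slow.
chrony_offset_seconds() {
  awk -F' *: *' '
    $1 == "System time" {
      split($2, f, " ")            # f[1] = value, f[3] = "fast" or "slow"
      printf "%s%s\n", (f[3] == "slow" ? "-" : ""), f[1]
      found = 1
    }
    END { exit !found }
  '
}
```

Usage: chronyc tracking | chrony_offset_seconds. The function exits non-zero if the input has no System time line.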

Common issues

| Symptom | Cause | Fix |
| --- | --- | --- |
| No time metrics in Prometheus | Node exporter not running or misconfigured | Run sudo systemctl restart node_exporter; node_timex_* comes from the default timex collector, while node_ntp_* requires the --collector.ntp flag |
| Clock drift alerts not firing | Alert rules not loaded or thresholds too high | Verify rules with promtool check rules time_drift_rules.yml |
| NTP sync status shows 0 | Chrony not synchronizing with time servers | Check firewall rules for UDP 123 and verify NTP pool connectivity |
| Alertmanager not sending emails | SMTP configuration incorrect | Validate the file with amtool check-config /etc/alertmanager/alertmanager.yml and verify SMTP settings |
| Grafana dashboard shows no data | Data source not configured or wrong query | Verify the Prometheus data source URL and test queries manually |
| High clock drift on VM | Hypervisor time synchronization disabled | Enable VMware Tools time sync or Hyper-V time integration services |

Advanced configuration

Fine-tune your time monitoring setup for different environments and use cases. You can configure multiple NTP sources, set drift thresholds that match your applications' tolerance, and route the alerts into existing monitoring systems. For high-precision applications, consider hardware time sources and a deliberate makestep policy that controls when the clock may be stepped rather than slewed. chrony already switches between configured sources when one becomes unusable; the monitoring can be extended to track per-source performance so those switches are visible.
