Set up comprehensive time synchronization monitoring with Prometheus node exporter metrics, Grafana dashboards, and automated alerting to catch system clock drift in production environments before it causes outages.
Prerequisites
- Root access to target servers
- Prometheus and Grafana already installed (this tutorial configures them but does not install them), plus basic familiarity with both
- Understanding of NTP and time synchronization concepts
- Network access to NTP servers (UDP port 123)
What this solves
System time drift can cause authentication failures, log correlation issues, and database consistency problems in distributed systems. This tutorial shows you how to monitor time synchronization health across your infrastructure using Prometheus metrics and Grafana alerts, with automatic notifications when clocks drift beyond acceptable thresholds.
Step-by-step configuration
Install and configure Prometheus node exporter
Node exporter provides time-related metrics including clock offset and NTP synchronization status. Install it first to start collecting time metrics.
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
sudo cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false node_exporter
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
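Before copying the binary, it is worth checking the tarball against the checksum list published with the release. The function below is a minimal sketch. It assumes a `sha256sums.txt` file in the standard `<hash>  <filename>` format. Prometheus releases normally ship one, but confirm it among the release assets. `verify_tarball` is a hypothetical helper name, not part of any tool.

```bash
# verify_tarball TARBALL SUMS_FILE  (hypothetical helper)
# Returns 0 if TARBALL's SHA-256 matches its entry in SUMS_FILE, 1 otherwise.
# Assumes GNU coreutils sha256sum and "<hash>  <filename>" lines in SUMS_FILE.
verify_tarball() {
  local tarball="$1" sums="$2" expected actual
  # Look up the expected hash by exact file name (second field of each line)
  expected="$(awk -v f="$(basename "$tarball")" '$2 == f {print $1}' "$sums")"
  if [[ -z "$expected" ]]; then
    echo "no checksum entry for $(basename "$tarball")" >&2
    return 1
  fi
  # Hash the local file and keep only the digest column
  actual="$(sha256sum "$tarball" | awk '{print $1}')"
  [[ "$expected" == "$actual" ]]
}
```

For example, after downloading `sha256sums.txt` from the same v1.7.0 release page (assuming it is attached there), run `verify_tarball node_exporter-1.7.0.linux-amd64.tar.gz sha256sums.txt && echo "checksum OK"` before the `cp` step.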
Create systemd service for node exporter
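Create /etc/systemd/system/node_exporter.service so node exporter runs as a system service. The timex collector, which provides the node_timex_* offset and sync-status metrics used throughout this tutorial, is enabled by default. The ntp and systemd collectors are disabled by default, so the unit turns them on explicitly. The ntp collector queries the local NTP daemon (127.0.0.1 by default) and provides the node_ntp_* metrics.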
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter --collector.systemd --collector.ntp --collector.time
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
Enable and start node exporter
Start the service and verify it's exposing time metrics on port 9100.
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
sudo systemctl status node_exporter
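A running service does not guarantee that every collector works. The ntp collector, for example, fails if it cannot query a local NTP daemon. Node exporter reports each collector's outcome in the `node_scrape_collector_success` metric (1 = succeeded, 0 = failed). The sketch below uses plain awk to list collectors whose last scrape failed. `failed_collectors` is a hypothetical helper name.

```bash
# failed_collectors < metrics_text  (hypothetical helper)
# Prints the name of every collector whose node_scrape_collector_success is 0.
failed_collectors() {
  awk '
    index($0, "node_scrape_collector_success{") == 1 {
      # Pull the label value out of collector="..."
      match($0, /collector="[^"]*"/)
      name = substr($0, RSTART + 11, RLENGTH - 12)
      # The sample value is the last whitespace-separated field
      if ($NF == 0) print name
    }
  '
}
```

Run `curl -s localhost:9100/metrics | failed_collectors`. Empty output means every enabled collector succeeded on that scrape.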
Install and configure chrony for NTP
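Install chrony to keep the system clock synchronized. chrony disciplines the kernel clock, and node exporter's timex collector reads the resulting kernel state (offset, estimated error, sync status) through adjtimex.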
sudo apt update
sudo apt install -y chrony
Configure chrony with monitoring settings
Edit /etc/chrony/chrony.conf (the Debian/Ubuntu path; RHEL-based systems use /etc/chrony.conf) to enable statistics logging for drift monitoring and troubleshooting.
# Public NTP servers
pool 2.pool.ntp.org iburst
pool 1.pool.ntp.org iburst
pool 0.pool.ntp.org iburst
# Record statistics
driftfile /var/lib/chrony/chrony.drift
dumpdir /var/lib/chrony
logdir /var/log/chrony
log statistics measurements tracking
# Ignore frequency estimates whose skew exceeds 100 ppm
maxupdateskew 100.0
# Enable command port for monitoring
cmdport 323
cmdallow 127.0.0.1
# Answer NTP queries from localhost so node exporter's ntp collector can poll chronyd
allow 127.0.0.1
# Step the clock if the offset exceeds 1 second during the first 3 updates
makestep 1.0 3
# Enable RTC synchronization
rtcsync
Start chrony service
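Enable chrony and restart it so it picks up the new configuration. The package already started the service during installation, so enable --now alone would not reload the file.
sudo systemctl enable chrony
sudo systemctl restart chrony
sudo systemctl status chrony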
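To read the current offset straight from chrony, for example to compare it with node_timex_offset_seconds, the sketch below extracts the `System time` line of `chronyc tracking`. It assumes the human-readable format `System time : N seconds fast|slow of NTP time`. The sign convention (positive means the local clock is fast) and the `tracking_offset` name are choices made here, not chrony's. For heavier scripting, `chronyc -c tracking` prints CSV, which may be more robust.

```bash
# tracking_offset < chronyc_tracking_output  (hypothetical helper)
# Prints the "System time" offset in seconds: positive when the local clock
# is fast of NTP time, negative when it is slow (convention chosen here).
tracking_offset() {
  awk -F': ' '/^System time/ {
    split($2, f, " ")
    # f[1] = magnitude, f[2] = "seconds", f[3] = "fast" or "slow"
    if (f[3] == "slow") printf "-%s\n", f[1]; else printf "%s\n", f[1]
  }'
}
```

Usage: `chronyc tracking | tracking_offset`.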
Configure Prometheus to scrape time metrics
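Add the node exporter target to /etc/prometheus/prometheus.yml (adjust the path if your installation uses a different one). Relative rule_files paths are resolved against the directory of this file.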
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - "time_drift_rules.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']
    scrape_interval: 10s
    metrics_path: /metrics
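The configuration above scrapes only localhost. To watch time synchronization across several servers, each host running node exporter needs its own entry under `targets`. The hypothetical helper below renders that list for a set of hostnames. It assumes every host exposes node exporter on the same port.

```bash
# node_targets PORT HOST...  (hypothetical helper)
# Prints a YAML flow list such as ['web01:9100', 'db01:9100'] for the
# targets: field of the node-exporter job.
node_targets() {
  local port="$1" out="" h
  shift
  for h in "$@"; do
    # Prepend ", " only once the list already has an entry
    out+="${out:+, }'${h}:${port}'"
  done
  printf '[%s]\n' "$out"
}
```

For example, `node_targets 9100 web01 db01` prints `['web01:9100', 'db01:9100']`, which can replace the `targets` value of the node-exporter job.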
Create Prometheus alerting rules for time drift
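Save the following as /etc/prometheus/time_drift_rules.yml. These rules fire when the absolute clock offset exceeds 50 ms (warning) or 500 ms (critical), when the kernel reports the clock as unsynchronized, or when the local NTP daemon reports stratum 16 (unsynchronized). The last rule relies on node_ntp_stratum, which is exported only when the ntp collector is enabled and can reach chronyd.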
groups:
  - name: time_drift_alerts
    rules:
      - alert: ClockDriftHigh
        expr: abs(node_timex_offset_seconds) > 0.05
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "System clock drift detected on {{ $labels.instance }}"
          description: "Clock offset is {{ $value }}s, exceeding 50ms threshold"
      - alert: ClockDriftCritical
        expr: abs(node_timex_offset_seconds) > 0.5
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Critical clock drift on {{ $labels.instance }}"
          description: "Clock offset is {{ $value }}s, exceeding 500ms threshold"
      - alert: NTPSyncLost
        expr: node_timex_sync_status != 1
        for: 3m
        labels:
          severity: critical
        annotations:
          summary: "NTP synchronization lost on {{ $labels.instance }}"
          description: "System clock is not synchronized with NTP servers"
      - alert: TimeServerUnreachable
        expr: node_ntp_stratum == 16
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "NTP servers unreachable on {{ $labels.instance }}"
          description: "System cannot reach configured NTP servers"
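These rules compare the absolute offset, so a clock 70 ms behind triggers the same warning as one 70 ms ahead. The sketch below mirrors that threshold logic for ad hoc checks, such as trying out offset values before tuning the rules. `drift_severity` is a hypothetical helper, and it ignores the `for:` durations that Prometheus waits out before firing.

```bash
# drift_severity OFFSET_SECONDS  (hypothetical helper)
# Mirrors the 50 ms / 500 ms thresholds above on the absolute offset.
drift_severity() {
  awk -v off="$1" 'BEGIN {
    v = off + 0                  # force numeric comparison
    a = (v < 0) ? -v : v         # absolute value, like PromQL abs()
    if (a > 0.5) print "critical"
    else if (a > 0.05) print "warning"
    else print "ok"
  }'
}
```

For example, `drift_severity -0.07` prints `warning` and `drift_severity 0.6` prints `critical`.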
Install Alertmanager for notifications
Set up Alertmanager to handle time drift alerts and send notifications via email or Slack.
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar xvfz alertmanager-0.26.0.linux-amd64.tar.gz
sudo cp alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
sudo cp alertmanager-0.26.0.linux-amd64/amtool /usr/local/bin/
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
sudo useradd --no-create-home --shell /bin/false alertmanager
sudo chown -R alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager
Configure Alertmanager for time drift notifications
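Write the following to /etc/alertmanager/alertmanager.yml, the path passed to --config.file in the systemd unit below. Critical alerts go to a separate receiver with a shorter repeat interval. Alertmanager's email receiver sets the subject through headers and the message body through text.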
global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alerts@example.com'
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'time-drift-alerts'
  routes:
    - match:
        severity: critical
      receiver: 'critical-alerts'
      repeat_interval: 15m
receivers:
  - name: 'time-drift-alerts'
    email_configs:
      - to: 'ops-team@example.com'
        headers:
          Subject: 'Time Drift Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Instance: {{ .Labels.instance }}
          Severity: {{ .Labels.severity }}
          {{ end }}
  - name: 'critical-alerts'
    email_configs:
      - to: 'critical-ops@example.com'
        headers:
          Subject: 'CRITICAL: Time Drift Alert'
        text: |
          {{ range .Alerts }}
          CRITICAL TIME DRIFT DETECTED
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Instance: {{ .Labels.instance }}
          {{ end }}
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
        channel: '#alerts'
        title: 'Critical Time Drift Alert'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
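Before relying on this routing, you can push a synthetic alert into Alertmanager and confirm it reaches the expected receiver. Alertmanager's v2 API accepts a JSON array of alerts at `POST /api/v2/alerts`. The hypothetical helper below builds a minimal payload with `labels` and `annotations`. It does not escape JSON, so it assumes the arguments contain no quotes or backslashes.

```bash
# test_alert_json ALERTNAME SEVERITY INSTANCE  (hypothetical helper)
# Prints a one-alert JSON array for Alertmanager's POST /api/v2/alerts.
# No JSON escaping: arguments must not contain quotes or backslashes.
test_alert_json() {
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"},' "$1" "$2" "$3"
  printf '"annotations":{"summary":"Test alert for %s","description":"Manual routing test"}}]\n' "$3"
}
```

Usage: `test_alert_json ClockDriftCritical critical web01:9100 | curl -s -X POST -H 'Content-Type: application/json' --data @- http://localhost:9093/api/v2/alerts`. A critical alert should arrive at the critical-alerts receiver. Because the payload sets no endsAt, Alertmanager marks it resolved after its resolve_timeout (5m by default).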
Create Alertmanager systemd service
Create /etc/systemd/system/alertmanager.service so Alertmanager runs as a system service.
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target
[Service]
User=alertmanager
Group=alertmanager
Type=simple
WorkingDirectory=/etc/alertmanager
ExecStart=/usr/local/bin/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --storage.path=/var/lib/alertmanager
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
Create Grafana dashboard for time drift visualization
Save the following dashboard definition as /tmp/time_drift_dashboard.json. The next step imports it through the Grafana API.
{
"dashboard": {
"id": null,
"uid": "time-drift",
"title": "System Time Drift Monitoring",
"tags": ["time", "ntp", "monitoring"],
"timezone": "browser",
"panels": [
{
"id": 1,
"title": "Clock Offset",
"type": "stat",
"targets": [
{
"expr": "node_timex_offset_seconds * 1000",
"legendFormat": "Offset (ms)"
}
],
"fieldConfig": {
"defaults": {
"unit": "ms",
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 50},
{"color": "red", "value": 500}
]
}
}
},
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
},
{
"id": 2,
"title": "NTP Synchronization Status",
"type": "stat",
"targets": [
{
"expr": "node_timex_sync_status",
"legendFormat": "Sync Status"
}
],
"fieldConfig": {
"defaults": {
"mappings": [
{"type": "value", "options": {"0": {"text": "Not Synced", "color": "red"}}},
{"type": "value", "options": {"1": {"text": "Synced", "color": "green"}}}
]
}
},
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
},
{
"id": 3,
"title": "Clock Offset Over Time",
"type": "timeseries",
"targets": [
{
"expr": "node_timex_offset_seconds * 1000",
"legendFormat": "{{instance}}"
}
],
"fieldConfig": {
"defaults": {
"unit": "ms",
"custom": {"axisLabel": "Milliseconds"}
}
},
"gridPos": {"h": 9, "w": 24, "x": 0, "y": 8}
}
],
"time": {
"from": "now-1h",
"to": "now"
},
"refresh": "30s"
},
"overwrite": true
}
Import dashboard into Grafana
Use the Grafana HTTP API to import the dashboard. Replace admin:admin with your Grafana credentials if you changed the defaults.
curl -X POST \
http://admin:admin@localhost:3000/api/dashboards/db \
-H 'Content-Type: application/json' \
-d @/tmp/time_drift_dashboard.json
Start all services
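Reload systemd so it picks up the new Alertmanager unit, then enable and start the services. This assumes Prometheus and Grafana are already installed as the prometheus and grafana-server units. Prometheus is restarted so it loads the new scrape configuration and rule file even if it was already running.
sudo systemctl daemon-reload
sudo systemctl enable --now alertmanager
sudo systemctl enable --now grafana-server
sudo systemctl enable prometheus
sudo systemctl restart prometheus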
Configure alert escalation policies
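Optionally, page on-call engineers for critical drift. Alertmanager reads a single configuration file, so merge these routes and the receiver into /etc/alertmanager/alertmanager.yml instead of saving them separately. The first route sends ClockDriftCritical to PagerDuty through Alertmanager's native PagerDuty integration. Because it sets continue: true, the alert also reaches the critical-alerts email route.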
route:
  receiver: 'time-drift-alerts'
  routes:
    - match:
        alertname: ClockDriftCritical
      receiver: 'critical-escalation'
      group_wait: 0s
      repeat_interval: 5m
      continue: true
    - match:
        severity: critical
      receiver: 'critical-alerts'
      repeat_interval: 15m
receivers:
  - name: 'critical-escalation'
    pagerduty_configs:
      - routing_key: 'YOUR-PAGERDUTY-INTEGRATION-KEY'
        send_resolved: true
Verify your setup
Check that all components are running and collecting time metrics properly.
# Verify node exporter is exposing time metrics
curl -s localhost:9100/metrics | grep -E "(timex_offset|timex_sync)"
# Check chrony synchronization status
chronyc tracking
chronyc sources -v
# Verify Prometheus is scraping metrics
curl -s "localhost:9090/api/v1/query?query=node_timex_offset_seconds"
# List the loaded alert rules
curl -s "localhost:9090/api/v1/rules" | jq '.data.groups[].rules[].name'
# Check Alertmanager status
curl -s localhost:9093/api/v2/status | jq
# Verify the Grafana dashboard was imported
curl -s -u admin:admin "localhost:3000/api/search?query=System%20Time%20Drift"
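If jq is not installed, you can pull the sample values out of the instant-query response for a rough spot check. The sketch below, a hypothetical helper named `query_values`, uses only grep and sed. It assumes the compact, single-line JSON that the Prometheus HTTP API returns, where each series carries `"value":[<timestamp>,"<value>"]`.

```bash
# query_values < prometheus_instant_query_json  (hypothetical helper)
# Prints the sample value of each series in the response, one per line.
query_values() {
  # Isolate every "value":[ts,"v"] pair, then keep only the quoted value
  grep -o '"value":\[[^]]*]' | sed 's/.*,"\(.*\)"]$/\1/'
}
```

Usage: `curl -s "localhost:9090/api/v1/query?query=node_timex_offset_seconds" | query_values` prints one offset per scraped instance.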
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| No time metrics in Prometheus | Node exporter not running, or Prometheus not scraping it | Check sudo systemctl status node_exporter and journalctl -u node_exporter, then confirm the target is UP under Status > Targets in Prometheus |
| Clock drift alerts not firing | Alert rules not loaded or thresholds too high | Verify rules with promtool check rules time_drift_rules.yml |
| NTP sync status shows 0 | Chrony not synchronizing with time servers | Check firewall rules for UDP 123 and verify NTP pool connectivity |
| Alertmanager not sending emails | SMTP configuration incorrect | Validate the file with amtool check-config /etc/alertmanager/alertmanager.yml and verify the SMTP host, port, and credentials |
| Grafana dashboard shows no data | Data source not configured or wrong query | Verify Prometheus data source URL and test queries manually |
| High clock drift on VM | Hypervisor time synchronization disabled | Enable VMware Tools time sync or Hyper-V time integration services |
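When NTP sync status shows 0, the quickest check is whether chrony has selected any source at all. In `chronyc sources` output, the second character of the first column is the source state, and `*` marks the source chronyd is currently synchronized to. The sketch below, a hypothetical helper named `selected_source`, prints that source's name, or nothing when no source is selected.

```bash
# selected_source < chronyc_sources_output  (hypothetical helper)
# Prints the source marked "*" (currently synchronized), if any.
selected_source() {
  # First column is mode (^ server, = peer, # refclock) followed by state
  awk '$1 ~ /^[=#^]\*$/ {print $2}'
}
```

Usage: `chronyc sources | selected_source`. Empty output while sources show `?` usually points at blocked UDP 123.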
Advanced configuration
Fine-tune your time monitoring setup for different environments. You can add more NTP sources, set drift thresholds that match your applications' tolerance, and feed these alerts into existing monitoring systems. For high-precision applications, consider hardware time sources and tune when chrony steps the clock instead of slewing it (the makestep directive). The same metrics can also be extended to track time server performance and to fail over between time sources during outages.
Next steps
- Configure NGINX monitoring with Prometheus and Grafana dashboards for comprehensive web server monitoring
- Configure Prometheus Blackbox Exporter for endpoint monitoring to monitor service availability
- Configure backup monitoring with Prometheus and Grafana for infrastructure oversight
- Implement Grafana advanced alerting with webhooks for notification integration
- Configure Linux system time synchronization with chrony and NTP hardening for security best practices
Automated install script
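Run this script to install node exporter and chrony and to generate Prometheus configuration templates. It does not install or configure Prometheus, Alertmanager, or Grafana.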
#!/usr/bin/env bash
set -euo pipefail
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
# Configuration
NODE_EXPORTER_VERSION="1.7.0"
PROMETHEUS_PORT="9090"
NODE_EXPORTER_PORT="9100"
# Usage function
usage() {
echo "Usage: $0 [--prometheus-host HOSTNAME] [--no-prometheus]"
echo " --prometheus-host: Hostname/IP where Prometheus is running (default: localhost)"
echo " --no-prometheus: Skip Prometheus configuration steps"
exit 1
}
# Parse arguments
PROMETHEUS_HOST="localhost"
SKIP_PROMETHEUS=false
while [[ $# -gt 0 ]]; do
case $1 in
--prometheus-host)
PROMETHEUS_HOST="$2"
shift 2
;;
--no-prometheus)
SKIP_PROMETHEUS=true
shift
;;
-h|--help)
usage
;;
*)
echo -e "${RED}Unknown option: $1${NC}"
usage
;;
esac
done
# Cleanup function for rollback
cleanup() {
if [[ $? -ne 0 ]]; then
echo -e "${RED}Installation failed. Cleaning up...${NC}"
systemctl stop node_exporter 2>/dev/null || true
systemctl disable node_exporter 2>/dev/null || true
rm -f /etc/systemd/system/node_exporter.service
rm -f /usr/local/bin/node_exporter
userdel node_exporter 2>/dev/null || true
fi
}
trap cleanup ERR
# Check prerequisites
echo -e "${YELLOW}[1/8] Checking prerequisites...${NC}"
if [[ $EUID -ne 0 ]]; then
echo -e "${RED}This script must be run as root${NC}"
exit 1
fi
# Detect distribution
if [ -f /etc/os-release ]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_INSTALL="apt install -y"
PKG_UPDATE="apt update"
;;
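almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_INSTALL="dnf install -y"
# makecache refreshes metadata and exits 0; "check-update || true" cannot live
# inside a variable because the shell does not parse || after expansion
PKG_UPDATE="dnf makecache"
;;
amzn)
PKG_MGR="yum"
PKG_INSTALL="yum install -y"
PKG_UPDATE="yum makecache"
;;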
*)
echo -e "${RED}Unsupported distribution: $ID${NC}"
exit 1
;;
esac
else
echo -e "${RED}Cannot detect distribution${NC}"
exit 1
fi
echo -e "${GREEN}Distribution detected: $PRETTY_NAME${NC}"
# Update package manager
echo -e "${YELLOW}[2/8] Updating package manager...${NC}"
$PKG_UPDATE
# Install required packages
echo -e "${YELLOW}[3/8] Installing required packages...${NC}"
$PKG_INSTALL wget tar chrony curl
# Download and install node exporter
echo -e "${YELLOW}[4/8] Installing Node Exporter...${NC}"
cd /tmp
wget -q "https://github.com/prometheus/node_exporter/releases/download/v${NODE_EXPORTER_VERSION}/node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz"
tar xzf "node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz"
cp "node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64/node_exporter" /usr/local/bin/
chown root:root /usr/local/bin/node_exporter
chmod 755 /usr/local/bin/node_exporter
# Create node_exporter user
if ! id "node_exporter" &>/dev/null; then
useradd --no-create-home --shell /bin/false node_exporter
fi
# Create systemd service
echo -e "${YELLOW}[5/8] Creating systemd service...${NC}"
cat > /etc/systemd/system/node_exporter.service << 'EOF'
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter --collector.systemd --collector.ntp --collector.time
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
chmod 644 /etc/systemd/system/node_exporter.service
systemctl daemon-reload
systemctl enable node_exporter
systemctl start node_exporter
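# Configure chrony
echo -e "${YELLOW}[6/8] Configuring chrony...${NC}"
# Debian/Ubuntu keep the config under /etc/chrony/ and name the unit "chrony";
# RHEL-family and Amazon Linux use /etc/chrony.conf and "chronyd"
if [[ "$PKG_MGR" == "apt" ]]; then
CHRONY_CONF="/etc/chrony/chrony.conf"
CHRONY_SERVICE="chrony"
else
CHRONY_CONF="/etc/chrony.conf"
CHRONY_SERVICE="chronyd"
fi
cat > "$CHRONY_CONF" << 'EOF'
# Public NTP servers
pool 2.pool.ntp.org iburst
pool 1.pool.ntp.org iburst
pool 0.pool.ntp.org iburst
# Record statistics
driftfile /var/lib/chrony/chrony.drift
dumpdir /var/lib/chrony
logdir /var/log/chrony
log statistics measurements tracking
# Ignore frequency estimates whose skew exceeds 100 ppm
maxupdateskew 100.0
# Enable command port for monitoring
cmdport 323
cmdallow 127.0.0.1
# Answer NTP queries from localhost so node exporter's ntp collector can poll chronyd
allow 127.0.0.1
# Step the clock if the offset exceeds 1 second during the first 3 updates
makestep 1.0 3
# Enable RTC synchronization
rtcsync
EOF
chmod 644 "$CHRONY_CONF"
systemctl enable "$CHRONY_SERVICE"
systemctl restart "$CHRONY_SERVICE"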
# Configure firewall
echo -e "${YELLOW}[7/8] Configuring firewall...${NC}"
if command -v firewall-cmd &> /dev/null && systemctl is-active firewalld &> /dev/null; then
firewall-cmd --permanent --add-port=${NODE_EXPORTER_PORT}/tcp
firewall-cmd --reload
elif command -v ufw &> /dev/null; then
ufw allow ${NODE_EXPORTER_PORT}/tcp
fi
# Create Prometheus configuration if not skipped
if [[ "$SKIP_PROMETHEUS" == false ]]; then
echo -e "${YELLOW}[8/8] Creating Prometheus configuration templates...${NC}"
# Create Prometheus config directory if it doesn't exist
mkdir -p /etc/prometheus
# Create time drift rules file
cat > /etc/prometheus/time_drift_rules.yml << EOF
groups:
  - name: time_drift_alerts
    rules:
      - alert: ClockDriftHigh
        expr: abs(node_timex_offset_seconds) > 0.05
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "System clock drift detected on {{ \$labels.instance }}"
          description: "Clock drift is {{ \$value }} seconds on {{ \$labels.instance }}"
      - alert: NTPSyncLost
        expr: node_timex_sync_status == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "NTP synchronization lost on {{ \$labels.instance }}"
          description: "System is not synchronized with NTP on {{ \$labels.instance }}"
      - alert: ClockSkewHigh
        expr: node_timex_estimated_error_seconds > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High clock skew detected on {{ \$labels.instance }}"
          description: "Clock skew is {{ \$value }} seconds on {{ \$labels.instance }}"
EOF
# Create sample Prometheus config. The node-exporter target is this host, not
# the Prometheus server; Alertmanager is assumed to run on the Prometheus host.
NODE_HOST="$(hostname -f 2>/dev/null || hostname)"
cat > /etc/prometheus/prometheus_time_monitoring.yml << EOF
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - "time_drift_rules.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - ${PROMETHEUS_HOST}:9093
scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['${NODE_HOST}:${NODE_EXPORTER_PORT}']
    scrape_interval: 10s
    metrics_path: /metrics
EOF
chmod 644 /etc/prometheus/time_drift_rules.yml
chmod 644 /etc/prometheus/prometheus_time_monitoring.yml
chown -R root:root /etc/prometheus
echo -e "${GREEN}Prometheus configuration files created in /etc/prometheus/${NC}"
else
echo -e "${YELLOW}[8/8] Skipping Prometheus configuration as requested${NC}"
fi
# Verification
echo -e "${YELLOW}Verifying installation...${NC}"
# Check node_exporter service
if systemctl is-active node_exporter &> /dev/null; then
echo -e "${GREEN}✓ Node Exporter is running${NC}"
else
echo -e "${RED}✗ Node Exporter failed to start${NC}"
exit 1
fi
# Check chrony service
if systemctl is-active chronyd &> /dev/null; then
echo -e "${GREEN}✓ Chrony is running${NC}"
else
echo -e "${RED}✗ Chrony failed to start${NC}"
exit 1
fi
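# Check metrics endpoint (capture first: with pipefail, "curl | grep -q" can
# report failure when grep exits early and curl is killed by SIGPIPE)
metrics="$(curl -s "http://localhost:${NODE_EXPORTER_PORT}/metrics" || true)"
if grep -q "node_timex_offset_seconds" <<< "$metrics"; then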
echo -e "${GREEN}✓ Time metrics are available${NC}"
else
echo -e "${RED}✗ Time metrics not found${NC}"
exit 1
fi
# Check chrony synchronization
sleep 5
if chronyc sources &> /dev/null; then
echo -e "${GREEN}✓ NTP sources are configured${NC}"
else
echo -e "${YELLOW}⚠ NTP synchronization may take time to establish${NC}"
fi
echo -e "\n${GREEN}Installation completed successfully!${NC}"
echo -e "Node Exporter is running on port ${NODE_EXPORTER_PORT}"
echo -e "Time metrics endpoint: http://localhost:${NODE_EXPORTER_PORT}/metrics"
if [[ "$SKIP_PROMETHEUS" == false ]]; then
echo -e "Prometheus configuration templates created in /etc/prometheus/"
fi
echo -e "\nKey metrics to monitor:"
echo -e " - node_timex_offset_seconds (clock offset)"
echo -e " - node_timex_sync_status (NTP sync status)"
echo -e " - node_timex_estimated_error_seconds (clock accuracy)"
rm -f "/tmp/node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz"
rm -rf "/tmp/node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64"
Save the script as install.sh and review it before running. Execute it as root: sudo bash install.sh