Monitor Cron Jobs & Systemd Timers with Prometheus

Set up comprehensive monitoring for scheduled tasks using Prometheus node_exporter and custom metrics collection. Configure Grafana dashboards and alerting rules to track job success, failures, and missed executions across your infrastructure.

Prerequisites

Root or sudo access
Basic familiarity with cron and systemd
Prometheus and Grafana knowledge helpful

What this solves

Scheduled tasks like cron jobs and systemd timers are critical for system maintenance, backups, and automated workflows. When they fail silently, you might not notice until data is lost or systems break. This tutorial shows you how to monitor both cron jobs and systemd timers using Prometheus metrics collection and Grafana alerting, giving you visibility into job execution status, runtime duration, and failure patterns.

Step-by-step installation

Install Prometheus node_exporter

node_exporter collects system metrics, including systemd unit state, and is the foundation for this setup. The commands below pin version 1.7.0; check the node_exporter releases page for a newer version before installing.

On Debian/Ubuntu:

sudo apt update
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
sudo cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
sudo chown root:root /usr/local/bin/node_exporter
sudo chmod 755 /usr/local/bin/node_exporter

On dnf-based distributions (AlmaLinux, Rocky Linux, Fedora):

sudo dnf update -y
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
sudo cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
sudo chown root:root /usr/local/bin/node_exporter
sudo chmod 755 /usr/local/bin/node_exporter

Create node_exporter service user

Create a dedicated system user for running node_exporter securely without shell access.

sudo useradd --system --no-create-home --shell /bin/false node_exporter

Configure node_exporter systemd service

Create the unit file /etc/systemd/system/node_exporter.service. It enables the systemd collector and points the textfile collector at the directory used for custom metrics.

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter --collector.systemd --collector.textfile.directory=/var/lib/node_exporter/textfile_collector
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Create textfile collector directory

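Set up the directory where custom job metrics will be written. The node_exporter user needs read access to collect these files. With mode 755, only root and node_exporter can write to the directory, so jobs that report metrics must run as root.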

sudo mkdir -p /var/lib/node_exporter/textfile_collector
sudo chown node_exporter:node_exporter /var/lib/node_exporter/textfile_collector
sudo chmod 755 /var/lib/node_exporter/textfile_collector

Start and enable node_exporter

Start the node_exporter service and enable it to run on boot.

sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
sudo systemctl status node_exporter

Create job monitoring script

Create a helper script at /usr/local/bin/job_monitor that your cron jobs and systemd timers call to report their execution status to Prometheus. It writes one .prom file per job into the textfile collector directory, so it must run as a user that can write there (root in this tutorial).

#!/bin/bash

JOB_NAME="$1"
JOB_STATUS="$2"  # success, failed, or running
JOB_DURATION="$3"  # optional duration in seconds
METRIC_FILE="/var/lib/node_exporter/textfile_collector/${JOB_NAME}.prom"
TIMESTAMP=$(date +%s)

if [ -z "$JOB_NAME" ] || [ -z "$JOB_STATUS" ]; then
    echo "Usage: $0 <job_name> <status> [duration_seconds]"
    exit 1
fi

# A job that is still running has no result yet. Keep the previous metrics
# so an in-progress job is not reported as failed.
if [ "$JOB_STATUS" = "running" ]; then
    exit 0
fi

# Create the temporary file next to the target so the final mv is a rename
# on the same filesystem (node_exporter only reads files ending in .prom)
TEMP_FILE="$(mktemp "${METRIC_FILE}.XXXXXX")" || exit 1

# Write job status metric
echo "# HELP job_last_status Last execution status of scheduled job (1=success, 0=failed)" >> "$TEMP_FILE"
echo "# TYPE job_last_status gauge" >> "$TEMP_FILE"
if [ "$JOB_STATUS" = "success" ]; then
    echo "job_last_status{job=\"$JOB_NAME\"} 1" >> "$TEMP_FILE"
else
    echo "job_last_status{job=\"$JOB_NAME\"} 0" >> "$TEMP_FILE"
fi

# Write timestamp metric
echo "# HELP job_last_run_timestamp Unix timestamp of last job execution" >> "$TEMP_FILE"
echo "# TYPE job_last_run_timestamp gauge" >> "$TEMP_FILE"
echo "job_last_run_timestamp{job=\"$JOB_NAME\"} $TIMESTAMP" >> "$TEMP_FILE"

# Write duration metric if provided
if [ -n "$JOB_DURATION" ]; then
    echo "# HELP job_duration_seconds Duration of last job execution in seconds" >> "$TEMP_FILE"
    echo "# TYPE job_duration_seconds gauge" >> "$TEMP_FILE"
    echo "job_duration_seconds{job=\"$JOB_NAME\"} $JOB_DURATION" >> "$TEMP_FILE"
fi

# Make the file readable by node_exporter, then atomically move it into place
chmod 644 "$TEMP_FILE"
mv "$TEMP_FILE" "$METRIC_FILE"
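
A malformed metric file typically makes the textfile collector report a scrape error instead of the job metrics, so it can help to check a file before relying on it. The sketch below is a deliberately simplified checker: check_prom_file is a hypothetical helper, and its pattern only accepts comment lines, blank lines, and samples of the form name{labels} number. It does not cover the full exposition format (no exponents, NaN, or timestamps).

```shell
#!/bin/bash
# Simplified format check for a textfile collector file (assumption: plain
# integer or decimal sample values, as written by job_monitor).
check_prom_file() {
    local file="$1"
    [ -r "$file" ] || { echo "Cannot read $file" >&2; return 2; }
    # Metric name, optional {labels}, one space, numeric value
    local sample_re='^[a-zA-Z_:][a-zA-Z0-9_:]*(\{[^}]*\})? -?[0-9]+(\.[0-9]+)?$'
    local bad
    # -v keeps only the lines that match none of the three patterns
    bad="$(grep -Ev -e '^#' -e '^$' -e "$sample_re" "$file" || true)"
    if [ -n "$bad" ]; then
        echo "Unexpected lines in $file:" >&2
        echo "$bad" >&2
        return 1
    fi
    return 0
}

# Example: check a job's file if it exists on this host
FILE="/var/lib/node_exporter/textfile_collector/daily_backup.prom"
if [ -r "$FILE" ]; then
    check_prom_file "$FILE" && echo "$FILE looks well-formed"
fi
```

A non-zero return value means at least one line did not match the simplified pattern; the offending lines are printed to standard error.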

Make job monitoring script executable

Set proper permissions on the monitoring script so it can be executed by cron jobs and systemd services.

sudo chmod 755 /usr/local/bin/job_monitor

Create wrapper script for cron jobs

Create a wrapper script at /usr/local/bin/cron_wrapper that measures execution time and reports job status automatically.

#!/bin/bash

JOB_NAME="$1"
shift
COMMAND="$*"

if [ -z "$JOB_NAME" ] || [ -z "$COMMAND" ]; then
    echo "Usage: $0 <job_name> <command>"
    exit 1
fi

# Record start time
START_TIME=$(date +%s)

# Mark job as running
/usr/local/bin/job_monitor "$JOB_NAME" "running"

# Execute the command and capture exit code.
# eval runs the string through the shell again, so only pass commands you trust.
eval "$COMMAND"
EXIT_CODE=$?

# Calculate duration
END_TIME=$(date +%s)
DURATION=$((END_TIME - START_TIME))

# Report final status
if [ $EXIT_CODE -eq 0 ]; then
    /usr/local/bin/job_monitor "$JOB_NAME" "success" "$DURATION"
else
    /usr/local/bin/job_monitor "$JOB_NAME" "failed" "$DURATION"
fi

exit $EXIT_CODE
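
The wrapper joins its arguments into one string and runs it through eval, which re-parses quoting and can misbehave with arguments that contain spaces or shell metacharacters. As an alternative, here is a minimal sketch that runs the command as an argument list with "$@". The names run_timed and RUN_DURATION are hypothetical and exist only for this illustration.

```shell
#!/bin/bash
# Time a command and preserve its exit code without using eval.
# The duration in whole seconds is left in the global RUN_DURATION.
run_timed() {
    local start end status
    start=$(date +%s)
    "$@"                 # run the command exactly as given, argument by argument
    status=$?
    end=$(date +%s)
    RUN_DURATION=$((end - start))
    return "$status"
}

# Example: time a short command and show what was captured
run_timed sleep 1
status=$?
echo "exit code: $status, duration: ${RUN_DURATION}s"
```

A wrapper built this way would take the command as separate arguments after the job name rather than as one quoted string.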

Make cron wrapper executable

Set execute permissions on the cron wrapper script.

sudo chmod 755 /usr/local/bin/cron_wrapper

Create example monitored cron job

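Add a sample cron job that demonstrates the monitoring setup. Edit root's crontab, because the metrics directory is writable only by root and node_exporter. Replace the command with your actual backup or maintenance task.

sudo crontab -e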

Add this line to monitor a daily backup job:

# Daily backup with monitoring
0 2 * * * /usr/local/bin/cron_wrapper "daily_backup" "/usr/local/bin/backup_script.sh"

Create monitored systemd timer service

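Create the service unit /etc/systemd/system/log-cleanup.service, which reports its result through the monitoring script. This example monitors log cleanup. The result is reported from ExecStopPost, because systemd passes $SERVICE_RESULT only to ExecStop and ExecStopPost commands, and ExecStartPost does not run when ExecStart fails. The doubled $$ passes a literal dollar sign through to bash.

[Unit]
Description=Clean old log files

[Service]
Type=oneshot
ExecStartPre=/usr/local/bin/job_monitor "log_cleanup" "running"
ExecStart=/bin/bash -c 'find /var/log -name "*.log.gz" -mtime +30 -delete'
ExecStopPost=/bin/bash -c 'if [ "$$SERVICE_RESULT" = "success" ]; then /usr/local/bin/job_monitor "log_cleanup" "success"; else /usr/local/bin/job_monitor "log_cleanup" "failed"; fi'
User=root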

Create systemd timer for log cleanup

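Create the timer unit /etc/systemd/system/log-cleanup.timer, which schedules the log cleanup service to run weekly. A timer activates the service with the same name by default, so the unit needs no Requires= or Unit= setting.

[Unit]
Description=Run log cleanup weekly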

[Timer]
OnCalendar=weekly
Persistent=true

[Install]
WantedBy=timers.target

Enable systemd timer

Enable and start the systemd timer so it runs according to schedule.

sudo systemctl daemon-reload
sudo systemctl enable --now log-cleanup.timer
sudo systemctl list-timers log-cleanup.timer

Configure Prometheus server

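Add your monitored servers to the Prometheus configuration, typically /etc/prometheus/prometheus.yml. This assumes you have Prometheus installed and running. Replace 203.0.113.10 with the address of each additional host.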

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'node-jobs'
    static_configs:
      - targets: ['localhost:9100', '203.0.113.10:9100']
    scrape_interval: 30s
    metrics_path: /metrics

Create Prometheus alerting rules

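Define alerts for failed jobs, missing jobs, and long-running tasks. Save the rules in a file that prometheus.yml lists under rule_files. Prometheus reserves the job label for the scrape job name (node-jobs here), so with the default honor_labels: false the job label written by the monitoring script is stored as exported_job, and the annotations use that label. The 24-hour threshold in JobMissing suits daily jobs; the weekly log_cleanup timer needs a larger threshold or a rule of its own.

groups:
  - name: job_monitoring
    rules:
      - alert: JobFailed
        expr: job_last_status == 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Job {{ $labels.exported_job }} failed"
          description: "Job {{ $labels.exported_job }} on {{ $labels.instance }} has failed"

      - alert: JobMissing
        expr: (time() - job_last_run_timestamp) > 86400
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Job {{ $labels.exported_job }} hasn't run in 24 hours"
          description: "Job {{ $labels.exported_job }} on {{ $labels.instance }} last ran {{ $value | humanizeDuration }} ago"

      - alert: JobRunningTooLong
        expr: job_duration_seconds > 3600
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Job {{ $labels.exported_job }} ran longer than 1 hour"
          description: "Job {{ $labels.exported_job }} on {{ $labels.instance }} took {{ $value | humanizeDuration }} to complete"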
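
The JobMissing expression subtracts the last-run timestamp from the current time and compares the result with 86400 seconds (24 hours). The sketch below reproduces that arithmetic locally for one metric file, which can help when choosing a threshold for jobs that run less often than daily. seconds_since_last_run is a hypothetical helper, and the file path in the example is an assumption based on the daily_backup job name.

```shell
#!/bin/bash
# Print how many seconds have passed since the timestamp recorded in a
# job's metric file. The optional second argument overrides "now".
seconds_since_last_run() {
    local file="$1"
    local now="${2:-$(date +%s)}"
    local last
    # The sample value is the last whitespace-separated field on the line
    last="$(awk 'index($0, "job_last_run_timestamp{") == 1 { print $NF; exit }' "$file")"
    [ -n "$last" ] || return 1
    echo $((now - last))
}

# Example: compare the age of the last run with the 24-hour threshold
FILE="/var/lib/node_exporter/textfile_collector/daily_backup.prom"
if [ -r "$FILE" ] && age="$(seconds_since_last_run "$FILE")"; then
    if [ "$age" -gt 86400 ]; then
        echo "daily_backup last ran ${age}s ago: above the JobMissing threshold"
    else
        echo "daily_backup last ran ${age}s ago: within the 24-hour window"
    fi
fi
```

For a weekly job the same check would compare against a larger value, for example 8 days (691200 seconds), to leave some slack after the expected run.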

Install and configure Grafana

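Install Grafana for visualizing job metrics and creating dashboards.

On Debian/Ubuntu (apt-key is deprecated, so the repository key goes into a dedicated keyring):

sudo apt install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana

On dnf-based distributions (this assumes a repository that provides the grafana package is already configured):

sudo dnf install -y grafana
sudo systemctl daemon-reload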

Start Grafana service

Enable and start Grafana to access the web interface on port 3000.

sudo systemctl enable --now grafana-server
sudo systemctl status grafana-server

Configure Grafana dashboard

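Create a dashboard JSON that shows the share of jobs whose last run succeeded, job execution times, and currently failed jobs. The outer "dashboard" key matches the payload shape of Grafana's dashboard HTTP API; when importing through the web interface you may need to paste only the inner object.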

{
  "dashboard": {
    "id": null,
    "title": "Job Monitoring",
    "tags": ["jobs", "monitoring"],
    "timezone": "browser",
    "panels": [
      {
        "title": "Job Success Rate",
        "type": "stat",
        "targets": [
          {
            "expr": "avg(job_last_status) * 100",
            "legendFormat": "Success Rate %"
          }
        ],
        "gridPos": {"h": 4, "w": 6, "x": 0, "y": 0}
      },
      {
        "title": "Job Execution Times",
        "type": "timeseries",
        "targets": [
          {
            "expr": "job_duration_seconds",
            "legendFormat": "{{ exported_job }}"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 4}
      },
      {
        "title": "Failed Jobs",
        "type": "table",
        "targets": [
          {
            "expr": "job_last_status == 0",
            "format": "table"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 4}
      }
    ],
    "time": {
      "from": "now-24h",
      "to": "now"
    },
    "refresh": "30s"
  }
}
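
The first panel's expression, avg(job_last_status) * 100, is the percentage of jobs whose most recent run succeeded, not a success rate over time. As a worked example of that arithmetic, the sketch below computes the same figure from sample lines in the textfile format. success_share is a hypothetical helper, and db_dump and cert_renew are made-up job names.

```shell
#!/bin/bash
# Average the job_last_status samples found in the given files (or stdin)
# and print the result as a percentage with one decimal place.
success_share() {
    awk 'index($0, "job_last_status{") == 1 { sum += $NF; n++ }
         END { if (n == 0) exit 1; printf "%.1f\n", sum / n * 100 }' "$@"
}

# Example with four jobs, one of which failed on its last run
printf '%s\n' \
    'job_last_status{job="daily_backup"} 1' \
    'job_last_status{job="log_cleanup"} 1' \
    'job_last_status{job="db_dump"} 0' \
    'job_last_status{job="cert_renew"} 1' | success_share   # prints 75.0
```

On a monitored host the same function can read the real files, for example success_share /var/lib/node_exporter/textfile_collector/*.prom.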

Verify your setup

Check that all components are running and collecting metrics properly.

# Verify node_exporter is running
sudo systemctl status node_exporter

# Check metrics are being collected
curl -s http://localhost:9100/metrics | grep job_

# Verify systemd timer is active
sudo systemctl list-timers log-cleanup.timer

# Test job monitoring manually, then remove the test metric so it cannot trigger alerts
sudo /usr/local/bin/job_monitor "test_job" "success" "30"
cat /var/lib/node_exporter/textfile_collector/test_job.prom
sudo rm /var/lib/node_exporter/textfile_collector/test_job.prom

# Check Grafana is accessible
curl -I http://localhost:3000

Configure alerting

Set up Alertmanager

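Install the Alertmanager binary, which handles alert notifications from the Prometheus rules. The commands are the same on Debian-based and dnf-based systems. You still need to run Alertmanager as a service and point Prometheus at it through the alerting section of prometheus.yml.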

wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar xvfz alertmanager-0.26.0.linux-amd64.tar.gz
sudo cp alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
sudo chmod 755 /usr/local/bin/alertmanager

Configure email notifications

Set up Alertmanager to send email notifications when jobs fail or go missing. Save this as the Alertmanager configuration file (alertmanager.yml) and replace the SMTP placeholders with your own values. The subject is set through the headers map and the message body through text. Grouping uses exported_job, the label that carries the job name after scraping.

global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'your-smtp-password'

route:
  group_by: ['alertname', 'exported_job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'job-alerts'

receivers:
  - name: 'job-alerts'
    email_configs:
      - to: 'sysadmin@example.com'
        headers:
          Subject: 'Job Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Job: {{ .Labels.exported_job }}
          Instance: {{ .Labels.instance }}
          {{ end }}

Common issues

Symptom	Cause	Fix
Metrics not appearing	Node_exporter can't read textfile directory	Check permissions: `sudo chown -R node_exporter:node_exporter /var/lib/node_exporter`
Job status always shows 0	Monitoring script failing silently	Test manually: `sudo /usr/local/bin/job_monitor "test" "success"`
Systemd timer not firing	Timer not enabled or service has errors	`sudo systemctl enable log-cleanup.timer` and check `journalctl -u log-cleanup.service`
Alerts not sending	Alertmanager configuration or SMTP issues	Check Alertmanager logs: `journalctl -u alertmanager`
Permission denied on metric files	Wrong ownership on textfile collector directory	`sudo chown node_exporter:node_exporter /var/lib/node_exporter/textfile_collector/*.prom`

Never use chmod 777. It gives every user on the system full access to your files. Instead, fix ownership with chown and use minimal permissions like 644 for metric files and 755 for directories.
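
To check that nothing under the metrics directory ended up world-writable, a quick audit with find can help. This is a minimal sketch; find_world_writable is a hypothetical helper name, and the directory is the one used in this tutorial.

```shell
#!/bin/bash
# List files and directories under the given path that any user can write to.
# -perm -0002 matches entries with the "other write" bit set; symlinks are
# skipped because their own mode bits are not meaningful.
find_world_writable() {
    find "$1" ! -type l -perm -0002 -print
}

# Example: audit the node_exporter data directory if it exists
DIR="/var/lib/node_exporter"
if [ -d "$DIR" ]; then
    found="$(find_world_writable "$DIR")"
    if [ -n "$found" ]; then
        echo "World-writable entries found:"
        echo "$found"
    else
        echo "No world-writable entries under $DIR"
    fi
fi
```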

Next steps

Configure Prometheus long-term storage with Thanos for historical job data retention
Set up Prometheus and Grafana monitoring stack with Docker compose for containerized deployments
Configure advanced Grafana dashboards and alerting with custom visualizations
Monitor system backup jobs with Prometheus alerts for critical data protection tasks
Implement Prometheus multi-cluster federation for monitoring jobs across multiple servers

Automated install script

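Run this to automate the node_exporter and job monitoring script setup. It does not install the cron wrapper, Prometheus, Grafana, or Alertmanager.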

install.sh

#!/usr/bin/env bash

set -euo pipefail

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# Configuration
NODE_EXPORTER_VERSION="1.7.0"
NODE_EXPORTER_USER="node_exporter"
TEXTFILE_DIR="/var/lib/node_exporter/textfile_collector"
MONITOR_SCRIPT_PATH="/usr/local/bin/job_monitor"

# Cleanup function
cleanup() {
    echo -e "${RED}[ERROR] Installation failed. Cleaning up...${NC}"
    systemctl stop node_exporter 2>/dev/null || true
    systemctl disable node_exporter 2>/dev/null || true
    rm -f /etc/systemd/system/node_exporter.service
    rm -f /usr/local/bin/node_exporter
    rm -f "$MONITOR_SCRIPT_PATH"
    rm -rf /var/lib/node_exporter
    userdel "$NODE_EXPORTER_USER" 2>/dev/null || true
    systemctl daemon-reload
}

trap cleanup ERR

# Check if running as root or with sudo
check_privileges() {
    if [[ $EUID -ne 0 ]]; then
        echo -e "${RED}This script must be run as root or with sudo${NC}"
        exit 1
    fi
}

# Auto-detect distribution
detect_distro() {
    if [ -f /etc/os-release ]; then
        . /etc/os-release
        case "$ID" in
            ubuntu|debian) 
                PKG_MGR="apt"
                PKG_UPDATE="apt update"
                PKG_INSTALL="apt install -y"
                ;;
            almalinux|rocky|centos|rhel|ol|fedora) 
                PKG_MGR="dnf"
                PKG_UPDATE="dnf makecache"
                PKG_INSTALL="dnf install -y"
                # Try yum if dnf is not available
                if ! command -v dnf &> /dev/null; then
                    PKG_MGR="yum"
                    PKG_UPDATE="yum makecache"
                    PKG_INSTALL="yum install -y"
                fi
                ;;
            amzn) 
                PKG_MGR="yum"
                PKG_UPDATE="yum makecache"
                PKG_INSTALL="yum install -y"
                ;;
            *) 
                echo -e "${RED}Unsupported distribution: $ID${NC}"
                exit 1
                ;;
        esac
        echo -e "${BLUE}Detected distribution: $PRETTY_NAME${NC}"
    else
        echo -e "${RED}Cannot detect distribution. /etc/os-release not found.${NC}"
        exit 1
    fi
}

# Update package repositories
update_packages() {
    echo -e "${BLUE}[1/8] Updating package repositories...${NC}"
    $PKG_UPDATE
}

# Install required packages
install_dependencies() {
    echo -e "${BLUE}[2/8] Installing dependencies...${NC}"
    $PKG_INSTALL wget tar curl
}

# Download and install node_exporter
install_node_exporter() {
    echo -e "${BLUE}[3/8] Downloading and installing node_exporter...${NC}"
    
    cd /tmp
    wget -q "https://github.com/prometheus/node_exporter/releases/download/v${NODE_EXPORTER_VERSION}/node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz"
    tar xzf "node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz"
    
    cp "node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64/node_exporter" /usr/local/bin/
    chown root:root /usr/local/bin/node_exporter
    chmod 755 /usr/local/bin/node_exporter
    
    # Clean up
    rm -rf "node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64"*
}

# Create node_exporter system user
create_user() {
    echo -e "${BLUE}[4/8] Creating node_exporter system user...${NC}"
    
    if ! id "$NODE_EXPORTER_USER" &>/dev/null; then
        useradd --no-create-home --shell /bin/false --system "$NODE_EXPORTER_USER"
    fi
}

# Create textfile collector directory
create_directories() {
    echo -e "${BLUE}[5/8] Creating directories...${NC}"
    
    mkdir -p "$TEXTFILE_DIR"
    chown "$NODE_EXPORTER_USER:$NODE_EXPORTER_USER" "$TEXTFILE_DIR"
    chmod 755 "$TEXTFILE_DIR"
    
    # Give the parent directory the same owner
    chown "$NODE_EXPORTER_USER:$NODE_EXPORTER_USER" "$(dirname "$TEXTFILE_DIR")"
}

# Create systemd service
create_service() {
    echo -e "${BLUE}[6/8] Creating systemd service...${NC}"
    
    cat > /etc/systemd/system/node_exporter.service << 'EOF'
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter --collector.systemd --collector.textfile.directory=/var/lib/node_exporter/textfile_collector
Restart=always
RestartSec=3
NoNewPrivileges=yes
PrivateTmp=yes
ProtectSystem=strict
ProtectHome=yes
ReadWritePaths=/var/lib/node_exporter

[Install]
WantedBy=multi-user.target
EOF

    systemctl daemon-reload
    systemctl enable node_exporter
    systemctl start node_exporter
}

# Create job monitoring script
create_monitor_script() {
    echo -e "${BLUE}[7/8] Creating job monitoring script...${NC}"
    
    cat > "$MONITOR_SCRIPT_PATH" << 'EOF'
#!/bin/bash

JOB_NAME="$1"
JOB_STATUS="$2"  # success, failed, or running
JOB_DURATION="$3"  # optional duration in seconds
METRIC_FILE="/var/lib/node_exporter/textfile_collector/${JOB_NAME}.prom"
TIMESTAMP=$(date +%s)

if [ -z "$JOB_NAME" ] || [ -z "$JOB_STATUS" ]; then
    echo "Usage: $0 <job_name> <status> [duration_seconds]"
    echo "Status: success, failed, or running"
    exit 1
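fi

# A job that is still running has no result yet. Keep the previous metrics
# so an in-progress job is not reported as failed.
if [ "$JOB_STATUS" = "running" ]; then
    exit 0
fi

# Create the temporary file next to the target so the final mv is a rename
# on the same filesystem (node_exporter only reads files ending in .prom)
TEMP_FILE="$(mktemp "${METRIC_FILE}.XXXXXX")" || exit 1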

# Write job status metric
echo "# HELP job_last_status Last execution status of scheduled job (1=success, 0=failed)" >> "$TEMP_FILE"
echo "# TYPE job_last_status gauge" >> "$TEMP_FILE"
if [ "$JOB_STATUS" = "success" ]; then
    echo "job_last_status{job=\"$JOB_NAME\"} 1" >> "$TEMP_FILE"
else
    echo "job_last_status{job=\"$JOB_NAME\"} 0" >> "$TEMP_FILE"
fi

# Write timestamp metric
echo "# HELP job_last_run_timestamp Unix timestamp of last job execution" >> "$TEMP_FILE"
echo "# TYPE job_last_run_timestamp gauge" >> "$TEMP_FILE"
echo "job_last_run_timestamp{job=\"$JOB_NAME\"} $TIMESTAMP" >> "$TEMP_FILE"

# Write duration metric if provided
if [ -n "$JOB_DURATION" ]; then
    echo "# HELP job_duration_seconds Duration of last job execution in seconds" >> "$TEMP_FILE"
    echo "# TYPE job_duration_seconds gauge" >> "$TEMP_FILE"
    echo "job_duration_seconds{job=\"$JOB_NAME\"} $JOB_DURATION" >> "$TEMP_FILE"
fi

# Make the file readable by node_exporter, then atomically move it into place
chmod 644 "$TEMP_FILE"
mv "$TEMP_FILE" "$METRIC_FILE"
EOF

    chmod 755 "$MONITOR_SCRIPT_PATH"
}

# Verify installation
verify_installation() {
    echo -e "${BLUE}[8/8] Verifying installation...${NC}"
    
    # Check if node_exporter is running
    if systemctl is-active --quiet node_exporter; then
        echo -e "${GREEN}✓ node_exporter service is running${NC}"
    else
        echo -e "${RED}✗ node_exporter service is not running${NC}"
        exit 1
    fi
    
    # Check if metrics endpoint is accessible
    sleep 2
    if curl -s http://localhost:9100/metrics | grep -q "node_exporter_build_info"; then
        echo -e "${GREEN}✓ node_exporter metrics endpoint is accessible${NC}"
    else
        echo -e "${RED}✗ node_exporter metrics endpoint is not accessible${NC}"
        exit 1
    fi
    
    # Check if monitor script exists and is executable
    if [[ -x "$MONITOR_SCRIPT_PATH" ]]; then
        echo -e "${GREEN}✓ Job monitoring script is installed${NC}"
    else
        echo -e "${RED}✗ Job monitoring script is not properly installed${NC}"
        exit 1
    fi
    
    # Test monitor script, then remove the test metric so it cannot trigger alerts
    if "$MONITOR_SCRIPT_PATH" test_job success 10; then
        rm -f "$TEXTFILE_DIR/test_job.prom"
        echo -e "${GREEN}✓ Job monitoring script is working${NC}"
    else
        echo -e "${RED}✗ Job monitoring script test failed${NC}"
        exit 1
    fi
    
    echo -e "${GREEN}Installation completed successfully!${NC}"
    echo -e "${YELLOW}Usage examples:${NC}"
    echo "  # Report successful job:"
    echo "  $MONITOR_SCRIPT_PATH backup_job success 120"
    echo "  # Report failed job:"
    echo "  $MONITOR_SCRIPT_PATH backup_job failed"
    echo "  # Add to crontab:"
    echo "  0 2 * * * /path/to/backup.sh && $MONITOR_SCRIPT_PATH backup success || $MONITOR_SCRIPT_PATH backup failed"
    echo -e "${YELLOW}node_exporter is running on http://localhost:9100/metrics${NC}"
}

# Main execution
main() {
    echo -e "${GREEN}Prometheus Job Monitoring Setup${NC}"
    echo "=================================="
    
    check_privileges
    detect_distro
    update_packages
    install_dependencies
    install_node_exporter
    create_user
    create_directories
    create_service
    create_monitor_script
    verify_installation
}

main "$@"

Review the script before running. It must run as root, so execute it with: sudo bash install.sh
