Set up log alerting with Fluentd and Prometheus Alertmanager for centralized monitoring

Intermediate, 45 min, Apr 12, 2026
Ubuntu 24.04, Debian 12, AlmaLinux 9, Rocky Linux 9

Configure Fluentd to collect and parse logs, integrate with Prometheus metrics collection, and set up Alertmanager for intelligent routing of log-based alerts to multiple notification channels.

Prerequisites

  • Root or sudo access
  • At least 2GB RAM
  • Network access for package downloads
  • Basic understanding of systemd services

What this solves

Log alerting sends a notification when log patterns indicate system problems, security threats, or application errors. In this setup, Fluentd tails logs from multiple sources and turns matching events into counters, Prometheus scrapes those counters and evaluates alerting rules against them, and Alertmanager groups, deduplicates, and routes the resulting alerts to notification channels.

Step-by-step installation

Update system packages

Start by updating your package manager to ensure you have the latest security patches and dependencies.

# Ubuntu / Debian
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl wget gnupg2

# AlmaLinux / Rocky Linux
sudo dnf update -y
sudo dnf install -y curl wget gnupg2

Install Fluentd

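Install Fluentd using the official td-agent package, which bundles its own Ruby runtime and a systemd unit and is easier to maintain in production than a gem installation. td-agent 4 has since been superseded by fluent-package 5, so confirm that an install script exists for your release before running the commands below.

# Ubuntu / Debian
curl -fsSL https://toolbelt.treasuredata.com/sh/install-ubuntu-noble-td-agent4.sh | sh
sudo systemctl enable td-agent
sudo systemctl start td-agent

# AlmaLinux / Rocky Linux
curl -fsSL https://toolbelt.treasuredata.com/sh/install-redhat-td-agent4.sh | sh
sudo systemctl enable td-agent
sudo systemctl start td-agent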

Install Prometheus

Download and install Prometheus server to collect metrics from Fluentd and evaluate alerting rules.

sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus /var/lib/prometheus

cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar -xzf prometheus-2.45.0.linux-amd64.tar.gz
sudo cp prometheus-2.45.0.linux-amd64/prometheus /usr/local/bin/
sudo cp prometheus-2.45.0.linux-amd64/promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool
sudo cp -r prometheus-2.45.0.linux-amd64/consoles /etc/prometheus/
sudo cp -r prometheus-2.45.0.linux-amd64/console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus/consoles /etc/prometheus/console_libraries
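
Before copying binaries into /usr/local/bin, it can be worth checking the downloaded tarball against a published checksum. The sketch below assumes a sha256sums.txt file downloaded from the same release page and sitting next to the tarball; verify_release is a hypothetical helper name, not part of Prometheus.

```shell
# Sketch: compare a tarball's SHA-256 with the entry in a checksum list.
# Assumes the list uses the usual "<hash>  <filename>" layout.
# verify_release is a hypothetical helper, not a Prometheus tool.
verify_release() {
  tarball="$1"     # e.g. /tmp/prometheus-2.45.0.linux-amd64.tar.gz
  sums_file="$2"   # e.g. /tmp/sha256sums.txt (assumed to be downloaded separately)
  name=$(basename "$tarball")

  # Look up the expected hash for this file name.
  expected=$(awk -v f="$name" '$2 == f { print $1 }' "$sums_file")
  if [ -z "$expected" ]; then
    echo "no checksum entry for $name" >&2
    return 1
  fi

  # Hash the file we actually downloaded and compare.
  actual=$(sha256sum "$tarball" | awk '{ print $1 }')
  if [ "$expected" = "$actual" ]; then
    echo "OK: $name"
    return 0
  fi
  echo "MISMATCH: $name" >&2
  return 1
}
```

Called as `verify_release /tmp/prometheus-2.45.0.linux-amd64.tar.gz /tmp/sha256sums.txt`, it returns non-zero on a mismatch, so it can gate the `cp` steps in a script. The same helper works for the Alertmanager tarball.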

Install Alertmanager

Install Alertmanager to handle alert routing, grouping, and notification delivery to various channels.

sudo useradd --no-create-home --shell /bin/false alertmanager
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
sudo chown alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager

cd /tmp
wget https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz
tar -xzf alertmanager-0.25.0.linux-amd64.tar.gz
sudo cp alertmanager-0.25.0.linux-amd64/alertmanager /usr/local/bin/
sudo cp alertmanager-0.25.0.linux-amd64/amtool /usr/local/bin/
sudo chown alertmanager:alertmanager /usr/local/bin/alertmanager /usr/local/bin/amtool

Install Fluentd Prometheus plugin

Install the prometheus plugin to expose Fluentd metrics and log counters for Prometheus scraping.

sudo td-agent-gem install fluent-plugin-prometheus

Configure Fluentd for log collection and metrics

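Configure Fluentd to collect system logs, parse them for error patterns, and expose metrics for Prometheus. This configuration goes in /etc/td-agent/td-agent.conf and monitors syslog plus the nginx access and error logs. The /var/log/syslog path exists on Ubuntu and Debian; on AlmaLinux and Rocky Linux the equivalent file is /var/log/messages.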

# Prometheus metrics endpoint
<source>
  @type prometheus
  bind 0.0.0.0
  port 24231
  metrics_path /metrics
</source>

<source>
  @type prometheus_output_monitor
  interval 10
  <labels>
    hostname ${hostname}
  </labels>
</source>

# Monitor syslog for errors
<source>
  @type tail
  path /var/log/syslog
  pos_file /var/log/td-agent/syslog.log.pos
  tag system.syslog
  <parse>
    @type syslog
  </parse>
</source>

# Monitor nginx access logs
<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/td-agent/nginx.access.log.pos
  tag nginx.access
  <parse>
    @type nginx
  </parse>
</source>

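# Monitor nginx error logs
<source>
  @type tail
  path /var/log/nginx/error.log
  pos_file /var/log/td-agent/nginx.error.log.pos
  tag nginx.error
  <parse>
    @type multiline
    format_firstline /^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}/
    # format1 is cut off here; complete the pattern (it must capture log_level) before use
    format1 /^(?
  </parse>
</source>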

# Count error patterns and expose as metrics
<filter system.syslog>
  @type prometheus
  <metric>
    name fluentd_syslog_error_total
    type counter
    desc Count of syslog errors
    <labels>
      hostname ${hostname}
      # Record fields are read with record_accessor syntax ($.field)
      severity $.severity
    </labels>
  </metric>
</filter>

<filter nginx.error>
  @type prometheus
  <metric>
    name fluentd_nginx_error_total
    type counter
    desc Count of nginx errors
    <labels>
      hostname ${hostname}
      log_level $.log_level
    </labels>
  </metric>
</filter>

<filter nginx.access>
  @type prometheus
  <metric>
    name fluentd_nginx_http_requests_total
    type counter
    desc Count of HTTP requests
    <labels>
      hostname ${hostname}
      method $.method
      code $.code
    </labels>
  </metric>
</filter>

# Output to stdout for debugging (optional)
<match **>
  @type stdout
</match>
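
A syntax mistake in the Fluentd configuration only shows up when the service restarts, so a dry run beforehand can save a failed restart. The sketch below wraps the `--dry-run` option; check_fluentd_config and the FLUENTD_BIN override are hypothetical names introduced for illustration, and the default path assumes the stock td-agent layout.

```shell
# Sketch: validate a Fluentd config without starting the daemon.
# check_fluentd_config and FLUENTD_BIN are illustrative names, not part of td-agent.
check_fluentd_config() {
  conf="${1:-/etc/td-agent/td-agent.conf}"   # assumed default config path
  bin="${FLUENTD_BIN:-td-agent}"

  # Bail out with a distinct code when the binary is not installed.
  if ! command -v "$bin" >/dev/null 2>&1; then
    echo "skipped: $bin not found in PATH" >&2
    return 2
  fi

  # --dry-run parses the config and loads plugins, then exits.
  "$bin" --dry-run -c "$conf"
}
```

Running `sudo td-agent --dry-run -c /etc/td-agent/td-agent.conf` directly does the same thing; the wrapper is only convenient inside a deployment script, where the return code decides whether to restart the service.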

Create Fluentd log directory permissions

Set correct permissions for Fluentd to write position files and access log files. The td-agent user needs read access to system logs and write access to its working directory.

Warning: Never use chmod 777. It gives every user on the system full access to your files. Instead, fix ownership with chown and use minimal permissions.

sudo mkdir -p /var/log/td-agent
sudo chown td-agent:td-agent /var/log/td-agent
sudo chmod 755 /var/log/td-agent

# Add td-agent to the adm group for log access
sudo usermod -a -G adm td-agent
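
To confirm that the group change took effect, the membership can be checked from the shell. This is a minimal sketch; in_group is a hypothetical helper name.

```shell
# Sketch: succeed only if USER is a member of GROUP.
# in_group is an illustrative helper, not a system command.
in_group() {
  user="$1"
  group="$2"
  # id -nG prints the user's group names separated by spaces;
  # put each on its own line so grep -x can match one exactly.
  id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qx "$group"
}
```

For example, `in_group td-agent adm && echo "td-agent can read adm-owned logs"`. A running td-agent process keeps its old group list until it is restarted, so restart the service after changing membership.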

Configure Prometheus to scrape Fluentd metrics

Configure Prometheus to scrape the Fluentd metrics endpoint, load the alerting rules file, and forward firing alerts to Alertmanager. Save the following as /etc/prometheus/prometheus.yml.

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "/etc/prometheus/alerts.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'fluentd'
    static_configs:
      - targets: ['localhost:24231']
    scrape_interval: 30s
    metrics_path: /metrics
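
The installed promtool binary can check this file before Prometheus is started. The sketch below wraps `promtool check config`, which also validates the rule files listed under rule_files, so run it once alerts.yml exists. check_prometheus_config and PROMTOOL_BIN are hypothetical names used for illustration.

```shell
# Sketch: validate prometheus.yml (and the rule files it references).
# check_prometheus_config and PROMTOOL_BIN are illustrative names.
check_prometheus_config() {
  conf="${1:-/etc/prometheus/prometheus.yml}"
  bin="${PROMTOOL_BIN:-promtool}"

  if ! command -v "$bin" >/dev/null 2>&1; then
    echo "skipped: $bin not found in PATH" >&2
    return 2
  fi

  # Exits non-zero when the YAML or any referenced rule file is invalid.
  "$bin" check config "$conf"
}
```

A non-zero return code here is a good reason to stop before restarting the service.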

Create alerting rules for log-based monitoring

Define alerting rules that fire when log-derived metrics indicate problems such as high error rates, elevated HTTP 4xx or 5xx responses, or Fluentd itself going down. Save the following as /etc/prometheus/alerts.yml.

groups:
  - name: log-based-alerts
    rules:
      - alert: HighNginxErrorRate
        expr: rate(fluentd_nginx_error_total[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
          service: nginx
        annotations:
          summary: "High nginx error rate detected"
          description: "Nginx error rate is {{ $value }} errors per second on {{ $labels.hostname }}"

      - alert: CriticalNginxErrors
        expr: rate(fluentd_nginx_error_total{log_level="crit"}[5m]) > 0
        for: 1m
        labels:
          severity: critical
          service: nginx
        annotations:
          summary: "Critical nginx errors detected"
          description: "Critical nginx errors occurring on {{ $labels.hostname }}"

      - alert: HighSystemLogErrors
        expr: rate(fluentd_syslog_error_total{severity="error"}[10m]) > 0.05
        for: 5m
        labels:
          severity: warning
          service: system
        annotations:
          summary: "High system error rate"
          description: "System error rate is {{ $value }} per second on {{ $labels.hostname }}"

      - alert: HTTP4xxErrors
        expr: rate(fluentd_nginx_http_requests_total{code=~"4.."}[5m]) > 2
        for: 3m
        labels:
          severity: warning
          service: nginx
        annotations:
          summary: "High HTTP 4xx error rate"
          description: "HTTP 4xx error rate is {{ $value }} per second on {{ $labels.hostname }}"

      - alert: HTTP5xxErrors
        expr: rate(fluentd_nginx_http_requests_total{code=~"5.."}[5m]) > 0.5
        for: 1m
        labels:
          severity: critical
          service: nginx
        annotations:
          summary: "High HTTP 5xx error rate"
          description: "HTTP 5xx error rate is {{ $value }} per second on {{ $labels.hostname }}"

      - alert: FluentdDown
        expr: up{job="fluentd"} == 0
        for: 1m
        labels:
          severity: critical
          service: fluentd
        annotations:
          summary: "Fluentd is down"
          description: "Fluentd has been down for more than 1 minute"
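
The thresholds in these rules are per-second rates averaged over the range in brackets, which can be hard to picture. As a rough aid, the sketch below converts a rate threshold and window into the approximate number of events that window must contain; events_in_window is a hypothetical helper name.

```shell
# Sketch: approximate event count implied by a rate() threshold.
# Usage: events_in_window RATE_PER_SECOND WINDOW_SECONDS
# events_in_window is an illustrative helper, not a Prometheus tool.
events_in_window() {
  awk -v r="$1" -v w="$2" 'BEGIN { printf "%g\n", r * w }'
}
```

For instance, `events_in_window 0.1 300` prints 30, so HighNginxErrorRate needs roughly more than 30 errors within 5 minutes per label combination, and that condition has to hold for the 2 minute `for` period before the alert fires. Likewise `events_in_window 0.5 300` prints 150 for HTTP5xxErrors.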

Configure Alertmanager notification channels

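Set up Alertmanager to route alerts to different notification channels based on severity and service. This example includes email and Slack integration. Save it as /etc/alertmanager/alertmanager.yml and replace the SMTP host, credentials, addresses, and Slack webhook URL with your own values.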

global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'your-smtp-password'
  slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'

templates:
  - '/etc/alertmanager/templates/*.tmpl'

route:
  group_by: ['alertname', 'service']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'default'
  routes:
    - match:
        severity: critical
      receiver: 'critical-alerts'
      group_wait: 5s
      repeat_interval: 15m
    - match:
        service: nginx
      receiver: 'web-team'
    - match:
        service: system
      receiver: 'ops-team'

receivers:
  - name: 'default'
    email_configs:
      - to: 'admin@example.com'
        headers:
          Subject: 'Prometheus Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Labels: {{ range .Labels.SortedPairs }}{{ .Name }}={{ .Value }} {{ end }}
          {{ end }}

  - name: 'critical-alerts'
    email_configs:
      - to: 'oncall@example.com'
        headers:
          Subject: 'CRITICAL: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          CRITICAL ALERT: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Started: {{ .StartsAt }}
          Labels: {{ range .Labels.SortedPairs }}{{ .Name }}={{ .Value }} {{ end }}
          {{ end }}
    slack_configs:
      - channel: '#alerts'
        title: 'Critical Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          {{ .Annotations.summary }}
          {{ .Annotations.description }}
          {{ end }}
        color: 'danger'

  - name: 'web-team'
    slack_configs:
      - channel: '#web-team'
        title: 'Web Service Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          {{ .Annotations.summary }}
          {{ .Annotations.description }}
          {{ end }}
        color: 'warning'

  - name: 'ops-team'
    email_configs:
      - to: 'ops@example.com'
        headers:
          Subject: 'System Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          System Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'service']
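
Routing trees are easy to get subtly wrong because the first matching child route wins. The installed amtool binary can report which receiver a given label set would reach without sending anything. The sketch below wraps `amtool config routes test`; test_route, AMTOOL_BIN, and AM_CONFIG are hypothetical names introduced for illustration.

```shell
# Sketch: ask amtool which receiver a set of labels would be routed to.
# test_route, AMTOOL_BIN and AM_CONFIG are illustrative names.
test_route() {
  conf="${AM_CONFIG:-/etc/alertmanager/alertmanager.yml}"
  bin="${AMTOOL_BIN:-amtool}"

  if ! command -v "$bin" >/dev/null 2>&1; then
    echo "skipped: $bin not found in PATH" >&2
    return 2
  fi

  # Remaining arguments are label=value pairs describing the alert.
  "$bin" config routes test --config.file="$conf" "$@"
}
```

With the routes defined here, `test_route severity=critical service=nginx` should resolve to critical-alerts, because the severity route is listed first, while `test_route severity=warning service=nginx` should resolve to web-team. Running `amtool check-config /etc/alertmanager/alertmanager.yml` first catches plain syntax errors.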

Create systemd service files

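Create systemd service files for Prometheus and Alertmanager so that they start automatically at boot, restart on failure, and run as their unprivileged service users. Save the two units as /etc/systemd/system/prometheus.service and /etc/systemd/system/alertmanager.service.

# /etc/systemd/system/prometheus.service
[Unit]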
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.listen-address=0.0.0.0:9090 \
    --web.enable-lifecycle \
    --storage.tsdb.retention.time=90d

Restart=always
RestartSec=10s
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/alertmanager.service
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file=/etc/alertmanager/alertmanager.yml \
    --storage.path=/var/lib/alertmanager/ \
    --web.external-url=http://localhost:9093/

Restart=always
RestartSec=10s
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

Set correct file ownership

Ensure all configuration files have the correct ownership and permissions for the service users.

sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml /etc/prometheus/alerts.yml
sudo chmod 644 /etc/prometheus/prometheus.yml /etc/prometheus/alerts.yml

sudo chown alertmanager:alertmanager /etc/alertmanager/alertmanager.yml
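# alertmanager.yml contains the SMTP password and Slack webhook, so keep it from being world-readable
sudo chmod 640 /etc/alertmanager/alertmanager.yml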

Enable and start all services

Enable and start Fluentd, Prometheus, and Alertmanager services with proper startup order.

sudo systemctl daemon-reload

# Start Fluentd first
sudo systemctl restart td-agent
sudo systemctl enable td-agent

# Start Prometheus
sudo systemctl enable prometheus
sudo systemctl start prometheus

# Start Alertmanager
sudo systemctl enable alertmanager
sudo systemctl start alertmanager
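
A service that systemd reports as started may still need a few seconds before it answers requests. The sketch below polls a URL until it responds; wait_ready and the PROBE override are hypothetical names, and the retry count and delay are arbitrary defaults.

```shell
# Sketch: poll a URL until it answers successfully or the retries run out.
# Usage: wait_ready URL [TRIES] [DELAY_SECONDS]
# wait_ready and PROBE are illustrative names.
wait_ready() {
  url="$1"
  tries="${2:-30}"
  delay="${3:-2}"
  # curl -f turns HTTP error statuses into a non-zero exit code.
  probe="${PROBE:-curl -fsS -o /dev/null}"

  i=0
  while [ "$i" -lt "$tries" ]; do
    # $probe is left unquoted on purpose so its options split into words.
    if $probe "$url" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

Prometheus and Alertmanager both expose a readiness endpoint, so `wait_ready http://localhost:9090/-/ready` and `wait_ready http://localhost:9093/-/ready` are natural checks; for Fluentd, `wait_ready http://localhost:24231/metrics` confirms the metrics endpoint is listening.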

Configure firewall rules

Open the ports for Prometheus (9090), Alertmanager (9093), and the Fluentd metrics endpoint (24231) only if they must be reachable from other hosts, such as an external monitoring tool. When all three services run on the same machine, as in this guide, they talk to each other over localhost and need no firewall changes.

# Ubuntu / Debian (ufw)
sudo ufw allow 9090/tcp comment 'Prometheus'
sudo ufw allow 9093/tcp comment 'Alertmanager'
sudo ufw allow 24231/tcp comment 'Fluentd metrics'
sudo ufw reload

# AlmaLinux / Rocky Linux (firewalld)
sudo firewall-cmd --permanent --add-port=9090/tcp --add-port=9093/tcp --add-port=24231/tcp
sudo firewall-cmd --reload

Verify your setup

Check that all services are running correctly and can communicate with each other.

# Check service status
sudo systemctl status td-agent
sudo systemctl status prometheus
sudo systemctl status alertmanager

# Verify Fluentd metrics endpoint
curl http://localhost:24231/metrics

# Check Prometheus targets
curl http://localhost:9090/api/v1/targets

# Verify alerting rules are loaded
curl http://localhost:9090/api/v1/rules

# Check Alertmanager status
curl http://localhost:9093/api/v2/status

# Generate a 404 response to exercise the nginx request counters (if nginx is installed and listening on port 80)
curl -s -o /dev/null http://localhost/this-page-does-not-exist

# View current alerts
curl http://localhost:9090/api/v1/alerts

Note: Access the web interfaces at http://localhost:9090 for Prometheus and http://localhost:9093 for Alertmanager to view metrics, rules, and active alerts graphically.
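
The raw /metrics output is long, so it helps to total a single counter across all of its label combinations. The sketch below does that with awk, assuming the plain text exposition format without trailing timestamps; sum_metric is a hypothetical helper name.

```shell
# Sketch: sum one metric across all label sets from exposition text on stdin.
# Assumes the sample value is the last field on each line (no timestamps).
# sum_metric is an illustrative helper.
sum_metric() {
  awk -v m="$1" '
    /^#/ { next }                              # skip HELP and TYPE comments
    { name = $1; sub(/[{].*/, "", name) }      # strip the {label="..."} part
    name == m { total += $NF; found = 1 }
    END { if (found) printf "%g\n", total; else print "absent" }
  '
}
```

For example, `curl -s http://localhost:24231/metrics | sum_metric fluentd_nginx_http_requests_total` prints the total request count seen so far, or "absent" if Fluentd has not emitted that metric yet. A counter only appears after the first matching log line has been processed.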

Common issues

Symptom | Cause | Fix
Fluentd not collecting logs | Permission denied on log files | sudo usermod -a -G adm td-agent && sudo systemctl restart td-agent
Prometheus can't scrape Fluentd | Fluentd metrics plugin not loaded | Check /var/log/td-agent/td-agent.log and verify plugin installation
Alerts not firing | Incorrect rule syntax or thresholds | /usr/local/bin/promtool check rules /etc/prometheus/alerts.yml
Notifications not sent | SMTP/Slack configuration errors | Check Alertmanager logs: sudo journalctl -u alertmanager -f
High memory usage | Too many log files or metrics | Adjust retention settings and add log rotation
Position file errors | Incorrect permissions on pos files | sudo chown -R td-agent:td-agent /var/log/td-agent
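
When notifications are not arriving, it helps to separate rule problems from delivery problems by pushing a synthetic alert straight into Alertmanager. The sketch below builds a payload for the v2 alerts API; test_alert_json and the SyntheticTest alert name are hypothetical, chosen only for illustration.

```shell
# Sketch: print a JSON payload describing one synthetic alert.
# Usage: test_alert_json [SEVERITY] [SERVICE]
# test_alert_json and the alert name SyntheticTest are illustrative.
test_alert_json() {
  severity="${1:-warning}"
  service="${2:-nginx}"
  # The API expects a JSON array of alerts, each with labels and annotations.
  printf '[{"labels":{"alertname":"SyntheticTest","severity":"%s","service":"%s"},"annotations":{"summary":"Synthetic test alert","description":"Sent manually to test routing"}}]\n' "$severity" "$service"
}
```

Piping the output into Alertmanager, for example `test_alert_json critical nginx | curl -fsS -X POST -H 'Content-Type: application/json' --data @- http://localhost:9093/api/v2/alerts`, should produce a notification on the matching receiver after group_wait elapses. If it does not, the Alertmanager journal usually shows the SMTP or Slack error.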
