Fluentd Prometheus Alertmanager Log Alerting Setup

Configure Fluentd to collect and parse logs, integrate with Prometheus metrics collection, and set up Alertmanager for intelligent routing of log-based alerts to multiple notification channels.

Prerequisites

Root or sudo access
At least 2GB RAM
Network access for package downloads
Basic understanding of systemd services

What this solves

Log alerting provides proactive monitoring by triggering notifications when specific log patterns indicate system problems, security threats, or application errors. Fluentd collects logs from multiple sources and turns them into metrics, Prometheus scrapes those metrics and evaluates alerting rules against them, and Alertmanager routes the resulting alerts with grouping, deduplication, and per-receiver repeat intervals.
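To make the log-to-metric idea concrete, the sketch below counts matching log lines and prints the result as a single sample in the Prometheus text format. It only illustrates the concept; in the real pipeline the Fluentd prometheus plugin does this work. The function name `count_pattern_as_metric` and the metric name `demo_log_error_total` are hypothetical.

```shell
# Count lines on stdin that match a pattern and print a Prometheus-style sample.
# Hypothetical helper for illustration; Fluentd performs this step in the real setup.
count_pattern_as_metric() {
    metric_name=$1
    pattern=$2
    # grep -c prints the number of matching lines (0 when nothing matches);
    # "|| true" keeps the function from failing when there are no matches.
    count=$(grep -c -E "$pattern" || true)
    printf '%s %s\n' "$metric_name" "$count"
}

# Three sample log lines, two of which contain "error"
printf '%s\n' \
    'Jan 10 12:00:01 web1 app[101]: error: disk full' \
    'Jan 10 12:00:02 web1 app[101]: request served' \
    'Jan 10 12:00:03 web1 app[101]: error: timeout' |
    count_pattern_as_metric demo_log_error_total 'error'
# prints: demo_log_error_total 2
```

Prometheus would scrape a value like this repeatedly and alert on how fast it grows, which is what the rules later in this guide do.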

Step-by-step installation

Update system packages

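Start by updating your package index and installed packages so you have the latest security patches and dependencies. Run only the block that matches your distribution.

# Ubuntu / Debian
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl wget gnupg2

# AlmaLinux / Rocky Linux / RHEL / Fedora
sudo dnf update -y
sudo dnf install -y curl wget gnupg2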

Install Fluentd

Install Fluentd from the official td-agent package, which provides better stability and production support than the gem installation. Run only the block that matches your distribution.

# Ubuntu / Debian
curl -fsSL https://toolbelt.treasuredata.com/sh/install-ubuntu-noble-td-agent4.sh | sh
sudo systemctl enable td-agent
sudo systemctl start td-agent

# AlmaLinux / Rocky Linux / RHEL / Fedora
curl -fsSL https://toolbelt.treasuredata.com/sh/install-redhat-td-agent4.sh | sh
sudo systemctl enable td-agent
sudo systemctl start td-agent

Install Prometheus

Download and install Prometheus server to collect metrics from Fluentd and evaluate alerting rules.

sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus /var/lib/prometheus

cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar -xzf prometheus-2.45.0.linux-amd64.tar.gz
sudo cp prometheus-2.45.0.linux-amd64/prometheus /usr/local/bin/
sudo cp prometheus-2.45.0.linux-amd64/promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool
sudo cp -r prometheus-2.45.0.linux-amd64/consoles /etc/prometheus/
sudo cp -r prometheus-2.45.0.linux-amd64/console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus/consoles /etc/prometheus/console_libraries
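Because the binaries are downloaded directly from a release page, it can be worth comparing the tarball against a published checksum before copying anything into /usr/local/bin. The helper below is a minimal sketch; `verify_sha256` and the `EXPECTED_SHA256` placeholder are hypothetical names, and the expected value has to come from the checksum list published with the release you downloaded.

```shell
# Return 0 if the SHA-256 digest of a file equals the expected hex string.
# Hypothetical helper; the expected value must come from the release's checksum list.
verify_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<digest>  <filename>"; keep only the digest
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Usage sketch (EXPECTED_SHA256 is a placeholder you supply yourself):
# verify_sha256 prometheus-2.45.0.linux-amd64.tar.gz "$EXPECTED_SHA256" || echo "checksum mismatch"
```

The same helper works for the Alertmanager tarball in the next step.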

Install Alertmanager

Install Alertmanager to handle alert routing, grouping, and notification delivery to various channels.

sudo useradd --no-create-home --shell /bin/false alertmanager
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
sudo chown alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager

cd /tmp
wget https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz
tar -xzf alertmanager-0.25.0.linux-amd64.tar.gz
sudo cp alertmanager-0.25.0.linux-amd64/alertmanager /usr/local/bin/
sudo cp alertmanager-0.25.0.linux-amd64/amtool /usr/local/bin/
sudo chown alertmanager:alertmanager /usr/local/bin/alertmanager /usr/local/bin/amtool

Install Fluentd Prometheus plugin

Install the prometheus plugin to expose Fluentd metrics and log counters for Prometheus scraping.

sudo td-agent-gem install fluent-plugin-prometheus

Configure Fluentd for log collection and metrics

Edit /etc/td-agent/td-agent.conf so that Fluentd collects system logs, parses them for error patterns, and exposes metrics for Prometheus. This configuration monitors syslog and the nginx access and error logs.

# Prometheus metrics endpoint
<source>
  @type prometheus
  bind 0.0.0.0
  port 24231
  metrics_path /metrics
</source>

<source>
  @type prometheus_output_monitor
  interval 10
  <labels>
    hostname ${hostname}
  </labels>
</source>

# Monitor syslog for errors
<source>
  @type tail
  path /var/log/syslog
  pos_file /var/log/td-agent/syslog.log.pos
  tag system.syslog
  <parse>
    @type syslog
  </parse>
</source>

# Monitor nginx access logs
<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/td-agent/nginx.access.log.pos
  tag nginx.access
  <parse>
    @type nginx
  </parse>
</source>

# Monitor nginx error logs
<source>
  @type tail
  path /var/log/nginx/error.log
  pos_file /var/log/td-agent/nginx.error.log.pos
  tag nginx.error
  <parse>
    @type multiline
    format_firstline /^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}/
    format1 /^(?<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(?<log_level>\w+)\] (?<message>.*)/
  </parse>
</source>

# Count error patterns and expose as metrics.
# Label values starting with "$." read a field from the record (record accessor syntax).
<filter system.syslog>
  @type prometheus
  <metric>
    name fluentd_syslog_error_total
    type counter
    desc Count of syslog errors
    <labels>
      hostname ${hostname}
      severity $.severity
    </labels>
  </metric>
</filter>

<filter nginx.error>
  @type prometheus
  <metric>
    name fluentd_nginx_error_total
    type counter
    desc Count of nginx errors
    <labels>
      hostname ${hostname}
      log_level $.log_level
    </labels>
  </metric>
</filter>

<filter nginx.access>
  @type prometheus
  <metric>
    name fluentd_nginx_http_requests_total
    type counter
    desc Count of HTTP requests
    <labels>
      hostname ${hostname}
      method $.method
      code $.code
    </labels>
  </metric>
</filter>

# Output to stdout for debugging (optional)
<match **>
  @type stdout
</match>
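Before restarting Fluentd, it can help to check the nginx error-log pattern (timestamp, bracketed level, message) against a sample line. The sketch below does that with `sed -E`; the function name `parse_nginx_error_line` and the sample line are made up for illustration, and the expression uses POSIX character classes instead of the `\d` and `\w` shorthands that Fluentd's Ruby regex engine accepts.

```shell
# Split an nginx error-log line into time, level, and message using sed -E.
# The pattern mirrors the structure "YYYY/MM/DD HH:MM:SS [level] message".
# Prints nothing when the line does not match.
parse_nginx_error_line() {
    printf '%s\n' "$1" | sed -n -E \
        's#^([0-9]{4}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}) \[([a-z]+)\] (.*)$#time=\1 log_level=\2 message=\3#p'
}

sample='2024/01/10 12:00:03 [error] 1234#0: *5 open() "/var/www/missing" failed'
parse_nginx_error_line "$sample"
# prints: time=2024/01/10 12:00:03 log_level=error message=1234#0: *5 open() "/var/www/missing" failed
```

If a real line from /var/log/nginx/error.log produces no output here, the Fluentd parser is likely to reject it as well.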

Create Fluentd log directory permissions

Set correct permissions for Fluentd to write position files and access log files. The td-agent user needs read access to system logs and write access to its working directory.

Never use chmod 777. It gives every user on the system full access to your files. Instead, fix ownership with chown and use minimal permissions.

sudo mkdir -p /var/log/td-agent
sudo chown td-agent:td-agent /var/log/td-agent
sudo chmod 755 /var/log/td-agent

# Add td-agent to the adm group for log access
sudo usermod -a -G adm td-agent
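To confirm that a directory has not ended up world-writable after troubleshooting, its octal mode can be inspected. This is a small sketch; `is_world_writable` is a hypothetical helper, and `stat -c` is the GNU coreutils form of the command.

```shell
# Succeed if "others" can write to the path, judged from the octal mode.
# Hypothetical helper; stat -c is the GNU coreutils syntax.
is_world_writable() {
    mode=$(stat -c '%a' "$1")
    # The last octal digit holds the permissions for "others"; the write bit is 2.
    others=${mode#"${mode%?}"}
    case $others in
        2|3|6|7) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage sketch:
# is_world_writable /var/log/td-agent && echo "permissions are too loose"
```

With the 755 mode set above, the check fails, which is the desired result.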

Configure Prometheus to scrape Fluentd metrics

Create /etc/prometheus/prometheus.yml so that Prometheus scrapes the Fluentd metrics endpoint, loads the alerting rules file, and sends firing alerts to Alertmanager on localhost:9093.

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "/etc/prometheus/alerts.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'fluentd'
    static_configs:
      - targets: ['localhost:24231']
    scrape_interval: 30s
    metrics_path: /metrics

Create alerting rules for log-based monitoring

Save the following rules as /etc/prometheus/alerts.yml. They fire on high nginx and syslog error rates, elevated HTTP 4xx and 5xx rates, and a Fluentd outage.

groups:
  - name: log-based-alerts
    rules:
      - alert: HighNginxErrorRate
        expr: rate(fluentd_nginx_error_total[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
          service: nginx
        annotations:
          summary: "High nginx error rate detected"
          description: "Nginx error rate is {{ $value }} errors per second on {{ $labels.hostname }}"

      - alert: CriticalNginxErrors
        expr: rate(fluentd_nginx_error_total{log_level="crit"}[5m]) > 0
        for: 1m
        labels:
          severity: critical
          service: nginx
        annotations:
          summary: "Critical nginx errors detected"
          description: "Critical nginx errors occurring on {{ $labels.hostname }}"

      - alert: HighSystemLogErrors
        expr: rate(fluentd_syslog_error_total{severity="error"}[10m]) > 0.05
        for: 5m
        labels:
          severity: warning
          service: system
        annotations:
          summary: "High system error rate"
          description: "System error rate is {{ $value }} per second on {{ $labels.hostname }}"

      - alert: HTTP4xxErrors
        expr: rate(fluentd_nginx_http_requests_total{code=~"4.."}[5m]) > 2
        for: 3m
        labels:
          severity: warning
          service: nginx
        annotations:
          summary: "High HTTP 4xx error rate"
          description: "HTTP 4xx error rate is {{ $value }} per second on {{ $labels.hostname }}"

      - alert: HTTP5xxErrors
        expr: rate(fluentd_nginx_http_requests_total{code=~"5.."}[5m]) > 0.5
        for: 1m
        labels:
          severity: critical
          service: nginx
        annotations:
          summary: "High HTTP 5xx error rate"
          description: "HTTP 5xx error rate is {{ $value }} per second on {{ $labels.hostname }}"

      - alert: FluentdDown
        expr: up{job="fluentd"} == 0
        for: 1m
        labels:
          severity: critical
          service: fluentd
        annotations:
          summary: "Fluentd is down"
          description: "Fluentd has been down for more than 1 minute"
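The thresholds in these rules are per-second rates, which can be hard to picture. The sketch below converts two counter readings into a per-second rate so a threshold such as 0.1 can be related to a raw error count. It is a simplification: the hypothetical `per_second_rate` helper ignores counter resets and the extrapolation that PromQL's `rate()` applies.

```shell
# Per-second increase between two counter samples taken "seconds" apart.
# Simplified: ignores counter resets and the extrapolation done by PromQL rate().
per_second_rate() {
    LC_ALL=C awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.3f\n", (b - a) / s }'
}

# 30 new errors over a 5-minute (300 s) window sit exactly at the 0.1/s threshold
per_second_rate 100 130 300   # prints 0.100
```

So HighNginxErrorRate needs more than 30 nginx errors per five-minute window, sustained for the two-minute `for` period, before it fires.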

Configure Alertmanager notification channels

Create /etc/alertmanager/alertmanager.yml to route alerts to different notification channels based on severity and service. This example includes email and Slack integration; replace the SMTP settings, addresses, and webhook URL with your own.

global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'your-smtp-password'
  slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'

templates:
  - '/etc/alertmanager/templates/*.tmpl'

route:
  group_by: ['alertname', 'service']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'default'
  routes:
    - match:
        severity: critical
      receiver: 'critical-alerts'
      group_wait: 5s
      repeat_interval: 15m
    - match:
        service: nginx
      receiver: 'web-team'
    - match:
        service: system
      receiver: 'ops-team'

receivers:
  - name: 'default'
    email_configs:
      - to: 'admin@example.com'
        subject: 'Prometheus Alert: {{ .GroupLabels.alertname }}'
        body: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Labels: {{ range .Labels.SortedPairs }}{{ .Name }}={{ .Value }} {{ end }}
          {{ end }}

  - name: 'critical-alerts'
    email_configs:
      - to: 'oncall@example.com'
        subject: 'CRITICAL: {{ .GroupLabels.alertname }}'
        body: |
          {{ range .Alerts }}
          CRITICAL ALERT: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Started: {{ .StartsAt }}
          Labels: {{ range .Labels.SortedPairs }}{{ .Name }}={{ .Value }} {{ end }}
          {{ end }}
    slack_configs:
      - channel: '#alerts'
        title: 'Critical Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          {{ .Annotations.summary }}
          {{ .Annotations.description }}
          {{ end }}
        color: 'danger'

  - name: 'web-team'
    slack_configs:
      - channel: '#web-team'
        title: 'Web Service Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          {{ .Annotations.summary }}
          {{ .Annotations.description }}
          {{ end }}
        color: 'warning'

  - name: 'ops-team'
    email_configs:
      - to: 'ops@example.com'
        subject: 'System Alert: {{ .GroupLabels.alertname }}'
        body: |
          {{ range .Alerts }}
          System Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'service']
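The order of the child routes matters: Alertmanager tries them top to bottom and stops at the first match unless a route sets `continue: true`, which none of these do. The sketch below mimics that decision for the routes in this configuration; `receiver_for` is a hypothetical helper, not part of Alertmanager.

```shell
# Mimic the order in which the child routes are tried:
# the first matching route wins because none of them sets "continue: true".
receiver_for() {
    severity=$1
    service=$2
    if [ "$severity" = "critical" ]; then
        echo "critical-alerts"
    elif [ "$service" = "nginx" ]; then
        echo "web-team"
    elif [ "$service" = "system" ]; then
        echo "ops-team"
    else
        echo "default"
    fi
}

receiver_for critical nginx    # prints critical-alerts (web-team is not notified)
receiver_for warning nginx     # prints web-team
receiver_for warning fluentd   # prints default
```

To test the real file rather than this model, `amtool config routes test --config.file=/etc/alertmanager/alertmanager.yml severity=critical service=nginx` should report the receiver Alertmanager would choose.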

Create systemd service files

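Create systemd unit files for Prometheus (/etc/systemd/system/prometheus.service) and Alertmanager (/etc/systemd/system/alertmanager.service) so that both start automatically at boot and run as their dedicated unprivileged users.

# /etc/systemd/system/prometheus.service
[Unit]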
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.listen-address=0.0.0.0:9090 \
    --web.enable-lifecycle \
    --storage.tsdb.retention.time=90d

Restart=always
RestartSec=10s
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/alertmanager.service
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file=/etc/alertmanager/alertmanager.yml \
    --storage.path=/var/lib/alertmanager/ \
    --web.external-url=http://localhost:9093/

Restart=always
RestartSec=10s
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

Set correct file ownership

Ensure all configuration files have the correct ownership and permissions for the service users.

sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml /etc/prometheus/alerts.yml
sudo chmod 644 /etc/prometheus/prometheus.yml /etc/prometheus/alerts.yml

# alertmanager.yml contains the SMTP password, so keep it unreadable to other users
sudo chown alertmanager:alertmanager /etc/alertmanager/alertmanager.yml
sudo chmod 640 /etc/alertmanager/alertmanager.yml

Enable and start all services

Enable and start Fluentd, Prometheus, and Alertmanager services with proper startup order.

sudo systemctl daemon-reload

# Start Fluentd first
sudo systemctl restart td-agent
sudo systemctl enable td-agent

# Start Prometheus
sudo systemctl enable prometheus
sudo systemctl start prometheus

# Start Alertmanager
sudo systemctl enable alertmanager
sudo systemctl start alertmanager
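Services can take a few seconds to open their ports after `systemctl start` returns, so an immediate check may fail spuriously. A small retry loop such as the sketch below can bridge that gap; `retry_until` is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary.

```shell
# Retry a command until it succeeds or the attempts run out.
# Hypothetical helper: retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage sketch: wait up to about 30 seconds for the Fluentd metrics endpoint
# retry_until 15 2 curl -fsS -o /dev/null http://localhost:24231/metrics
```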

Configure firewall rules

Open the ports for Prometheus, Alertmanager, and the Fluentd metrics endpoint only if other hosts or external monitoring tools need to reach them. In this single-host setup the services talk to each other over localhost, which needs no firewall rule. Run only the block that matches your firewall.

# Ubuntu / Debian (ufw)
sudo ufw allow 9090/tcp comment 'Prometheus'
sudo ufw allow 9093/tcp comment 'Alertmanager'
sudo ufw allow 24231/tcp comment 'Fluentd metrics'
sudo ufw reload

# AlmaLinux / Rocky Linux / RHEL / Fedora (firewalld)
sudo firewall-cmd --permanent --add-port=9090/tcp --add-port=9093/tcp --add-port=24231/tcp
sudo firewall-cmd --reload

Verify your setup

Check that all services are running correctly and can communicate with each other.

# Check service status
sudo systemctl status td-agent
sudo systemctl status prometheus
sudo systemctl status alertmanager

# Verify Fluentd metrics endpoint
curl http://localhost:24231/metrics

# Check Prometheus targets
curl http://localhost:9090/api/v1/targets

# Verify alerting rules are loaded
curl http://localhost:9090/api/v1/rules

# Check Alertmanager status
curl http://localhost:9093/api/v1/status

# Test alert by generating nginx errors (if nginx is installed)
sudo nginx -t || echo "Expected error for testing"

# View current alerts
curl http://localhost:9090/api/v1/alerts

Note: Access the web interfaces at http://localhost:9090 for Prometheus and http://localhost:9093 for Alertmanager to view metrics, rules, and active alerts graphically.
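The raw /metrics output is long, so it can be convenient to total one counter across all of its label combinations. The sketch below does that with awk; `metric_total` is a hypothetical helper, and it assumes each sample line ends with the value (no trailing timestamp).

```shell
# Sum all samples of one metric from Prometheus text-format output on stdin.
# Lines starting with "#" (HELP/TYPE) are skipped; the value is the last field.
metric_total() {
    awk -v name="$1" '
        /^#/ { next }
        {
            # Strip the label set, if any, to get the bare metric name
            metric = $1
            sub(/[{].*/, "", metric)
            if (metric == name) total += $NF
        }
        END { print total + 0 }
    '
}

printf '%s\n' \
    '# TYPE fluentd_nginx_error_total counter' \
    'fluentd_nginx_error_total{hostname="web1",log_level="error"} 4' \
    'fluentd_nginx_error_total{hostname="web1",log_level="crit"} 1' \
    'fluentd_nginx_http_requests_total{hostname="web1",code="200"} 90' |
    metric_total fluentd_nginx_error_total
# prints: 5
```

Against the live endpoint, the equivalent call would be `curl -s http://localhost:24231/metrics | metric_total fluentd_nginx_error_total`.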

Common issues

Symptom	Cause	Fix
Fluentd not collecting logs	Permission denied on log files	`sudo usermod -a -G adm td-agent && sudo systemctl restart td-agent`
Prometheus can't scrape Fluentd	Fluentd metrics plugin not loaded	Check `/var/log/td-agent/td-agent.log` and verify plugin installation
Alerts not firing	Incorrect rule syntax or thresholds	`/usr/local/bin/promtool check rules /etc/prometheus/alerts.yml`
Notifications not sent	SMTP/Slack configuration errors	Check Alertmanager logs: `sudo journalctl -u alertmanager -f`
High memory usage	Too many log files or metrics	Adjust retention settings and add log rotation
Position file errors	Incorrect permissions on pos files	`sudo chown -R td-agent:td-agent /var/log/td-agent`

Automated install script

Run this to automate the entire setup

install.sh

#!/usr/bin/env bash

set -euo pipefail

# Colors for output
readonly RED='\033[0;31m'
readonly GREEN='\033[0;32m'
readonly YELLOW='\033[1;33m'
readonly NC='\033[0m' # No Color

# Script configuration
readonly SCRIPT_NAME="$(basename "$0")"
readonly PROMETHEUS_VERSION="2.45.0"
readonly ALERTMANAGER_VERSION="0.25.0"

# Print colored output
print_status() {
    echo -e "${GREEN}[INFO]${NC} $1"
}

print_warning() {
    echo -e "${YELLOW}[WARN]${NC} $1"
}

print_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

# Usage message
usage() {
    cat << EOF
Usage: $SCRIPT_NAME [OPTIONS]

Install Fluentd, Prometheus, and Alertmanager for log alerting

OPTIONS:
    -h, --help              Show this help message
    --prometheus-port PORT  Prometheus port (default: 9090)
    --alertmanager-port PORT Alertmanager port (default: 9093)

Example:
    $SCRIPT_NAME
    $SCRIPT_NAME --prometheus-port 9091 --alertmanager-port 9094

EOF
}

# Cleanup on error
cleanup() {
    local exit_code=$?
    if [ $exit_code -ne 0 ]; then
        print_error "Installation failed. Cleaning up..."
        systemctl stop td-agent prometheus alertmanager 2>/dev/null || true
        systemctl disable td-agent prometheus alertmanager 2>/dev/null || true
        rm -f /etc/systemd/system/prometheus.service /etc/systemd/system/alertmanager.service
        systemctl daemon-reload
    fi
    exit $exit_code
}

trap cleanup ERR

# Parse command line arguments
PROMETHEUS_PORT=9090
ALERTMANAGER_PORT=9093

while [[ $# -gt 0 ]]; do
    case $1 in
        -h|--help)
            usage
            exit 0
            ;;
        --prometheus-port)
            PROMETHEUS_PORT="$2"
            shift 2
            ;;
        --alertmanager-port)
            ALERTMANAGER_PORT="$2"
            shift 2
            ;;
        *)
            print_error "Unknown option: $1"
            usage
            exit 1
            ;;
    esac
done

# Check prerequisites
check_prerequisites() {
    print_status "Checking prerequisites..."
    
    if [[ $EUID -ne 0 ]]; then
        print_error "This script must be run as root"
        exit 1
    fi

    if ! command -v systemctl >/dev/null 2>&1; then
        print_error "systemd is required"
        exit 1
    fi
}

# Detect distribution
detect_distro() {
    if [ -f /etc/os-release ]; then
        . /etc/os-release
        case "$ID" in
            ubuntu|debian)
                PKG_MGR="apt"
                PKG_UPDATE="apt update -y"
                PKG_INSTALL="apt install -y"
                PKG_UPGRADE="apt upgrade -y"
                ;;
            almalinux|rocky|centos|rhel|ol)
                PKG_MGR="dnf"
                PKG_UPDATE="dnf update -y"
                PKG_INSTALL="dnf install -y"
                PKG_UPGRADE="dnf upgrade -y"
                ;;
            fedora)
                PKG_MGR="dnf"
                PKG_UPDATE="dnf update -y"
                PKG_INSTALL="dnf install -y"
                PKG_UPGRADE="dnf upgrade -y"
                ;;
            amzn)
                PKG_MGR="yum"
                PKG_UPDATE="yum update -y"
                PKG_INSTALL="yum install -y"
                PKG_UPGRADE="yum upgrade -y"
                ;;
            *)
                print_error "Unsupported distribution: $ID"
                exit 1
                ;;
        esac
    else
        print_error "Cannot detect distribution (/etc/os-release not found)"
        exit 1
    fi
    print_status "Detected distribution: $ID"
}

# Update system packages
update_system() {
    print_status "[1/8] Updating system packages..."
    $PKG_UPDATE
    $PKG_UPGRADE
    $PKG_INSTALL curl wget gnupg2 tar
}

# Install Fluentd
install_fluentd() {
    print_status "[2/8] Installing Fluentd (td-agent)..."
    
    case "$ID" in
        ubuntu|debian)
            curl -fsSL https://toolbelt.treasuredata.com/sh/install-ubuntu-noble-td-agent4.sh | sh
            ;;
        almalinux|rocky|centos|rhel|ol|fedora|amzn)
            curl -fsSL https://toolbelt.treasuredata.com/sh/install-redhat-td-agent4.sh | sh
            ;;
    esac
    
    systemctl enable td-agent
    systemctl start td-agent
}

# Install Prometheus
install_prometheus() {
    print_status "[3/8] Installing Prometheus..."
    
    useradd --no-create-home --shell /bin/false prometheus || true
    mkdir -p /etc/prometheus /var/lib/prometheus
    chown prometheus:prometheus /etc/prometheus /var/lib/prometheus
    
    cd /tmp
    wget "https://github.com/prometheus/prometheus/releases/download/v${PROMETHEUS_VERSION}/prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz"
    tar -xzf "prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz"
    
    cp "prometheus-${PROMETHEUS_VERSION}.linux-amd64/prometheus" /usr/local/bin/
    cp "prometheus-${PROMETHEUS_VERSION}.linux-amd64/promtool" /usr/local/bin/
    chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool
    chmod 755 /usr/local/bin/prometheus /usr/local/bin/promtool
    
    cp -r "prometheus-${PROMETHEUS_VERSION}.linux-amd64/consoles" /etc/prometheus/
    cp -r "prometheus-${PROMETHEUS_VERSION}.linux-amd64/console_libraries" /etc/prometheus/
    chown -R prometheus:prometheus /etc/prometheus/consoles /etc/prometheus/console_libraries
    
    rm -rf "prometheus-${PROMETHEUS_VERSION}.linux-amd64"*
}

# Install Alertmanager
install_alertmanager() {
    print_status "[4/8] Installing Alertmanager..."
    
    useradd --no-create-home --shell /bin/false alertmanager || true
    mkdir -p /etc/alertmanager /var/lib/alertmanager
    chown alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager
    
    cd /tmp
    wget "https://github.com/prometheus/alertmanager/releases/download/v${ALERTMANAGER_VERSION}/alertmanager-${ALERTMANAGER_VERSION}.linux-amd64.tar.gz"
    tar -xzf "alertmanager-${ALERTMANAGER_VERSION}.linux-amd64.tar.gz"
    
    cp "alertmanager-${ALERTMANAGER_VERSION}.linux-amd64/alertmanager" /usr/local/bin/
    cp "alertmanager-${ALERTMANAGER_VERSION}.linux-amd64/amtool" /usr/local/bin/
    chown alertmanager:alertmanager /usr/local/bin/alertmanager /usr/local/bin/amtool
    chmod 755 /usr/local/bin/alertmanager /usr/local/bin/amtool
    
    rm -rf "alertmanager-${ALERTMANAGER_VERSION}.linux-amd64"*
}

# Install Fluentd plugins
install_fluentd_plugins() {
    print_status "[5/8] Installing Fluentd Prometheus plugin..."
    td-agent-gem install fluent-plugin-prometheus
}

# Configure Fluentd
configure_fluentd() {
    print_status "[6/8] Configuring Fluentd..."
    
    cat > /etc/td-agent/td-agent.conf << 'EOF'
<source>
  @type prometheus
  bind 0.0.0.0
  port 24231
  metrics_path /metrics
</source>

<source>
  @type prometheus_output_monitor
  interval 10
  <labels>
    hostname ${hostname}
  </labels>
</source>

<source>
  @type tail
  path /var/log/syslog,/var/log/messages
  pos_file /var/log/td-agent/syslog.log.pos
  tag system.syslog
  <parse>
    @type syslog
  </parse>
</source>

<filter system.syslog>
  @type prometheus
  <metric>
    name fluentd_input_status_num_records_total
    type counter
    desc The total number of incoming records
    <labels>
      tag ${tag}
      hostname ${hostname}
    </labels>
  </metric>
</filter>

<match system.syslog>
  @type null
</match>
EOF
    
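    chown td-agent:td-agent /etc/td-agent/td-agent.conf
    chmod 644 /etc/td-agent/td-agent.conf

    # Match the manual steps: td-agent needs a writable directory for position
    # files and membership in the adm group to read system logs
    mkdir -p /var/log/td-agent
    chown td-agent:td-agent /var/log/td-agent
    usermod -a -G adm td-agent || true

    systemctl restart td-agent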
}

# Configure Prometheus and Alertmanager
configure_services() {
    print_status "[7/8] Configuring Prometheus and Alertmanager..."
    
    # Prometheus configuration
    cat > /etc/prometheus/prometheus.yml << EOF
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:${ALERTMANAGER_PORT}

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:${PROMETHEUS_PORT}']

  - job_name: 'fluentd'
    static_configs:
      - targets: ['localhost:24231']
EOF
    
    # Alert rules
    cat > /etc/prometheus/alert_rules.yml << 'EOF'
groups:
  - name: fluentd.rules
    rules:
      - alert: FluentdDown
        expr: up{job="fluentd"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Fluentd is down"
          description: "Fluentd has been down for more than 5 minutes"
EOF
    
    # Alertmanager configuration
    cat > /etc/alertmanager/alertmanager.yml << 'EOF'
global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alertmanager@example.com'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'

receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'
EOF
    
    chown -R prometheus:prometheus /etc/prometheus/
    chown -R alertmanager:alertmanager /etc/alertmanager/
    chmod 644 /etc/prometheus/*.yml /etc/alertmanager/*.yml
    
    # Create systemd services
    cat > /etc/systemd/system/prometheus.service << EOF
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \\
    --config.file /etc/prometheus/prometheus.yml \\
    --storage.tsdb.path /var/lib/prometheus/ \\
    --web.console.templates=/etc/prometheus/consoles \\
    --web.console.libraries=/etc/prometheus/console_libraries \\
    --web.listen-address=0.0.0.0:${PROMETHEUS_PORT}

[Install]
WantedBy=multi-user.target
EOF
    
    cat > /etc/systemd/system/alertmanager.service << EOF
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \\
    --config.file=/etc/alertmanager/alertmanager.yml \\
    --storage.path=/var/lib/alertmanager/ \\
    --web.listen-address=0.0.0.0:${ALERTMANAGER_PORT}

[Install]
WantedBy=multi-user.target
EOF
    
    systemctl daemon-reload
    systemctl enable prometheus alertmanager
    systemctl start prometheus alertmanager
}

# Verify installation
verify_installation() {
    print_status "[8/8] Verifying installation..."
    
    local services=("td-agent" "prometheus" "alertmanager")
    local ports=("24231" "$PROMETHEUS_PORT" "$ALERTMANAGER_PORT")
    local all_good=true
    
    for service in "${services[@]}"; do
        if systemctl is-active --quiet "$service"; then
            print_status "$service is running"
        else
            print_error "$service is not running"
            all_good=false
        fi
    done
    
    for port in "${ports[@]}"; do
        # Redirect instead of grep -q so an early grep exit cannot fail the pipeline under pipefail
        if ss -tln | grep ":$port " >/dev/null; then
            print_status "Port $port is listening"
        else
            print_error "Port $port is not listening"
            all_good=false
        fi
    done
    
    if $all_good; then
        print_status "Installation completed successfully!"
        echo ""
        print_status "Access URLs:"
        print_status "  Prometheus: http://$(hostname -I | awk '{print $1}'):$PROMETHEUS_PORT"
        print_status "  Alertmanager: http://$(hostname -I | awk '{print $1}'):$ALERTMANAGER_PORT"
        print_status "  Fluentd metrics: http://$(hostname -I | awk '{print $1}'):24231/metrics"
    else
        print_error "Some services are not running properly. Check systemctl status for details."
        exit 1
    fi
}

# Main installation flow
main() {
    check_prerequisites
    detect_distro
    update_system
    install_fluentd
    install_prometheus
    install_alertmanager
    install_fluentd_plugins
    configure_fluentd
    configure_services
    verify_installation
}

main "$@"

Review the script before running it. The script must run as root, so execute it with: sudo bash install.sh
