Configure Fluentd to collect and parse logs, integrate with Prometheus metrics collection, and set up Alertmanager for intelligent routing of log-based alerts to multiple notification channels.
Prerequisites
- Root or sudo access
- At least 2GB RAM
- Network access for package downloads
- Basic understanding of systemd services
What this solves
Log alerting provides proactive monitoring by sending notifications when specific log patterns indicate system problems, security threats, or application errors. Fluentd collects logs from multiple sources and turns them into metrics that Prometheus can scrape. Prometheus evaluates alerting rules against those metrics, and Alertmanager routes the resulting alerts with grouping, deduplication, and inhibition.
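As a rough illustration of the log-to-metric idea (not how Fluentd implements it internally), the following shell sketch counts log lines that match a pattern and prints the count in the Prometheus text exposition format. The function name log_errors_to_metric, the metric name demo_log_errors_total, and the sample pattern are assumptions made up for this example.

```shell
# Hypothetical helper: read log lines on stdin, count those matching a
# pattern (case-insensitive), and print one counter sample in the
# Prometheus text exposition format.
log_errors_to_metric() {
    pattern="${1:-error}"
    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0, so "|| true" keeps the function safe under "set -e".
    count=$(grep -ciE -- "$pattern" || true)
    printf '# TYPE demo_log_errors_total counter\n'
    printf 'demo_log_errors_total{pattern="%s"} %s\n' "$pattern" "$count"
}

# Example: tail -n 1000 /var/log/syslog | log_errors_to_metric error
```

Fluentd does the same kind of counting continuously and keeps the counters in memory, which is why Prometheus can scrape them over HTTP instead of reading the log files itself.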
Step-by-step installation
Update system packages
Start by refreshing the package index and upgrading installed packages so you have the latest security patches, then install the download tools used in the later steps.
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl wget gnupg2
Install Fluentd
Install Fluentd from the official td-agent package, which is generally more stable and better suited to production use than a gem installation.
curl -fsSL https://toolbelt.treasuredata.com/sh/install-ubuntu-noble-td-agent4.sh | sh
sudo systemctl enable td-agent
sudo systemctl start td-agent
Install Prometheus
Download and install Prometheus server to collect metrics from Fluentd and evaluate alerting rules.
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus /var/lib/prometheus
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar -xzf prometheus-2.45.0.linux-amd64.tar.gz
sudo cp prometheus-2.45.0.linux-amd64/prometheus /usr/local/bin/
sudo cp prometheus-2.45.0.linux-amd64/promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool
sudo cp -r prometheus-2.45.0.linux-amd64/consoles /etc/prometheus/
sudo cp -r prometheus-2.45.0.linux-amd64/console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus/consoles /etc/prometheus/console_libraries
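Before copying binaries out of a downloaded tarball, it can be worth checking the archive against the checksum list published with the release. The sketch below assumes a list in the usual sha256sum format (one "hash  filename" pair per line), such as the sha256sums.txt file attached to Prometheus releases. The helper name verify_sha256 is hypothetical.

```shell
# Hypothetical helper: verify one file against a sha256sum-style list.
# Returns non-zero if the hash differs or the file has no entry in the list.
verify_sha256() {
    file="$1"
    sums="$2"
    name=$(basename "$file")
    # Look up the expected hash for this exact filename.
    expected=$(awk -v n="$name" '$2 == n { print $1 }' "$sums")
    if [ -z "$expected" ]; then
        echo "no checksum entry for $name" >&2
        return 1
    fi
    actual=$(sha256sum "$file" | awk '{ print $1 }')
    [ "$expected" = "$actual" ]
}

# Example, run in /tmp after downloading the tarball and the checksum list:
#   verify_sha256 prometheus-2.45.0.linux-amd64.tar.gz sha256sums.txt && echo "checksum OK"
```

The same helper works for the Alertmanager tarball, provided the matching checksum list is downloaded from its release page.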
Install Alertmanager
Install Alertmanager to handle alert routing, grouping, and notification delivery to various channels.
sudo useradd --no-create-home --shell /bin/false alertmanager
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
sudo chown alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager
cd /tmp
wget https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz
tar -xzf alertmanager-0.25.0.linux-amd64.tar.gz
sudo cp alertmanager-0.25.0.linux-amd64/alertmanager /usr/local/bin/
sudo cp alertmanager-0.25.0.linux-amd64/amtool /usr/local/bin/
sudo chown alertmanager:alertmanager /usr/local/bin/alertmanager /usr/local/bin/amtool
Install Fluentd Prometheus plugin
Install the prometheus plugin to expose Fluentd metrics and log counters for Prometheus scraping.
sudo td-agent-gem install fluent-plugin-prometheus
Configure Fluentd for log collection and metrics
Configure Fluentd to collect system logs, parse them for error patterns, and expose metrics for Prometheus. This configuration, saved as /etc/td-agent/td-agent.conf, tails syslog and the nginx access and error logs.
# Prometheus metrics endpoint
<source>
  @type prometheus
  bind 0.0.0.0
  port 24231
  metrics_path /metrics
</source>
<source>
  @type prometheus_output_monitor
  interval 10
  <labels>
    hostname ${hostname}
  </labels>
</source>
# Monitor syslog for errors
<source>
  @type tail
  path /var/log/syslog
  pos_file /var/log/td-agent/syslog.log.pos
  tag system.syslog
  <parse>
    @type syslog
  </parse>
</source>
# Monitor nginx access logs
<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/td-agent/nginx.access.log.pos
  tag nginx.access
  <parse>
    @type nginx
  </parse>
</source>
# Monitor nginx error logs
<source>
  @type tail
  path /var/log/nginx/error.log
  pos_file /var/log/td-agent/nginx.error.log.pos
  tag nginx.error
  <parse>
    @type multiline
    format_firstline /^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}/
    format1 /^(?
  </parse>
</source>
# Count error patterns and expose as metrics
<filter system.syslog>
  @type prometheus
  <metric>
    name fluentd_syslog_error_total
    type counter
    desc Count of syslog errors
    <labels>
      hostname ${hostname}
      # record fields use record-accessor syntax ($.key)
      severity $.severity
    </labels>
  </metric>
</filter>
<filter nginx.error>
  @type prometheus
  <metric>
    name fluentd_nginx_error_total
    type counter
    desc Count of nginx errors
    <labels>
      hostname ${hostname}
      log_level $.log_level
    </labels>
  </metric>
</filter>
<filter nginx.access>
  @type prometheus
  <metric>
    name fluentd_nginx_http_requests_total
    type counter
    desc Count of HTTP requests
    <labels>
      hostname ${hostname}
      method $.method
      code $.code
    </labels>
  </metric>
</filter>
# Output to stdout for debugging (optional)
<match **>
  @type stdout
</match>
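Once Fluentd is running with a configuration like the one above, a quick way to see whether a counter is moving is to sum its samples from the metrics endpoint. The helper below is a minimal sketch: the name sum_metric is hypothetical, and it assumes each sample line ends with the value (no trailing timestamp).

```shell
# Hypothetical helper: sum every sample of one metric from Prometheus
# text-format input on stdin, across all label combinations.
sum_metric() {
    awk -v n="$1" '
        /^#/ { next }                      # skip HELP/TYPE comment lines
        {
            metric = $1
            sub(/\{.*/, "", metric)        # strip the {label="..."} part
            if (metric == n) total += $NF  # last field is the sample value
        }
        END { printf "%d\n", total }
    '
}

# Example: curl -s http://localhost:24231/metrics | sum_metric fluentd_nginx_error_total
```

Running the example twice, a minute apart, shows whether new log lines are being counted. A result of 0 can also mean the metric has not been created yet, since counters typically appear only after the first matching record.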
Set Fluentd log directory permissions
Set correct permissions for Fluentd to write position files and access log files. The td-agent user needs read access to system logs and write access to its working directory.
sudo mkdir -p /var/log/td-agent
sudo chown td-agent:td-agent /var/log/td-agent
sudo chmod 755 /var/log/td-agent
# Add td-agent to adm group for log access
sudo usermod -a -G adm td-agent
Configure Prometheus to scrape Fluentd metrics
Configure Prometheus to scrape the Fluentd metrics endpoint, load the alerting rules file, and forward alerts to Alertmanager. Save this configuration as /etc/prometheus/prometheus.yml.
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - "/etc/prometheus/alerts.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'fluentd'
    static_configs:
      - targets: ['localhost:24231']
    scrape_interval: 30s
    metrics_path: /metrics
Create alerting rules for log-based monitoring
Define alerting rules that fire on high nginx and syslog error rates, elevated HTTP 4xx and 5xx rates, and Fluentd outages. Save them as /etc/prometheus/alerts.yml.
groups:
  - name: log-based-alerts
    rules:
      - alert: HighNginxErrorRate
        expr: rate(fluentd_nginx_error_total[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
          service: nginx
        annotations:
          summary: "High nginx error rate detected"
          description: "Nginx error rate is {{ $value }} errors per second on {{ $labels.hostname }}"
      - alert: CriticalNginxErrors
        expr: rate(fluentd_nginx_error_total{log_level="crit"}[5m]) > 0
        for: 1m
        labels:
          severity: critical
          service: nginx
        annotations:
          summary: "Critical nginx errors detected"
          description: "Critical nginx errors occurring on {{ $labels.hostname }}"
      - alert: HighSystemLogErrors
        expr: rate(fluentd_syslog_error_total{severity="error"}[10m]) > 0.05
        for: 5m
        labels:
          severity: warning
          service: system
        annotations:
          summary: "High system error rate"
          description: "System error rate is {{ $value }} per second on {{ $labels.hostname }}"
      - alert: HTTP4xxErrors
        expr: rate(fluentd_nginx_http_requests_total{code=~"4.."}[5m]) > 2
        for: 3m
        labels:
          severity: warning
          service: nginx
        annotations:
          summary: "High HTTP 4xx error rate"
          description: "HTTP 4xx error rate is {{ $value }} per second on {{ $labels.hostname }}"
      - alert: HTTP5xxErrors
        expr: rate(fluentd_nginx_http_requests_total{code=~"5.."}[5m]) > 0.5
        for: 1m
        labels:
          severity: critical
          service: nginx
        annotations:
          summary: "High HTTP 5xx error rate"
          description: "HTTP 5xx error rate is {{ $value }} per second on {{ $labels.hostname }}"
      - alert: FluentdDown
        expr: up{job="fluentd"} == 0
        for: 1m
        labels:
          severity: critical
          service: fluentd
        annotations:
          summary: "Fluentd is down"
          description: "Fluentd has been down for more than 1 minute"
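The thresholds in these rules are per-second rates, which can be hard to judge at a glance. Because rate() reports the average per-second increase over the window, multiplying the threshold by the window length gives the approximate number of events needed to cross it. The helper name events_per_window below is an assumption for illustration only.

```shell
# Hypothetical helper: convert a per-second rate threshold and a window
# length in seconds into the approximate event count over that window.
events_per_window() {
    awk -v r="$1" -v w="$2" 'BEGIN { printf "%g\n", r * w }'
}

events_per_window 0.1 300    # HighNginxErrorRate: about 30 errors in 5 minutes
events_per_window 0.05 600   # HighSystemLogErrors: about 30 errors in 10 minutes
events_per_window 2 300      # HTTP4xxErrors: about 600 responses in 5 minutes
events_per_window 0.5 300    # HTTP5xxErrors: about 150 responses in 5 minutes
```

Each expression is evaluated per time series, so these counts apply to each label combination (for example each hostname and log_level pair) rather than to the total across all of them.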
Configure Alertmanager notification channels
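Set up Alertmanager to route alerts to different notification channels based on severity and service. This example uses email and Slack; save it as /etc/alertmanager/alertmanager.yml and replace the placeholder SMTP credentials and webhook URL with your own.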
global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'your-smtp-password'
  slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
templates:
  - '/etc/alertmanager/templates/*.tmpl'
route:
  group_by: ['alertname', 'service']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'default'
  routes:
    - match:
        severity: critical
      receiver: 'critical-alerts'
      group_wait: 5s
      repeat_interval: 15m
    - match:
        service: nginx
      receiver: 'web-team'
    - match:
        service: system
      receiver: 'ops-team'
receivers:
  - name: 'default'
    email_configs:
      - to: 'admin@example.com'
        # email_configs takes the subject under "headers" and the plain-text body under "text"
        headers:
          Subject: 'Prometheus Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Labels: {{ range .Labels.SortedPairs }}{{ .Name }}={{ .Value }} {{ end }}
          {{ end }}
  - name: 'critical-alerts'
    email_configs:
      - to: 'oncall@example.com'
        headers:
          Subject: 'CRITICAL: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          CRITICAL ALERT: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Started: {{ .StartsAt }}
          Labels: {{ range .Labels.SortedPairs }}{{ .Name }}={{ .Value }} {{ end }}
          {{ end }}
    slack_configs:
      - channel: '#alerts'
        title: 'Critical Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          {{ .Annotations.summary }}
          {{ .Annotations.description }}
          {{ end }}
        color: 'danger'
  - name: 'web-team'
    slack_configs:
      - channel: '#web-team'
        title: 'Web Service Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          {{ .Annotations.summary }}
          {{ .Annotations.description }}
          {{ end }}
        color: 'warning'
  - name: 'ops-team'
    email_configs:
      - to: 'ops@example.com'
        headers:
          Subject: 'System Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          System Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'service']
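To make the routing order concrete, here is a simplified shell model of the route tree in this configuration. It is only a sketch of the matching logic, not Alertmanager itself: child routes are checked top to bottom, the first match wins because no route sets continue: true, and anything unmatched falls back to the top-level receiver. The function name pick_receiver is hypothetical.

```shell
# Simplified model of the route tree: first matching child route wins,
# otherwise the top-level "default" receiver is used.
pick_receiver() {
    severity="$1"
    service="$2"
    if [ "$severity" = "critical" ]; then
        echo "critical-alerts"
    elif [ "$service" = "nginx" ]; then
        echo "web-team"
    elif [ "$service" = "system" ]; then
        echo "ops-team"
    else
        echo "default"
    fi
}

pick_receiver critical nginx    # critical-alerts (severity route comes first)
pick_receiver warning nginx     # web-team
pick_receiver warning system    # ops-team
pick_receiver warning fluentd   # default
```

One consequence worth noting is that a critical nginx alert reaches only the critical-alerts receiver and never the web-team channel. The real configuration can be checked with amtool check-config /etc/alertmanager/alertmanager.yml before starting the service.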
Create systemd service files
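Create systemd service files for Prometheus and Alertmanager so that they start automatically at boot and run as their dedicated unprivileged users. Save the first unit below as /etc/systemd/system/prometheus.service and the second as /etc/systemd/system/alertmanager.service.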
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
--config.file /etc/prometheus/prometheus.yml \
--storage.tsdb.path /var/lib/prometheus/ \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries \
--web.listen-address=0.0.0.0:9090 \
--web.enable-lifecycle \
--storage.tsdb.retention.time=90d
Restart=always
RestartSec=10s
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target
[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
--config.file=/etc/alertmanager/alertmanager.yml \
--storage.path=/var/lib/alertmanager/ \
--web.external-url=http://localhost:9093/
Restart=always
RestartSec=10s
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
Set correct file ownership
Ensure all configuration files have the correct ownership and permissions for the service users.
sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml /etc/prometheus/alerts.yml
sudo chmod 644 /etc/prometheus/prometheus.yml /etc/prometheus/alerts.yml
sudo chown alertmanager:alertmanager /etc/alertmanager/alertmanager.yml
sudo chmod 644 /etc/alertmanager/alertmanager.yml
Enable and start all services
Enable and start Fluentd, Prometheus, and Alertmanager services with proper startup order.
sudo systemctl daemon-reload
# Start Fluentd first
sudo systemctl restart td-agent
sudo systemctl enable td-agent
# Start Prometheus
sudo systemctl enable prometheus
sudo systemctl start prometheus
# Start Alertmanager
sudo systemctl enable alertmanager
sudo systemctl start alertmanager
Configure firewall rules
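Open the ports for Prometheus, Alertmanager, and the Fluentd metrics endpoint only if they must be reachable from other machines, such as an external monitoring tool. When all three services run on the same host, as in this guide, they talk to each other over localhost and need no firewall changes for that.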
sudo ufw allow 9090/tcp comment 'Prometheus'
sudo ufw allow 9093/tcp comment 'Alertmanager'
sudo ufw allow 24231/tcp comment 'Fluentd metrics'
sudo ufw reload
Verify your setup
Check that all services are running correctly and can communicate with each other.
# Check service status
sudo systemctl status td-agent
sudo systemctl status prometheus
sudo systemctl status alertmanager
# Verify Fluentd metrics endpoint
curl http://localhost:24231/metrics
# Check Prometheus targets
curl http://localhost:9090/api/v1/targets
# Verify alerting rules are loaded
curl http://localhost:9090/api/v1/rules
# Check Alertmanager status (the v2 API; v1 is deprecated)
curl http://localhost:9093/api/v2/status
# Generate a 404 to exercise the nginx pipeline (assumes nginx is installed and serving http://localhost/)
curl -s -o /dev/null http://localhost/this-page-does-not-exist
# View current alerts
curl http://localhost:9090/api/v1/alerts
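The raw JSON from the targets endpoint is verbose, so a small filter can help when scripting this check. The sketch below counts active targets whose health is anything other than "up". It assumes the compact JSON that Prometheus returns (no spaces around the colon); the helper name count_unhealthy_targets is hypothetical, and a JSON-aware tool such as jq would be more robust if it is available.

```shell
# Hypothetical helper: read the /api/v1/targets response on stdin and
# print how many targets report a health value other than "up".
count_unhealthy_targets() {
    # grep -o emits one line per "health":"..." field; grep -vc then counts
    # the lines that are not "up". "|| true" covers the zero-match exit code.
    grep -o '"health":"[a-z]*"' | grep -vc '"health":"up"' || true
}

# Example: curl -s http://localhost:9090/api/v1/targets | count_unhealthy_targets
```

A result of 0 means every scraped target, including the fluentd job, is currently reachable.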
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Fluentd not collecting logs | Permission denied on log files | sudo usermod -a -G adm td-agent && sudo systemctl restart td-agent |
| Prometheus can't scrape Fluentd | Fluentd metrics plugin not loaded | Check /var/log/td-agent/td-agent.log and verify plugin installation |
| Alerts not firing | Incorrect rule syntax or thresholds | /usr/local/bin/promtool check rules /etc/prometheus/alerts.yml |
| Notifications not sent | SMTP/Slack configuration errors | Check Alertmanager logs: sudo journalctl -u alertmanager -f |
| High memory usage | Too many log files or metrics | Adjust retention settings and add log rotation |
| Position file errors | Incorrect permissions on pos files | sudo chown -R td-agent:td-agent /var/log/td-agent |
Next steps
- Set up centralized log aggregation with Elasticsearch, Logstash, and Kibana (ELK Stack)
- Monitor Kubernetes clusters with Prometheus and Grafana for container orchestration insights
- Configure Prometheus long-term storage with Thanos for unlimited data retention
- Set up advanced Fluentd log parsing and filtering for complex log formats
- Integrate Prometheus Alertmanager with PagerDuty and OpsGenie for enterprise alerting
Automated install script
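The script below automates the installation. It writes a reduced configuration compared with the manual steps: Fluentd only tails syslog, Prometheus gets a single FluentdDown rule, and Alertmanager routes everything to a local webhook receiver.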
#!/usr/bin/env bash
# -E makes the ERR trap fire for failures inside functions as well
set -Eeuo pipefail

# Colors for output
readonly RED='\033[0;31m'
readonly GREEN='\033[0;32m'
readonly YELLOW='\033[1;33m'
readonly NC='\033[0m' # No Color

# Script configuration
readonly SCRIPT_NAME="$(basename "$0")"
readonly PROMETHEUS_VERSION="2.45.0"
readonly ALERTMANAGER_VERSION="0.25.0"

# Print colored output
print_status() {
    echo -e "${GREEN}[INFO]${NC} $1"
}

print_warning() {
    echo -e "${YELLOW}[WARN]${NC} $1"
}

print_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

# Usage message
usage() {
    cat << EOF
Usage: $SCRIPT_NAME [OPTIONS]
Install Fluentd, Prometheus, and Alertmanager for log alerting
OPTIONS:
  -h, --help                 Show this help message
  --prometheus-port PORT     Prometheus port (default: 9090)
  --alertmanager-port PORT   Alertmanager port (default: 9093)
Example:
  $SCRIPT_NAME
  $SCRIPT_NAME --prometheus-port 9091 --alertmanager-port 9094
EOF
}

# Cleanup on error
cleanup() {
    local exit_code=$?
    if [ $exit_code -ne 0 ]; then
        print_error "Installation failed. Cleaning up..."
        systemctl stop td-agent prometheus alertmanager 2>/dev/null || true
        systemctl disable td-agent prometheus alertmanager 2>/dev/null || true
        rm -f /etc/systemd/system/prometheus.service /etc/systemd/system/alertmanager.service
        systemctl daemon-reload
    fi
    exit $exit_code
}
trap cleanup ERR

# Parse command line arguments
PROMETHEUS_PORT=9090
ALERTMANAGER_PORT=9093
while [[ $# -gt 0 ]]; do
    case $1 in
        -h|--help)
            usage
            exit 0
            ;;
        --prometheus-port)
            PROMETHEUS_PORT="$2"
            shift 2
            ;;
        --alertmanager-port)
            ALERTMANAGER_PORT="$2"
            shift 2
            ;;
        *)
            print_error "Unknown option: $1"
            usage
            exit 1
            ;;
    esac
done

# Check prerequisites
check_prerequisites() {
    print_status "Checking prerequisites..."
    if [[ $EUID -ne 0 ]]; then
        print_error "This script must be run as root"
        exit 1
    fi
    if ! command -v systemctl >/dev/null 2>&1; then
        print_error "systemd is required"
        exit 1
    fi
}

# Detect distribution
detect_distro() {
    if [ -f /etc/os-release ]; then
        . /etc/os-release
        case "$ID" in
            ubuntu|debian)
                PKG_MGR="apt"
                PKG_UPDATE="apt update -y"
                PKG_INSTALL="apt install -y"
                PKG_UPGRADE="apt upgrade -y"
                ;;
            almalinux|rocky|centos|rhel|ol)
                PKG_MGR="dnf"
                PKG_UPDATE="dnf update -y"
                PKG_INSTALL="dnf install -y"
                PKG_UPGRADE="dnf upgrade -y"
                ;;
            fedora)
                PKG_MGR="dnf"
                PKG_UPDATE="dnf update -y"
                PKG_INSTALL="dnf install -y"
                PKG_UPGRADE="dnf upgrade -y"
                ;;
            amzn)
                PKG_MGR="yum"
                PKG_UPDATE="yum update -y"
                PKG_INSTALL="yum install -y"
                PKG_UPGRADE="yum upgrade -y"
                ;;
            *)
                print_error "Unsupported distribution: $ID"
                exit 1
                ;;
        esac
    else
        print_error "Cannot detect distribution (/etc/os-release not found)"
        exit 1
    fi
    print_status "Detected distribution: $ID"
}

# Update system packages
update_system() {
    print_status "[1/8] Updating system packages..."
    $PKG_UPDATE
    $PKG_UPGRADE
    $PKG_INSTALL curl wget gnupg2 tar
}

# Install Fluentd
install_fluentd() {
    print_status "[2/8] Installing Fluentd (td-agent)..."
    case "$ID" in
        ubuntu|debian)
            curl -fsSL https://toolbelt.treasuredata.com/sh/install-ubuntu-noble-td-agent4.sh | sh
            ;;
        almalinux|rocky|centos|rhel|ol|fedora|amzn)
            curl -fsSL https://toolbelt.treasuredata.com/sh/install-redhat-td-agent4.sh | sh
            ;;
    esac
    systemctl enable td-agent
    systemctl start td-agent
}

# Install Prometheus
install_prometheus() {
    print_status "[3/8] Installing Prometheus..."
    useradd --no-create-home --shell /bin/false prometheus || true
    mkdir -p /etc/prometheus /var/lib/prometheus
    chown prometheus:prometheus /etc/prometheus /var/lib/prometheus
    cd /tmp
    wget "https://github.com/prometheus/prometheus/releases/download/v${PROMETHEUS_VERSION}/prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz"
    tar -xzf "prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz"
    cp "prometheus-${PROMETHEUS_VERSION}.linux-amd64/prometheus" /usr/local/bin/
    cp "prometheus-${PROMETHEUS_VERSION}.linux-amd64/promtool" /usr/local/bin/
    chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool
    chmod 755 /usr/local/bin/prometheus /usr/local/bin/promtool
    cp -r "prometheus-${PROMETHEUS_VERSION}.linux-amd64/consoles" /etc/prometheus/
    cp -r "prometheus-${PROMETHEUS_VERSION}.linux-amd64/console_libraries" /etc/prometheus/
    chown -R prometheus:prometheus /etc/prometheus/consoles /etc/prometheus/console_libraries
    rm -rf "prometheus-${PROMETHEUS_VERSION}.linux-amd64"*
}

# Install Alertmanager
install_alertmanager() {
    print_status "[4/8] Installing Alertmanager..."
    useradd --no-create-home --shell /bin/false alertmanager || true
    mkdir -p /etc/alertmanager /var/lib/alertmanager
    chown alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager
    cd /tmp
    wget "https://github.com/prometheus/alertmanager/releases/download/v${ALERTMANAGER_VERSION}/alertmanager-${ALERTMANAGER_VERSION}.linux-amd64.tar.gz"
    tar -xzf "alertmanager-${ALERTMANAGER_VERSION}.linux-amd64.tar.gz"
    cp "alertmanager-${ALERTMANAGER_VERSION}.linux-amd64/alertmanager" /usr/local/bin/
    cp "alertmanager-${ALERTMANAGER_VERSION}.linux-amd64/amtool" /usr/local/bin/
    chown alertmanager:alertmanager /usr/local/bin/alertmanager /usr/local/bin/amtool
    chmod 755 /usr/local/bin/alertmanager /usr/local/bin/amtool
    rm -rf "alertmanager-${ALERTMANAGER_VERSION}.linux-amd64"*
}

# Install Fluentd plugins
install_fluentd_plugins() {
    print_status "[5/8] Installing Fluentd Prometheus plugin..."
    td-agent-gem install fluent-plugin-prometheus
}

# Configure Fluentd
configure_fluentd() {
    print_status "[6/8] Configuring Fluentd..."
    cat > /etc/td-agent/td-agent.conf << 'EOF'
<source>
  @type prometheus
  bind 0.0.0.0
  port 24231
  metrics_path /metrics
</source>
<source>
  @type prometheus_output_monitor
  interval 10
  <labels>
    hostname ${hostname}
  </labels>
</source>
<source>
  @type tail
  path /var/log/syslog,/var/log/messages
  pos_file /var/log/td-agent/syslog.log.pos
  tag system.syslog
  <parse>
    @type syslog
  </parse>
</source>
<filter system.syslog>
  @type prometheus
  <metric>
    name fluentd_input_status_num_records_total
    type counter
    desc The total number of incoming records
    <labels>
      tag ${tag}
      hostname ${hostname}
    </labels>
  </metric>
</filter>
<match system.syslog>
  @type null
</match>
EOF
    chown td-agent:td-agent /etc/td-agent/td-agent.conf
    chmod 644 /etc/td-agent/td-agent.conf
    # Same as the manual steps: let td-agent read system logs via the adm group
    usermod -a -G adm td-agent
    systemctl restart td-agent
}

# Configure Prometheus and Alertmanager
configure_services() {
    print_status "[7/8] Configuring Prometheus and Alertmanager..."

    # Prometheus configuration
    cat > /etc/prometheus/prometheus.yml << EOF
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:${ALERTMANAGER_PORT}
rule_files:
  - "alert_rules.yml"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:${PROMETHEUS_PORT}']
  - job_name: 'fluentd'
    static_configs:
      - targets: ['localhost:24231']
EOF

    # Alert rules
    cat > /etc/prometheus/alert_rules.yml << 'EOF'
groups:
  - name: fluentd.rules
    rules:
      - alert: FluentdDown
        expr: up{job="fluentd"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Fluentd is down"
          description: "Fluentd has been down for more than 5 minutes"
EOF

    # Alertmanager configuration
    cat > /etc/alertmanager/alertmanager.yml << 'EOF'
global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alertmanager@example.com'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'
EOF

    chown -R prometheus:prometheus /etc/prometheus/
    chown -R alertmanager:alertmanager /etc/alertmanager/
    chmod 644 /etc/prometheus/*.yml /etc/alertmanager/*.yml

    # Create systemd services
    cat > /etc/systemd/system/prometheus.service << EOF
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \\
    --config.file /etc/prometheus/prometheus.yml \\
    --storage.tsdb.path /var/lib/prometheus/ \\
    --web.console.templates=/etc/prometheus/consoles \\
    --web.console.libraries=/etc/prometheus/console_libraries \\
    --web.listen-address=0.0.0.0:${PROMETHEUS_PORT}
[Install]
WantedBy=multi-user.target
EOF

    cat > /etc/systemd/system/alertmanager.service << EOF
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target
[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \\
    --config.file=/etc/alertmanager/alertmanager.yml \\
    --storage.path=/var/lib/alertmanager/ \\
    --web.listen-address=0.0.0.0:${ALERTMANAGER_PORT}
[Install]
WantedBy=multi-user.target
EOF

    systemctl daemon-reload
    systemctl enable prometheus alertmanager
    systemctl start prometheus alertmanager
}

# Verify installation
verify_installation() {
    print_status "[8/8] Verifying installation..."
    local services=("td-agent" "prometheus" "alertmanager")
    local ports=("24231" "$PROMETHEUS_PORT" "$ALERTMANAGER_PORT")
    local all_good=true
    for service in "${services[@]}"; do
        if systemctl is-active --quiet "$service"; then
            print_status "$service is running"
        else
            print_error "$service is not running"
            all_good=false
        fi
    done
    for port in "${ports[@]}"; do
        if ss -tlnp | grep -q ":$port "; then
            print_status "Port $port is listening"
        else
            print_error "Port $port is not listening"
            all_good=false
        fi
    done
    if $all_good; then
        print_status "Installation completed successfully!"
        echo ""
        print_status "Access URLs:"
        print_status "  Prometheus: http://$(hostname -I | awk '{print $1}'):$PROMETHEUS_PORT"
        print_status "  Alertmanager: http://$(hostname -I | awk '{print $1}'):$ALERTMANAGER_PORT"
        print_status "  Fluentd metrics: http://$(hostname -I | awk '{print $1}'):24231/metrics"
    else
        print_error "Some services are not running properly. Check systemctl status for details."
        exit 1
    fi
}

# Main installation flow
main() {
    check_prerequisites
    detect_distro
    update_system
    install_fluentd
    install_prometheus
    install_alertmanager
    install_fluentd_plugins
    configure_fluentd
    configure_services
    verify_installation
}

main "$@"
Review the script before running it. Because it checks for root privileges, execute it with: sudo bash install.sh
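The script accepts --prometheus-port and --alertmanager-port without checking their values, so a typo would only surface later when a service fails to start. A small validation helper along these lines could be called from the argument parser. The name is_valid_port is hypothetical and the script above does not define it.

```shell
# Hypothetical helper: succeed only for an integer port in the range 1-65535.
is_valid_port() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    # Cap the length first so very long digit strings never reach the
    # numeric comparison.
    [ "${#1}" -le 5 ] && [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# Possible use inside the option parser:
#   is_valid_port "$2" || { echo "Invalid port: $2" >&2; exit 1; }
```

Rejecting bad input before any package is installed keeps a failed run from leaving a half-configured system behind.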