Implement log-based monitoring and alerting with Grafana and Loki

Intermediate 45 min Apr 26, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up comprehensive log aggregation with Loki, centralized visualization with Grafana dashboards, and automated alerting rules for proactive incident response across your infrastructure.

Prerequisites

  • Root or sudo access
  • 4GB RAM minimum
  • 10GB available disk space
  • Network connectivity for package downloads

What this solves

Log-based monitoring surfaces application behavior, system errors, and failure patterns that metrics alone do not capture. This tutorial builds a complete logging stack: Promtail collects logs, Loki aggregates and stores them, Grafana visualizes them, and the Loki ruler evaluates alert rules and hands firing alerts to Alertmanager for notification.

Step-by-step installation

Update system packages

Start by updating your packages and installing the tools this tutorial uses (jq formats the Loki API responses in the verification step).

# Ubuntu 24.04 / Debian 12
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl wget unzip jq

# AlmaLinux 9 / Rocky Linux 9
sudo dnf update -y
sudo dnf install -y curl wget unzip jq
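The apt commands apply to Ubuntu and Debian, the dnf commands to AlmaLinux and Rocky Linux. If you script this setup, a small helper can pick the right branch by reading ID and ID_LIKE from /etc/os-release. This is a minimal sketch: the function name distro_family is hypothetical, and it only recognizes the Debian and RHEL families covered here.

```shell
#!/usr/bin/env bash
# Sketch: report "debian" or "rhel" based on an os-release file
# (default /etc/os-release). distro_family is a hypothetical helper name.
distro_family() {
  local file="${1:-/etc/os-release}" id="" id_like=""
  [ -r "$file" ] || { echo "unknown"; return 1; }
  # os-release holds KEY=value lines; values may be wrapped in double quotes.
  id=$(sed -n 's/^ID=//p' "$file" | tr -d '"')
  id_like=$(sed -n 's/^ID_LIKE=//p' "$file" | tr -d '"')
  # Pad with spaces so each pattern matches a whole word in the list.
  case " $id $id_like " in
    *" debian "*|*" ubuntu "*) echo "debian" ;;
    *" rhel "*|*" centos "*|*" fedora "*) echo "rhel" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

# Example (not run here):
#   case "$(distro_family)" in
#     debian) sudo apt install -y curl wget unzip jq ;;
#     rhel)   sudo dnf install -y curl wget unzip jq ;;
#   esac
```

Ubuntu reports ID=ubuntu with ID_LIKE=debian, and AlmaLinux and Rocky Linux list rhel in ID_LIKE, so one case statement covers all four distributions.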

Create system users

Create dedicated users for Loki and Promtail services with restricted privileges for security.

sudo useradd --system --no-create-home --shell /bin/false loki
sudo useradd --system --no-create-home --shell /bin/false promtail

Download and install Loki

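Download Loki 2.9.3 and install it to the system path. The configuration in this tutorial targets the 2.9 series; Loki 3.x removed some of the settings it uses (for example shared_store and max_transfer_retries), so keep the version pinned unless you also migrate the configuration.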

cd /tmp
wget https://github.com/grafana/loki/releases/download/v2.9.3/loki-linux-amd64.zip
unzip loki-linux-amd64.zip
sudo mv loki-linux-amd64 /usr/local/bin/loki
sudo chmod +x /usr/local/bin/loki
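The URL above names the amd64 build. On ARM servers you need a different archive, and the sketch below maps uname -m to an architecture suffix. It assumes the Loki release also publishes linux-arm64 archives under the same naming scheme, so confirm the file exists on the release page before relying on it. release_url is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: build a Loki/Promtail release URL for a CPU architecture.
# Assumption: arm64 archives follow the same <component>-linux-<arch>.zip
# naming as the amd64 archive used in this tutorial.
release_url() {
  local component="$1" version="$2" machine="${3:-$(uname -m)}" arch
  case "$machine" in
    x86_64|amd64)  arch="amd64" ;;
    aarch64|arm64) arch="arm64" ;;
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
  echo "https://github.com/grafana/loki/releases/download/v${version}/${component}-linux-${arch}.zip"
}
```

For example, wget "$(release_url loki 2.9.3)" fetches the archive for the current host. The extracted binary is named after the architecture as well (such as loki-linux-arm64), so adjust the mv command to match.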

Create Loki directories and configuration

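Create the Loki configuration and data directories, including /var/lib/loki/rules/fake, where the ruler looks for alert rules (Loki uses the tenant ID fake when auth_enabled is false), and give the loki user ownership.

sudo mkdir -p /etc/loki /var/lib/loki/rules/fake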
sudo chown -R loki:loki /etc/loki /var/lib/loki
sudo chmod 755 /etc/loki /var/lib/loki

Configure Loki

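Create /etc/loki/loki.yml with the settings below: a single-node in-memory ring, a boltdb-shipper index and chunks on the local filesystem, rejection of samples older than 168 hours, per-tenant ingestion limits of 16 MB/s (32 MB burst), and a ruler that sends alerts to Alertmanager on localhost:9093.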

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

ingester:
  wal:
    enabled: true
    dir: /var/lib/loki/wal
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 1h
  max_chunk_age: 1h
  chunk_target_size: 1048576
  chunk_retain_period: 30s
  max_transfer_retries: 0

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /var/lib/loki/boltdb-shipper-active
    cache_location: /var/lib/loki/boltdb-shipper-cache
    cache_ttl: 24h
    shared_store: filesystem
  filesystem:
    directory: /var/lib/loki/chunks

compactor:
  working_directory: /var/lib/loki
  shared_store: filesystem

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: false
  retention_period: 0s

ruler:
  storage:
    type: local
    local:
      directory: /var/lib/loki/rules
  rule_path: /var/lib/loki/rules-temp  # scratch space; keep separate from the rule storage directory
  alertmanager_url: http://localhost:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true

Create Loki systemd service

Create /etc/systemd/system/loki.service so systemd can manage Loki:

[Unit]
Description=Loki service
After=network.target

[Service]
Type=simple
User=loki
Group=loki
ExecStart=/usr/local/bin/loki -config.file /etc/loki/loki.yml
Restart=on-failure
RestartSec=20
StandardOutput=journal
StandardError=journal
SyslogIdentifier=loki
KillMode=mixed
KillSignal=SIGTERM

[Install]
WantedBy=multi-user.target

Download and install Promtail

Download Promtail from the same Loki release. Promtail tails log files on this host and pushes them to Loki.

cd /tmp
wget https://github.com/grafana/loki/releases/download/v2.9.3/promtail-linux-amd64.zip
unzip promtail-linux-amd64.zip
sudo mv promtail-linux-amd64 /usr/local/bin/promtail
sudo chmod +x /usr/local/bin/promtail

Create Promtail directories

Set up directories for Promtail configuration and position tracking.

sudo mkdir -p /etc/promtail /var/lib/promtail
sudo chown -R promtail:promtail /etc/promtail /var/lib/promtail
sudo chmod 755 /etc/promtail /var/lib/promtail

Configure Promtail

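Create /etc/promtail/promtail.yml to collect system and application logs. The paths below follow Ubuntu and Debian conventions; on AlmaLinux and Rocky Linux, use /var/log/messages for the syslog job, /var/log/secure for the auth job, and /var/log/httpd/*log for the apache job. Remove jobs for software you don't run.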

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/lib/promtail/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*log

  - job_name: syslog
    static_configs:
      - targets:
          - localhost
        labels:
          job: syslog
          __path__: /var/log/syslog

  - job_name: auth
    static_configs:
      - targets:
          - localhost
        labels:
          job: auth
          __path__: /var/log/auth.log

  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          __path__: /var/log/nginx/*log

  - job_name: apache
    static_configs:
      - targets:
          - localhost
        labels:
          job: apache
          __path__: /var/log/apache2/*log

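  # Docker json-file logs. Promtail needs read access to
  # /var/lib/docker/containers (root-only by default); remove this job
  # if Docker isn't installed.
  - job_name: docker
    static_configs:
      - targets:
          - localhost
        labels:
          job: docker
          __path__: /var/lib/docker/containers/*/*log
    pipeline_stages:
      # Each line is a JSON object with log, stream, time and optional attrs keys.
      - json:
          expressions:
            output: log
            stream: stream
            time: time
            attrs:
      # attrs.tag exists only if the Docker daemon is configured with a log tag option.
      - json:
          expressions:
            tag:
          source: attrs
      - regex:
          expression: (?P<container_name>(?:[^|]*))
          source: tag
      - timestamp:
          format: RFC3339Nano
          source: time
      - labels:
          stream:
          container_name:
      - output:
          source: output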
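Promtail matches __path__ with shell-style globs, so /var/log/*log in the system job also matches /var/log/syslog and /var/log/auth.log, which the syslog and auth jobs already tail. The same lines then reach Loki twice under different job labels. The sketch below lists which files in a directory a glob selects, using bash pattern matching as a stand-in for Promtail's matcher (an assumption that holds for a single * like this); matching_files is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: print the names of regular files in DIR whose basename matches
# a glob PATTERN, approximating how a __path__ glob selects files.
matching_files() {
  local dir="$1" pattern="$2" f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    # $pattern is deliberately unquoted so bash treats it as a glob.
    if [[ "${f##*/}" == $pattern ]]; then
      echo "${f##*/}"
    fi
  done
  return 0
}
```

Running matching_files /var/log '*log' on an Ubuntu host shows the overlap. Either narrow the system job's glob or drop the dedicated syslog and auth jobs to avoid ingesting the same lines twice.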

Create Promtail systemd service

Create /etc/systemd/system/promtail.service. SupplementaryGroups=adm lets the service read log files owned by the adm group.

[Unit]
Description=Promtail service
After=network.target

[Service]
Type=simple
User=promtail
Group=promtail
ExecStart=/usr/local/bin/promtail -config.file /etc/promtail/promtail.yml
Restart=on-failure
RestartSec=20
StandardOutput=journal
StandardError=journal
SyslogIdentifier=promtail
KillMode=mixed
KillSignal=SIGTERM
SupplementaryGroups=adm

[Install]
WantedBy=multi-user.target

Configure log file permissions

Add the promtail user to the groups that can already read system logs, instead of loosening permissions on the log files themselves.

sudo usermod -aG adm promtail
sudo usermod -aG systemd-journal promtail
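Never use chmod 777 on log files. Group membership grants read access without making logs world-readable. On Ubuntu and Debian, most files in /var/log are readable by the adm group. On AlmaLinux and Rocky Linux, /var/log/messages and /var/log/secure are typically owned by root with mode 0600, so group membership alone is not enough there; check with ls -l and grant read access with an ACL (setfacl) if needed.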
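To confirm the group changes are enough, check which files the promtail user still cannot read. This is a minimal sketch: report_unreadable (a hypothetical helper) prints every path the current user cannot read, so run it through sudo -u promtail as shown in the comment. sudo applies the user's current group list, so the check reflects the usermod changes above; the running Promtail service only picks them up after a restart.

```shell
#!/usr/bin/env bash
# Sketch: print each given path that the current user cannot read and
# return non-zero if any were found. Missing files count as unreadable.
report_unreadable() {
  local f missing=0
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "$f"
      missing=1
    fi
  done
  return "$missing"
}

# Example (not run here): check as the promtail user.
#   sudo -u promtail bash -c "$(declare -f report_unreadable); report_unreadable /var/log/syslog /var/log/auth.log"
```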

Install and configure Grafana

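Install Grafana for log visualization and dashboards. If Grafana is already installed (for example, from our Grafana setup tutorial), skip this step. The commands use Grafana's current package repositories, apt.grafana.com and rpm.grafana.com; apt-key is deprecated on current Debian and Ubuntu releases, so the signing key goes in /etc/apt/keyrings instead.

# Ubuntu 24.04 / Debian 12
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | sudo tee /etc/apt/keyrings/grafana.asc > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.asc] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana

# AlmaLinux 9 / Rocky Linux 9
sudo tee /etc/yum.repos.d/grafana.repo > /dev/null <<'EOF'
[grafana]
name=grafana
baseurl=https://rpm.grafana.com
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
EOF
sudo dnf install -y grafana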

Start and enable services

Reload systemd so it picks up the new unit files, then enable and start the services. Loki starts first because Promtail pushes logs to it.

sudo systemctl daemon-reload
sudo systemctl enable --now loki
sudo systemctl enable --now promtail
sudo systemctl enable --now grafana-server
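Loki and Promtail take a few seconds to become ready after starting. Rather than checking once, you can poll their readiness endpoints: Loki serves /ready on port 3100 and Promtail on port 9080. The function below is a sketch (wait_for_ready is a hypothetical name) that treats any HTTP error, including the 503 Loki returns while it is still starting, as "not ready yet".

```shell
#!/usr/bin/env bash
# Sketch: poll URL once per second until it returns a non-error HTTP
# status, or give up after TIMEOUT seconds (default 60).
wait_for_ready() {
  local url="$1" timeout="${2:-60}" waited=0
  # -f makes curl fail on HTTP >= 400; --max-time bounds each attempt.
  until curl -fs -o /dev/null --max-time 2 "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "not ready after ${timeout}s: $url" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "ready: $url"
}
```

For example: wait_for_ready http://localhost:3100/ready 120 && wait_for_ready http://localhost:9080/ready 60. If either call times out, check sudo journalctl -u loki or sudo journalctl -u promtail for the cause.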

Configure firewall rules

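Open the Grafana web interface port. Open the Loki port only if Promtail agents on other hosts push to this server: with auth_enabled: false, anyone who can reach port 3100 can read and write logs, so restrict it to trusted networks.

# Ubuntu / Debian (ufw)
sudo ufw allow 3000/tcp comment "Grafana"
sudo ufw allow 3100/tcp comment "Loki"
sudo ufw reload

# AlmaLinux / Rocky Linux (firewalld)
sudo firewall-cmd --permanent --add-port=3000/tcp
sudo firewall-cmd --permanent --add-port=3100/tcp
sudo firewall-cmd --reload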

Add Loki as Grafana data source

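Register Loki as a Grafana data source, either in the web interface or through the HTTP API as shown below. The call uses the default admin:admin credentials; substitute your own if you have already changed the password.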

curl -X POST \
  http://admin:admin@localhost:3000/api/datasources \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "Loki",
    "type": "loki",
    "url": "http://localhost:3100",
    "access": "proxy",
    "basicAuth": false,
    "isDefault": false
  }'
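Putting admin:admin in the URL stops working once you change the password, and it leaves credentials in your shell history. As an alternative, the sketch below builds the same payload from parameters and passes credentials to curl with -u from an environment variable. GRAFANA_AUTH and loki_datasource_json are hypothetical names used only here.

```shell
#!/usr/bin/env bash
# Sketch: print the Grafana data source payload for a Loki instance.
# Arguments: data source name (default Loki) and Loki URL.
loki_datasource_json() {
  local name="${1:-Loki}" url="${2:-http://localhost:3100}"
  printf '{"name":"%s","type":"loki","url":"%s","access":"proxy","basicAuth":false,"isDefault":false}\n' \
    "$name" "$url"
}

# Example (not run here): GRAFANA_AUTH holds "user:password".
#   loki_datasource_json Loki http://localhost:3100 |
#     curl -s -u "$GRAFANA_AUTH" -H 'Content-Type: application/json' \
#       -X POST http://localhost:3000/api/datasources --data @-
```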

Create log dashboard

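Save the following minimal dashboard (a log-volume stat and a live log panel) as /tmp/logs-dashboard.json, then import it through the API. Each panel names the Loki data source explicitly, because Loki was not registered as the default data source.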

{
  "dashboard": {
    "id": null,
    "title": "System Logs",
    "tags": ["logs"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Log Volume",
        "type": "stat",
        "datasource": "Loki",
        "targets": [
          {
            "expr": "sum(count_over_time({job=~\".+\"}[5m]))",
            "refId": "A"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
      },
      {
        "id": 2,
        "title": "Recent Logs",
        "type": "logs",
        "datasource": "Loki",
        "targets": [
          {
            "expr": "{job=~\".+\"}",
            "refId": "A"
          }
        ],
        "gridPos": {"h": 12, "w": 24, "x": 0, "y": 8},
        "options": {
          "showTime": true,
          "showLabels": true,
          "showCommonLabels": false,
          "wrapLogMessage": true,
          "enableLogDetails": true
        }
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "timepicker": {},
    "refresh": "5s"
  }
}
curl -X POST \
  http://admin:admin@localhost:3000/api/dashboards/db \
  -H 'Content-Type: application/json' \
  -d @/tmp/logs-dashboard.json

Configure alerting rules

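Create alerting rules for critical log patterns and error rates. Because auth_enabled is false, Loki uses the tenant ID fake, and its ruler reads rule files from /var/lib/loki/rules/fake/. Create that directory if it does not exist (sudo mkdir -p /var/lib/loki/rules/fake) and save the rules below as /var/lib/loki/rules/fake/log-alerts.yml. Loki rejects stream selectors that can match an empty value, such as {job=~".*"}, so the rules select all jobs with {job=~".+"}.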

groups:
  - name: log_alerts
    rules:
      - alert: HighErrorRate
        expr: |
          (
            sum(rate({job=~".+"} |~ "(?i)error" [5m])) by (job)
            /
            sum(rate({job=~".+"}[5m])) by (job)
          ) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected in {{ $labels.job }}"
          description: "Error rate is {{ $value | humanizePercentage }} in job {{ $labels.job }}"

      - alert: AuthenticationFailures
        expr: |
          sum(rate({job="auth"} |~ "Failed password" [5m])) > 5
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "High authentication failure rate"
          description: "{{ $value }} authentication failures per second detected"

      - alert: DiskSpaceWarning
        expr: |
          sum(rate({job=~".+"} |~ "No space left on device" [5m])) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Disk space warning detected"
          description: "Disk space issues detected in logs"

      - alert: ServiceDown
        expr: |
          sum(rate({job=~".+"} |~ "(?i)(service|daemon) .* (failed|stopped|crashed)" [5m])) > 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Service failure detected"
          description: "Service failure patterns detected in logs"
sudo chown -R loki:loki /var/lib/loki/rules
sudo systemctl restart loki
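The HighErrorRate expression divides the per-second rate of lines matching "error" by the per-second rate of all lines, per job. Because both rates use the same 5-minute window, the window length cancels and the ratio is simply error lines over total lines in that window. For example, 30 error lines out of 200 gives 0.15, which exceeds the 0.1 threshold; the alert fires only if the ratio stays above it for the 2 minutes set by for. The helper below reproduces that arithmetic for hand-checking a threshold. error_ratio_exceeds is a hypothetical name, and this is not how Loki itself evaluates the rule.

```shell
#!/usr/bin/env bash
# Sketch: print errors/total with three decimals and succeed (exit 0)
# when the ratio is above THRESHOLD (default 0.1, as in HighErrorRate).
# A total of zero returns failure, since there is nothing to compare.
error_ratio_exceeds() {
  local errors="$1" total="$2" threshold="${3:-0.1}"
  awk -v e="$errors" -v t="$total" -v th="$threshold" \
    'BEGIN { if (t == 0) exit 1; r = e / t; printf "%.3f\n", r; exit !(r > th) }'
}
```

Running error_ratio_exceeds 30 200 prints 0.150 and succeeds; error_ratio_exceeds 10 200 prints 0.050 and fails.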

Install and configure Alertmanager

Install Alertmanager to handle alert notifications from Loki rules.

cd /tmp
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar xzf alertmanager-0.26.0.linux-amd64.tar.gz
sudo mv alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
sudo mv alertmanager-0.26.0.linux-amd64/amtool /usr/local/bin/
sudo chmod +x /usr/local/bin/alertmanager /usr/local/bin/amtool

Configure Alertmanager

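Set up Alertmanager with email notifications for log-based alerts. Create a dedicated alertmanager user and its directories, then save the configuration below as /etc/alertmanager/alertmanager.yml, replacing the SMTP server, addresses, and password with your mail provider's values. Because the file contains the SMTP password, restrict it once saved: sudo chown alertmanager:alertmanager /etc/alertmanager/alertmanager.yml && sudo chmod 640 /etc/alertmanager/alertmanager.yml.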

sudo useradd --system --no-create-home --shell /bin/false alertmanager
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
sudo chown -R alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager
global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'your-smtp-password'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email-admin'

receivers:
  - name: 'email-admin'
    email_configs:
      - to: 'admin@example.com'
        subject: 'Log Alert: {{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
        body: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Labels: {{ range .Labels.SortedPairs }}{{ .Name }}={{ .Value }} {{ end }}
          {{ end }}

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

Create Alertmanager service

Create /etc/systemd/system/alertmanager.service, then reload systemd and start Alertmanager.

[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/var/lib/alertmanager/ \
  --web.listen-address=0.0.0.0:9093
Restart=always

[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable --now alertmanager
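Before waiting for a real incident, you can check the email route by posting a synthetic alert to Alertmanager's v2 API (POST /api/v2/alerts), which accepts a JSON array of alerts with labels and annotations. The sketch only builds the payload; test_alert_json and the label values are hypothetical test data, and the commented curl line sends it.

```shell
#!/usr/bin/env bash
# Sketch: print a one-alert JSON array for Alertmanager's v2 API.
# Arguments: alertname (default TestLogAlert) and severity (default warning).
test_alert_json() {
  local name="${1:-TestLogAlert}" severity="${2:-warning}"
  printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"Test alert %s","description":"Manually posted test alert"}}]\n' \
    "$name" "$severity" "$name"
}

# Example (not run here):
#   test_alert_json | curl -s -X POST -H 'Content-Type: application/json' \
#     --data @- http://localhost:9093/api/v2/alerts
```

Because the route groups by alertname with group_wait: 10s, the email should go out shortly after the post if SMTP is configured correctly.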

Verify your setup

Check that all services are running correctly and data is being collected.

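# Check service status
sudo systemctl status loki promtail grafana-server alertmanager

# Verify Loki is receiving logs (lists the label names it has indexed)
curl -s "http://localhost:3100/loki/api/v1/labels" | jq

# Check recent logs: the newest entry of each stream from the past hour.
# -G with --data-urlencode encodes the braces and quotes in the query;
# start and end are Unix timestamps in nanoseconds, and results are newest-first.
curl -s -G "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={job=~".+"}' \
  --data-urlencode "start=$(date -d '1 hour ago' +%s%N)" \
  --data-urlencode "end=$(date +%s%N)" | jq '.data.result[] | .values[0]'

# List the alerting rules Loki has loaded (Prometheus-compatible JSON endpoint)
curl -s "http://localhost:3100/prometheus/api/v1/rules" | jq

# Print the access URLs
echo "Grafana: http://$(hostname -I | awk '{print $1}'):3000 (admin/admin)"
echo "Loki API: http://$(hostname -I | awk '{print $1}'):3100"
echo "Alertmanager: http://$(hostname -I | awk '{print $1}'):9093"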
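To test the pipeline end to end without waiting for new system logs, push one line directly to Loki's push API (/loki/api/v1/push, the endpoint Promtail uses) and look for it in Grafana. The push format groups entries into streams, each with a label set and a list of [timestamp, line] pairs, where the timestamp is a string of Unix nanoseconds. In this sketch, loki_push_json and the job name smoke-test are hypothetical, and the line is not JSON-escaped, so keep quotes and backslashes out of it.

```shell
#!/usr/bin/env bash
# Sketch: print a Loki push payload with one line stamped "now".
# date +%s%N (GNU date) gives Unix time in nanoseconds.
loki_push_json() {
  local job="${1:-smoke-test}" line="${2:-hello from curl}"
  printf '{"streams":[{"stream":{"job":"%s"},"values":[["%s","%s"]]}]}\n' \
    "$job" "$(date +%s%N)" "$line"
}

# Example (not run here):
#   loki_push_json | curl -s -H 'Content-Type: application/json' \
#     --data @- http://localhost:3100/loki/api/v1/push
# Then run {job="smoke-test"} in Grafana Explore.
```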
Note: Change the default Grafana admin password on first login. Navigate to the Explore section to run LogQL queries against your logs.

Configure advanced dashboards

Create application-specific dashboards

Set up targeted dashboards for different log sources with relevant metrics and filters.

# Create an nginx dashboard with two panels
curl -X POST \
  'http://admin:admin@localhost:3000/api/dashboards/db' \
  -H 'Content-Type: application/json' \
  -d '{
    "dashboard": {
      "title": "Nginx Logs",
      "panels": [
        {
          "title": "HTTP Status Codes",
          "type": "stat",
          "datasource": "Loki",
          "targets": [{
            "expr": "sum by (status) (count_over_time({job=\"nginx\"} | pattern \"<_> <_> [<_>] \\\"<_> <_> <_>\\\" <status> <_> <_> <_>\" [5m]))"
          }]
        },
        {
          "title": "Top IPs",
          "type": "stat",
          "datasource": "Loki",
          "targets": [{
            "expr": "topk(10, sum by (ip) (count_over_time({job=\"nginx\"} | pattern \"<ip> <_> [<_>] \\\"<_> <_> <_>\\\" <_> <_> <_> <_>\" [5m])))"
          }]
        }
      ]
    }
  }'
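The HTTP Status Codes panel groups nginx lines by status. In nginx's default combined log format, the status code is the ninth whitespace-separated field, so you can cross-check the panel against a raw access log with awk. count_statuses is a hypothetical helper, and it assumes the combined format; a custom log_format shifts the field position.

```shell
#!/usr/bin/env bash
# Sketch: count access-log lines per HTTP status code, assuming nginx's
# combined format, where field 9 is the status. Output: "<status> <count>".
count_statuses() {
  awk '{ counts[$9]++ } END { for (s in counts) print s, counts[s] }' "$@" |
    LC_ALL=C sort
}
```

For example, count_statuses /var/log/nginx/access.log prints one line per status code, which should roughly match the panel's totals over the same window.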

Set up log parsing and filtering

Run these LogQL queries in Grafana Explore to search and parse your logs.

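# Search for errors across all services
{job=~".+"} |~ "(?i)error|exception|fatal"

# Parse nginx access logs (combined format) to extract status codes
{job="nginx"} | pattern "<ip> <_> [<_>] \"<method> <uri> <_>\" <status> <size> <_>"

# Find failed SSH attempts
{job="auth"} |~ "Failed password"

# Application errors with context (assumes an "app" job writing JSON logs
# with timestamp, level, and message fields; the Promtail config above has no such job)
{job="app"} |= "error" | json | line_format "{{.timestamp}} [{{.level}}] {{.message}}"

# Error lines per second, averaged over 1-minute windows
sum(rate({job=~".+"} |~ "(?i)error" [1m]))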

Common issues

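Symptom | Cause | Fix
Promtail can't read logs | Insufficient permissions | Add the promtail user to the adm and systemd-journal groups with sudo usermod -aG adm,systemd-journal promtail, then restart Promtail
Loki high memory usage | Too many active streams, usually from high-cardinality labels | Avoid labels with many distinct values (IP addresses, user or request IDs) and cap streams with max_global_streams_per_user in limits_config; raise ingestion_rate_mb and ingestion_burst_size_mb only if pushes are rejected for exceeding the rate limit
Missing logs in Grafana | Clock skew between systems | Check time synchronization with chronyc sources -v and review reject_old_samples_max_age
Alerts not firing | Incorrect LogQL syntax | Test queries in Grafana Explore before adding them to alert rules
Dashboard shows no data | Data source misconfigured or Loki not ready | Check the Loki URL in the data source settings and confirm Loki responds with curl http://localhost:3100/ready
High disk usage | No log retention policy | Enable compactor retention: set retention_enabled: true under compactor and a retention_period under limits_config, then restart Loki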
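For the clock-skew row: with reject_old_samples: true and reject_old_samples_max_age: 168h in loki.yml, Loki rejects entries whose timestamp is more than 168 hours (604,800 seconds) older than its own current time. The helper below is a sketch for checking a suspicious timestamp from a source host against that window; too_old_for_loki is a hypothetical name, and its default window mirrors the loki.yml value.

```shell
#!/usr/bin/env bash
# Sketch: succeed if Unix timestamp TS (seconds) is older than MAX_AGE
# seconds relative to NOW. Defaults: NOW = current time, MAX_AGE = 168h.
too_old_for_loki() {
  local ts="$1" now="${2:-$(date +%s)}" max_age="${3:-604800}"
  [ $((now - ts)) -gt "$max_age" ]
}
```

For example, too_old_for_loki "$(date -d '8 days ago' +%s)" succeeds, meaning Loki would reject an entry with that timestamp.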
