Build comprehensive Grafana dashboards to visualize collectd system metrics stored in InfluxDB with custom panels, advanced queries, and automated alerting rules for production monitoring.
Prerequisites
- Working collectd and InfluxDB setup
- Grafana installed and accessible
- Basic knowledge of InfluxQL queries
- System administrator access
What this solves
This tutorial helps you create production-grade Grafana dashboards for monitoring Linux system metrics collected by collectd and stored in InfluxDB. You'll learn to configure data sources, import dashboard templates, create custom visualizations, and set up intelligent alerting for proactive system monitoring.
Before starting, ensure you have a working collectd and InfluxDB setup as covered in our Linux performance monitoring with collectd and InfluxDB tutorial. You'll also need Grafana installed and accessible via web interface.
Step-by-step configuration
Configure InfluxDB data source in Grafana
Add your InfluxDB instance as a data source to enable Grafana to query collectd metrics.
curl -si http://localhost:8086/ping   # expect HTTP/1.1 204 No Content
Access Grafana web interface and navigate to Configuration > Data Sources > Add data source. Select InfluxDB and configure the connection settings.
URL: http://localhost:8086
Database: collectd
User: collectd_user
Password: your_secure_password
HTTP Method: GET
Test the connection to verify Grafana can communicate with InfluxDB successfully.
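Before saving, you can also confirm from the shell that the same credentials work against InfluxDB directly. A quick sketch; collectd_user and your_secure_password are the example values configured above:

```shell
# Query InfluxDB with the credentials Grafana will use. -G sends the
# parameters as a URL query string; -u supplies HTTP basic auth.
INFLUX_URL="http://localhost:8086/query"
INFLUX_DB="collectd"
INFLUX_QUERY="SHOW MEASUREMENTS LIMIT 5"
curl -sG "$INFLUX_URL" \
  --data-urlencode "db=$INFLUX_DB" \
  --data-urlencode "q=$INFLUX_QUERY" \
  -u "collectd_user:your_secure_password" \
  || echo "InfluxDB not reachable at $INFLUX_URL"
```

A successful response lists measurements such as cpu_value and memory_value; an authentication error here means Grafana's data source test will fail for the same reason.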
Import collectd dashboard template
Download and import a comprehensive collectd dashboard template to get started with system monitoring visualizations.
curl -fsSL -o collectd-dashboard.json https://grafana.com/api/dashboards/52/revisions/2/download
In Grafana, navigate to Dashboards > Import and upload the collectd-dashboard.json file. Configure the dashboard to use your InfluxDB data source and adjust the time range settings.
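If you prefer to script the import, the Grafana HTTP API accepts the same JSON wrapped in an import payload. A sketch, assuming the default admin:admin credentials and the file downloaded in the previous step:

```shell
# Wrap the dashboard JSON in the payload shape /api/dashboards/db expects.
dashboard_file="collectd-dashboard.json"
if [ -f "$dashboard_file" ]; then
  printf '{"dashboard": %s, "overwrite": true}\n' "$(cat "$dashboard_file")" \
    > /tmp/import-payload.json
  curl -s -X POST -H "Content-Type: application/json" \
    -d @/tmp/import-payload.json \
    -u admin:admin http://localhost:3000/api/dashboards/db
else
  echo "download collectd-dashboard.json first"
fi
```

After importing via the API you may still need to open the dashboard once to bind it to your InfluxDB data source.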
Create custom CPU utilization panel
Build a custom panel to monitor CPU utilization across all cores with advanced visualization options.
Create a new dashboard and add a Time Series panel with the following InfluxQL query:
SELECT mean("value") FROM "cpu_value" WHERE ("type_instance" = 'idle' AND "host" =~ /^$host$/) AND $timeFilter GROUP BY time($__interval), "host" fill(null)
Transform the idle values into actual utilization with a field calculation (exact labels vary slightly between Grafana versions):
Transform: Add field from calculation
Mode: Binary operation
Operation: 100 - (the idle series)
Alias: CPU Usage %
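Alternatively, skip the transform and do the subtraction in InfluxQL itself. Note that InfluxQL string literals take single quotes ('idle'); double quotes are reserved for identifiers:

```sql
SELECT 100 - mean("value") FROM "cpu_value"
WHERE ("type_instance" = 'idle' AND "host" =~ /^$host$/) AND $timeFilter
GROUP BY time($__interval), "host" fill(null)
```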
Configure memory usage visualization
Create a comprehensive memory monitoring panel showing used, cached, and available memory.
SELECT mean("value") FROM "memory_value" WHERE ("type_instance" =~ /^(used|cached|free|buffered)$/ AND "host" =~ /^$host$/) AND $timeFilter GROUP BY time($__interval), "type_instance", "host" fill(null)
Configure the panel as a stacked area chart to visualize memory allocation over time. Set appropriate colors and legends for each memory type.
Create disk I/O monitoring panel
Monitor disk read/write operations and throughput with separate panels for IOPS and bandwidth.
SELECT derivative(mean("value"), 1s) FROM "disk_ops" WHERE ("host" =~ /^$host$/ AND "instance" =~ /^$disk$/) AND $timeFilter GROUP BY time($__interval), "type_instance", "instance" fill(null)
SELECT derivative(mean("value"), 1s) FROM "disk_octets" WHERE ("host" =~ /^$host$/ AND "instance" =~ /^$disk$/) AND $timeFilter GROUP BY time($__interval), "type_instance", "instance" fill(null)
Configure network interface monitoring
Create panels to monitor network traffic, packet rates, and error counters for network interfaces.
SELECT derivative(mean("value"), 1s) * 8 FROM "interface_octets" WHERE ("host" =~ /^$host$/ AND "instance" =~ /^$interface$/) AND $timeFilter GROUP BY time($__interval), "type_instance", "instance" fill(null)
Configure the panel to show traffic in bits per second and add appropriate unit formatting (bps, Kbps, Mbps).
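The * 8 factor exists because interface_octets is a byte counter, so its per-second derivative is bytes per second. A worked example of the conversion (the counter samples are hypothetical):

```shell
# interface_octets is a monotonically increasing byte counter, so the
# 1-second derivative is bytes/s; multiplying by 8 converts to bits/s.
bytes_t0=1000000   # counter sample at t0 (hypothetical)
bytes_t1=1125000   # counter sample one second later (hypothetical)
bps=$(( (bytes_t1 - bytes_t0) * 8 ))
echo "${bps} bits/s"   # 125000 bytes/s -> 1000000 bits/s = 1 Mbps
```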
Set up dashboard variables
Create template variables to make dashboards dynamic and reusable across multiple hosts and components.
Navigate to Dashboard Settings > Variables and create the following variables:
Name: host
Type: Query
Data source: InfluxDB
Query: SHOW TAG VALUES FROM "cpu_value" WITH KEY = "host"
Multi-value: true
Include All option: true
Name: disk
Type: Query
Data source: InfluxDB
Query: SHOW TAG VALUES FROM "disk_ops" WITH KEY = "instance" WHERE "host" =~ /^$host$/
Multi-value: true
Include All option: true
Configure alerting rules
Set up intelligent alerting rules to notify you of system performance issues before they become critical.
Create a new alert rule for high CPU utilization:
Rule Name: High CPU Usage
Query: SELECT mean("value") FROM "cpu_value" WHERE "type_instance" = 'idle' AND "host" = 'your-hostname' AND $timeFilter
Note: alert queries cannot reference dashboard template variables such as $host, so pin a concrete hostname here.
Condition: IS BELOW 20 (for idle CPU, meaning >80% usage)
Evaluation: Every 1m for 5m
No Data State: Alerting
Execution Error State: Alerting
Set up notification channels
Configure notification channels to receive alerts via email, Slack, or other communication platforms.
Navigate to Alerting > Notification channels and create an email notification:
Name: System Alerts Email
Type: Email
Email addresses: admin@example.com, ops-team@example.com
Subject: [ALERT] {{ range .Alerts }}{{ .Labels.alertname }}{{ end }}
Message: System alert triggered on {{ range .Alerts }}{{ .Labels.host }}{{ end }}
Create memory alert rule
Set up alerting for low available memory conditions to prevent out-of-memory situations.
Rule Name: Low Available Memory
Query: SELECT mean("value") FROM "memory_value" WHERE "type_instance" = 'free' AND "host" = 'your-hostname' AND $timeFilter
Condition: IS BELOW 500000000 (500MB in bytes)
Evaluation: Every 30s for 2m
Notification: System Alerts Email
Configure disk space monitoring alert
Monitor disk usage and alert when filesystems approach capacity limits.
SELECT last("value") FROM "df_complex" WHERE "type_instance" = 'used' AND "host" =~ /^$host$/ AND "instance" =~ /^root$/ AND $timeFilter
Create a threshold alert when disk usage exceeds 85% of total capacity.
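The percentage itself has to be derived, since df_complex reports absolute byte counts for used, free, and reserved. A sketch of the arithmetic with hypothetical values:

```shell
# Usage percentage = used / (used + free + reserved), all in bytes.
used=42949672960       # df_complex, type_instance = used (hypothetical)
free_b=5368709120      # type_instance = free (hypothetical)
reserved=1073741824    # type_instance = reserved (hypothetical)
total=$(( used + free_b + reserved ))
pct=$(( 100 * used / total ))
echo "root filesystem usage: ${pct}%"
if [ "$pct" -ge 85 ]; then
  echo "would trigger the 85% alert"
fi
```

In Grafana you can express the same ratio as a math expression over two queries (used and used+free+reserved) and alert on the result.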
Advanced dashboard customization
Configure custom time ranges
Set up dashboard-specific time ranges and refresh intervals for optimal monitoring experience.
Default time range: Last 1 hour
Refresh intervals: 5s,10s,30s,1m,5m,15m,30m,1h
Auto-refresh: 30s
Timezone: Browser
Implement threshold visualization
Add threshold lines to panels to visualize warning and critical levels directly on graphs.
CPU Panel Thresholds:
- Warning: 70 (yellow)
- Critical: 85 (red)
Memory Panel Thresholds:
- Warning: 80% (orange)
- Critical: 90% (red)
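In exported dashboard JSON these thresholds live under fieldConfig.defaults.thresholds. A fragment sketching the CPU panel's steps (values from the list above; null marks the base step):

```json
"fieldConfig": {
  "defaults": {
    "thresholds": {
      "mode": "absolute",
      "steps": [
        { "color": "green",  "value": null },
        { "color": "yellow", "value": 70 },
        { "color": "red",    "value": 85 }
      ]
    }
  }
}
```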
Create status overview panel
Build a single stat panel showing overall system health status using multiple metrics.
SELECT mean("value") FROM "load_shortterm" WHERE "host" =~ /^$host$/ AND $timeFilter
Configure the panel to show system load average with color-coded thresholds and trend indicators.
Verify your setup
Test your dashboard configuration and alerting rules to ensure everything works correctly.
# Check InfluxDB connectivity
curl -s "http://localhost:8086/query?q=SHOW%20DATABASES"
# Verify collectd data is flowing
curl -s "http://localhost:8086/query?db=collectd&q=SELECT%20*%20FROM%20cpu_value%20LIMIT%205"
# Test the Grafana API (replace admin:admin with your credentials)
curl -s http://admin:admin@localhost:3000/api/health
Access your Grafana dashboard and verify that:
- All panels display data correctly
- Template variables filter data appropriately
- Alert rules trigger under test conditions
- Notifications are delivered to configured channels
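The health endpoint can also be checked non-interactively, which is handy for cron jobs or CI. A sketch; /api/health requires no authentication:

```shell
# /api/health returns a small JSON document including a "database" status.
health=$(curl -s http://localhost:3000/api/health || true)
if printf '%s' "$health" | grep -q '"database": *"ok"'; then
  echo "Grafana health: ok"
else
  echo "Grafana health check failed or server unreachable"
fi
```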
If panels stay empty at this point, inspect the schema directly with SHOW MEASUREMENTS and SHOW FIELD KEYS to confirm that your queries reference the right measurement and field names.
Performance optimization
For high-volume metrics collection, consider implementing the following optimizations covered in our related monitoring tutorials:
- Configure retention policies in InfluxDB to manage disk usage
- Implement continuous queries for pre-aggregated data
- Use appropriate dashboard refresh intervals to reduce query load
- Consider implementing caching layers for frequently accessed dashboards
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| No data in panels | InfluxDB connection issues | Verify data source configuration and test connection |
| Alert rules not triggering | Incorrect query syntax | Test queries in Explore view and adjust thresholds |
| Dashboard loading slowly | Too many data points | Increase time interval aggregation and reduce time range |
| Missing metrics | Collectd plugins not enabled | Check collectd configuration and restart service |
| Notifications not delivered | SMTP or webhook misconfiguration | Test notification channels and check Grafana logs |
Next steps
- Monitor Istio service mesh with Prometheus and Grafana dashboards
- Configure Apache Airflow monitoring with Prometheus alerts and Grafana dashboards
- Configure Grafana LDAP authentication and role-based access control
- Implement Grafana high availability clustering for enterprise monitoring
- Configure custom Grafana plugins for specialized monitoring requirements
Automated install script
Run the following script to automate the entire setup:
#!/usr/bin/env bash
set -euo pipefail
# Colors for output
readonly RED='\033[0;31m'
readonly GREEN='\033[0;32m'
readonly YELLOW='\033[1;33m'
readonly NC='\033[0m'
# Global variables
INFLUXDB_USER="${1:-collectd_user}"
INFLUXDB_PASSWORD="${2:-$(openssl rand -base64 32)}"
GRAFANA_ADMIN_PASSWORD="${3:-$(openssl rand -base64 32)}"
usage() {
echo "Usage: $0 [influxdb_user] [influxdb_password] [grafana_admin_password]"
echo " influxdb_user: InfluxDB user for Grafana (default: collectd_user)"
echo " influxdb_password: Password for InfluxDB user (default: auto-generated)"
echo " grafana_admin_password: Grafana admin password (default: auto-generated)"
exit 1
}
log() {
echo -e "${GREEN}[INFO]${NC} $1"
}
warn() {
echo -e "${YELLOW}[WARN]${NC} $1"
}
error() {
echo -e "${RED}[ERROR]${NC} $1" >&2
}
cleanup() {
error "Script failed. Check logs above for details."
exit 1
}
trap cleanup ERR
check_prerequisites() {
log "[1/7] Checking prerequisites..."
if [[ $EUID -ne 0 ]]; then
error "This script must be run as root"
exit 1
fi
if ! command -v curl &> /dev/null; then
error "curl is required but not installed"
exit 1
fi
if ! systemctl is-active --quiet influxdb; then
error "InfluxDB is not running. Please install and configure InfluxDB first."
exit 1
fi
}
detect_distro() {
if [ -f /etc/os-release ]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_INSTALL="apt install -y"
PKG_UPDATE="apt update"
GRAFANA_CONFIG="/etc/grafana/grafana.ini"
FIREWALL_CMD="ufw"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_INSTALL="dnf install -y"
PKG_UPDATE="dnf update -y"
GRAFANA_CONFIG="/etc/grafana/grafana.ini"
FIREWALL_CMD="firewall-cmd"
;;
amzn)
PKG_MGR="yum"
PKG_INSTALL="yum install -y"
PKG_UPDATE="yum update -y"
GRAFANA_CONFIG="/etc/grafana/grafana.ini"
FIREWALL_CMD="firewall-cmd"
;;
*)
error "Unsupported distribution: $ID"
exit 1
;;
esac
else
error "Cannot detect distribution"
exit 1
fi
}
install_grafana() {
log "[2/7] Installing Grafana..."
if command -v grafana-server &> /dev/null; then
log "Grafana already installed"
return
fi
case "$PKG_MGR" in
apt)
curl -fsSL https://packages.grafana.com/gpg.key | gpg --dearmor -o /usr/share/keyrings/grafana.gpg
echo "deb [signed-by=/usr/share/keyrings/grafana.gpg] https://packages.grafana.com/oss/deb stable main" > /etc/apt/sources.list.d/grafana.list
$PKG_UPDATE
$PKG_INSTALL grafana
;;
dnf|yum)
cat > /etc/yum.repos.d/grafana.repo << 'EOF'
[grafana]
name=grafana
baseurl=https://packages.grafana.com/oss/rpm
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://packages.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
EOF
$PKG_INSTALL grafana
;;
esac
systemctl daemon-reload
systemctl enable grafana-server
}
configure_grafana() {
log "[3/7] Configuring Grafana..."
# Set admin password
grafana-cli admin reset-admin-password "$GRAFANA_ADMIN_PASSWORD"
# Configure Grafana settings
sed -i "s/^;http_port = 3000/http_port = 3000/" "$GRAFANA_CONFIG"
sed -i "s/^;domain = localhost/domain = $(hostname -f)/" "$GRAFANA_CONFIG"
systemctl start grafana-server
# Wait for Grafana to start
local count=0
while ! curl -s http://localhost:3000 > /dev/null; do
sleep 2
count=$((count + 1))
if [ $count -gt 30 ]; then
error "Grafana failed to start"
exit 1
fi
done
}
configure_firewall() {
log "[4/7] Configuring firewall..."
case "$FIREWALL_CMD" in
ufw)
if command -v ufw &> /dev/null && ufw status | grep -q "Status: active"; then
ufw allow 3000/tcp
log "UFW rule added for Grafana port 3000"
fi
;;
firewall-cmd)
if systemctl is-active --quiet firewalld; then
firewall-cmd --permanent --add-port=3000/tcp
firewall-cmd --reload
log "Firewall rule added for Grafana port 3000"
fi
;;
esac
}
setup_influxdb_user() {
log "[5/7] Setting up InfluxDB user..."
# Create InfluxDB user for Grafana
influx -execute "CREATE USER \"$INFLUXDB_USER\" WITH PASSWORD '$INFLUXDB_PASSWORD'" || warn "InfluxDB user may already exist; continuing"
influx -execute "GRANT READ ON \"collectd\" TO \"$INFLUXDB_USER\""
log "InfluxDB user created: $INFLUXDB_USER"
}
configure_datasource() {
log "[6/7] Configuring InfluxDB data source in Grafana..."
# Wait a bit more for Grafana API to be ready
sleep 5
# Create InfluxDB data source
cat > /tmp/datasource.json << EOF
{
"name": "InfluxDB-collectd",
"type": "influxdb",
"url": "http://localhost:8086",
"access": "proxy",
"database": "collectd",
"user": "$INFLUXDB_USER",
"password": "$INFLUXDB_PASSWORD",
"isDefault": true
}
EOF
curl -s -X POST \
-H "Content-Type: application/json" \
-d @/tmp/datasource.json \
-u "admin:$GRAFANA_ADMIN_PASSWORD" \
http://localhost:3000/api/datasources
rm -f /tmp/datasource.json
log "InfluxDB data source configured"
}
import_dashboard() {
log "[7/7] Creating collectd dashboard..."
# Create a basic collectd dashboard
cat > /tmp/dashboard.json << 'EOF'
{
"dashboard": {
"id": null,
"title": "Collectd System Monitoring",
"tags": ["collectd"],
"timezone": "browser",
"panels": [
{
"id": 1,
"title": "CPU Usage",
"type": "timeseries",
"targets": [
{
"query": "SELECT 100 - mean(\"value\") FROM \"cpu_value\" WHERE \"type_instance\" = 'idle' AND $timeFilter GROUP BY time($__interval), \"host\" fill(null)"
}
],
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
},
{
"id": 2,
"title": "Memory Usage",
"type": "timeseries",
"targets": [
{
"query": "SELECT mean(\"value\") FROM \"memory_value\" WHERE \"type_instance\" =~ /^(used|cached|free|buffered)$/ AND $timeFilter GROUP BY time($__interval), \"type_instance\" fill(null)"
}
],
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
}
],
"time": {"from": "now-1h", "to": "now"},
"refresh": "30s"
},
"overwrite": true
}
EOF
curl -s -X POST \
-H "Content-Type: application/json" \
-d @/tmp/dashboard.json \
-u "admin:$GRAFANA_ADMIN_PASSWORD" \
http://localhost:3000/api/dashboards/db
rm -f /tmp/dashboard.json
log "Collectd dashboard imported"
}
verify_installation() {
log "Verifying installation..."
if ! systemctl is-active --quiet grafana-server; then
error "Grafana service is not running"
return 1
fi
if ! curl -s http://localhost:3000 > /dev/null; then
error "Grafana web interface is not accessible"
return 1
fi
log "Installation completed successfully!"
log "Grafana URL: http://$(hostname -I | awk '{print $1}'):3000"
log "Username: admin"
log "Password: $GRAFANA_ADMIN_PASSWORD"
log "InfluxDB User: $INFLUXDB_USER"
log "InfluxDB Password: $INFLUXDB_PASSWORD"
}
main() {
log "Starting Grafana dashboard setup for collectd metrics..."
check_prerequisites
detect_distro
install_grafana
configure_grafana
configure_firewall
setup_influxdb_user
configure_datasource
import_dashboard
verify_installation
log "Setup completed! Access Grafana at http://localhost:3000"
}
main "$@"
Review the script before running. It must be executed as root: sudo bash install.sh