Set up Thanos Receiver clustering with hashring configuration to distribute Prometheus remote write traffic across multiple replicas for high availability and scalability.
Prerequisites
- Root or sudo access
- At least 8GB RAM per receiver instance
- Network connectivity between receiver nodes
- Object storage (MinIO, S3, or compatible)
- HAProxy for load balancing
What this solves
Thanos Receiver clustering scales Prometheus remote write ingestion horizontally by distributing time series across multiple receiver instances. Spreading the series spreads the ingestion load, and replicating each series to more than one receiver keeps a copy of the data if a single receiver is lost. You need this when handling high-volume metrics ingestion from multiple Prometheus instances, or when a single receiver has become a bottleneck.
Step-by-step configuration
Update system packages
Start by upgrading the installed packages so the system has the latest security patches.
sudo apt update && sudo apt upgrade -y
Install Thanos binary
Download the Thanos release from the official GitHub releases page and install the binary. The single thanos binary provides every Thanos component, including the receiver.
wget https://github.com/thanos-io/thanos/releases/download/v0.34.1/thanos-0.34.1.linux-amd64.tar.gz
tar -xzf thanos-0.34.1.linux-amd64.tar.gz
sudo mv thanos-0.34.1.linux-amd64/thanos /usr/local/bin/
sudo chmod +x /usr/local/bin/thanos
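The downloaded tarball can be verified against the checksum list published with the release before it is extracted. A minimal sketch, assuming the release page offers a sha256sums.txt file in the usual sha256sum format (check the release assets); verify_release is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify one downloaded file against a checksum
# list in sha256sum format ("<hash>  <filename>" per line).
verify_release() {
  local tarball="$1" sums="$2"
  local dir name entry
  dir="$(dirname "$tarball")"
  name="$(basename "$tarball")"
  # Pick the line for this file; "*name" covers binary-mode entries.
  entry="$(awk -v n="$name" '$2 == n || $2 == ("*" n)' "$sums")"
  if [[ -z "$entry" ]]; then
    echo "no checksum entry for ${name} in ${sums}" >&2
    return 1
  fi
  # sha256sum -c resolves the filename relative to the current
  # directory, so run it from the tarball's directory; "-" reads stdin.
  (cd "$dir" && printf '%s\n' "$entry" | sha256sum -c --quiet -)
}
```

For example, with both files in the current directory: verify_release thanos-0.34.1.linux-amd64.tar.gz sha256sums.txt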
Create Thanos user and directories
Create a dedicated system user for Thanos and establish the required directory structure with proper ownership.
sudo useradd --system --no-create-home --shell /bin/false thanos
sudo mkdir -p /etc/thanos /var/lib/thanos/receive /var/log/thanos
sudo chown -R thanos:thanos /var/lib/thanos /var/log/thanos
sudo chown thanos:thanos /etc/thanos
Configure object storage
Create /etc/thanos/bucket.yml, the object storage configuration that Thanos uses to upload the blocks it builds from received data. This example uses MinIO, but any S3-compatible storage works.
type: s3
config:
  bucket: "thanos-metrics"
  endpoint: "minio.example.com:9000"
  access_key: "minio-access-key"
  secret_key: "minio-secret-key"
  insecure: false
  signature_version2: false
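Because bucket.yml holds credentials, it is safer to create it with owner-only permissions from the start than to tighten them afterwards. A minimal sketch; write_bucket_config is a hypothetical helper, and reading the keys from S3_ACCESS_KEY and S3_SECRET_KEY environment variables is an assumption of this example, not a Thanos convention.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a Thanos S3 objstore config readable only
# by its owner. S3_ACCESS_KEY / S3_SECRET_KEY are names chosen for this
# example.
write_bucket_config() {
  local path="$1" bucket="$2" endpoint="$3"
  if [[ -z "${S3_ACCESS_KEY:-}" || -z "${S3_SECRET_KEY:-}" ]]; then
    echo "S3_ACCESS_KEY and S3_SECRET_KEY must be set" >&2
    return 1
  fi
  # Remove any old copy: an existing file would keep its old mode.
  rm -f "$path"
  # umask 077 inside a subshell: the file is created as 0600 and the
  # caller's umask is left untouched.
  (
    umask 077
    cat > "$path" <<EOF
type: s3
config:
  bucket: "${bucket}"
  endpoint: "${endpoint}"
  access_key: "${S3_ACCESS_KEY}"
  secret_key: "${S3_SECRET_KEY}"
  insecure: false
  signature_version2: false
EOF
  )
}
```

Afterwards, hand the file to the service account with sudo chown thanos:thanos /etc/thanos/bucket.yml.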
Create hashring configuration
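Define the hashring configuration in /etc/thanos/hashring.json. A request is assigned to a hashring by its tenant (the THANOS-TENANT header), and its series are then spread across that hashring's endpoints by consistent hashing. Hashrings are checked in the order listed and a hashring with an empty tenants list matches every tenant, so the catch-all default hashring must come last. Each endpoint is a receiver's gRPC address and must exactly match that receiver's --receive.local-endpoint. If all receivers run on one host, as the port layout below assumes, make the receiver hostnames resolve to that host (for example in /etc/hosts).
[
  {
    "hashring": "tenant-a",
    "tenants": ["tenant-a"],
    "endpoints": [
      "receiver-01.example.com:10901",
      "receiver-02.example.com:10903"
    ]
  },
  {
    "hashring": "tenant-b",
    "tenants": ["tenant-b"],
    "endpoints": [
      "receiver-02.example.com:10903",
      "receiver-03.example.com:10905"
    ]
  },
  {
    "hashring": "default",
    "tenants": [],
    "endpoints": [
      "receiver-01.example.com:10901",
      "receiver-02.example.com:10903",
      "receiver-03.example.com:10905"
    ]
  }
]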
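Each receiver compares its --receive.local-endpoint with the hashring endpoints to decide which series it stores locally and which it forwards, so the two must match exactly. A minimal pre-flight check in plain bash, assuming endpoints are written as quoted "host:port" strings as in the file above; check_local_endpoint is a hypothetical helper and does a text match rather than a full JSON parse.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: confirm a receiver's local endpoint
# appears verbatim, as a quoted JSON string, in the hashring file.
# Text match only: assumes endpoints look like "host:port".
check_local_endpoint() {
  local hashring_file="$1" endpoint="$2"
  if grep -qF "\"${endpoint}\"" "$hashring_file"; then
    echo "ok: ${endpoint} is listed in ${hashring_file}"
  else
    echo "missing: ${endpoint} is not listed in ${hashring_file}" >&2
    return 1
  fi
}
```

For example: check_local_endpoint /etc/thanos/hashring.json receiver-02.example.com:10903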
Configure first receiver instance
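Create /etc/systemd/system/thanos-receiver-01.service for the first receiver. The --receive.local-endpoint value must exactly match one of the endpoints in hashring.json (host and gRPC port), because that is how a receiver recognizes itself in the hashring. With --receive.replication-factor=2, each series is written to two receivers, and Thanos requires a write quorum of floor(2/2)+1 = 2, so a write fails while either target replica is down. Use a replication factor of 3, with at least three endpoints in every hashring, if writes must survive the loss of one receiver.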
[Unit]
Description=Thanos Receiver 01
After=network.target
[Service]
Type=simple
User=thanos
Group=thanos
ExecStart=/usr/local/bin/thanos receive \
--grpc-address=0.0.0.0:10901 \
--http-address=0.0.0.0:10902 \
--remote-write.address=0.0.0.0:19291 \
--objstore.config-file=/etc/thanos/bucket.yml \
--tsdb.path=/var/lib/thanos/receive/01 \
--label=receive_replica=\"01\" \
--label=receive_cluster=\"production\" \
--receive.hashrings-file=/etc/thanos/hashring.json \
--receive.local-endpoint=receiver-01.example.com:10901 \
--receive.replication-factor=2 \
--log.level=info \
--log.format=logfmt
Restart=always
RestartSec=3
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
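Thanos Receive treats a replicated write as successful once a quorum of floor(RF/2)+1 replicas has accepted it, where RF is --receive.replication-factor. This small bash sketch computes that quorum and how many replicas of a series may be unavailable before writes fail; write_quorum and tolerated_failures are illustrative names, not Thanos commands.

```bash
#!/usr/bin/env bash
# Write quorum for a replication factor: floor(rf/2) + 1.
# Bash integer division already floors for positive integers.
write_quorum() {
  local rf="$1"
  echo $(( rf / 2 + 1 ))
}

# Replicas of a series that can be down while writes still succeed.
tolerated_failures() {
  local rf="$1"
  echo $(( rf - $(write_quorum "$rf") ))
}

# Print a small table for common replication factors.
for rf in 1 2 3 5; do
  printf 'rf=%d quorum=%d tolerated=%d\n' \
    "$rf" "$(write_quorum "$rf")" "$(tolerated_failures "$rf")"
done
```

For the factor of 2 used in these units the tolerance is 0, which is why odd factors such as 3 are the usual choice when writes must survive a receiver outage.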
Configure second receiver instance
Create /etc/systemd/system/thanos-receiver-02.service for the second receiver. It uses different ports, its own data directory, and a local endpoint that matches its hashring entry.
[Unit]
Description=Thanos Receiver 02
After=network.target
[Service]
Type=simple
User=thanos
Group=thanos
ExecStart=/usr/local/bin/thanos receive \
--grpc-address=0.0.0.0:10903 \
--http-address=0.0.0.0:10904 \
--remote-write.address=0.0.0.0:19292 \
--objstore.config-file=/etc/thanos/bucket.yml \
--tsdb.path=/var/lib/thanos/receive/02 \
--label=receive_replica=\"02\" \
--label=receive_cluster=\"production\" \
--receive.hashrings-file=/etc/thanos/hashring.json \
--receive.local-endpoint=receiver-02.example.com:10903 \
--receive.replication-factor=2 \
--log.level=info \
--log.format=logfmt
Restart=always
RestartSec=3
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Configure third receiver instance
Create /etc/systemd/system/thanos-receiver-03.service for the third receiver to complete the cluster.
[Unit]
Description=Thanos Receiver 03
After=network.target
[Service]
Type=simple
User=thanos
Group=thanos
ExecStart=/usr/local/bin/thanos receive \
--grpc-address=0.0.0.0:10905 \
--http-address=0.0.0.0:10906 \
--remote-write.address=0.0.0.0:19293 \
--objstore.config-file=/etc/thanos/bucket.yml \
--tsdb.path=/var/lib/thanos/receive/03 \
--label=receive_replica=\"03\" \
--label=receive_cluster=\"production\" \
--receive.hashrings-file=/etc/thanos/hashring.json \
--receive.local-endpoint=receiver-03.example.com:10905 \
--receive.replication-factor=2 \
--log.level=info \
--log.format=logfmt
Restart=always
RestartSec=3
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
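The three units differ only in instance number, ports, and data path. A minimal sketch of rendering them from one template with the port scheme used above (gRPC 10899+2n, HTTP 10900+2n, remote write 19290+n); render_unit is a hypothetical helper that writes into a directory you pass in, so the output can be compared with the hand-written files before anything is copied to /etc/systemd/system.

```bash
#!/usr/bin/env bash
# Hypothetical generator for the thanos-receiver-0N.service units above.
# Port scheme, matching the hand-written units:
#   gRPC = 10899 + 2n, HTTP = 10900 + 2n, remote write = 19290 + n
render_unit() {
  local n="$1" domain="$2" outdir="$3"
  local id grpc http rw
  id="$(printf '%02d' "$n")"
  grpc=$((10899 + 2 * n))
  http=$((10900 + 2 * n))
  rw=$((19290 + n))
  # The quoted 'EOF' keeps bash from expanding anything in the template;
  # sed then fills in the @PLACEHOLDER@ tokens.
  sed -e "s/@ID@/${id}/g" -e "s/@GRPC@/${grpc}/g" -e "s/@HTTP@/${http}/g" \
      -e "s/@RW@/${rw}/g" -e "s/@DOMAIN@/${domain}/g" \
      > "${outdir}/thanos-receiver-${id}.service" <<'EOF'
[Unit]
Description=Thanos Receiver @ID@
After=network.target
[Service]
Type=simple
User=thanos
Group=thanos
ExecStart=/usr/local/bin/thanos receive \
  --grpc-address=0.0.0.0:@GRPC@ \
  --http-address=0.0.0.0:@HTTP@ \
  --remote-write.address=0.0.0.0:@RW@ \
  --objstore.config-file=/etc/thanos/bucket.yml \
  --tsdb.path=/var/lib/thanos/receive/@ID@ \
  --label=receive_replica=\"@ID@\" \
  --label=receive_cluster=\"production\" \
  --receive.hashrings-file=/etc/thanos/hashring.json \
  --receive.local-endpoint=receiver-@ID@.@DOMAIN@:@GRPC@ \
  --receive.replication-factor=2 \
  --log.level=info \
  --log.format=logfmt
Restart=always
RestartSec=3
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
}
```

For example: mkdir -p /tmp/units && for n in 1 2 3; do render_unit "$n" example.com /tmp/units; done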
Create additional data directories
Create the data directories for all three receiver instances with proper ownership.
sudo mkdir -p /var/lib/thanos/receive/01 /var/lib/thanos/receive/02 /var/lib/thanos/receive/03
sudo chown -R thanos:thanos /var/lib/thanos/receive
Configure firewall rules
Open the receivers' gRPC and HTTP ports (10901-10906), the HAProxy frontend port that Prometheus writes to (19290), and the receivers' remote write ports (19291-19293).
sudo ufw allow 10901:10906/tcp
sudo ufw allow 19290:19293/tcp
sudo ufw reload
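Firewall rules are best confirmed from another receiver host, since ufw allows loopback traffic by default and a local test proves little. A minimal sketch using bash's built-in /dev/tcp redirection (a bash feature, not a real device file) and coreutils timeout; port_open and check_grpc_ports are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port opens
# within the timeout. Needs bash (not sh); no nc or nmap required.
port_open() {
  local host="$1" port="$2" secs="${3:-2}"
  timeout "$secs" bash -c '>/dev/tcp/"$1"/"$2"' _ "$host" "$port" 2>/dev/null
}

# Check each "host:port" argument, e.g. the hashring's gRPC endpoints.
check_grpc_ports() {
  local ep
  for ep in "$@"; do
    if port_open "${ep%:*}" "${ep##*:}"; then
      echo "open:   ${ep}"
    else
      echo "closed: ${ep}"
    fi
  done
}
```

For example: check_grpc_ports receiver-01.example.com:10901 receiver-02.example.com:10903 receiver-03.example.com:10905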
Set proper file permissions
Restrict the configuration files so that only the thanos user (and root) can read them; bucket.yml contains the object storage credentials.
sudo chmod 640 /etc/thanos/bucket.yml /etc/thanos/hashring.json
sudo chown thanos:thanos /etc/thanos/bucket.yml /etc/thanos/hashring.json
Enable and start receiver services
Enable and start all three Thanos receiver instances to form the cluster.
sudo systemctl daemon-reload
sudo systemctl enable thanos-receiver-01 thanos-receiver-02 thanos-receiver-03
sudo systemctl start thanos-receiver-01 thanos-receiver-02 thanos-receiver-03
Configure load balancer
Set up HAProxy to distribute Prometheus remote write traffic across the receiver instances, giving Prometheus a single endpoint. Use the following as /etc/haproxy/haproxy.cfg. The backend servers point at 127.0.0.1 because this layout runs all three receivers on the HAProxy host; use the receiver hostnames instead if they run elsewhere. Health checks go to each receiver's HTTP port (/-/ready), because the remote write endpoint only accepts POST requests and would fail a GET check.
global
    daemon
    log stdout local0
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
defaults
    mode http
    log global
    option httplog
    option dontlognull
    option log-health-checks
    option forwardfor
    option http-server-close
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http
frontend thanos_receive_frontend
    bind *:19290
    default_backend thanos_receive_backend
backend thanos_receive_backend
    balance roundrobin
    option httpchk GET /-/ready
    server receiver-01 127.0.0.1:19291 check port 10902
    server receiver-02 127.0.0.1:19292 check port 10904
    server receiver-03 127.0.0.1:19293 check port 10906
frontend stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
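For scripted checks, HAProxy serves the same statistics as CSV when ;csv is appended to the stats URI (here http://localhost:8404/stats;csv). A minimal sketch that reports each server's status in thanos_receive_backend, assuming the standard CSV layout in which pxname, svname and status are columns 1, 2 and 18; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Print "server status" pairs for one backend from HAProxy CSV on stdin.
# Column positions (1-based): pxname=1, svname=2, status=18.
backend_status() {
  local backend="$1"
  awk -F, -v b="$backend" '
    /^#/ { next }   # skip the CSV header line
    $1 == b && $2 != "FRONTEND" && $2 != "BACKEND" { print $2, $18 }
  '
}

# Exit 0 only if the backend has servers and every one reports UP.
all_up() {
  local backend="$1"
  backend_status "$backend" |
    awk '{ n++ } $2 != "UP" { bad = 1 } END { exit (bad || n == 0) }'
}
```

For example: curl -s 'http://localhost:8404/stats;csv' | all_up thanos_receive_backend && echo "all receivers UP"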
Configure Prometheus remote write
Update your Prometheus configuration (prometheus.yml) to send metrics to the HAProxy endpoint in front of the receivers; thanos-lb.example.com stands for the HAProxy host.
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'production'
    replica: 'prometheus-01'
remote_write:
  - url: http://thanos-lb.example.com:19290/api/v1/receive
    queue_config:
      max_samples_per_send: 10000
      batch_send_deadline: 5s
      min_shards: 4
      max_shards: 200
      capacity: 10000
    metadata_config:
      send: true
      send_interval: 30s
    headers:
      THANOS-TENANT: "default"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
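The queue_config values also bound how much Prometheus holds in memory for this endpoint. Per the Prometheus documentation, capacity is the number of samples buffered per shard, so with all max_shards running the buffer can reach max_shards × capacity samples; and since each shard sends its batches one at a time, at most max_shards × max_samples_per_send samples are in flight. A quick bash calculation for the values above (the helper names are illustrative):

```bash
#!/usr/bin/env bash
# Upper bound on samples buffered for one remote_write endpoint:
# capacity is per shard, so multiply by the maximum shard count.
max_buffered_samples() {
  local max_shards="$1" capacity="$2"
  echo $(( max_shards * capacity ))
}

# Upper bound on samples in flight: one batch per shard at a time.
max_in_flight_samples() {
  local max_shards="$1" max_samples_per_send="$2"
  echo $(( max_shards * max_samples_per_send ))
}

echo "buffered:  $(max_buffered_samples 200 10000)"
echo "in flight: $(max_in_flight_samples 200 10000)"
```

With max_shards: 200 and capacity: 10000 that is up to 2,000,000 buffered samples; lower max_shards or capacity if that is more than the Prometheus host can spare.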
Configure tenant-specific routing
For a tenant that has its own hashring, run a Prometheus instance that sends the tenant's name in the THANOS-TENANT header. The name must appear in the tenants list of a hashring in hashring.json; otherwise the request falls through to the catch-all hashring.
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'production'
    replica: 'prometheus-tenant-a'
    tenant: 'tenant-a'
remote_write:
  - url: http://thanos-lb.example.com:19290/api/v1/receive
    headers:
      THANOS-TENANT: "tenant-a"
    queue_config:
      max_samples_per_send: 10000
      batch_send_deadline: 5s
scrape_configs:
  - job_name: 'tenant-a-services'
    static_configs:
      - targets: ['app-a-01:8080', 'app-a-02:8080']
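The hashring that handles a request is chosen from its THANOS-TENANT header and the order of entries in hashring.json: entries are checked top to bottom, the first one listing the tenant wins, and an entry with an empty tenants list matches any tenant. The bash sketch below illustrates that rule with the hashrings hard-coded (catch-all last); it models the matching logic for reasoning about routing, it is not Thanos code, and the array and function names are hypothetical.

```bash
#!/usr/bin/env bash
# Model of hashring selection: walk the hashrings in file order; the
# first whose tenant list contains the tenant, or is empty, wins.
# Entries mirror the example hashring.json with the catch-all last.
HASHRING_NAMES=("tenant-a" "tenant-b" "default")
HASHRING_TENANTS=("tenant-a" "tenant-b" "")   # space-separated; "" = any

select_hashring() {
  local tenant="$1" i t
  for i in "${!HASHRING_NAMES[@]}"; do
    # An empty tenant list is a catch-all and matches every tenant.
    if [[ -z "${HASHRING_TENANTS[$i]}" ]]; then
      echo "${HASHRING_NAMES[$i]}"
      return 0
    fi
    for t in ${HASHRING_TENANTS[$i]}; do
      if [[ "$t" == "$tenant" ]]; then
        echo "${HASHRING_NAMES[$i]}"
        return 0
      fi
    done
  done
  echo "no matching hashring for tenant ${tenant}" >&2
  return 1
}
```

select_hashring tenant-a prints tenant-a, while select_hashring team-x prints default. Moving the default entry to the top would make every tenant resolve to default.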
Verify your setup
Check that all receiver instances are running and healthy.
sudo systemctl status thanos-receiver-01 thanos-receiver-02 thanos-receiver-03
thanos --version
Verify that each receiver reports ready on its HTTP port.
curl -s http://localhost:10902/-/ready
curl -s http://localhost:10904/-/ready
curl -s http://localhost:10906/-/ready
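Confirm that the receivers loaded the hashring file and are healthy by looking for hashring messages in the service log and hashring metrics on the HTTP port.
sudo journalctl -u thanos-receiver-01 --no-pager | grep -i hashring
curl -s http://localhost:10902/metrics | grep thanos_receive_hashring
curl -s http://localhost:10902/-/healthy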
Verify HAProxy load balancer status and backend health.
curl -s http://localhost:8404/stats
sudo systemctl status haproxy
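Send an empty request through the load balancer to confirm that it reaches a receiver. The empty body is not a valid remote write payload, so a receiver typically answers 400; a 503 from HAProxy or a connection error means routing is broken. Confirm real ingestion by checking that Prometheus logs no remote write errors.
curl -s -o /dev/null -w '%{http_code}\n' -X POST http://localhost:19290/api/v1/receive \
-H "Content-Type: application/x-protobuf" \
-H "Content-Encoding: snappy" \
-H "THANOS-TENANT: default" \
--data-binary @/dev/null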
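An empty remote write request is rejected by whichever receiver gets it, so the useful signal is the status code and who produced it. A minimal sketch that maps curl's %{http_code} output to a likely cause; classify_probe is a hypothetical helper, and the mapping is a rule of thumb (curl reports 000 when no HTTP response arrived, HAProxy answers 503 when no backend server is available, and a 4xx normally comes from the receiver).

```bash
#!/usr/bin/env bash
# Interpret the status code from a probe such as:
#   code=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
#     -H 'THANOS-TENANT: default' --data-binary @/dev/null \
#     http://localhost:19290/api/v1/receive)
# Returns 0 when a receiver appears to have answered, 1 otherwise.
classify_probe() {
  local code="$1"
  case "$code" in
    2??) echo "receiver accepted the request" ;;
    4??) echo "receiver reached; it rejected the (empty) payload" ;;
    000) echo "no response: HAProxy not listening or unreachable"; return 1 ;;
    502|504) echo "HAProxy reached, but the backend connection failed or timed out"; return 1 ;;
    503) echo "no healthy backend in HAProxy, or a receiver reported itself unavailable"; return 1 ;;
    *) echo "unexpected status ${code}"; return 1 ;;
  esac
}
```

For example: classify_probe "$code" after running the curl command shown in the comment.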
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Receiver won't start | Port already in use | Check with sudo ss -tlnp \| grep :10901 and choose a different port |
| Hashring errors in logs | Invalid JSON syntax | Validate JSON with jq . /etc/thanos/hashring.json |
| Object storage connection fails | Incorrect credentials | Verify bucket config and test with thanos tools bucket ls --objstore.config-file=/etc/thanos/bucket.yml |
| Prometheus remote write fails | Wrong tenant header or URL | Check Prometheus logs and verify THANOS-TENANT header matches hashring config |
| Forwarding or replication errors between receivers | Firewall blocking gRPC ports | Verify ports 10901, 10903, 10905 are open between receivers |
| Data not replicated properly | Replication factor mismatch | Ensure all receivers use same --receive.replication-factor value |
| HAProxy backend servers down | Health check sent to an endpoint that does not answer GET | Point the HAProxy health check at /-/ready on each receiver's HTTP port |
Next steps
- Configure Thanos Ruler for distributed alerting across multiple Prometheus clusters
- Set up Thanos Receiver for remote write scalability with Prometheus integration
- Implement Thanos multi-cluster federation for global Prometheus metrics aggregation
- Configure Prometheus long-term storage with Thanos for unlimited data retention
- Monitor Thanos components with Prometheus and Grafana dashboards
Automated install script
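This script automates the receiver part of the setup: it installs Thanos and creates the user, directories, object storage configuration, a default-only hashring, and the systemd units. Configure the firewall, HAProxy, and Prometheus separately as described above.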
#!/usr/bin/env bash
set -euo pipefail
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
# Configuration
THANOS_VERSION="0.34.1"
RECEIVER_COUNT=3
# Usage function
usage() {
echo "Usage: $0 [OPTIONS]"
echo "Options:"
echo " -d, --domain DOMAIN Domain for receiver endpoints (default: example.com)"
echo " -b, --bucket BUCKET S3 bucket name (default: thanos-metrics)"
echo " -e, --endpoint ENDPOINT S3 endpoint (default: minio.example.com:9000)"
echo " -k, --access-key KEY S3 access key (required)"
echo " -s, --secret-key KEY S3 secret key (required)"
echo " -h, --help Show this help message"
exit 1
}
# Parse arguments
DOMAIN="example.com"
BUCKET="thanos-metrics"
ENDPOINT="minio.example.com:9000"
ACCESS_KEY=""
SECRET_KEY=""
while [[ $# -gt 0 ]]; do
case $1 in
-d|--domain) DOMAIN="$2"; shift 2 ;;
-b|--bucket) BUCKET="$2"; shift 2 ;;
-e|--endpoint) ENDPOINT="$2"; shift 2 ;;
-k|--access-key) ACCESS_KEY="$2"; shift 2 ;;
-s|--secret-key) SECRET_KEY="$2"; shift 2 ;;
-h|--help) usage ;;
*) echo "Unknown option: $1"; usage ;;
esac
done
if [[ -z "$ACCESS_KEY" || -z "$SECRET_KEY" ]]; then
echo -e "${RED}Error: S3 access key and secret key are required${NC}"
usage
fi
# Cleanup function
cleanup() {
echo -e "${YELLOW}Installation failed. Cleaning up...${NC}"
for unit in /etc/systemd/system/thanos-receiver-*.service; do
[[ -e "$unit" ]] || continue
systemctl disable --now "$(basename "$unit")" 2>/dev/null || true
done
rm -f /etc/systemd/system/thanos-receiver-*.service
systemctl daemon-reload
}
trap cleanup ERR
# Check if running as root
if [[ $EUID -ne 0 ]]; then
echo -e "${RED}This script must be run as root${NC}"
exit 1
fi
# Detect distribution
echo -e "${YELLOW}[1/9] Detecting distribution...${NC}"
if [ -f /etc/os-release ]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_UPDATE="apt update && apt upgrade -y"
PKG_INSTALL="apt install -y"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_UPDATE="dnf update -y"
PKG_INSTALL="dnf install -y"
;;
amzn)
PKG_MGR="yum"
PKG_UPDATE="yum update -y"
PKG_INSTALL="yum install -y"
;;
*)
echo -e "${RED}Unsupported distribution: $ID${NC}"
exit 1
;;
esac
else
echo -e "${RED}Cannot detect distribution${NC}"
exit 1
fi
echo -e "${GREEN}Detected: $PRETTY_NAME${NC}"
# Update system
echo -e "${YELLOW}[2/9] Updating system packages...${NC}"
# eval is needed because the apt variant chains two commands with &&
eval "$PKG_UPDATE"
# Install required packages
echo -e "${YELLOW}[3/9] Installing required packages...${NC}"
$PKG_INSTALL wget tar
# Download and install Thanos
echo -e "${YELLOW}[4/9] Downloading and installing Thanos ${THANOS_VERSION}...${NC}"
cd /tmp
wget -q "https://github.com/thanos-io/thanos/releases/download/v${THANOS_VERSION}/thanos-${THANOS_VERSION}.linux-amd64.tar.gz"
tar -xzf "thanos-${THANOS_VERSION}.linux-amd64.tar.gz"
mv "thanos-${THANOS_VERSION}.linux-amd64/thanos" /usr/local/bin/
chmod 755 /usr/local/bin/thanos
rm -rf "thanos-${THANOS_VERSION}"*
# Create user and directories
echo -e "${YELLOW}[5/9] Creating Thanos user and directories...${NC}"
useradd --system --no-create-home --shell /bin/false thanos 2>/dev/null || true
mkdir -p /etc/thanos /var/lib/thanos/receive /var/log/thanos
for i in $(seq 1 $RECEIVER_COUNT); do
mkdir -p "/var/lib/thanos/receive/0$i"
done
chown -R thanos:thanos /var/lib/thanos /var/log/thanos /etc/thanos
chmod 755 /etc/thanos /var/lib/thanos /var/log/thanos
find /var/lib/thanos -type d -exec chmod 755 {} \;
# Configure object storage
echo -e "${YELLOW}[6/9] Creating object storage configuration...${NC}"
cat > /etc/thanos/bucket.yml <<EOF
type: s3
config:
  bucket: "${BUCKET}"
  endpoint: "${ENDPOINT}"
  access_key: "${ACCESS_KEY}"
  secret_key: "${SECRET_KEY}"
  insecure: false
  signature_version2: false
EOF
chown thanos:thanos /etc/thanos/bucket.yml
chmod 600 /etc/thanos/bucket.yml
# Create hashring configuration
echo -e "${YELLOW}[7/9] Creating hashring configuration...${NC}"
cat > /etc/thanos/hashring.json <<EOF
[
{
"hashring": "default",
"tenants": [],
"endpoints": [
"receiver-01.${DOMAIN}:10901",
"receiver-02.${DOMAIN}:10903",
"receiver-03.${DOMAIN}:10905"
]
}
]
EOF
chown thanos:thanos /etc/thanos/hashring.json
chmod 644 /etc/thanos/hashring.json
# Create systemd services
echo -e "${YELLOW}[8/9] Creating systemd service files...${NC}"
for i in $(seq 1 $RECEIVER_COUNT); do
# gRPC 10901/10903/10905 and HTTP 10902/10904/10906, matching hashring.json
GRPC_PORT=$((10899 + 2 * i))
HTTP_PORT=$((10900 + 2 * i))
REMOTE_WRITE_PORT=$((19290 + i))
cat > "/etc/systemd/system/thanos-receiver-0$i.service" <<EOF
[Unit]
Description=Thanos Receiver 0$i
After=network.target
[Service]
Type=simple
User=thanos
Group=thanos
ExecStart=/usr/local/bin/thanos receive \\
--grpc-address=0.0.0.0:${GRPC_PORT} \\
--http-address=0.0.0.0:${HTTP_PORT} \\
--remote-write.address=0.0.0.0:${REMOTE_WRITE_PORT} \\
--objstore.config-file=/etc/thanos/bucket.yml \\
--tsdb.path=/var/lib/thanos/receive/0$i \\
--label=receive_replica=\"0$i\" \\
--label=receive_cluster=\"production\" \\
--receive.hashrings-file=/etc/thanos/hashring.json \\
--receive.local-endpoint=receiver-0$i.${DOMAIN}:${GRPC_PORT} \\
--receive.replication-factor=2 \\
--log.level=info \\
--log.format=logfmt
Restart=always
RestartSec=3
LimitNOFILE=65536
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
EOF
chmod 644 "/etc/systemd/system/thanos-receiver-0$i.service"
done
# Configure SELinux if present
if command -v getenforce >/dev/null 2>&1 && [[ "$(getenforce)" != "Disabled" ]]; then
echo -e "${YELLOW}Configuring SELinux policies...${NC}"
setsebool -P httpd_can_network_connect 1 2>/dev/null || true
fi
# Start and enable services
echo -e "${YELLOW}[9/9] Starting and enabling Thanos receiver services...${NC}"
systemctl daemon-reload
for i in $(seq 1 $RECEIVER_COUNT); do
systemctl enable "thanos-receiver-0$i.service"
systemctl start "thanos-receiver-0$i.service"
done
# Verification
echo -e "${YELLOW}Verifying installation...${NC}"
sleep 5
FAILED=0
for i in $(seq 1 $RECEIVER_COUNT); do
if ! systemctl is-active --quiet "thanos-receiver-0$i.service"; then
echo -e "${RED}thanos-receiver-0$i service is not running${NC}"
FAILED=1
else
echo -e "${GREEN}thanos-receiver-0$i service is running${NC}"
fi
done
if [[ $FAILED -eq 1 ]]; then
echo -e "${RED}Some services failed to start. Check logs with: journalctl -u thanos-receiver-XX${NC}"
exit 1
fi
echo -e "${GREEN}Thanos Receiver cluster installation completed successfully!${NC}"
echo -e "${YELLOW}Configuration details:${NC}"
echo " - Receiver instances: $RECEIVER_COUNT"
echo " - Domain: $DOMAIN"
echo " - Config directory: /etc/thanos"
echo " - Data directory: /var/lib/thanos"
echo " - Log directory: /var/log/thanos"
echo ""
echo -e "${YELLOW}Service management:${NC}"
for i in $(seq 1 $RECEIVER_COUNT); do
HTTP_PORT=$((10900 + 2 * i))
REMOTE_WRITE_PORT=$((19290 + i))
echo " - thanos-receiver-0$i: HTTP :${HTTP_PORT}, Remote Write :${REMOTE_WRITE_PORT}"
done
Review the script before running it. Save it as install.sh and run it as root with the S3 credentials: sudo bash install.sh -k ACCESS_KEY -s SECRET_KEY