Configure Thanos Receiver clustering for high availability and load distribution

Advanced 45 min Apr 16, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up Thanos Receiver clustering with hashring configuration to distribute Prometheus remote write traffic across multiple replicas for high availability and scalability.

Prerequisites

  • Root or sudo access
  • At least 8GB RAM per receiver instance
  • Network connectivity between receiver nodes
  • Object storage (MinIO, S3, or compatible)
  • HAProxy for load balancing

What this solves

Thanos Receiver clustering scales Prometheus remote write ingestion horizontally by distributing time series across multiple receiver instances. The configuration provides high availability and load distribution, and it reduces the risk of data loss during receiver outages. You need this when you ingest high-volume metrics from multiple Prometheus instances or when a single receiver instance becomes a bottleneck.

Step-by-step configuration

Update system packages

Start by updating your package manager to ensure you have the latest security patches.

# Ubuntu / Debian
sudo apt update && sudo apt upgrade -y
# AlmaLinux / Rocky Linux
sudo dnf update -y

Install Thanos binary

Download and install the Thanos binary from the official GitHub releases. The single binary provides all Thanos components, including the receiver.

wget https://github.com/thanos-io/thanos/releases/download/v0.34.1/thanos-0.34.1.linux-amd64.tar.gz
tar -xzf thanos-0.34.1.linux-amd64.tar.gz
sudo mv thanos-0.34.1.linux-amd64/thanos /usr/local/bin/
sudo chmod +x /usr/local/bin/thanos
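If some of your receiver hosts are not x86_64, the download URL has to change with the architecture. The sketch below builds the URL from a version and architecture, following the pattern of the wget command above. The function names are hypothetical, and the arm64 file name is assumed to follow the same scheme as amd64.

```shell
# Build the release tarball URL for a Thanos version and CPU architecture.
# The pattern is copied from the wget command in this step.
thanos_release_url() {
  version="$1"
  arch="$2"
  printf 'https://github.com/thanos-io/thanos/releases/download/v%s/thanos-%s.linux-%s.tar.gz\n' \
    "$version" "$version" "$arch"
}

# Map `uname -m` output to the architecture name used in release file names.
thanos_arch() {
  case "$1" in
    x86_64) echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}
```

For example, wget "$(thanos_release_url 0.34.1 "$(thanos_arch "$(uname -m)")")" downloads the tarball that matches the current host.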

Create Thanos user and directories

Create a dedicated system user for Thanos and establish the required directory structure with proper ownership.

sudo useradd --system --no-create-home --shell /bin/false thanos
sudo mkdir -p /etc/thanos /var/lib/thanos/receive /var/log/thanos
sudo chown -R thanos:thanos /var/lib/thanos /var/log/thanos
sudo chown thanos:thanos /etc/thanos

Configure object storage

Create /etc/thanos/bucket.yml, the object storage configuration that tells Thanos where to store received data. This example uses MinIO, but any S3-compatible storage works.

type: s3
config:
  bucket: "thanos-metrics"
  endpoint: "minio.example.com:9000"
  access_key: "minio-access-key"
  secret_key: "minio-secret-key"
  insecure: false
  signature_version2: false
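To keep real credentials out of shell history and version control, you can generate the file from environment variables. This is a minimal sketch: the function name and the MINIO_ACCESS_KEY and MINIO_SECRET_KEY variable names are hypothetical, and the bucket and endpoint values are the placeholders from the example above.

```shell
# Write the objstore config to the path given as $1, taking the credentials
# from the (hypothetical) MINIO_ACCESS_KEY and MINIO_SECRET_KEY variables.
write_bucket_config() {
  target="$1"
  if [ -z "${MINIO_ACCESS_KEY:-}" ] || [ -z "${MINIO_SECRET_KEY:-}" ]; then
    echo "MINIO_ACCESS_KEY and MINIO_SECRET_KEY must be set" >&2
    return 1
  fi
  (
    umask 077  # a newly created file is readable by its owner only
    {
      printf '%s\n' 'type: s3' 'config:'
      printf '  bucket: "thanos-metrics"\n'
      printf '  endpoint: "minio.example.com:9000"\n'
      printf '  access_key: "%s"\n' "$MINIO_ACCESS_KEY"
      printf '  secret_key: "%s"\n' "$MINIO_SECRET_KEY"
      printf '  insecure: false\n'
      printf '  signature_version2: false\n'
    } > "$target"
  )
}
```

Call it as write_bucket_config /etc/thanos/bucket.yml from a shell that can write to /etc/thanos.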

Create hashring configuration

Create /etc/thanos/hashring.json, which defines how series are distributed across receiver instances. Each endpoint must match the --receive.local-endpoint value of one receiver exactly, including the port. Hashrings are matched in order and an empty tenants list matches every tenant, so the catch-all "default" hashring goes last.

[
  {
    "hashring": "tenant-a",
    "tenants": ["tenant-a"],
    "endpoints": [
      "receiver-01.example.com:10901",
      "receiver-02.example.com:10903"
    ]
  },
  {
    "hashring": "tenant-b",
    "tenants": ["tenant-b"],
    "endpoints": [
      "receiver-02.example.com:10903",
      "receiver-03.example.com:10905"
    ]
  },
  {
    "hashring": "default",
    "tenants": [],
    "endpoints": [
      "receiver-01.example.com:10901",
      "receiver-02.example.com:10903",
      "receiver-03.example.com:10905"
    ]
  }
]
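The order of hashrings in this file matters. According to the Thanos Receive documentation, a tenant is routed to the first hashring whose tenants list contains it, and a hashring with an empty list accepts any tenant. The sketch below imitates that selection rule in plain shell. The function name and the "name=tenant1,tenant2" argument format are hypothetical and chosen only for illustration; this is not how Thanos parses the file.

```shell
# Print the name of the first hashring that accepts a tenant.
# Arguments: the tenant, then hashrings in file order as "name=tenant1,tenant2".
# An empty tenant list ("name=") accepts every tenant.
select_hashring() {
  tenant="$1"
  shift
  for ring in "$@"; do
    name="${ring%%=*}"
    tenants="${ring#*=}"
    if [ -z "$tenants" ]; then
      echo "$name"
      return 0
    fi
    case ",$tenants," in
      *",$tenant,"*)
        echo "$name"
        return 0
        ;;
    esac
  done
  return 1  # no hashring accepts this tenant
}
```

With the catch-all first, select_hashring tenant-a default= tenant-a=tenant-a prints default, so the dedicated hashring is never reached. With the catch-all last, select_hashring tenant-a tenant-a=tenant-a default= prints tenant-a.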

Configure first receiver instance

Create /etc/systemd/system/thanos-receiver-01.service, the systemd unit for the first Thanos receiver instance, with clustering enabled.

[Unit]
Description=Thanos Receiver 01
After=network.target

[Service]
Type=simple
User=thanos
Group=thanos
ExecStart=/usr/local/bin/thanos receive \
  --grpc-address=0.0.0.0:10901 \
  --http-address=0.0.0.0:10902 \
  --remote-write.address=0.0.0.0:19291 \
  --objstore.config-file=/etc/thanos/bucket.yml \
  --tsdb.path=/var/lib/thanos/receive/01 \
  --label=receive_replica=\"01\" \
  --label=receive_cluster=\"production\" \
  --receive.hashrings-file=/etc/thanos/hashring.json \
  --receive.local-endpoint=receiver-01.example.com:10901 \
  --receive.replication-factor=2 \
  --log.level=info \
  --log.format=logfmt
Restart=always
RestartSec=3
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
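The value of --receive.replication-factor determines how many receiver failures a write can survive. The sketch below assumes the quorum rule described in the Thanos Receive documentation, where a write is acknowledged once floor(factor / 2) + 1 replicas have accepted it. The function names are hypothetical.

```shell
# Replica writes that must succeed before a request is acknowledged,
# assuming the rule floor(replication_factor / 2) + 1.
write_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Replicas that may be unavailable while writes still reach quorum.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}
```

Under that assumption, a factor of 2 needs both replicas (tolerated_failures 2 prints 0), while a factor of 3 still succeeds with one replica down (tolerated_failures 3 prints 1). A hashring needs at least as many endpoints as the replication factor you choose.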

Configure second receiver instance

Create /etc/systemd/system/thanos-receiver-02.service, the systemd unit for the second receiver instance, with its own ports and data directory.

[Unit]
Description=Thanos Receiver 02
After=network.target

[Service]
Type=simple
User=thanos
Group=thanos
ExecStart=/usr/local/bin/thanos receive \
  --grpc-address=0.0.0.0:10903 \
  --http-address=0.0.0.0:10904 \
  --remote-write.address=0.0.0.0:19292 \
  --objstore.config-file=/etc/thanos/bucket.yml \
  --tsdb.path=/var/lib/thanos/receive/02 \
  --label=receive_replica=\"02\" \
  --label=receive_cluster=\"production\" \
  --receive.hashrings-file=/etc/thanos/hashring.json \
  --receive.local-endpoint=receiver-02.example.com:10903 \
  --receive.replication-factor=2 \
  --log.level=info \
  --log.format=logfmt
Restart=always
RestartSec=3
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

Configure third receiver instance

Create /etc/systemd/system/thanos-receiver-03.service, the systemd unit for the third receiver instance, to complete the high availability cluster.

[Unit]
Description=Thanos Receiver 03
After=network.target

[Service]
Type=simple
User=thanos
Group=thanos
ExecStart=/usr/local/bin/thanos receive \
  --grpc-address=0.0.0.0:10905 \
  --http-address=0.0.0.0:10906 \
  --remote-write.address=0.0.0.0:19293 \
  --objstore.config-file=/etc/thanos/bucket.yml \
  --tsdb.path=/var/lib/thanos/receive/03 \
  --label=receive_replica=\"03\" \
  --label=receive_cluster=\"production\" \
  --receive.hashrings-file=/etc/thanos/hashring.json \
  --receive.local-endpoint=receiver-03.example.com:10905 \
  --receive.replication-factor=2 \
  --log.level=info \
  --log.format=logfmt
Restart=always
RestartSec=3
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
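The three unit files differ only in their instance number, ports, and data directory. The port numbering follows a simple pattern, sketched below with a hypothetical helper. It can be handy when you template further instances or cross-check the firewall and load balancer settings.

```shell
# Print "grpc http remote_write" ports for receiver number N (1-based),
# following the numbering used in the three unit files above.
receiver_ports() {
  n="$1"
  grpc=$(( 10901 + 2 * (n - 1) ))
  http=$(( grpc + 1 ))
  remote_write=$(( 19290 + n ))
  echo "$grpc $http $remote_write"
}
```

For instance, receiver_ports 2 prints the gRPC, HTTP, and remote write ports of the second unit.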

Create additional data directories

Create the remaining data directories for receiver instances 02 and 03 with proper ownership.

sudo mkdir -p /var/lib/thanos/receive/02 /var/lib/thanos/receive/03
sudo chown -R thanos:thanos /var/lib/thanos/receive

Configure firewall rules

Open the necessary ports for Thanos receiver clustering communication and remote write ingestion.

# Ubuntu / Debian (ufw)
sudo ufw allow 10901:10906/tcp
sudo ufw allow 19291:19293/tcp
sudo ufw reload
# AlmaLinux / Rocky Linux (firewalld)
sudo firewall-cmd --permanent --add-port=10901-10906/tcp
sudo firewall-cmd --permanent --add-port=19291-19293/tcp
sudo firewall-cmd --reload

Set proper file permissions

Ensure all configuration files have the correct permissions for security and functionality.

sudo chmod 640 /etc/thanos/bucket.yml /etc/thanos/hashring.json
sudo chown thanos:thanos /etc/thanos/bucket.yml /etc/thanos/hashring.json

Warning: Never use chmod 777. It gives every user on the system full access to your files. Instead, use minimal permissions such as 640 for configuration files, and set proper ownership with chown.

Enable and start receiver services

Enable and start all three Thanos receiver instances to form the cluster.

sudo systemctl daemon-reload
sudo systemctl enable thanos-receiver-01 thanos-receiver-02 thanos-receiver-03
sudo systemctl start thanos-receiver-01 thanos-receiver-02 thanos-receiver-03
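Receivers may take a moment to replay their local TSDB before they accept traffic, so scripted setups usually wait for readiness before continuing. The helper below is a generic retry loop with a hypothetical name. Polling the /-/ready path on the HTTP port is one way to use it.

```shell
# Retry a command until it succeeds.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$(( i + 1 ))
  done
  return 1  # the command never succeeded
}
```

For example, wait_until 30 2 curl -fsS http://localhost:10902/-/ready checks the first receiver up to 30 times, two seconds apart, and returns a non-zero status if it never becomes ready.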

Configure load balancer

Set up HAProxy in /etc/haproxy/haproxy.cfg to distribute Prometheus remote write traffic across the receiver instances. This gives Prometheus instances a single endpoint to write to.

global
    daemon
    log stdout local0
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option log-health-checks
    option forwardfor
    option http-server-close
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend thanos_receive_frontend
    bind *:19290
    default_backend thanos_receive_backend

backend thanos_receive_backend
    balance roundrobin
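    # Probe readiness on each receiver's HTTP port; the remote write path only accepts POST
    option httpchk GET /-/ready
    server receiver-01 127.0.0.1:19291 check port 10902
    server receiver-02 127.0.0.1:19292 check port 10904
    server receiver-03 127.0.0.1:19293 check port 10906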

frontend stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
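A mismatch between the backend addresses here and the --remote-write.address ports in the unit files is easy to miss. The hypothetical helper below lists the server entries of an HAProxy configuration file so you can compare them at a glance.

```shell
# Print the name and address of every "server" line in an HAProxy config file.
list_backend_servers() {
  awk '$1 == "server" { print $2, $3 }' "$1"
}
```

Run it as list_backend_servers /etc/haproxy/haproxy.cfg. HAProxy can also check the file syntax itself with haproxy -c -f /etc/haproxy/haproxy.cfg before you restart the service.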

Configure Prometheus remote write

Update your Prometheus configuration (typically /etc/prometheus/prometheus.yml) to send metrics to the load-balanced Thanos receiver cluster.

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'production'
    replica: 'prometheus-01'

remote_write:
  - url: http://thanos-lb.example.com:19290/api/v1/receive
    queue_config:
      max_samples_per_send: 10000
      batch_send_deadline: 5s
      min_shards: 4
      max_shards: 200
      capacity: 10000
    metadata_config:
      send: true
      send_interval: 30s
    headers:
      THANOS-TENANT: "default"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
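The queue_config values above bound how many samples Prometheus may hold in memory for this remote write target. Since capacity applies per shard, a rough upper bound is the shard count multiplied by capacity. The helper name below is hypothetical, and the result is an estimate of buffered samples, not of bytes.

```shell
# Rough upper bound on samples buffered for one remote write target:
# number of shards multiplied by the per-shard capacity.
max_buffered_samples() {
  echo $(( $1 * $2 ))
}
```

With the values above, max_buffered_samples 200 10000 gives the bound at max_shards, and max_buffered_samples 4 10000 gives it at min_shards. The Prometheus remote write tuning guide suggests a capacity of roughly 3 to 10 times max_samples_per_send, so consider lowering max_samples_per_send if memory is tight.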

Configure tenant-specific routing

Set up tenant-specific Prometheus configurations for multi-tenant scenarios with isolated hashrings.

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'production'
    replica: 'prometheus-tenant-a'
    tenant: 'tenant-a'

remote_write:
  - url: http://thanos-lb.example.com:19290/api/v1/receive
    headers:
      THANOS-TENANT: "tenant-a"
    queue_config:
      max_samples_per_send: 10000
      batch_send_deadline: 5s

scrape_configs:
  - job_name: 'tenant-a-services'
    static_configs:
      - targets: ['app-a-01:8080', 'app-a-02:8080']

Verify your setup

Check that all receiver instances are running and healthy.

sudo systemctl status thanos-receiver-01 thanos-receiver-02 thanos-receiver-03
thanos --version

Verify that each receiver reports ready on its HTTP port. The /api/v1/receive path is served on the remote write port and only accepts POST requests, so it is not suitable as a health check.

curl -s http://localhost:10902/-/ready
curl -s http://localhost:10904/-/ready
curl -s http://localhost:10906/-/ready

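Check the hashring state and liveness of the first receiver. The grep pattern assumes that your Thanos version exposes hashring metrics with the thanos_receive_hashring prefix; adjust it if the names differ.

curl -s http://localhost:10902/metrics | grep thanos_receive_hashring
curl -s http://localhost:10902/-/healthy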
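A receiver only takes part in a hashring if its --receive.local-endpoint value appears in the hashring file exactly, port included. The hypothetical helper below checks this with a fixed-string search for the quoted endpoint.

```shell
# Succeed if the exact endpoint string is listed in the hashring file.
endpoint_in_hashring() {
  file="$1"
  endpoint="$2"
  grep -qF "\"$endpoint\"" "$file"
}
```

For example, endpoint_in_hashring /etc/thanos/hashring.json receiver-02.example.com:10903 returns a non-zero status if the second receiver's endpoint is missing or listed with a different port.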

Verify HAProxy load balancer status and backend health.

curl -s http://localhost:8404/stats
sudo systemctl status haproxy

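Test that the load balancer forwards remote write requests to a receiver. The empty payload below is not a valid write request, so expect an HTTP error status such as 400 from the receiver. A connection failure or a 503 from HAProxy points to a load balancer or receiver problem.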

curl -X POST http://localhost:19290/api/v1/receive \
  -H "Content-Type: application/x-protobuf" \
  -H "Content-Encoding: snappy" \
  -H "THANOS-TENANT: default" \
  --data-binary @/dev/null

Common issues

Symptom: Receiver won't start
Cause: Port already in use
Fix: Check with sudo ss -tulpn | grep :10901 and use different ports

Symptom: Hashring errors in logs
Cause: Invalid JSON syntax
Fix: Validate the JSON with jq . /etc/thanos/hashring.json

Symptom: Object storage connection fails
Cause: Incorrect credentials
Fix: Verify the bucket config and test with thanos tools bucket ls --objstore.config-file=/etc/thanos/bucket.yml

Symptom: Prometheus remote write fails
Cause: Wrong tenant header or URL
Fix: Check the Prometheus logs and verify that the THANOS-TENANT header matches the hashring config

Symptom: Receivers cannot reach each other
Cause: Firewall blocking gRPC ports
Fix: Verify that ports 10901, 10903, and 10905 are open between receivers

Symptom: Data not replicated properly
Cause: Replication factor mismatch
Fix: Ensure all receivers use the same --receive.replication-factor value

Symptom: HAProxy backend servers down
Cause: Health check endpoint unavailable
Fix: Check the receiver health endpoints and adjust the HAProxy health check URL
