Configure Elasticsearch 8 snapshot and restore policies with automated backup strategies

Intermediate 45 min Jun 03, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Set up comprehensive Elasticsearch 8 backup strategies with snapshot lifecycle management (SLM), filesystem and S3 repository backends, automated scheduling, and recovery procedures for production environments.

Prerequisites

  • Elasticsearch 8.x cluster running
  • Root or sudo access
  • Basic understanding of Elasticsearch concepts
  • S3 bucket for cloud backups (optional)

What this solves

Without backups, a failed disk or an accidental delete can permanently destroy your search indices, logs, and analytics data. This tutorial sets up automated snapshot policies with filesystem and S3 backends, configures retention rules, and provides recovery procedures to protect your cluster data.

Step-by-step configuration

Verify Elasticsearch cluster health

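Check that your Elasticsearch cluster is running and accessible before configuring snapshots. The commands in this tutorial assume an unauthenticated HTTP endpoint on localhost:9200. On a default 8.x install with security enabled, use https:// and pass credentials and the CA certificate, for example with curl's -u and --cacert options.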

curl -X GET "localhost:9200/_cluster/health?pretty"
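The health response carries a status field that is green, yellow, or red. As a minimal sketch, assuming jq is installed and the endpoint is reachable without authentication, the status can be turned into a go/no-go decision before any snapshot configuration. The function names classify_health and cluster_status are illustrative, not part of Elasticsearch.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative, not part of Elasticsearch.

# Map a cluster health status to a label and exit code (0 = proceed, 2 = stop).
classify_health() {
  case "$1" in
    green)  echo "OK";       return 0 ;;
    yellow) echo "WARNING";  return 0 ;;  # replicas unassigned, primaries are all available
    red)    echo "CRITICAL"; return 2 ;;  # at least one primary shard is unassigned
    *)      echo "UNKNOWN";  return 2 ;;
  esac
}

# Fetch the status field from the health API (assumes jq is installed).
cluster_status() {
  curl -s "${ES_HOST:-localhost:9200}/_cluster/health" | jq -r '.status'
}
```

A typical call would be classify_health "$(cluster_status)". Yellow is treated as acceptable here because snapshots copy primary shards only.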

Create filesystem snapshot repository

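Configure a filesystem-based repository for local snapshots. First add the repository path to elasticsearch.yml (/etc/elasticsearch/elasticsearch.yml on package installs). On a multi-node cluster the path must be a shared filesystem, such as NFS, mounted at the same location on every master and data node.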

path.repo: ["/var/lib/elasticsearch/backups"]

Create the backup directory and set proper permissions.

sudo mkdir -p /var/lib/elasticsearch/backups
sudo chown elasticsearch:elasticsearch /var/lib/elasticsearch/backups
sudo chmod 750 /var/lib/elasticsearch/backups
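Before restarting, it can help to confirm that the directory exists and that the configuration file actually lists it. The sketch below is a hypothetical helper (check_repo_path is not an Elasticsearch tool). It does a plain text match rather than parsing YAML, so it assumes the single-line path.repo form shown above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-restart check; function name and arguments are illustrative.
# Usage: check_repo_path <config-file> <repository-dir>
check_repo_path() {
  local config="$1" dir="$2"
  if [ ! -d "$dir" ]; then
    echo "MISSING_DIR"
    return 1
  fi
  # Look for a path.repo line that mentions the directory (simple text match,
  # not a full YAML parse, so multi-line list syntax is not detected).
  if ! grep -E '^[[:space:]]*path\.repo:' "$config" 2>/dev/null | grep -qF -- "$dir"; then
    echo "NOT_IN_CONFIG"
    return 1
  fi
  echo "OK"
}
```

For the paths used in this tutorial the call would be check_repo_path /etc/elasticsearch/elasticsearch.yml /var/lib/elasticsearch/backups, run with sudo if the config file is not world-readable. The config file location is an assumption based on package installs.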

Restart Elasticsearch to apply the configuration changes.

sudo systemctl restart elasticsearch
sudo systemctl status elasticsearch

Register filesystem snapshot repository

Create the filesystem repository through the Elasticsearch API.

curl -X PUT "localhost:9200/_snapshot/fs_backup" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/var/lib/elasticsearch/backups",
    "compress": true,
    "max_snapshot_bytes_per_sec": "50mb",
    "max_restore_bytes_per_sec": "50mb"
  }
}'

Configure S3 snapshot repository

Elasticsearch 8.x ships S3 repository support as a built-in module, so no plugin installation is needed. The elasticsearch-plugin install repository-s3 step applies only to 7.x and earlier.

Add S3 credentials to the Elasticsearch keystore on every node. Each command prompts for the value.

sudo /usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.default.access_key
sudo /usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.default.secret_key

Reload the secure settings so the running nodes pick up the new credentials without a restart.

curl -X POST "localhost:9200/_nodes/reload_secure_settings"
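To confirm that both credentials are present, the output of elasticsearch-keystore list (one setting name per line) can be checked. The following is a small sketch with a hypothetical helper name, missing_s3_keys, that reads that listing on standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `elasticsearch-keystore list` output on stdin and
# prints any S3 client setting that is absent (prints nothing when both exist).
missing_s3_keys() {
  local entries key
  entries="$(cat)"
  for key in s3.client.default.access_key s3.client.default.secret_key; do
    if ! printf '%s\n' "$entries" | grep -qxF -- "$key"; then
      echo "$key"
    fi
  done
}
```

It would typically be used as sudo /usr/share/elasticsearch/bin/elasticsearch-keystore list | missing_s3_keys, where empty output means both keys are stored.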

Register S3 snapshot repository

Create the S3 repository with your bucket configuration.

curl -X PUT "localhost:9200/_snapshot/s3_backup" -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "elasticsearch-backups",
    "region": "us-east-1",
    "base_path": "snapshots",
    "compress": true,
    "server_side_encryption": true,
    "max_snapshot_bytes_per_sec": "100mb",
    "max_restore_bytes_per_sec": "100mb"
  }
}'

Create snapshot lifecycle management policy

Define an automated policy for daily snapshots with retention rules.

curl -X PUT "localhost:9200/_slm/policy/daily_snapshots" -H 'Content-Type: application/json' -d'
{
  "schedule": "0 0 2 * * ?",
  "name": "<daily-snap-{now/d}>",
  "repository": "fs_backup",
  "config": {
    "indices": ["*"],
    "ignore_unavailable": false,
    "include_global_state": true
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}'
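SLM schedules use Elasticsearch's cron syntax, which starts with a seconds field and has six or seven fields in total (seconds, minutes, hours, day of month, month, day of week, optional year). A five-field system crontab expression is rejected. As a rough sanity check, and only a sketch with a hypothetical function name, the field count can be verified before submitting a policy.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check: counts whitespace-separated fields in a schedule.
# It does not validate the values inside each field.
slm_schedule_fields_ok() {
  local -a fields
  read -r -a fields <<< "$1"
  case "${#fields[@]}" in
    6|7) echo "valid field count";   return 0 ;;
    *)   echo "invalid field count"; return 1 ;;
  esac
}
```

For example, slm_schedule_fields_ok '0 0 2 * * ?' passes, while the crontab-style '0 2 * * *' does not.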

Create weekly S3 backup policy

Configure a weekly backup policy for long-term S3 storage.

curl -X PUT "localhost:9200/_slm/policy/weekly_s3_snapshots" -H 'Content-Type: application/json' -d'
{
  "schedule": "0 0 3 ? * SUN",
  "name": "<weekly-snap-{now/d}>",
  "repository": "s3_backup",
  "config": {
    "indices": ["*"],
    "ignore_unavailable": false,
    "include_global_state": true,
    "metadata": {
      "backup_type": "weekly",
      "environment": "production"
    }
  },
  "retention": {
    "expire_after": "365d",
    "min_count": 12,
    "max_count": 104
  }
}'
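The three retention settings interact: expire_after marks snapshots as eligible for deletion, min_count keeps that many snapshots even if they have expired, and max_count caps the total even if snapshots have not yet expired. A back-of-the-envelope estimate of how many snapshots a policy holds at steady state can be sketched as below. The function name is hypothetical, and the estimate assumes every scheduled run succeeds.

```shell
#!/usr/bin/env bash
# Hypothetical estimate: how many snapshots a policy holds at steady state.
# Arguments: <expire_after_days> <runs_per_week> <min_count> <max_count>
steady_state_snapshots() {
  local days="$1" per_week="$2" min="$3" max="$4"
  local n=$(( days * per_week / 7 ))
  if (( n < min )); then n="$min"; fi   # min_count keeps expired snapshots around
  if (( n > max )); then n="$max"; fi   # max_count forces deletion of the oldest
  echo "$n"
}
```

With the weekly policy above, steady_state_snapshots 365 1 12 104 gives 52, so expire_after is the limit that applies and max_count acts only as a safety cap.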

Start snapshot lifecycle management

SLM runs by default on a new cluster. If it has been stopped, for example during maintenance, start it so the automated policies execute.

curl -X POST "localhost:9200/_slm/start"

Create manual snapshot for testing

Take an immediate snapshot to test your repository configuration.

curl -X PUT "localhost:9200/_snapshot/fs_backup/test_snapshot?wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
  "indices": "*",
  "ignore_unavailable": true,
  "include_global_state": true,
  "metadata": {
    "taken_by": "manual_test",
    "taken_because": "configuration_verification"
  }
}'
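The response to a completed snapshot includes a state field. A minimal sketch for turning that state into a monitoring-style result follows. The function name is hypothetical, and the exit codes (0 OK, 1 warning, 2 critical) are a convention chosen for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper mapping a snapshot "state" value to a monitoring result.
classify_snapshot_state() {
  case "$1" in
    SUCCESS)     echo "OK: snapshot completed";                 return 0 ;;
    IN_PROGRESS) echo "OK: snapshot still running";             return 0 ;;
    PARTIAL)     echo "WARNING: some shards were not captured"; return 1 ;;
    FAILED)      echo "CRITICAL: snapshot failed";              return 2 ;;
    *)           echo "CRITICAL: unexpected state '$1'";        return 2 ;;
  esac
}
```

Assuming jq is installed, the state of the test snapshot can be fed in with classify_snapshot_state "$(curl -s "localhost:9200/_snapshot/fs_backup/test_snapshot" | jq -r '.snapshots[0].state')".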

Configure index-specific backup policy

Create targeted policies for critical indices with different retention requirements.

curl -X PUT "localhost:9200/_slm/policy/critical_indices_backup" -H 'Content-Type: application/json' -d'
{
  "schedule": "0 0 1,13 * * ?",
  "name": "<critical-snap-{now/d}>",
  "repository": "s3_backup",
  "config": {
    "indices": ["logs-*", "metrics-*", "security-*"],
    "ignore_unavailable": false,
    "include_global_state": false,
    "partial": false
  },
  "retention": {
    "expire_after": "90d",
    "min_count": 10,
    "max_count": 200
  }
}'

Set up snapshot restoration procedure

Create a script for automated snapshot restoration and save it as /usr/local/bin/elasticsearch-restore.sh. This helps during disaster recovery.

#!/bin/bash
# Elasticsearch snapshot restore script
set -e

SNAPSHOT_REPO="${1:-fs_backup}"
SNAPSHOT_NAME="${2}"
ES_HOST="${3:-localhost:9200}"

if [ -z "$SNAPSHOT_NAME" ]; then
  echo "Usage: $0 [repository] <snapshot_name> [elasticsearch_host]"
  exit 1
fi

echo "Checking snapshot status..."
curl -s "$ES_HOST/_snapshot/$SNAPSHOT_REPO/$SNAPSHOT_NAME" | jq '.snapshots[0].state'

# Closing with _all only works when action.destructive_requires_name is false;
# otherwise list the indices to close explicitly.
echo "Closing indices before restore..."
curl -X POST "$ES_HOST/_all/_close"

echo "Restoring snapshot: $SNAPSHOT_NAME from $SNAPSHOT_REPO"
curl -X POST "$ES_HOST/_snapshot/$SNAPSHOT_REPO/$SNAPSHOT_NAME/_restore" \
  -H 'Content-Type: application/json' -d'
{
  "ignore_unavailable": true,
  "include_global_state": true,
  "include_aliases": true
}'

echo "Restoration initiated. Monitor progress with:"
echo "curl -s '$ES_HOST/_recovery' | jq"

Make the script executable.

sudo chmod +x /usr/local/bin/elasticsearch-restore.sh

Configure monitoring and alerting

Set up monitoring for snapshot failures and policy execution. Save the following script as /usr/local/bin/check-elasticsearch-snapshots.sh.

#!/bin/bash
# Check snapshot policy execution and alert on failures

ES_HOST="${ES_HOST:-localhost:9200}"
MAX_AGE_HOURS=25

# Check if SLM is running
slm_status=$(curl -s "$ES_HOST/_slm/status" | jq -r '.operation_mode')
if [ "$slm_status" != "RUNNING" ]; then
  echo "CRITICAL: SLM is not running. Status: $slm_status"
  exit 2
fi

# Check for recent snapshots
last_snapshot=$(curl -s "$ES_HOST/_snapshot/_all/_all?sort=start_time&order=desc&size=1" | jq -r '.snapshots[0].start_time')
if [ -z "$last_snapshot" ] || [ "$last_snapshot" = "null" ]; then
  echo "WARNING: No snapshots found"
  exit 1
fi

last_timestamp=$(date -d "$last_snapshot" +%s)
current_timestamp=$(date +%s)
age_hours=$(( (current_timestamp - last_timestamp) / 3600 ))

if [ "$age_hours" -gt "$MAX_AGE_HOURS" ]; then
  echo "WARNING: Last snapshot is $age_hours hours old"
  exit 1
fi

echo "OK: Last snapshot taken $age_hours hours ago"
exit 0
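The age calculation in the monitoring script can be isolated into a function so it can be checked against known timestamps. This is a sketch under two assumptions: GNU date is available (as on the Linux distributions this tutorial targets), and start_time is a UTC ISO 8601 string such as 2024-01-07T03:00:00.000Z. The function name snapshot_age_hours is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: hours between a snapshot start_time and "now".
# Arguments: <start_time in UTC ISO 8601> [now as epoch seconds]
# Assumes GNU date.
snapshot_age_hours() {
  local ts="${1%%.*}"          # drop fractional seconds such as ".000Z"
  ts="${ts%Z}Z"                # make sure the UTC designator is present once
  local now="${2:-$(date +%s)}"
  local start
  start="$(date -u -d "$ts" +%s)" || return 1
  echo $(( (now - start) / 3600 ))
}
```

Passing the current time as an optional second argument keeps the function deterministic for testing. In the monitoring script it would be called with the start_time value alone.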

Make the monitoring script executable.

sudo chmod +x /usr/local/bin/check-elasticsearch-snapshots.sh

Set up automated monitoring with cron

Schedule regular checks for snapshot health and policy execution.

sudo crontab -e

Add these monitoring jobs to the crontab.

# Check snapshot health every 4 hours
0 */4 * * * /usr/local/bin/check-elasticsearch-snapshots.sh

# Weekly snapshot repository verification
0 4 * * 1 curl -X POST "localhost:9200/_snapshot/fs_backup/_verify"
0 4 * * 1 curl -X POST "localhost:9200/_snapshot/s3_backup/_verify"

Verify your setup

Check that your snapshot repositories and policies are configured correctly.

# Verify repository configuration
curl -s "localhost:9200/_snapshot" | jq

# Check SLM policies
curl -s "localhost:9200/_slm/policy" | jq

# View SLM status and stats
curl -s "localhost:9200/_slm/stats" | jq

# List all snapshots
curl -s "localhost:9200/_snapshot/_all/_all" | jq '.snapshots[] | {name: .snapshot, state, start_time, end_time}'

# Check last snapshot execution
curl -s "localhost:9200/_slm/policy/daily_snapshots" | jq '.daily_snapshots.last_success, .daily_snapshots.last_failure'

# Verify that every node can access the repositories
curl -X POST "localhost:9200/_snapshot/fs_backup/_verify"
curl -X POST "localhost:9200/_snapshot/s3_backup/_verify"

Snapshot restoration procedures

Restore specific indices

Restore only selected indices from a snapshot while keeping others running.

curl -X POST "localhost:9200/_snapshot/fs_backup/test_snapshot/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "logs-2024-01-*,metrics-app-*",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": "(.+)",
  "rename_replacement": "restored_$1",
  "include_aliases": false
}'

Restore with index renaming

Restore indices with new names to avoid conflicts with existing data.

curl -X POST "localhost:9200/_snapshot/s3_backup/weekly-snap-2024-01-07/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "production-*",
  "rename_pattern": "production-(.+)",
  "rename_replacement": "backup-$1",
  "include_global_state": false
}'
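The effect of rename_pattern and rename_replacement can be previewed locally before running a restore. The sketch below uses sed's extended regular expressions, which behave like Elasticsearch's Java regular expressions for simple patterns such as the ones above. More complex patterns may differ. The function name preview_rename is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical preview of a restore rename using sed's extended regex.
# Arguments: <rename_pattern> <rename_replacement> <index-name>
preview_rename() {
  local pattern="$1" replacement index="$3"
  # Convert Java-style group references ($1) to sed-style (\1).
  replacement="$(printf '%s' "$2" | sed -E 's/\$([0-9])/\\\1/g')"
  printf '%s\n' "$index" | sed -E "s/${pattern}/${replacement}/g"
}
```

For instance, preview_rename 'production-(.+)' 'backup-$1' 'production-orders' prints backup-orders, and an index name that does not match the pattern is printed unchanged.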

Monitor restoration progress

Track the restoration process and verify completion.

# Monitor recovery progress
curl -s "localhost:9200/_recovery" | jq '.[].shards[] | select(.stage != "DONE")'

# Check cluster health during restore
curl -s "localhost:9200/_cluster/health?level=indices" | jq

# View restoration stats
curl -s "localhost:9200/_stats/store,docs" | jq '.indices | to_entries[] | {index: .key, docs: .value.total.docs.count, size: .value.total.store.size_in_bytes}'

Common issues

Symptom | Cause | Fix
Path not allowed error | Repository path not in path.repo setting | Add path to elasticsearch.yml and restart
S3 authentication failed | Invalid AWS credentials | Update keystore with correct access/secret keys, then reload secure settings or restart
Snapshot ends in PARTIAL state | Shard allocation issues (some shards could not be snapshotted) | Check cluster health, resolve unassigned shards, and take a new snapshot
SLM policy not executing | SLM service stopped | curl -X POST "localhost:9200/_slm/start"
Restoration failing | Index already exists | Close indices first or use rename patterns
Permission denied on backup directory | Wrong ownership | sudo chown elasticsearch:elasticsearch /var/lib/elasticsearch/backups

Advanced backup strategies

Cross-cluster replication backup

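For additional protection, configure cross-cluster replication (CCR) to copy indices to a remote cluster in near real time. CCR requires a Platinum or Enterprise license, or an active trial. It complements snapshots rather than replacing them: replication shortens failover time, while snapshots remain the way to go back to an earlier point in time.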

# Configure remote cluster connection
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster": {
      "remote": {
        "backup_cluster": {
          "seeds": ["backup.example.com:9300"]
        }
      }
    }
  }
}'

You can learn more about setting up cross-cluster replication in our Elasticsearch cross-cluster replication guide.

Integrate with index lifecycle management

Coordinate snapshots with ILM policies so that data transitions and backups stay consistent. In the policy below, the wait_for_snapshot action in the delete phase makes ILM wait until the daily_snapshots SLM policy has run before an index is deleted.

curl -X PUT "localhost:9200/_ilm/policy/logs_with_snapshots" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "1d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          }
        }
      },
      "delete": {
        "min_age": "365d",
        "actions": {
          "wait_for_snapshot": {
            "policy": "daily_snapshots"
          },
          "delete": {}
        }
      }
    }
  }
}'

Learn more about coordinating ILM with snapshots in our Elasticsearch ILM tutorial.
