Elasticsearch 8 Snapshot Backup & Restore Policies

Set up comprehensive Elasticsearch 8 backup strategies with snapshot lifecycle management (SLM), filesystem and S3 repository backends, automated scheduling, and recovery procedures for production environments.

Prerequisites

Elasticsearch 8.x cluster running
Root or sudo access
Basic understanding of Elasticsearch concepts
S3 bucket for cloud backups (optional)

What this solves

Losing Elasticsearch data can permanently wipe out your search indices, logs, and analytics. This tutorial sets up automated snapshot policies with filesystem and S3 backends, configures retention rules, and provides recovery procedures to protect your cluster data.

Step-by-step configuration

Verify Elasticsearch cluster health

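Check that your Elasticsearch cluster is running and accessible before configuring snapshots. The examples in this guide assume the HTTP API is reachable at localhost:9200 without TLS or authentication; on a default 8.x installation with security enabled, use https:// and pass credentials (for example with curl's -u and --cacert options).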

curl -X GET "localhost:9200/_cluster/health?pretty"
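The health response can also gate the rest of the setup in a script. The following is a minimal sketch, assuming the response is passed in as a string; `health_status` and `is_safe_to_snapshot` are hypothetical helper names, and the sample response is abbreviated.

```shell
#!/bin/bash
# Sketch: decide whether to proceed based on a _cluster/health response.
# health_status and is_safe_to_snapshot are hypothetical helper names.

# Extract the "status" field without needing jq.
health_status() {
    local json="$1"
    if [[ "$json" =~ \"status\"[[:space:]]*:[[:space:]]*\"([a-z]+)\" ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        echo "unknown"
    fi
}

# green and yellow both have all primary shards assigned; red does not.
is_safe_to_snapshot() {
    case "$(health_status "$1")" in
        green|yellow) return 0 ;;
        *) return 1 ;;
    esac
}

# Abbreviated sample response. Against a live cluster, use instead:
#   response=$(curl -s "localhost:9200/_cluster/health")
response='{"cluster_name":"demo","status":"yellow","number_of_nodes":1}'
if is_safe_to_snapshot "$response"; then
    echo "Cluster is $(health_status "$response"): OK to configure snapshots"
else
    echo "Cluster is $(health_status "$response"): fix cluster health first"
fi
```

Treating yellow as acceptable is a design choice for single-node or replica-short clusters; tighten the `case` statement if you want to require green.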

Create filesystem snapshot repository

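Configure a filesystem-based repository for local snapshots. First add the repository path to path.repo in elasticsearch.yml on every node. On a multi-node cluster the location must be a shared filesystem (such as NFS) mounted at the same path on all master and data nodes.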

path.repo: ["/var/lib/elasticsearch/backups"]

Create the backup directory and set proper permissions.

sudo mkdir -p /var/lib/elasticsearch/backups
sudo chown elasticsearch:elasticsearch /var/lib/elasticsearch/backups
sudo chmod 750 /var/lib/elasticsearch/backups

Restart Elasticsearch to apply the configuration changes.

sudo systemctl restart elasticsearch
sudo systemctl status elasticsearch
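Before registering the repository, it can help to confirm that the path really is listed in path.repo and that the directory exists. This is a rough sketch, assuming path.repo is written on a single line; `check_repo_path` is a hypothetical helper, and /etc/elasticsearch/elasticsearch.yml in the comment is the usual package-install location, which may differ on your system.

```shell
#!/bin/bash
# Sketch: pre-flight check for a filesystem repository path.
# check_repo_path is a hypothetical helper, not an Elasticsearch tool.
# Assumes path.repo is written on one line, as in this guide.

# Usage: check_repo_path <elasticsearch.yml> <repo_dir>
check_repo_path() {
    local config_file="$1" repo_dir="$2"

    # path.repo must be set and must mention the directory.
    if ! grep -E '^[[:space:]]*path\.repo:' "$config_file" | grep -qF "$repo_dir"; then
        echo "path.repo in $config_file does not list $repo_dir"
        return 1
    fi
    if [ ! -d "$repo_dir" ]; then
        echo "$repo_dir does not exist"
        return 1
    fi
    echo "OK: $repo_dir is listed in path.repo and exists"
}

# Demo against temporary files. On a real node you might run:
#   check_repo_path /etc/elasticsearch/elasticsearch.yml /var/lib/elasticsearch/backups
demo_dir=$(mktemp -d)
demo_config=$(mktemp)
echo "path.repo: [\"$demo_dir\"]" > "$demo_config"
check_repo_path "$demo_config" "$demo_dir"
rm -rf "$demo_dir" "$demo_config"
```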

Register filesystem snapshot repository

Create the filesystem repository through the Elasticsearch API.

curl -X PUT "localhost:9200/_snapshot/fs_backup" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/var/lib/elasticsearch/backups",
    "compress": true,
    "max_snapshot_bytes_per_sec": "50mb",
    "max_restore_bytes_per_sec": "50mb"
  }
}'

Configure S3 snapshot repository

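Elasticsearch 8.x ships S3 repository support as a built-in module, so no plugin installation or extra restart is needed. The commands below apply only to 7.x and earlier, where repository-s3 was a separate plugin that required a restart after installation.

sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install repository-s3
sudo systemctl restart elasticsearch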

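Add S3 credentials to the Elasticsearch keystore on every node, then reload the secure settings (or restart the nodes) so the S3 client picks them up.

sudo /usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.default.access_key
sudo /usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.default.secret_key
curl -X POST "localhost:9200/_nodes/reload_secure_settings"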

Register S3 snapshot repository

Create the S3 repository with your bucket configuration.

curl -X PUT "localhost:9200/_snapshot/s3_backup" -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "elasticsearch-backups",
    "region": "us-east-1",
    "base_path": "snapshots",
    "compress": true,
    "server_side_encryption": true,
    "max_snapshot_bytes_per_sec": "100mb",
    "max_restore_bytes_per_sec": "100mb"
  }
}'

Create snapshot lifecycle management policy

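Define an automated policy for daily snapshots at 02:00 UTC with retention rules. SLM schedules use Elasticsearch cron syntax, which begins with a seconds field, and the name uses date math so that each snapshot gets a dated name.

curl -X PUT "localhost:9200/_slm/policy/daily_snapshots" -H 'Content-Type: application/json' -d'
{
  "schedule": "0 0 2 * * ?",
  "name": "<daily-snap-{now/d}>",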
  "repository": "fs_backup",
  "config": {
    "indices": ["*"],
    "ignore_unavailable": false,
    "include_global_state": true
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}'
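Elasticsearch schedules use a cron syntax with a leading seconds field and a `?` placeholder in either the day-of-month or day-of-week position, which differs from the five-field Unix crontab format. The sketch below converts simple crontab expressions into that form; `unix_to_es_cron` is a hypothetical helper and only handles `*` or a single number from 0 to 7 in the day-of-week field.

```shell
#!/bin/bash
# Sketch: convert a simple 5-field Unix cron expression into the 6-field
# form used by Elasticsearch schedules. unix_to_es_cron is a hypothetical
# helper. Limitation: day-of-week must be "*" or a single number 0-7, and
# a weekday restriction replaces any day-of-month value with "?".

unix_to_es_cron() {
    local minute hour dom month dow
    # read splits on whitespace and does not glob-expand "*".
    read -r minute hour dom month dow <<< "$1"

    local names=(SUN MON TUE WED THU FRI SAT SUN)
    if [ "$dow" = "*" ]; then
        # No weekday restriction: day-of-week becomes "?".
        dow="?"
    else
        # Weekday given: use its name and put "?" in day-of-month.
        dow="${names[$dow]}"
        dom="?"
    fi
    # The seconds field is prepended and fixed at 0.
    echo "0 $minute $hour $dom $month $dow"
}

unix_to_es_cron "0 2 * * *"      # daily at 02:00
unix_to_es_cron "0 3 * * 0"      # Sundays at 03:00
unix_to_es_cron "0 1,13 * * *"   # twice a day, at 01:00 and 13:00
```

The three calls print `0 0 2 * * ?`, `0 0 3 ? * SUN`, and `0 0 1,13 * * ?`. Weekday names are used instead of numbers because Unix cron and Elasticsearch number the days differently.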

Create weekly S3 backup policy

Configure a weekly backup policy (Sundays at 03:00 UTC) for long-term S3 storage.

curl -X PUT "localhost:9200/_slm/policy/weekly_s3_snapshots" -H 'Content-Type: application/json' -d'
{
  "schedule": "0 0 3 ? * SUN",
  "name": "<weekly-snap-{now/d}>",
  "repository": "s3_backup",
  "config": {
    "indices": ["*"],
    "ignore_unavailable": false,
    "include_global_state": true,
    "metadata": {
      "backup_type": "weekly",
      "environment": "production"
    }
  },
  "retention": {
    "expire_after": "365d",
    "min_count": 12,
    "max_count": 104
  }
}'
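To see how expire_after, min_count, and max_count interact, the following sketch applies a simplified model of the retention rules to a list of snapshot ages. It is an illustration under stated assumptions, not the actual Elasticsearch implementation; `retention_plan` is a hypothetical helper and the ages are made-up sample data.

```shell
#!/bin/bash
# Simplified model of SLM retention, for illustration only.
# Usage: retention_plan <expire_after_days> <min_count> <max_count> <age_days>...
# Ages are snapshot ages in days, listed newest first.
retention_plan() {
    local expire_days="$1" min_count="$2" max_count="$3"
    shift 3
    local position=0 age
    for age in "$@"; do
        position=$((position + 1))
        if [ "$position" -le "$min_count" ]; then
            # The newest min_count snapshots are kept even if expired.
            echo "keep   age=${age}d (within min_count)"
        elif [ "$position" -gt "$max_count" ]; then
            # Anything beyond max_count is deleted even if not expired.
            echo "delete age=${age}d (over max_count)"
        elif [ "$age" -gt "$expire_days" ]; then
            echo "delete age=${age}d (expired)"
        else
            echo "keep   age=${age}d"
        fi
    done
}

# Daily policy values from this guide: expire_after 30d, min_count 5, max_count 50.
retention_plan 30 5 50 1 2 3 4 5 20 31 45
```

With the sample ages, the six newest snapshots are kept and the two older than 30 days are marked for deletion. Note that min_count wins over expire_after, so a policy that stops running never deletes its last few snapshots.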

Start snapshot lifecycle management

Enable SLM to begin executing the automated policies.

curl -X POST "localhost:9200/_slm/start"

Create manual snapshot for testing

Take an immediate snapshot to test your repository configuration.

curl -X PUT "localhost:9200/_snapshot/fs_backup/test_snapshot?wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
  "indices": "*",
  "ignore_unavailable": true,
  "include_global_state": true,
  "metadata": {
    "taken_by": "manual_test",
    "taken_because": "configuration_verification"
  }
}'
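The manual snapshot above tests the repository but not the SLM policies themselves. A policy can also be run on demand through the SLM execute endpoint instead of waiting for its schedule. The sketch below wraps that call; `run_policy_now` and the `DRY_RUN` switch are hypothetical names for this example, and by default the function only prints the request.

```shell
#!/bin/bash
# Sketch: trigger an SLM policy on demand.
# run_policy_now and DRY_RUN are hypothetical names for this example.
ES_HOST="${ES_HOST:-localhost:9200}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the request, 0 = send it

run_policy_now() {
    local policy_id="$1"
    local url="$ES_HOST/_slm/policy/$policy_id/_execute"
    if [ "$DRY_RUN" = "1" ]; then
        echo "curl -X POST \"$url\""
    else
        curl -s -X POST "$url"
    fi
}

run_policy_now daily_snapshots
```

Run it with `DRY_RUN=0` to actually create a snapshot; the response contains the generated snapshot name, which you can then look up in the repository.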

Configure index-specific backup policy

Create targeted policies for critical indices with different retention requirements. This one runs twice a day, at 01:00 and 13:00 UTC.

curl -X PUT "localhost:9200/_slm/policy/critical_indices_backup" -H 'Content-Type: application/json' -d'
{
  "schedule": "0 0 1,13 * * ?",
  "name": "<critical-snap-{now/d}>",
  "repository": "s3_backup",
  "config": {
    "indices": ["logs-*", "metrics-*", "security-*"],
    "ignore_unavailable": false,
    "include_global_state": false,
    "partial": false
  },
  "retention": {
    "expire_after": "90d",
    "min_count": 10,
    "max_count": 200
  }
}'
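Wildcard patterns such as security-* decide which indices a policy captures, so it is worth previewing them against your index names. This sketch uses bash glob matching as a stand-in for Elasticsearch wildcard resolution; exclusion patterns and data streams are not modelled, `matches_any_pattern` is a hypothetical helper, and the index names are made up.

```shell
#!/bin/bash
# Sketch: preview which index names a list of wildcard patterns selects.
# Bash globbing approximates Elasticsearch wildcards for simple cases.
# matches_any_pattern is a hypothetical helper.

# Usage: matches_any_pattern <index_name> <pattern>...
matches_any_pattern() {
    local index="$1" pattern
    shift
    for pattern in "$@"; do
        # An unquoted right-hand side makes [[ ]] treat it as a glob.
        if [[ "$index" == $pattern ]]; then
            return 0
        fi
    done
    return 1
}

patterns=("logs-*" "metrics-*" "security-*")
for index in logs-2024.01.15 metrics-app-000001 security-audit orders-2024; do
    if matches_any_pattern "$index" "${patterns[@]}"; then
        echo "$index: included"
    else
        echo "$index: skipped"
    fi
done
```

A pattern without the trailing `*` matches only an index with exactly that name, which is an easy way to end up with empty snapshots.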

Set up snapshot restoration procedure

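Create a script for automated snapshot restoration and save it as /usr/local/bin/elasticsearch-restore.sh. This helps during disaster recovery.

#!/bin/bash
# Elasticsearch snapshot restore script
set -e

SNAPSHOT_REPO="${1:-fs_backup}"
SNAPSHOT_NAME="${2}"
ES_HOST="${3:-localhost:9200}"

if [ -z "$SNAPSHOT_NAME" ]; then
    echo "Usage: $0 [repository] <snapshot_name> [elasticsearch_host]"
    exit 1
fi

echo "Checking snapshot status..."
snapshot_json=$(curl -s "$ES_HOST/_snapshot/$SNAPSHOT_REPO/$SNAPSHOT_NAME")
state=$(echo "$snapshot_json" | jq -r '.snapshots[0].state')
echo "Snapshot state: $state"
if [ "$state" != "SUCCESS" ]; then
    # Stop before closing anything if the snapshot is missing or incomplete.
    echo "Snapshot is not in SUCCESS state, aborting"
    exit 1
fi

echo "Closing indices contained in the snapshot before restore..."
# Elasticsearch 8 rejects _all and wildcards for _close by default
# (action.destructive_requires_name), so close each index by name.
# Indices that do not exist in the cluster return 404, which is harmless here.
for index in $(echo "$snapshot_json" | jq -r '.snapshots[0].indices[]'); do
    curl -s -o /dev/null -X POST "$ES_HOST/$index/_close"
done

echo "Restoring snapshot: $SNAPSHOT_NAME from $SNAPSHOT_REPO"
curl -X POST "$ES_HOST/_snapshot/$SNAPSHOT_REPO/$SNAPSHOT_NAME/_restore" -H 'Content-Type: application/json' -d'{
  "ignore_unavailable": true,
  "include_global_state": true,
  "include_aliases": true
}'

echo "Restoration initiated. Monitor progress with:"
echo "curl -s '$ES_HOST/_recovery' | jq"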

Make the script executable.

sudo chmod +x /usr/local/bin/elasticsearch-restore.sh

Configure monitoring and alerting

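Set up monitoring for snapshot failures and policy execution. Save the following script as /usr/local/bin/check-elasticsearch-snapshots.sh; it requires jq and GNU date.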

#!/bin/bash

# Check snapshot policy execution and alert on failures
ES_HOST="${ES_HOST:-localhost:9200}"
MAX_AGE_HOURS=25

# Check if SLM is running
slm_status=$(curl -s "$ES_HOST/_slm/status" | jq -r '.operation_mode')
if [ "$slm_status" != "RUNNING" ]; then
    echo "CRITICAL: SLM is not running. Status: $slm_status"
    exit 2
fi

# Check for recent snapshots
last_snapshot=$(curl -s "$ES_HOST/_snapshot/_all/_all?sort=start_time&order=desc&size=1" | jq -r '.snapshots[0].start_time')
if [ "$last_snapshot" = "null" ]; then
    echo "WARNING: No snapshots found"
    exit 1
fi

last_timestamp=$(date -d "$last_snapshot" +%s)
current_timestamp=$(date +%s)
age_hours=$(( (current_timestamp - last_timestamp) / 3600 ))

if [ "$age_hours" -gt "$MAX_AGE_HOURS" ]; then
    echo "WARNING: Last snapshot is $age_hours hours old"
    exit 1
fi

echo "OK: Last snapshot taken $age_hours hours ago"
exit 0
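The age calculation is the part of the monitoring script most likely to go wrong, so it is useful to exercise it without a cluster. The sketch below isolates that logic with fixed timestamps; `snapshot_age_hours` and `age_status` are hypothetical helpers, and GNU date is assumed, as in the script above.

```shell
#!/bin/bash
# Sketch: the snapshot age check in isolation, testable without a cluster.
# snapshot_age_hours and age_status are hypothetical helpers.
# Requires GNU date for the -d option.

# Usage: snapshot_age_hours <iso8601_start_time> <now_epoch_seconds>
snapshot_age_hours() {
    local start_epoch
    start_epoch=$(date -u -d "$1" +%s) || return 1
    echo $(( ($2 - start_epoch) / 3600 ))
}

# Usage: age_status <age_hours> <max_age_hours>; prints OK or WARNING.
age_status() {
    if [ "$1" -gt "$2" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

# Fixed timestamps so the result is reproducible.
now=$(date -u -d "2024-01-08T05:00:00Z" +%s)
age=$(snapshot_age_hours "2024-01-07T03:00:00.000Z" "$now")
echo "age=${age}h status=$(age_status "$age" 25)"
```

With these sample timestamps the snapshot is 26 hours old, one hour past the 25-hour threshold, so the status is WARNING. The threshold is set slightly above 24 hours so that a daily snapshot that runs a little late does not raise an alert.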

Make the monitoring script executable.

sudo chmod +x /usr/local/bin/check-elasticsearch-snapshots.sh

Set up automated monitoring with cron

Schedule regular checks for snapshot health and policy execution.

sudo crontab -e

Add these monitoring jobs to the crontab.

# Check snapshot health every 4 hours
0 */4 * * * /usr/local/bin/check-elasticsearch-snapshots.sh

# Weekly snapshot repository verification
0 4 * * 1 curl -X POST "localhost:9200/_snapshot/fs_backup/_verify"
0 4 * * 1 curl -X POST "localhost:9200/_snapshot/s3_backup/_verify"

Verify your setup

Check that your snapshot repositories and policies are configured correctly.

# Verify repository configuration
curl -s "localhost:9200/_snapshot" | jq

# Check SLM policies
curl -s "localhost:9200/_slm/policy" | jq

# View SLM stats
curl -s "localhost:9200/_slm/stats" | jq

# List all snapshots
curl -s "localhost:9200/_snapshot/_all/_all" | jq '.snapshots[] | {name: .snapshot, state, start_time, end_time}'

# Check last snapshot execution
curl -s "localhost:9200/_slm/policy/daily_snapshots" | jq '.daily_snapshots.last_success, .daily_snapshots.last_failure'

# Verify that every node can access the repositories
curl -X POST "localhost:9200/_snapshot/fs_backup/_verify"
curl -X POST "localhost:9200/_snapshot/s3_backup/_verify"

Snapshot restoration procedures

Restore specific indices

Restore only selected indices from a snapshot while keeping others running.

curl -X POST "localhost:9200/_snapshot/fs_backup/test_snapshot/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "logs-2024-01-*,metrics-app-*",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": "(.+)",
  "rename_replacement": "restored_$1",
  "include_aliases": false
}'

Restore with index renaming

Restore indices with new names to avoid conflicts with existing data.

curl -X POST "localhost:9200/_snapshot/s3_backup/weekly-snap-2024-01-07/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "production-*",
  "rename_pattern": "production-(.+)",
  "rename_replacement": "backup-$1",
  "include_global_state": false
}'
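Because a wrong rename_pattern can restore indices under unexpected names, it helps to preview the result locally first. The sketch below approximates the rename with sed; it assumes simple patterns for which POSIX extended regular expressions behave like the Java regular expressions Elasticsearch uses. `preview_rename` is a hypothetical helper and the index names are examples.

```shell
#!/bin/bash
# Sketch: preview the effect of rename_pattern / rename_replacement locally.
# preview_rename is a hypothetical helper. sed -E (POSIX ERE) only
# approximates Java regex, which is fine for simple patterns like these.

# Usage: preview_rename <index_name> <rename_pattern> <rename_replacement>
preview_rename() {
    local index="$1" pattern="$2" replacement
    # Translate Java-style group references ($1) into sed style (\1).
    replacement=$(printf '%s' "$3" | sed 's/\$\([0-9]\)/\\\1/g')
    # "#" is the sed delimiter here, so patterns must not contain it.
    printf '%s\n' "$index" | sed -E "s#${pattern}#${replacement}#g"
}

preview_rename "production-orders" 'production-(.+)' 'backup-$1'
preview_rename "logs-2024-01-15" '(.+)' 'restored_$1'
```

The two calls print `backup-orders` and `restored_logs-2024-01-15`. An index name that does not match the pattern is printed unchanged, which mirrors a restore that would keep the original name and may collide with an existing index.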

Monitor restoration progress

Track the restoration process and verify completion.

# Monitor recovery progress (shards that have not finished yet)
curl -s "localhost:9200/_recovery" | jq '.[].shards[] | select(.stage != "DONE")'

# Check cluster health during restore
curl -s "localhost:9200/_cluster/health?level=indices" | jq

# View restoration stats
curl -s "localhost:9200/_stats/store,docs" | jq '.indices | to_entries[] | {index: .key, docs: .value.total.docs.count, size: .value.total.store.size_in_bytes}'

Common issues

Symptom	Cause	Fix
Path not allowed error	Repository path not in path.repo setting	Add path to elasticsearch.yml and restart
S3 authentication failed	Invalid AWS credentials	Update keystore with correct access/secret keys
Snapshot ends in PARTIAL state	One or more shards could not be snapshotted (for example, unassigned primaries)	Check cluster health, resolve shard allocation, and take a new snapshot
SLM policy not executing	SLM service stopped	`curl -X POST "localhost:9200/_slm/start"`
Restore fails	An open index with the same name already exists	Close or delete the existing index first, or use rename patterns
Permission denied on backup directory	Wrong ownership	`sudo chown elasticsearch:elasticsearch /var/lib/elasticsearch/backups`

Advanced backup strategies

Cross-cluster replication backup

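For additional protection, configure cross-cluster replication (CCR) to replicate indices to a remote cluster in near real time. This complements snapshots by keeping a standby copy available for faster disaster recovery. CCR requires an appropriate Elastic subscription, and the first step is to register the remote cluster.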

# Configure remote cluster connection
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster": {
      "remote": {
        "backup_cluster": {
          "seeds": ["backup.example.com:9300"]
        }
      }
    }
  }
}'

You can learn more about setting up cross-cluster replication in our Elasticsearch cross-cluster replication guide.

Integrate with index lifecycle management

Coordinate snapshots with ILM policies to ensure consistent backup timing with data transitions.

curl -X PUT "localhost:9200/_ilm/policy/logs_with_snapshots" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "1d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          }
        }
      },
      "delete": {
        "min_age": "365d",
        "actions": {
          "wait_for_snapshot": {
            "policy": "daily_snapshots"
          },
          "delete": {}
        }
      }
    }
  }
}'

Learn more about coordinating ILM with snapshots in our Elasticsearch ILM tutorial.
