Set up comprehensive Elasticsearch 8 backup strategies with snapshot lifecycle management (SLM), filesystem and S3 repository backends, automated scheduling, and recovery procedures for production environments.
Prerequisites
- Elasticsearch 8.x cluster running
- Root or sudo access
- Basic understanding of Elasticsearch concepts
- S3 bucket for cloud backups (optional)
What this solves
Without backups, losing an Elasticsearch cluster means permanently losing its search indices, logs, and analytics data. This tutorial sets up automated snapshot policies with filesystem and S3 backends, configures retention rules, and provides recovery procedures to protect your cluster data.
Step-by-step configuration
Verify Elasticsearch cluster health
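Check that your Elasticsearch cluster is running and accessible before configuring snapshots. The commands in this tutorial assume the HTTP API answers on localhost:9200 without TLS or authentication. If the Elasticsearch 8 default security settings are enabled, use https:// and pass credentials to curl (for example with -u and --cacert).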
curl -X GET "localhost:9200/_cluster/health?pretty"
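A provisioning or monitoring script usually needs a yes/no answer rather than the full JSON. The sketch below is one way to pull the status field out of a health response and decide whether snapshots can proceed. The function names health_status and ready_for_snapshots are illustrative, and the sed-based parsing assumes a simple "status": "green" field (jq is more robust where it is installed).

```shell
#!/bin/bash
# Extract the "status" field from a _cluster/health JSON response (no jq required).
health_status() {
    printf '%s' "$1" | tr -d '\n' \
        | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Return 0 when the cluster is healthy enough to snapshot.
# Yellow is accepted because all primary shards are assigned in that state.
ready_for_snapshots() {
    case "$(health_status "$1")" in
        green|yellow) return 0 ;;
        *)            return 1 ;;
    esac
}

# Usage against a live cluster (assumes localhost:9200 as in this tutorial):
#   response=$(curl -s "localhost:9200/_cluster/health")
#   ready_for_snapshots "$response" || echo "cluster is red or unreachable"
```

An empty response, for example when curl cannot connect, is treated the same as a red cluster.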
Create filesystem snapshot repository
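Configure a filesystem-based repository for local snapshots. First add the path to elasticsearch.yml on every node. In a multi-node cluster the path must be a shared filesystem (for example NFS) mounted at the same location on all master and data nodes.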
path.repo: ["/var/lib/elasticsearch/backups"]
Create the backup directory and set proper permissions.
sudo mkdir -p /var/lib/elasticsearch/backups
sudo chown elasticsearch:elasticsearch /var/lib/elasticsearch/backups
sudo chmod 750 /var/lib/elasticsearch/backups
Restart Elasticsearch to apply the configuration changes.
sudo systemctl restart elasticsearch
sudo systemctl status elasticsearch
Register filesystem snapshot repository
Create the filesystem repository through the Elasticsearch API.
curl -X PUT "localhost:9200/_snapshot/fs_backup" -H 'Content-Type: application/json' -d'
{
"type": "fs",
"settings": {
"location": "/var/lib/elasticsearch/backups",
"compress": true,
"max_snapshot_bytes_per_sec": "50mb",
"max_restore_bytes_per_sec": "50mb"
}
}'
Configure S3 snapshot repository
Elasticsearch 8.x ships S3 repository support as a bundled module, so no plugin installation is needed. The elasticsearch-plugin install repository-s3 step applies only to 7.x and earlier.
Add S3 credentials to the Elasticsearch keystore on every node. Each command prompts for the value.
sudo /usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.default.access_key
sudo /usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.default.secret_key
Restart Elasticsearch so it picks up the new keystore entries.
sudo systemctl restart elasticsearch
Register S3 snapshot repository
Create the S3 repository with your bucket configuration.
curl -X PUT "localhost:9200/_snapshot/s3_backup" -H 'Content-Type: application/json' -d'
{
"type": "s3",
"settings": {
"bucket": "elasticsearch-backups",
"region": "us-east-1",
"base_path": "snapshots",
"compress": true,
"server_side_encryption": true,
"max_snapshot_bytes_per_sec": "100mb",
"max_restore_bytes_per_sec": "100mb"
}
}'
Create snapshot lifecycle management policy
Define an automated policy for daily snapshots with retention rules.
curl -X PUT "localhost:9200/_slm/policy/daily_snapshots" -H 'Content-Type: application/json' -d'
{
"schedule": "0 0 2 * * ?",
"name": "<daily-snap-{now/d}>",
"repository": "fs_backup",
"config": {
"indices": ["*"],
"ignore_unavailable": false,
"include_global_state": true
},
"retention": {
"expire_after": "30d",
"min_count": 5,
"max_count": 50
}
}'
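SLM schedules use Elasticsearch's own cron syntax, which has six or seven fields and starts with seconds, so a five-field crontab expression is rejected. The helper below is a rough sanity check that only counts fields before a policy is submitted. is_es_cron is a hypothetical name, and the function does not validate the field values themselves.

```shell
#!/bin/bash
# Elasticsearch cron layout:
#   <seconds> <minutes> <hours> <day_of_month> <month> <day_of_week> [year]
is_es_cron() {
    local -a fields
    # read splits on whitespace without expanding * as a glob
    read -r -a fields <<< "$1"
    [ "${#fields[@]}" -eq 6 ] || [ "${#fields[@]}" -eq 7 ]
}

for expr in "0 0 2 * * ?" "0 2 * * *"; do
    if is_es_cron "$expr"; then
        echo "ok:      $expr"
    else
        echo "invalid: $expr (expected 6 or 7 fields)"
    fi
done
```

Once a policy is stored, a POST request to _slm/policy/daily_snapshots/_execute runs it immediately without waiting for the schedule.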
Create weekly S3 backup policy
Configure a weekly backup policy for long-term S3 storage.
curl -X PUT "localhost:9200/_slm/policy/weekly_s3_snapshots" -H 'Content-Type: application/json' -d'
{
"schedule": "0 0 3 ? * SUN",
"name": "<weekly-snap-{now/d}>",
"repository": "s3_backup",
"config": {
"indices": ["*"],
"ignore_unavailable": false,
"include_global_state": true,
"metadata": {
"backup_type": "weekly",
"environment": "production"
}
},
"retention": {
"expire_after": "365d",
"min_count": 12,
"max_count": 104
}
}'
Start snapshot lifecycle management
SLM runs by default. If it has been stopped, start it so that the policies execute on schedule.
curl -X POST "localhost:9200/_slm/start"
Create manual snapshot for testing
Take an immediate snapshot to test your repository configuration.
curl -X PUT "localhost:9200/_snapshot/fs_backup/test_snapshot?wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
"indices": "*",
"ignore_unavailable": true,
"include_global_state": true,
"metadata": {
"taken_by": "manual_test",
"taken_because": "configuration_verification"
}
}'
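With wait_for_completion=true the response carries the final snapshot state. As a sketch of how a wrapper script might act on it, the function below maps a state string to a Nagios-style exit code. The function name and the choice of codes are assumptions for illustration; the state names are the ones Elasticsearch reports.

```shell
#!/bin/bash
# Map a snapshot state to an exit code:
#   0 = OK, 1 = warning, 2 = critical, 3 = unknown
snapshot_state_code() {
    case "$1" in
        SUCCESS)             return 0 ;;
        IN_PROGRESS)         return 0 ;;  # still running, not a failure
        PARTIAL)             return 1 ;;  # some shards were not captured
        FAILED|INCOMPATIBLE) return 2 ;;
        *)                   return 3 ;;
    esac
}

# Usage against a live cluster (assumes jq and the repository from this tutorial):
#   state=$(curl -s "localhost:9200/_snapshot/fs_backup/test_snapshot" | jq -r '.snapshots[0].state')
#   snapshot_state_code "$state" || echo "snapshot needs attention: $state"
```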
Configure index-specific backup policy
Create targeted policies for critical indices with different retention requirements.
curl -X PUT "localhost:9200/_slm/policy/critical_indices_backup" -H 'Content-Type: application/json' -d'
{
"schedule": "0 0 1,13 * * ?",
"name": "<critical-snap-{now/d}>",
"repository": "s3_backup",
"config": {
"indices": ["logs-*", "metrics-*", "security-*"],
"ignore_unavailable": false,
"include_global_state": false,
"partial": false
},
"retention": {
"expire_after": "90d",
"min_count": 10,
"max_count": 200
}
}'
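To see what the retention settings imply for storage, it helps to estimate how many snapshots each policy keeps once it has been running for longer than expire_after. The sketch below is a back-of-the-envelope calculation that assumes every scheduled run succeeds. retained_snapshots is a hypothetical helper, not an Elasticsearch API, and real counts can differ slightly around the expiry boundary.

```shell
#!/bin/bash
# Estimate the steady-state snapshot count for an SLM policy.
# Arguments: interval between snapshots (hours), expire_after (days), min_count, max_count
retained_snapshots() {
    local interval_hours=$1 expire_days=$2 min_count=$3 max_count=$4
    local count=$(( expire_days * 24 / interval_hours ))
    # min_count keeps snapshots even after they pass expire_after
    if [ "$count" -lt "$min_count" ]; then
        count=$min_count
    fi
    # max_count removes the oldest snapshots even if they have not expired
    if [ "$count" -gt "$max_count" ]; then
        count=$max_count
    fi
    echo "$count"
}

# The three policies from this tutorial: daily, weekly, and twice daily
echo "daily_snapshots:         $(retained_snapshots 24 30 5 50)"
echo "weekly_s3_snapshots:     $(retained_snapshots 168 365 12 104)"
echo "critical_indices_backup: $(retained_snapshots 12 90 10 200)"
```

Under these assumptions the estimates are 30, 52, and 180 snapshots, so none of the three policies reaches its max_count in normal operation.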
Set up snapshot restoration procedure
Save the following script as /usr/local/bin/elasticsearch-restore.sh to automate snapshot restoration during disaster recovery. It requires jq.
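#!/bin/bash
# Elasticsearch snapshot restore script
set -euo pipefail
SNAPSHOT_REPO="${1:-fs_backup}"
SNAPSHOT_NAME="${2:-}"
ES_HOST="${3:-localhost:9200}"
if [ -z "$SNAPSHOT_NAME" ]; then
    echo "Usage: $0 <repository> <snapshot_name> [elasticsearch_host]"
    exit 1
fi
echo "Checking snapshot status..."
snapshot_info=$(curl -s "$ES_HOST/_snapshot/$SNAPSHOT_REPO/$SNAPSHOT_NAME")
state=$(echo "$snapshot_info" | jq -r '.snapshots[0].state')
if [ "$state" != "SUCCESS" ]; then
    echo "Snapshot state is $state, refusing to restore"
    exit 1
fi
# Close only the indices contained in the snapshot. Elasticsearch 8 rejects
# _all/_close by default because action.destructive_requires_name is true.
indices=$(echo "$snapshot_info" | jq -r '.snapshots[0].indices | join(",")')
echo "Closing indices before restore..."
curl -s -X POST "$ES_HOST/$indices/_close?ignore_unavailable=true"
echo "Restoring snapshot: $SNAPSHOT_NAME from $SNAPSHOT_REPO"
curl -s -X POST "$ES_HOST/_snapshot/$SNAPSHOT_REPO/$SNAPSHOT_NAME/_restore" -H 'Content-Type: application/json' -d'{
"ignore_unavailable": true,
"include_global_state": true,
"include_aliases": true
}'
echo "Restoration initiated. Monitor progress with:"
echo "curl -s '$ES_HOST/_recovery' | jq"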
Make the script executable.
sudo chmod +x /usr/local/bin/elasticsearch-restore.sh
Configure monitoring and alerting
Set up monitoring for snapshot failures and policy execution. Save the following script as /usr/local/bin/check-elasticsearch-snapshots.sh.
#!/bin/bash
# Check snapshot policy execution and alert on failures
ES_HOST="${ES_HOST:-localhost:9200}"
MAX_AGE_HOURS=25
# Check if SLM is running
slm_status=$(curl -s "$ES_HOST/_slm/status" | jq -r '.operation_mode')
if [ "$slm_status" != "RUNNING" ]; then
    echo "CRITICAL: SLM is not running. Status: $slm_status"
    exit 2
fi
# Check for recent snapshots
last_snapshot=$(curl -s "$ES_HOST/_snapshot/_all/_all?sort=start_time&order=desc&size=1" | jq -r '.snapshots[0].start_time')
if [ -z "$last_snapshot" ] || [ "$last_snapshot" = "null" ]; then
    echo "WARNING: No snapshots found"
    exit 1
fi
last_timestamp=$(date -d "$last_snapshot" +%s)
current_timestamp=$(date +%s)
age_hours=$(( (current_timestamp - last_timestamp) / 3600 ))
if [ "$age_hours" -gt "$MAX_AGE_HOURS" ]; then
    echo "WARNING: Last snapshot is $age_hours hours old"
    exit 1
fi
echo "OK: Last snapshot taken $age_hours hours ago"
exit 0
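The age calculation is the part of this check most worth testing in isolation, since it depends on date parsing. A minimal sketch, assuming GNU date and the ISO 8601 timestamps that the snapshot API returns. snapshot_age_hours is an illustrative name, and the optional second argument lets a fixed reference time be passed in for testing.

```shell
#!/bin/bash
# Print the age in whole hours of a snapshot start_time such as 2024-01-07T03:00:00.123Z.
# $1 = start_time, $2 = reference time in epoch seconds (defaults to now)
snapshot_age_hours() {
    local start=$1 now=${2:-$(date +%s)}
    # Drop the trailing Z and any fractional seconds, then parse as UTC
    start=${start%Z}
    start=${start%%.*}
    local start_epoch
    start_epoch=$(date -u -d "${start}Z" +%s) || return 1
    echo $(( (now - start_epoch) / 3600 ))
}

# Example: 26 hours lie between the two timestamps below
reference=$(date -u -d "2024-01-08T05:00:00Z" +%s)
snapshot_age_hours "2024-01-07T03:00:00.123Z" "$reference"
```

With MAX_AGE_HOURS set to 25, a 26-hour-old snapshot like the one in the example would trigger the warning.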
Make the monitoring script executable.
sudo chmod +x /usr/local/bin/check-elasticsearch-snapshots.sh
Set up automated monitoring with cron
Schedule regular checks for snapshot health and policy execution.
sudo crontab -e
Add these monitoring jobs to the crontab.
# Check snapshot health every 4 hours
0 */4 * * * /usr/local/bin/check-elasticsearch-snapshots.sh
# Weekly snapshot repository verification (Mondays at 04:00)
0 4 * * 1 curl -s -X POST "localhost:9200/_snapshot/fs_backup/_verify"
0 4 * * 1 curl -s -X POST "localhost:9200/_snapshot/s3_backup/_verify"
Verify your setup
Check that your snapshot repositories and policies are configured correctly.
# Verify repository configuration
curl -s "localhost:9200/_snapshot" | jq
# Check SLM policies
curl -s "localhost:9200/_slm/policy" | jq
# View SLM stats
curl -s "localhost:9200/_slm/stats" | jq
# List all snapshots
curl -s "localhost:9200/_snapshot/_all/_all" | jq '.snapshots[] | {name, state, start_time, end_time}'
# Check last snapshot execution
curl -s "localhost:9200/_slm/policy/daily_snapshots" | jq '.daily_snapshots.last_success, .daily_snapshots.last_failure'
# Verify that every node can access the repositories
curl -X POST "localhost:9200/_snapshot/fs_backup/_verify"
curl -X POST "localhost:9200/_snapshot/s3_backup/_verify"
Snapshot restoration procedures
Restore specific indices
Restore only selected indices from a snapshot while keeping others running.
curl -X POST "localhost:9200/_snapshot/fs_backup/test_snapshot/_restore" -H 'Content-Type: application/json' -d'
{
"indices": "logs-2024-01-*,metrics-app-*",
"ignore_unavailable": true,
"include_global_state": false,
"rename_pattern": "(.+)",
"rename_replacement": "restored_$1",
"include_aliases": false
}'
Restore with index renaming
Restore indices with new names to avoid conflicts with existing data.
curl -X POST "localhost:9200/_snapshot/s3_backup/weekly-snap-2024-01-07/_restore" -H 'Content-Type: application/json' -d'
{
"indices": "production-*",
"rename_pattern": "production-(.+)",
"rename_replacement": "backup-$1",
"include_global_state": false
}'
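Before running a restore it can be useful to preview what rename_pattern and rename_replacement will produce. The sketch below approximates the substitution with sed -E. Elasticsearch applies Java regular expressions and writes capture groups as $1, whereas sed writes them as \1, so this is only a stand-in for simple patterns like the ones in this tutorial. preview_rename is a hypothetical helper and the index names are examples.

```shell
#!/bin/bash
# Preview an index rename.
# $1 = index name, $2 = pattern, $3 = replacement using \1 for the first capture group
preview_rename() {
    printf '%s\n' "$1" | sed -E "s/$2/$3/"
}

preview_rename "production-orders" 'production-(.+)' 'backup-\1'
preview_rename "logs-2024-01-15" '(.+)' 'restored_\1'
```

Index names that do not match the pattern pass through unchanged, which is also how the restore API treats them.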
Monitor restoration progress
Track the restoration process and verify completion.
# Monitor recovery progress
curl -s "localhost:9200/_recovery" | jq '.[] | select(.stage != "DONE")'
# Check cluster health during restore
curl -s "localhost:9200/_cluster/health?level=indices" | jq
# View restoration stats
curl -s "localhost:9200/_stats/store,docs" | jq '.indices | to_entries[] | {index: .key, docs: .value.total.docs.count, size: .value.total.store.size_in_bytes}'
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Path not allowed error | Repository path not in path.repo setting | Add path to elasticsearch.yml and restart |
| S3 authentication failed | Invalid AWS credentials | Update keystore with correct access/secret keys |
| Snapshot finishes in PARTIAL state | One or more shards could not be snapshotted, usually because of shard allocation problems | Check cluster health, resolve unassigned shards, then take a new snapshot |
| SLM policy not executing | SLM service stopped | curl -X POST "localhost:9200/_slm/start" |
| Restoration failing | Index already exists | Close indices first or use rename patterns |
| Permission denied on backup directory | Wrong ownership | sudo chown elasticsearch:elasticsearch /var/lib/elasticsearch/backups |
Advanced backup strategies
Cross-cluster replication backup
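For additional protection, configure cross-cluster replication (CCR) to keep a continuously updated copy of selected indices on a remote cluster. CCR complements snapshots: it shortens recovery time after a cluster outage, but it does not replace point-in-time backups. The first step is to register the remote cluster.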
# Configure remote cluster connection
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
"persistent": {
"cluster": {
"remote": {
"backup_cluster": {
"seeds": ["backup.example.com:9300"]
}
}
}
}
}'
You can learn more about setting up cross-cluster replication in our Elasticsearch cross-cluster replication guide.
Integrate with index lifecycle management
Coordinate snapshots with ILM policies to ensure consistent backup timing with data transitions.
curl -X PUT "localhost:9200/_ilm/policy/logs_with_snapshots" -H 'Content-Type: application/json' -d'
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_size": "50gb",
"max_age": "1d"
}
}
},
"warm": {
"min_age": "7d",
"actions": {
"allocate": {
"number_of_replicas": 0
}
}
},
"cold": {
"min_age": "30d",
"actions": {
"allocate": {
"number_of_replicas": 0
}
}
},
"delete": {
"min_age": "365d",
"actions": {
"wait_for_snapshot": {
"policy": "weekly_s3_snapshots"
},
"delete": {}
}
}
}
}
}'
Learn more about coordinating ILM with snapshots in our Elasticsearch ILM tutorial.
Next steps
- Configure Elasticsearch cross-cluster replication for disaster recovery
- Configure Elasticsearch Index Lifecycle Management (ILM) for automated data retention
- Setup Elasticsearch monitoring with Metricbeat and Kibana dashboards
- Implement Elasticsearch backup encryption with S3 and KMS
- Configure Elasticsearch snapshot restoration automation with disaster recovery testing