Configure CSI snapshot controllers, create persistent volume snapshots, and implement automated backup strategies with Velero for production Kubernetes environments.
Prerequisites
- Running Kubernetes cluster with CSI-compatible storage driver
- kubectl configured with cluster admin access
- S3-compatible storage bucket for backups
- Basic understanding of Kubernetes persistent volumes
What this solves
Kubernetes persistent volumes contain critical application data that needs protection against data loss, corruption, or accidental deletion. This tutorial shows you how to implement production-grade backup and recovery using CSI volume snapshots and Velero automation. You'll create point-in-time snapshots of persistent volumes and automate backup workflows across multiple storage backends.
Step-by-step configuration
Install CSI snapshot controller and CRDs
The CSI snapshot controller manages volume snapshots across different storage drivers. Install the controller and custom resource definitions first.
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/release-6.3/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/release-6.3/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/release-6.3/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml
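Before moving on, it can help to confirm that all three CRDs were registered. The following is a minimal sketch, not part of the upstream tooling; check_snapshot_crds is a hypothetical helper name, and it relies only on kubectl get crd.

```shell
# Sketch: confirm the three snapshot CRDs exist before deploying the controller.
# check_snapshot_crds is a hypothetical helper name, not part of kubectl.
check_snapshot_crds() {
  missing=0
  for crd in volumesnapshotclasses volumesnapshotcontents volumesnapshots; do
    if kubectl get crd "${crd}.snapshot.storage.k8s.io" >/dev/null 2>&1; then
      echo "ok: ${crd}.snapshot.storage.k8s.io"
    else
      echo "missing: ${crd}.snapshot.storage.k8s.io"
      missing=1
    fi
  done
  # Non-zero return means at least one CRD was not found
  return "$missing"
}
```

Call check_snapshot_crds after the three kubectl apply commands; a non-zero exit status means at least one CRD still needs to be applied.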
Deploy the snapshot controller
Create the snapshot controller deployment to handle snapshot operations across your cluster.
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/release-6.3/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/release-6.3/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml
Create volume snapshot class
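Define a snapshot class that names the CSI driver responsible for snapshots and sets the deletion policy for the underlying snapshot contents.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-snapshot-class
  annotations:
    snapshot.storage.kubernetes.io/is-default-class: "true"
  labels:
    velero.io/csi-volumesnapshot-class: "true" # Lets the Velero CSI plugin select this class
driver: ebs.csi.aws.com # Replace with your CSI driver
deletionPolicy: Delete
parameters: # Driver-specific; keep only the keys your CSI driver documents
  incremental: "true"
  encrypted: "true"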
kubectl apply -f snapshot-class.yaml
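Because the driver name changes per platform, the manifest can also be generated rather than edited by hand. This is a rough sketch under stated assumptions: render_snapshot_class is a hypothetical helper, and it deliberately leaves out the driver-specific parameters block, since valid keys differ between CSI drivers.

```shell
# Sketch: print a VolumeSnapshotClass manifest for a given CSI driver.
# render_snapshot_class is a hypothetical helper; the driver-specific
# "parameters" block is intentionally omitted.
render_snapshot_class() {
  driver="$1"
  policy="${2:-Delete}"   # Delete or Retain
  case "$policy" in
    Delete|Retain) ;;
    *) echo "deletionPolicy must be Delete or Retain" >&2; return 1 ;;
  esac
  cat <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-snapshot-class
driver: ${driver}
deletionPolicy: ${policy}
EOF
}
```

For example, render_snapshot_class disk.csi.azure.com Retain | kubectl apply -f - would create the class for the Azure disk driver while keeping snapshot contents after the VolumeSnapshot object is deleted.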
Create a test persistent volume and data
Set up a test application with persistent storage to demonstrate snapshot functionality.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: gp3 # Replace with your storage class
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-app
  template:
    metadata:
      labels:
        app: test-app
    spec:
      containers:
        - name: test-container
          image: nginx:1.24
          volumeMounts:
            - name: data-volume
              mountPath: /data
          command: ["/bin/sh", "-c"]
          args:
            - |
              echo "Initial data: $(date)" > /data/test.txt
              nginx -g "daemon off;"
      volumes:
        - name: data-volume
          persistentVolumeClaim:
            claimName: test-pvc
kubectl apply -f test-app.yaml
Create manual volume snapshot
Take a manual snapshot of the persistent volume to capture the current state.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: test-snapshot-manual
  namespace: default
spec:
  volumeSnapshotClassName: csi-snapshot-class
  source:
    persistentVolumeClaimName: test-pvc
kubectl apply -f manual-snapshot.yaml
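A snapshot is only usable for restores once its status reports readyToUse: true, which can take a while on large volumes. The sketch below polls for that field; wait_for_snapshot and the SNAPSHOT_POLL_INTERVAL variable are hypothetical names, and the default interval and attempt count are arbitrary illustration values.

```shell
# Sketch: poll a VolumeSnapshot until status.readyToUse is "true".
# wait_for_snapshot and SNAPSHOT_POLL_INTERVAL are hypothetical names;
# the defaults (5 seconds, 60 attempts) are arbitrary.
wait_for_snapshot() {
  name="$1"
  namespace="${2:-default}"
  attempts="${3:-60}"
  interval="${SNAPSHOT_POLL_INTERVAL:-5}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    ready=$(kubectl get volumesnapshot "$name" -n "$namespace" \
      -o jsonpath='{.status.readyToUse}' 2>/dev/null || true)
    if [ "$ready" = "true" ]; then
      echo "snapshot $name is ready"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "snapshot $name not ready after $attempts attempts" >&2
  return 1
}
```

Usage: wait_for_snapshot test-snapshot-manual default. A non-zero exit status suggests checking kubectl describe volumesnapshot for driver errors.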
Install Velero for automated backups
Velero provides comprehensive backup automation including persistent volumes, application manifests, and cross-cluster restore capabilities.
wget https://github.com/vmware-tanzu/velero/releases/download/v1.12.0/velero-v1.12.0-linux-amd64.tar.gz
tar -xzf velero-v1.12.0-linux-amd64.tar.gz
sudo mv velero-v1.12.0-linux-amd64/velero /usr/local/bin/
sudo chmod +x /usr/local/bin/velero
Configure S3-compatible backup storage
Set up credentials for your backup storage backend. This example uses AWS S3, but the same file format works with MinIO and other S3-compatible storage. Save the following as credentials-velero:
[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
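# Optional: velero install --secret-file (next step) creates this secret for you.
# To create it manually, the velero namespace must exist first.
kubectl create namespace velero
kubectl create secret generic cloud-credentials \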
--namespace velero \
--from-file cloud=credentials-velero
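To avoid pasting keys into a world-readable file, the credentials file can be written from environment variables with owner-only permissions. This is a sketch, not a Velero feature: write_velero_credentials is a hypothetical helper, and it assumes AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are already set in the shell.

```shell
# Sketch: write the Velero credentials file from environment variables.
# write_velero_credentials is a hypothetical helper; AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY are assumed to be set by the caller.
write_velero_credentials() {
  target="${1:-credentials-velero}"
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set" >&2
    return 1
  fi
  (
    umask 077   # a newly created file is readable by the owner only
    printf '[default]\naws_access_key_id = %s\naws_secret_access_key = %s\n' \
      "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY" > "$target"
  )
  chmod 600 "$target"   # also tighten permissions if the file already existed
}
```

Running write_velero_credentials with no argument produces credentials-velero in the current directory, which is the path the install command expects.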
Install Velero server components
Deploy Velero to your cluster with CSI snapshot plugin support and S3 backend configuration.
velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.8.0,velero/velero-plugin-for-csi:v0.6.0 \
--bucket your-backup-bucket \
--backup-location-config region=us-west-2 \
--snapshot-location-config region=us-west-2 \
--secret-file ./credentials-velero \
--features=EnableCSI
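For MinIO or another self-hosted S3-compatible endpoint, the backup location config usually needs an endpoint URL and path-style addressing in addition to the region. The sketch below assembles that value; backup_location_config is a hypothetical helper, while region, s3Url, and s3ForcePathStyle are configuration keys of the Velero AWS plugin. The endpoint URL in the usage note is a placeholder.

```shell
# Sketch: build the value passed to --backup-location-config.
# backup_location_config is a hypothetical helper name.
backup_location_config() {
  region="$1"
  s3_url="${2:-}"
  if [ -n "$s3_url" ]; then
    # Self-hosted S3-compatible endpoints usually need path-style addressing
    echo "region=${region},s3ForcePathStyle=true,s3Url=${s3_url}"
  else
    echo "region=${region}"
  fi
}
```

For example, --backup-location-config "$(backup_location_config minio http://minio.example.internal:9000)" would replace the region=us-west-2 value in the install command.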
Create backup schedule for persistent volumes
Configure automated daily backups that include both Kubernetes manifests and persistent volume snapshots.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *" # Daily at 2 AM UTC
  template:
    includedNamespaces:
      - default
      - production
      - staging
    excludedResources:
      - events
      - events.events.k8s.io
      - backups.velero.io
      - restores.velero.io
    snapshotVolumes: true
    includeClusterResources: true
    ttl: 720h # 30 days retention
    csiSnapshotTimeout: 10m
    itemOperationTimeout: 4h
kubectl apply -f backup-schedule.yaml
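Velero schedules use the standard five-field cron layout (minute, hour, day of month, month, day of week), and a field dropped by accident is easy to miss. The sketch below is a field-count check only and does not validate the values inside each field; has_five_cron_fields is a hypothetical helper name.

```shell
# Sketch: succeed only if the expression has exactly five whitespace-separated
# fields. This does not validate the contents of each field.
# has_five_cron_fields is a hypothetical helper name.
has_five_cron_fields() {
  printf '%s\n' "$1" | awk 'NR == 1 { ok = (NF == 5) } END { exit ok ? 0 : 1 }'
}
```

has_five_cron_fields "0 2 * * *" succeeds, while has_five_cron_fields "0 2 *" fails because the month and day-of-week fields are missing.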
Create backup policy with retention
Configure a second schedule with a longer retention period for long-term backups. The storageLocation below assumes a BackupStorageLocation named longterm-storage already exists (for example, one created with velero backup-location create).
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: weekly-longterm-backup
  namespace: velero
spec:
  schedule: "0 3 * * 0" # Weekly on Sunday at 3 AM UTC
  template:
    includedNamespaces:
      - production
    snapshotVolumes: true
    includeClusterResources: true
    ttl: 2160h # 90 days retention
    storageLocation: longterm-storage
    csiSnapshotTimeout: 15m
    metadata:
      labels:
        backup-type: longterm
kubectl apply -f retention-policy.yaml
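The ttl field is expressed in hours, so retention periods stated in days need converting: 30 days is 30 × 24 = 720h and 90 days is 90 × 24 = 2160h. A small sketch of that arithmetic follows; days_to_ttl is a hypothetical helper name.

```shell
# Sketch: convert a retention period in days to an hour-based ttl string.
# days_to_ttl is a hypothetical helper name.
days_to_ttl() {
  days="$1"
  case "$days" in
    ''|*[!0-9]*) echo "days must be a non-negative integer" >&2; return 1 ;;
  esac
  echo "$((days * 24))h"
}
```

days_to_ttl 30 prints 720h and days_to_ttl 90 prints 2160h, matching the values used in the two schedules.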
Configure backup monitoring and alerts
Set up monitoring for backup success and failure notifications using Velero's built-in metrics. Both resources below require the Prometheus Operator CRDs to be installed in the cluster.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: velero-metrics
  namespace: velero
  labels:
    app.kubernetes.io/name: velero
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: velero
  endpoints:
    - port: monitoring # Must match the named metrics port on your Velero Service
      path: /metrics
      interval: 30s
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: velero-backup-alerts
  namespace: velero
spec:
  groups:
    - name: velero.rules
      rules:
        - alert: VeleroBackupFailed
          # The metric is a counter, so alert on recent increases rather than its absolute value
          expr: increase(velero_backup_failure_total[1h]) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Velero backup failed"
            description: "Schedule {{ $labels.schedule }} had failed backups in the last hour"
kubectl apply -f backup-monitoring.yaml
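Before relying on the alert, it can be useful to read the failure counter directly from the metrics endpoint. The sketch below sums velero_backup_failure_total across all label sets from Prometheus text-format output on stdin; sum_backup_failures is a hypothetical helper, and the port-forward in the usage note assumes Velero serves metrics on its default port 8085.

```shell
# Sketch: sum velero_backup_failure_total across all label sets.
# Reads Prometheus text-format metrics on stdin; comment lines are skipped
# because they start with "#". sum_backup_failures is a hypothetical helper.
sum_backup_failures() {
  awk '/^velero_backup_failure_total[{ ]/ { total += $NF } END { print total + 0 }'
}
```

One possible way to feed it: run kubectl -n velero port-forward deploy/velero 8085:8085 in one terminal, then curl -s localhost:8085/metrics | sum_backup_failures in another.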
Test snapshot restore functionality
Verify that snapshots work correctly by restoring data from a previous snapshot.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  dataSource:
    name: test-snapshot-manual
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  storageClassName: gp3
kubectl apply -f restore-test.yaml
kubectl get pvc restored-pvc
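A bound PVC does not by itself prove the data came back, and with a storage class that uses volumeBindingMode: WaitForFirstConsumer the restored claim stays Pending until a pod mounts it. The sketch below prints a one-shot pod that mounts the claim and prints the test file; render_verify_pod and the pod name restore-check are hypothetical names chosen for illustration.

```shell
# Sketch: print a one-shot Pod manifest that mounts a PVC and cats the test
# file written by the test application. render_verify_pod and the pod name
# restore-check are hypothetical.
render_verify_pod() {
  claim="${1:-restored-pvc}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: restore-check
  namespace: default
spec:
  restartPolicy: Never
  containers:
    - name: reader
      image: nginx:1.24
      command: ["cat", "/data/test.txt"]
      volumeMounts:
        - name: data-volume
          mountPath: /data
  volumes:
    - name: data-volume
      persistentVolumeClaim:
        claimName: ${claim}
EOF
}
```

Apply it with render_verify_pod | kubectl apply -f -, then read the result with kubectl logs restore-check; the output should show the "Initial data" line captured at snapshot time.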
Verify your setup
Check that all components are running and snapshots are being created successfully.
# Check CSI snapshot controller
kubectl get pods -n kube-system | grep snapshot-controller
# Verify snapshot classes
kubectl get volumesnapshotclass
# Check manual snapshot status
kubectl get volumesnapshot test-snapshot-manual -o yaml
# Verify Velero installation
velero version
kubectl get pods -n velero
# Check backup schedules
velero schedule get
# List recent backups
velero backup get
# Check backup locations
velero backup-location get
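The individual checks above can be wrapped so that a single run reports which ones passed. This is a minimal sketch; run_check and the failed_checks counter are hypothetical names, and the helper only looks at each command's exit status.

```shell
# Sketch: run a command, print PASS or FAIL with a label, and count failures.
# run_check and failed_checks are hypothetical names.
failed_checks=0
run_check() {
  label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS $label"
  else
    echo "FAIL $label"
    failed_checks=$((failed_checks + 1))
  fi
}
```

For example, run_check "snapshot class" kubectl get volumesnapshotclass csi-snapshot-class followed by run_check "velero pods" kubectl get pods -n velero; afterwards, a non-zero failed_checks value indicates something needs attention.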
Automate cross-namespace backups
Create namespace-specific backup policies
Configure different backup strategies for development, staging, and production environments.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: production-backup
  namespace: velero
spec:
  schedule: "0 1,13 * * *" # Twice daily
  template:
    includedNamespaces:
      - production
    snapshotVolumes: true
    ttl: 168h # 7 days
---
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: development-backup
  namespace: velero
spec:
  schedule: "0 4 * * 1-5" # Weekdays only
  template:
    includedNamespaces:
      - development
      - staging
    snapshotVolumes: false # Skip volume snapshots for dev
    ttl: 72h # 3 days
kubectl apply -f namespace-backups.yaml
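When many namespaces need their own policy, generating the Schedule manifests keeps them consistent. The sketch below prints one Schedule for a single namespace; render_schedule is a hypothetical helper that covers only the fields used in the manifests above.

```shell
# Sketch: print a Velero Schedule manifest for one namespace.
# render_schedule is a hypothetical helper; arguments are
# name, cron expression, namespace, ttl, and (optionally) snapshotVolumes.
render_schedule() {
  name="$1"
  cron="$2"
  namespace="$3"
  ttl="$4"
  snapshots="${5:-true}"
  cat <<EOF
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: ${name}
  namespace: velero
spec:
  schedule: "${cron}"
  template:
    includedNamespaces:
      - ${namespace}
    snapshotVolumes: ${snapshots}
    ttl: ${ttl}
EOF
}
```

For example, render_schedule production-backup "0 1,13 * * *" production 168h | kubectl apply -f - creates the twice-daily production schedule.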
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Snapshot creation fails | CSI driver doesn't support snapshots | Verify driver compatibility: kubectl get csidriver |
| Velero backup hangs | CSI timeout too short | Increase csiSnapshotTimeout in backup spec |
| Restore fails with permissions | RBAC issues with the Velero service account | Check the binding created by velero install: kubectl describe clusterrolebinding velero |
| Snapshots consume too much storage | No retention policy configured | Set ttl in backup template and configure storage lifecycle |
| Backup location unreachable | Invalid S3 credentials or bucket | Test connectivity: velero backup-location get -o yaml |
| Monitoring alerts not firing | ServiceMonitor not discovered | Check Prometheus operator labels and selectors |
Next steps
- Implement ClickHouse backup automation with compression and S3 integration for database-specific strategies
- Set up MySQL backup monitoring with Prometheus alerts and Grafana dashboards for application-level backup monitoring
- Configure Kubernetes disaster recovery with cross-cluster replication for multi-region backup strategies
- Implement Kubernetes backup security with encryption and RBAC policies
- Set up Velero multi-cloud backup federation for vendor independence
Automated install script
Run this to automate the entire setup
#!/usr/bin/env bash
set -euo pipefail
# Kubernetes Persistent Volume Snapshots and Backup Automation Setup
# Production-quality installation script
# Color definitions
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Configuration variables
CSI_DRIVER="${CSI_DRIVER:-ebs.csi.aws.com}"
STORAGE_CLASS="${STORAGE_CLASS:-gp3}"
VELERO_VERSION="${VELERO_VERSION:-v1.12.0}"
SNAPSHOT_VERSION="${SNAPSHOT_VERSION:-release-6.3}"
# Function definitions
log_info() { echo -e "${BLUE}[INFO]${NC} $1"; }
log_success() { echo -e "${GREEN}[SUCCESS]${NC} $1"; }
log_warning() { echo -e "${YELLOW}[WARNING]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }
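cleanup_on_error() {
    log_error "Installation failed. Cleaning up..."
    rm -f /tmp/velero-*.tar.gz
    rm -rf /tmp/velero-*-linux-amd64
    # Remove only the manifests this script wrote, not every YAML file in /tmp
    rm -f /tmp/snapshot-class.yaml /tmp/test-app.yaml
    exit 1
}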
trap cleanup_on_error ERR
usage() {
    echo "Usage: $0 [OPTIONS]"
    echo "Options:"
    echo "  --csi-driver DRIVER     CSI driver name (default: ebs.csi.aws.com)"
    echo "  --storage-class CLASS   Storage class name (default: gp3)"
    echo "  --velero-version VER    Velero version (default: v1.12.0)"
    echo "  -h, --help              Show this help message"
    echo ""
    echo "Example:"
    echo "  $0 --csi-driver disk.csi.azure.com --storage-class managed-premium"
    exit 1
}
# Parse command line arguments
while [[ $# -gt 0 ]]; do
    case $1 in
        --csi-driver)
            CSI_DRIVER="$2"
            shift 2
            ;;
        --storage-class)
            STORAGE_CLASS="$2"
            shift 2
            ;;
        --velero-version)
            VELERO_VERSION="$2"
            shift 2
            ;;
        -h|--help)
            usage
            ;;
        *)
            log_error "Unknown option: $1"
            usage
            ;;
    esac
done
# Check prerequisites
echo "[1/9] Checking prerequisites..."
if [[ $EUID -ne 0 ]] && ! command -v sudo >/dev/null 2>&1; then
    log_error "This script requires root privileges or sudo access"
    exit 1
fi
SUDO=""
if [[ $EUID -ne 0 ]]; then
    SUDO="sudo"
fi
# Check for kubectl
if ! command -v kubectl >/dev/null 2>&1; then
    log_error "kubectl is required but not installed. Please install kubectl first."
    exit 1
fi
# Test kubectl connectivity
if ! kubectl cluster-info >/dev/null 2>&1; then
    log_error "Cannot connect to Kubernetes cluster. Please check your kubeconfig."
    exit 1
fi
log_success "Prerequisites check passed"
# Detect distribution
echo "[2/9] Detecting operating system..."
if [ -f /etc/os-release ]; then
    . /etc/os-release
    case "$ID" in
        ubuntu|debian)
            PKG_MGR="apt"
            PKG_UPDATE="apt update"
            PKG_INSTALL="apt install -y"
            ;;
        almalinux|rocky|centos|rhel|ol|fedora)
            # Refresh metadata only; "update -y" would upgrade every installed package
            PKG_MGR="dnf"
            PKG_UPDATE="dnf makecache"
            PKG_INSTALL="dnf install -y"
            if ! command -v dnf >/dev/null 2>&1; then
                PKG_MGR="yum"
                PKG_UPDATE="yum makecache"
                PKG_INSTALL="yum install -y"
            fi
            ;;
        amzn)
            PKG_MGR="yum"
            PKG_UPDATE="yum makecache"
            PKG_INSTALL="yum install -y"
            ;;
        *)
            log_error "Unsupported distribution: $ID"
            exit 1
            ;;
    esac
else
    log_error "Cannot detect operating system"
    exit 1
fi
log_success "Detected OS: $PRETTY_NAME using $PKG_MGR"
# Install required packages
echo "[3/9] Installing required packages..."
$SUDO $PKG_UPDATE >/dev/null 2>&1 || true
# The package names are the same on every supported distribution
$SUDO $PKG_INSTALL wget curl tar gzip >/dev/null 2>&1
log_success "Required packages installed"
# Install CSI snapshot CRDs
echo "[4/9] Installing CSI snapshot controller and CRDs..."
kubectl apply -f "https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/${SNAPSHOT_VERSION}/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml" >/dev/null
kubectl apply -f "https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/${SNAPSHOT_VERSION}/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml" >/dev/null
kubectl apply -f "https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/${SNAPSHOT_VERSION}/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml" >/dev/null
log_success "CSI snapshot CRDs installed"
# Deploy snapshot controller
echo "[5/9] Deploying snapshot controller..."
kubectl apply -f "https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/${SNAPSHOT_VERSION}/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml" >/dev/null
kubectl apply -f "https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/${SNAPSHOT_VERSION}/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml" >/dev/null
# Wait for snapshot controller to be ready
kubectl wait --for=condition=available --timeout=300s deployment/snapshot-controller -n kube-system >/dev/null
log_success "Snapshot controller deployed and ready"
# Create volume snapshot class
echo "[6/9] Creating volume snapshot class..."
cat > /tmp/snapshot-class.yaml <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-snapshot-class
  annotations:
    snapshot.storage.kubernetes.io/is-default-class: "true"
  labels:
    velero.io/csi-volumesnapshot-class: "true"
driver: ${CSI_DRIVER}
deletionPolicy: Delete
parameters:
  incremental: "true"
  encrypted: "true"
EOF
kubectl apply -f /tmp/snapshot-class.yaml >/dev/null
log_success "Volume snapshot class created"
# Install Velero
echo "[7/9] Installing Velero..."
cd /tmp
wget -q "https://github.com/vmware-tanzu/velero/releases/download/${VELERO_VERSION}/velero-${VELERO_VERSION}-linux-amd64.tar.gz"
tar -xzf "velero-${VELERO_VERSION}-linux-amd64.tar.gz"
$SUDO mv "velero-${VELERO_VERSION}-linux-amd64/velero" /usr/local/bin/
$SUDO chmod 755 /usr/local/bin/velero
$SUDO chown root:root /usr/local/bin/velero
# Cleanup temporary files
rm -f "velero-${VELERO_VERSION}-linux-amd64.tar.gz"
rm -rf "velero-${VELERO_VERSION}-linux-amd64"
log_success "Velero binary installed"
# Create test application
echo "[8/9] Creating test application with persistent storage..."
cat > /tmp/test-app.yaml <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: ${STORAGE_CLASS}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-app
  template:
    metadata:
      labels:
        app: test-app
    spec:
      containers:
        - name: test-container
          image: nginx:1.24
          volumeMounts:
            - name: data-volume
              mountPath: /data
          command: ["/bin/sh", "-c"]
          args:
            - |
              echo "Initial data: \$(date)" > /data/test.txt
              nginx -g "daemon off;"
      volumes:
        - name: data-volume
          persistentVolumeClaim:
            claimName: test-pvc
EOF
kubectl apply -f /tmp/test-app.yaml >/dev/null
# Wait for PVC to be bound
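# PVCs report binding in status.phase, not as a condition, so wait on the JSONPath value
kubectl wait --for=jsonpath='{.status.phase}'=Bound pvc/test-pvc -n default --timeout=300s >/dev/null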
log_success "Test application deployed"
# Verification
echo "[9/9] Verifying installation..."
# Check snapshot controller
if kubectl get deployment snapshot-controller -n kube-system >/dev/null 2>&1; then
    log_success "✓ Snapshot controller is deployed"
else
    log_error "✗ Snapshot controller deployment failed"
    exit 1
fi
# Check snapshot class
if kubectl get volumesnapshotclass csi-snapshot-class >/dev/null 2>&1; then
    log_success "✓ Volume snapshot class created"
else
    log_error "✗ Volume snapshot class creation failed"
    exit 1
fi
# Check Velero binary
if command -v velero >/dev/null 2>&1; then
    VELERO_VER=$(velero version --client-only 2>/dev/null | grep Version | cut -d: -f2 | tr -d ' ')
    log_success "✓ Velero installed: $VELERO_VER"
else
    log_error "✗ Velero installation failed"
    exit 1
fi
# Check test application
if kubectl get pvc test-pvc -o jsonpath='{.status.phase}' | grep -q "Bound"; then
    log_success "✓ Test application PVC is bound"
else
    log_warning "⚠ Test application PVC is not yet bound"
fi
# Cleanup temporary files
rm -f /tmp/snapshot-class.yaml /tmp/test-app.yaml
# Final instructions
echo ""
log_success "Installation completed successfully!"
echo ""
echo "Next steps:"
echo "1. Configure Velero with your cloud provider:"
echo " velero install --provider <provider> --bucket <bucket> ..."
echo ""
echo "2. Create a manual snapshot:"
# Print the example verbatim so it can be pasted into a shell with its YAML indentation intact
cat <<'SNAPSHOT_EXAMPLE'
kubectl create -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: test-snapshot
  namespace: default
spec:
  volumeSnapshotClassName: csi-snapshot-class
  source:
    persistentVolumeClaimName: test-pvc
EOF
SNAPSHOT_EXAMPLE
echo ""
echo "3. Create automated backups with Velero schedules"
echo ""
log_info "CSI Driver: $CSI_DRIVER"
log_info "Storage Class: $STORAGE_CLASS"
log_info "Velero Version: $VELERO_VERSION"
Review the script before running. Execute with: bash install.sh
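As part of that review, a syntax-only pass catches copy-and-paste damage before anything touches the cluster. The sketch below is one way to do it; preflight is a hypothetical helper, and it relies only on bash -n (parse without executing) and command -v.

```shell
# Sketch: parse the script without executing it, then warn if kubectl is absent.
# preflight is a hypothetical helper name.
preflight() {
  script="$1"
  if ! bash -n "$script"; then
    echo "syntax errors in $script" >&2
    return 1
  fi
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "warning: kubectl not found on PATH"
  fi
  echo "syntax ok: $script"
}
```

Run preflight install.sh first, then bash install.sh --csi-driver disk.csi.azure.com --storage-class managed-premium (or with no options to accept the defaults).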