Set up Kubernetes persistent volume snapshots and backup automation

Advanced 45 min May 19, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Configure CSI snapshot controllers, create persistent volume snapshots, and implement automated backup strategies with Velero for production Kubernetes environments.

Prerequisites

  • Running Kubernetes cluster with CSI-compatible storage driver
  • kubectl configured with cluster admin access
  • S3-compatible storage bucket for backups
  • Basic understanding of Kubernetes persistent volumes

What this solves

Kubernetes persistent volumes contain critical application data that needs protection against data loss, corruption, or accidental deletion. This tutorial shows you how to implement production-grade backup and recovery using CSI volume snapshots and Velero automation. You'll create point-in-time snapshots of persistent volumes and automate backup workflows across multiple storage backends.

Step-by-step configuration

Install CSI snapshot controller and CRDs

The CSI snapshot controller manages volume snapshots across different storage drivers. Install the controller and custom resource definitions first.

kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/release-6.3/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/release-6.3/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/release-6.3/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml

Deploy the snapshot controller

Create the snapshot controller deployment to handle snapshot operations across your cluster.

kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/release-6.3/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/release-6.3/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml
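Before you create snapshot classes, it helps to confirm that the CRDs are registered and the controller has rolled out. The following is a minimal sketch, not part of the upstream manifests: the function name wait_for_snapshot_stack is hypothetical, and it assumes the controller Deployment is called snapshot-controller in the kube-system namespace.

```shell
# Sketch: block until the three snapshot CRDs are established and the
# controller Deployment has finished rolling out.
# Assumption: the Deployment is named snapshot-controller in kube-system.
wait_for_snapshot_stack() {
  for crd in volumesnapshotclasses volumesnapshotcontents volumesnapshots; do
    kubectl wait --for=condition=Established --timeout=60s \
      "crd/${crd}.snapshot.storage.k8s.io" || return 1
  done
  kubectl rollout status deployment/snapshot-controller \
    --namespace kube-system --timeout=120s
}

# Usage:
#   wait_for_snapshot_stack && echo "snapshot controller ready"
```

The function stops at the first CRD that fails to become established, so a non-zero exit status tells you the install did not complete.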

Create volume snapshot class

Define a snapshot class that specifies which CSI driver handles snapshots and, through deletionPolicy, what happens to the underlying storage snapshot when the VolumeSnapshot object is deleted.

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-snapshot-class
  annotations:
    snapshot.storage.kubernetes.io/is-default-class: "true"
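driver: ebs.csi.aws.com  # Replace with your CSI driver
deletionPolicy: Delete  # Use Retain to keep the storage snapshot after the VolumeSnapshot is deleted
parameters:
  # Parameters are driver-specific; confirm that your driver supports these keys
  incremental: "true"
  encrypted: "true"

Save the manifest as snapshot-class.yaml and apply it:

kubectl apply -f snapshot-class.yaml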
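Because the driver name differs per cluster, you may prefer to generate the manifest instead of editing it by hand. This is a small sketch under stated assumptions: render_snapshot_class is a hypothetical helper, and it leaves out the driver-specific parameters block so the output stays valid for any CSI driver. Run kubectl get csidriver to find the driver name your cluster uses.

```shell
# Sketch: print a VolumeSnapshotClass manifest for a given CSI driver.
#   $1 = CSI driver name (for example ebs.csi.aws.com)
#   $2 = deletion policy, Delete (default) or Retain
render_snapshot_class() {
  case "${2:-Delete}" in
    Delete|Retain) ;;
    *) echo "deletionPolicy must be Delete or Retain" >&2; return 1 ;;
  esac
  cat <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-snapshot-class
driver: $1
deletionPolicy: ${2:-Delete}
EOF
}

# Usage:
#   render_snapshot_class ebs.csi.aws.com Retain | kubectl apply -f -
```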

Create a test persistent volume and data

Set up a test application with persistent storage to demonstrate snapshot functionality.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: gp3  # Replace with your storage class
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-app
  template:
    metadata:
      labels:
        app: test-app
    spec:
      containers:
      - name: test-container
        image: nginx:1.24
        volumeMounts:
        - name: data-volume
          mountPath: /data
        command: ["/bin/sh", "-c"]
        args:
        - |
          echo "Initial data: $(date)" > /data/test.txt
          nginx -g "daemon off;"
      volumes:
      - name: data-volume
        persistentVolumeClaim:
          claimName: test-pvc

Save the manifest as test-app.yaml and apply it:

kubectl apply -f test-app.yaml

Create manual volume snapshot

Take a manual snapshot of the persistent volume to capture the current state.

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: test-snapshot-manual
  namespace: default
spec:
  volumeSnapshotClassName: csi-snapshot-class
  source:
    persistentVolumeClaimName: test-pvc

Save the manifest as manual-snapshot.yaml and apply it:

kubectl apply -f manual-snapshot.yaml
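A snapshot is only usable for restores once its status reports readyToUse: true, which can take a while on large volumes. The sketch below polls that field; the function name wait_for_snapshot_ready and the default of 30 checks at 10-second intervals are illustrative choices, not values from any tool.

```shell
# Sketch: poll a VolumeSnapshot until .status.readyToUse is true.
#   $1 = snapshot name, $2 = namespace (default: default)
#   $3 = number of checks (default: 30), $4 = seconds between checks (default: 10)
wait_for_snapshot_ready() (
  name=$1
  namespace=${2:-default}
  attempts=${3:-30}
  delay=${4:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    ready=$(kubectl get volumesnapshot "$name" --namespace "$namespace" \
      -o jsonpath='{.status.readyToUse}' 2>/dev/null)
    if [ "$ready" = "true" ]; then
      echo "snapshot $name is ready"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "snapshot $name not ready after $attempts checks" >&2
  return 1
)

# Usage:
#   wait_for_snapshot_ready test-snapshot-manual default
```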

Install Velero for automated backups

Velero provides comprehensive backup automation including persistent volumes, application manifests, and cross-cluster restore capabilities.

wget https://github.com/vmware-tanzu/velero/releases/download/v1.12.0/velero-v1.12.0-linux-amd64.tar.gz
tar -xzf velero-v1.12.0-linux-amd64.tar.gz
sudo mv velero-v1.12.0-linux-amd64/velero /usr/local/bin/
sudo chmod +x /usr/local/bin/velero
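The commands above hard-code the amd64 build. If you also run ARM nodes or workstations, a helper such as the following can pick the archive for the local machine. It is a sketch: velero_download_url is a hypothetical name, and it assumes the arm64 release assets follow the same naming pattern as the amd64 URL shown above.

```shell
# Sketch: build the Velero CLI download URL for a version and architecture.
#   $1 = release tag (for example v1.12.0)
#   $2 = machine type as reported by uname -m (defaults to the local machine)
velero_download_url() (
  version=$1
  machine=${2:-$(uname -m)}
  case "$machine" in
    x86_64|amd64) arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
  echo "https://github.com/vmware-tanzu/velero/releases/download/${version}/velero-${version}-linux-${arch}.tar.gz"
)

# Usage:
#   wget "$(velero_download_url v1.12.0)"
#   velero version --client-only   # confirm the binary runs after installing it
```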

Configure S3-compatible backup storage

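Set up credentials for your backup storage backend. This example uses AWS S3, but the same file format works with MinIO and other S3-compatible storage. Save the following as credentials-velero:

[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY

The velero install command in the next step reads this file through --secret-file and creates the cloud-credentials secret in the velero namespace for you. Create the secret manually only if you install Velero without that flag. The velero namespace must exist first:

kubectl create namespace velero
kubectl create secret generic cloud-credentials \
  --namespace velero \
  --from-file cloud=credentials-velero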

Install Velero server components

Deploy Velero to your cluster with CSI snapshot plugin support and S3 backend configuration.

velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.8.0,velero/velero-plugin-for-csi:v0.6.0 \
  --bucket your-backup-bucket \
  --backup-location-config region=us-west-2 \
  --snapshot-location-config region=us-west-2 \
  --secret-file ./credentials-velero \
  --features=EnableCSI
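After the install finishes, Velero validates the bucket and marks the backup storage location as Available or Unavailable. The sketch below checks that phase; backup_location_available is a hypothetical helper name, and "default" is the location name that velero install creates.

```shell
# Sketch: succeed only when the named BackupStorageLocation reports Available.
#   $1 = location name (default: default, the one velero install creates)
backup_location_available() (
  name=${1:-default}
  phase=$(kubectl get backupstoragelocation "$name" --namespace velero \
    -o jsonpath='{.status.phase}')
  echo "backup location $name: ${phase:-unknown}"
  [ "$phase" = "Available" ]
)

# Usage:
#   kubectl rollout status deployment/velero --namespace velero --timeout=180s
#   backup_location_available || velero backup-location get
```

An Unavailable phase usually points to wrong credentials, a wrong region, or a bucket that does not exist, so check those before scheduling backups.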

Create backup schedule for persistent volumes

Configure automated daily backups that include both Kubernetes manifests and persistent volume snapshots.

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM UTC
  template:
    includedNamespaces:
    - default
    - production
    - staging
    excludedResources:
    - events
    - events.events.k8s.io
    - backups.velero.io
    - restores.velero.io
    snapshotVolumes: true
    includeClusterResources: true
    ttl: 720h  # 30 days retention
    csiSnapshotTimeout: 10m
    itemOperationTimeout: 4h

Save the manifest as backup-schedule.yaml and apply it:

kubectl apply -f backup-schedule.yaml
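Velero schedules use standard five-field cron syntax (minute, hour, day of month, month, day of week), and a missing field is easy to overlook in a manifest. The following sketch is a quick sanity check on the field count only; is_five_field_cron is a hypothetical helper and does not validate the values inside each field.

```shell
# Sketch: succeed when a cron expression has exactly five whitespace-separated fields.
is_five_field_cron() (
  set -f        # disable globbing so '*' stays literal while splitting
  set -- $1     # split the expression into positional parameters
  [ "$#" -eq 5 ]
)

# Usage:
#   is_five_field_cron "0 2 * * *" && echo "schedule looks well-formed"
#   velero backup create --from-schedule daily-backup   # run one backup now to test the template
```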

Create backup policy with retention

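Add a second schedule with a longer retention period for long-term backups. The storageLocation field must name an existing BackupStorageLocation; this example assumes one called longterm-storage has already been created.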

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: weekly-longterm-backup
  namespace: velero
spec:
  schedule: "0 3 * * 0"  # Weekly on Sunday at 3 AM UTC
  template:
    includedNamespaces:
    - production
    snapshotVolumes: true
    includeClusterResources: true
    ttl: 2160h  # 90 days retention
    storageLocation: longterm-storage
    csiSnapshotTimeout: 15m
    metadata:
      labels:
        backup-type: longterm

Save the manifest as retention-policy.yaml and apply it:

kubectl apply -f retention-policy.yaml
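The weekly schedule writes to a backup storage location named longterm-storage, which the default install does not create. One way to add it is sketched below. The wrapper name create_longterm_location is hypothetical, the bucket name is a placeholder you must replace, and the region default simply mirrors the one used earlier in this tutorial.

```shell
# Sketch: register a second backup storage location for long-term backups.
#   $1 = bucket name (placeholder, supply your own)
#   $2 = region (default: us-west-2, as used in the install step)
create_longterm_location() {
  velero backup-location create longterm-storage \
    --provider aws \
    --bucket "$1" \
    --config "region=${2:-us-west-2}"
}

# Usage:
#   create_longterm_location your-longterm-bucket
#   velero backup-location get
```

Using a separate bucket for long-term backups lets you apply a different storage class or lifecycle rule to it without affecting the daily backups.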

Configure backup monitoring and alerts

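Set up monitoring for backup success and failure notifications using Velero's built-in metrics. ServiceMonitor and PrometheusRule are Prometheus Operator resources, so the operator's CRDs must be installed in the cluster.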

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: velero-metrics
  namespace: velero
  labels:
    app.kubernetes.io/name: velero
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: velero
  endpoints:
  - port: monitoring  # Must match the named port on the Service that exposes Velero's metrics
    path: /metrics
    interval: 30s
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: velero-backup-alerts
  namespace: velero
spec:
  groups:
  - name: velero.rules
    rules:
    - alert: VeleroBackupFailed
      expr: increase(velero_backup_failure_total[1h]) > 0  # The raw counter never resets, so alert on recent increases
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Velero backup failed"
        description: "Backup {{ $labels.schedule }} failed with {{ $value }} failures"

Save the manifest as backup-monitoring.yaml and apply it:

kubectl apply -f backup-monitoring.yaml
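To check the metric behind the alert without waiting for Prometheus, you can read Velero's metrics endpoint directly and total the failure counter across schedules. This is a sketch: sum_backup_failures is a hypothetical helper, and the usage lines assume the Velero server exposes metrics on port 8085.

```shell
# Sketch: read Prometheus text-format metrics on stdin and print the sum of
# all velero_backup_failure_total samples (one sample per schedule label).
sum_backup_failures() {
  awk '/^velero_backup_failure_total[{ ]/ { total += $NF }
       END { print total + 0 }'
}

# Usage (port 8085 is an assumption about your Velero deployment):
#   kubectl port-forward --namespace velero deployment/velero 8085:8085 &
#   curl -s http://localhost:8085/metrics | sum_backup_failures
```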

Test snapshot restore functionality

Verify that snapshots work correctly by restoring data from a previous snapshot.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  dataSource:
    name: test-snapshot-manual
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  storageClassName: gp3

Save the manifest as restore-test.yaml, apply it, and check the new claim:

kubectl apply -f restore-test.yaml
kubectl get pvc restored-pvc

Verify your setup

Check that all components are running and snapshots are being created successfully.

# Check CSI snapshot controller
kubectl get pods -n kube-system | grep snapshot-controller

# Verify snapshot classes
kubectl get volumesnapshotclass

# Check manual snapshot status
kubectl get volumesnapshot test-snapshot-manual -o yaml

# Verify Velero installation
velero version
kubectl get pods -n velero

# Check backup schedules
velero schedule get

# List recent backups
velero backup get

# Check backup locations
velero backup-location get

Automate cross-namespace backups

Create namespace-specific backup policies

Configure different backup strategies for development, staging, and production environments.

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: production-backup
  namespace: velero
spec:
  schedule: "0 1,13 * * *"  # Twice daily at 1 AM and 1 PM UTC
  template:
    includedNamespaces:
    - production
    snapshotVolumes: true
    ttl: 168h  # 7 days
---
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: development-backup
  namespace: velero
spec:
  schedule: "0 4 * * 1-5"  # Weekdays only at 4 AM UTC
  template:
    includedNamespaces:
    - development
    - staging
    snapshotVolumes: false  # Skip volume snapshots for dev
    ttl: 72h  # 3 days

Save the manifest as namespace-backups.yaml and apply it:

kubectl apply -f namespace-backups.yaml
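Velero expresses retention as a duration in hours, so the schedules in this tutorial use 72h, 168h, 720h, and 2160h for 3, 7, 30, and 90 days. A one-line helper keeps that arithmetic out of your head; days_to_ttl is a hypothetical name used only for this sketch.

```shell
# Sketch: convert a retention period in whole days to a Velero ttl value in hours.
days_to_ttl() {
  echo "$(( $1 * 24 ))h"
}

# Usage:
#   days_to_ttl 7    # prints 168h
#   days_to_ttl 30   # prints 720h
```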

Common issues

Symptom | Cause | Fix
Snapshot creation fails | CSI driver doesn't support snapshots | Verify driver compatibility: kubectl get csidriver
Velero backup hangs | CSI timeout too short | Increase csiSnapshotTimeout in the backup spec
Restore fails with permissions | RBAC issues with CSI driver | Check service account permissions: kubectl describe clusterrolebinding velero
Snapshots consume too much storage | No retention policy configured | Set ttl in the backup template and configure a storage lifecycle policy
Backup location unreachable | Invalid S3 credentials or bucket | Test connectivity: velero backup-location get -o yaml
Monitoring alerts not firing | ServiceMonitor not discovered | Check Prometheus Operator labels and selectors
