FigJam Diagram: Velero — K3s Kubernetes Object Backup (expires 2026-04-13)
Velero provides cluster-wide backup of all Kubernetes objects (manifests, CRDs, Secrets, RBAC, etc.) to S3. This is Tier 4 of the backup pipeline — it complements etcd snapshots (Tier 1), PostgreSQL pg_dump (Tier 2), and Longhorn volume backups (Tier 3).
| Setting | Value |
|---|---|
| Namespace | velero |
| S3 Bucket | k3s-homelab-backups-855878721457 (prefix: velero-backups/) |
| Backend | AWS S3 plugin |
| Credentials | cloud-credentials secret in velero namespace |
| Included | Excluded |
|---|---|
| All namespaces except system namespaces | velero, kube-system, kube-node-lease, kube-public |
| All resource types | nodes, events, events.events.k8s.io, Velero internal resources |
| Cluster-scoped resources (CRDs, ClusterRoles, etc.) | — |
Note: Velero backs up Kubernetes objects, not volume data. PVC contents are backed up separately via Longhorn (Tier 3) and pg_dump (Tier 2). For a complete restore you need all four tiers.
| Schedule | Cron | TTL | Retention |
|---|---|---|---|
| k3s-daily-backup | `0 2 * * *` (2:00 AM UTC) | 720h | 30 days |
| k3s-weekly-backup | `0 3 * * 0` (3:00 AM UTC, Sundays) | 8640h | ~360 days (51 weeks) |
| k3s-monthly-backup | `0 4 1 * *` (4:00 AM UTC, 1st of month) | 8760h | 365 days (12 months) |
| velero-cleanup | `0 5 * * *` (5:00 AM UTC) | — | Enforces TTLs |
Note: The manifest comment on `k3s-weekly-backup` says "12 weeks", but the actual TTL value is 8640h = 360 days (~51 weeks). The comment is misleading; trust the TTL value.
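The schedules above are defined as Velero Schedule CRDs. A sketch of what the daily entry likely looks like, assembled from the values in the tables here (the actual manifest in kubernetes/apps/velero/backup-schedules.yaml may differ in detail):

```yaml
# Sketch of the daily Schedule CRD (assumed shape, not the deployed manifest).
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: k3s-daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"        # 2:00 AM UTC daily
  template:
    ttl: 720h0m0s              # 30-day retention
    includedNamespaces:
      - "*"
    excludedNamespaces:        # matches the exclusion table above
      - velero
      - kube-system
      - kube-node-lease
      - kube-public
    includeClusterResources: true
```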
| Secret | Keys | Purpose |
|---|---|---|
| cloud-credentials | AWS credentials file format | S3 access for backup storage |
Credentials are provisioned from Terraform output — never committed to git.
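For reference, the cloud-credentials secret conventionally wraps a standard AWS credentials file under a single key. The key name and values below are illustrative placeholders, not the provisioned material:

```yaml
# Illustrative only -- real values come from Terraform output, never from git.
apiVersion: v1
kind: Secret
metadata:
  name: cloud-credentials
  namespace: velero
type: Opaque
stringData:
  cloud: |                     # "cloud" is Velero's conventional key name
    [default]
    aws_access_key_id = <REDACTED>
    aws_secret_access_key = <REDACTED>
```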
A ServiceMonitor scrapes Velero's metrics endpoint, and a PrometheusRule defines the following alerts:
| Alert | Condition | Severity |
|---|---|---|
| VeleroBackupFailed | Any backup not in Completed status for 15m | Critical |
| VeleroBackupSizeHigh | Total backup size > 400 GB | Warning |
| VeleroBackupStorageAlmostFull | Total backup size > 80% of 500 GB limit | Warning |
| Metric | Description |
|---|---|
| velero_backup_last_status | Last backup status per schedule (1 = Completed, 0 = Failed) |
| velero_backup_size_bytes | Total bytes of backup data in storage |
| velero_backup_total | Count of backups by schedule |
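Given the metrics above, the VeleroBackupFailed alert could be expressed roughly as follows. This is a sketch, not the deployed rule; the actual PrometheusRule in backup-schedules.yaml may use a different expression, and the `schedule` label name is an assumption:

```yaml
# Sketch of the failure alert (assumed expression and label names).
groups:
  - name: velero
    rules:
      - alert: VeleroBackupFailed
        expr: velero_backup_last_status == 0   # 0 = last backup not Completed
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: "Velero backup failing for schedule {{ $labels.schedule }}"
```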
Grafana dashboard: No dashboard currently configured. Community dashboard gnetId: 15469 (Velero monitoring by mxpbt) can be added to kubernetes/apps/monitoring/prometheus-helm-values.yaml under grafana.dashboards.infrastructure. Covers backup success rate, duration, and size metrics — compatible with the existing Prometheus/ServiceMonitor setup.
```yaml
# In kubernetes/apps/monitoring/prometheus-helm-values.yaml
grafana:
  dashboards:
    infrastructure:
      velero:
        gnetId: 15469
        revision: 1
        datasource: Prometheus
```
```shell
# List all backups
kubectl exec -n velero deploy/velero -- velero backup get

# Check a specific backup
kubectl exec -n velero deploy/velero -- velero backup describe k3s-daily-backup-20260405020000

# List schedules
kubectl exec -n velero deploy/velero -- velero schedule get

# Trigger an immediate backup
kubectl exec -n velero deploy/velero -- velero backup create manual-$(date +%Y%m%d) \
  --include-namespaces='*' \
  --exclude-namespaces=velero,kube-system,kube-node-lease,kube-public

# Restore from a backup (dry-run first!)
kubectl exec -n velero deploy/velero -- velero restore create --from-backup k3s-daily-backup-20260405020000 \
  --include-namespaces=cardboard \
  --wait
```
Velero restores Kubernetes objects only. Restore PVC data via Longhorn (volume restore from S3 snapshot) and databases via pg_dump restore separately.
Quick reference: `velero backup get`, `velero restore create --from-backup <name>`, `velero restore describe <name>`.

kubernetes/apps/velero/backup-schedules.yaml -- Schedule CRDs (daily/weekly/monthly), velero-cleanup CronJob, PrometheusRule, ServiceMonitor