K3s etcd Snapshots
Overview
K3s uses an embedded etcd database to store cluster state. Regular snapshots ensure you can recover the control plane in case of cluster failure. This layer protects your Kubernetes API objects, cluster configuration, and control plane state.
Configuration
The etcd backup is configured via Ansible automation. The configuration is
stored in /etc/rancher/k3s/config.yaml on each control plane node:
etcd-s3: true
etcd-s3-bucket: k3s-backup-repository
etcd-s3-folder: k3s-etcd-snapshots
etcd-s3-endpoint: '<YOUR_ACCOUNT_ID>.r2.cloudflarestorage.com'
etcd-s3-access-key: 'YOUR_R2_ACCESS_KEY_ID'
etcd-s3-secret-key: 'YOUR_R2_SECRET_ACCESS_KEY'
etcd-snapshot-schedule-cron: '0 1 * * *'
etcd-snapshot-retention: 5
Configuration Parameters
- etcd-s3: Enable S3-compatible storage for etcd snapshots
- etcd-s3-bucket: Your Cloudflare R2 bucket name
- etcd-s3-folder: Folder path within the bucket for snapshots
- etcd-s3-endpoint: R2 endpoint (a hostname, as in the example above)
- etcd-s3-access-key: R2 access key ID
- etcd-s3-secret-key: R2 secret access key
- etcd-snapshot-schedule-cron: Cron expression for automatic snapshots (set here to daily at 1:00 AM; the K3s default is every 12 hours)
- etcd-snapshot-retention: Number of snapshots to retain (default: 5)
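A quick way to catch a missing setting before restarting k3s is to scan the config file for the keys this setup relies on. The following is a minimal sketch, not part of K3s; the function name `check_etcd_s3_config` is hypothetical, and it only checks that each key is present at the start of a line, not that its value is valid.

```shell
#!/usr/bin/env bash
# Report which etcd S3 settings are missing from a K3s config file.
# Prints each missing key on its own line; returns non-zero if any are missing.
check_etcd_s3_config() {
  local config_file="$1" missing=0 key
  for key in etcd-s3 etcd-s3-bucket etcd-s3-endpoint \
             etcd-s3-access-key etcd-s3-secret-key; do
    # Match the key at the start of a line, immediately followed by a colon,
    # so "etcd-s3:" does not match "etcd-s3-bucket:".
    if ! grep -Eq "^${key}:" "$config_file"; then
      echo "missing: ${key}"
      missing=1
    fi
  done
  return "$missing"
}
```

On a control plane node this could be called as `check_etcd_s3_config /etc/rancher/k3s/config.yaml` (with sudo if the file is only readable by root).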
Ansible Automation
If you use Ansible for automation, create a playbook to configure all master nodes. Here's an example playbook structure:
---
- name: Configure K3s server node
  hosts: master_nodes
  become: true
  tasks:
    - name: Ensure K3s config directory exists
      file:
        path: /etc/rancher/k3s
        state: directory
        owner: root
        group: root
        mode: '0755'
    - name: Place K3s config file with etcd backup settings
      copy:
        dest: /etc/rancher/k3s/config.yaml
        owner: root
        group: root
        mode: '0600'  # root-only: the file contains S3 credentials
        content: |
          etcd-s3: true
          etcd-s3-bucket: your-backup-bucket
          etcd-s3-folder: k3s-etcd-snapshots
          etcd-s3-endpoint: "<YOUR_ACCOUNT_ID>.r2.cloudflarestorage.com"
          etcd-s3-access-key: "YOUR_R2_ACCESS_KEY_ID"
          etcd-s3-secret-key: "YOUR_R2_SECRET_ACCESS_KEY"
          etcd-snapshot-schedule-cron: "0 1 * * *"
          etcd-snapshot-retention: 5
    - name: Restart K3s to apply configuration
      systemd:
        name: k3s
        state: restarted
Run the playbook:
ansible-playbook -i inventory.yml playbooks/etcd-cloudflare-r2.yaml --ask-become-pass
This playbook:
- Creates the /etc/rancher/k3s directory if it doesn't exist
- Places the configuration file with R2 credentials
- Restarts the k3s service to apply changes
Manual Configuration
If you prefer to configure manually on each control plane node:
1. Create the config directory:
    sudo mkdir -p /etc/rancher/k3s
2. Create the config file:
    sudo nano /etc/rancher/k3s/config.yaml
3. Add the configuration (see Configuration section above)
4. Restart k3s:
    sudo systemctl restart k3s
Manual Snapshot
You can trigger a manual snapshot at any time:
sudo k3s etcd-snapshot save
This creates an immediate snapshot and uploads it to your R2 bucket.
Snapshot with Custom Name
sudo k3s etcd-snapshot save --name my-custom-snapshot-name
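When snapshots are taken by hand before risky changes, a consistent naming scheme makes them easier to find later. The helper below is a small sketch; the function name `snapshot_name` and the `prefix-YYYYMMDD-HHMMSS` format are assumptions chosen for illustration. K3s may add its own suffix to the final file name, so treat the date stamp as a readability aid only.

```shell
#!/usr/bin/env bash
# Build a human-readable snapshot name from a prefix and the current UTC time.
# The prefix defaults to "manual" when no argument is given.
snapshot_name() {
  local prefix="${1:-manual}"
  printf '%s-%s\n' "$prefix" "$(date -u +%Y%m%d-%H%M%S)"
}
```

The output of `snapshot_name pre-upgrade` can then be passed as the custom snapshot name to `k3s etcd-snapshot save`.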
Verification
1. Check snapshot files in R2:
   - Log into Cloudflare dashboard
   - Navigate to your R2 bucket
   - Check the k3s-etcd-snapshots folder for snapshot files
2. List local snapshots:
    sudo k3s etcd-snapshot list
3. Check snapshot schedule:
    sudo grep etcd-snapshot-schedule /etc/rancher/k3s/config.yaml
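The manual checks above can be complemented by a freshness check that a monitoring job could run. This is a sketch under assumptions: the function name `has_recent_snapshot` is hypothetical, it relies on GNU `find`, and it only looks at file modification times in a local directory. It does not inspect the R2 bucket.

```shell
#!/usr/bin/env bash
# Succeed if DIR contains at least one regular file modified within the
# last MAX_HOURS hours (default 25, a little more than one daily interval).
has_recent_snapshot() {
  local dir="$1" max_hours="${2:-25}"
  local max_minutes=$(( max_hours * 60 ))
  # -mmin -N matches files modified less than N minutes ago;
  # -quit stops at the first match.
  [ -n "$(find "$dir" -maxdepth 1 -type f -mmin "-${max_minutes}" -print -quit 2>/dev/null)" ]
}
```

With the default K3s data directory, local snapshots are kept in /var/lib/rancher/k3s/server/db/snapshots, so a check might look like `sudo bash -c '...; has_recent_snapshot /var/lib/rancher/k3s/server/db/snapshots 25'` with the function definition in place of the ellipsis.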
Restore from etcd Snapshot
To restore a cluster from an etcd snapshot:
Prerequisites
- Fresh k3s installation (or cluster reset)
- Access to R2 bucket with snapshots
- R2 credentials
Restore Procedure
1. List available snapshots in R2:
   - Check your Cloudflare R2 bucket
   - Note the snapshot file name
2. Stop the k3s service if it is running:
    sudo systemctl stop k3s
3. On a fresh k3s installation, restore the snapshot. The --cluster-reset flag is required for --cluster-reset-restore-path to take effect:
    sudo k3s server \
      --cluster-init \
      --cluster-reset \
      --etcd-s3 \
      --etcd-s3-bucket k3s-backup-repository \
      --etcd-s3-folder k3s-etcd-snapshots \
      --etcd-s3-endpoint "<YOUR_ACCOUNT_ID>.r2.cloudflarestorage.com" \
      --etcd-s3-access-key "YOUR_R2_ACCESS_KEY_ID" \
      --etcd-s3-secret-key "YOUR_R2_SECRET_ACCESS_KEY" \
      --cluster-reset-restore-path <snapshot-name>
4. When the restore finishes and the command exits, start k3s normally:
    sudo systemctl start k3s
5. Verify cluster state:
    sudo k3s kubectl get nodes
    sudo k3s kubectl get pods --all-namespaces
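After a restore, the API server and nodes can take some time to report healthy, so the verification commands may fail on the first try. A generic retry helper is one way to handle this. The sketch below is an illustration only; the function name `retry` and its argument order (attempts, delay in seconds, then the command) are assumptions.

```shell
#!/usr/bin/env bash
# Run a command repeatedly until it succeeds or the attempt limit is reached.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    # Stop as soon as the command succeeds.
    "$@" && return 0
    # Sleep between attempts, but not after the last one.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
  done
  return 1
}
```

For example, `retry 30 10 sudo k3s kubectl wait --for=condition=Ready nodes --all --timeout=10s` would poll for up to roughly ten minutes before giving up.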
Cluster Reset
If you need to wipe a node completely before restoring from a snapshot, uninstall K3s first. This removes K3s and its local data from the node:
sudo k3s-killall.sh
sudo k3s-uninstall.sh
# Then reinstall and restore as shown above
Troubleshooting
Snapshot Not Created
1. Verify k3s config:
    sudo cat /etc/rancher/k3s/config.yaml
2. Check k3s logs:
    sudo journalctl -u k3s -f
3. Test R2 connectivity:
   - Verify R2 credentials are correct
   - Check network connectivity to R2 endpoint
   - Verify bucket exists and is accessible
Snapshot Upload Fails
1. Check R2 credentials:
   - Verify access key and secret key are correct
   - Ensure the API token has Object Read & Write permissions
2. Verify bucket configuration:
   - Check bucket name matches configuration
   - Verify endpoint URL is correct
3. Check network:
    curl -I https://<ACCOUNT_ID>.r2.cloudflarestorage.com
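To test the endpoint that k3s is actually configured with, rather than retyping it, the hostname can be read from the config file. This is a minimal sketch: the function name `s3_endpoint_from_config` is hypothetical, and the parsing assumes the simple one-line `key: value` layout shown in this document rather than full YAML.

```shell
#!/usr/bin/env bash
# Print the etcd-s3-endpoint value from a K3s config file, without quotes.
s3_endpoint_from_config() {
  local config_file="$1"
  # Strip the key and any following whitespace, drop single and double
  # quotes, and keep only the first match.
  sed -n -E 's/^etcd-s3-endpoint:[[:space:]]*//p' "$config_file" \
    | tr -d "\"'" \
    | head -n 1
}
```

The result can be fed to the connectivity check, for example `curl -sS -I "https://$(s3_endpoint_from_config /etc/rancher/k3s/config.yaml)"`.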
Snapshot Not Scheduled
1. Verify cron expression:
    sudo grep etcd-snapshot-schedule-cron /etc/rancher/k3s/config.yaml
2. Check k3s service status:
    sudo systemctl status k3s
3. Review k3s logs for snapshot activity:
    sudo journalctl -u k3s | grep etcd-snapshot
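A common cause of a schedule that never fires is a malformed cron expression. The sketch below performs only a coarse sanity check, assuming the five-field format used in this document's example (`0 1 * * *`). The function name `is_five_field_cron` is hypothetical, and the check does not validate the contents of each field.

```shell
#!/usr/bin/env bash
# Succeed if the argument splits into exactly five whitespace-separated fields.
is_five_field_cron() {
  local expr="$1"
  local -a fields
  # read -a splits on whitespace without glob-expanding "*".
  read -r -a fields <<< "$expr"
  [ "${#fields[@]}" -eq 5 ]
}
```

For instance, `is_five_field_cron '0 1 * * *'` succeeds, while an expression with a missing field such as `'0 1 * *'` fails.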
Best Practices
- Regular Testing: Periodically test restoring from snapshots to ensure they work
- Monitor Retention: Adjust retention based on your needs (default: 5 snapshots)
- Secure Credentials: Store R2 credentials securely, consider using Vault
- Document Snapshots: Keep a log of important snapshots (e.g., before major upgrades)
- Multiple Buckets: Consider separate buckets for different environments
References
- K3s etcd documentation: https://docs.k3s.io/backup-restore
- K3s backup guide: https://docs.k3s.io/backup-restore/backup