K3s etcd Snapshots

Overview

When K3s runs with embedded etcd, it stores cluster state in that database. Regular snapshots let you recover the control plane after a cluster failure. This layer protects your Kubernetes API objects, cluster configuration, and control plane state.

Configuration

The etcd backup is configured via Ansible automation. The configuration is stored in /etc/rancher/k3s/config.yaml on each control plane node:

etcd-s3: true
etcd-s3-bucket: k3s-backup-repository
etcd-s3-folder: k3s-etcd-snapshots
etcd-s3-endpoint: '<YOUR_ACCOUNT_ID>.r2.cloudflarestorage.com'
etcd-s3-access-key: 'YOUR_R2_ACCESS_KEY_ID'
etcd-s3-secret-key: 'YOUR_R2_SECRET_ACCESS_KEY'
etcd-snapshot-schedule-cron: '0 1 * * *'
etcd-snapshot-retention: 5

Configuration Parameters

- etcd-s3: Enable S3-compatible storage for etcd snapshots
- etcd-s3-bucket: Your Cloudflare R2 bucket name
- etcd-s3-folder: Folder path within the bucket for snapshots
- etcd-s3-endpoint: R2 endpoint host, written without the https:// scheme
- etcd-s3-access-key: R2 access key ID
- etcd-s3-secret-key: R2 secret access key
- etcd-snapshot-schedule-cron: Cron expression for automatic snapshots (here: daily at 01:00; the K3s default is 0 */12 * * *, every 12 hours)
- etcd-snapshot-retention: Number of snapshots to retain (default: 5)
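
A quick way to catch a mistyped or forgotten key is to check the file for each parameter listed above. The following is a minimal sketch, not part of K3s: the function name missing_etcd_s3_keys is hypothetical, and it only checks that each key is present at the start of a line, not that its value is valid.

```shell
# Print every etcd-s3 key used on this page that is absent from a K3s config file.
# Returns non-zero when at least one key is missing.
missing_etcd_s3_keys() {
  config_file="$1"
  missing=0
  for key in etcd-s3 etcd-s3-bucket etcd-s3-folder etcd-s3-endpoint \
             etcd-s3-access-key etcd-s3-secret-key; do
    # Match the key at the start of a line, followed directly by a colon,
    # so "etcd-s3:" does not match "etcd-s3-bucket:".
    if ! grep -q "^${key}:" "$config_file"; then
      echo "missing: ${key}"
      missing=1
    fi
  done
  return "$missing"
}

# Example (run as root if the file is not world-readable):
#   missing_etcd_s3_keys /etc/rancher/k3s/config.yaml
```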

Ansible Automation

If you use Ansible for automation, create a playbook to configure all master nodes. Here's an example playbook structure:

---
- name: Configure K3s server node
  hosts: master_nodes
  become: true
  tasks:
    - name: Ensure K3s config directory exists
      file:
        path: /etc/rancher/k3s
        state: directory
        owner: root
        group: root
        mode: '0755'

    - name: Place K3s config file with etcd backup settings
      copy:
        dest: /etc/rancher/k3s/config.yaml
        owner: root
        group: root
        mode: '0600'  # the file holds S3 credentials, so restrict it to root
        content: |
          etcd-s3: true
          etcd-s3-bucket: your-backup-bucket
          etcd-s3-folder: k3s-etcd-snapshots
          etcd-s3-endpoint: "<YOUR_ACCOUNT_ID>.r2.cloudflarestorage.com"
          etcd-s3-access-key: "YOUR_R2_ACCESS_KEY_ID"
          etcd-s3-secret-key: "YOUR_R2_SECRET_ACCESS_KEY"
          etcd-snapshot-schedule-cron: "0 1 * * *"
          etcd-snapshot-retention: 5

    - name: Restart K3s to apply configuration
      systemd:
        name: k3s
        state: restarted

Run the playbook:

ansible-playbook -i inventory.yml playbooks/etcd-cloudflare-r2.yaml --ask-become-pass

This playbook:

- Creates the /etc/rancher/k3s directory if it doesn't exist
- Places the configuration file with R2 credentials
- Restarts the k3s service to apply changes

Manual Configuration

If you prefer to configure manually on each control plane node:

Create the config directory:
```
sudo mkdir -p /etc/rancher/k3s
```
Create the config file:
```
sudo nano /etc/rancher/k3s/config.yaml
```
Add the configuration (see Configuration section above)
Restart k3s:
```
sudo systemctl restart k3s
```
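
Because the config file contains the R2 access key and secret key, it is worth restricting it to its owner after editing. This is a minimal sketch under stated assumptions: the function name secure_k3s_config is hypothetical, and it relies on GNU coreutils stat, which is standard on the Linux hosts that run K3s.

```shell
# Restrict a K3s config file to its owner and confirm the resulting mode.
# Assumes GNU coreutils `stat` (the -c '%a' form prints the octal mode).
secure_k3s_config() {
  config_file="$1"
  chmod 600 "$config_file" || return 1
  mode=$(stat -c '%a' "$config_file")
  # Succeed only if the file really ended up as 600.
  [ "$mode" = "600" ]
}

# Example (needs root for the real file):
#   secure_k3s_config /etc/rancher/k3s/config.yaml
```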

Manual Snapshot

You can trigger a manual snapshot at any time:

sudo k3s etcd-snapshot save

This creates an immediate snapshot and uploads it to your R2 bucket.

Snapshot with Custom Name

sudo k3s etcd-snapshot save --name my-custom-snapshot-name

Verification

Check snapshot files in R2:
- Log into Cloudflare dashboard
- Navigate to your R2 bucket
- Check the k3s-etcd-snapshots folder for snapshot files
List local snapshots:
```
sudo k3s etcd-snapshot list
```

Check snapshot schedule:

sudo grep etcd-snapshot-schedule /etc/rancher/k3s/config.yaml
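
If you want the bare cron expression, for example to compare it across nodes, the value can be pulled out of the file and stripped of its quotes. This is a rough sketch; the function name get_snapshot_cron is hypothetical, and it assumes the key sits on a single line in the flat form shown in the Configuration section.

```shell
# Print the etcd-snapshot-schedule-cron value from a K3s config file,
# with surrounding single or double quotes removed.
get_snapshot_cron() {
  sed -n "s/^etcd-snapshot-schedule-cron:[[:space:]]*//p" "$1" | tr -d "\"'"
}

# Example (run as root if the file is not world-readable):
#   get_snapshot_cron /etc/rancher/k3s/config.yaml
```

The five fields of the expression are minute, hour, day of month, month, and day of week, so 0 1 * * * means 01:00 every day.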

Restore from etcd Snapshot

To restore a cluster from an etcd snapshot:

Prerequisites

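- Fresh k3s installation (or cluster reset)
- Access to R2 bucket with snapshots
- R2 credentials
- The server token of the original cluster (K3s needs it to read the bootstrap data stored in the snapshot)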

Restore Procedure

List available snapshots in R2:
- Check your Cloudflare R2 bucket
- Note the snapshot file name

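On the fresh k3s installation, stop the k3s service first (sudo systemctl stop k3s), then restore the snapshot. The --cluster-reset-restore-path flag only works together with --cluster-reset. If the fresh installation generated a different server token, also pass the original cluster's token with --token.

sudo k3s server \
  --cluster-reset \
  --etcd-s3 \
  --etcd-s3-bucket k3s-backup-repository \
  --etcd-s3-folder k3s-etcd-snapshots \
  --etcd-s3-endpoint "<YOUR_ACCOUNT_ID>.r2.cloudflarestorage.com" \
  --etcd-s3-access-key "YOUR_R2_ACCESS_KEY_ID" \
  --etcd-s3-secret-key "YOUR_R2_SECRET_ACCESS_KEY" \
  --cluster-reset-restore-path <snapshot-name>

The command exits once the restore is complete. Start the service again with sudo systemctl start k3s.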

Verify cluster state:

sudo k3s kubectl get nodes
sudo k3s kubectl get pods --all-namespaces
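
To turn the node check into a pass/fail result, for instance in a restore drill script, the kubectl output can be piped through a small filter. This is a sketch only: the function name all_nodes_ready is hypothetical, and it deliberately treats any status other than exactly Ready (including Ready,SchedulingDisabled) as a failure.

```shell
# Read `kubectl get nodes --no-headers` output on stdin.
# Succeed only if there is at least one node and every node's STATUS column is "Ready".
all_nodes_ready() {
  awk '$2 != "Ready" { bad = 1 } END { exit (bad || NR == 0) }'
}

# Example:
#   sudo k3s kubectl get nodes --no-headers | all_nodes_ready && echo "all nodes Ready"
```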

Cluster Reset

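If you need to reset the cluster to restore from a snapshot, uninstall K3s first. This removes the node's local K3s data, including local snapshot copies, so confirm the snapshot exists in R2 before running it: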

sudo k3s-killall.sh
sudo k3s-uninstall.sh
# Then reinstall and restore as shown above

Troubleshooting

Snapshot Not Created

Verify k3s config:
```
sudo cat /etc/rancher/k3s/config.yaml
```
Check k3s logs:
```
sudo journalctl -u k3s -f
```
Test R2 connectivity:
- Verify R2 credentials are correct
- Check network connectivity to R2 endpoint
- Verify bucket exists and is accessible

Snapshot Upload Fails

Check R2 credentials:
- Verify access key and secret key are correct
- Ensure the API token has Object Read & Write permissions
Verify bucket configuration:
- Check bucket name matches configuration
- Verify endpoint URL is correct

Check network:

curl -I https://<YOUR_ACCOUNT_ID>.r2.cloudflarestorage.com
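
The network check can be wrapped so that it times out instead of hanging and reports a clear yes or no. This is a minimal sketch: the function names r2_endpoint_url and r2_reachable are hypothetical, and the 10-second timeout is an arbitrary choice. An unauthenticated request is expected to return an HTTP error status, which still shows that DNS, TLS, and routing to the endpoint work.

```shell
# Build the R2 S3 endpoint URL for a Cloudflare account ID.
r2_endpoint_url() {
  printf 'https://%s.r2.cloudflarestorage.com\n' "$1"
}

# Succeed if the endpoint answers with any HTTP status within 10 seconds.
r2_reachable() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$(r2_endpoint_url "$1")") || true
  # curl prints 000 when no HTTP response was received at all.
  [ -n "$code" ] && [ "$code" != "000" ]
}

# Example:
#   r2_reachable "<YOUR_ACCOUNT_ID>" && echo "R2 endpoint reachable"
```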

Snapshot Not Scheduled

Verify cron expression:

sudo grep etcd-snapshot-schedule-cron /etc/rancher/k3s/config.yaml

Check k3s service status:
```
sudo systemctl status k3s
```

Review k3s logs for snapshot activity:

sudo journalctl -u k3s | grep etcd-snapshot
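
On a busy node the full log can be long, so it may help to limit the time window and keep only snapshot lines that mention an error or failure. This is a rough sketch: the function name snapshot_errors is hypothetical, and because the exact wording of K3s log messages varies between versions, the match on "error" and "fail" is a heuristic rather than a complete filter.

```shell
# Filter log text on stdin down to snapshot-related lines that mention an error or failure.
# Exits non-zero when nothing matches, which here means no problems were found.
snapshot_errors() {
  grep -i 'snapshot' | grep -iE 'error|fail'
}

# Example:
#   sudo journalctl -u k3s --since "24 hours ago" --no-pager | snapshot_errors
```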

Best Practices

- Regular Testing: Periodically restore a snapshot to a test cluster to confirm that it works
- Monitor Retention: Adjust retention based on your needs (default: 5 snapshots)
- Secure Credentials: Store R2 credentials securely, for example in Vault
- Document Snapshots: Keep a log of important snapshots (e.g., before major upgrades)
- Multiple Buckets: Consider separate buckets for different environments
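
For the snapshot log mentioned above, a plain tab-separated file is often enough. This is a minimal sketch, not a K3s feature: the function name record_snapshot, the file layout, and the example path are all assumptions.

```shell
# Append one line per notable snapshot: UTC timestamp, snapshot name, free-text reason.
record_snapshot() {
  log_file="$1"
  name="$2"
  reason="$3"
  printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$name" "$reason" >> "$log_file"
}

# Example (hypothetical log path):
#   record_snapshot ./snapshot-log.tsv my-custom-snapshot-name "before major upgrade"
```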

References

K3s etcd documentation: https://docs.k3s.io/backup-restore
K3s backup guide: https://docs.k3s.io/backup-restore/backup
