Updating K3s
Overview
Updating K3s involves safely taking each node offline (one at a time), performing the update, then bringing the node back into the cluster. Working through the nodes one at a time keeps your workloads available during the update, provided the remaining nodes have enough capacity to run them.
Pre-Update Checklist
Before starting any update, complete these steps:
- Backup Your Cluster: Ensure you have recent backups of:
  - etcd snapshots (for control plane nodes)
  - Persistent volumes (via Longhorn or Velero)
  - Important configuration files
- Check Current Version: Verify your current K3s version:
  k3s --version
- Review Release Notes: Check the K3s release notes for breaking changes or important updates.
- Plan Update Order: For multi-node clusters:
  - Update control plane (server) nodes first, one at a time, so that no agent ever runs a newer Kubernetes version than the servers
  - Then update worker (agent) nodes, if you have any
  - Always maintain quorum in HA setups (e.g., 2 out of 3 server nodes available)
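The quorum rule above can be worked out with a little arithmetic. The sketch below assumes an HA cluster whose server nodes form a majority-based quorum (as with embedded etcd); the function names `quorum_size` and `max_offline` are made up for illustration.

```shell
# Number of server nodes that must stay up for a majority quorum of N servers.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of server nodes that can be offline at once without losing quorum.
max_offline() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 3 5; do
  echo "servers=$n quorum=$(quorum_size "$n") can_be_offline=$(max_offline "$n")"
done
```

For three servers this gives a quorum of 2 with 1 node allowed offline, which matches the 2-out-of-3 example and is why server nodes are updated one at a time.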
Update Process
Step 1: Drain the Node
When performing maintenance (such as updating K3s), it's important to "drain" the node to protect your workloads and avoid interruptions.
What Does "Draining" a Node Mean?
- Draining safely evicts all non-essential pods from the node, allowing Kubernetes to reschedule them on other nodes.
- It also makes the node "unschedulable," ensuring no new pods can be assigned to the node while it's offline.
What Does "Evicting" a Pod Mean?
In Kubernetes, "evicting" refers to the process of safely terminating Pods on a node, typically to free up resources or for maintenance, allowing them to be rescheduled on other nodes.
How to Drain a Node
To drain a node, run the following command, replacing <node-name> with the name of the node you want to update:
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
Explanation of Command Options:
- --ignore-daemonsets: Lets the drain proceed even though pods managed by DaemonSets are present. These pods are not evicted and stay on the node.
- --delete-emptydir-data: Lets the drain proceed for pods that use emptyDir volumes (temporary storage). The data in those volumes is deleted when the pods are evicted.
Example:
kubectl drain k3s-server-1.cluster --ignore-daemonsets --delete-emptydir-data
Wait for the drain to complete. You should see output indicating that pods have been evicted and the node is now unschedulable.
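A drain can hang if a pod cannot be evicted, so it may help to bound the wait. The following is a minimal sketch, not a required part of the procedure: `drain_node` and the `DRAIN_TIMEOUT` variable are hypothetical names, and the 300s default is an arbitrary assumption. The `--timeout` flag itself is a standard `kubectl drain` option.

```shell
# Hypothetical helper: drain a node but give up after a bounded wait.
# DRAIN_TIMEOUT is an assumed variable name, not a kubectl setting.
drain_node() {
  node="$1"
  drain_timeout="${DRAIN_TIMEOUT:-300s}"
  kubectl drain "$node" \
    --ignore-daemonsets \
    --delete-emptydir-data \
    --timeout="$drain_timeout"
}

# Usage (not run here):
#   drain_node k3s-server-1.cluster || echo "drain did not finish; investigate before continuing"
```

If the function returns a non-zero status, the node should not be stopped until the blocking pods have been looked at.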
Step 2: Stop the K3s Service
To update K3s, we first need to stop the running K3s service on the node:
sudo systemctl stop k3s
Stopping the service cleanly before the binary is replaced reduces the risk of corrupting its data during the update.
Step 3: Update K3s
Now, let's update K3s. The official K3s installation script handles this: when run on a node that already has K3s installed, it replaces the existing installation with the latest release from the stable channel:
curl -sfL https://get.k3s.io | sh -
To Update to a Specific Version:
If you need to update to a specific version (recommended for production), you can specify the version:
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.28.5+k3s1 sh -
Replace v1.28.5+k3s1 with your desired version. Check K3s releases for available versions.
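The script will download, install, and configure the new version of K3s. Configuration files under /etc/rancher/k3s/ are left in place, but any options you originally passed to the install script as flags or environment variables need to be passed again when you re-run it.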
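When pinning a version, a typo in the tag only shows up when the download fails. A small sanity check before calling the installer can catch that earlier. This is a sketch under assumptions: `is_k3s_version` is a hypothetical helper, and the pattern only covers tags shaped like `v1.28.5+k3s1` or `v1.28.5-rc1+k3s1`. It does not check that the release actually exists.

```shell
# Hypothetical check: does the string look like a K3s release tag?
is_k3s_version() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+(-rc[0-9]+)?\+k3s[0-9]+$'
}

version="v1.28.5+k3s1"
if is_k3s_version "$version"; then
  echo "ok: $version"
  # The actual install step, left commented out in this sketch:
  # curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="$version" sh -
else
  echo "refusing to install: '$version' does not look like a K3s release tag" >&2
fi
```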
Step 4: Start the K3s Service
Once the update finishes, start the K3s service on the node to bring it back online. The install script normally starts the service itself, in which case this command does nothing:
sudo systemctl start k3s
This will load the new K3s version and all services will resume.
Step 5: Verify the Service Started
Check that K3s started successfully:
sudo systemctl status k3s
You should see Active: active (running). If there are any errors, check the logs:
sudo journalctl -u k3s -f
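The service can take a few seconds to become active, so a single status check right after starting it may be misleading. The sketch below polls instead. `wait_for_active` is a hypothetical helper, and the one-second interval and default of 30 attempts are arbitrary assumptions. `systemctl is-active --quiet` is the standard way to test a unit's state from a script.

```shell
# Hypothetical helper: poll until a systemd unit is active, or give up.
# Arguments: unit name, number of attempts (default 30), one second apart.
wait_for_active() {
  unit="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if systemctl is-active --quiet "$unit"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage (not run here):
#   wait_for_active k3s 60 || sudo journalctl -u k3s -n 50 --no-pager
```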
Step 6: Uncordon the Node
What Is "Uncordoning"?
After an update, we need to make the node available again for scheduling new pods, i.e., undo the "unschedulable" state created during the drain.
How to Uncordon a Node
To let Kubernetes know this node is now ready to schedule new pods again:
kubectl uncordon <node-name>
This command marks the node as "schedulable," meaning new pods can now be assigned to it.
Example:
kubectl uncordon k3s-server-1.cluster
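To confirm that the uncordon took effect, the node's `.spec.unschedulable` field can be inspected: it is `true` on a cordoned node and unset otherwise. A minimal sketch, where `is_cordoned` is a hypothetical helper name:

```shell
# Hypothetical helper: succeeds (exit status 0) if the node is still cordoned.
is_cordoned() {
  [ "$(kubectl get node "$1" -o jsonpath='{.spec.unschedulable}')" = "true" ]
}

# Usage (not run here):
#   if is_cordoned k3s-server-1.cluster; then echo "still unschedulable"; fi
```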
Step 7: Verify the Update
Once the node is back online, verify the K3s version to confirm that the update was successful:
k3s --version
Check that the output shows the new version installed.
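In a script, the version can be pulled out of that output and compared with the one you meant to install. This sketch assumes the first line of `k3s --version` has the form `k3s version <tag> (<commit>)`. `installed_k3s_version` is a hypothetical helper that reads the command's output from standard input.

```shell
# Hypothetical helper: extract the release tag from `k3s --version` output on stdin.
installed_k3s_version() {
  awk '$1 == "k3s" && $2 == "version" { print $3; exit }'
}

# Usage (not run here):
#   expected="v1.28.5+k3s1"
#   actual="$(k3s --version | installed_k3s_version)"
#   [ "$actual" = "$expected" ] || echo "expected $expected but found $actual" >&2
```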
Also verify the node is ready:
kubectl get nodes
You should see the node status as Ready.
Post-Update Verification
After updating all nodes, perform these checks:
- Check All Nodes Are Ready:
  kubectl get nodes
- Verify Cluster Components:
  kubectl get pods -A
  Ensure all system pods are running.
- Test Application Functionality:
  - Access your applications
  - Verify services are responding
  - Check ingress routing
- Monitor for Issues:
  - Watch logs for errors:
    kubectl logs -n <namespace> <pod-name>
  - Check resource usage:
    kubectl top nodes
  - Monitor for 24-48 hours after updates
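On larger clusters it is easier to list only what is wrong than to read the full output. The two filters below are a sketch that assumes the default column layout of `kubectl get ... --no-headers` (node STATUS in column 2, pod STATUS in column 4). `not_ready_nodes` and `unhealthy_pods` are hypothetical names.

```shell
# Print names of nodes whose STATUS is not exactly "Ready".
# This also flags "Ready,SchedulingDisabled", i.e. nodes that are still cordoned.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Print "namespace/name status" for pods that are neither Running nor Completed.
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Usage (not run here):
#   kubectl get nodes --no-headers | not_ready_nodes
#   kubectl get pods -A --no-headers | unhealthy_pods
```

Empty output from both commands means every node is Ready and schedulable and no pod is in a failed or pending state.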
Updating Worker Nodes
If you have worker nodes (agents), the process is similar but uses the agent installation script:
- Drain the worker node
- Stop the K3s agent service:
  sudo systemctl stop k3s-agent
- Update using the agent script:
  curl -sfL https://get.k3s.io | K3S_URL=https://<server-ip>:6443 K3S_TOKEN=<token> sh -
- Start the service:
  sudo systemctl start k3s-agent
- Uncordon the node
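The install script decides between server and agent mode based on whether K3S_URL is set, so running it with an empty or unreplaced placeholder value is worth guarding against. A minimal sketch, where `require_agent_env` is a hypothetical helper:

```shell
# Hypothetical guard: fail unless K3S_URL and K3S_TOKEN are set and no longer
# contain the <...> placeholders used in the example command.
require_agent_env() {
  for name in K3S_URL K3S_TOKEN; do
    eval "value=\${$name:-}"
    case "$value" in
      ""|*"<"*">"*)
        echo "missing or placeholder value for $name" >&2
        return 1
        ;;
    esac
  done
  return 0
}

# Usage (not run here):
#   require_agent_env && curl -sfL https://get.k3s.io | K3S_URL="$K3S_URL" K3S_TOKEN="$K3S_TOKEN" sh -
```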
Troubleshooting Update Issues
Node Won't Start After Update
- Check service status:
  sudo systemctl status k3s
- Review logs:
  sudo journalctl -u k3s -n 100
- Verify configuration files in /etc/rancher/k3s/
- Check for certificate issues: kubectl get nodes should show the node
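When reviewing the logs, filtering for error-level lines narrows things down quickly. The sketch below assumes K3s's own log lines carry a `level=...` field, as in `level=error msg="..."`. Messages from embedded Kubernetes components may use a different format and would not be matched. `k3s_log_errors` is a hypothetical name.

```shell
# Hypothetical filter: keep only log lines at error or fatal level.
# Assumes lines contain a level=<name> field; other formats pass unmatched.
k3s_log_errors() {
  grep -E 'level=(error|fatal)'
}

# Usage (not run here):
#   sudo journalctl -u k3s -n 100 --no-pager | k3s_log_errors
```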
Pods Not Scheduling After Uncordon
- Check node conditions:
  kubectl describe node <node-name>
- Verify node has resources:
  kubectl describe node <node-name> | grep -A 5 "Allocated resources"
- Check for taints:
  kubectl describe node <node-name> | grep Taints
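For scripting, the taints can also be read directly from the node object instead of grepping the describe output. A sketch using a standard `kubectl` JSONPath template, with `node_taints` as a hypothetical helper name:

```shell
# Hypothetical helper: print one "key=value:effect" line per taint on the node.
# Prints nothing when the node has no taints.
node_taints() {
  kubectl get node "$1" \
    -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}'
}

# Usage (not run here):
#   node_taints k3s-server-1.cluster
```

A node that is still cordoned typically shows a `node.kubernetes.io/unschedulable` taint with the `NoSchedule` effect here.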
Cluster Connectivity Issues
- Verify network connectivity between nodes
- Check firewall rules
- Verify DNS resolution for node names
- Review etcd health (for HA clusters)
Related Documentation
- K3s Maintenance Overview - Other maintenance tasks
- Health Checks - Post-update health verification
- Troubleshooting - Resolving update issues
- Backup Strategy - Pre-update backup procedures