Node Management

Overview

As your K3s cluster grows or hardware changes, you'll need to add, remove, or replace nodes. This guide covers how to safely manage your cluster's node lifecycle.

Adding Nodes

Adding a Control Plane Node (HA Cluster)

To add an additional control plane node to an existing HA cluster:

Prepare the Node:
- Install the operating system
- Configure network (DNS, static IP if needed)
- Ensure node can reach existing cluster nodes

Get the Cluster Token:

```
# On an existing control plane node
sudo cat /var/lib/rancher/k3s/server/node-token
```

Install K3s on New Node:

```
# Replace with your values
curl -sfL https://get.k3s.io | K3S_TOKEN=<cluster-token> sh -s - server \
  --server https://<existing-server-ip>:6443 \
  --node-name <new-node-name>
```

Verify Node Joined:
```
kubectl get nodes
```
The new node should appear in the list with Ready status.
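
Rather than polling `kubectl get nodes` by hand, the check can be scripted. The sketch below wraps `kubectl wait`; the function name `wait_for_node_ready` and the 120s default timeout are assumptions for illustration, not part of K3s.

```shell
# Hypothetical helper: block until a node reports Ready, or give up.
# Usage: wait_for_node_ready <node-name> [timeout]
wait_for_node_ready() {
  node="$1"
  timeout="${2:-120s}"  # assumed default; raise it for slow hardware
  # kubectl wait exits non-zero if the condition is not met in time.
  kubectl wait --for=condition=Ready "node/${node}" --timeout="${timeout}"
}
```

For example, `wait_for_node_ready <new-node-name> 300s` returns non-zero if the node is still not Ready after five minutes, which makes it usable in provisioning scripts.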

Adding a Worker Node (Agent)

To add a worker node to your cluster:

Prepare the Node:
- Install the operating system
- Configure network
- Ensure connectivity to control plane

Get Required Information:

```
# On control plane node
sudo cat /var/lib/rancher/k3s/server/node-token  # Agent token
# Note the server URL (usually https://<server-ip>:6443)
```

Install K3s Agent:

```
curl -sfL https://get.k3s.io | K3S_URL=https://<server-ip>:6443 K3S_TOKEN=<agent-token> sh -
```

Verify Node Joined:
```
kubectl get nodes
```

Removing Nodes

Removing a Worker Node

Drain the Node:

```
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```

This cordons the node and evicts its pods. DaemonSet-managed pods stay in place, and any data in emptyDir volumes is deleted.

Delete the Node:
```
kubectl delete node <node-name>
```

Stop K3s on the Node:

```
# On the node being removed
sudo systemctl stop k3s-agent  # For worker nodes
# or
sudo systemctl stop k3s  # If it was a server node
```

Uninstall K3s (Optional):

```
# On the node
/usr/local/bin/k3s-agent-uninstall.sh  # For agent (worker) nodes
# or
/usr/local/bin/k3s-uninstall.sh        # For server nodes
```

Removing a Control Plane Node (HA Cluster)

Important: In an HA cluster, ensure you maintain quorum. For a 3-node cluster, you need at least 2 nodes running.
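
The quorum rule generalizes beyond three nodes: etcd needs a strict majority of its members. A minimal sketch of the arithmetic follows; the function names `quorum_size` and `fault_tolerance` are hypothetical helpers, not K3s commands.

```shell
# Quorum is a strict majority of etcd members: floor(n / 2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can be lost while quorum is still held.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for members in 1 3 5; do
  echo "members=${members} quorum=$(quorum_size "$members") tolerated_failures=$(fault_tolerance "$members")"
done
# members=1 quorum=1 tolerated_failures=0
# members=3 quorum=2 tolerated_failures=1
# members=5 quorum=3 tolerated_failures=2
```

An even member count adds no extra tolerance (4 members still tolerate only 1 failure), which is why HA clusters are normally sized with an odd number of control plane nodes.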

Verify Cluster Health:

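```
kubectl get nodes
# K3s runs embedded etcd inside the k3s process rather than as pods,
# so list the etcd members by node role instead
kubectl get nodes -l node-role.kubernetes.io/etcd
```

Ensure the other etcd nodes report Ready.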

Drain the Node:

```
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```

Remove from etcd Cluster (if needed):
- With embedded etcd, K3s should remove the member automatically once the node object is deleted in the next step
- Monitor etcd health after removal

Delete the Node:
```
kubectl delete node <node-name>
```

Stop and Uninstall on Node:

```
# On the node
sudo systemctl stop k3s
/usr/local/bin/k3s-uninstall.sh  # Removes K3s and its local data from this node
```

Replacing Nodes

Replacing a Failed Node

When a node fails and needs replacement:

Remove the Failed Node:

```
# If node is still accessible
kubectl drain <failed-node-name> --ignore-daemonsets --delete-emptydir-data --force
kubectl delete node <failed-node-name>
```

If node is not accessible:

```
# Force delete (use with caution)
kubectl delete node <failed-node-name> --force --grace-period=0
```

Prepare Replacement Node:
- Use same hostname if possible (or update DNS)
- Configure with same network settings

Add Replacement Node:
- Follow "Adding Nodes" procedure above
- Use same node name if possible

Verify Replacement:
```
kubectl get nodes
kubectl get pods -A
```
Ensure pods reschedule and cluster is healthy.
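
To confirm that nothing is still bound to the failed node, pods can be filtered by the node they are assigned to. A small sketch, assuming the hypothetical helper name `pods_on_node`; the `spec.nodeName` field selector is standard kubectl.

```shell
# Hypothetical helper: list pods in all namespaces that are still
# assigned to the given node.
# Usage: pods_on_node <node-name>
pods_on_node() {
  kubectl get pods --all-namespaces \
    --field-selector "spec.nodeName=$1" -o wide
}
```

Once the old node object has been deleted and workloads have been rescheduled, running this against the failed node's name should return no application pods.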

Replacing Control Plane Node (HA)

Ensure Quorum:
- Verify remaining control plane nodes are healthy
- For a 3-node cluster, at least 2 nodes must remain healthy

Remove Failed Node:
```
kubectl delete node <failed-node-name>
```

Add Replacement:
- Follow "Adding a Control Plane Node (HA Cluster)" procedure
- Use same node name and configuration

Verify etcd Health:

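```
# K3s embedded etcd does not run as pods; check the etcd node role instead
kubectl get nodes -l node-role.kubernetes.io/etcd
# Should list the expected number of control plane nodes, all Ready
```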

Node Labeling and Tainting

Adding Labels

Labels help organize and select nodes:

```
# Add label to node
kubectl label nodes <node-name> <key>=<value>

# Example: Label node by type
kubectl label nodes k3s-worker-1.cluster node-type=worker
kubectl label nodes k3s-server-1.cluster node-type=control-plane
```
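
After labeling, it helps to confirm which nodes actually carry a label and to know how to undo one. The sketch below uses two hypothetical helper names (`nodes_with_label`, `remove_node_label`) around standard kubectl options.

```shell
# Hypothetical helper: list nodes matching a label selector,
# including their full label set.
# Usage: nodes_with_label node-type=worker
nodes_with_label() {
  kubectl get nodes -l "$1" --show-labels
}

# Hypothetical helper: remove a label from a node.
# A trailing "-" after the key tells kubectl to delete that label.
# Usage: remove_node_label <node-name> <key>
remove_node_label() {
  kubectl label nodes "$1" "$2-"
}
```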

Using Node Selectors

Use node selectors in pod specs to schedule on specific nodes:

```
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  nodeSelector:
    node-type: worker
  containers:
    - name: app
      image: nginx
```

Tainting Nodes

Taints prevent pods from scheduling on nodes (unless they have matching tolerations):

```
# Add taint
kubectl taint nodes <node-name> <key>=<value>:<effect>

# Example: Make node dedicated for specific workload
kubectl taint nodes k3s-worker-1.cluster dedicated=app:NoSchedule

# Remove taint
kubectl taint nodes <node-name> <key>-
```

Taint Effects:

- NoSchedule: Pods without a matching toleration won't be scheduled on the node
- PreferNoSchedule: The scheduler tries to avoid the node but will use it if needed
- NoExecute: Pods without a matching toleration are not scheduled, and running ones are evicted
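
To see which taints and effects are currently set on a node, the node spec can be queried directly. A minimal sketch follows; `show_node_taints` is a hypothetical helper name, and the JSONPath expression prints one `key=value:effect` line per taint.

```shell
# Hypothetical helper: print each taint on a node as key=value:effect.
# Prints nothing if the node has no taints.
# Usage: show_node_taints <node-name>
show_node_taints() {
  kubectl get node "$1" \
    -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}'
}
```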

Adding Tolerations

To allow pods to schedule on tainted nodes:

```
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  tolerations:
    - key: 'dedicated'
      operator: 'Equal'
      value: 'app'
      effect: 'NoSchedule'
  containers:
    - name: app
      image: nginx
```

Node Maintenance Mode

Cordon/Uncordon

Temporarily prevent scheduling on a node:

```
# Prevent new pods from scheduling
kubectl cordon <node-name>

# Allow scheduling again
kubectl uncordon <node-name>
```

Drain for Maintenance

Safely prepare node for maintenance:

```
# Drain node (evicts pods, marks unschedulable)
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Perform maintenance...

# Make node schedulable again
kubectl uncordon <node-name>
```
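
A common mistake is forgetting to uncordon a node when the maintenance step fails partway. One way to guard against that is to wrap the sequence in a function; the sketch below is an illustration only, and `run_with_node_drained` is a hypothetical name.

```shell
# Hypothetical helper: drain a node, run a maintenance command,
# then uncordon the node even if the command failed.
# Usage: run_with_node_drained <node-name> <command> [args...]
run_with_node_drained() {
  node="$1"
  shift
  # Abort early if the drain itself fails; the node is left as kubectl left it.
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
  status=0
  "$@" || status=$?
  # Always make the node schedulable again.
  kubectl uncordon "$node"
  # Report the maintenance command's exit status to the caller.
  return "$status"
}
```

For example, `run_with_node_drained <node-name> ssh <node-name> sudo apt-get -y upgrade` would drain, upgrade packages over SSH, and uncordon; the SSH command here is a placeholder for whatever maintenance is needed.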

Node Configuration

Viewing Node Configuration

```
# Get node details
kubectl get node <node-name> -o yaml

# Get node status
kubectl describe node <node-name>
```

Updating Node Configuration

Most node configuration is set through K3s installation parameters or the config file. To change it:

Stop K3s:
```
sudo systemctl stop k3s  # Use k3s-agent on worker nodes
```

Edit Configuration:
```
sudo nano /etc/rancher/k3s/config.yaml
```

Restart K3s:
```
sudo systemctl start k3s  # Use k3s-agent on worker nodes
```
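
Keys in the config file mirror the K3s command-line flags without the leading `--`, and repeatable flags become YAML lists. The sketch below writes an example file to a scratch path for review; the function name `write_example_k3s_config` and the values (reused from the labeling and tainting examples above) are assumptions. To my understanding, `node-label` and `node-taint` are only applied when the node first registers, so use `kubectl label` and `kubectl taint` to change them afterwards.

```shell
# Hypothetical helper: write an example K3s config file to the given path.
# Usage: write_example_k3s_config /tmp/k3s-config.yaml
write_example_k3s_config() {
  cat > "$1" <<'EOF'
# Keys mirror the K3s CLI flags without the leading "--".
node-name: k3s-worker-1.cluster
node-label:
  - "node-type=worker"
node-taint:
  - "dedicated=app:NoSchedule"
EOF
}
```

Review the generated file, then merge the relevant keys into /etc/rancher/k3s/config.yaml before restarting the service.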

Best Practices

Always Drain Before Removal:
- Safely evict pods before removing nodes
- Prevents data loss and service interruption

Maintain Quorum (HA Clusters):
- Never remove nodes that would break etcd quorum
- For 3-node cluster, always keep at least 2 nodes

Use Consistent Naming:
- Use DNS names instead of IPs
- Maintain consistent hostnames

Label Nodes Appropriately:
- Use labels for organization
- Helps with pod scheduling and management

Monitor After Changes:
- Watch cluster health after node changes
- Verify pods reschedule correctly

Backup Before Major Changes:
- Take etcd snapshots before removing control plane nodes
- Backup persistent volumes if needed
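
For clusters using embedded etcd, an on-demand snapshot can be taken with the `k3s etcd-snapshot save` subcommand. A minimal sketch, assuming it runs as root on a server node; the function name `snapshot_before_change` and the default snapshot name `pre-change` are hypothetical.

```shell
# Hypothetical helper: take a named on-demand etcd snapshot.
# Run as root on a server node that uses embedded etcd.
# Usage: snapshot_before_change [snapshot-name]
snapshot_before_change() {
  name="${1:-pre-change}"  # assumed default name
  k3s etcd-snapshot save --name "$name"
}
```

Calling `snapshot_before_change before-node-removal` ahead of removing a control plane node gives a restore point if the change goes wrong.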

Troubleshooting Node Issues

Node Won't Join Cluster

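Check Network Connectivity:
```
ping <server-ip>
telnet <server-ip> 6443
```

Verify Token:
- Ensure token is correct
- If you joined with a bootstrap token created by k3s token create, check that it hasn't expired

Check Firewall:
- Ensure TCP ports 6443 (API server) and 10250 (kubelet) are open
- For HA servers with embedded etcd, also allow TCP 2379-2380 between server nodes
- With the default Flannel VXLAN backend, allow UDP 8472 between all nodes
- Verify node can reach server

Review Logs:
```
sudo journalctl -u k3s -n 100        # On a server node
sudo journalctl -u k3s-agent -n 100  # On an agent node
```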
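
The connectivity and firewall checks can be combined into one pass over the TCP ports a node needs. The sketch below assumes a netcat (`nc`) that supports the `-z` and `-w` options; `check_ports` is a hypothetical helper name, and it tests TCP only.

```shell
# Hypothetical helper: test TCP reachability of several ports on a host.
# Returns non-zero if any port could not be reached.
# Usage: check_ports <server-ip> 6443 10250
check_ports() {
  host="$1"
  shift
  failed=0
  for port in "$@"; do
    # -z: connect without sending data; -w 3: give up after 3 seconds.
    if nc -z -w 3 "$host" "$port"; then
      echo "${host}:${port} reachable"
    else
      echo "${host}:${port} NOT reachable"
      failed=1
    fi
  done
  return "$failed"
}
```

Run it from the node that fails to join, against the server's address. A port reported as not reachable usually points at a firewall rule or a service that is not listening.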

Node Shows as NotReady

Check K3s Service:
```
sudo systemctl status k3s        # On a server node
sudo systemctl status k3s-agent  # On an agent node
```

Verify Network:
- Check node can reach other nodes
- Verify DNS resolution

Check Resources:
- Verify sufficient disk space
- Check memory availability
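
The kubelet reports resource problems as node conditions (for example DiskPressure and MemoryPressure), so they can be read from the API before logging in to the node. A small sketch; `show_node_conditions` is a hypothetical helper name.

```shell
# Hypothetical helper: print each node condition as TYPE=STATUS
# followed by the kubelet's message for that condition.
# Usage: show_node_conditions <node-name>
show_node_conditions() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\t"}{.message}{"\n"}{end}'
}
```

A healthy node shows Ready=True and the pressure conditions as False. On the node itself, `df -h` and `free -m` give the underlying disk and memory figures.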

Related Documentation

- K3s Maintenance Overview - Maintenance overview
- Updating K3s - Node update procedures
- Health Checks - Node health verification
- K3s Setup - Initial cluster setup
