One of the most frustrating issues when working with Kubernetes is a node entering the "Not Ready" state, which can disrupt workloads and compromise cluster reliability. This guide walks you through the steps to identify and resolve the issues behind a node in the "Not Ready" state.
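Before digging into a single node, it can help to confirm which nodes are affected. The following is a minimal sketch, assuming the default table output of `kubectl get nodes` (node name in the first column, status in the second). The function name `not_ready_nodes` is hypothetical and only used for illustration.

```shell
# Print the names of nodes whose STATUS column does not start with "Ready".
# Reads the default table output of `kubectl get nodes` on stdin and
# skips the header row. "Ready,SchedulingDisabled" still counts as Ready.
not_ready_nodes() {
  awk 'NR > 1 && $2 !~ /^Ready/ { print $1 }'
}

# Usage (requires access to the cluster):
#   kubectl get nodes | not_ready_nodes
```

Each name it prints can be used as <NODE_NAME> in the commands that follow.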
Start by querying the node details to see if it reports anything that could point to the issue.
Run the following command to display node details:
$ kubectl describe nodes <NODE_NAME>
This command provides useful information, including the node's conditions, capacity, and allocatable resources. Let’s examine the section relevant to troubleshooting.
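The Conditions section reports the health of the node, including the status of disk and memory. Each condition has a type and a status: on a healthy node, Ready is True, while MemoryPressure, DiskPressure, and PIDPressure are False. A Ready status of False or Unknown means the node is reported as "Not Ready".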
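The conditions can also be checked without reading the full describe output. The sketch below assumes that Ready is healthy when its status is True and that every other condition type is healthy when its status is False. The function name `unhealthy_conditions` is hypothetical, and it expects tab-separated "type, status" lines on stdin.

```shell
# Flag conditions that deviate from their healthy value.
# Input: one "<type><TAB><status>" pair per line, e.g. "Ready<TAB>True".
# Assumption: Ready is healthy when True; all other types when False.
unhealthy_conditions() {
  awk -F '\t' '
    NF < 2        { next }                                   # skip blank or malformed lines
    $1 == "Ready" { if ($2 != "True") print $1 "=" $2; next }
    $2 != "False" { print $1 "=" $2 }
  '
}

# Usage (requires access to the cluster; <NODE_NAME> is a placeholder):
#   kubectl get node <NODE_NAME> \
#     -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\n"}{end}' \
#     | unhealthy_conditions
```

No output means every condition has its expected value. Any line printed names a condition worth investigating.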
The Capacity and Allocatable sections show the resources available to the node, such as CPU, memory, and the number of pods it can host. Make sure that the available resources meet the needs of your cluster.
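Compare the allocatable resources with the node's actual capacity. Allocatable is normally somewhat lower than capacity, because part of each resource is reserved for system and Kubernetes components, but a major discrepancy could indicate resource exhaustion or improper configuration.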
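To put a number on the gap between capacity and allocatable memory, a small helper along these lines may be useful. It is only a sketch: it assumes both values are reported with a Ki suffix (for example 16393292Ki) and rejects anything else. The function name `reserved_pct` is hypothetical.

```shell
# Print the percentage of capacity that is not allocatable (rounded down).
# Arguments: <capacity> <allocatable>, both with a Ki suffix.
# Assumption: quantities in other units (Mi, Gi, ...) are not handled.
reserved_pct() {
  cap=${1%Ki}
  alloc=${2%Ki}
  case "$cap" in ''|*[!0-9]*) echo "expected a quantity like 16393292Ki" >&2; return 1 ;; esac
  case "$alloc" in ''|*[!0-9]*) echo "expected a quantity like 16393292Ki" >&2; return 1 ;; esac
  [ "$cap" -gt 0 ] || { echo "capacity must be greater than zero" >&2; return 1; }
  echo $(( (cap - alloc) * 100 / cap ))
}

# Usage (requires access to the cluster; <NODE_NAME> is a placeholder):
#   cap=$(kubectl get node <NODE_NAME> -o jsonpath='{.status.capacity.memory}')
#   alloc=$(kubectl get node <NODE_NAME> -o jsonpath='{.status.allocatable.memory}')
#   reserved_pct "$cap" "$alloc"
```

What counts as a "major" discrepancy depends on how the node's reservations are configured, so treat the result as a starting point rather than a verdict.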
If the node information doesn't provide clear insights, you can SSH into the affected node and check its Kubelet logs. Kubelet is responsible for managing the node's lifecycle, and problems with it often result in nodes being marked as "Not Ready".
Connect to the node with the following command:
$ ssh <NODE_IP_ADDRESS>
Once inside the node, examine the Kubelet logs for errors, such as authentication, certificate, or other critical issues. If Kubelet is running as a systemd service, use the following command to access its logs:
$ journalctl -u kubelet
This command displays the logs generated by the Kubelet service, where you can look for common issues such as the authentication and certificate errors mentioned above.
An example log output could look like this:
Dec 10 12:35:41 node-name kubelet[1256]: E1210 12:35:41.123456 kubelet_node_status.go:92] Unable to register node "nodename" with API server: Unauthorized
In this case, Kubelet is experiencing authentication issues when registering the node with the API server. This would likely cause the node to appear in a "Not Ready" state.
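On a busy node, the Kubelet log can be long. The sketch below keeps only error-level entries, assuming the log lines follow the layout of the example above, where the message starts with a severity letter (E for error, F for fatal) followed by a four-digit date. The function name `kubelet_errors` is hypothetical.

```shell
# Keep only Kubelet log lines whose message starts with an error (E) or
# fatal (F) severity marker such as "E1210 ". Reads log lines on stdin.
# Assumption: lines look like "... kubelet[<pid>]: E1210 12:35:41.123456 ...".
kubelet_errors() {
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -E 'kubelet\[[0-9]+\]: [EF][0-9]{4} ' || true
}

# Usage (on the affected node):
#   journalctl -u kubelet --no-pager --since "1 hour ago" | kubelet_errors
```

If the filter returns nothing, widen the time window or inspect the unfiltered log, since not every problem is logged at error level.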
Once you've identified the root cause, address the detected issues. Here are some common solutions based on the type of problem: