
Troubleshooting a Node in a "Not Ready" State

One of the most frustrating issues when working with Kubernetes is a node entering the "Not Ready" state, which can disrupt workloads and compromise cluster reliability. This guide walks you through the steps to identify and resolve the cause.

Step 1. Query Node Information

Start by querying the node details to see if it reports anything that could point to the issue.

Run the following command to display node details:

$ kubectl describe nodes <NODE_NAME>

This command provides useful information, including the node's conditions, capacity, and allocatable resources. Let’s examine the section relevant to troubleshooting.

Conditions

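The Conditions section reports the node's health, including the status of disk and memory. The fields have the following meanings:

  • OutOfDisk indicates whether the node has run out of disk space. Recent Kubernetes versions no longer report this condition and rely on DiskPressure instead.
  • MemoryPressure shows whether the node is running low on memory.
  • DiskPressure indicates whether available disk space is running low.
  • Ready is the main indicator you're concerned with. If the node is in a "Not Ready" state, this field shows "False", or "Unknown" if the control plane has stopped receiving status updates from the node.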
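
To view only the conditions instead of the full describe output, one option is kubectl's JSONPath output piped through a small filter. The following is a minimal sketch: the helper name flag_unhealthy_conditions is hypothetical, and it assumes that "healthy" means Ready is "True" and every other condition is "False".

```shell
# Hypothetical helper: reads "TYPE<TAB>STATUS<TAB>REASON" lines on stdin and
# prints only the conditions that look unhealthy.
# Assumption: healthy means Ready=True and every other condition=False.
flag_unhealthy_conditions() {
  awk -F '\t' '
    NF == 0                        { next }        # skip empty lines
    $1 == "Ready" && $2 != "True"  { print; next } # Ready is False or Unknown
    $1 != "Ready" && $2 != "False" { print }       # a pressure condition is set
  '
}

# Usage (replace <NODE_NAME> with a real node name):
#   kubectl get node <NODE_NAME> \
#     -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}' \
#     | flag_unhealthy_conditions
```

If the filter prints nothing, all reported conditions match the healthy pattern assumed above.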

Capacity and Allocatable Resources

These fields show the resources available to the node, such as CPU, memory, and the number of pods it can host. Make sure that the available resources meet the needs of your cluster.

  • Capacity resources are resources the node has physically available.
  • Allocatable resources are the resources the node can allocate to pods after subtracting the overhead from the capacity (i.e., resources used by Kubernetes to manage the node).

Allocatable resources are expected to be somewhat lower than capacity because of this overhead. Check that the gap is reasonable: an unusually large discrepancy could indicate excessive resource reservations or improper configuration.
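
As a rough way to quantify the gap between the two values, the sketch below computes the share of memory capacity that is not allocatable. The helper name reserved_memory_percent is hypothetical, and the sketch assumes both values are reported with the Ki suffix (for example, 16393292Ki); it does not handle other units.

```shell
# Hypothetical helper: prints the share of memory capacity that is NOT
# allocatable, as a whole percentage (integer division, rounded down).
# Assumption: both arguments use the Ki suffix, e.g. "16393292Ki".
reserved_memory_percent() {
  capacity_ki=${1%Ki}      # strip the trailing "Ki"
  allocatable_ki=${2%Ki}
  echo $(( (capacity_ki - allocatable_ki) * 100 / capacity_ki ))
}

# Usage (replace <NODE_NAME> with a real node name):
#   set -- $(kubectl get node <NODE_NAME> \
#     -o jsonpath='{.status.capacity.memory} {.status.allocatable.memory}')
#   reserved_memory_percent "$1" "$2"
```

What counts as a reasonable percentage depends on the node size and on how the reservations are configured, so treat the result as a hint rather than a threshold.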

Step 2. Check Kubelet Logs

If the node information doesn't provide clear insights, you can SSH into the affected node and check its Kubelet logs. Kubelet is the agent that runs on each node, manages its pods, and reports the node's status to the control plane, so problems with it often result in nodes being marked as "Not Ready".

Connect to the node with the following command:

$ ssh <NODE_IP_ADDRESS>

Once inside the node, examine the Kubelet logs for errors, such as authentication, certificate, or other critical issues. If Kubelet is running as a systemd service, use the following command to access its logs:

$ journalctl -u kubelet

This command will display logs generated by the Kubelet service, where you can look for common issues such as:

  • Certificate Errors indicate that the node may be unable to authenticate with the cluster due to expired or incorrect certificates.
  • Authentication Errors can imply misconfigured or missing service accounts or tokens.
  • Network Errors can indicate that Kubelet has trouble communicating with the control plane or other nodes.

An example log output could look like this:

Dec 10 12:35:41 node-name kubelet[1256]: E1210 12:35:41.123456 kubelet_node_status.go:92] Unable to register node "nodename" with API server: Unauthorized

In this case, Kubelet is experiencing authentication issues when registering the node with the API server. This would likely cause the node to appear in a "Not Ready" state.
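
To speed up scanning long logs for the three error categories listed above, the output can be piped through a simple pattern filter. This is a minimal sketch: the helper name classify_kubelet_errors is hypothetical, and the patterns are illustrative assumptions rather than a complete list of Kubelet error messages.

```shell
# Hypothetical helper: reads kubelet log lines on stdin and prints each
# matching line prefixed with a rough category. Non-matching lines are dropped.
# Assumption: the patterns below are illustrative, not exhaustive.
classify_kubelet_errors() {
  awk '
    /x509|[Cc]ertificate/                          { print "[certificate] " $0; next }
    /Unauthorized|Forbidden/                       { print "[authentication] " $0; next }
    /connection refused|i\/o timeout|no such host/ { print "[network] " $0; next }
  '
}

# Usage on the node (Kubelet running as a systemd service):
#   journalctl -u kubelet --since "1 hour ago" --no-pager | classify_kubelet_errors
```

A line that matches more than one pattern is reported only under the first matching category, so the labels are a starting point for reading the full log rather than a diagnosis.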

Step 3. Address Errors

Once you've identified the root cause, address the detected issues. Here are some common solutions based on the type of problem:

  • Resource Exhaustion: If a node runs out of resources (CPU, memory, disk), you can scale the cluster by adding more nodes, upgrading hardware, or adjusting resource limits and requests for the pods.
  • Network Issues: If Kubelet cannot reach the API server or other nodes, verify the node's network configuration, DNS settings, or firewall rules that might block necessary communication.
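
For the network case, a quick first check from the affected node is whether a TCP connection to the API server can be opened at all. The sketch below is one way to do that under stated assumptions: the helper name can_reach is hypothetical, it relies on bash's /dev/tcp feature and the coreutils timeout command being available on the node, and the port 6443 in the usage line is only a common API server port that may differ in your cluster.

```shell
# Hypothetical helper: succeeds (exit code 0) if a TCP connection to
# HOST:PORT can be opened within 3 seconds, and fails otherwise.
# Assumption: bash (for /dev/tcp) and coreutils `timeout` are installed.
can_reach() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage on the node (replace the placeholder; the port is an assumption):
#   can_reach <API_SERVER_ADDRESS> 6443 && echo reachable || echo unreachable
```

A successful connection only shows that the port is open from the node; it does not verify DNS resolution of other names, certificates, or authentication.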
