The Spherelet is based on the Kubernetes “Kubelet” and enables an ESXi hypervisor to act as a Kubernetes worker node. Sometimes you may notice that worker nodes of your Supervisor Cluster show a NotReady,SchedulingDisabled status, and it may be because spherelet is not running on those ESXi nodes.
The following steps show how to verify the status of the spherelet service and restart it if required.
Example:
❯ kubectx wdc-01-vcxx
Switched to context "wdc-01-vcxx".
❯ kubectl get node
NAME STATUS ROLES AGE VERSION
42019f7e751b2818bb0c659028d49fdc Ready control-plane,master 317d v1.22.6+vmware.wcp.2
4201b0b21aed78d8e72bfb622bb8b98b Ready control-plane,master 317d v1.22.6+vmware.wcp.2
4201c53dcef2701a8c36463942d762dc Ready control-plane,master 317d v1.22.6+vmware.wcp.2
wdc-01-rxxesx04.xxxxxxxxx.com Ready agent 317d v1.22.6-sph-db56d46
wdc-01-rxxesx05.xxxxxxxxx.com NotReady,SchedulingDisabled agent 317d v1.22.6-sph-db56d46
wdc-01-rxxesx06.xxxxxxxxx.com Ready agent 317d v1.22.6-sph-db56d46
wdc-01-rxxesx32.xxxxxxxxx.com Ready agent 317d v1.22.6-sph-db56d46
wdc-01-rxxesx33.xxxxxxxxx.com Ready agent 317d v1.22.6-sph-db56d46
wdc-01-rxxesx34.xxxxxxxxx.com Ready agent 317d v1.22.6-sph-db56d46
wdc-01-rxxesx35.xxxxxxxxx.com Ready,SchedulingDisabled agent 317d v1.22.6-sph-db56d46
wdc-01-rxxesx36.xxxxxxxxx.com Ready agent 317d v1.22.6-sph-db56d46
wdc-01-rxxesx37.xxxxxxxxx.com Ready agent 317d v1.22.6-sph-db56d46
wdc-01-rxxesx38.xxxxxxxxx.com Ready agent 317d v1.22.6-sph-db56d46
wdc-01-rxxesx39.xxxxxxxxx.com NotReady,SchedulingDisabled agent 317d v1.22.6-sph-db56d46
wdc-01-rxxesx40.xxxxxxxxx.com Ready agent 317d v1.22.6-sph-db56d46
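On a large cluster it can be easier to filter the node list than to scan it by eye. The following is a minimal sketch, assuming the default column layout of `kubectl get node` shown above; the function name `not_ready_nodes` is hypothetical and only used for illustration.

```shell
# Print the names of nodes whose STATUS column contains "NotReady".
# Reads `kubectl get node` style output on stdin.
# Note: not_ready_nodes is an illustrative helper, not part of kubectl.
not_ready_nodes() {
  # $1 is NAME, $2 is STATUS (for example "NotReady,SchedulingDisabled").
  # The anchors keep plain "Ready" entries from matching.
  awk '$1 != "NAME" && $2 ~ /(^|,)NotReady(,|$)/ { print $1 }'
}
```

Usage: `kubectl get node | not_ready_nodes` prints one affected node name per line, which can then be used as the list of ESXi hosts to inspect.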
Logs
- ssh into the ESXi worker node.
tail -f /var/log/spherelet.log
Status
- ssh into the ESXi worker node and run the following:
/etc/init.d/spherelet status
- You can also check the status of spherelet using PowerCLI. The following is an example:
> Connect-VIServer wdc-10-vcxx
> Get-VMHost | Get-VMHostService | where {$_.Key -eq "spherelet"} | select VMHost,Key,Running | ft
VMHost Key Running
------ --- -------
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet True
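If PowerCLI is not available, the per-host SSH check can be looped over several hosts from a workstation. This is only a sketch under stated assumptions: SSH is enabled on each ESXi host, you can log in as root, and the helper name `spherelet_status_all` is made up for this example. It prints whatever the init script reports without interpreting it.

```shell
# spherelet_status_all HOST...
# Runs the spherelet init script's "status" action on each ESXi host over SSH
# and prints one "<host>: <output>" line per host.
# Assumption: root SSH access to the hosts is enabled.
spherelet_status_all() {
  for host in "$@"; do
    # -n keeps ssh from consuming the caller's stdin inside the loop.
    status=$(ssh -n "root@${host}" /etc/init.d/spherelet status 2>&1)
    printf '%s: %s\n' "$host" "$status"
  done
}
```

Usage: `spherelet_status_all wdc-01-rxxesx05.xxxxxxxxx.com wdc-01-rxxesx39.xxxxxxxxx.com` checks the two hosts that were NotReady in the earlier example.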
Restart
- ssh into the ESXi worker node and run the following:
/etc/init.d/spherelet restart
- You can also restart the spherelet service using PowerCLI. The following example restarts the spherelet service on ALL the ESXi worker nodes of a cluster:
> Get-Cluster
Name HAEnabled HAFailoverLevel DrsEnabled DrsAutomationLevel
---- --------- --------------- ---------- ------------------
wdc-10-vcxxc01 True 1 True FullyAutomated
> Get-Cluster -Name wdc-10-vcxxc01 | Get-VMHost | foreach { Restart-VMHostService -HostService ($_ | Get-VMHostService | where {$_.Key -eq "spherelet"}) }
Certificates
You may notice the ESXi worker nodes in a NotReady state when the following spherelet certificates expire.
- /etc/vmware/spherelet/spherelet.crt
- /etc/vmware/spherelet/client.crt
An example is given below:
❯ kg no
NAME STATUS ROLES AGE VERSION
420802008ec0d8ccaa6ac84140768375 Ready control-plane,master 70d v1.22.6+vmware.wcp.2
42087a63440b500de6cec759bb5900bf Ready control-plane,master 77d v1.22.6+vmware.wcp.2
4208e08c826dfe283c726bc573109dbb Ready control-plane,master 77d v1.22.6+vmware.wcp.2
wdc-08-rxxesx25.xxxxxxxxx.com NotReady agent 370d v1.22.6-sph-db56d46
wdc-08-rxxesx26.xxxxxxxxx.com NotReady agent 370d v1.22.6-sph-db56d46
wdc-08-rxxesx23.xxxxxxxxx.com NotReady agent 370d v1.22.6-sph-db56d46
wdc-08-rxxesx24.xxxxxxxxx.com NotReady agent 370d v1.22.6-sph-db56d46
wdc-08-rxxesx25.xxxxxxxxx.com NotReady agent 370d v1.22.6-sph-db56d46
wdc-08-rxxesx26.xxxxxxxxx.com NotReady agent 370d v1.22.6-sph-db56d46
You can SSH into the ESXi worker nodes and verify the validity of the above-mentioned certs. They have a lifetime of one year.
Example:
[root@wdc-08-rxxesx25:~] openssl x509 -enddate -noout -in /etc/vmware/spherelet/spherelet.crt
notAfter=Sep 1 08:32:24 2023 GMT
[root@wdc-08-rxxesx25:~] openssl x509 -enddate -noout -in /etc/vmware/spherelet/client.crt
notAfter=Sep 1 08:32:24 2023 GMT
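Because the certs last only one year, it may be worth checking ahead of time whether they are close to expiry rather than waiting for nodes to go NotReady. The sketch below uses the `-checkend` option of `openssl x509`, which takes a number of seconds; it assumes the openssl build on the host supports that option, and the helper name `cert_expires_within` is hypothetical.

```shell
# cert_expires_within FILE DAYS
# Returns 0 if FILE expires within DAYS days (or has already expired),
# 1 if it stays valid beyond that window, and 2 if FILE cannot be read.
# Note: cert_expires_within is an illustrative helper, not an ESXi command.
cert_expires_within() {
  cert_file=$1
  days=$2
  [ -r "$cert_file" ] || return 2
  # -checkend N exits 0 when the cert is still valid N seconds from now.
  if openssl x509 -checkend "$((days * 86400))" -noout -in "$cert_file" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}
```

Usage: `cert_expires_within /etc/vmware/spherelet/spherelet.crt 30 && echo "spherelet.crt expires within 30 days"`, and the same for `/etc/vmware/spherelet/client.crt`. The 30-day window is an arbitrary choice for the example.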
Depending on your support contract, if it's a production environment you may need to open a case with VMware GSS to resolve this issue.
Ref KBs:
Verify
❯ kubectl get node
NAME STATUS ROLES AGE VERSION
42017dcb669bea2962da27fc2f6c16d2 Ready control-plane,master 5d20h v1.23.12+vmware.wcp.1
4201b763c766875b77bcb9f04f8840b3 Ready control-plane,master 5d21h v1.23.12+vmware.wcp.1
4201dab068e9b2d3af3b8fde450b3d96 Ready control-plane,master 5d20h v1.23.12+vmware.wcp.1
wdc-01-rxxesx04.xxxxxxxxx.com Ready agent 5d19h v1.23.5-sph-81ef5d1
wdc-01-rxxesx05.xxxxxxxxx.com Ready agent 5d19h v1.23.5-sph-81ef5d1
wdc-01-rxxesx06.xxxxxxxxx.com Ready agent 5d19h v1.23.5-sph-81ef5d1
wdc-01-rxxesx32.xxxxxxxxx.com Ready agent 5d19h v1.23.5-sph-81ef5d1
wdc-01-rxxesx33.xxxxxxxxx.com Ready agent 5d19h v1.23.5-sph-81ef5d1
wdc-01-rxxesx34.xxxxxxxxx.com Ready agent 5d19h v1.23.5-sph-81ef5d1
wdc-01-rxxesx35.xxxxxxxxx.com Ready agent 5d19h v1.23.5-sph-81ef5d1
wdc-01-rxxesx36.xxxxxxxxx.com Ready agent 5d19h v1.23.5-sph-81ef5d1
wdc-01-rxxesx37.xxxxxxxxx.com Ready agent 5d19h v1.23.5-sph-81ef5d1
wdc-01-rxxesx38.xxxxxxxxx.com Ready agent 5d19h v1.23.5-sph-81ef5d1
wdc-01-rxxesx39.xxxxxxxxx.com Ready agent 5d19h v1.23.5-sph-81ef5d1
wdc-01-rxxesx40.xxxxxxxxx.com Ready agent 5d19h v1.23.5-sph-81ef5d1
Hope it was useful. Cheers!