vineethac.blogspot.com: Worker Nodes

Wednesday, June 26, 2024

vSphere with Tanzu using NSX-T - Part33 - Troubleshooting intermittent connection timeouts to apiserver and workloads

In the realm of managing Tanzu Kubernetes clusters (TKCs), we have encountered several challenges that hindered the smooth functioning of our applications. In this blog post, we will discuss three such cases and the workarounds we employed to resolve them.

Case 1: TKC Control Plane Node Connectivity Issues

Symptoms:

TKC apiserver connection timeouts when attempting to connect using the kubeconfig.
Traffic was not flowing to two of the control plane nodes.
NSX-T web UI LB VS stats indicated this issue.

Case 2: TKC Worker Node Connectivity Issues

Symptoms:

Workload (example: PostgreSQL cluster) connection timeouts.
Traffic was not flowing to two of the worker nodes in the TKC.
NSX-T web UI LB VS stats indicated this issue.

Case 3: Load Balancer Connectivity Issues

Symptoms:

Connection timeouts when attempting to connect to a PostgreSQL workload through the load balancer VS IP.
This issue was observed only when creating new services of type LoadBalancer in the TKC.
We noticed datapath mempool usage for the edge nodes was above the threshold value.

Resolution/workaround

Find the T1 router that is attached to the TKC that has connectivity issues.
In an Active-Standby HA configuration, one Edge node will be Active and the other will be in Standby status.
First, place the Standby Edge node in NSX maintenance mode (MM), reboot it, and then exit NSX MM.
Next, place the Active Edge node in NSX MM. There will be a slight network disruption during this failover. Once it is in NSX MM, reboot it, and then exit NSX MM.
This should resolve the issue.
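
The ordering in the steps above (Standby first, Active last) can be sketched as a small planning helper. This is only an illustration of the sequence: the edge node names are hypothetical, and the function just prints the steps; it does not call any NSX API.

```python
from typing import Dict, List


def plan_edge_reboots(ha_status: Dict[str, str]) -> List[str]:
    """Return maintenance steps with Standby edge nodes handled before the Active one."""
    # sorted() is stable, so nodes keep their given order within each group.
    # The Active node goes last because moving it into maintenance mode
    # is the step that triggers the failover.
    ordered = sorted(ha_status, key=lambda name: ha_status[name].upper() != "STANDBY")
    steps = []
    for name in ordered:
        steps.append(f"enter NSX maintenance mode: {name}")
        steps.append(f"reboot: {name}")
        steps.append(f"exit NSX maintenance mode: {name}")
    return steps


if __name__ == "__main__":
    # Hypothetical edge node names, for illustration only.
    for step in plan_edge_reboots({"edge-01": "ACTIVE", "edge-02": "STANDBY"}):
        print(step)
```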

In conclusion, these cases illustrate the importance of verifying NSX-T components when managing Tanzu Kubernetes clusters. By identifying the root cause of the issues and employing effective workarounds, we were able to restore functionality and maintain the health of our applications. Stay tuned for more insights and best practices in managing Kubernetes clusters.

Hope it was useful. Cheers!

Saturday, May 20, 2023

vSphere with Tanzu using NSX-T - Part25 - Spherelet

The Spherelet is based on the Kubernetes “Kubelet” and enables an ESXi hypervisor to act as a Kubernetes worker node. Sometimes you may notice that the worker nodes of your supervisor cluster are in NotReady,SchedulingDisabled status, and it may be because spherelet is not running on those ESXi nodes.

Following are the steps to verify the status of the spherelet service, and restart it if required.

Example:

❯ kubectx wdc-01-vcxx
Switched to context "wdc-01-vcxx".
❯ kubectl get node
NAME                               STATUS                        ROLES                  AGE    VERSION
42019f7e751b2818bb0c659028d49fdc   Ready                         control-plane,master   317d   v1.22.6+vmware.wcp.2
4201b0b21aed78d8e72bfb622bb8b98b   Ready                         control-plane,master   317d   v1.22.6+vmware.wcp.2
4201c53dcef2701a8c36463942d762dc   Ready                         control-plane,master   317d   v1.22.6+vmware.wcp.2
wdc-01-rxxesx04.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx05.xxxxxxxxx.com      NotReady,SchedulingDisabled   agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx06.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx32.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx33.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx34.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx35.xxxxxxxxx.com      Ready,SchedulingDisabled      agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx36.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx37.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx38.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx39.xxxxxxxxx.com      NotReady,SchedulingDisabled   agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx40.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
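
If you have many hosts, picking out the NotReady ones by eye gets tedious. Here is a minimal Python sketch that filters saved `kubectl get node` output. It assumes the default five-column layout shown above; the function name is my own.

```python
from typing import List


def not_ready_nodes(kubectl_output: str) -> List[str]:
    """Return the names of nodes whose STATUS column contains NotReady."""
    names = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue
        # STATUS can be a comma-separated list such as "NotReady,SchedulingDisabled".
        if "NotReady" in fields[1].split(","):
            names.append(fields[0])
    return names


SAMPLE = """\
NAME                               STATUS                        ROLES                  AGE    VERSION
42019f7e751b2818bb0c659028d49fdc   Ready                         control-plane,master   317d   v1.22.6+vmware.wcp.2
wdc-01-rxxesx04.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx05.xxxxxxxxx.com      NotReady,SchedulingDisabled   agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx35.xxxxxxxxx.com      Ready,SchedulingDisabled      agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx39.xxxxxxxxx.com      NotReady,SchedulingDisabled   agent                  317d   v1.22.6-sph-db56d46
"""

if __name__ == "__main__":
    for node in not_ready_nodes(SAMPLE):
        print(node)
```

The hosts it returns are the ones to ssh into for the spherelet checks that follow.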

Logs

ssh into the ESXi worker node.

tail -f /var/log/spherelet.log

Status

ssh into the ESXi worker node and run the following:

/etc/init.d/spherelet status

You can also check the status of spherelet using PowerCLI. Following is an example:

> Connect-VIServer wdc-10-vcxx

> Get-VMHost | Get-VMHostService | where {$_.Key -eq "spherelet"}  | select VMHost,Key,Running | ft

VMHost                        Key       Running
------                        ---       -------
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True

Restart

ssh into the ESXi worker node and run the following:

/etc/init.d/spherelet restart

You can also restart the spherelet service using PowerCLI. Following is an example to restart the spherelet service on ALL the ESXi worker nodes of a cluster:

> Get-Cluster

Name                           HAEnabled  HAFailover DrsEnabled DrsAutomationLevel
                                          Level
----                           ---------  ---------- ---------- ------------------
wdc-10-vcxxc01                 True       1          True       FullyAutomated

> Get-Cluster -Name wdc-10-vcxxc01 | Get-VMHost | foreach { Restart-VMHostService -HostService ($_ | Get-VMHostService | where {$_.Key -eq "spherelet"}) }

Certificates

You may notice the ESXi worker nodes in NotReady state when the following spherelet certs expire.

/etc/vmware/spherelet/spherelet.crt
/etc/vmware/spherelet/client.crt

An example is given below:

❯ kg no
NAME                               STATUS     ROLES                  AGE    VERSION
420802008ec0d8ccaa6ac84140768375   Ready      control-plane,master   70d    v1.22.6+vmware.wcp.2
42087a63440b500de6cec759bb5900bf   Ready      control-plane,master   77d    v1.22.6+vmware.wcp.2
4208e08c826dfe283c726bc573109dbb   Ready      control-plane,master   77d    v1.22.6+vmware.wcp.2
wdc-08-rxxesx25.xxxxxxxxx.com      NotReady   agent                  370d   v1.22.6-sph-db56d46
wdc-08-rxxesx26.xxxxxxxxx.com      NotReady   agent                  370d   v1.22.6-sph-db56d46
wdc-08-rxxesx23.xxxxxxxxx.com      NotReady   agent                  370d   v1.22.6-sph-db56d46
wdc-08-rxxesx24.xxxxxxxxx.com      NotReady   agent                  370d   v1.22.6-sph-db56d46
wdc-08-rxxesx25.xxxxxxxxx.com      NotReady   agent                  370d   v1.22.6-sph-db56d46
wdc-08-rxxesx26.xxxxxxxxx.com      NotReady   agent                  370d   v1.22.6-sph-db56d46

You can ssh into the ESXi worker nodes and verify the validity of the above-mentioned certs. They have a lifetime of one year.

Example:

[root@wdc-08-rxxesx25:~] openssl x509 -enddate -noout -in /etc/vmware/spherelet/spherelet.crt
notAfter=Sep  1 08:32:24 2023 GMT
[root@wdc-08-rxxesx25:~] openssl x509 -enddate -noout -in /etc/vmware/spherelet/client.crt
notAfter=Sep  1 08:32:24 2023 GMT
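
To know how much time is left before the certs expire, the `notAfter` line can be turned into a day count. Below is a minimal Python sketch, assuming the line has exactly the format printed by `openssl x509 -enddate` above and an English locale for the month name. The function names are my own.

```python
from datetime import datetime, timezone


def parse_not_after(line: str) -> datetime:
    """Parse a line such as 'notAfter=Sep  1 08:32:24 2023 GMT' into a UTC datetime."""
    value = line.strip().split("=", 1)[1]
    if value.endswith(" GMT"):
        value = value[: -len(" GMT")]
    # strptime accepts a run of spaces where the format has one space,
    # so the padded day in "Sep  1" parses correctly.
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y")
    return parsed.replace(tzinfo=timezone.utc)


def days_until_expiry(line: str, now: datetime) -> int:
    """Whole days between `now` and the cert end date (negative once expired)."""
    return (parse_not_after(line) - now).days


if __name__ == "__main__":
    sample = "notAfter=Sep  1 08:32:24 2023 GMT"
    print(days_until_expiry(sample, datetime.now(timezone.utc)))
```

A negative value means the cert has already expired, which matches the NotReady symptom described above.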

Depending on your support contract, if it's a production environment, you may need to open a case with VMware GSS to resolve this issue.

Verify

❯ kubectl get node
NAME                               STATUS   ROLES                  AGE     VERSION
42017dcb669bea2962da27fc2f6c16d2   Ready    control-plane,master   5d20h   v1.23.12+vmware.wcp.1
4201b763c766875b77bcb9f04f8840b3   Ready    control-plane,master   5d21h   v1.23.12+vmware.wcp.1
4201dab068e9b2d3af3b8fde450b3d96   Ready    control-plane,master   5d20h   v1.23.12+vmware.wcp.1
wdc-01-rxxesx04.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx05.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx06.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx32.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx33.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx34.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx35.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx36.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx37.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx38.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx39.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx40.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1

Hope it was useful. Cheers!

Saturday, August 13, 2022

vSphere with Tanzu using NSX-T - Part18 - Troubleshooting vSphere pods with ProviderFailed status

In this article, we will take a look at fixing vSphere pods with ProviderFailed status. Following is an example:

svc-opa-gatekeeper-domain-c61                 gatekeeper-controller-manager-5ccbc7fd79-5gn2n                    0/1     ProviderFailed     0          2d14h
svc-opa-gatekeeper-domain-c61                 gatekeeper-controller-manager-5ccbc7fd79-5jtvj                    0/1     ProviderFailed     0          2d13h
svc-opa-gatekeeper-domain-c61                 gatekeeper-controller-manager-5ccbc7fd79-5whtt                    0/1     ProviderFailed     0          2d14h
svc-opa-gatekeeper-domain-c61                 gatekeeper-controller-manager-5ccbc7fd79-6p2zv                    0/1     ProviderFailed     0          2d13h
svc-opa-gatekeeper-domain-c61                 gatekeeper-controller-manager-5ccbc7fd79-7r92p                    0/1     ProviderFailed     0          2d14h

When describing the pod, you can see the message "Unable to find backing for logical switch".

❯ 
❯ kd po gatekeeper-controller-manager-5ccbc7fd79-5gn2n -n svc-opa-gatekeeper-domain-c61
Name:                 gatekeeper-controller-manager-5ccbc7fd79-5gn2n
Namespace:            svc-opa-gatekeeper-domain-c61
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 esx-1.sddc-35-82-xxxxx.xxxxxxx.com/
Labels:               control-plane=controller-manager
                      gatekeeper.sh/operation=webhook
                      gatekeeper.sh/system=yes
                      pod-template-hash=5ccbc7fd79
Annotations:          attachment_id: 668b681b-fef6-43e5-8009-5ac8deb6da11
                      kubernetes.io/psp: wcp-default-psp
                      mac: 04:50:56:00:08:1e
                      vlan: None
                      vmware-system-ephemeral-disk-uuid: 6000C297-d1ba-ce8c-97ba-683a3c8f5321
                      vmware-system-image-references: {"manager":"gatekeeper-111fd0f684141bdad12c811b4f954ae3d60a6c27-v52049"}
                      vmware-system-vm-moid: vm-89777:750f38c6-3b0e-41b7-a94f-4d4aef08e19b
                      vmware-system-vm-uuid: 500c9c37-7055-1708-92d4-8ffdf932c8f9
Status:               Failed
Reason:               ProviderFailed
Message:              Unable to find backing for logical switch 03f0dcd4-a5d9-431e-ae9e-d796ddca0131: timed out waiting for the condition Unable to find backing for logical switch: 03f0dcd4-a5d9-431e-ae9e-d796ddca0131
IP:
IPs:                  <none>

A workaround is to restart the spherelet service on the ESXi host where you see this issue. If multiple ESXi nodes have the same issue, you could consider restarting the spherelet service on all ESXi worker nodes. In a production setup, you may want to place the ESXi host in maintenance mode (MM) before restarting the spherelet service. In our case, we usually restart the spherelet service directly without placing the host in MM. Following is the PowerCLI way to check/restart the spherelet service on ESXi worker nodes:

> Connect-VIServer wdc-10-vc21

> Get-VMHost | Get-VMHostService | where {$_.Key -eq "spherelet"}  | select VMHost,Key,Running | ft

VMHost                        Key       Running
------                        ---       -------
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True

> $sphereletservice = Get-VMHost wdc-10-r0xxxxxxxxxxxxxxxxxxxx | Get-VMHostService | where {$_.Key -eq "spherelet"}
> Stop-VMHostService -HostService $sphereletservice

Perform operation?
Perform operation Stop host service. on spherelet?
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [S] Suspend  [?] Help (default is "Y"): Y

Key                  Label                          Policy     Running  Required
---                  -----                          ------     -------  --------
spherelet            spherelet                      on         False    False

> Get-VMHost wdc-10-r0xxxxxxxxxxxxxxxxxxxx | Get-VMHostService | where {$_.Key -eq "spherelet"}

Key                  Label                          Policy     Running  Required
---                  -----                          ------     -------  --------
spherelet            spherelet                      on         False    False

> Start-VMHostService -HostService $sphereletservice

Key                  Label                          Policy     Running  Required
---                  -----                          ------     -------  --------
spherelet            spherelet                      on         True     False

To restart spherelet service on all ESXi worker nodes of a cluster:

> Get-Cluster

Name                           HAEnabled  HAFailover DrsEnabled DrsAutomationLevel
                                          Level
----                           ---------  ---------- ---------- ------------------
wdc-10-vcxxc01                 True       1          True       FullyAutomated

> Get-Cluster -Name wdc-10-vcxxc01 | Get-VMHost | foreach { Restart-VMHostService -HostService ($_ | Get-VMHostService | where {$_.Key -eq "spherelet"}) }

After restarting the spherelet service, new pods will come up fine and be in Running status. But you may need to clean up all those pods with ProviderFailed status using kubectl.

kubectl get pods -A | grep ProviderFailed | awk '{print $2 " --namespace=" $1}' | xargs -L 1 kubectl delete pod
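
If you prefer to review what will be deleted before running anything, the same cleanup can be sketched in Python. This is a minimal example that assumes the default `kubectl get pods -A` column order (NAMESPACE, NAME, READY, STATUS, ...). It only builds the `kubectl delete pod` argument lists, one per namespace, and does not execute them. The function names are my own.

```python
from collections import defaultdict
from typing import Dict, List


def provider_failed_pods(get_pods_output: str) -> Dict[str, List[str]]:
    """Group pods in ProviderFailed status by namespace."""
    grouped = defaultdict(list)
    for line in get_pods_output.strip().splitlines():
        fields = line.split()
        # The header row is skipped naturally: its fourth field is "STATUS".
        if len(fields) >= 4 and fields[3] == "ProviderFailed":
            grouped[fields[0]].append(fields[1])
    return dict(grouped)


def delete_commands(grouped: Dict[str, List[str]]) -> List[List[str]]:
    """Build one 'kubectl delete pod' argument list per namespace."""
    return [["kubectl", "delete", "pod", "-n", ns, *pods]
            for ns, pods in sorted(grouped.items())]


SAMPLE = """\
svc-opa-gatekeeper-domain-c61   gatekeeper-controller-manager-5ccbc7fd79-5gn2n   0/1   ProviderFailed   0   2d14h
svc-opa-gatekeeper-domain-c61   gatekeeper-controller-manager-5ccbc7fd79-5jtvj   0/1   ProviderFailed   0   2d13h
"""

if __name__ == "__main__":
    for command in delete_commands(provider_failed_pods(SAMPLE)):
        print(" ".join(command))
```

Each list can be passed to `subprocess.run` once you are happy with the output.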

Hope it was useful. Cheers!

Saturday, July 30, 2022

vSphere with Tanzu using NSX-T - Part17 - Troubleshooting TKCs stuck at updating phase

Ideally, if everything goes well, a TKC (Tanzu Kubernetes Cluster, aka Guest Cluster) should be in running phase. But sometimes, for various reasons, it may get stuck at updating phase. In this article, we will take a sample case and look at troubleshooting/fixing it.

Following is an example:

NAMESPACE              NAME                    PHASE      CREATIONTIME           VERSION                           CP    WORKER
karvea-vc17ns11        sc201vc17pace           updating   2021-11-19T12:17:24Z   v1.20.9+vmware.1-tkg.1.a4cee5b    1     4

Let's connect to this TKC. Here I have a small plugin (kubectl-gckc) that generates the TKC kubeconfig, and gcc is an alias for KUBECONFIG=gckubeconfig, where gckubeconfig is the TKC admin kubeconfig file.

❯ k gckc karvea-vc17ns11 sc201vc17pace
❯ gcc kg no
NAME                                           STATUS                     ROLES                  AGE    VERSION
sc201vc17pace-control-plane-zt99l              Ready                      control-plane,master   139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-5c7d9548f-pn6vz    Ready,SchedulingDisabled   <none>                 189d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw    Ready,SchedulingDisabled   <none>                 189d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-gxmtt   Ready                      <none>                 139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-j4wvp   Ready                      <none>                 139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-l2dq5   Ready                      <none>                 139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-xqlkv   Ready                      <none>                 139d   v1.20.9+vmware.1

❯ kg vm -n karvea-vc17ns11
NAME                                           POWERSTATE   AGE
sc201vc17pace-control-plane-zt99l              poweredOn    139d
sc201vc17pace-workers-jrcb6-5c7d9548f-pn6vz    poweredOn    189d
sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw    poweredOn    189d
sc201vc17pace-workers-jrcb6-85c4844f6c-gxmtt   poweredOn    139d
sc201vc17pace-workers-jrcb6-85c4844f6c-j4wvp   poweredOn    139d
sc201vc17pace-workers-jrcb6-85c4844f6c-l2dq5   poweredOn    139d
sc201vc17pace-workers-jrcb6-85c4844f6c-xqlkv   poweredOn    139d



❯ kg machine -n karvea-vc17ns11
NAME                                           CLUSTER         NODENAME                                       PROVIDERID                                       PHASE      AGE    VERSION
sc201vc17pace-control-plane-zt99l              sc201vc17pace   sc201vc17pace-control-plane-zt99l              vsphere://4201e660-3124-9aa5-4ec2-6fbc2ff3ecea   Running    139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-5c7d9548f-pn6vz    sc201vc17pace   sc201vc17pace-workers-jrcb6-5c7d9548f-pn6vz    vsphere://42010982-8b25-ad7b-2a1d-bb949def4834   Deleting   189d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw    sc201vc17pace   sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw    vsphere://4201a640-2b39-3d66-5a26-db95a612f6e5   Deleting   189d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-gxmtt   sc201vc17pace   sc201vc17pace-workers-jrcb6-85c4844f6c-gxmtt   vsphere://42013a9b-dffb-4609-89d6-4ca123c4dc1e   Running    139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-j4wvp   sc201vc17pace   sc201vc17pace-workers-jrcb6-85c4844f6c-j4wvp   vsphere://4201160b-21c9-ccc2-6826-e3545e34b490   Running    139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-l2dq5   sc201vc17pace   sc201vc17pace-workers-jrcb6-85c4844f6c-l2dq5   vsphere://420125a8-e45c-04b7-5612-ce3149e86d74   Running    139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-xqlkv   sc201vc17pace   sc201vc17pace-workers-jrcb6-85c4844f6c-xqlkv   vsphere://4201238f-c9a3-a9b2-9c31-4ed99318bd30   Running    139d   v1.20.9+vmware.1

As you can see above, there are two worker machines that are stuck at Deleting phase. It is because the corresponding two worker nodes are in Ready,SchedulingDisabled status. The nodes have not been drained yet for some reason. Once they get drained properly, their status will change to NotReady,SchedulingDisabled. Now let's try to drain those worker nodes manually.

❯ gcc k drain sc201vc17pace-workers-jrcb6-5c7d9548f-pn6vz
node/sc201vc17pace-workers-jrcb6-5c7d9548f-pn6vz already cordoned
DEPRECATED WARNING: Aborting the drain command in a list of nodes will be deprecated in v1.23.
The new behavior will make the drain command go through all nodes even if one or more nodes failed during the drain.
For now, users can try such experience via: --ignore-errors
error: unable to drain node "sc201vc17pace-workers-jrcb6-5c7d9548f-pn6vz", aborting command...

There are pending nodes to be drained:
 sc201vc17pace-workers-jrcb6-5c7d9548f-pn6vz
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/calico-node-wqlmq, kube-system/kube-proxy-78z5k, nsxi-platform/nsxi-platform-fluent-bit-pdzjx, projectcontour/projectcontour-envoy-r9pg7, vmware-system-csi/vsphere-csi-node-p2gtd
cannot delete Pods with local storage (use --delete-emptydir-data to override): nsxi-platform/kafka-2
❯
❯ gcc k drain sc201vc17pace-workers-jrcb6-5c7d9548f-pn6vz --ignore-daemonsets --delete-emptydir-data
node/sc201vc17pace-workers-jrcb6-5c7d9548f-pn6vz already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-wqlmq, kube-system/kube-proxy-78z5k, nsxi-platform/nsxi-platform-fluent-bit-pdzjx, projectcontour/projectcontour-envoy-r9pg7, vmware-system-csi/vsphere-csi-node-p2gtd
evicting pod nsxi-platform/kafka-2
error when evicting pods/"kafka-2" -n "nsxi-platform" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod nsxi-platform/kafka-2
error when evicting pods/"kafka-2" -n "nsxi-platform" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
^C
❯ gcc kg pdb
No resources found in default namespace.
❯ gcc kg pdb -A
NAMESPACE       NAME        MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
nsxi-platform   kafka       N/A             1                 0                     188d
nsxi-platform   zookeeper   N/A             1                 1                     188d
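
The column to look at in the pdb listing above is ALLOWED DISRUPTIONS: a value of 0 means an eviction of a pod covered by that budget will be refused, which is what happened with kafka-2. A minimal Python sketch to pick out such budgets from saved `kubectl get pdb -A` output follows. It assumes the six-column layout shown above; the function name is my own.

```python
from typing import List, Tuple


def blocking_pdbs(pdb_output: str) -> List[Tuple[str, str]]:
    """Return (namespace, name) for PDBs whose ALLOWED DISRUPTIONS is 0."""
    blocked = []
    # The header contains spaces inside column names, so skip it by position.
    for line in pdb_output.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) != 6:
            continue
        namespace, name, _min_available, _max_unavailable, allowed, _age = fields
        if allowed == "0":
            blocked.append((namespace, name))
    return blocked


SAMPLE = """\
NAMESPACE       NAME        MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
nsxi-platform   kafka       N/A             1                 0                     188d
nsxi-platform   zookeeper   N/A             1                 1                     188d
"""

if __name__ == "__main__":
    print(blocking_pdbs(SAMPLE))
```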

Here the worker node sc201vc17pace-workers-jrcb6-5c7d9548f-pn6vz is not getting drained because of a pod disruption budget (pdb). So, in order to drain the node, I am taking a backup of the pdb yaml and then deleting the pdb. Once the nodes are drained, I will apply the pdb yaml back on to the cluster.

❯ gcc kg pdb -n nsxi-platform kafka -oyaml > pdb-nsxi-platform-kafka.yaml
❯ code pdb-nsxi-platform-kafka.yaml
❯ gcc kg pdb -n nsxi-platform zookeeper -oyaml > pdb-nsxi-platform-zookeeper.yaml
❯ code pdb-nsxi-platform-zookeeper.yaml
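
One thing to watch with such backups: a manifest saved straight from the cluster carries server-populated fields (for example `resourceVersion`, `uid` and the `status` block), and the API server can reject it when you try to create the object again. Below is a minimal Python sketch that strips those fields. It assumes the backup was taken with `-o json` instead of `-o yaml` so that the standard library can parse it. The selector labels and metadata values in the sample are hypothetical.

```python
import copy
import json

# Metadata fields that are set by the API server, not by the user.
SERVER_SET_METADATA = (
    "creationTimestamp",
    "generation",
    "managedFields",
    "resourceVersion",
    "selfLink",
    "uid",
)


def strip_server_fields(manifest: dict) -> dict:
    """Return a copy of the manifest without status and server-set metadata."""
    cleaned = copy.deepcopy(manifest)
    cleaned.pop("status", None)
    metadata = cleaned.get("metadata", {})
    for key in SERVER_SET_METADATA:
        metadata.pop(key, None)
    return cleaned


# Hypothetical backup content; the selector labels are an assumption.
SAMPLE = {
    "apiVersion": "policy/v1beta1",
    "kind": "PodDisruptionBudget",
    "metadata": {
        "name": "kafka",
        "namespace": "nsxi-platform",
        "resourceVersion": "12345",
        "uid": "hypothetical-uid",
    },
    "spec": {"maxUnavailable": 1, "selector": {"matchLabels": {"app": "kafka"}}},
    "status": {"disruptionsAllowed": 0},
}

if __name__ == "__main__":
    print(json.dumps(strip_server_fields(SAMPLE), indent=2))
```

The cleaned JSON can be written to a file and applied with `kubectl apply -f` after the drain completes.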

❯ gcc k delete pdb kafka -n nsxi-platform
poddisruptionbudget.policy "kafka" deleted
❯ gcc kg pdb -A
NAMESPACE       NAME        MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
nsxi-platform   zookeeper   N/A             1                 1                     188d
❯
❯ gcc k drain sc201vc17pace-workers-jrcb6-5c7d9548f-pn6vz --ignore-daemonsets --delete-emptydir-data
node/sc201vc17pace-workers-jrcb6-5c7d9548f-pn6vz already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-wqlmq, kube-system/kube-proxy-78z5k, nsxi-platform/nsxi-platform-fluent-bit-pdzjx, projectcontour/projectcontour-envoy-r9pg7, vmware-system-csi/vsphere-csi-node-p2gtd
evicting pod nsxi-platform/kafka-2
pod/kafka-2 evicted
node/sc201vc17pace-workers-jrcb6-5c7d9548f-pn6vz evicted
❯

❯ gcc k drain sc201vc17pace-workers-jrcb6-5c7d9548f-pn6vz --ignore-daemonsets --delete-emptydir-data
node/sc201vc17pace-workers-jrcb6-5c7d9548f-pn6vz already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-wqlmq, kube-system/kube-proxy-78z5k, nsxi-platform/nsxi-platform-fluent-bit-pdzjx, projectcontour/projectcontour-envoy-r9pg7, vmware-system-csi/vsphere-csi-node-p2gtd
node/sc201vc17pace-workers-jrcb6-5c7d9548f-pn6vz drained


❯ gcc k drain sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw
node/sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw already cordoned
DEPRECATED WARNING: Aborting the drain command in a list of nodes will be deprecated in v1.23.
The new behavior will make the drain command go through all nodes even if one or more nodes failed during the drain.
For now, users can try such experience via: --ignore-errors
error: unable to drain node "sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw", aborting command...

There are pending nodes to be drained:
 sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw
error: cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/calico-node-4tz4x, kube-system/kube-proxy-q726d, nsxi-platform/nsxi-platform-fluent-bit-b24nn, projectcontour/projectcontour-envoy-rppkx, vmware-system-csi/vsphere-csi-node-mpbsh
❯ gcc k drain sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw --ignore-daemonsets
node/sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-4tz4x, kube-system/kube-proxy-q726d, nsxi-platform/nsxi-platform-fluent-bit-b24nn, projectcontour/projectcontour-envoy-rppkx, vmware-system-csi/vsphere-csi-node-mpbsh
node/sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw drained

The worker nodes are now drained.

❯ gcc kg no
NAME                                           STATUS                        ROLES                  AGE    VERSION
sc201vc17pace-control-plane-zt99l              Ready                         control-plane,master   139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-5c7d9548f-pn6vz    NotReady,SchedulingDisabled   <none>                 189d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw    NotReady,SchedulingDisabled   <none>                 189d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-gxmtt   Ready                         <none>                 139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-j4wvp   Ready                         <none>                 139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-l2dq5   Ready                         <none>                 139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-xqlkv   Ready                         <none>                 139d   v1.20.9+vmware.1

❯ gcc kg no
NAME                                           STATUS                        ROLES                  AGE    VERSION
sc201vc17pace-control-plane-zt99l              Ready                         control-plane,master   139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw    NotReady,SchedulingDisabled   <none>                 189d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-gxmtt   Ready                         <none>                 139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-j4wvp   Ready                         <none>                 139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-l2dq5   Ready                         <none>                 139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-xqlkv   Ready                         <none>                 139d   v1.20.9+vmware.1

As soon as the worker nodes were drained, one of them was successfully removed, but the other worker node was still present. Looking at the machine resources, one of the worker machines is still stuck at Deleting phase. In this case, even after I manually deleted the worker node, the corresponding worker machine remained stuck at Deleting phase.

❯ kg machine -n karvea-vc17ns11
NAME                                           CLUSTER         NODENAME                                       PROVIDERID                                       PHASE      AGE    VERSION
sc201vc17pace-control-plane-zt99l              sc201vc17pace   sc201vc17pace-control-plane-zt99l              vsphere://4201e660-3124-9aa5-4ec2-6fbc2ff3ecea   Running    139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw    sc201vc17pace   sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw    vsphere://4201a640-2b39-3d66-5a26-db95a612f6e5   Deleting   189d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-gxmtt   sc201vc17pace   sc201vc17pace-workers-jrcb6-85c4844f6c-gxmtt   vsphere://42013a9b-dffb-4609-89d6-4ca123c4dc1e   Running    139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-j4wvp   sc201vc17pace   sc201vc17pace-workers-jrcb6-85c4844f6c-j4wvp   vsphere://4201160b-21c9-ccc2-6826-e3545e34b490   Running    139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-l2dq5   sc201vc17pace   sc201vc17pace-workers-jrcb6-85c4844f6c-l2dq5   vsphere://420125a8-e45c-04b7-5612-ce3149e86d74   Running    139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-xqlkv   sc201vc17pace   sc201vc17pace-workers-jrcb6-85c4844f6c-xqlkv   vsphere://4201238f-c9a3-a9b2-9c31-4ed99318bd30   Running    139d   v1.20.9+vmware.1


❯ gcc k delete node sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw
node "sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw" deleted
❯
❯ gcc kg no
NAME                                           STATUS   ROLES                  AGE    VERSION
sc201vc17pace-control-plane-zt99l              Ready    control-plane,master   139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-gxmtt   Ready    <none>                 139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-j4wvp   Ready    <none>                 139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-l2dq5   Ready    <none>                 139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-xqlkv   Ready    <none>                 139d   v1.20.9+vmware.1

Now let's describe the worker machine stuck at Deleting. In this case, there were two PVCs stuck at Terminating status, so I edited the YAML of those two PVCs and set the finalizers to null.
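
Instead of editing the PVC YAML by hand, the same change can be expressed as a JSON merge patch, where setting a field to null removes it. Here is a minimal Python sketch that builds the `kubectl patch` argument list. The PVC name in the example is a placeholder, and the function name is my own. Note that clearing finalizers skips whatever cleanup they were guarding, so treat this as a last resort.

```python
import json
from typing import List


def finalizer_patch_command(namespace: str, pvc_name: str) -> List[str]:
    """Build a kubectl command that sets metadata.finalizers to null on a PVC."""
    # In a JSON merge patch, null means "remove this field".
    patch = json.dumps({"metadata": {"finalizers": None}}, separators=(",", ":"))
    return ["kubectl", "patch", "pvc", pvc_name, "-n", namespace,
            "--type=merge", "-p", patch]


if __name__ == "__main__":
    # "example-pvc" is a placeholder, not one of the PVCs from this cluster.
    print(" ".join(finalizer_patch_command("karvea-vc17ns11", "example-pvc")))
```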

❯ kg machine -n karvea-vc17ns11
NAME                                           CLUSTER         NODENAME                                       PROVIDERID                                       PHASE      AGE    VERSION
sc201vc17pace-control-plane-zt99l              sc201vc17pace   sc201vc17pace-control-plane-zt99l              vsphere://4201e660-3124-9aa5-4ec2-6fbc2ff3ecea   Running    139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw    sc201vc17pace   sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw    vsphere://4201a640-2b39-3d66-5a26-db95a612f6e5   Deleting   189d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-gxmtt   sc201vc17pace   sc201vc17pace-workers-jrcb6-85c4844f6c-gxmtt   vsphere://42013a9b-dffb-4609-89d6-4ca123c4dc1e   Running    139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-j4wvp   sc201vc17pace   sc201vc17pace-workers-jrcb6-85c4844f6c-j4wvp   vsphere://4201160b-21c9-ccc2-6826-e3545e34b490   Running    139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-l2dq5   sc201vc17pace   sc201vc17pace-workers-jrcb6-85c4844f6c-l2dq5   vsphere://420125a8-e45c-04b7-5612-ce3149e86d74   Running    139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-xqlkv   sc201vc17pace   sc201vc17pace-workers-jrcb6-85c4844f6c-xqlkv   vsphere://4201238f-c9a3-a9b2-9c31-4ed99318bd30   Running    139d   v1.20.9+vmware.1



❯ kg vm -n karvea-vc17ns11
NAME                                           POWERSTATE   AGE
sc201vc17pace-control-plane-zt99l              poweredOn    139d
sc201vc17pace-workers-jrcb6-85c4844f6c-gxmtt   poweredOn    139d
sc201vc17pace-workers-jrcb6-85c4844f6c-j4wvp   poweredOn    139d
sc201vc17pace-workers-jrcb6-85c4844f6c-l2dq5   poweredOn    139d
sc201vc17pace-workers-jrcb6-85c4844f6c-xqlkv   poweredOn    139d


❯ kd machine sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw -n karvea-vc17ns11

Events:
  Type    Reason                  Age                   From                           Message
  ----    ------                  ----                  ----                           -------
  Normal  DetectedUnhealthy       13m (x2 over 17m)     machinehealthcheck-controller  Machine karvea-vc17ns11/sc201vc17pace-workers-jrcb6/sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw/sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw has unhealthy node sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw
  Normal  SuccessfulDrainNode     13m (x2 over 19m)     machine-controller             success draining Machine's node "sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw"
  Normal  NodeVolumesDetached     12m (x2 over 19m)     machine-controller             success waiting for node volumes detach Machine's node "sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw"
  Normal  MachineMarkedUnhealthy  106s (x4 over 9m58s)  machinehealthcheck-controller  Machine karvea-vc17ns11/sc201vc17pace-workers-jrcb6/sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw/sc201vc17pace-workers-jrcb6-5c7d9548f-w64lw has been marked as unhealthy
❯
❯ kg pvc -n karvea-vc17ns11
NAME                                                                        STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS              AGE
a366a76b-2000-4d33-a817-a9c1b9e60b1b-1f4b5ee8-f378-445e-97d3-f4c4656863bb   Bound         pvc-1dc35d76-86c6-4a70-82e7-99609480a0b3   10Gi       RWO            sc2-01-vc17c01-wcp-mgmt   188d
a366a76b-2000-4d33-a817-a9c1b9e60b1b-3509d39d-e632-492b-a0c4-b5b3874b01a6   Bound         pvc-97e6e063-9a9e-4837-9999-284523379453   128Gi      RWO            sc2-01-vc17c01-wcp-mgmt   188d
a366a76b-2000-4d33-a817-a9c1b9e60b1b-42a0f98e-0f9c-4fc1-bc9f-862e94086624   Bound         pvc-be6bd318-140c-4cb8-9c22-daf9ec8dac65   128Gi      RWO            sc2-01-vc17c01-wcp-mgmt   188d
a366a76b-2000-4d33-a817-a9c1b9e60b1b-48b9ddc4-41bc-4228-a6b5-0aea3a470811   Bound         pvc-faa7798e-c045-420f-9d09-44674d9d2326   20Gi       RWO            sc2-01-vc17c01-wcp-mgmt   188d
a366a76b-2000-4d33-a817-a9c1b9e60b1b-8c880e33-681a-4eae-a57d-3aaf0fb9c950   Bound         pvc-cf1a6c2e-0e9e-425c-ae46-b010b086c325   10Gi       RWO            sc2-01-vc17c01-wcp-mgmt   188d
a366a76b-2000-4d33-a817-a9c1b9e60b1b-aa196378-d10f-45ed-a528-b0d691ec6447   Bound         pvc-49fca2f0-3402-429f-884f-7db9012934d6   8Gi        RWO            sc2-01-vc17c01-wcp-mgmt   188d
a366a76b-2000-4d33-a817-a9c1b9e60b1b-bbe074ee-9ba3-4839-b519-af82214a9ad0   Bound         pvc-3887e89c-0a5b-4d08-938b-c9cb0a1efaca   8Gi        RWO            sc2-01-vc17c01-wcp-mgmt   188d
a366a76b-2000-4d33-a817-a9c1b9e60b1b-bfb23073-29e8-4f0d-b2c0-934ff808ad2c   Bound         pvc-f966f803-ca92-45b6-9395-8d1d24c67f8e   10Gi       RWO            sc2-01-vc17c01-wcp-mgmt   188d
a366a76b-2000-4d33-a817-a9c1b9e60b1b-d39e8f9b-692e-46ac-a52c-2d977f0a95fa   Bound         pvc-25d7c8c2-7994-4ee8-9ef8-725ae1c8c8a1   8Gi        RWO            sc2-01-vc17c01-wcp-mgmt   188d
a366a76b-2000-4d33-a817-a9c1b9e60b1b-ef1e2362-83bc-4af4-b748-a496aa911009   Bound         pvc-7aefd3fe-3279-4e20-8a00-5ca60cc61e40   128Gi      RWO            sc2-01-vc17c01-wcp-mgmt   188d
a366a76b-2000-4d33-a817-a9c1b9e60b1b-f072ee1b-034a-4ac8-965c-f66a2d8bd61c   Bound         pvc-276acbee-ba6c-4cc9-8bc5-e18525abd256   20Gi       RWO            sc2-01-vc17c01-wcp-mgmt   188d
sc201vc17pace-workers-wswdh-2hz8w-containerd                                Bound         pvc-e67e3a6f-99d6-4e21-813d-e9c9994b25d6   42Gi       RWO            sc2-01-vc17c01-wcp-mgmt   139d
sc201vc17pace-workers-wswdh-5pjrc-containerd                                Bound         pvc-fb162388-4347-4f48-825e-c2c2d62ceb90   42Gi       RWO            sc2-01-vc17c01-wcp-mgmt   139d
sc201vc17pace-workers-wswdh-755m6-containerd                                Terminating   pvc-da2e4866-bb41-4f74-a4b7-0f74bc7061a1   42Gi       RWO            sc2-01-vc17c01-wcp-mgmt   189d
sc201vc17pace-workers-wswdh-dgmjs-containerd                                Terminating   pvc-64eac528-f160-444c-9a0f-0ed9f6393e06   42Gi       RWO            sc2-01-vc17c01-wcp-mgmt   189d
sc201vc17pace-workers-wswdh-djp2m-containerd                                Bound         pvc-a7542552-de13-4670-ac45-84ed39c3c916   42Gi       RWO            sc2-01-vc17c01-wcp-mgmt   139d
sc201vc17pace-workers-wswdh-flwtt-containerd                                Bound         pvc-1b8ee843-709a-4e2a-955d-a9a9a6a83c73   42Gi       RWO            sc2-01-vc17c01-wcp-mgmt   139d
❯

As soon as the Terminating PVCs are removed, the worker machine that was stuck at Deleting is removed as well, and the TKC changes its status to running.

❯ kg pvc -n karvea-vc17ns11
NAME                                                                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS              AGE
a366a76b-2000-4d33-a817-a9c1b9e60b1b-1f4b5ee8-f378-445e-97d3-f4c4656863bb   Bound    pvc-1dc35d76-86c6-4a70-82e7-99609480a0b3   10Gi       RWO            sc2-01-vc17c01-wcp-mgmt   188d
a366a76b-2000-4d33-a817-a9c1b9e60b1b-3509d39d-e632-492b-a0c4-b5b3874b01a6   Bound    pvc-97e6e063-9a9e-4837-9999-284523379453   128Gi      RWO            sc2-01-vc17c01-wcp-mgmt   188d
a366a76b-2000-4d33-a817-a9c1b9e60b1b-42a0f98e-0f9c-4fc1-bc9f-862e94086624   Bound    pvc-be6bd318-140c-4cb8-9c22-daf9ec8dac65   128Gi      RWO            sc2-01-vc17c01-wcp-mgmt   188d
a366a76b-2000-4d33-a817-a9c1b9e60b1b-48b9ddc4-41bc-4228-a6b5-0aea3a470811   Bound    pvc-faa7798e-c045-420f-9d09-44674d9d2326   20Gi       RWO            sc2-01-vc17c01-wcp-mgmt   188d
a366a76b-2000-4d33-a817-a9c1b9e60b1b-8c880e33-681a-4eae-a57d-3aaf0fb9c950   Bound    pvc-cf1a6c2e-0e9e-425c-ae46-b010b086c325   10Gi       RWO            sc2-01-vc17c01-wcp-mgmt   188d
a366a76b-2000-4d33-a817-a9c1b9e60b1b-aa196378-d10f-45ed-a528-b0d691ec6447   Bound    pvc-49fca2f0-3402-429f-884f-7db9012934d6   8Gi        RWO            sc2-01-vc17c01-wcp-mgmt   188d
a366a76b-2000-4d33-a817-a9c1b9e60b1b-bbe074ee-9ba3-4839-b519-af82214a9ad0   Bound    pvc-3887e89c-0a5b-4d08-938b-c9cb0a1efaca   8Gi        RWO            sc2-01-vc17c01-wcp-mgmt   188d
a366a76b-2000-4d33-a817-a9c1b9e60b1b-bfb23073-29e8-4f0d-b2c0-934ff808ad2c   Bound    pvc-f966f803-ca92-45b6-9395-8d1d24c67f8e   10Gi       RWO            sc2-01-vc17c01-wcp-mgmt   188d
a366a76b-2000-4d33-a817-a9c1b9e60b1b-d39e8f9b-692e-46ac-a52c-2d977f0a95fa   Bound    pvc-25d7c8c2-7994-4ee8-9ef8-725ae1c8c8a1   8Gi        RWO            sc2-01-vc17c01-wcp-mgmt   188d
a366a76b-2000-4d33-a817-a9c1b9e60b1b-ef1e2362-83bc-4af4-b748-a496aa911009   Bound    pvc-7aefd3fe-3279-4e20-8a00-5ca60cc61e40   128Gi      RWO            sc2-01-vc17c01-wcp-mgmt   188d
a366a76b-2000-4d33-a817-a9c1b9e60b1b-f072ee1b-034a-4ac8-965c-f66a2d8bd61c   Bound    pvc-276acbee-ba6c-4cc9-8bc5-e18525abd256   20Gi       RWO            sc2-01-vc17c01-wcp-mgmt   188d
sc201vc17pace-workers-wswdh-2hz8w-containerd                                Bound    pvc-e67e3a6f-99d6-4e21-813d-e9c9994b25d6   42Gi       RWO            sc2-01-vc17c01-wcp-mgmt   139d
sc201vc17pace-workers-wswdh-5pjrc-containerd                                Bound    pvc-fb162388-4347-4f48-825e-c2c2d62ceb90   42Gi       RWO            sc2-01-vc17c01-wcp-mgmt   139d
sc201vc17pace-workers-wswdh-djp2m-containerd                                Bound    pvc-a7542552-de13-4670-ac45-84ed39c3c916   42Gi       RWO            sc2-01-vc17c01-wcp-mgmt   139d
sc201vc17pace-workers-wswdh-flwtt-containerd                                Bound    pvc-1b8ee843-709a-4e2a-955d-a9a9a6a83c73   42Gi       RWO            sc2-01-vc17c01-wcp-mgmt   139d

❯ kg machine -n karvea-vc17ns11
NAME                                           CLUSTER         NODENAME                                       PROVIDERID                                       PHASE     AGE    VERSION
sc201vc17pace-control-plane-zt99l              sc201vc17pace   sc201vc17pace-control-plane-zt99l              vsphere://4201e660-3124-9aa5-4ec2-6fbc2ff3ecea   Running   139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-gxmtt   sc201vc17pace   sc201vc17pace-workers-jrcb6-85c4844f6c-gxmtt   vsphere://42013a9b-dffb-4609-89d6-4ca123c4dc1e   Running   139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-j4wvp   sc201vc17pace   sc201vc17pace-workers-jrcb6-85c4844f6c-j4wvp   vsphere://4201160b-21c9-ccc2-6826-e3545e34b490   Running   139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-l2dq5   sc201vc17pace   sc201vc17pace-workers-jrcb6-85c4844f6c-l2dq5   vsphere://420125a8-e45c-04b7-5612-ce3149e86d74   Running   139d   v1.20.9+vmware.1
sc201vc17pace-workers-jrcb6-85c4844f6c-xqlkv   sc201vc17pace   sc201vc17pace-workers-jrcb6-85c4844f6c-xqlkv   vsphere://4201238f-c9a3-a9b2-9c31-4ed99318bd30   Running   139d   v1.20.9+vmware.1

❯ kgtkca | grep karvea
karvea-vc17ns11                             sc201vc17pace           running    2021-11-19T12:17:24Z   v1.20.9+vmware.1-tkg.1.a4cee5b    1     4

Note: The above case is a sample scenario, and the reasons why a TKC is stuck at updating may vary based on several conditions. This is a generic method one can follow when approaching this kind of issue.
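The checks above (machines that are not Running, PVCs stuck at Terminating) can also be scripted. The following is only a minimal sketch, not a tool used in this post: the function names are hypothetical, and it assumes kubectl is on the PATH and already pointed at the Supervisor cluster context. It relies on two facts: a Machine reports its phase under status.phase, and kubectl shows a PVC as Terminating once metadata.deletionTimestamp is set.

```python
import json
import subprocess


def kubectl_json(*args: str) -> dict:
    """Run kubectl with '-o json' and return the parsed output."""
    out = subprocess.run(
        ["kubectl", *args, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def machines_not_running(machine_list: dict) -> list:
    """Names of Machine objects whose status.phase is not Running."""
    return [
        m["metadata"]["name"]
        for m in machine_list.get("items", [])
        if m.get("status", {}).get("phase") != "Running"
    ]


def terminating_pvcs(pvc_list: dict) -> list:
    """Names of PVCs marked for deletion (shown as Terminating by kubectl)."""
    return [
        p["metadata"]["name"]
        for p in pvc_list.get("items", [])
        if p["metadata"].get("deletionTimestamp")
    ]
```

For example, machines_not_running(kubectl_json("get", "machine", "-n", "<namespace>")) and terminating_pvcs(kubectl_json("get", "pvc", "-n", "<namespace>")) would list the objects worth describing first. The sketch only reports them; deciding whether it is safe to remove a PVC is still a manual step.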

Hope it was useful. Cheers!

Friday, April 8, 2022

Working with Kubernetes using Python - Part 03 - Get nodes

The following code snippet uses the kubeconfig Python module to switch contexts and the Python client for the Kubernetes API to get cluster node details. It takes the default kubeconfig file, switches to the required context, and gets the node info of the respective cluster.

kubectl commands:

kubectl config get-contexts
kubectl config current-context
kubectl config use-context <context_name>
kubectl get nodes -o json

Code:
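The originally embedded snippet is not reproduced here, so the following is only a minimal sketch of the approach described above, not the original code. It assumes the kubeconfig and kubernetes packages are installed and that the default kubeconfig file contains the target context. The function names node_summary and get_nodes are hypothetical.

```python
def node_summary(node: dict) -> dict:
    """Reduce one node item (shaped like 'kubectl get nodes -o json') to a few fields."""
    status = node.get("status", {})
    ready = any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in status.get("conditions", [])
    )
    return {
        "name": node["metadata"]["name"],
        "ready": ready,
        "kubelet_version": status.get("nodeInfo", {}).get("kubeletVersion"),
    }


def get_nodes(context_name: str) -> list:
    """Switch to the given context and return a summary of each cluster node."""
    # Imported here so node_summary stays usable without these packages.
    from kubeconfig import KubeConfig
    from kubernetes import client, config

    # Equivalent of 'kubectl config use-context <context_name>'.
    KubeConfig().use_context(context_name)
    # Load the default kubeconfig file with the context selected above.
    config.load_kube_config()

    api_client = client.ApiClient()
    nodes = client.CoreV1Api().list_node().items
    # sanitize_for_serialization gives plain dicts with the same camelCase
    # keys as the JSON output of kubectl.
    return [node_summary(api_client.sanitize_for_serialization(n)) for n in nodes]
```

Calling get_nodes("<context_name>") should then return one small dictionary per node, with its name, Ready state, and kubelet version. Note that use_context changes the current context in the kubeconfig file itself, just like the kubectl command does.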

Reference:

https://kubeconfig-python.readthedocs.io/en/latest/
https://github.com/kubernetes-client/python

Hope it was useful. Cheers!
