Saturday, June 17, 2023
Kubernetes 101 - Part10 - Plugins I use for managing K8s clusters
Following are some of the kubectl plugins that I use on a daily basis:
Friday, June 9, 2023
vSphere with Tanzu using NSX-T - Part26 - Jumpbox kubectl plugin to SSH to TKC node
While troubleshooting a TKC (Tanzu Kubernetes Cluster) you may need to SSH into the TKC nodes. To do that, you first need to create a jumpbox pod under the supervisor namespace, and from there you can SSH to the TKC nodes.
Here is the manual procedure: https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-587E2181-199A-422A-ABBC-0A9456A70074.html
The following kubectl plugin creates a jumpbox pod under a supervisor namespace. You can exec into this jumpbox pod to SSH into the TKC VMs.
kubectl-jumpbox
#!/bin/bash

Help()
{
   # Display Help
   echo "Description: This plugin creates a jumpbox pod under a supervisor namespace. You can exec into this jumpbox pod to ssh into the TKC VMs."
   echo "Usage: kubectl jumpbox SVNAMESPACE TKCNAME"
   echo "Example: k exec -it jumpbox-tkc1 -n svns1 -- /usr/bin/ssh vmware-system-user@VMIP"
}

# Get the options
while getopts ":h" option; do
   case $option in
      h) # display Help
         Help
         exit;;
      \?) # incorrect option
         echo "Error: Invalid option"
         exit;;
   esac
done

kubectl create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: jumpbox-$2
  namespace: $1 #REPLACE
spec:
  containers:
  - image: "photon:3.0"
    name: jumpbox
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "yum install -y openssh-server; mkdir /root/.ssh; cp /root/ssh/ssh-privatekey /root/.ssh/id_rsa; chmod 600 /root/.ssh/id_rsa; while true; do sleep 30; done;" ]
    volumeMounts:
      - mountPath: "/root/ssh"
        name: ssh-key
        readOnly: true
    resources:
      requests:
        memory: 2Gi
  volumes:
    - name: ssh-key
      secret:
        secretName: $2-ssh #REPLACE YOUR-CLUSTER-NAME-ssh
EOF
Usage
- Place the plugin in the system executable path.
- I placed it in the $HOME/.krew/bin directory on my laptop.
- Once you have copied the plugin to the proper path, make it executable: chmod 755 kubectl-jumpbox
- After that you should be able to run the plugin as: kubectl jumpbox SUPERVISORNAMESPACE TKCNAME (see the consolidated sketch right after this list).
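Putting the installation steps together, here is a minimal sketch; it assumes the plugin file kubectl-jumpbox is in your current directory and that $HOME/.krew/bin is already on your PATH:

cp kubectl-jumpbox $HOME/.krew/bin/           # copy the plugin into an executable path
chmod 755 $HOME/.krew/bin/kubectl-jumpbox     # make it executable
kubectl jumpbox SUPERVISORNAMESPACE TKCNAME   # kubectl now discovers it as the "jumpbox" plugin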
Example
❯ kg tkc -n vineetha-dns1-test
NAME               CONTROL PLANE   WORKER   TKR NAME                           AGE    READY   TKR COMPATIBLE   UPDATES AVAILABLE
tkc                1               3        v1.21.6---vmware.1-tkg.1.b3d708a   213d   True    True             [1.22.9+vmware.1-tkg.1.cc71bc8]
tkc-using-cci-ui   1               1        v1.23.8---vmware.3-tkg.1           37d    True    True
❯
❯ kg po -n vineetha-dns1-test
NAME         READY   STATUS    RESTARTS   AGE
nginx-test   1/1     Running   0          29d
❯
❯
❯ kubectl jumpbox vineetha-dns1-test tkc
pod/jumpbox-tkc created
❯
❯ kg po -n vineetha-dns1-test
NAME          READY   STATUS    RESTARTS   AGE
jumpbox-tkc   0/1     Pending   0          8s
nginx-test    1/1     Running   0          29d
❯
❯ kg po -n vineetha-dns1-test
NAME          READY   STATUS    RESTARTS   AGE
jumpbox-tkc   1/1     Running   0          21s
nginx-test    1/1     Running   0          29d
❯
❯ k jumpbox -h
Description: This plugin creates a jumpbox pod under a supervisor namespace. You can exec into this jumpbox pod to ssh into the TKC VMs.
Usage: kubectl jumpbox SVNAMESPACE TKCNAME
Example: k exec -it jumpbox-tkc1 -n svns1 -- /usr/bin/ssh vmware-system-user@VMIP
❯
❯ kg vm -n vineetha-dns1-test -o wide
NAME                                                              POWERSTATE   CLASS               IMAGE                                                       PRIMARY-IP      AGE
tkc-control-plane-8rwpk                                           poweredOn    best-effort-small   ob-18900476-photon-3-k8s-v1.21.6---vmware.1-tkg.1.b3d708a   172.29.0.7      133d
tkc-using-cci-ui-control-plane-z8fkt                              poweredOn    best-effort-small   ob-20953521-tkgs-ova-photon-3-v1.23.8---vmware.3-tkg.1      172.29.13.130   37d
tkc-using-cci-ui-tkg-cluster-nodepool-9nf6-n6nt5-b97c86fb45mvgj   poweredOn    best-effort-small   ob-20953521-tkgs-ova-photon-3-v1.23.8---vmware.3-tkg.1      172.29.13.131   37d
tkc-workers-zbrnv-6c98dd84f9-52gn6                                poweredOn    best-effort-small   ob-18900476-photon-3-k8s-v1.21.6---vmware.1-tkg.1.b3d708a   172.29.0.6      133d
tkc-workers-zbrnv-6c98dd84f9-d9mm7                                poweredOn    best-effort-small   ob-18900476-photon-3-k8s-v1.21.6---vmware.1-tkg.1.b3d708a   172.29.0.8      133d
tkc-workers-zbrnv-6c98dd84f9-kk2dg                                poweredOn    best-effort-small   ob-18900476-photon-3-k8s-v1.21.6---vmware.1-tkg.1.b3d708a   172.29.0.3      133d
❯
❯ k exec -it jumpbox-tkc -n vineetha-dns1-test -- /usr/bin/ssh vmware-system-user@172.29.0.7
The authenticity of host '172.29.0.7 (172.29.0.7)' can't be established.
ECDSA key fingerprint is SHA256:B7ptmYm617lFzLErJm7G5IdT7y4SJYKhX/OenSgguv8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '172.29.0.7' (ECDSA) to the list of known hosts.
Welcome to Photon 3.0 (\m) - Kernel \r (\l)
 13:06:06 up 133 days,  4:46,  0 users,  load average: 0.23, 0.33, 0.27
36 Security notice(s)
Run 'tdnf updateinfo info' to see the details.
vmware-system-user@tkc-control-plane-8rwpk [ ~ ]$ sudo su
root [ /home/vmware-system-user ]#
root [ /home/vmware-system-user ]#
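Once you are done troubleshooting, the jumpbox pod can be cleaned up; using the names from the example above:

k delete pod jumpbox-tkc -n vineetha-dns1-test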
Hope it was useful. Cheers!
Saturday, May 20, 2023
vSphere with Tanzu using NSX-T - Part25 - Spherelet
Following are the steps to verify the status of the spherelet service and restart it if required.
Example:
❯ kubectx wdc-01-vcxx
Switched to context "wdc-01-vcxx".
❯ kubectl get node
NAME                               STATUS                        ROLES                  AGE    VERSION
42019f7e751b2818bb0c659028d49fdc   Ready                         control-plane,master   317d   v1.22.6+vmware.wcp.2
4201b0b21aed78d8e72bfb622bb8b98b   Ready                         control-plane,master   317d   v1.22.6+vmware.wcp.2
4201c53dcef2701a8c36463942d762dc   Ready                         control-plane,master   317d   v1.22.6+vmware.wcp.2
wdc-01-rxxesx04.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx05.xxxxxxxxx.com      NotReady,SchedulingDisabled   agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx06.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx32.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx33.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx34.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx35.xxxxxxxxx.com      Ready,SchedulingDisabled      agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx36.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx37.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx38.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx39.xxxxxxxxx.com      NotReady,SchedulingDisabled   agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx40.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
Logs
- ssh into the ESXi worker node.
tail -f /var/log/spherelet.log
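- If the log is noisy, a quick filter helps narrow things down; this is plain grep and not specific to spherelet:

grep -iE "error|fail" /var/log/spherelet.log | tail -n 20   # last 20 error/failure lines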
Status
- ssh into the ESXi worker node and run the following:
/etc/init.d/spherelet status
- You can also check the status of the spherelet service using PowerCLI. Following is an example:
> Connect-VIServer wdc-10-vcxx

> Get-VMHost | Get-VMHostService | where {$_.Key -eq "spherelet"} | select VMHost,Key,Running | ft

VMHost                        Key       Running
------                        ---       -------
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
Restart
- ssh into the ESXi worker node and run the following:
/etc/init.d/spherelet restart
- You can also restart the spherelet service using PowerCLI. Following is an example that restarts the spherelet service on ALL the ESXi worker nodes of a cluster:
> Get-Cluster

Name           HAEnabled  HAFailover DrsEnabled DrsAutomationLevel
                          Level
----           ---------  ---------- ---------- ------------------
wdc-10-vcxxc01 True       1          True       FullyAutomated

> Get-Cluster -Name wdc-10-vcxxc01 | Get-VMHost | foreach { Restart-VMHostService -HostService ($_ | Get-VMHostService | where {$_.Key -eq "spherelet"}) }
Certificates
- /etc/vmware/spherelet/spherelet.crt
- /etc/vmware/spherelet/client.crt
If worker nodes show NotReady (as in the example below), one thing worth checking is whether these spherelet certificates have expired:
❯ kg no
NAME                               STATUS     ROLES                  AGE    VERSION
420802008ec0d8ccaa6ac84140768375   Ready      control-plane,master   70d    v1.22.6+vmware.wcp.2
42087a63440b500de6cec759bb5900bf   Ready      control-plane,master   77d    v1.22.6+vmware.wcp.2
4208e08c826dfe283c726bc573109dbb   Ready      control-plane,master   77d    v1.22.6+vmware.wcp.2
wdc-08-rxxesx25.xxxxxxxxx.com      NotReady   agent                  370d   v1.22.6-sph-db56d46
wdc-08-rxxesx26.xxxxxxxxx.com      NotReady   agent                  370d   v1.22.6-sph-db56d46
wdc-08-rxxesx23.xxxxxxxxx.com      NotReady   agent                  370d   v1.22.6-sph-db56d46
wdc-08-rxxesx24.xxxxxxxxx.com      NotReady   agent                  370d   v1.22.6-sph-db56d46
[root@wdc-08-rxxesx25:~] openssl x509 -enddate -noout -in /etc/vmware/spherelet/spherelet.crt
notAfter=Sep 1 08:32:24 2023 GMT
[root@wdc-08-rxxesx25:~] openssl x509 -enddate -noout -in /etc/vmware/spherelet/client.crt
notAfter=Sep 1 08:32:24 2023 GMT
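To check the expiry across several worker nodes in one pass, a rough sketch like the following can be used; it assumes root SSH access to the ESXi hosts, and the host list below is only an example:

for esx in wdc-08-rxxesx23.xxxxxxxxx.com wdc-08-rxxesx24.xxxxxxxxx.com; do
  echo "--- $esx"
  # print the expiry of both spherelet certificates on the host
  ssh root@"$esx" 'openssl x509 -enddate -noout -in /etc/vmware/spherelet/spherelet.crt; openssl x509 -enddate -noout -in /etc/vmware/spherelet/client.crt'
done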
Verify
❯ kubectl get node
NAME                               STATUS   ROLES                  AGE     VERSION
42017dcb669bea2962da27fc2f6c16d2   Ready    control-plane,master   5d20h   v1.23.12+vmware.wcp.1
4201b763c766875b77bcb9f04f8840b3   Ready    control-plane,master   5d21h   v1.23.12+vmware.wcp.1
4201dab068e9b2d3af3b8fde450b3d96   Ready    control-plane,master   5d20h   v1.23.12+vmware.wcp.1
wdc-01-rxxesx04.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx05.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx06.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx32.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx33.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx34.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx35.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx36.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx37.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx38.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx39.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx40.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
Sunday, May 7, 2023
Kubernetes 101 - Part9 - kubeconfig certificate expiration
You can verify the expiration date of the client certificate in the kubeconfig of the current context as follows:
kubectl config view --minify --raw --output 'jsonpath={..user.client-certificate-data}' | base64 -d | openssl x509 -noout -enddate
❯ k config current-context
sc2-01-vcxx
❯
❯ kubectl config view --minify --raw --output 'jsonpath={..user.client-certificate-data}' | base64 -d | openssl x509 -noout -enddate
notAfter=Sep 6 05:13:47 2023 GMT
❯
❯ date
Thu Sep 7 18:05:52 IST 2023
❯
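If your kubeconfig has multiple contexts, a small loop can check them all. This is only a sketch; it assumes every context's user entry carries client-certificate-data:

for ctx in $(kubectl config get-contexts -o name); do
  echo "--- $ctx"
  # same check as above, scoped to one context at a time
  kubectl config view --minify --raw --context "$ctx" --output 'jsonpath={..user.client-certificate-data}' | base64 -d | openssl x509 -noout -enddate
done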
Hope it was useful. Cheers!
Saturday, April 15, 2023
Kubernetes 101 - Part8 - Filter events of a specific object
You can filter events of a specific object as follows:
k get event --field-selector involvedObject.name=<object name> -n <namespace>
➜ k get pods
NAME                    READY   STATUS             RESTARTS   AGE
new-replica-set-rx7vk   0/1     ImagePullBackOff   0          101s
new-replica-set-gsxxx   0/1     ImagePullBackOff   0          101s
new-replica-set-j6xcp   0/1     ImagePullBackOff   0          101s
new-replica-set-q8jz5   0/1     ErrImagePull       0          101s

➜ k get event --field-selector involvedObject.name=new-replica-set-q8jz5 -n default
LAST SEEN   TYPE      REASON      OBJECT                      MESSAGE
3m53s       Normal    Scheduled   pod/new-replica-set-q8jz5   Successfully assigned default/new-replica-set-q8jz5 to controlplane
2m33s       Normal    Pulling     pod/new-replica-set-q8jz5   Pulling image "busybox777"
2m33s       Warning   Failed      pod/new-replica-set-q8jz5   Failed to pull image "busybox777": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/busybox777:latest": failed to resolve reference "docker.io/library/busybox777:latest": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
2m33s       Warning   Failed      pod/new-replica-set-q8jz5   Error: ErrImagePull
2m3s        Warning   Failed      pod/new-replica-set-q8jz5   Error: ImagePullBackOff
110s        Normal    BackOff     pod/new-replica-set-q8jz5   Back-off pulling image "busybox777"
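Field selectors can also be combined (comma-separated selectors are ANDed), for example to list only the Warning events of the same pod:

k get event --field-selector involvedObject.name=new-replica-set-q8jz5,type=Warning -n default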
Hope it was useful. Cheers!
Saturday, April 8, 2023
vSphere with Tanzu using NSX-T - Part24 - Kubernetes component certs in TKC
The Kubernetes component certificates inside a TKC (Tanzu Kubernetes Cluster) have a lifetime of 1 year. If you manage to upgrade your TKC at least once a year, these certs will get rotated automatically.
IMPORTANT NOTES:
- As per this VMware KB, if TKGS Guest Cluster certificates are expired, you will need to engage VMware support to manually rotate them.
- The following troubleshooting steps and workaround are based on studies conducted on my dev/test/lab setup, and I do NOT recommend following these in a production environment.
Symptom:
❯ KUBECONFIG=tkc.kubeconfig kubectl get nodes
Unable to connect to the server: x509: certificate has expired or is not yet valid
Troubleshooting:
- Verify the certificate expiry of the tkc kubeconfig file itself.
❯ grep client-certificate-data tkc.kubeconfig | awk '{print $2}' | base64 -d | openssl x509 -noout -dates
notBefore=Mar 8 18:10:15 2022 GMT
notAfter=Mar 7 18:26:10 2024 GMT
- Create a jumpbox pod and ssh to TKC control plane nodes.
- Verify system pods and check logs from apiserver and etcd pods. Sample etcd pod logs are given below:
2023-04-11 07:09:00.268792 W | rafthttp: health check for peer b5bab7da6e326a7c could not connect: x509: certificate has expired or is not yet valid: current time 2023-04-11T07:08:57Z is after 2023-04-06T06:17:56Z
2023-04-11 07:09:00.268835 W | rafthttp: health check for peer b5bab7da6e326a7c could not connect: x509: certificate has expired or is not yet valid: current time 2023-04-11T07:08:57Z is after 2023-04-06T06:17:56Z
2023-04-11 07:09:00.268841 W | rafthttp: health check for peer 19b6b0bf00e81f0b could not connect: remote error: tls: bad certificate
2023-04-11 07:09:00.268869 W | rafthttp: health check for peer 19b6b0bf00e81f0b could not connect: remote error: tls: bad certificate
2023-04-11 07:09:00.310030 I | embed: rejected connection from "172.31.20.27:35362" (error "remote error: tls: bad certificate", ServerName "")
2023-04-11 07:09:00.312806 I | embed: rejected connection from "172.31.20.27:35366" (error "remote error: tls: bad certificate", ServerName "")
2023-04-11 07:09:00.321449 I | embed: rejected connection from "172.31.20.19:35034" (error "remote error: tls: bad certificate", ServerName "")
2023-04-11 07:09:00.322192 I | embed: rejected connection from "172.31.20.19:35036" (error "remote error: tls: bad certificate", ServerName "")
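Since the TKC API server itself may be unreachable at this point, you may have to pull these logs directly on the control plane node. A rough sketch using crictl (assuming crictl is available on the TKC node and containerd is the runtime):

# list the etcd and kube-apiserver containers (including exited ones)
crictl ps -a --name etcd
crictl ps -a --name kube-apiserver
# fetch the logs of a specific container from the listing above
crictl logs <CONTAINER-ID>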
- Verify whether the client certificate in admin.conf on the control plane node has expired.
root [ /etc/kubernetes ]# grep client-certificate-data admin.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -dates
notBefore=Mar 8 18:10:15 2022 GMT
notAfter=Apr 6 06:05:46 2023 GMT
- Verify Kubernetes component certs in all the control plane nodes.
root [ /etc/kubernetes ]# kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Apr 06, 2023 06:05 UTC   <invalid>                               no
apiserver                  Apr 06, 2023 06:05 UTC   <invalid>       ca                      no
apiserver-etcd-client      Apr 06, 2023 06:05 UTC   <invalid>       etcd-ca                 no
apiserver-kubelet-client   Apr 06, 2023 06:05 UTC   <invalid>       ca                      no
controller-manager.conf    Apr 06, 2023 06:05 UTC   <invalid>                               no
etcd-healthcheck-client    Apr 06, 2023 06:05 UTC   <invalid>       etcd-ca                 no
etcd-peer                  Apr 06, 2023 06:05 UTC   <invalid>       etcd-ca                 no
etcd-server                Apr 06, 2023 06:05 UTC   <invalid>       etcd-ca                 no
front-proxy-client         Apr 06, 2023 06:05 UTC   <invalid>       front-proxy-ca          no
scheduler.conf             Apr 06, 2023 06:05 UTC   <invalid>                               no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Mar 05, 2032 18:15 UTC   8y              no
etcd-ca                 Mar 05, 2032 18:15 UTC   8y              no
front-proxy-ca          Mar 05, 2032 18:15 UTC   8y              no
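The certificate files can also be inspected directly with openssl; a sketch assuming the standard kubeadm locations under /etc/kubernetes/pki:

for crt in /etc/kubernetes/pki/*.crt /etc/kubernetes/pki/etcd/*.crt; do
  echo "--- $crt"
  openssl x509 -enddate -noout -in "$crt"
done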
Workaround:
- Renew the Kubernetes component certs on the control plane nodes, if expired, using kubeadm certs renew all.
root [ /etc/kubernetes ]# kubeadm certs renew all
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.
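kubeadm only renews the certificates; as the output says, the control plane components must be restarted to pick them up. Since they run as static pods, one common approach (a sketch; it assumes the standard kubeadm manifest directory and that kubelet is running on the node) is to move the manifests out and back:

# kubelet stops the static pods when their manifests disappear
mkdir -p /root/manifests-backup
mv /etc/kubernetes/manifests/*.yaml /root/manifests-backup/
sleep 30
# moving the manifests back makes kubelet recreate the pods with the renewed certs
mv /root/manifests-backup/*.yaml /etc/kubernetes/manifests/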
Verify:
- Verify using the following steps on all the TKC control plane nodes.
root [ /etc/kubernetes ]# grep client-certificate-data admin.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -dates
root [ /etc/kubernetes ]# kubeadm certs check-expiration
- Try connecting to the TKC using tkc.kubeconfig.
KUBECONFIG=tkc.kubeconfig kubectl get node
Saturday, March 18, 2023
Kubernetes 101 - Part7 - Restart all deployments and daemonsets in a namespace
Restart all deployments in a namespace
❯ kubectl rollout restart deployments -n <namespace>
Restart all daemonsets in a namespace
❯ kubectl rollout restart daemonsets -n <namespace>
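Both can also be triggered with a single command; comma-separated resource types should work here, though I have shown them separately above:

❯ kubectl rollout restart deployments,daemonsets -n <namespace>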
Hope it was useful. Cheers!