Saturday, June 22, 2024

vSphere with Tanzu using NSX-T - Part31 - Troubleshooting inaccessible TKC with expired control plane certs

In the course of managing multiple Tanzu Kubernetes Clusters (TKCs), I encountered an unexpected issue: the control plane certificates had expired, preventing us from accessing the cluster with its kubeconfig file. To make matters worse, we were unable to SSH into the TKC control plane virtual machines (VMs) because the vmware-system-user password had expired in accordance with STIG hardening.

The recommended workaround for extending the vmware-system-user password expiry involves applying a specific daemonset on the guest clusters (a rough sketch is shown below). However, that approach requires access to the TKC through its admin kubeconfig file, which was unavailable here because of the expired certificates.
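For reference, on clusters that are still reachable, the workaround looks roughly like the following. This is a minimal sketch, assuming a containerd-based Photon OS node whose base image provides nsenter; the daemonset name, namespace, and image are my own illustrative choices, not the exact manifest from the official workaround.

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: disable-vmware-user-pwd-expiry   # illustrative name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: disable-vmware-user-pwd-expiry
  template:
    metadata:
      labels:
        app: disable-vmware-user-pwd-expiry
    spec:
      hostPID: true
      tolerations:
      - operator: Exists                 # run on control plane nodes too
      containers:
      - name: chage
        image: photon:4.0                # assumed image providing nsenter/chage
        securityContext:
          privileged: true
        command: ["/bin/sh", "-c"]
        # enter the host's namespaces, set the password to never expire,
        # then sleep so the pod stays Running
        args: ["nsenter -t 1 -m -u -i -n -- chage -M -1 vmware-system-user && sleep infinity"]
EOF

With hostPID and a privileged container, nsenter lets chage run directly against each node's /etc/shadow, so the password no longer expires.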

Warning: In case of critical production issues that affect the accessibility of your Tanzu Kubernetes Cluster (TKC), it is strongly advised to submit a product support request to our team for assistance. This will ensure that you receive expert guidance and a timely resolution to help minimize the impact on your environment.

To resolve this, I took an alternative route: I reset the root password of the TKC control plane VMs through the vCenter VM console, as outlined in this knowledge base article. Once the root password was reset, I could log in to the TKC control plane VM directly through the VM console.
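The reset itself is the usual single-user-mode procedure for Photon OS. Roughly, the steps look like this (the exact GRUB menu and key bindings can vary between Photon OS releases):

# 1. Restart the control plane VM and, at the GRUB menu, press 'e' to edit the boot entry.
# 2. Append the following to the line beginning with 'linux':
#        rw init=/bin/bash
# 3. Boot with Ctrl-X (or F10) to land in a root shell with no password prompt.
passwd root      # set a new root password when prompted
# 4. Power-cycle the VM from vCenter (or run 'reboot -f') so it boots normally.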

After gaining access to the TKC control plane VM, I proceeded to renew the control plane certificates using kubeadm, as detailed in this blog post. It's essential to repeat this process on every control plane node in the cluster so that all of them serve the renewed certificates.

root [ /etc/kubernetes ]# kubeadm certs check-expiration
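This prints every certificate managed by kubeadm along with its expiration date and remaining validity, making it easy to confirm which ones have expired.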

root [ /etc/kubernetes ]# kubeadm certs renew all
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.
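Note that the "Error reading configuration from the Cluster" message above is expected at this point: the API server is still running with the expired certificates, so kubeadm simply falls back to its default configuration. As the final message says, the control plane components must be restarted to pick up the renewed certificates. Since they run as static pods managed by the kubelet, one way to do this is to stop their containers and let the kubelet recreate them. A minimal sketch, assuming a containerd runtime with crictl configured on the node:

# stop the control plane containers; the kubelet recreates the static pods
crictl ps -q --name 'kube-apiserver|kube-controller-manager|kube-scheduler|etcd' | xargs crictl stop

# alternatively, temporarily move the static pod manifests out of the way
mv /etc/kubernetes/manifests /etc/kubernetes/manifests.bak
sleep 20    # give the kubelet time to tear the pods down
mv /etc/kubernetes/manifests.bak /etc/kubernetes/manifests

# once the API server is back, verify access with the renewed admin kubeconfig
kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes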

Although this workaround required some additional steps, it ultimately allowed us to regain access to our Tanzu Kubernetes Cluster and maintain its security and functionality.

Hope it was useful. Cheers!
