The Kubernetes component certificates inside a TKC (Tanzu Kubernetes Cluster) have a lifetime of 1 year. If you manage to upgrade your TKC at least once a year, these certs get rotated automatically.
IMPORTANT NOTES:
- As per this VMware KB, if TKGS Guest Cluster certificates are expired, you will need to engage VMware support to manually rotate them.
- The following troubleshooting steps and workaround are based on studies conducted on my dev/test/lab setup, and I do NOT recommend following them in a production environment.
Symptom:
❯ KUBECONFIG=tkc.kubeconfig kubectl get nodes
Unable to connect to the server: x509: certificate has expired or is not yet valid
Troubleshooting:
- Verify the certificate expiry of the tkc kubeconfig file itself.
❯ grep client-certificate-data tkc.kubeconfig | awk '{print $2}' | base64 -d | openssl x509 -noout -dates
notBefore=Mar 8 18:10:15 2022 GMT
notAfter=Mar 7 18:26:10 2024 GMT
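- You can also check the serving certificate presented by the TKC API server endpoint itself. This is a hedged example; TKC-CONTROL-PLANE-IP is a placeholder for the control plane endpoint found in tkc.kubeconfig.
❯ echo | openssl s_client -connect TKC-CONTROL-PLANE-IP:6443 2>/dev/null | openssl x509 -noout -dates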
- Create a jumpbox pod and SSH to the TKC control plane nodes.
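- To SSH to the nodes you need the vmware-system-user password. A hedged sketch, assuming the VMware-documented secret naming (TKC-NAME-ssh with an ssh-passwordkey field) in the vSphere Namespace; TKC-NAME and VSPHERE-NAMESPACE are placeholders, and the command is run against the Supervisor cluster context.
❯ kubectl get secret TKC-NAME-ssh -n VSPHERE-NAMESPACE -o jsonpath='{.data.ssh-passwordkey}' | base64 -d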
- Verify system pods and check logs from apiserver and etcd pods. Sample etcd pod logs are given below:
2023-04-11 07:09:00.268792 W | rafthttp: health check for peer b5bab7da6e326a7c could not connect: x509: certificate has expired or is not yet valid: current time 2023-04-11T07:08:57Z is after 2023-04-06T06:17:56Z
2023-04-11 07:09:00.268835 W | rafthttp: health check for peer b5bab7da6e326a7c could not connect: x509: certificate has expired or is not yet valid: current time 2023-04-11T07:08:57Z is after 2023-04-06T06:17:56Z
2023-04-11 07:09:00.268841 W | rafthttp: health check for peer 19b6b0bf00e81f0b could not connect: remote error: tls: bad certificate
2023-04-11 07:09:00.268869 W | rafthttp: health check for peer 19b6b0bf00e81f0b could not connect: remote error: tls: bad certificate
2023-04-11 07:09:00.310030 I | embed: rejected connection from "172.31.20.27:35362" (error "remote error: tls: bad certificate", ServerName "")
2023-04-11 07:09:00.312806 I | embed: rejected connection from "172.31.20.27:35366" (error "remote error: tls: bad certificate", ServerName "")
2023-04-11 07:09:00.321449 I | embed: rejected connection from "172.31.20.19:35034" (error "remote error: tls: bad certificate", ServerName "")
2023-04-11 07:09:00.322192 I | embed: rejected connection from "172.31.20.19:35036" (error "remote error: tls: bad certificate", ServerName "")
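- Since the API server may be unreachable at this point, you can inspect the control plane containers directly on the node with crictl. A hedged example; CONTAINER-ID is a placeholder taken from the crictl ps output.
root [ /etc/kubernetes ]# crictl ps                          # list running control plane containers (etcd, kube-apiserver, ...)
root [ /etc/kubernetes ]# crictl ps -a | grep etcd           # include exited containers
root [ /etc/kubernetes ]# crictl logs CONTAINER-ID 2>&1 | tail -20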
- Verify whether the client certificate embedded in admin.conf on the control plane node has expired.
root [ /etc/kubernetes ]# grep client-certificate-data admin.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -dates
notBefore=Mar 8 18:10:15 2022 GMT
notAfter=Apr 6 06:05:46 2023 GMT
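- You can also check every on-disk certificate with openssl. A minimal sketch, assuming the default kubeadm PKI layout under /etc/kubernetes/pki:
root [ /etc/kubernetes ]# for crt in pki/*.crt pki/etcd/*.crt; do echo "== $crt"; openssl x509 -noout -enddate -in "$crt"; done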
- Verify the Kubernetes component certs on all the control plane nodes.
root [ /etc/kubernetes ]# kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Apr 06, 2023 06:05 UTC   <invalid>                               no
apiserver                  Apr 06, 2023 06:05 UTC   <invalid>       ca                      no
apiserver-etcd-client      Apr 06, 2023 06:05 UTC   <invalid>       etcd-ca                 no
apiserver-kubelet-client   Apr 06, 2023 06:05 UTC   <invalid>       ca                      no
controller-manager.conf    Apr 06, 2023 06:05 UTC   <invalid>                               no
etcd-healthcheck-client    Apr 06, 2023 06:05 UTC   <invalid>       etcd-ca                 no
etcd-peer                  Apr 06, 2023 06:05 UTC   <invalid>       etcd-ca                 no
etcd-server                Apr 06, 2023 06:05 UTC   <invalid>       etcd-ca                 no
front-proxy-client         Apr 06, 2023 06:05 UTC   <invalid>       front-proxy-ca          no
scheduler.conf             Apr 06, 2023 06:05 UTC   <invalid>                               no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Mar 05, 2032 18:15 UTC   8y              no
etcd-ca                 Mar 05, 2032 18:15 UTC   8y              no
front-proxy-ca          Mar 05, 2032 18:15 UTC   8y              no
Workaround:
- Renew the expired Kubernetes component certs on each control plane node using kubeadm certs renew all.
root [ /etc/kubernetes ]# kubeadm certs renew all
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration
certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed
Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.
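- kubeadm only renews the certificates; the static pod control plane components must be restarted to pick them up. A hedged sketch of one common approach, temporarily moving the static pod manifests out of the kubelet manifest directory (the backup path /root/manifests-backup is arbitrary):
root [ /etc/kubernetes ]# mkdir -p /root/manifests-backup
root [ /etc/kubernetes ]# mv /etc/kubernetes/manifests/*.yaml /root/manifests-backup/
root [ /etc/kubernetes ]# crictl ps    # wait until kube-apiserver, kube-controller-manager, kube-scheduler and etcd are no longer listed
root [ /etc/kubernetes ]# mv /root/manifests-backup/*.yaml /etc/kubernetes/manifests/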
Verify:
- Verify the renewed certificates using the following steps on all the TKC control plane nodes.
root [ /etc/kubernetes ]# grep client-certificate-data admin.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -dates
root [ /etc/kubernetes ]# kubeadm certs check-expiration
- Try connecting to the TKC using tkc.kubeconfig.
KUBECONFIG=tkc.kubeconfig kubectl get node
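- Once the nodes are listed, you can additionally confirm that the control plane pods came back up cleanly:
KUBECONFIG=tkc.kubeconfig kubectl get pods -n kube-system -o wide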
Hope it was useful. Cheers!
References: