
Saturday, June 22, 2024

vSphere with Tanzu using NSX-T - Part31 - Troubleshooting inaccessible TKC with expired control plane certs

In the course of managing multiple Tanzu Kubernetes Clusters (TKCs), I encountered an unexpected issue: the control plane certificates had expired, preventing us from accessing the cluster using the kubeconfig file. To make matters worse, we were unable to SSH into the TKC control plane virtual machines (VMs) because the vmware-system-user password had expired in accordance with STIG hardening.

The recommended workaround for updating the vmware-system-user password expiry involves applying a specific daemonset on Guest Clusters. However, this approach requires access to the TKC using its admin kubeconfig file, which was unavailable due to the expired certificates.

Warning: In case of critical production issues that affect the accessibility of your Tanzu Kubernetes Cluster (TKC), it is strongly advised to submit a product support request to our team for assistance. This will ensure that you receive expert guidance and a timely resolution to help minimize the impact on your environment.

To resolve this issue, I followed an alternative workaround: I reset the root password of the TKC control plane VMs through the vCenter VM console, as outlined in this knowledge base article. Once the root password was reset, I was able to log directly into the TKC control plane VM using the VM console.




After gaining access to the TKC control plane VM, I proceeded to renew the control plane certificates using kubeadm, as detailed in this blog post. It's essential to apply this process to all control plane nodes in your cluster to ensure proper functionality.

root [ /etc/kubernetes ]# kubeadm certs check-expiration

root [ /etc/kubernetes ]# kubeadm certs renew all
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.
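
The renewal output above asks for a restart of kube-apiserver, kube-controller-manager, kube-scheduler and etcd. On a kubeadm-based control plane these run as static pods, so one way to bounce them is to stop their containers and let the kubelet recreate them. The following is a minimal sketch, assuming a containerd runtime with crictl available on the TKC control plane node:

# Assumption: containerd runtime with crictl on the control plane node.
# Stopping the static pod containers makes the kubelet recreate them with the renewed certs.
for component in kube-apiserver kube-controller-manager kube-scheduler etcd; do
  crictl ps --name "$component" -q | xargs -r crictl stop
done

Alternatively, you can temporarily move the static pod manifests out of /etc/kubernetes/manifests and move them back once the kubelet has removed the pods.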

Although this workaround required some additional steps, it ultimately allowed us to regain access to our Tanzu Kubernetes Cluster and maintain its security and functionality.

Hope it was useful. Cheers!

Monday, January 15, 2024

Ollama - Part1 - Deploy Ollama on Kubernetes

Docker published the GenAI stack around October 2023. It consists of large language models (LLMs) from Ollama, vector and graph databases from Neo4j, and the LangChain framework. These utilities give developers the resources they need to kick-start creating new applications using generative AI. Ollama can be used to deploy and run LLMs locally. In this exercise we will deploy Ollama to a Kubernetes cluster and prompt it.

In my case I am using a Tanzu Kubernetes Cluster (TKC) running on the vSphere with Tanzu 7u3 platform, powered by Dell PowerEdge R640 servers. The TKC nodes use the best-effort-2xlarge VM class with 8 CPUs and 64 Gi memory. Note that I am running it on a regular Kubernetes cluster without GPUs; if you have GPUs, additional configuration steps might be required.
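
As a rough starting point, here is a minimal sketch of how Ollama could be deployed and prompted; the namespace, image tag, model name and resource requests below are my assumptions rather than the exact setup used here, and no GPU is configured.

kubectl create namespace ollama

kubectl apply -n ollama -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434
        resources:
          requests:
            cpu: "2"
            memory: 8Gi
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  selector:
    app: ollama
  ports:
  - port: 11434
    targetPort: 11434
EOF

# pull a model inside the pod and prompt it through the Ollama REST API
kubectl -n ollama exec deploy/ollama -- ollama pull llama2
kubectl -n ollama port-forward svc/ollama 11434:11434 &
curl http://localhost:11434/api/generate -d '{"model": "llama2", "prompt": "What is Kubernetes?"}'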



Hope it was useful. Cheers!

Saturday, December 30, 2023

GitOps using Argo CD - Part2 - Mini project

In the previous blog post, we discussed deploying Argo CD on a Kubernetes cluster and explored the fundamentals of application management. This time, we'll leverage Argo CD to deploy the applications from our Kubernetes mini project.

Full project in my GitHub

https://github.com/vineethac/Kubernetes/tree/main/gitops-argocd


Following are the different components of the project that will get deployed onto a Kubernetes cluster using the Argo CD Application resource:
  1. Ingress controller
  2. Prometheus stack
  3. FastAPI web app
  4. FastAPI service monitor
  5. Loki stack


Deploy each of these components by applying the corresponding YAML manifest, following the steps outlined in the GitHub repository mentioned above. After all components are deployed successfully, you can observe them in the Argo CD web UI, as illustrated below.
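
For reference, an Argo CD Application manifest for one of these components generally takes the following shape. This is a hedged sketch: the path inside the repo, destination namespace and sync policy are assumptions, so check the repository above for the exact manifests.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: fastapi-webapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/vineethac/Kubernetes
    path: gitops-argocd/fastapi-webapp        # assumed path inside the repo
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: fastapi-webapp
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true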


Hope it was useful. Cheers!

Sunday, December 3, 2023

Kubernetes mini project

In this mini project, we are going to learn the following:

  • Deploy a simple Python based web application on a Kubernetes cluster.
  • We will use Helm to deploy this app.
  • This web app uses FastAPI and exposes some metrics using the Prometheus Python client.
  • To store and visualize these metrics we will deploy Prometheus and Grafana in the K8s cluster.
  • We will also deploy and use an ingress controller for exposing the web app, Prometheus, and Grafana to external users.
  • For logging we will deploy and use Grafana Loki stack.


Full project in my GitHub

High-level steps to complete this project

Step1: Write the Python app.

Step2: Create the Dockerfile for the app.

Step3: Create the container image.

Step4: Push the container image to an image registry like Docker Hub.

Step5: Get access to a K8s cluster.

Step6: Deploy an ingress controller.

Step7: Create the Helm chart for your app and deploy it to the K8s cluster.

Step8: Deploy Prometheus stack on the K8s cluster using Helm.

Step9: Create a ServiceMonitor resource which defines the target to be monitored by Prometheus (a sample manifest is sketched after these steps).

Step10: Verify targets and service discovery in Prometheus.

Step11: Configure Grafana dashboard and verify.

Step12: Deploy Grafana Loki stack using Helm.
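
As mentioned in Step9, the ServiceMonitor tells the Prometheus Operator what to scrape. A minimal sketch is given below; the labels, selector, and port name are assumptions and must match your app's Service and your Prometheus release.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: fastapi-webapp
  labels:
    release: prometheus          # must match the Prometheus Operator's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: fastapi-webapp        # must match the labels on the app's Service
  endpoints:
  - port: http                   # named port on the Service that exposes /metrics
    path: /metrics
    interval: 30s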


Hope it was useful. Cheers!

Saturday, November 18, 2023

vSphere with Tanzu using NSX-T - Part29 - Logging using Loki stack

Grafana Loki is a log aggregation system that we can use for Kubernetes. In this post we will deploy Loki stack on a Tanzu Kubernetes cluster.

❯ KUBECONFIG=gc.kubeconfig kg no
NAME                                            STATUS   ROLES                  AGE    VERSION
tkc01-control-plane-k8fzb                       Ready    control-plane,master   144m   v1.23.8+vmware.3
tkc01-worker-nodepool-a1-pqq7j-76d555c9-4n5kh   Ready    <none>                 132m   v1.23.8+vmware.3
tkc01-worker-nodepool-a1-pqq7j-76d555c9-8pcc6   Ready    <none>                 128m   v1.23.8+vmware.3
tkc01-worker-nodepool-a1-pqq7j-76d555c9-rx7jf   Ready    <none>                 134m   v1.23.8+vmware.3
❯
❯ helm repo add grafana https://grafana.github.io/helm-charts
❯ helm repo update
❯ helm repo list
❯ helm search repo loki

I saved the values file using helm show values grafana/loki-stack and made necessary modifications as mentioned below. 

  • I enabled Grafana by setting enabled: true. This will create a new Grafana instance.
  • I also added a section under grafana.ingress in the loki-stack/values.yaml, that will create an ingress resource for this new Grafana instance.

 Here is the values.yaml file.

test_pod:
  enabled: true
  image: bats/bats:1.8.2
  pullPolicy: IfNotPresent

loki:
  enabled: true
  isDefault: true
  url: http://{{(include "loki.serviceName" .)}}:{{ .Values.loki.service.port }}
  readinessProbe:
    httpGet:
      path: /ready
      port: http-metrics
    initialDelaySeconds: 45
  livenessProbe:
    httpGet:
      path: /ready
      port: http-metrics
    initialDelaySeconds: 45
  datasource:
    jsonData: "{}"
    uid: ""


promtail:
  enabled: true
  config:
    logLevel: info
    serverPort: 3101
    clients:
      - url: http://{{ .Release.Name }}:3100/loki/api/v1/push

fluent-bit:
  enabled: false

grafana:
  enabled: true
  sidecar:
    datasources:
      label: ""
      labelValue: ""
      enabled: true
      maxLines: 1000
  image:
    tag: 8.3.5
  ingress:
    ## If true, Grafana Ingress will be created
    ##
    enabled: true

    ## IngressClassName for Grafana Ingress.
    ## Should be provided if Ingress is enable.
    ##
    ingressClassName: nginx

    ## Annotations for Grafana Ingress
    ##
    annotations: {}
      # kubernetes.io/ingress.class: nginx
      # kubernetes.io/tls-acme: "true"

    ## Labels to be added to the Ingress
    ##
    labels: {}

    ## Hostnames.
    ## Must be provided if Ingress is enable.
    ##
    # hosts:
    #   - grafana.domain.com
    hosts:
      - grafana-loki-vineethac-poc.test.com

    ## Path for grafana ingress
    path: /

    ## TLS configuration for grafana Ingress
    ## Secret must be manually created in the namespace
    ##
    tls: []
    # - secretName: grafana-general-tls
    #   hosts:
    #   - grafana.example.com

prometheus:
  enabled: false
  isDefault: false
  url: http://{{ include "prometheus.fullname" .}}:{{ .Values.prometheus.server.service.servicePort }}{{ .Values.prometheus.server.prefixURL }}
  datasource:
    jsonData: "{}"

filebeat:
  enabled: false
  filebeatConfig:
    filebeat.yml: |
      # logging.level: debug
      filebeat.inputs:
      - type: container
        paths:
          - /var/log/containers/*.log
        processors:
        - add_kubernetes_metadata:
            host: ${NODE_NAME}
            matchers:
            - logs_path:
                logs_path: "/var/log/containers/"
      output.logstash:
        hosts: ["logstash-loki:5044"]

logstash:
  enabled: false
  image: grafana/logstash-output-loki
  imageTag: 1.0.1
  filters:
    main: |-
      filter {
        if [kubernetes] {
          mutate {
            add_field => {
              "container_name" => "%{[kubernetes][container][name]}"
              "namespace" => "%{[kubernetes][namespace]}"
              "pod" => "%{[kubernetes][pod][name]}"
            }
            replace => { "host" => "%{[kubernetes][node][name]}"}
          }
        }
        mutate {
          remove_field => ["tags"]
        }
      }
  outputs:
    main: |-
      output {
        loki {
          url => "http://loki:3100/loki/api/v1/push"
          #username => "test"
          #password => "test"
        }
        # stdout { codec => rubydebug }
      }

# proxy is currently only used by loki test pod
# Note: If http_proxy/https_proxy are set, then no_proxy should include the
# loki service name, so that tests are able to communicate with the loki
# service.
proxy:
  http_proxy: ""
  https_proxy: ""
  no_proxy: ""

Deploy using Helm

❯ helm upgrade --install --atomic loki-stack grafana/loki-stack --values values.yaml --kubeconfig=gc.kubeconfig --create-namespace --namespace=loki-stack
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: gc.kubeconfig
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: gc.kubeconfig
Release "loki-stack" does not exist. Installing it now.
W1203 13:36:48.286498   31990 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W1203 13:36:48.592349   31990 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W1203 13:36:55.840670   31990 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W1203 13:36:55.849356   31990 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME: loki-stack
LAST DEPLOYED: Sun Dec  3 13:36:45 2023
NAMESPACE: loki-stack
STATUS: deployed
REVISION: 1
NOTES:
The Loki stack has been deployed to your cluster. Loki can now be added as a datasource in Grafana.

See http://docs.grafana.org/features/datasources/loki/ for more detail.

 

Verify

❯ KUBECONFIG=gc.kubeconfig kg all -n loki-stack
NAME                                     READY   STATUS    RESTARTS   AGE
pod/loki-stack-0                         1/1     Running   0          89s
pod/loki-stack-grafana-dff58c989-jdq2l   2/2     Running   0          89s
pod/loki-stack-promtail-5xmrj            1/1     Running   0          89s
pod/loki-stack-promtail-cts5j            1/1     Running   0          89s
pod/loki-stack-promtail-frwvw            1/1     Running   0          89s
pod/loki-stack-promtail-wn4dw            1/1     Running   0          89s

NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/loki-stack              ClusterIP   10.110.208.35    <none>        3100/TCP   90s
service/loki-stack-grafana      ClusterIP   10.104.222.214   <none>        80/TCP     90s
service/loki-stack-headless     ClusterIP   None             <none>        3100/TCP   90s
service/loki-stack-memberlist   ClusterIP   None             <none>        7946/TCP   90s

NAME                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/loki-stack-promtail   4         4         4       4            4           <none>          90s

NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/loki-stack-grafana   1/1     1            1           90s

NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/loki-stack-grafana-dff58c989   1         1         1       90s

NAME                          READY   AGE
statefulset.apps/loki-stack   1/1     91s

❯ KUBECONFIG=gc.kubeconfig kg ing -n loki-stack
NAME                 CLASS   HOSTS                                 ADDRESS        PORTS   AGE
loki-stack-grafana   nginx   grafana-loki-vineethac-poc.test.com   10.216.24.45   80      7m16s
❯

Now, in my case I have an ingress controller and DNS resolution in place. If you don't have those configured, you can simply port forward the loki-stack-grafana service to view the Grafana dashboard, as shown below.
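
For example, something like the following forwards the Grafana service to an arbitrary local port, after which it can be accessed at http://localhost:3000:

❯ KUBECONFIG=gc.kubeconfig kubectl port-forward svc/loki-stack-grafana -n loki-stack 3000:80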

To get the username and password you should decode the following secret:

❯ KUBECONFIG=gc.kubeconfig kg secrets -n loki-stack loki-stack-grafana -oyaml
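
For example, the Grafana Helm chart typically stores the credentials under the admin-user and admin-password keys, so something like this should print them (the key names are the chart defaults and may differ in your setup):

❯ KUBECONFIG=gc.kubeconfig kubectl get secret loki-stack-grafana -n loki-stack -o jsonpath='{.data.admin-user}' | base64 -d
❯ KUBECONFIG=gc.kubeconfig kubectl get secret loki-stack-grafana -n loki-stack -o jsonpath='{.data.admin-password}' | base64 -d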

Log in to the Grafana instance and verify the Data Sources section; Loki should already be configured there. Now click on the Explore option and use the log browser to query logs.

Hope it was useful. Cheers!

Sunday, October 29, 2023

Kubernetes 101 - Part12 - Debug pod

When it comes to troubleshooting application connectivity and name resolution issues in Kubernetes, having the right tools at your disposal can make all the difference. One of the most common challenges is accessing essential utilities like ping, nslookup, dig, traceroute, and more. To simplify this process, we've created a container image that packs a range of these utilities, making it easy to quickly identify and resolve connectivity issues.

 

The Container Image: A Swiss Army Knife for Troubleshooting

This container image, designed specifically for Kubernetes troubleshooting, comes pre-installed with the following essential utilities:

  1. ping: A classic network diagnostic tool for testing connectivity.
  2. dig: A DNS lookup tool for resolving domain names to IP addresses.
  3. nslookup: A network troubleshooting tool for resolving hostnames to IP addresses.
  4. traceroute: A network diagnostic tool for tracing the path of packets across a network.
  5. curl: A command-line tool for transferring data to and from a web server using HTTP, HTTPS, SCP, SFTP, TFTP, and more.
  6. wget: A command-line tool for downloading files from the web.
  7. nc: A command-line tool for reading and writing data to a network socket.
  8. netstat: A command-line tool for displaying network connections, routing tables, and interface statistics.
  9. ifconfig: A command-line tool for configuring network interfaces.
  10. route: A command-line tool for displaying and modifying the IP routing table.
  11. host: A command-line tool for performing DNS lookups and resolving hostnames.
  12. arp: A command-line tool for displaying and modifying the ARP cache.
  13. iostat: A command-line tool for displaying disk I/O statistics.
  14. top: A command-line tool for displaying system resource usage.
  15. free: A command-line tool for displaying free memory and swap space.
  16. vmstat: A command-line tool for displaying virtual memory statistics.
  17. pmap: A command-line tool for displaying process memory maps.
  18. mpstat: A command-line tool for displaying multiprocessor statistics.
  19. python3: A programming language and interpreter.
  20. pip: A package installer for Python.

 

Run as a pod on Kubernetes

kubectl run debug --image=vineethac/debug -n default -- sleep infinity

 

Exec into the debug pod

kubectl exec -it debug -n default -- bash 
root@debug:/# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=46 time=49.3 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=45 time=57.4 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=46 time=49.4 ms
^C
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 49.334/52.030/57.404/3.799 ms
root@debug:/#

root@debug:/# nslookup google.com
Server:         10.96.0.10
Address:        10.96.0.10#53

Non-authoritative answer:
Name:   google.com
Address: 142.250.72.206
Name:   google.com
Address: 2607:f8b0:4005:80c::200e
root@debug:/# exit
exit
❯

 

Reference

https://github.com/vineethac/Docker/tree/main/debug-image

By having these essential utilities at your fingertips, you'll be better equipped to quickly identify and resolve connectivity issues in your Kubernetes cluster, saving you time and reducing the complexity of troubleshooting.

Hope it was useful. Cheers!

Sunday, July 23, 2023

Kubernetes 101 - Part11 - Find Kubernetes nodes with DiskPressure

Following are two quick and easy ways to find Kubernetes nodes with disk pressure:

jq:


kubectl get nodes -o json | jq -r '.items[] | select(.status.conditions[].reason=="KubeletHasDiskPressure") | .metadata.name'


jsonpath:


kubectl get nodes -o jsonpath='{range .items[*]} {.metadata.name} {" "} {.status.conditions[?(@.type=="DiskPressure")].status} {" "} {"\n"}'


❯ kubectl get no
NAME                                 STATUS   ROLES                  AGE     VERSION
tkc-btvsm-72hz2                      Ready    control-plane,master   124d    v1.23.8+vmware.3
tkc-btvsm-79xtn                      Ready    control-plane,master   124d    v1.23.8+vmware.3
tkc-btvsm-klmjz                      Ready    control-plane,master   124d    v1.23.8+vmware.3
tkc-workers-2cmvm-5bfcc5c9cd-gmv6m   Ready    <none>                 5d17h   v1.23.8+vmware.3
tkc-workers-2cmvm-5bfcc5c9cd-m44sq   Ready    <none>                 5d17h   v1.23.8+vmware.3
tkc-workers-2cmvm-5bfcc5c9cd-mjjlk   Ready    <none>                 5d17h   v1.23.8+vmware.3
tkc-workers-2cmvm-5bfcc5c9cd-wflrl   Ready    <none>                 5d17h   v1.23.8+vmware.3
tkc-workers-2cmvm-5bfcc5c9cd-xnqvk   Ready    <none>                 5d17h   v1.23.8+vmware.3
❯
❯
❯ kubectl get nodes -o json | jq -r '.items[] | select(.status.conditions[].reason=="KubeletHasDiskPressure") | .metadata.name'
tkc-workers-2cmvm-5bfcc5c9cd-m44sq
tkc-workers-2cmvm-5bfcc5c9cd-wflrl
❯
❯ kubectl get nodes -o jsonpath='{range .items[*]} {.metadata.name} {" "} {.status.conditions[?(@.type=="DiskPressure")].status} {" "} {"\n"}'
 tkc-btvsm-72hz2   False
 tkc-btvsm-79xtn   False
 tkc-btvsm-klmjz   False
 tkc-workers-2cmvm-5bfcc5c9cd-gmv6m   False
 tkc-workers-2cmvm-5bfcc5c9cd-m44sq   True
 tkc-workers-2cmvm-5bfcc5c9cd-mjjlk   False
 tkc-workers-2cmvm-5bfcc5c9cd-wflrl   True
 tkc-workers-2cmvm-5bfcc5c9cd-xnqvk   False
 %
❯

Hope it was useful. Cheers!

Sunday, July 9, 2023

vSphere with Tanzu using NSX-T - Part27 - nullfinalizer kubectl plugin

I have seen many cases where a supervisor namespace gets stuck in the Terminating phase, waiting on finalization of some of its child resources. This plugin can be used to set the finalizer to null for all objects of a specified API resource under a supervisor namespace. It is helpful in cleaning up supervisor namespaces stuck in the Terminating phase and can also be used to clean up stale resources under a supervisor namespace.

kubectl-nullfinalizer

#!/bin/bash

Help()
{
   # Display Help
   echo "This plugin sets finalizer to null for specified resource in a namespace."
   echo "Usage: kubectl nullfinalizer SVNAMESPACE RESOURCENAME"
   echo "Example: kubectl nullfinalizer vineetha-svns01 pvc"
}

# Get the options
while getopts ":h" option; do
   case $option in
      h) # display Help
         Help
         exit;;
     \?) # incorrect option
         echo "Error: Invalid option"
         exit;;
   esac
done

kubectl get -n $1 $2 --no-headers | awk '{print $1}' | xargs -I{} kubectl patch -n $1 $2 {} -p '{"metadata":{"finalizers": null}}' --type=merge

Usage

  • Place the plugin in the system executable path.
  • I placed it in $HOME/.krew/bin on my laptop.
  • Once you have copied the plugin to the proper path, you can make it executable by: chmod 755 kubectl-nullfinalizer
  • After that you should be able to run the plugin as: kubectl nullfinalizer SUPERVISORNAMESPACE RESOURCENAME
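  • You can verify that kubectl picks up the plugin by running: kubectl plugin list | grep nullfinalizer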


Example

Following is an example of a supervisor namespace stuck in the Terminating phase. In its describe/yaml output you can see that it is waiting on finalization.

❯ k config current-context
wdc-08-vc07
❯ kg ns svc-sct-bot-dogfooding
NAME                     STATUS        AGE
svc-sct-bot-dogfooding   Terminating   584d

❯ kg ns svc-sct-bot-dogfooding -oyaml

status:
  conditions:
  - lastTransitionTime: "2023-09-26T04:45:21Z"
    message: All resources successfully discovered
    reason: ResourcesDiscovered
    status: "False"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2023-09-26T04:45:21Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2023-09-26T04:45:21Z"
    message: All content successfully deleted, may be waiting on finalization
    reason: ContentDeleted
    status: "False"
    type: NamespaceDeletionContentFailure
  - lastTransitionTime: "2023-09-26T04:45:21Z"
    message: 'Some resources are remaining: clusters.cluster.x-k8s.io has 1 resource
      instances, kubeadmcontrolplanes.controlplane.cluster.x-k8s.io has 1 resource
      instances, machines.cluster.x-k8s.io has 4 resource instances, persistentvolumeclaims.
      has 9 resource instances, projects.registryagent.vmware.com has 1 resource instances,
      tanzukubernetesclusters.run.tanzu.vmware.com has 1 resource instances'
    reason: SomeResourcesRemain
    status: "True"
    type: NamespaceContentRemaining
  - lastTransitionTime: "2023-09-26T04:45:21Z"
    message: 'Some content in the namespace has finalizers remaining: cluster.cluster.x-k8s.io
      in 1 resource instances, cns.vmware.com/pvc-protection in 9 resource instances,
      controller-finalizer in 1 resource instances, kubeadm.controlplane.cluster.x-k8s.io
      in 1 resource instances, machine.cluster.x-k8s.io in 4 resource instances, tanzukubernetescluster.run.tanzu.vmware.com
      in 1 resource instances'
    reason: SomeFinalizersRemain
    status: "True"
    type: NamespaceFinalizersRemaining
  phase: Terminating

❯ kg pvc -n svc-sct-bot-dogfooding
NAME                                 STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS              AGE
gc1-workers-r9jvb-4sfjc-containerd   Terminating   pvc-0d9f4a38-86ad-41d8-ab11-08707780fd85   70Gi       RWO            wdc-08-vc07c01-wcp-mgmt   538d
gc1-workers-r9jvb-szg9r-containerd   Terminating   pvc-ca6b6ec4-85fa-464c-abc6-683358994f3f   70Gi       RWO            wdc-08-vc07c01-wcp-mgmt   538d
gc1-workers-r9jvb-zbdt8-containerd   Terminating   pvc-8f2b0683-ebba-46cb-a691-f79a0e94d0e2   70Gi       RWO            wdc-08-vc07c01-wcp-mgmt   538d
gc2-workers-vpzl2-ffkgx-containerd   Terminating   pvc-69e64099-42c8-44b5-bef2-2737eca49c36   70Gi       RWO            wdc-08-vc07c01-wcp-mgmt   510d
gc2-workers-vpzl2-hww5v-containerd   Terminating   pvc-5a909482-4c95-42c7-b55a-57372f72e75f   70Gi       RWO            wdc-08-vc07c01-wcp-mgmt   510d
gc2-workers-vpzl2-stsnh-containerd   Terminating   pvc-ed7de540-72f4-4832-8439-da471bf4c892   70Gi       RWO            wdc-08-vc07c01-wcp-mgmt   510d
gc3-workers-2qr4c-64xpz-containerd   Terminating   pvc-38478f19-8180-4b9b-b5a9-8c06f17d0fbc   70Gi       RWO            wdc-08-vc07c01-wcp-mgmt   510d
gc3-workers-2qr4c-dpng5-containerd   Terminating   pvc-a8b12657-10bd-4993-b08e-51b7e9b259f9   70Gi       RWO            wdc-08-vc07c01-wcp-mgmt   538d
gc3-workers-2qr4c-wfvvd-containerd   Terminating   pvc-01c6b224-9dc0-4e03-b87e-641d4a4d0d95   70Gi       RWO            wdc-08-vc07c01-wcp-mgmt   538d

❯ k nullfinalizer -h
This plugin sets finalizer to null for specified resource in a namespace.
Usage: kubectl nullfinalizer SVNAMESPACE RESOURCENAME
Example: kubectl nullfinalizer vineetha-svns01 pvc


❯ k nullfinalizer svc-sct-bot-dogfooding pvc
persistentvolumeclaim/gc1-workers-r9jvb-4sfjc-containerd patched
persistentvolumeclaim/gc1-workers-r9jvb-szg9r-containerd patched
persistentvolumeclaim/gc1-workers-r9jvb-zbdt8-containerd patched
persistentvolumeclaim/gc2-workers-vpzl2-ffkgx-containerd patched
persistentvolumeclaim/gc2-workers-vpzl2-hww5v-containerd patched
persistentvolumeclaim/gc2-workers-vpzl2-stsnh-containerd patched
persistentvolumeclaim/gc3-workers-2qr4c-64xpz-containerd patched
persistentvolumeclaim/gc3-workers-2qr4c-dpng5-containerd patched
persistentvolumeclaim/gc3-workers-2qr4c-wfvvd-containerd patched


❯ kg projects.registryagent.vmware.com -n svc-sct-bot-dogfooding
NAME                     AGE
svc-sct-bot-dogfooding   584d

❯ k nullfinalizer -h
This plugin sets finalizer to null for specified resource in a namespace.
Usage: kubectl nullfinalizer SVNAMESPACE RESOURCENAME
Example: kubectl nullfinalizer vineetha-svns01 pvc

❯ k nullfinalizer svc-sct-bot-dogfooding projects.registryagent.vmware.com
project.registryagent.vmware.com/svc-sct-bot-dogfooding patched


❯ kg ns svc-sct-bot-dogfooding
Error from server (NotFound): namespaces "svc-sct-bot-dogfooding" not found

 

Hope it was useful. Cheers!

Friday, June 9, 2023

vSphere with Tanzu using NSX-T - Part26 - Jumpbox kubectl plugin to SSH to TKC node

For troubleshooting a TKC (Tanzu Kubernetes Cluster) you may need to SSH into the TKC nodes. To do that, you will first need to create a jumpbox pod under the supervisor namespace, and from there you can SSH to the TKC nodes.

Here is the manual procedure: https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-587E2181-199A-422A-ABBC-0A9456A70074.html


The following kubectl plugin creates a jumpbox pod under a supervisor namespace. You can exec into this jumpbox pod to SSH into the TKC VMs.

kubectl-jumpbox

#!/bin/bash

Help()
{
   # Display Help
   echo "Description: This plugin creats a jumpbox pod under a supervisor namespace. You can exec into this jumpbox pod to ssh into the TKC VMs."
   echo "Usage: kubectl jumpbox SVNAMESPACE TKCNAME"
   echo "Example: k exec -it jumpbox-tkc1 -n svns1 -- /usr/bin/ssh vmware-system-user@VMIP"
}

# Get the options
while getopts ":h" option; do
   case $option in
      h) # display Help
         Help
         exit;;
     \?) # incorrect option
         echo "Error: Invalid option"
         exit;;
   esac
done

kubectl create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: jumpbox-$2
  namespace: $1           #REPLACE
spec:
  containers:
  - image: "photon:3.0"
    name: jumpbox
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "yum install -y openssh-server; mkdir /root/.ssh; cp /root/ssh/ssh-privatekey /root/.ssh/id_rsa; chmod 600 /root/.ssh/id_rsa; while true; do sleep 30; done;" ]
    volumeMounts:
      - mountPath: "/root/ssh"
        name: ssh-key
        readOnly: true
    resources:
      requests:
        memory: 2Gi
  
  volumes:
    - name: ssh-key
      secret:
        secretName: $2-ssh     #REPLACE YOUR-CLUSTER-NAME-ssh 

  
EOF

Usage

  • Place the plugin in the system executable path.
  • I placed it in the $HOME/.krew/bin directory on my laptop.
  • Once you have copied the plugin to the proper path, you can make it executable by: chmod 755 kubectl-jumpbox
  • After that you should be able to run the plugin as: kubectl jumpbox SUPERVISORNAMESPACE TKCNAME


 

Example

❯ kg tkc -n vineetha-dns1-test
NAME               CONTROL PLANE   WORKER   TKR NAME                           AGE    READY   TKR COMPATIBLE   UPDATES AVAILABLE
tkc                1               3        v1.21.6---vmware.1-tkg.1.b3d708a   213d   True    True             [1.22.9+vmware.1-tkg.1.cc71bc8]
tkc-using-cci-ui   1               1        v1.23.8---vmware.3-tkg.1           37d    True    True

❯ kg po -n vineetha-dns1-test
NAME         READY   STATUS    RESTARTS   AGE
nginx-test   1/1     Running   0          29d


❯ kubectl jumpbox vineetha-dns1-test tkc
pod/jumpbox-tkc created

❯ kg po -n vineetha-dns1-test
NAME          READY   STATUS    RESTARTS   AGE
jumpbox-tkc   0/1     Pending   0          8s
nginx-test    1/1     Running   0          29d

❯ kg po -n vineetha-dns1-test
NAME          READY   STATUS    RESTARTS   AGE
jumpbox-tkc   1/1     Running   0          21s
nginx-test    1/1     Running   0          29d

❯ k jumpbox -h
Description: This plugin creates a jumpbox pod under a supervisor namespace. You can exec into this jumpbox pod to ssh into the TKC VMs.
Usage: kubectl jumpbox SVNAMESPACE TKCNAME
Example: k exec -it jumpbox-tkc1 -n svns1 -- /usr/bin/ssh vmware-system-user@VMIP

❯ kg vm -n vineetha-dns1-test -o wide
NAME                                                              POWERSTATE   CLASS               IMAGE                                                       PRIMARY-IP      AGE
tkc-control-plane-8rwpk                                           poweredOn    best-effort-small   ob-18900476-photon-3-k8s-v1.21.6---vmware.1-tkg.1.b3d708a   172.29.0.7      133d
tkc-using-cci-ui-control-plane-z8fkt                              poweredOn    best-effort-small   ob-20953521-tkgs-ova-photon-3-v1.23.8---vmware.3-tkg.1      172.29.13.130   37d
tkc-using-cci-ui-tkg-cluster-nodepool-9nf6-n6nt5-b97c86fb45mvgj   poweredOn    best-effort-small   ob-20953521-tkgs-ova-photon-3-v1.23.8---vmware.3-tkg.1      172.29.13.131   37d
tkc-workers-zbrnv-6c98dd84f9-52gn6                                poweredOn    best-effort-small   ob-18900476-photon-3-k8s-v1.21.6---vmware.1-tkg.1.b3d708a   172.29.0.6      133d
tkc-workers-zbrnv-6c98dd84f9-d9mm7                                poweredOn    best-effort-small   ob-18900476-photon-3-k8s-v1.21.6---vmware.1-tkg.1.b3d708a   172.29.0.8      133d
tkc-workers-zbrnv-6c98dd84f9-kk2dg                                poweredOn    best-effort-small   ob-18900476-photon-3-k8s-v1.21.6---vmware.1-tkg.1.b3d708a   172.29.0.3      133d

❯ k exec -it jumpbox-tkc -n vineetha-dns1-test -- /usr/bin/ssh vmware-system-user@172.29.0.7
The authenticity of host '172.29.0.7 (172.29.0.7)' can't be established.
ECDSA key fingerprint is SHA256:B7ptmYm617lFzLErJm7G5IdT7y4SJYKhX/OenSgguv8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '172.29.0.7' (ECDSA) to the list of known hosts.
Welcome to Photon 3.0 (\m) - Kernel \r (\l)
 13:06:06 up 133 days,  4:46,  0 users,  load average: 0.23, 0.33, 0.27

36 Security notice(s)
Run 'tdnf updateinfo info' to see the details.
vmware-system-user@tkc-control-plane-8rwpk [ ~ ]$ sudo su
root [ /home/vmware-system-user ]#
root [ /home/vmware-system-user ]#


Hope it was useful. Cheers!

Saturday, May 20, 2023

vSphere with Tanzu using NSX-T - Part25 - Spherelet

The Spherelet is based on the Kubernetes “Kubelet” and enables an ESXi hypervisor to act as a Kubernetes worker node. Sometimes you may notice that the worker nodes of your supervisor cluster have NotReady,SchedulingDisabled status, and it may be because the spherelet service is not running on those ESXi nodes.

Following are the steps to verify the status of the spherelet service and restart it if required.

Example:
❯ kubectx wdc-01-vcxx
Switched to context "wdc-01-vcxx".
❯ kubectl get node
NAME                               STATUS                        ROLES                  AGE    VERSION
42019f7e751b2818bb0c659028d49fdc   Ready                         control-plane,master   317d   v1.22.6+vmware.wcp.2
4201b0b21aed78d8e72bfb622bb8b98b   Ready                         control-plane,master   317d   v1.22.6+vmware.wcp.2
4201c53dcef2701a8c36463942d762dc   Ready                         control-plane,master   317d   v1.22.6+vmware.wcp.2
wdc-01-rxxesx04.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx05.xxxxxxxxx.com      NotReady,SchedulingDisabled   agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx06.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx32.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx33.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx34.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx35.xxxxxxxxx.com      Ready,SchedulingDisabled      agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx36.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx37.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx38.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx39.xxxxxxxxx.com      NotReady,SchedulingDisabled   agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx40.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46

Logs

  • ssh into the ESXi worker node.
tail -f /var/log/spherelet.log 


Status

  • ssh into the ESXi worker node and run the following:
/etc/init.d/spherelet status
  • You can also check the status of spherelet using PowerCLI. Following is an example:
> Connect-VIServer wdc-10-vcxx

> Get-VMHost | Get-VMHostService | where {$_.Key -eq "spherelet"}  | select VMHost,Key,Running | ft

VMHost                        Key       Running
------                        ---       -------
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True

Restart

  • ssh into the ESXi worker node and run the following:
/etc/init.d/spherelet restart
  • You can also restart the spherelet service using PowerCLI. Following is an example of restarting the spherelet service on ALL the ESXi worker nodes of a cluster:
> Get-Cluster

Name                           HAEnabled  HAFailover DrsEnabled DrsAutomationLevel
                                          Level
----                           ---------  ---------- ---------- ------------------
wdc-10-vcxxc01                 True       1          True       FullyAutomated

> Get-Cluster -Name wdc-10-vcxxc01 | Get-VMHost | foreach { Restart-VMHostService -HostService ($_ | Get-VMHostService | where {$_.Key -eq "spherelet"}) }

Certificates

You may notice the ESXi worker nodes in NotReady state when the following spherelet certs expire.
  • /etc/vmware/spherelet/spherelet.crt
  • /etc/vmware/spherelet/client.crt
 
An example is given below:
❯ kg no
NAME                               STATUS     ROLES                  AGE    VERSION
420802008ec0d8ccaa6ac84140768375   Ready      control-plane,master   70d    v1.22.6+vmware.wcp.2
42087a63440b500de6cec759bb5900bf   Ready      control-plane,master   77d    v1.22.6+vmware.wcp.2
4208e08c826dfe283c726bc573109dbb   Ready      control-plane,master   77d    v1.22.6+vmware.wcp.2
wdc-08-rxxesx25.xxxxxxxxx.com      NotReady   agent                  370d   v1.22.6-sph-db56d46
wdc-08-rxxesx26.xxxxxxxxx.com      NotReady   agent                  370d   v1.22.6-sph-db56d46
wdc-08-rxxesx23.xxxxxxxxx.com      NotReady   agent                  370d   v1.22.6-sph-db56d46
wdc-08-rxxesx24.xxxxxxxxx.com      NotReady   agent                  370d   v1.22.6-sph-db56d46

You can SSH into the ESXi worker nodes and verify the validity of the above-mentioned certs. They have a lifetime of one year.
 
Example:
[root@wdc-08-rxxesx25:~] openssl x509 -enddate -noout -in /etc/vmware/spherelet/spherelet.crt
notAfter=Sep 1 08:32:24 2023 GMT
[root@wdc-08-rxxesx25:~] openssl x509 -enddate -noout -in /etc/vmware/spherelet/client.crt
notAfter=Sep 1 08:32:24 2023 GMT
Depending on your support contract, if it is a production environment you may need to open a case with VMware GSS to resolve this issue. 
 
Ref KBs: 


Verify

❯ kubectl get node
NAME                               STATUS   ROLES                  AGE     VERSION
42017dcb669bea2962da27fc2f6c16d2   Ready    control-plane,master   5d20h   v1.23.12+vmware.wcp.1
4201b763c766875b77bcb9f04f8840b3   Ready    control-plane,master   5d21h   v1.23.12+vmware.wcp.1
4201dab068e9b2d3af3b8fde450b3d96   Ready    control-plane,master   5d20h   v1.23.12+vmware.wcp.1
wdc-01-rxxesx04.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx05.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx06.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx32.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx33.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx34.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx35.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx36.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx37.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx38.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx39.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx40.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
   

Hope it was useful. Cheers!

Sunday, May 7, 2023

Kubernetes 101 - Part9 - kubeconfig certificate expiration

You can verify the expiration date of the client certificate in your current kubeconfig context as follows:

kubectl config view --minify --raw --output 'jsonpath={..user.client-certificate-data}' | base64 -d | openssl x509 -noout -enddate

❯ k config current-context
sc2-01-vcxx

❯ kubectl config view --minify --raw --output 'jsonpath={..user.client-certificate-data}' | base64 -d | openssl x509 -noout -enddate
notAfter=Sep 6 05:13:47 2023 GMT

❯ date
Thu Sep 7 18:05:52 IST 2023


Hope it was useful. Cheers!

Saturday, April 15, 2023

Kubernetes 101 - Part8 - Filter events of a specific object

You can filter events of a specific object as follows:

k get event --field-selector involvedObject.name=<object name> -n <namespace>

➜  k get pods
NAME                    READY   STATUS             RESTARTS   AGE
new-replica-set-rx7vk   0/1     ImagePullBackOff   0          101s
new-replica-set-gsxxx   0/1     ImagePullBackOff   0          101s
new-replica-set-j6xcp   0/1     ImagePullBackOff   0          101s
new-replica-set-q8jz5   0/1     ErrImagePull       0          101s

➜  k get event --field-selector involvedObject.name=new-replica-set-q8jz5 -n default
LAST SEEN   TYPE      REASON      OBJECT                      MESSAGE
3m53s       Normal    Scheduled   pod/new-replica-set-q8jz5   Successfully assigned default/new-replica-set-q8jz5 to controlplane
2m33s       Normal    Pulling     pod/new-replica-set-q8jz5   Pulling image "busybox777"
2m33s       Warning   Failed      pod/new-replica-set-q8jz5   Failed to pull image "busybox777": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/busybox777:latest": failed to resolve reference "docker.io/library/busybox777:latest": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
2m33s       Warning   Failed      pod/new-replica-set-q8jz5   Error: ErrImagePull
2m3s        Warning   Failed      pod/new-replica-set-q8jz5   Error: ImagePullBackOff
110s        Normal    BackOff     pod/new-replica-set-q8jz5   Back-off pulling image "busybox777"
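
Field selectors can also be combined. For example, to list only the warning events for the same pod, something like this should work:

➜  k get event --field-selector type=Warning,involvedObject.name=new-replica-set-q8jz5 -n default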

Hope it was useful. Cheers!

Saturday, April 8, 2023

vSphere with Tanzu using NSX-T - Part24 - Kubernetes component certs in TKC

The Kubernetes component certificates inside a TKC (Tanzu Kubernetes Cluster) have a lifetime of 1 year. If you manage to upgrade your TKC at least once a year, these certs will get rotated automatically. 

 

IMPORTANT NOTES: 

  • As per this VMware KB, if TKGS Guest Cluster certificates are expired, you will need to engage VMware support to manually rotate them.  
  • The following troubleshooting steps and workaround are based on studies conducted on my dev/test/lab setup, and I do NOT recommend following these in a production environment.

 

Symptom:

KUBECONFIG=tkc.kubeconfig kubectl get nodes
Unable to connect to the server: x509: certificate has expired or is not yet valid

 

Troubleshooting:

  • Verify the certificate expiry of the tkc kubeconfig file itself.
❯ grep client-certificate-data tkc.kubeconfig | awk '{print $2}' | base64 -d | openssl x509 -noout -dates
notBefore=Mar  8 18:10:15 2022 GMT
notAfter=Mar  7 18:26:10 2024 GMT
  • Create a jumpbox pod and ssh to TKC control plane nodes.
  • Verify system pods and check logs from apiserver and etcd pods. Sample etcd pod logs are given below:
2023-04-11 07:09:00.268792 W | rafthttp: health check for peer b5bab7da6e326a7c could not connect: x509: certificate has expired or is not yet valid: current time 2023-04-11T07:08:57Z is after 2023-04-06T06:17:56Z
2023-04-11 07:09:00.268835 W | rafthttp: health check for peer b5bab7da6e326a7c could not connect: x509: certificate has expired or is not yet valid: current time 2023-04-11T07:08:57Z is after 2023-04-06T06:17:56Z
2023-04-11 07:09:00.268841 W | rafthttp: health check for peer 19b6b0bf00e81f0b could not connect: remote error: tls: bad certificate
2023-04-11 07:09:00.268869 W | rafthttp: health check for peer 19b6b0bf00e81f0b could not connect: remote error: tls: bad certificate
2023-04-11 07:09:00.310030 I | embed: rejected connection from "172.31.20.27:35362" (error "remote error: tls: bad certificate", ServerName "")
2023-04-11 07:09:00.312806 I | embed: rejected connection from "172.31.20.27:35366" (error "remote error: tls: bad certificate", ServerName "")
2023-04-11 07:09:00.321449 I | embed: rejected connection from "172.31.20.19:35034" (error "remote error: tls: bad certificate", ServerName "")
2023-04-11 07:09:00.322192 I | embed: rejected connection from "172.31.20.19:35036" (error "remote error: tls: bad certificate", ServerName "")
  • Verify whether admin.conf inside the control plane node has expired.
root [ /etc/kubernetes ]# grep client-certificate-data admin.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -dates
notBefore=Mar  8 18:10:15 2022 GMT
notAfter=Apr  6 06:05:46 2023 GMT
  • Verify Kubernetes component certs in all the control plane nodes.
root [ /etc/kubernetes ]# kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Apr 06, 2023 06:05 UTC   <invalid>                               no
apiserver                  Apr 06, 2023 06:05 UTC   <invalid>       ca                      no
apiserver-etcd-client      Apr 06, 2023 06:05 UTC   <invalid>       etcd-ca                 no
apiserver-kubelet-client   Apr 06, 2023 06:05 UTC   <invalid>       ca                      no
controller-manager.conf    Apr 06, 2023 06:05 UTC   <invalid>                               no
etcd-healthcheck-client    Apr 06, 2023 06:05 UTC   <invalid>       etcd-ca                 no
etcd-peer                  Apr 06, 2023 06:05 UTC   <invalid>       etcd-ca                 no
etcd-server                Apr 06, 2023 06:05 UTC   <invalid>       etcd-ca                 no
front-proxy-client         Apr 06, 2023 06:05 UTC   <invalid>       front-proxy-ca          no
scheduler.conf             Apr 06, 2023 06:05 UTC   <invalid>                               no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Mar 05, 2032 18:15 UTC   8y              no
etcd-ca                 Mar 05, 2032 18:15 UTC   8y              no
front-proxy-ca          Mar 05, 2032 18:15 UTC   8y              no

 

Workaround:

  • Renew Kubernetes component certs on control plane nodes if expired using kubeadm certs renew all.
root [ /etc/kubernetes ]# kubeadm certs renew all
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.

 

Verify:

  • Verify using the following steps on all the TKC control plane nodes.
root [ /etc/kubernetes ]# grep client-certificate-data admin.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -dates

root [ /etc/kubernetes ]# kubeadm certs check-expiration

  • Try connecting to the TKC using tkc.kubeconfig.
KUBECONFIG=tkc.kubeconfig kubectl get node

Hope it was useful. Cheers!

References: