Showing posts with label NSX-T. Show all posts
Showing posts with label NSX-T. Show all posts

Saturday, May 21, 2022

vSphere with Tanzu using NSX-T - Part15 - Working with etcd on TKC with one control plane

In this article, we will see how to work with etcd database of a Tanzu Kubernetes Cluster (TKC) with one control plane node and perform some basic operations. Following is a TKC with one control plane node and three worker nodes:

Get K8s cluster nodes
❯ gcc kg no
NAME STATUS ROLES AGE VERSION
gc-control-plane-6g9gk Ready control-plane,master 3d7h v1.21.6+vmware.1
gc-workers-rmgkm-78cf46d595-n5qp8 Ready <none> 7d19h v1.21.6+vmware.1
gc-workers-rmgkm-78cf46d595-wds2m Ready <none> 7d19h v1.21.6+vmware.1
gc-workers-rmgkm-78cf46d595-z2wvt Ready <none> 7d19h v1.21.6+vmware.1
Get the etcd pod and describe it
❯ gcc kg pod -A | grep etcd
kube-system etcd-gc-control-plane-6g9gk 1/1 Running 0 3d7h

❯ gcc kd pod etcd-gc-control-plane-6g9gk -n kube-system
Name: etcd-gc-control-plane-6g9gk
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: gc-control-plane-6g9gk/100.68.36.38
Start Time: Tue, 19 Jul 2022 11:14:25 +0530
Labels: component=etcd
tier=control-plane
Annotations: kubeadm.kubernetes.io/etcd.advertise-client-urls: https://100.68.36.38:2379
kubernetes.io/config.hash: 6e7bc05d35060112913f78af2043683f
kubernetes.io/config.mirror: 6e7bc05d35060112913f78af2043683f
kubernetes.io/config.seen: 2022-07-19T05:44:19.416549595Z
kubernetes.io/config.source: file
kubernetes.io/psp: vmware-system-privileged
Status: Running
IP: 100.68.36.38
IPs:
IP: 100.68.36.38
Controlled By: Node/gc-control-plane-6g9gk
Containers:
etcd:
Container ID: containerd://253c7b25bd60ea78dfccad52d03534785f0d7b7a1fa7105dbd55d7727f8785c3
Image: localhost:5000/vmware.io/etcd:v3.4.13_vmware.22
Image ID: sha256:78661ebbe1adaee60336a0f8ff031c4537ff309ef51feab6e840e7dbb3cbf47d
Port: <none>
Host Port: <none>
Command:
etcd
--advertise-client-urls=https://100.68.36.38:2379
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--client-cert-auth=true
--data-dir=/var/lib/etcd
--initial-advertise-peer-urls=https://100.68.36.38:2380
--initial-cluster=gc-control-plane-6g9gk=https://100.68.36.38:2380,gc-control-plane-64lq5=https://100.68.36.34:2380
--initial-cluster-state=existing
--key-file=/etc/kubernetes/pki/etcd/server.key
--listen-client-urls=https://127.0.0.1:2379,https://100.68.36.38:2379
--listen-metrics-urls=http://127.0.0.1:2381
--listen-peer-urls=https://100.68.36.38:2380
--name=gc-control-plane-6g9gk
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
--peer-client-cert-auth=true
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
--snapshot-count=10000
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
State: Running
Started: Tue, 19 Jul 2022 11:14:27 +0530
Ready: True
Restart Count: 0
Requests:
cpu: 100m
memory: 100Mi
Liveness: http-get http://127.0.0.1:2381/health delay=10s timeout=15s period=10s #success=1 #failure=8
Startup: http-get http://127.0.0.1:2381/health delay=10s timeout=15s period=10s #success=1 #failure=24
Environment: <none>
Mounts:
/etc/kubernetes/pki/etcd from etcd-certs (rw)
/var/lib/etcd from etcd-data (rw)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
etcd-certs:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/pki/etcd
HostPathType: DirectoryOrCreate
etcd-data:
Type: HostPath (bare host directory volume)
Path: /var/lib/etcd
HostPathType: DirectoryOrCreate
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoExecute op=Exists
Events: <none>

Exec into the etcd pod and run etcdctl commands

You can use etcdctl and you need to provide cacert, cert, and key details. All these info you will get while describing the etcd pod. 

❯ gcc k exec -it etcd-gc-control-plane-6g9gk -n kube-system -- sh -c "ETCDCTL_API=3 etcdctl member list --endpoints=127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key"
c5c44d96f675add8, started, gc-control-plane-6g9gk, https://100.68.36.38:2380, https://100.68.36.38:2379, false


❯ gcc k exec -it etcd-gc-control-plane-6g9gk -n kube-system -- sh -c "ETCDCTL_API=3 etcdctl endpoint health --endpoints=127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key --write-out json"
[{"endpoint":"127.0.0.1:2379","health":true,"took":"9.689387ms"}]

❯ gcc k exec -it etcd-gc-control-plane-6g9gk -n kube-system -- sh -c "ETCDCTL_API=3 etcdctl
endpoint status --endpoints=127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key -w json"
[{"Endpoint":"127.0.0.1:2379","Status":{"header":{"cluster_id":4073335150581888229,"member_id":14250600431682432472,"revision":2153804,"raft_term":11},"version":"3.4.13","dbSize":24719360,"leader":14250600431682432472,"raftIndex":2429139,"raftTerm":11,"raftAppliedIndex":2429139,"dbSizeInUse":2678784}}]


Snapshot etcd
❯ gcc k exec -it etcd-gc-control-plane-6g9gk -n kube-system -- sh -c "ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key snapshot save snapshotdb-$(date +%d-%m-%y)"
{"level":"info","ts":1658651541.2698474,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"snapshotdb-24-07-22.part"}
{"level":"info","ts":"2022-07-24T08:32:21.277Z","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1658651541.2771788,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"127.0.0.1:2379"}
{"level":"info","ts":"2022-07-24T08:32:21.594Z","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1658651541.621639,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"127.0.0.1:2379","size":"25 MB","took":0.344746859}
{"level":"info","ts":1658651541.621852,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"snapshotdb-24-07-22"}
Snapshot saved at snapshotdb-24-07-22

❯ gcc k exec -it etcd-gc-control-plane-6g9gk -n kube-system -- sh
# ls
bin boot dev etc home lib lib64 media mnt opt proc root run sbin snapshotdb-24-07-22 srv sys tmp usr var
#
# exit

❯ gcc k exec -it etcd-gc-control-plane-6g9gk -n kube-system -- sh -c "ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key snapshot status snapshotdb-24-07-22 -w table"
+----------+----------+------------+------------+
| HASH | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
|
b0910e83 | 2434362 | 1580 | 25 MB |
+----------+----------+------------+------------+

Copy snapshot file from etcd pod to local machine

Note: Even though there was an error while copying the snapshot file from the pod to local machine, you can see the file was successfully copied and I also verified the snapshot file status using etcdctl. Every field (hash, total_keys, etc.) matches with that of the source file.

❯ gcc kubectl cp kube-system/etcd-gc-control-plane-6g9gk:/snapshotdb-24-07-22 snapshotdb-24-07-22
tar: Removing leading `/' from member names
error: unexpected EOF
❯ ls snapshotdb-24-07-22
snapshotdb-24-07-22
❯ ETCDCTL_API=3 etcdctl snapshot status snapshotdb-24-07-22 -w table
Deprecated: Use `etcdutl snapshot status` instead.

+----------+----------+------------+------------+
| HASH | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| b0910e83 | 2434362 | 1580 | 25 MB |
+----------+----------+------------+------------+   

Restore etcd 

We can restore the etcd snapshot using etcdctl from the TKC control plane node. Inorder to connect to the control plane VM, we need to create a jumpbox pod under the corresponding supervisor namespace.

So, first copy the snapshot file from local machine to jumpbox pod. 

❯ ls snapshotdb-24-07-22
snapshotdb-24-07-22
❯ kubectl cp snapshotdb-24-07-22 vineetha-test04-deploy/jumpbox01:/

❯ k exec -it jumpbox01 -n vineetha-test04-deploy -- sh
sh-4.4# su
root [ / ]# ls
bin dev home lib64 mnt root sbin srv tmp var
boot etc lib media proc run snapshotdb-24-07-22 sys usr
root [ / ]#
Copy the snapshot file from jumpbox pod to control plane node.
❯ gcc kg po -A -o wide| grep etcd
kube-system etcd-gc-control-plane-6g9gk 1/1 Running 0 166m 100.68.36.38 gc-control-plane-6g9gk <none> <none>

❯ k exec -it jumpbox01 -n vineetha-test04-deploy -- scp /snapshotdb-24-07-22 vmware-system-user@100.68.36.38:/tmp
snapshotdb-24-07-22 100% 20MB 126.1MB/s 00:00

❯ k exec -it jumpbox01 -n vineetha-test04-deploy -- /usr/bin/ssh vmware-system-user@100.68.36.38
Welcome to Photon 3.0 (\m) - Kernel \r (\l)
Last login: Sun Jul 24 13:02:39 2022 from 100.68.35.210
13:14:29 up 5 days, 7:38, 0 users, load average: 0.98, 0.53, 0.31

26 Security notice(s)
Run 'tdnf updateinfo info' to see the details.
vmware-system-user@gc-control-plane-6g9gk [ ~ ]$ sudo su
root [ /home/vmware-system-user ]#
root [ /home/vmware-system-user ]# cd /tmp/
 
Install etcd on the control plane node, so that we get to access etcdctl utility.
root [ /tmp ]# tdnf install etcd
root [ /tmp ]# ETCDCTL_API=3 etcdctl member list --endpoints=127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key
c5c44d96f675add8, started, gc-control-plane-6g9gk, https://100.68.36.38:2380, https://100.68.36.38:2379, false
root [ /tmp ]#
root [ /tmp ]# ETCDCTL_API=3 etcdctl snapshot status snapshotdb-24-07-22 --endpoints=127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key
Deprecated: Use `etcdutl snapshot status` instead.

b0910e83, 2434362, 1580, 25 MB
root [ /tmp ]# hostname
gc-control-plane-6g9gk
root [ /tmp ]# ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key snapshot restore /tmp/snapshotdb-24-07-22 --data-dir=/var/lib/etcd-backup --skip-hash-check=true
Deprecated: Use `etcdutl snapshot restore` instead.

2022-07-24T13:20:44Z info snapshot/v3_snapshot.go:251 restoring snapshot {"path": "/tmp/snapshotdb-24-07-22", "wal-dir": "/var/lib/etcd-backup/member/wal", "data-dir": "/var/lib/etcd-backup", "snap-dir": "/var/lib/etcd-backup/member/snap", "stack": "go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/usr/src/photon/BUILD/etcd-3.5.1/etcdutl/snapshot/v3_snapshot.go:257\ngo.etcd.io/etcd/etcdutl/v3/etcdutl.SnapshotRestoreCommandFunc\n\t/usr/src/photon/BUILD/etcd-3.5.1/etcdutl/etcdutl/snapshot_command.go:147\ngo.etcd.io/etcd/etcdctl/v3/ctlv3/command.snapshotRestoreCommandFunc\n\t/usr/src/photon/BUILD/etcd-3.5.1/etcdctl/ctlv3/command/snapshot_command.go:128\ngithub.com/spf13/cobra.(*Command).execute\n\t/usr/share/gocode/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/usr/share/gocode/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\t/usr/share/gocode/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.Start\n\t/usr/src/photon/BUILD/etcd-3.5.1/etcdctl/ctlv3/ctl.go:107\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.MustStart\n\t/usr/src/photon/BUILD/etcd-3.5.1/etcdctl/ctlv3/ctl.go:111\nmain.main\n\t/usr/src/photon/BUILD/etcd-3.5.1/etcdctl/main.go:59\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:250"}
2022-07-24T13:20:44Z info membership/store.go:141 Trimming membership information from the backend...
2022-07-24T13:20:44Z info membership/cluster.go:421 added member {"cluster-id": "cdf818194e3a8c32", "local-member-id": "0", "added-peer-id": "8e9e05c52164694d", "added-peer-peer-urls": ["http://localhost:2380"]}
2022-07-24T13:20:44Z info snapshot/v3_snapshot.go:272 restored snapshot {"path": "/tmp/snapshotdb-24-07-22", "wal-dir": "/var/lib/etcd-backup/member/wal", "data-dir": "/var/lib/etcd-backup", "snap-dir": "/var/lib/etcd-backup/member/snap"}
root [ /var/lib ]#
 
We have restored the database snapshot to a new location:  --data-dir=/var/lib/etcd-backup. So we need to modify the etcd-data hostpath to path: /var/lib/etcd-backup in the etcd static pod manifest file (etcd.yaml). Copy the contents of etcd.yaml file.
root [ /var/lib ]# cd /etc/kubernetes/manifests/
root [ /etc/kubernetes/manifests ]# ls
etcd.yaml kube-controller-manager.yaml registry.yaml
kube-apiserver.yaml kube-scheduler.yaml
root [ /etc/kubernetes/manifests ]# cat etcd.yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
kubeadm.kubernetes.io/etcd.advertise-client-urls: https://100.68.36.38:2379
creationTimestamp: null
labels:
component: etcd
tier: control-plane
name: etcd
namespace: kube-system
spec:
containers:
- command:
- etcd
- --advertise-client-urls=https://100.68.36.38:2379
- --cert-file=/etc/kubernetes/pki/etcd/server.crt
- --client-cert-auth=true
- --data-dir=/var/lib/etcd
- --initial-advertise-peer-urls=https://100.68.36.38:2380
- --initial-cluster=gc-control-plane-6g9gk=https://100.68.36.38:2380,gc-control-plane-64lq5=https://100.68.36.34:2380
- --initial-cluster-state=existing
- --key-file=/etc/kubernetes/pki/etcd/server.key
- --listen-client-urls=https://127.0.0.1:2379,https://100.68.36.38:2379
- --listen-metrics-urls=http://127.0.0.1:2381
- --listen-peer-urls=https://100.68.36.38:2380
- --name=gc-control-plane-6g9gk
- --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
- --peer-client-cert-auth=true
- --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
- --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
- --snapshot-count=10000
- --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
image: localhost:5000/vmware.io/etcd:v3.4.13_vmware.22
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 8
httpGet:
host: 127.0.0.1
path: /health
port: 2381
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 15
name: etcd
resources:
requests:
cpu: 100m
memory: 100Mi
startupProbe:
failureThreshold: 24
httpGet:
host: 127.0.0.1
path: /health
port: 2381
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 15
volumeMounts:
- mountPath: /var/lib/etcd
name: etcd-data
- mountPath: /etc/kubernetes/pki/etcd
name: etcd-certs
hostNetwork: true
priorityClassName: system-node-critical
volumes:
- hostPath:
path: /etc/kubernetes/pki/etcd
type: DirectoryOrCreate
name: etcd-certs
- hostPath:
path: /var/lib/etcd
type: DirectoryOrCreate
name: etcd-data
status: {}
I was having difficulties in the terminal to modify it. So I copied the contents of etcd.yaml file locally, modified the path, removed the existing etcd.yaml file, created new etcd.yaml file, and pasted the modifed content in it.
root [ /etc/kubernetes/manifests ]# rm etcd.yaml
root [ /etc/kubernetes/manifests ]# vi etcd.yaml

<paste the above etcd.yaml file contents, with modified etcd-data hostPath, last part of the yaml will look like below:

volumes:
- hostPath:
path: /etc/kubernetes/pki/etcd
type: DirectoryOrCreate
name: etcd-certs
- hostPath:
path: /var/lib/etcd-backup
type: DirectoryOrCreate
name: etcd-data
status: {}

>
Once the etcd.yaml is saved, after few seconds you can see that etcd pod will be running.
root [ /etc/kubernetes/manifests ]# crictl ps -a
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
92dad43a85ebc 78661ebbe1ada 1 second ago Running etcd 0 8258804cf17bb
5c704092eb4bb 25605c4ab20fe 10 seconds ago Running csi-resizer 10 1fa9feb732df5
495b5ff250cb4 05cfd9e3c3f22 10 seconds ago Running csi-provisioner 22 1fa9feb732df5
c0da43af7f1d0 a145efcc3afb4 11 seconds ago Running vsphere-syncer 10 1fa9feb732df5
4e6d67dc16f4a 5cb2119a4d797 11 seconds ago Running kube-controller-manager 11 3d1c266aa5e24
8710a7b6a8563 fa70d7ee973ad 11 seconds ago Running guest-cluster-cloud-provider 35 3f3d5eb0929e5
4e0c1e2e72682 f18cde23836f5 11 seconds ago Running csi-attacher 10 1fa9feb732df5
bf6771ca4dc4d a609b91a17410 11 seconds ago Running kube-scheduler 11 f50eada3f8127
05fa3f2f587e8 fa70d7ee973ad 3 hours ago Exited guest-cluster-cloud-provider 34 3f3d5eb0929e5
888ba7ce34d92 3f6d2884f8105 3 hours ago Running kube-apiserver 28 c368bd9937f8b
6579ffffa5f53 382a8821c56e0 3 hours ago Running metrics-server 21 622c835648008
bdd0c760d0bc7 382a8821c56e0 3 hours ago Exited metrics-server 20 622c835648008
6063485d73e38 78661ebbe1ada 3 hours ago Exited etcd 0 71e6ae04726ad
259595a330d26 3f6d2884f8105 3 hours ago Exited kube-apiserver 27 c368bd9937f8b
5034f4ea18f1f 25605c4ab20fe 4 hours ago Exited csi-resizer 9 1fa9feb732df5
21de3c4850dc3 05cfd9e3c3f22 4 hours ago Exited csi-provisioner 21 1fa9feb732df5
d623946b1d270 a145efcc3afb4 4 hours ago Exited vsphere-syncer 9 1fa9feb732df5
5f50b9e93d287 a609b91a17410 4 hours ago Exited kube-scheduler 10 f50eada3f8127
ec4e066f54fd6 5cb2119a4d797 4 hours ago Exited kube-controller-manager 10 3d1c266aa5e24
f05fee251a700 f18cde23836f5 4 hours ago Exited csi-attacher 9 1fa9feb732df5
d3577bf8477d0 b0f879c3b53ce 5 days ago Running liveness-probe 0 1fa9feb732df5
d30ba30b8c203 4251b7012fd43 5 days ago Running vsphere-csi-controller 0 1fa9feb732df5
1816ed6aada5f 02abc4bd595a0 5 days ago Running guest-cluster-auth-service 0 bff70bcd389be
3e014a6745f5a b0f879c3b53ce 5 days ago Running liveness-probe 0 66e5ca3abfe5b
ba82f29cf8939 4251b7012fd43 5 days ago Running vsphere-csi-node 0 66e5ca3abfe5b
9c29960956718 f3fe18dd8cea2 5 days ago Running node-driver-registrar 0 66e5ca3abfe5b
92aa3a904d72e 0515f8357a522 5 days ago Running antrea-agent 3 4ae63a9f8e4cb
9e0f32fa88663 0515f8357a522 5 days ago Exited antrea-agent 2 4ae63a9f8e4cb
ac5573a4d93f8 0515f8357a522 5 days ago Running antrea-ovs 0 4ae63a9f8e4cb
e14fea2c37c21 0515f8357a522 5 days ago Exited install-cni 0 4ae63a9f8e4cb
166627360a434 7fde82047d4f6 5 days ago Running docker-registry 0 c84c550a9ab90
ecfbdcd23858d f31127f4a3471 5 days ago Running kube-proxy 0 07f5b9be02414
root [ /etc/kubernetes/manifests ]# exit
exit
vmware-system-user@gc-control-plane-6g9gk [ ~ ]$
vmware-system-user@gc-control-plane-6g9gk [ ~ ]$ exit
logout

Verify

In my case I had a namespace vineethac-testing with two nginx pods running under it while the snapshot was taken. After the snapshot was taken, I deleted the two nginx pods and the namespace vineethac-testing. After restoring the etcd snapshot, I can see that the namespace vineethac-testing is active with two nginx pods under it.

❯ gcc kg ns
NAME STATUS AGE
default Active 9d
kube-node-lease Active 9d
kube-public Active 9d
kube-system Active 9d
vineethac-testing Active 4h28m
vmware-system-auth Active 9d
vmware-system-cloud-provider Active 9d
vmware-system-csi Active 9d

❯ gcc kg pods -n vineethac-testing
NAME READY STATUS RESTARTS AGE
nginx1 1/1 Running 0 4h25m
nginx2 1/1 Running 0 4h23m 

Hope it was useful. Cheers!

Note: I've tested this in a lab. This may not be the best practice procedure and may slightly vary in a real world environment.

Sunday, January 30, 2022

vSphere with Tanzu using NSX-T - Part14 - Testing TKC storage using kubestr

In the previous posts we discussed the following:

This article is about using kubestr to test storage options of Tanzu Kubernetes Cluster (TKC). Following are the steps to install kubestr on MAC:

  • wget https://github.com/kastenhq/kubestr/releases/download/v0.4.31/kubestr_0.4.31_MacOS_amd64.tar.gz
  • tar -xvf kubestr_0.4.31_MacOS_amd64.tar.gz 
  • chmod +x kubestr
  • mv kubestr /usr/local/bin

 

Now, lets do kubestr help.

% kubestr help
kubestr is a tool that will scan your k8s cluster
        and validate that the storage systems in place as well as run
        performance tests.

Usage:
  kubestr [flags]
  kubestr [command]

Available Commands:
  browse      Browse the contents of a CSI PVC via file browser
  csicheck    Runs the CSI snapshot restore check
  fio         Runs an fio test
  help        Help about any command

Flags:
  -h, --help             help for kubestr
  -e, --outfile string   The file where test results will be written
  -o, --output string    Options(json)

Use "kubestr [command] --help" for more information about a command.

 

I am going to use the following TKC for testing.

% KUBECONFIG=gc.kubeconfig kubectl get nodes                                            
NAME                               STATUS   ROLES                  AGE    VERSION
gc-control-plane-pwngg             Ready    control-plane,master   103d   v1.20.9+vmware.1
gc-workers-wrknn-f675446b6-cz766   Ready    <none>                 103d   v1.20.9+vmware.1
gc-workers-wrknn-f675446b6-f6zqs   Ready    <none>                 103d   v1.20.9+vmware.1
gc-workers-wrknn-f675446b6-rsf6n   Ready    <none>                 103d   v1.20.9+vmware.1

 

Let's run kubestr against the cluster now.

% KUBECONFIG=gc.kubeconfig kubestr                                      

**************************************
  _  ___   _ ___ ___ ___ _____ ___
  | |/ / | | | _ ) __/ __|_   _| _ \
  | ' <| |_| | _ \ _|\__ \ | | |   /
  |_|\_\\___/|___/___|___/ |_| |_|_\

Explore your Kubernetes storage options
**************************************
Kubernetes Version Check:
  Valid kubernetes version (v1.20.9+vmware.1)  -  OK

RBAC Check:
  Kubernetes RBAC is enabled  -  OK

Aggregated Layer Check:
  The Kubernetes Aggregated Layer is enabled  -  OK

W0130 14:17:16.937556   87541 warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver
Available Storage Provisioners:

  csi.vsphere.xxxx.com:
    Can't find the CSI snapshot group api version.
    This is a CSI driver!
    (The following info may not be up to date. Please check with the provider for more information.)
    Provider:            vSphere
    Website:             https://github.com/kubernetes-sigs/vsphere-csi-driver
    Description:         A Container Storage Interface (CSI) Driver for VMware vSphere
    Additional Features: Raw Block,<br/><br/>Expansion (Block Volume),<br/><br/>Topology Aware (Block Volume)

    Storage Classes:
      * sc2-01-vc16c01-wcp-mgmt

    To perform a FIO test, run-
      ./kubestr fio -s <storage class>

 

 

You can run storage tests using kubestr and it uses FIO for generating IOs. For example this is how you can run a basic storage test.

% KUBECONFIG=gc.kubeconfig kubestr fio -s sc2-01-vc16c01-wcp-mgmt -z 10G
PVC created kubestr-fio-pvc-zvdhr
Pod created kubestr-fio-pod-kdbs5
Running FIO test (default-fio) on StorageClass (sc2-01-vc16c01-wcp-mgmt) with a PVC of Size (10G)
Elapsed time- 29.290421119s
FIO test results:
 
FIO version - fio-3.20
Global options - ioengine=libaio verify=0 direct=1 gtod_reduce=1

JobName: read_iops
  blocksize=4K filesize=2G iodepth=64 rw=randread
read:
  IOPS=3987.150391 BW(KiB/s)=15965
  iops: min=3680 max=4274 avg=3992.034424
  bw(KiB/s): min=14720 max=17096 avg=15968.827148

JobName: write_iops
  blocksize=4K filesize=2G iodepth=64 rw=randwrite
write:
  IOPS=3562.628906 BW(KiB/s)=14267
  iops: min=3237 max=3750 avg=3565.896484
  bw(KiB/s): min=12950 max=15000 avg=14264.862305

JobName: read_bw
  blocksize=128K filesize=2G iodepth=64 rw=randread
read:
  IOPS=2988.549316 BW(KiB/s)=383071
  iops: min=2756 max=3252 avg=2992.344727
  bw(KiB/s): min=352830 max=416256 avg=383056.187500

JobName: write_bw
  blocksize=128k filesize=2G iodepth=64 rw=randwrite
write:
  IOPS=2754.796143 BW(KiB/s)=353151
  iops: min=2480 max=2992 avg=2759.586182
  bw(KiB/s): min=317440 max=382976 avg=353242.781250

Disk stats (read/write):
  sdd: ios=117160/105647 merge=0/1210 ticks=2100090/2039676 in_queue=4139076, util=99.608589%
  -  OK

As you can see, a PVC of 10G, a FIO pod will be created, and this will be used for the FIO test. Once the test is complete, the PVC and FIO pod will be deleted automatically. 

I hope it was useful. Cheers!


Saturday, December 18, 2021

vSphere with Tanzu using NSX-T - Part13 - Export WCP admin kubeconfig

In the previous posts we discussed the following:

This article shows the steps to export WCP admin kubeconfig file from the supervisor control plane VM. This is the admin kubeconfig file that can be used to manage the Supervisor/ WCP K8s cluster.

Step1: SSH as root to the vCenter server.

Step2: Run the script /usr/lib/vmware-wcp/decryptK8Pwd.py and make a note of the IP and PWD.

Step3: SSH as root to the IP that you noted down from previous step, and then provide the password that you got from step2.

Step4: You can now copy the admin kubeconfig file from /etc/kubernetes/admin.conf file to your local machine. Make sure to modify the field server: https://127.0.0.1:6443 in your local admin.conf file to the IP that you got from step2 (server: https://IP_from_step2:6443). 

Note: If you are managing multiple WCP clusters, you can merge all the kubeconfig files. Refer this blog by Jacob Tomlinson for more details. 

Hope it was useful. Cheers!

Friday, November 5, 2021

vSphere with Tanzu using NSX-T - Part12 - Deploy application on TKC and access it

In the previous posts we discussed the following:

This article walks you though the steps to deploy an application on Tanzu Kubernetes Cluster (TKC) and how to access it. I will try to explain how this all works under the hood.

Here I have a TKC cluster as shown below: 

% KUBECONFIG=gc.kubeconfig kg nodes                    
NAME                               STATUS   ROLES                  AGE   VERSION
gc-control-plane-pwngg             Ready    control-plane,master   49d   v1.20.9+vmware.1
gc-workers-wrknn-f675446b6-cz766   Ready    <none>                 49d   v1.20.9+vmware.1
gc-workers-wrknn-f675446b6-f6zqs   Ready    <none>                 49d   v1.20.9+vmware.1
gc-workers-wrknn-f675446b6-rsf6n   Ready    <none>                 49d   v1.20.9+vmware.1

% KUBECONFIG=gc.kubeconfig kg nodes -o wide
NAME                               STATUS   ROLES                  AGE   VERSION            INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                 KERNEL-VERSION       CONTAINER-RUNTIME
gc-control-plane-pwngg             Ready    control-plane,master   49d   v1.20.9+vmware.1   172.29.21.194   <none>        VMware Photon OS/Linux   4.19.191-4.ph3-esx   containerd://1.4.6
gc-workers-wrknn-f675446b6-cz766   Ready    <none>                 49d   v1.20.9+vmware.1   172.29.21.195   <none>        VMware Photon OS/Linux   4.19.191-4.ph3-esx   containerd://1.4.6
gc-workers-wrknn-f675446b6-f6zqs   Ready    <none>                 49d   v1.20.9+vmware.1   172.29.21.196   <none>        VMware Photon OS/Linux   4.19.191-4.ph3-esx   containerd://1.4.6
gc-workers-wrknn-f675446b6-rsf6n   Ready    <none>                 49d   v1.20.9+vmware.1   172.29.21.197   <none>        VMware Photon OS/Linux   4.19.191-4.ph3-esx   containerd://1.4.6

01 Create a namespace

% KUBECONFIG=gc.kubeconfig k create ns webserver
namespace/webserver created

% KUBECONFIG=gc.kubeconfig kg ns                
NAME                           STATUS   AGE
default                        Active   48d
kube-node-lease                Active   48d
kube-public                    Active   48d
kube-system                    Active   48d
vmware-system-auth             Active   48d
vmware-system-cloud-provider   Active   48d
vmware-system-csi              Active   48d
webserver                      Active   10s

02 Deploy nginx application

Following is the nginx-deployment.yaml spec to deploy nginx application:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  selector:
    matchLabels:
      run: my-nginx
  replicas: 2
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx
        ports:
        - containerPort: 80

You can apply the yaml file as below:

% KUBECONFIG=gc.kubeconfig k apply -f nginx-deployment.yaml -n webserver
deployment.apps/my-nginx created

% KUBECONFIG=gc.kubeconfig kg deploy -n webserver                     
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
my-nginx   0/2     0            0           3m3s

% KUBECONFIG=gc.kubeconfig kg events -n webserver
LAST SEEN   TYPE      REASON              OBJECT                           MESSAGE
26s         Warning   FailedCreate        replicaset/my-nginx-74d7c6cb98   Error creating: pods "my-nginx-74d7c6cb98-" is forbidden: PodSecurityPolicy: unable to admit pod: []
3m10s       Normal    ScalingReplicaSet   deployment/my-nginx              Scaled up replica set my-nginx-74d7c6cb98 to 2

You can see that the pods failed to get created due to PodSecurityPolicy. Following is the psp.yaml spec to create ClusterRole and ClusterRoleBinding.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp:privileged
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs:     ['use']
  resourceNames:
  - vmware-system-privileged
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: all:psp:privileged
roleRef:
  kind: ClusterRole
  name: psp:privileged
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: Group
  name: system:serviceaccounts
  apiGroup: rbac.authorization.k8s.io

Apply the yaml file as shown below:

% KUBECONFIG=gc.kubeconfig k apply -f psp.yaml
clusterrole.rbac.authorization.k8s.io/psp:privileged created
clusterrolebinding.rbac.authorization.k8s.io/all:psp:privileged created

Now, in few minutes you can see the deployment will get successful and two nginx pods will get deployed in the webserver namespace.

% KUBECONFIG=gc.kubeconfig kg deploy -n webserver
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
my-nginx   2/2     2            2           80m

% KUBECONFIG=gc.kubeconfig kg pods -n webserver -o wide
NAME                        READY   STATUS    RESTARTS   AGE   IP                NODE                               NOMINATED NODE   READINESS GATES
my-nginx-74d7c6cb98-lzghr   1/1     Running   0          67m   192.168.213.132   gc-workers-wrknn-f675446b6-rsf6n   <none>           <none>
my-nginx-74d7c6cb98-s59dt   1/1     Running   0          67m   192.168.67.196    gc-workers-wrknn-f675446b6-f6zqs   <none>           <none>
 

03 Access the application

You can access the application in many ways depending on the usecase.

---Port-forward---

% KUBECONFIG=gc.kubeconfig kubectl port-forward deployment/my-nginx -n webserver 8080:80
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80
Handling connection for 8080

The deployment is port-forwarded now. If you open another terminal and do curl localhost:8080, you can see the nginx webpage.

% curl localhost:8080
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

You can also open a web browser with http://localhost:8080/ and you will get the same nginx webpage. Well port-forwarding is fine in a local dev test scenario, but you might not want to do it in a production setup. You will need to create a service that connects the application and to access it. 

Services

There are 3 types of services in Kubernetes.

  1. NodePort: Similar to port forwarding where a port on the worker node will be forwarded to the target port of the pod where the application is running.
  2. ClusterIP: This is useful if you want to access the application from within the cluster.
  3. LoadBalancer: This is used to provide access to external users. In my case, NSX-T will be providing this access.

---Service NodePort---

Following is the yaml spec file for service of type nodeport:

% cat nginx-service-np.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  labels:
    run: my-nginx
spec:
  type: NodePort
  ports:
  - targetPort: 80
    port: 80
    protocol: TCP
  selector:
    run: my-ngin
x

Apply the above yaml file.

% KUBECONFIG=gc.kubeconfig k apply -f nginx-service-np.yaml -n webserver
service/my-nginx created 

% KUBECONFIG=gc.kubeconfig kg svc -n webserver               
NAME       TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
my-nginx   NodePort   10.111.182.155   <none>        80:30741/TCP   4s

% KUBECONFIG=gc.kubeconfig kg ep -n webserver               
NAME       ENDPOINTS                              AGE
my-nginx   192.168.213.132:80,192.168.67.196:80   32m

As you can see, a service (my-nginx) of type NodePort is created. And, now the application should be accessible on port 30741 of any worker node. To verify it, first we need connectivity to the worker node IP. For connecting to worker nodes, we need to have a jumpbox pod deployed on the supervisor namespace. Once we have a jumpbox pod deployed on the sv namespace, we can ssh to TKC nodes from the jumpbox pod. You can follow my previous post to see how to create a jumpbox pod. Here is the link to VMware documentation for how to SSH to TKC nodes.

% KUBECONFIG=sv.kubeconfig k exec -it jumpbox -- sh
sh-4.4#     
sh-4.4# curl 172.29.21.197:30741
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
sh-4.4#

---Service ClusterIP---

Service of type ClusterIP will be accessible within the TKC. So, I will need to deploy a jumpbox pod/ test pod within the TKC and connect from there. First let me edit the svc my-nginx from NodePort to type ClusterIP.

% KUBECONFIG=gc.kubeconfig k edit svc my-nginx -n webserver
service/my-nginx edited

% KUBECONFIG=gc.kubeconfig kg svc -n webserver             
NAME       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
my-nginx   ClusterIP   10.111.182.155   <none>        80/TCP    39m

I have already deploy a pod inside the TKC. As you can see, dnsutils is the pod that is deployed in the default namespace. We will connect to this pod and from there we can curl to the Cluster-IP of my-nginx service.

% KUBECONFIG=gc.kubeconfig kg pods                  
NAME       READY   STATUS    RESTARTS   AGE
dnsutils   1/1     Running   1          105m

% KUBECONFIG=gc.kubeconfig k exec -it dnsutils -- sh
#
# curl 10.111.182.155:80
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
#

Note: This service of type ClusterIP can be accessed only within the TKC, and not externally!

---Service LoadBalancer---

This is the way to expose your service to external users. In this case NSX-T will provide the external IP which will then internally forwarded to nginx pods through the my-nginx service.

I have edited the service my-nginx from type ClusterIP to LoadBalancer.

% KUBECONFIG=gc.kubeconfig k edit svc my-nginx -n webserver
service/my-nginx edited

% KUBECONFIG=gc.kubeconfig kg svc -n webserver             
NAME       TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
my-nginx   LoadBalancer   10.111.182.155   <pending>     80:32398/TCP   56m

% KUBECONFIG=gc.kubeconfig kg svc -n webserver
NAME       TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)        AGE
my-nginx   LoadBalancer   10.111.182.155   10.186.148.170   80:32398/TCP   56m

You can see that now the service has got an external ip. And, the end points of the service are as shown below, which is basically the nginx pod IPs.

% KUBECONFIG=gc.kubeconfig kg ep -n webserver
NAME       ENDPOINTS                              AGE
my-nginx   192.168.213.132:80,192.168.67.196:80   58m

% curl 10.186.148.170
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

I could also use the external IP 10.186.148.170 in a web browser to access the nginx webpage.

Now lets have a look at what is in the supervisor namespace. This TKC is created under a supervisor namespace "vineetha-test04-deploy".

% kubectl get svc -n vineetha-test04-deploy
NAME                       TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)          AGE
gc-ba320a1e3e04259514411   LoadBalancer   172.28.5.217    10.186.148.170   80:31143/TCP     40h
gc-control-plane-service   LoadBalancer   172.28.9.37     10.186.149.120   6443:31639/TCP   51d

% kubectl get ep -n vineetha-test04-deploy  
NAME                       ENDPOINTS                                                     AGE
gc-ba320a1e3e04259514411   172.29.21.195:32398,172.29.21.196:32398,172.29.21.197:32398   40h
gc-control-plane-service   172.29.21.194:6443                                            51d

So what you are seeing is, for a service of type loadbalancer created inside the TKC, a service of type loadbalancer (gc-ba320a1e3e04259514411) will be automatically created under the supervisor namespace, and the its endpoints are the IP address of TKC worker nodes.


On the NSX-T side you can see the LB for my supervisor namespace, virtual servers in it, and server pool members in the virtual server.

I hope it was useful. Cheers! 

Saturday, September 25, 2021

vSphere with Tanzu using NSX-T - Part11 - Troubleshooting Tanzu Kubernetes Clusters

In the previous posts we discussed the following:

In this article, we will go through some basic kubectl commands that may help you in troubleshooting Tanzu Kubernetes clusters. I have noticed there are cases where the guest TKCs are getting stuck at creating or updating phases.

List all TKCs that are stuck at creating/ updating:
kubectl get tanzukubernetescluster --all-namespaces --sort-by="metadata.creationTimestamp" | grep creating
kubectl get tanzukubernetescluster --all-namespaces --sort-by="metadata.creationTimestamp" | grep updating

On the newer versions of WCP, you may not see the TKC phase (creating/ updating/ running) in the kubectl output. I am using the following custom alias for it.

alias kgtkc='kubectl get tkc -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,PHASE:status.phase,CREATIONTIME:metadata.creationTimestamp,VERSION:spec.distribution.fullVersion,CP:spec.topology.controlPlane.replicas,WORKER:status.totalWorkerReplicas --sort-by="metadata.creationTimestamp"'

You can add it to your ~/.zshrc file and relaunch the terminal. Example usage:

% kgtkc | grep updating
c1nsxtest1-sla                     gc                            updating   2021-01-21T08:23:37Z   v1.19.7+vmware.1-tkg.2.f52f85a    3     3
w2cei-sep20                       gc                            updating   2021-09-16T17:48:07Z   v1.20.9+vmware.1-tkg.1.a4cee5b    1     4

For TKCs that are in creating phase, some of the most common reasons might be due to lack of sufficient resources to provision the nodes, or it maybe waiting for IP allocation, etc. For the TKCs that are stuck at updating phase, it may be due to reconciliation issues, newly provisioned nodes might be waiting for IP address, old nodes may be stuck at drain phase, nodes might be in notready state, specific OVA version is not available in the contnet library, etc. You can try the following kubectl commands to get more insight into whats happening:

See events in a namespace:
kubectl get events -n <namespace>

See all events:
kubectl get events -A

Watch events in a namespace:
kubectl get events -n <namespace> -w

List the Cluster API resources supporting the clusters in the current namespace:
kubectl get cluster-api -n <namespace>

Describe TKC:
kubectl describe tkc <tkc_name> -n <namespace>

List TKC virtual machines in a namespace:
kubectl get vm -n <namespace>

List TKC virtual machines in a namespace with its IP:

kubectl get vm -n <namespace> -o json | jq -r '[.items[] | {namespace:.metadata.namespace, name:.metadata.name, internalIP: .status.vmIp}]'

List all nodes of a cluster:
kubectl get nodes -o wide

List all pods that are not running:
kubectl get pods -A | grep -vi running

List health status of different cluster components:
kubectl get --raw '/healthz?verbose'

% kubectl get --raw '/healthz?verbose'
[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
healthz check passed

List all CRDs installed in your cluster and their API versions:
kubectl api-resources -o wide --sort-by="name"

List available Tanzu Kubernetes releases:
kubectl get tanzukubernetesreleases

List available virtual machine images:
kubectl get virtualmachineimages

List terminating namespaces:

kubectl get ns --field-selector status.phase=Terminating

You can ssh to the Tanzu Kubernetes cluster nodes as the system user following this:
https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-587E2181-199A-422A-ABBC-0A9456A70074.html

Here is an example where I have a TKC under namespace: vineetha-test05-deploy

% kubectl get tkc -n vineetha-test05-deploy
NAME   CONTROL PLANE   WORKER   TKR NAME                           AGE    READY   TKR COMPATIBLE   UPDATES AVAILABLE
gc     1               3        v1.20.9---vmware.1-tkg.1.a4cee5b   4d5h   True    True             [1.21.2+vmware.1-tkg.1.ee25d55]

% kubectl get vm -n vineetha-test05-deploy -o json | jq -r '[.items[] | {namespace:.metadata.namespace, name:.metadata.name, internalIP: .status.vmIp}]'
[
  {
    "namespace": "vineetha-test05-deploy",
    "name": "gc-control-plane-ttkmt",
    "internalIP": "172.29.4.194"
  },
  {
    "namespace": "vineetha-test05-deploy",
    "name": "gc-workers-7fcql-6f984fdd59-d286z",
    "internalIP": "172.29.4.195"
  },
  {
    "namespace": "vineetha-test05-deploy",
    "name": "gc-workers-7fcql-6f984fdd59-hwr8b",
    "internalIP": "172.29.4.197"
  },
  {
    "namespace": "vineetha-test05-deploy",
    "name": "gc-workers-7fcql-6f984fdd59-r99x7",
    "internalIP": "172.29.4.196"
  }
]

 
Given below is the yaml file that deploys a pod named jumpbox under the supervisor namespace vineetha-test05-deploy, and from there you can ssh to the TKC nodes. 

apiVersion: v1
kind: Pod
metadata:
  name: jumpbox
  namespace: vineetha-test05-deploy           #REPLACE
spec:
  containers:
  - image: "photon:3.0"
    name: jumpbox
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "yum install -y openssh-server; mkdir /root/.ssh; cp /root/ssh/ssh-privatekey /root/.ssh/id_rsa; chmod 600 /root/.ssh/id_rsa; while true; do sleep 30; done;" ]
    volumeMounts:
      - mountPath: "/root/ssh"
        name: ssh-key
        readOnly: true
    resources:
      requests:
        memory: 2Gi
  volumes:
    - name: ssh-key
      secret:
        secretName: gc-ssh     #REPLACE


Once you apply the above yaml, you can see the jumpbox pod.

% kubectl get pod -n vineetha-test05-deploy                                                                                                              
NAME      READY   STATUS    RESTARTS   AGE
jumpbox   1/1     Running   0          22m

Now, you can connect to the TKC node with its internal IP.

% kubectl -n vineetha-test05-deploy exec -it jumpbox -- /usr/bin/ssh vmware-system-user@172.29.4.194                                             
Welcome to Photon 3.0 (\m) - Kernel \r (\l)
Last login: Mon Nov 22 16:36:40 2021 from 172.29.4.34
 16:50:34 up 4 days,  5:49,  0 users,  load average: 2.14, 0.97, 0.65

26 Security notice(s)
Run 'tdnf updateinfo info' to see the details.
vmware-system-user@gc-control-plane-ttkmt [ ~ ]$ hostname
gc-control-plane-ttkmt

You can check the status of control plane pods using crictl ps.

vmware-system-user@gc-control-plane-ttkmt [ ~ ]$ sudo crictl ps
CONTAINER           IMAGE               CREATED             STATE               NAME                           ATTEMPT             POD ID
bde228417c55a       9000c334d9197       4 days ago          Running             guest-cluster-auth-service     0                   d7abf3db8670d
bc4b8c1bf0e33       a294c1cf07bd6       4 days ago          Running             metrics-server                 0                   2665876cf939e
46a94dcf02f3e       92cb72974660c       4 days ago          Running             coredns                        0                   7497cdf3269ab
f7d32016d6fb7       f48f23686df21       4 days ago          Running             csi-resizer                    0                   b887d394d4f80
ef80f62f3ed65       2cba51b244f27       4 days ago          Running             csi-provisioner                0                   b887d394d4f80
64b570add2859       4d2e937854849       4 days ago          Running             liveness-probe                 0                   b887d394d4f80
c0c1db3aac161       d032188289eb5       4 days ago          Running             vsphere-syncer                 0                   b887d394d4f80
e4df023ada129       e75228f70c0d6       4 days ago          Running             vsphere-csi-controller         0                   b887d394d4f80
e79b3cfdb4143       8a857a48ee57f       4 days ago          Running             csi-attacher                   0                   b887d394d4f80
96e4af8792cd0       b8bffc9e5af52       4 days ago          Running             calico-kube-controllers        0                   b5e467a43b34a
23791d5648ebb       92cb72974660c       4 days ago          Running             coredns                        0                   9bde50bbfb914
0f47d11dc211b       ab1e2f4eb3589       4 days ago          Running             guest-cluster-cloud-provider   0                   fde68175c5d95
5ddfd46647e80       4d2e937854849       4 days ago          Running             liveness-probe                 0                   1a88f26173762
578ddeeef5bdd       e75228f70c0d6       4 days ago          Running             vsphere-csi-node               0                   1a88f26173762
3fcb8a287ea48       9a3d9174ac1e7       4 days ago          Running             node-driver-registrar          0                   1a88f26173762
91b490c14d085       dc02a60cdbe40       4 days ago          Running             calico-node                    0                   35cf458eb80f8
68dbbdb779484       f7ad2965f3ac0       4 days ago          Running             kube-proxy                     0                   79f129c96e6e1
ef423f4aeb128       75bfe47a404bb       4 days ago          Running             docker-registry                0                   752724fbbcd6a
26dd8e1f521f5       9358496e81774       4 days ago          Running             kube-apiserver                 0                   814e5d2be5eab
62745db4234e2       ab8fb8e444396       4 days ago          Running             kube-controller-manager        0                   94543f93f7563
f2fc30c2854bd       9aa6da547b7eb       4 days ago          Running             etcd                           0                   f0a756a4cdc09
b8038e9f90e15       212d4c357a28e       4 days ago          Running             kube-scheduler                 0                   533a44c70e86c

You can check the status of kubelet and containerd services:
sudo systemctl status kubelet.service

vmware-system-user@gc-control-plane-ttkmt [ ~ ]$
<udo systemctl status kubelet.service                                  
WARNING: terminal is not fully functional
-  (press RETURN)● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset:>
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Thu 2021-11-18 11:01:54 UTC; 4 days ago
     Docs: http://kubernetes.io/docs/
 Main PID: 2234 (kubelet)
    Tasks: 16 (limit: 4728)
   Memory: 88.6M
   CGroup: /system.slice/kubelet.service
           └─2234 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/boots>

Nov 22 16:32:06 gc-control-plane-ttkmt kubelet[2234]: W1122 16:32:06.065785    >
Nov 22 16:32:06 gc-control-plane-ttkmt kubelet[2234]: W1122 16:32:06.067045    >


sudo systemctl status containerd.service

vmware-system-user@gc-control-plane-ttkmt [ ~ ]$
<udo systemctl status containerd.service                               
WARNING: terminal is not fully functional
-  (press RETURN)● containerd.service - containerd container runtime
   Loaded: loaded (/etc/systemd/system/containerd.service; enabled; vendor pres>
   Active: active (running) since Thu 2021-11-18 11:01:23 UTC; 4 days ago
     Docs: https://containerd.io
 Main PID: 1783 (containerd)
    Tasks: 386 (limit: 4728)
   Memory: 639.3M
   CGroup: /system.slice/containerd.service
           ├─ 1783 /usr/local/bin/containerd
           ├─ 1938 containerd-shim -namespace k8s.io -workdir /var/lib/containe>
           ├─ 1939 containerd-shim -namespace k8s.io -workdir /var/lib/containe>


If you have issues related to the provisioning/ deployment of TKC, you can check the logs present in the CP node:

vmware-system-user@gc-control-plane-ttkmt [ /var/log ]$ ls
audit                  devicelist  sa                  vmware-vgauthsvc.log.0
auth.log               journal     sgidlist            vmware-vmsvc-root.log
btmp                   kubernetes  stigreport.log      vmware-vmtoolsd-root.log
cloud-init.log         lastlog     suidlist            wtmp
cloud-init-output.log  pods        tallylog
containers             private     vmware-imc
cron                   rpmcheck    vmware-network.log


Following is a great VMware blog series/ videos covering the different resources involved in the deployment process and troubleshooting aspects of TKCs that are provisioned using the TKG service running on the supervisor cluster.

https://core.vmware.com/blog/tanzu-kubernetes-grid-service-troubleshooting-deep-dive-part-1


https://core.vmware.com/blog/tanzu-kubernetes-grid-service-troubleshooting-deep-dive-part-2


https://core.vmware.com/blog/tanzu-kubernetes-grid-service-troubleshooting-deep-dive-part-3

 



Hope it was useful. Cheers!