vineethac.blogspot.com: supervisor cluster

Showing posts with label supervisor cluster. Show all posts

Thursday, July 4, 2024

vSphere with Tanzu using NSX-T - Part35 - Monitoring supervisor cluster health with Python and vCenter APIs

vSphere with Tanzu Supervisor cluster is a Kubernetes platform that simplifies the deployment, management, and scaling of Kubernetes clusters. Monitoring the health of your WCP/ Supervisor clusters is crucial to ensure the smooth running of your Tanzu Kubernetes Clusters (TKCs) and applications. In this blog post, we'll explore how to use Python and vCenter APIs to verify the health of your Supervisor clusters.

You can access the Python script from my GitHub repository: https://github.com/vineethac/VMware/tree/main/vSphere_with_Tanzu/wcp_cluster_health

This script connects to the vCenter server, retrieves the cluster summary, and checks the Tanzu Supervisor cluster configuration info and prints the status of the cluster. By using this Python script, you can easily monitor the health of your Tanzu Supervisor clusters through vCenter APIs.

Hope it was useful. Cheers!

Monday, July 1, 2024

vSphere with Tanzu using NSX-T - Part34 - CPU and Memory utilization of a supervisor cluster

vSphere with Tanzu is a Kubernetes-based platform for deploying and managing containerized applications. As with any cloud-native platform, it's essential to monitor the performance and utilization of the underlying infrastructure to ensure optimal resource allocation and avoid any potential issues. In this blog post, we'll explore a Python script that can be used to check the CPU and memory allocation/ usage of a WCP Supervisor cluster.

You can access the Python script from my GitHub repository: https://github.com/vineethac/VMware/tree/main/vSphere_with_Tanzu/wcp_cluster_util

Sample screenshot of the output

The script uses the Kubernetes Python client library (kubernetes) to connect to the Supervisor cluster using the admin kubeconfig and retrieve information about the nodes and their resource utilization. The script then calculates the average CPU and memory utilization across all nodes and prints the results to the console.

Note: In my case instead of running it as a script every time, I made it an executable plugin and copied it to the system executable path. I placed it in $HOME/.krew/bin in my laptop.

Hope it was useful. Cheers!

Saturday, May 20, 2023

vSphere with Tanzu using NSX-T - Part25 - Spherelet

The Spherelet is based on the Kubernetes “Kubelet” and enables an ESXi hypervisor to act as a Kubernetes worker node. Sometimes you may notice that the worker nodes of your supervisor cluster are having NotReady,SchedulingDisabled status, and it maybe becuase spherelet is not running on those ESXi nodes.

Following are the steps to verify the status of spherelet service, and restart them if required.

Example:

❯ kubectx wdc-01-vcxx
Switched to context "wdc-01-vcxx".
❯ kubectl get node
NAME                               STATUS                        ROLES                  AGE    VERSION
42019f7e751b2818bb0c659028d49fdc   Ready                         control-plane,master   317d   v1.22.6+vmware.wcp.2
4201b0b21aed78d8e72bfb622bb8b98b   Ready                         control-plane,master   317d   v1.22.6+vmware.wcp.2
4201c53dcef2701a8c36463942d762dc   Ready                         control-plane,master   317d   v1.22.6+vmware.wcp.2
wdc-01-rxxesx04.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx05.xxxxxxxxx.com      NotReady,SchedulingDisabled   agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx06.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx32.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx33.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx34.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx35.xxxxxxxxx.com      Ready,SchedulingDisabled      agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx36.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx37.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx38.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx39.xxxxxxxxx.com      NotReady,SchedulingDisabled   agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx40.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46

Logs

ssh into the ESXi worker node.

tail -f /var/log/spherelet.log

Status

ssh into the ESXi worker node and run the following:

etc/init.d/spherelet status

You can check status of spherelet using PowerCLI. Following is an example:

> Connect-VIServer wdc-10-vcxx

> Get-VMHost | Get-VMHostService | where {$_.Key -eq "spherelet"}  | select VMHost,Key,Running | ft

VMHost                        Key       Running
------                        ---       -------
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True

Restart

ssh into the ESXi worker node and run the following:

/etc/init.d/spherelet restart

You can also restart spherelet service using PowerCLI. Following is an example to restart spherelet service on ALL the ESXi worker nodes of a cluster:

> Get-Cluster

Name                           HAEnabled  HAFailover DrsEnabled DrsAutomationLevel
                                          Level
----                           ---------  ---------- ---------- ------------------
wdc-10-vcxxc01                 True       1          True       FullyAutomated

> Get-Cluster -Name wdc-10-vcxxc01 | Get-VMHost | foreach { Restart-VMHostService -HostService ($_ | Get-VMHostService | where {$_.Key -eq "spherelet"}) }

Certificates

You may notice the ESXi worker nodes in NotReady state when the following spherelet certs expire.

/etc/vmware/spherelet/spherelet.crt
/etc/vmware/spherelet/client.crt

An example is given below:

❯ kg no
NAME                               STATUS     ROLES                  AGE    VERSION
420802008ec0d8ccaa6ac84140768375   Ready      control-plane,master   70d    v1.22.6+vmware.wcp.2
42087a63440b500de6cec759bb5900bf   Ready      control-plane,master   77d    v1.22.6+vmware.wcp.2
4208e08c826dfe283c726bc573109dbb   Ready      control-plane,master   77d    v1.22.6+vmware.wcp.2
wdc-08-rxxesx25.xxxxxxxxx.com      NotReady   agent                  370d   v1.22.6-sph-db56d46
wdc-08-rxxesx26.xxxxxxxxx.com      NotReady   agent                  370d   v1.22.6-sph-db56d46
wdc-08-rxxesx23.xxxxxxxxx.com      NotReady   agent                  370d   v1.22.6-sph-db56d46
wdc-08-rxxesx24.xxxxxxxxx.com      NotReady   agent                  370d   v1.22.6-sph-db56d46
wdc-08-rxxesx25.xxxxxxxxx.com      NotReady   agent                  370d   v1.22.6-sph-db56d46
wdc-08-rxxesx26.xxxxxxxxx.com      NotReady   agent                  370d   v1.22.6-sph-db56d46

You can ssh into the ESXi worker nodes and verify the validity of the above mentioned certs. They have a life time of one year.

Example:

[root@wdc-08-rxxesx25:~] openssl x509 -enddate -noout -in /etc/vmware/spherelet/spherelet.crt
notAfter=Sep  1 08:32:24 2023 GMT
[root@wdc-08-rxxesx25:~] openssl x509 -enddate -noout -in /etc/vmware/spherelet/client.crt
notAfter=Sep  1 08:32:24 2023 GMT

Depending on your support contract, if its a production environment you may need to open a case with VMware GSS for resolving this issue.

Ref KBs:

Verify

❯ kubectl get node
NAME                               STATUS   ROLES                  AGE     VERSION
42017dcb669bea2962da27fc2f6c16d2   Ready    control-plane,master   5d20h   v1.23.12+vmware.wcp.1
4201b763c766875b77bcb9f04f8840b3   Ready    control-plane,master   5d21h   v1.23.12+vmware.wcp.1
4201dab068e9b2d3af3b8fde450b3d96   Ready    control-plane,master   5d20h   v1.23.12+vmware.wcp.1
wdc-01-rxxesx04.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx05.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx06.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx32.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx33.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx34.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx35.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx36.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx37.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx38.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx39.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx40.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1

Hope it was useful. Cheers!

Saturday, February 4, 2023

vSphere with Tanzu using NSX-T - Part23 - Supervisor cluster certificates expiry

Note that the supervisor control plane component certificates will expire after one year.

Here is the VMware KB: https://kb.vmware.com/s/article/89324

NOTE: If certificates expire on the Supervisor or Guest Clusters, access and management of the clusters will fail. And, you will need to raise a case with VMware support team for assistance.

Keep a note of this cert expiry date, and if you can update the supervisor cluster atleast once in a year, these certs will get updated.

Here is a quick way to check the expiry of the supervisor control plane certs.

❯ k config current-context
sc2-06-d5165f-vc01
❯
❯ k cluster-info
Kubernetes control plane is running at https://10.43.69.117:6443
KubeDNS is running at https://10.43.69.117:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
❯
❯ echo | openssl s_client -servername 10.43.69.117 -connect  10.43.69.117:6443 | openssl x509 -noout -dates
depth=0 CN = kube-apiserver
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 CN = kube-apiserver
verify error:num=21:unable to verify the first certificate
verify return:1
DONE
notBefore=Jun  2 09:36:17 2023 GMT
notAfter=Jun  1 09:36:18 2024 GMT
❯

Thanks to my friend Ravikrithik Udainath for the above openssl tip!

I am using the admin kubeconfig of the supervisor cluster. Here is the link to my previous article on exporting WCP admin kubeconfig file. In this case, 10.43.69.117 is the floating IP for the supervisor control plane and it is assigned to one of the supervisor control plane VMs.

This vSphere with Tanzu cluster was deployed on June 02, 2023, and as you can see above, the certificate expiry will be after one year, which in this case is June 01, 2024.

You can set up some sort of monitoring/ alerting for all your supervisor clusters to get notification on these expiry dates.

Hope it was useful. Cheers!

Sunday, July 17, 2022

vSphere with Tanzu using NSX-T - Part16 - Troubleshooting content library related issues

In this article, we will take a look at troubleshooting some of the content library related issues that you may encounter while managing/ administering vSphere with Tanzu clusters.

Case 1:

TKC (guest K8s cluster) deployments failing as VMs were not getting deployed. You can see Failed to deploy OVF package error in the VC UI. This was due to error A general system error occurred: HTTP request error: cannot authenticate SSL certificate for host wp-content.vmware.com while syncing content library.

Following is a sample log for this issue from the vmop-controller-manger:

Warning CreateFailure 5m29s (x26 over 50m) vmware-system-vmop/vmware-system-vmop-controller-manager-85484c67b7-9jncl/virtualmachine-controller deploy from content library failed for image "ob-19344082-tkgs-ova-ubuntu-2004-v1.21.6---vmware.1-tkg.1": POST https://sc2-01-vcxx.xx.xxxx.com:443/rest/com/vmware/vcenter/ovf/library-item/id:8b34e422-cc30-4d44-9d78-367528df0622?~action=deploy: 500 Internal Server Error

This can be resolved by just editing the content library and accepting new certificate thumbprint.

Case 2:

Missing TKRs. Even though CL is present in the VC and will have all required OVF Templates, on the supervisor cluster TKR resources will be missing/ not found.

❯ kubectl get tkr
No resources found

This could happen if there are duplicate content libraries present in the VC with same Subscription URL. If you find duplicate CLs, try removing them. If there are CLs that are not being used, consider deleting them. Also, try synchronize the CL.

If this doesn't resolve the issue, try to delete and recreate the CL, and make sure you select the newly created CL under Cluster > Configure > Supervisor Cluster > General > Tanzu Kubernetes Grid Service > Content Library.

You may also verify the vmware-system-vmop-controller-manager pod logs and capw-controller-manager pod logs. Check if those pods are running, or getting continuously restarted. If required you may restart those pods.

Case 3:

TKC deployments failing as VMs were not getting deployed. Sample vmop-controller-manger logs given below:

E0803 18:51:30.638787       1 vmprovider.go:155] vsphere "msg"="Clone VirtualMachine failed" "error"="deploy from content library failed for image \"ob-18900476-photon-3-k8s-v1.21.6---vmware.1-tkg.1.b3d708a\": deploy error: The operation failed due to An error occurred during host configuration." "vmName"="rkatz-testmigrationvm5/gc-lab-control-plane-kxwn2"

E0803 18:51:30.638821       1 virtualmachine_controller.go:660] VirtualMachine "msg"="Provider failed to create VirtualMachine" "error"="deploy from content library failed for image \"ob-18900476-photon-3-k8s-v1.21.6---vmware.1-tkg.1.b3d708a\": deploy error: The operation failed due to An error occurred during host configuration." "name"="rkatz-testmigrationvm5/gc-lab-control-plane-kxwn2"

E0803 18:51:30.638851       1 virtualmachine_controller.go:358] VirtualMachine "msg"="Failed to reconcile VirtualMachine" "error"="deploy from content library failed for image \"ob-18900476-photon-3-k8s-v1.21.6---vmware.1-tkg.1.b3d708a\": deploy error: The operation failed due to An error occurred during host configuration." "name"="rkatz-testmigrationvm5/gc-lab-control-plane-kxwn2"

E0803 18:51:30.639301       1 controller.go:246] controller "msg"="Reconciler error" "error"="deploy from content library failed for image \"ob-18900476-photon-3-k8s-v1.21.6---vmware.1-tkg.1.b3d708a\": deploy error: The operation failed due to An error occurred during host configuration." "controller"="virtualmachine" "name"="gc-lab-control-plane-kxwn2" "namespace"="rkatz-testmigrationvm5" "reconcilerGroup"="vmoperator.xxxx.com" "reconcilerKind"="VirtualMachine"

This could be resolved by restarting the cm-inventory service on all nsx-t manager nodes. Following are the commands to restart cm-inventory service on NSX-T manager nodes:

get service cm-inventory  
restart service cm-inventory

Case 4:

Sometimes in the WCP K8s layer you will notice some stale contentsources object entries. Contentsources are the corresponding objects of content libraries in K8s layer. Due to some reasons/ requirements you might have created multiple content libraries, and you may have delete some of them at later point of time from the vCenter, but they may not be removed properly from the WCP K8s layer and thats how these stale contentsources objects are found. You can use PowerCLI to list the current content libraries present in the VC, compare it with the contentsources and remove the stale entries.

> Get-ContentLibrary | select Name,Id | fl

Name : wdc-01-vc18c01-wcp
Id   : 17209f4b-3f7f-4bcb-aeaf-fd0b53b66d0d

> kg contentsources
NAME                                   AGE
0f00d3fa-de54-4630-bc99-aa13ccbe93db   173d
17209f4b-3f7f-4bcb-aeaf-fd0b53b66d0d   321d
451ce3f3-49d7-47d3-9a04-2839c5e5c662   242d
75e0668c-0cdc-421e-965d-fd736187cc57   173d
818c8700-efa4-416b-b78f-5f22e9555952   173d
9abbd108-aeb3-4b50-b074-9e6c00473b02   173d
a6cd1685-49bf-455f-a316-65bcdefac7cf   173d
acff9a91-0966-4793-9c3a-eb5272b802bd   242d
fcc08a43-1555-4794-a1ae-551753af9c03   173d

In the above sample case you can see multiple contentsource objects, but there is only one content library. So you can delete all the contentsource objects, except 17209f4b-3f7f-4bcb-aeaf-fd0b53b66d0d.

Hope it was useful. Cheers!

Saturday, December 18, 2021

vSphere with Tanzu using NSX-T - Part13 - Export WCP admin kubeconfig

In the previous posts we discussed the following:

Part1 - Prerequisites
Part2 - Configure NSX
Part3 - Edge Cluster
Part4 - Tier-0 Gateway and BGP peering
Part5 - Tier-1 Gateway and Segments
Part6 - Create tags, storage policy, and content library
Part7 - Enable workload management
Part8 - Create namespace and deploy Tanzu Kubernetes Cluster
Part9 - Monitoring
Part10 - Upgrade Tanzu Kubernetes Cluster
Part11 - Troubleshooting TKC
Part12 - Deploy application on TKC and access it

This article shows the steps to export WCP admin kubeconfig file from the supervisor control plane VM. This is the admin kubeconfig file that can be used to manage the Supervisor/ WCP K8s cluster.

Step1: SSH as root to the vCenter server.

Step2: Run the script /usr/lib/vmware-wcp/decryptK8Pwd.py and make a note of the IP and PWD.

Step3: SSH as root to the IP that you noted down from previous step, and then provide the password that you got from step2.

Step4: You can now copy the admin kubeconfig file from /etc/kubernetes/admin.conf file to your local machine. Make sure to modify the field server: https://127.0.0.1:6443 in your local admin.conf file to the IP that you got from step2 (server: https://IP_from_step2:6443).

Note: If you are managing multiple WCP clusters, you can merge all the kubeconfig files. Refer this blog by Jacob Tomlinson for more details.

Hope it was useful. Cheers!

Sunday, May 30, 2021

vSphere with Tanzu using NSX-T - Part8 - Create namespace and deploy Tanzu Kubernetes Cluster

In the previous posts we discussed the following:

vSphere with Tanzu using NSX-T - Part1 - Prerequisites

vSphere with Tanzu using NSX-T - Part2 - Configure NSX

vSphere with Tanzu using NSX-T - Part3 - Edge Cluster

vSphere with Tanzu using NSX-T - Part4 - Tier-0 Gateway and BGP peering

vSphere with Tanzu using NSX-T - Part5 - Tier-1 Gateway and Segments

vSphere with Tanzu using NSX-T - Part6 - Create tags, storage policy, and content library

vSphere with Tanzu using NSX-T - Part7 - Enable workload management

Now that we have enabled workload management, the next step is to create namespaces on the supervisor cluster, set resource quotas as per requirements, and then the vSphere administrator can provide access to developers to these namespaces, and they can either deploy Tanzu Kubernetes clusters or VMs or vSphere pods.

Create namespace.

Select the cluster and provide a name for the namespace.

Now the namespace is created successfully. Before handing over this namespace to the developer, you can set permissions, assign storage policies, and set resource limits.

Let's have a look at the NSX-T components that are instantiated when we created a new namespace.

A new segment is now created for the newly created namespace. This segment is connected to the T1 Gateway of the supervisor cluster.

A SNAT rule is also now in place on the supervisor cluster T1 Gateway. This helps the Kubernetes objects residing in the namespace to reach the external network/ internet. It uses the egress range 192.168.72.0/24 that we provided during the workload management configuration for address translation.

We can now assign a storage policy to this newly created namespace.

Click on Add Storage and select the storage policy. In my case, I am using Tanzu Storage Policy which uses a vsanDatastore.

Let's apply some capacity and usage limits for this namespace. Click edit limits and provide the values.

Let's set user permissions to this newly created namespace. Click add permissions.

Now we are ready to hand over this new namespace to the dev user (John).

Under the first tile, you can see copy link, you can provide this link to the dev user. And he can open it in a web browser to access the CLI tools to connect to the newly created namespace.

Download and install the CLI tools. In my case, CLI tools are installed on a CentOS 7.x VM. You can also see the user John has connected to the newly created namespace using the CLI.

The user can now verify the resource limits of the namespace using kubectl.

You can see the following limits:

cpu-limit: 21.818
memory-limit: 131072Mi
storage: 500Gi

Storage is limited at 500 GB and memory at 128 GB which is very straightforward. We (vSphere admin) had set the CPU limits to 48 GHz. And here what you see is cpu-limit of this namespace is limited to 21.818 CPU cores. Just to give some more background on this calculation, the ESXi host that I am using for this study has 20 physical cores, and the total CPU capacity of a host is 44 GHz. I have 4 such ESXi hosts in the cluster. Now, the computing power of one physical core is (44/ 20) = 2.2 GHz. So, in order to limit the CPU to 48 GHz, the number of cpu core should be limited to (48/ 2.2) = 21.818.

Apply the following cluster definition yaml file to create a Tanzu Kubernetes cluster under the ns-01-dev-john namespace.

apiVersion: run.tanzu.vmware.com/v1alpha1

kind: TanzuKubernetesCluster

metadata:

namespace: ns-01-dev-john

spec:

topology:

controlPlane:

storageClass: tanzu-storage-policy

workers:

storageClass: tanzu-storage-policy

distribution:

version: v1.18.15

settings:

network:

services:

cidrBlocks: ["198.32.1.0/12"]

pods:

cidrBlocks: ["192.1.1.0/16"]

cni:

storage:

defaultClass: tanzu-storage-policy

You can see corresponding VMs in the Center UI.

Now, let's have a look at the NSX-T side.

A Tier-1 Gateway is now available with a segment linked to it.

You can see a server load balancer with one virtual server that provides access to KubeAPI (6443) of the Tanzu Kubernetes cluster that we just deployed.

You can also find a SNAT rule. This helps the Tanzu Kubernetes cluster objects to reach the external network/ internet. It uses the egress range 192.168.72.0/24 that we provided during the workload management configuration for address translation.

Note: This architecture is explained on the basis of vSphere 7 U1. In the newer versions there are changes. With vSphere 7 U1c the architecture changed from a per-TKG cluster Tier 1 Gateway model to a per-Supervisor namespace Tier 1 Gateway model. For more details, feel free to refer the blog series published by Harikrishnan T @hari5611.

In the next part we will discuss monitoring aspects of vSphere with Tanzu environment and Tanzu Kubernetes clusters. I hope this was useful. Cheers!

Sunday, April 18, 2021

vSphere with Tanzu using NSX-T - Part7 - Enable workload management

In the previous posts we discussed the following:

Part1: Prerequisites

Part2: Configure NSX-T

Part3: Edge Cluster

Part4: Tier-0 Gateway and BGP peering

Part5: Tier-1 Gateway and Segments

Part6: Create tags, storage policy, and content library

We are all set to configure and enable workload management. Before stepping into the configurations I just want to give an overall picture of vSphere with Tanzu architecture and different components.

Once you enable workload management, the vSphere cluster will transform to a supervisor cluster. The supervisor cluster consists of 3 supervisor control plane VMs, and the ESXi hosts that act as worker nodes too. Now you can run traditional VMs, and containers side by side. You can run the containers as native vSphere pods directly running on the ESXi hosts, or you can deploy Tanzu Kubernetes clusters in VM form factor on the vSphere namespace and then run container workload on them.

Following are the steps to enable workload management:

Login vCenter - Menu - Workload Management.
Click Get started.
Select NSX-T and click next.

Select the cluster.

Select a size and click next.

Select the storage policy and click next.

Provide management network details and click next.

Provide workload network details and click next.

Add the content library and click next.

Click finish.

This process will take few minutes to configure and bring up the supervisor cluster. In my case, it took around 30 minutes to complete.
You can see the progress in the vCenter UI.

You can now see the supervisor control plane VMs are deployed.

Workload management is now enabled and the vSphere cluster is transformed to a supervisor cluster. Let's have a look at the objects that are automatically created in NSX-T.

You can see a T1 Gateway is now provisioned.

Multiple segments are now created corresponding to each namespace inside the supervisor control plane.

Multiple SNAT rules are also now in place for the newly created T1 Gateway, which helps the control plane Kubernetes objects residing in their corresponding namespaces to reach the external network/ internet. It uses the egress range 192.168.72.0/24 that we provided during the workload management configuration for address translation.

You can also see two load balancers attached to the T1 Gateway:

Distributed Load balancer: All services of type ClusterIP are implemented as distributed load balancer virtual servers. This is for east-west traffic.
Server load balancer: All services of type Loadbalancer are implemented as server load balancer L4 virtual servers. And all ingress is implemented as L7 virtual servers.

Under the server load balancer, you can see two virtual servers. One for the KubeAPI (6443) and the other for downloading the CLI tools (443) to access the cluster.

Note that this newly created T1 Gateway (domain-c8:6ea515f0-39da-431b-93bf-0d6a5e4a0f77) is connected to the T0 Gateway for external connectivity through BGP.

The next step is to create namespaces, and you can then create Tanzu Kubernetes clusters on it. Usually, the vSphere administrator will create namespaces for developers and provide the access so that they can either deploy TKG clusters, vSphere pods, or VMs on the respective namespace. We will cover all these in the next part.

Hope it was useful. Cheers!

Pages

Thursday, July 4, 2024

Monday, July 1, 2024

Saturday, May 20, 2023

Logs

Status

Restart

Certificates

Verify

Saturday, February 4, 2023

Sunday, July 17, 2022

Following is a sample log for this issue from the vmop-controller-manger:

Case 3:

Saturday, December 18, 2021

Sunday, May 30, 2021

Sunday, April 18, 2021