vineethac.blogspot.com: SPBM

Showing posts with label SPBM. Show all posts

Sunday, May 30, 2021

vSphere with Tanzu using NSX-T - Part8 - Create namespace and deploy Tanzu Kubernetes Cluster

In the previous posts we discussed the following:

vSphere with Tanzu using NSX-T - Part1 - Prerequisites

vSphere with Tanzu using NSX-T - Part2 - Configure NSX

vSphere with Tanzu using NSX-T - Part3 - Edge Cluster

vSphere with Tanzu using NSX-T - Part4 - Tier-0 Gateway and BGP peering

vSphere with Tanzu using NSX-T - Part5 - Tier-1 Gateway and Segments

vSphere with Tanzu using NSX-T - Part6 - Create tags, storage policy, and content library

vSphere with Tanzu using NSX-T - Part7 - Enable workload management

Now that we have enabled workload management, the next step is to create namespaces on the supervisor cluster, set resource quotas as per requirements, and then the vSphere administrator can provide access to developers to these namespaces, and they can either deploy Tanzu Kubernetes clusters or VMs or vSphere pods.

Create namespace.

Select the cluster and provide a name for the namespace.

Now the namespace is created successfully. Before handing over this namespace to the developer, you can set permissions, assign storage policies, and set resource limits.

Let's have a look at the NSX-T components that are instantiated when we created a new namespace.

A new segment is now created for the newly created namespace. This segment is connected to the T1 Gateway of the supervisor cluster.

A SNAT rule is also now in place on the supervisor cluster T1 Gateway. This helps the Kubernetes objects residing in the namespace to reach the external network/ internet. It uses the egress range 192.168.72.0/24 that we provided during the workload management configuration for address translation.

We can now assign a storage policy to this newly created namespace.

Click on Add Storage and select the storage policy. In my case, I am using Tanzu Storage Policy which uses a vsanDatastore.

Let's apply some capacity and usage limits for this namespace. Click edit limits and provide the values.

Let's set user permissions to this newly created namespace. Click add permissions.

Now we are ready to hand over this new namespace to the dev user (John).

Under the first tile, you can see copy link, you can provide this link to the dev user. And he can open it in a web browser to access the CLI tools to connect to the newly created namespace.

Download and install the CLI tools. In my case, CLI tools are installed on a CentOS 7.x VM. You can also see the user John has connected to the newly created namespace using the CLI.

The user can now verify the resource limits of the namespace using kubectl.

You can see the following limits:

cpu-limit: 21.818
memory-limit: 131072Mi
storage: 500Gi

Storage is limited at 500 GB and memory at 128 GB which is very straightforward. We (vSphere admin) had set the CPU limits to 48 GHz. And here what you see is cpu-limit of this namespace is limited to 21.818 CPU cores. Just to give some more background on this calculation, the ESXi host that I am using for this study has 20 physical cores, and the total CPU capacity of a host is 44 GHz. I have 4 such ESXi hosts in the cluster. Now, the computing power of one physical core is (44/ 20) = 2.2 GHz. So, in order to limit the CPU to 48 GHz, the number of cpu core should be limited to (48/ 2.2) = 21.818.

Apply the following cluster definition yaml file to create a Tanzu Kubernetes cluster under the ns-01-dev-john namespace.

apiVersion: run.tanzu.vmware.com/v1alpha1

kind: TanzuKubernetesCluster

metadata:

namespace: ns-01-dev-john

spec:

topology:

controlPlane:

storageClass: tanzu-storage-policy

workers:

storageClass: tanzu-storage-policy

distribution:

version: v1.18.15

settings:

network:

services:

cidrBlocks: ["198.32.1.0/12"]

pods:

cidrBlocks: ["192.1.1.0/16"]

cni:

storage:

defaultClass: tanzu-storage-policy

You can see corresponding VMs in the Center UI.

Now, let's have a look at the NSX-T side.

A Tier-1 Gateway is now available with a segment linked to it.

You can see a server load balancer with one virtual server that provides access to KubeAPI (6443) of the Tanzu Kubernetes cluster that we just deployed.

You can also find a SNAT rule. This helps the Tanzu Kubernetes cluster objects to reach the external network/ internet. It uses the egress range 192.168.72.0/24 that we provided during the workload management configuration for address translation.

Note: This architecture is explained on the basis of vSphere 7 U1. In the newer versions there are changes. With vSphere 7 U1c the architecture changed from a per-TKG cluster Tier 1 Gateway model to a per-Supervisor namespace Tier 1 Gateway model. For more details, feel free to refer the blog series published by Harikrishnan T @hari5611.

In the next part we will discuss monitoring aspects of vSphere with Tanzu environment and Tanzu Kubernetes clusters. I hope this was useful. Cheers!

Saturday, April 24, 2021

vSphere with Tanzu using NSX-T Blog Series

Part1 - Prerequisites
Part2 - Configure NSX
Part3 - Edge Cluster
Part4 - Tier-0 Gateway and BGP peering
Part5 - Tier-1 Gateway and Segments
Part6 - Create tags, storage policy, and content library
Part7 - Enable workload management
Part8 - Create namespace and deploy Tanzu Kubernetes Cluster
Part9 - Monitoring
Part10 - Upgrade Tanzu Kubernetes Cluster
Part11 - Troubleshooting Tanzu Kubernetes Cluster
Part12 - Deploy application on TKC and access it
Part13 - Export WCP admin kubeconfig
Part14 - Testing TKC storage using kubestr
Part15 - Working with etcd on TKC with one control plane
Part16 - Troubleshooting content library related issues
Part17 - Troubleshooting TKC stuck at updating phase
Part18 - Troubleshooting vSphere pods with ProviderFailed status
Part19 - Troubleshooting TKC stuck at creating phase
Part20 - Safely deleting NotReady nodes from a TKC
Part21 - Pointers while upgrading the stack
Part22 - Working with NGINX Ingress Controller
Part23 - Supervisor cluster certificates expiry
Part24 - Kubernetes component certs in TKC
Part25 - Spherelet
Part26 - Jumpbox kubectl plugin to SSH to TKC node
Part27 - nullfinalizer kubectl plugin
Part28 - Create a custom VM Class
Part29 - Logging using Loki stack
Part30 - Troubleshooting inaccesssible TKC with server pool members missing in the LB VS
Part31 - Troubleshooting inaccessible TKC with expired control plane certs
Part32 - Troubleshooting BGP related issues
Part33 - Troubleshooting intermittent connection timeouts to apiserver and workloads
Part34 - CPU and Memory utilization of a supervisor cluster
Part35 - Monitoring supervisor cluster health with Python and vCenter APIs

Sunday, April 18, 2021

vSphere with Tanzu using NSX-T - Part7 - Enable workload management

In the previous posts we discussed the following:

Part1: Prerequisites

Part2: Configure NSX-T

Part3: Edge Cluster

Part4: Tier-0 Gateway and BGP peering

Part5: Tier-1 Gateway and Segments

Part6: Create tags, storage policy, and content library

We are all set to configure and enable workload management. Before stepping into the configurations I just want to give an overall picture of vSphere with Tanzu architecture and different components.

Once you enable workload management, the vSphere cluster will transform to a supervisor cluster. The supervisor cluster consists of 3 supervisor control plane VMs, and the ESXi hosts that act as worker nodes too. Now you can run traditional VMs, and containers side by side. You can run the containers as native vSphere pods directly running on the ESXi hosts, or you can deploy Tanzu Kubernetes clusters in VM form factor on the vSphere namespace and then run container workload on them.

Following are the steps to enable workload management:

Login vCenter - Menu - Workload Management.
Click Get started.
Select NSX-T and click next.

Select the cluster.

Select a size and click next.

Select the storage policy and click next.

Provide management network details and click next.

Provide workload network details and click next.

Add the content library and click next.

Click finish.

This process will take few minutes to configure and bring up the supervisor cluster. In my case, it took around 30 minutes to complete.
You can see the progress in the vCenter UI.

You can now see the supervisor control plane VMs are deployed.

Workload management is now enabled and the vSphere cluster is transformed to a supervisor cluster. Let's have a look at the objects that are automatically created in NSX-T.

You can see a T1 Gateway is now provisioned.

Multiple segments are now created corresponding to each namespace inside the supervisor control plane.

Multiple SNAT rules are also now in place for the newly created T1 Gateway, which helps the control plane Kubernetes objects residing in their corresponding namespaces to reach the external network/ internet. It uses the egress range 192.168.72.0/24 that we provided during the workload management configuration for address translation.

You can also see two load balancers attached to the T1 Gateway:

Distributed Load balancer: All services of type ClusterIP are implemented as distributed load balancer virtual servers. This is for east-west traffic.
Server load balancer: All services of type Loadbalancer are implemented as server load balancer L4 virtual servers. And all ingress is implemented as L7 virtual servers.

Under the server load balancer, you can see two virtual servers. One for the KubeAPI (6443) and the other for downloading the CLI tools (443) to access the cluster.

Note that this newly created T1 Gateway (domain-c8:6ea515f0-39da-431b-93bf-0d6a5e4a0f77) is connected to the T0 Gateway for external connectivity through BGP.

The next step is to create namespaces, and you can then create Tanzu Kubernetes clusters on it. Usually, the vSphere administrator will create namespaces for developers and provide the access so that they can either deploy TKG clusters, vSphere pods, or VMs on the respective namespace. We will cover all these in the next part.

Hope it was useful. Cheers!

Saturday, April 18, 2020

vSAN performance benchmarking

In this article, I will explain briefly on performance benchmarking considerations, factors affecting performance, and some of the best practices. We do performance benchmarking to understand the capabilities and bottlenecks of a system. When I say system it could be a storage system, CPU, GPU, network switch, etc. Now let's consider a VMware vSAN cluster infrastructure. It includes multiple components and each of these contributes to the performance. In this case, the vSAN cluster is the solution under test. We will have to conduct performance benchmarking to understand the storage performance behavior of the cluster. When I say storage behavior it includes the IOPS, latency, and throughput that the cluster can produce under varying loads.

The goal of benchmarking

Identify bottlenecks

Hardware bottleneck
Software bottleneck
Application bottleneck

Compare tradeoffs
Manage expectations
Make decisions

Usually in a real-world scenario, benchmarking will be done once the cluster is deployed/ ready and before starting to host production workload on top of it. As these benchmark values define the performance maximums it will be helpful to decide on when to scale or upgrade the cluster before it hits a bottleneck.

Fundamental factors of vSAN performance

Server hardware

Compatibility as per vSAN HCL

Host

Number of hosts in the cluster
Power settings
CPU - number of cores and frequency

Storage

Hybrid or All-flash
NVMe, SAS, or SATA
Number of disk groups per host
Storage controller configuration
Compatibility of hardware devices as per vSAN HCL

Network

10/ 25/ 40 GbE
MTU
LAG

SPBM policy

FTT (Failures To Tolerate)
FTM (Mirroring/ Erasure coding)
Thin or Thick provision

Security

Encryption
Checksum

Other

Stripe width
Flash read cache reservation
IOPS limit for object

All of the above factors will affect performance. So you should know the benefits and tradeoffs.

Benchmarking methodology

Image credit: VMware

Storage benchmarking tools

IO load generation tools

FIO
Diskspd
IOmeter
HCIBench

Application-specific tools

HammerDB (MSSQL, Oracle)
Jetstress (MS Exchange)
SLOB (Oracle)
DBGen (MSSQL, Oracle)

Best practices

Understand the production performance metrics.
Test what you plan to deploy.
Workload modeling.
Plan for use case testing.
Choose an appropriate size for benchmarking
Choose the right tool.
Pre-allocate blocks while testing.
Test for a longer time duration.
Deploy multiple VMs with multiple VMDKs.

References

Best Practices for HCI Performance Benchmarking (HCI1891BU)

A Guide to vSAN Performance: A VM-Centric Approach (HCI2185BU)

Optimize vSAN performance using vRealize Operations and Reinforcement Learning (HCI1650BU)

Friday, March 15, 2019

VMFS vs VVOL

Let's start with a quick comparison.

VMFS	VVOL
LUN centric approach Pre-provisioning of LUNs Use of multiple datastores for different performance capabilities Management difficulties as a single VM may span across multiple datastores	VM centric vSphere is now aware of array capabilities through VASA provider No pre-provisioning One VVOL datastore can represent the whole array Storage policies can be applied per vmdk level Some of the vSphere operations are offloaded to array using VAAI (full cloning, snapshots) VVOL snapshots are faster (different from traditional snapshots with redo logs)

Note: The explanations below regarding SAN is based on Dell EMC Unity (hybrid array)

VMFS environment

A traditional VMFS datastore setup is given below. Say, you have two VMs. One with 3 virtual disks and the second one with 2 virtual disks. And your array has 4 storage pools with specific capabilities. Based on those capabilities the pools are classified into 4 service levels (Platinum, Gold, Silver and Bronze). In this case you need to provision 4 datastores/ LUNs from the respective pool to meet requirements of the two virtual machines.

VMFS

You can clearly see that the first VM is spanned across 3 datastores and the second VM is spanned across 2 datastores. If your environment has hundreds and thousands of VMs management becomes too complicated. In this case, as the datastores are properly named the vSphere admin can easily identify the service level/ capability of a specific datastore. But if naming convention is not followed, then vSphere admin has to contact the storage admin to know about the capabilities of that specific datastore/ LUN. Again lots of communication needed, making it a complex process!

VVOL environment

Now let’s have a look at the VVOL environment which is shown the figure below. You are having the same array, but instead of provisioning datastores/ LUNs from the respective pools, here we are creating one vvol datastore. The array has 4 storage pools, each having specific capabilities like drive type, RAID level, tiering policy, FAST Cache ON/ OFF etc. and are classified into 4 service levels as Platinum, Gold, Silver, and Bronze. So each pool has its own capability profile. You can provide any name, but here I just gave the same name as the service level of each pool.

Pool_01 -> Service level: Platinum -> Capability profile: Platinum

Pool_02 -> Service level: Silver -> Capability profile: Silver
Pool_03 -> Service level: Bronze -> Capability profile: Bronze
Pool_04 -> Service level: Gold -> Capability profile: Gold

Note: vvol datastores cannot be created without capability profile

Next steps are:

Create vvol datastore in the SAN
Provide a name for the vvol datastore
Add capability profiles that need to be part of the vvol datastore
In this case, all 4 capability profiles are part of the vvol datastore
You can also limit the amount of space that will be used from each capability profile by the vvol datastore
Once it’s done, vvol datastore is created in the SAN

VVOL

At this point, you can go ahead and configure host access to this vvol datastore. Now you have to let your Vsphere environment know about the capabilities of the storage array. That is done by registering the new storage provider on your vCenter server making use of VASA (VMware vStorage API for Storage Awareness) provider. Once registered the vSphere environment will communicate with the VASA provider through the array management interface (OOB network) which forms a control path. Next step is creating a vvol datastore on the ESXI host which you provided access earlier. For each vvol datastore created on the storage array, two protocol endpoints (PEs) will be automatically generated to communicate with an ESXI host forming a data path. If you create another vvol datastore on the array and provide access to same ESXI host, two more PEs will be created. PEs act like a target. And on the ESXI side, if you look at the storage devices, you can see 2 proxy LUNs which connects to the respective PEs.

Now you have to create VM storage policies based on service levels. Here you have 4 service levels, so you have to create 4 storage policies. Assign storage policy per vmdk basis as per requirements.

Eg: Storage_Policy_Gold -> VMDK 02 (second VM) -> it will be placed in Pool_04 automatically

So in this case, instead of having 4 VMFS datastores we needed only 1 VVOL datastore which has all the required capabilities. This means you don't need to provision more LUNs from the SAN. Storage management becomes easy with just one VVOL. There is more granularity with SPBM at each VMDK level. With VVOLs the SAN is aware of all the VMs and its corresponding files hosted on it. This makes space reclamation very easy and straight. The moment a VM or a VMDK is deleted, that space will be immediately made free as SAN is having the complete insight of virtual machines stored in it. Data mobility between different storage pools based on its service level becomes effortless as it is handled directly by the SAN internally based on SPBM. All together VVOL simplifies storage/ LUN management.

Hope it was useful. Cheers!

References: