Showing posts with label vSAN. Show all posts

Friday, September 6, 2024

Revisiting Storage Performance Benchmarking

A few years ago, I had the opportunity to explore the intricacies of storage performance benchmarking using tools like FIO, DISKSPD, and Iometer. Those studies provided valuable insights into the performance characteristics of various storage solutions, shaping my understanding and approach to storage performance analysis. As I prepare for an upcoming project in this domain, I find it essential to revisit my previous work, reflect on the lessons learned, and share my experiences. This blog post aims to provide an overview of my benchmarking journey and the evolving landscape of storage performance studies.


Recent advancements 

The field of storage technology has seen significant advancements in recent years. The rise of NVMe and storage-class memory technologies has redefined high-end storage performance, offering much higher speed and efficiency. These advancements highlight the dynamic nature of storage performance benchmarking and underscore the importance of staying updated with the latest tools and methodologies.

Challenges

Benchmarking storage performance is not without its challenges. One of the primary difficulties is ensuring a consistent and controlled testing environment, as variations in hardware, software, and network conditions can significantly impact results. Another challenge is the selection of appropriate benchmarks that accurately reflect real-world workloads, which requires a deep understanding of the specific use cases and performance metrics. Additionally, interpreting the results can be complex, as it involves analyzing multiple metrics such as IOPS, throughput, and latency, and understanding their interplay. These challenges necessitate meticulous planning and a thorough understanding of both the benchmarking tools and the storage systems being tested.
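
The interplay between these metrics can be made concrete with two well-known relationships: throughput equals IOPS multiplied by block size, and, by Little's Law, the average number of outstanding I/Os equals IOPS multiplied by average latency. Below is a minimal Python sketch of both. The numbers are purely illustrative assumptions, not results from any of my tests.

```python
def throughput_mib_s(iops: float, block_size_kib: float) -> float:
    """Throughput in MiB/s for a given IOPS rate and block size in KiB."""
    return iops * block_size_kib / 1024


def avg_latency_ms(outstanding_io: float, iops: float) -> float:
    """Average latency in ms implied by Little's Law (outstanding = IOPS * latency)."""
    return outstanding_io * 1000 / iops


# Illustrative values only (assumed, not measured).
print(throughput_mib_s(iops=50_000, block_size_kib=4))    # small-block workload
print(throughput_mib_s(iops=2_000, block_size_kib=512))   # large-block workload
print(avg_latency_ms(outstanding_io=32, iops=50_000))     # latency at queue depth 32
```

The point of the sketch is that a high IOPS figure with small blocks can still mean modest throughput, and that latency has to be read together with the queue depth used in the test.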

Prior works

Following are some of the articles on storage benchmarking that I’ve published in the past:

Custom storage benchmarking framework

While there are numerous storage benchmarking tools available, such as VMFleet and HCIBench, I wanted to highlight a custom framework I developed a few years ago. Here are some reasons why we created this custom tool:

  • Great learning experience: It provided valuable insights into how things work.
  • Customization: Being a custom framework, it allows you to add or remove features as needed.
  • Flexibility: You can modify multiple parameters to suit your requirements.
  • Custom test profiles: You can create tailored storage test profiles.
  • No IP assignment needed: There’s no need for IP assignment or DHCP for the stress test VMs.
  • Centralized log collection: It offers centralized log collection for detailed analysis.


You can access the scripts and readme on my GitHub repository:

https://github.com/vineethac/vsan_cluster_storage_benchmarking_with_diskspd


Here is an overview.

  • Profile Manifest: All storage test profiles are listed in profile_manifest.psd1. You can define as many profiles as you want.
  • VM Template: A Windows VM template should be present in the vCenter server.
  • Benchmarking Manifest: Details of vCenter, cluster name, VM template, number of stress test VMs per host, etc., are provided in benchmarking_manifest.psd1.
  • Deploy Test VMs: deploy_test_vms.ps1 will deploy all the test VMs with pre-configured parameters.
  • Start Stress Test: start_stress_test.ps1 will initiate the storage stress test process for all the profiles mentioned in profile_manifest.psd1 one by one.
  • Log Collection: All log files will be automatically copied to a central location on the host from where these scripts are running.
  • Cleanup: Use delete_test_vms.ps1 to clean up the stress test VMs from the cluster.
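
To illustrate what a test profile boils down to, here is a small Python sketch that turns a profile into a DISKSPD command line. It is not taken from the repository: the actual scripts are PowerShell, and the class name, field names, default file size, and target path below are hypothetical. Only the DISKSPD switches themselves (-b, -d, -t, -o, -w, -r, -Sh, -L, -c) are real.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StressProfile:
    """Hypothetical profile; fields are illustrative, not the repo's schema."""
    name: str
    block_size: str      # e.g. "4K"
    write_pct: int       # percentage of writes, 0-100
    random_io: bool      # True for random, False for sequential
    threads: int         # threads per target
    outstanding_io: int  # outstanding I/Os per thread
    duration_s: int      # test duration in seconds


def diskspd_args(p: StressProfile, target: str, file_size: str = "10G") -> List[str]:
    """Build a DISKSPD argument list for one profile and one target file."""
    args = [
        "diskspd.exe",
        f"-b{p.block_size}",
        f"-d{p.duration_s}",
        f"-t{p.threads}",
        f"-o{p.outstanding_io}",
        f"-w{p.write_pct}",
    ]
    if p.random_io:
        args.append("-r")
    # -Sh disables caching, -L collects latency stats, -c creates the test file.
    args += ["-Sh", "-L", f"-c{file_size}", target]
    return args


profile = StressProfile("4k-70r-30w-random", "4K", 30, True, 4, 8, 60)
print(" ".join(diskspd_args(profile, r"D:\stress\testfile.dat")))
```

Keeping profiles as data and generating the command from them is what makes it easy to run many profiles one after another.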


Note:
 These scripts were created about five years ago, and I haven’t had the opportunity to refactor them according to current best practices and new PowerShell scripting standards. I plan to enhance them in the coming months!

This overview should give you a clear understanding of the overall storage benchmarking workflow. I hope it was useful. Cheers!

Sunday, May 30, 2021

vSphere with Tanzu using NSX-T - Part8 - Create namespace and deploy Tanzu Kubernetes Cluster

In the previous posts we discussed the following:

vSphere with Tanzu using NSX-T - Part1 - Prerequisites

vSphere with Tanzu using NSX-T - Part2 - Configure NSX

vSphere with Tanzu using NSX-T - Part3 - Edge Cluster

vSphere with Tanzu using NSX-T - Part4 - Tier-0 Gateway and BGP peering

vSphere with Tanzu using NSX-T - Part5 - Tier-1 Gateway and Segments

vSphere with Tanzu using NSX-T - Part6 - Create tags, storage policy, and content library

vSphere with Tanzu using NSX-T - Part7 - Enable workload management


Now that we have enabled workload management, the next step is to create namespaces on the supervisor cluster and set resource quotas as required. The vSphere administrator can then give developers access to these namespaces, where they can deploy Tanzu Kubernetes clusters, VMs, or vSphere pods.

  • Create namespace.

  • Select the cluster and provide a name for the namespace.

  • Now the namespace is created successfully. Before handing over this namespace to the developer, you can set permissions, assign storage policies, and set resource limits.

Let's have a look at the NSX-T components that are instantiated when we create a new namespace.
  • A new segment is now created for the newly created namespace. This segment is connected to the T1 Gateway of the supervisor cluster.

  • A SNAT rule is also now in place on the supervisor cluster T1 Gateway. This helps the Kubernetes objects residing in the namespace to reach the external network/ internet. It uses the egress range 192.168.72.0/24 that we provided during the workload management configuration for address translation.

We can now assign a storage policy to this newly created namespace.

  • Click on Add Storage and select the storage policy. In my case, I am using Tanzu Storage Policy which uses a vsanDatastore.

Let's apply some capacity and usage limits for this namespace. Click edit limits and provide the values.


Let's set user permissions to this newly created namespace. Click add permissions.


Now we are ready to hand over this new namespace to the dev user (John).


Under the first tile, you can see Copy link. Provide this link to the dev user, who can open it in a web browser to access the CLI tools needed to connect to the newly created namespace.


Download and install the CLI tools. In my case, CLI tools are installed on a CentOS 7.x VM. You can also see the user John has connected to the newly created namespace using the CLI.


The user can now verify the resource limits of the namespace using kubectl.


You can see the following limits:
  • cpu-limit: 21.818
  • memory-limit: 131072Mi
  • storage: 500Gi
Storage is limited to 500 GB and memory to 128 GB, which is straightforward. We (the vSphere admin) had set the CPU limit to 48 GHz, yet the cpu-limit of this namespace is shown as 21.818 CPU cores. To give some background on this calculation: the ESXi host that I am using for this study has 20 physical cores, and the total CPU capacity of a host is 44 GHz. I have 4 such ESXi hosts in the cluster. The computing power of one physical core is therefore (44 / 20) = 2.2 GHz. So, to limit the CPU to 48 GHz, the number of CPU cores is limited to (48 / 2.2) = 21.818.
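
The same arithmetic as a short Python sketch, using only the figures quoted above. The variable names are mine, and the memory conversion is included just to show where 128 GB comes from.

```python
host_capacity_ghz = 44.0   # total CPU capacity of one ESXi host
cores_per_host = 20        # physical cores per host
cpu_limit_ghz = 48.0       # CPU limit set on the namespace by the vSphere admin

# Capacity of a single physical core.
ghz_per_core = host_capacity_ghz / cores_per_host

# The GHz limit expressed as a number of cores, as kubectl reports it.
cpu_limit_cores = cpu_limit_ghz / ghz_per_core

# memory-limit of 131072Mi expressed in GiB.
memory_limit_gib = 131072 / 1024

print(round(ghz_per_core, 3))     # 2.2
print(round(cpu_limit_cores, 3))  # 21.818
print(memory_limit_gib)           # 128.0
```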

Apply the following cluster definition yaml file to create a Tanzu Kubernetes cluster under the ns-01-dev-john namespace.

apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
 name: tkg-cluster-01
 namespace: ns-01-dev-john
spec:
 topology:
   controlPlane:
     count: 3
     class: guaranteed-medium
     storageClass: tanzu-storage-policy
   workers:
     count: 3
     class: guaranteed-xlarge
     storageClass: tanzu-storage-policy
 distribution:
   version: v1.18.15
 settings:
  network:
   services:
    cidrBlocks: ["198.32.1.0/12"]
   pods:
    cidrBlocks: ["192.1.1.0/16"]
   cni:
    name: calico
  storage:
   defaultClass: tanzu-storage-policy
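
Before applying a definition like this, it can help to check that the service and pod CIDR blocks do not overlap with each other or with the egress range. The sketch below uses Python's standard ipaddress module with the values from this post. It is only an offline sanity check that I am adding for illustration, not part of the deployment workflow.

```python
import ipaddress

# strict=False accepts CIDRs with host bits set and normalizes them
# to the network address, as with the values in the YAML above.
services = ipaddress.ip_network("198.32.1.0/12", strict=False)
pods = ipaddress.ip_network("192.1.1.0/16", strict=False)
egress = ipaddress.ip_network("192.168.72.0/24")  # egress range from workload management

print(services)  # 198.32.0.0/12
print(pods)      # 192.1.0.0/16

# Compare every pair of ranges for overlap.
ranges = {"services": services, "pods": pods, "egress": egress}
names = list(ranges)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} vs {b}: overlap={ranges[a].overlaps(ranges[b])}")
```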


Log in to the Tanzu Kubernetes cluster directly using the CLI and verify.


You can see the corresponding VMs in the vCenter UI.


Now, let's have a look at the NSX-T side.
  • A Tier-1 Gateway is now available with a segment linked to it.


  • You can see a server load balancer with one virtual server that provides access to KubeAPI (6443) of the Tanzu Kubernetes cluster that we just deployed.


  • You can also find a SNAT rule. This helps the Tanzu Kubernetes cluster objects to reach the external network/ internet. It uses the egress range 192.168.72.0/24 that we provided during the workload management configuration for address translation.

Note: This architecture is explained on the basis of vSphere 7 U1. In the newer versions there are changes. With vSphere 7 U1c the architecture changed from a per-TKG cluster Tier 1 Gateway model to a per-Supervisor namespace Tier 1 Gateway model. For more details, feel free to refer to the blog series published by Harikrishnan T @hari5611.

In the next part we will discuss monitoring aspects of vSphere with Tanzu environment and Tanzu Kubernetes clusters. I hope this was useful. Cheers!

Sunday, April 18, 2021

vSphere with Tanzu using NSX-T - Part7 - Enable workload management

In the previous posts we discussed the following: 

Part1: Prerequisites

Part2: Configure NSX-T

Part3: Edge Cluster

Part4: Tier-0 Gateway and BGP peering

Part5: Tier-1 Gateway and Segments

Part6: Create tags, storage policy, and content library


We are all set to configure and enable workload management. Before stepping into the configurations I just want to give an overall picture of vSphere with Tanzu architecture and different components. 


Once you enable workload management, the vSphere cluster is transformed into a supervisor cluster. The supervisor cluster consists of 3 supervisor control plane VMs and the ESXi hosts, which act as worker nodes. You can now run traditional VMs and containers side by side. You can run the containers as native vSphere pods directly on the ESXi hosts, or you can deploy Tanzu Kubernetes clusters in VM form factor on the vSphere namespace and then run container workloads on them.

Following are the steps to enable workload management:

  • Log in to vCenter - Menu - Workload Management.
  • Click Get started.
  • Select NSX-T and click next.

  • Select the cluster.

  • Select a size and click next.

  • Select the storage policy and click next.

  • Provide management network details and click next.

  • Provide workload network details and click next.

  • Add the content library and click next.

  • Click finish.

  • This process takes a while to configure and bring up the supervisor cluster. In my case, it took around 30 minutes to complete.
  • You can see the progress in the vCenter UI.



  • You can now see the supervisor control plane VMs are deployed.




Workload management is now enabled and the vSphere cluster is transformed to a supervisor cluster. Let's have a look at the objects that are automatically created in NSX-T.
  • You can see a T1 Gateway is now provisioned.

  • Multiple segments are now created corresponding to each namespace inside the supervisor control plane.

  • Multiple SNAT rules are also now in place for the newly created T1 Gateway, which helps the control plane Kubernetes objects residing in their corresponding namespaces to reach the external network/ internet. It uses the egress range 192.168.72.0/24 that we provided during the workload management configuration for address translation.

  • You can also see two load balancers attached to the T1 Gateway:
    • Distributed Load balancer: All services of type ClusterIP are implemented as distributed load balancer virtual servers. This is for east-west traffic.
    • Server load balancer: All services of type LoadBalancer are implemented as server load balancer L4 virtual servers, and all ingress is implemented as L7 virtual servers.

  • Under the server load balancer, you can see two virtual servers. One for the KubeAPI (6443) and the other for downloading the CLI tools (443) to access the cluster.

Note that this newly created T1 Gateway (domain-c8:6ea515f0-39da-431b-93bf-0d6a5e4a0f77) is connected to the T0 Gateway for external connectivity through BGP.
 
The next step is to create namespaces, and you can then create Tanzu Kubernetes clusters on it. Usually, the vSphere administrator will create namespaces for developers and provide the access so that they can either deploy TKG clusters, vSphere pods, or VMs on the respective namespace. We will cover all these in the next part. 

Hope it was useful. Cheers!

Sunday, February 7, 2021

vSphere with Tanzu using NSX-T - Part4 - Tier-0 Gateway and BGP peering

In the previous posts we discussed the following: 

Part1: Prerequisites
Part2: Configure NSX-T
Part3: Edge Cluster


The next step is to create a Tier-0 Gateway, configure its interfaces, and BGP peer with the L3 TOR switches. Following is a high-level logical representation of this configuration:


Configure Tier-0 Gateway


Before creating the T0-Gateway, we need to create two segments.

  • Add Segments.
    • Create a segment "ls-uplink-v54"
      • VLAN: 54
      • Transport Zone: "edge-vlan-tz"
    • Create a segment "ls-uplink-v55"
      • VLAN: 55
      • Transport Zone: "edge-vlan-tz"

  • Add Tier-0 Gateway.
    • Provide the necessary details as shown below.

    • Add 4 interfaces and configure them as per the logical diagram given above.
      • edge-01-uplink1 - 192.168.54.254/24 - connected via segment ls-uplink-v54
      • edge-01-uplink2 - 192.168.55.254/24 - connected via segment ls-uplink-v55

      • edge-02-uplink1 - 192.168.54.253/24 - connected via segment ls-uplink-v54
      • edge-02-uplink2 - 192.168.55.253/24 - connected via segment ls-uplink-v55
    • Verify the status is showing success for all the 4 interfaces that you added.

  • Routing and multicast settings of T0 are as follows:

    • You can see a static route is configured. The next hop for the default route 0.0.0.0/0 is set to 192.168.54.1. 

    • The next hop configuration is given below.

  • BGP settings of T0 are shown below.

    • BGP Neighbor config:

    • Verify the status is showing success for the two BGP Neighbors that you added.

  • Route re-distribution settings of T0:

    • Add route re-distribution.
    • Set route re-distribution.

Now, the T0 configuration is complete. The next step is to configure BGP on the Dell S4048-ON TOR switches.
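
Before moving on to the switches, the interface addressing above can be sanity-checked offline. This Python sketch uses the standard ipaddress module and the addresses from this post. It confirms that the uplink addresses are unique and shows which uplinks sit in the same subnet as the default route's next hop. It is an illustrative check that I am adding here, not something NSX-T requires.

```python
import ipaddress

# Uplink interfaces as listed above: name -> address/prefix.
UPLINKS = {
    "edge-01-uplink1": "192.168.54.254/24",
    "edge-01-uplink2": "192.168.55.254/24",
    "edge-02-uplink1": "192.168.54.253/24",
    "edge-02-uplink2": "192.168.55.253/24",
}
# Next hop of the static default route 0.0.0.0/0.
DEFAULT_NEXT_HOP = ipaddress.ip_address("192.168.54.1")

interfaces = {name: ipaddress.ip_interface(cidr) for name, cidr in UPLINKS.items()}

# Every interface address must be unique.
addresses = [iface.ip for iface in interfaces.values()]
assert len(addresses) == len(set(addresses)), "duplicate uplink address"

# Uplinks whose connected subnet contains the default route's next hop.
reachable = [
    name for name, iface in interfaces.items()
    if DEFAULT_NEXT_HOP in iface.network
]
print(reachable)  # ['edge-01-uplink1', 'edge-02-uplink1']
```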

Configure TOR Switches


---On TOR A---
conf
router bgp 65500
neighbor 192.168.54.254 remote-as 65400
#peering to T0 edge-01 interface
neighbor 192.168.54.254 no shutdown
neighbor 192.168.54.253 remote-as 65400
#peering to T0 edge-02 interface
neighbor 192.168.54.253 no shutdown
neighbor 192.168.54.3 remote-as 65500
#peering to TOR B in VLAN 54
neighbor 192.168.54.3 no shutdown
maximum-paths ebgp 4
maximum-paths ibgp 4


---On TOR B---
conf
router bgp 65500
neighbor 192.168.55.254 remote-as 65400
#peering to T0 edge-01 interface
neighbor 192.168.55.254 no shutdown
neighbor 192.168.55.253 remote-as 65400
#peering to T0 edge-02 interface
neighbor 192.168.55.253 no shutdown
neighbor 192.168.54.2 remote-as 65500
#peering to TOR A in VLAN 54
neighbor 192.168.54.2 no shutdown
maximum-paths ebgp 4
maximum-paths ibgp 4


---Advertising ESXi mgmt and VM traffic networks in BGP on both TORs---

conf
router bgp 65500
network 192.168.41.0/24
network 192.168.43.0/24


Thanks to my friend and vExpert Harikrishnan @hari5611 for helping me with the T0 configs and BGP peering on TORs. Do check out his blog https://vxplanet.com/


Verify BGP Configurations


The next step is to verify the BGP configs on TORs using the following commands:

show running-config bgp

show ip bgp summary

show ip bgp neighbors


Follow the VMware documentation to verify the BGP connections from a Tier-0 Service Router. In the below screenshot you can see that both Edge nodes have the BGP neighbors 192.168.54.2 and 192.168.55.3 with state Estab.


In the next article, I will talk about adding a T1 Gateway, adding new segments for apps, connecting VMs to the segments, and verifying connectivity to different internal and external networks. I hope this was useful. Cheers!