vineethac.blogspot.com: fio

Friday, September 6, 2024

Revisiting Storage Performance Benchmarking

A few years ago, I had the opportunity to explore the intricacies of storage performance benchmarking using tools like FIO, DISKSPD, and Iometer. Those studies provided valuable insights into the performance characteristics of various storage solutions and shaped my approach to storage performance analysis. As I prepare for an upcoming project in this domain, I find it essential to revisit my previous work, reflect on the lessons learned, and share my experiences. This blog post provides an overview of my benchmarking journey and the evolving landscape of storage performance studies.

Recent advancements

Storage technology has advanced significantly in recent years. The rise of NVMe and storage-class memory technologies has redefined high-end storage performance, offering much higher speed and efficiency than earlier media. These advancements highlight the dynamic nature of storage performance benchmarking and underscore the importance of staying updated with the latest tools and methodologies.

Challenges

Benchmarking storage performance is not without its challenges. One of the primary difficulties is ensuring a consistent and controlled testing environment, as variations in hardware, software, and network conditions can significantly impact results. Another challenge is the selection of appropriate benchmarks that accurately reflect real-world workloads, which requires a deep understanding of the specific use cases and performance metrics. Additionally, interpreting the results can be complex, as it involves analyzing multiple metrics such as IOPS, throughput, and latency, and understanding their interplay. These challenges necessitate meticulous planning and a thorough understanding of both the benchmarking tools and the storage systems being tested.
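
To make the interplay between these metrics a bit more concrete, here is a small Python sketch. It relies on two standard relationships: throughput equals IOPS multiplied by block size, and (by Little's law) average latency is roughly the number of outstanding IOs divided by IOPS. The function names and the sample numbers are illustrative assumptions, not measurements from any system.

```python
def throughput_mib_s(iops: float, block_size_kib: float) -> float:
    """Throughput in MiB/s for a given IOPS rate and block size in KiB."""
    return iops * block_size_kib / 1024


def avg_latency_ms(outstanding_ios: int, iops: float) -> float:
    """Approximate average latency in ms via Little's law (L = lambda * W)."""
    return outstanding_ios / iops * 1000


if __name__ == "__main__":
    # Hypothetical workload: 20,000 IOPS at 8 KiB with 32 outstanding IOs.
    print(f"{throughput_mib_s(20_000, 8):.2f} MiB/s")
    print(f"{avg_latency_ms(32, 20_000):.2f} ms")
```

The same IOPS figure can therefore mean very different throughput depending on block size, and a higher queue depth tends to raise latency unless IOPS grows with it.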

Prior works

Following are some of the articles on storage benchmarking that I’ve published in the past:

Custom storage benchmarking framework

While there are numerous storage benchmarking tools available, such as VMFleet and HCIBench, I wanted to highlight a custom framework I developed a few years ago. Here are some reasons why we created this custom tool:

Great learning experience: It provided valuable insights into how things work.
Customization: Being a custom framework, it allows you to add or remove features as needed.
Flexibility: You can modify multiple parameters to suit your requirements.
Custom test profiles: You can create tailored storage test profiles.
No IP assignment needed: There’s no need for IP assignment or DHCP for the stress test VMs.
Centralized log collection: It offers centralized log collection for detailed analysis.

You can access the scripts and readme on my GitHub repository:

https://github.com/vineethac/vsan_cluster_storage_benchmarking_with_diskspd

Here is an overview.

Profile Manifest: All storage test profiles are listed in profile_manifest.psd1. You can define as many profiles as you want.

VM Template: A Windows VM template should be present in the vCenter server.

Benchmarking Manifest: Details of vCenter, cluster name, VM template, number of stress test VMs per host, etc., are provided in benchmarking_manifest.psd1.

Deploy Test VMs: deploy_test_vms.ps1 will deploy all the test VMs with pre-configured parameters.

Start Stress Test: start_stress_test.ps1 will initiate the storage stress test process for all the profiles mentioned in profile_manifest.psd1 one by one.

Log Collection: All log files will be automatically copied to a central location on the host from where these scripts are running.

Cleanup: Use delete_test_vms.ps1 to clean up the stress test VMs from the cluster.
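
As a rough illustration of the idea behind test profiles, the following Python sketch turns a profile into a DISKSPD command line and walks through the profiles one by one. It is not taken from the repository: the profile keys, profile names, and target path are hypothetical and do not reflect the actual schema of profile_manifest.psd1. Only the DISKSPD flags used (-b, -d, -t, -o, -w, -r, -Sh, -L) are real.

```python
def diskspd_command(profile: dict, target: str) -> str:
    """Build a DISKSPD command line from a test profile dictionary."""
    args = [
        "diskspd.exe",
        f"-b{profile['block_size']}",      # block size, e.g. 8K
        f"-d{profile['duration_s']}",      # test duration in seconds
        f"-t{profile['threads']}",         # threads per target
        f"-o{profile['outstanding_io']}",  # outstanding IOs per thread
        f"-w{profile['write_pct']}",       # percentage of writes
    ]
    if profile.get("random", True):
        args.append("-r")                  # random instead of sequential IO
    args += ["-Sh", "-L", target]          # bypass caches, record latency
    return " ".join(args)


# Hypothetical profiles; the real ones live in profile_manifest.psd1.
PROFILES = {
    "oltp_8k_70r": {"block_size": "8K", "duration_s": 300, "threads": 4,
                    "outstanding_io": 8, "write_pct": 30},
    "seq_write_64k": {"block_size": "64K", "duration_s": 300, "threads": 2,
                      "outstanding_io": 4, "write_pct": 100, "random": False},
}

if __name__ == "__main__":
    # Profiles run one after another, mirroring the sequential stress test.
    for name, profile in PROFILES.items():
        print(name, "->", diskspd_command(profile, r"E:\testfile.dat"))
```

Keeping the profiles as plain data separate from the code that runs them is what makes it easy to add or remove test cases without touching the scripts.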

Note: These scripts were created about five years ago, and I haven’t had the opportunity to refactor them according to current best practices and new PowerShell scripting standards. I plan to enhance them in the coming months!

This overview should give you a clear understanding of the overall storage benchmarking workflow. I hope it was useful. Cheers!

Sunday, January 30, 2022

vSphere with Tanzu using NSX-T - Part14 - Testing TKC storage using kubestr

In the previous posts we discussed the following:

Part1 - Prerequisites
Part2 - Configure NSX
Part3 - Edge Cluster
Part4 - Tier-0 Gateway and BGP peering
Part5 - Tier-1 Gateway and Segments
Part6 - Create tags, storage policy, and content library
Part7 - Enable workload management
Part8 - Create namespace and deploy Tanzu Kubernetes Cluster
Part9 - Monitoring
Part10 - Upgrade Tanzu Kubernetes Cluster
Part11 - Troubleshooting TKC
Part12 - Deploy application on TKC and access it
Part13 - Export WCP admin kubeconfig

This article is about using kubestr to test the storage options of a Tanzu Kubernetes Cluster (TKC). Following are the steps to install kubestr on macOS:

wget https://github.com/kastenhq/kubestr/releases/download/v0.4.31/kubestr_0.4.31_MacOS_amd64.tar.gz
tar -xvf kubestr_0.4.31_MacOS_amd64.tar.gz
chmod +x kubestr
mv kubestr /usr/local/bin

Now, let's run kubestr help.

% kubestr help
kubestr is a tool that will scan your k8s cluster
       and validate that the storage systems in place as well as run
       performance tests.

Usage:
kubestr [flags]
kubestr [command]

Available Commands:
browse      Browse the contents of a CSI PVC via file browser
csicheck    Runs the CSI snapshot restore check
fio         Runs an fio test
help        Help about any command

Flags:
-h, --help             help for kubestr
-e, --outfile string   The file where test results will be written
-o, --output string    Options(json)

Use "kubestr [command] --help" for more information about a command.

I am going to use the following TKC for testing.

% KUBECONFIG=gc.kubeconfig kubectl get nodes
NAME                               STATUS   ROLES                  AGE    VERSION
gc-control-plane-pwngg             Ready    control-plane,master   103d   v1.20.9+vmware.1
gc-workers-wrknn-f675446b6-cz766   Ready    <none>                 103d   v1.20.9+vmware.1
gc-workers-wrknn-f675446b6-f6zqs   Ready    <none>                 103d   v1.20.9+vmware.1
gc-workers-wrknn-f675446b6-rsf6n   Ready    <none>                 103d   v1.20.9+vmware.1

Let's run kubestr against the cluster now.

% KUBECONFIG=gc.kubeconfig kubestr

**************************************
_ ___   _ ___ ___ ___ _____ ___
| |/ / | | | _ ) __/ __|_   _| _ \
| ' <| |_| | _ \ _|\__ \ | | |   /
|_|\_\\___/|___/___|___/ |_| |_|_\

Explore your Kubernetes storage options
**************************************
Kubernetes Version Check:
Valid kubernetes version (v1.20.9+vmware.1) - OK

RBAC Check:
Kubernetes RBAC is enabled - OK

Aggregated Layer Check:
The Kubernetes Aggregated Layer is enabled - OK

W0130 14:17:16.937556   87541 warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver
Available Storage Provisioners:

csi.vsphere.xxxx.com:
    Can't find the CSI snapshot group api version.
    This is a CSI driver!
    (The following info may not be up to date. Please check with the provider for more information.)
    Provider:            vSphere
    Website:             https://github.com/kubernetes-sigs/vsphere-csi-driver
    Description:         A Container Storage Interface (CSI) Driver for VMware vSphere
    Additional Features: Raw Block,<br/><br/>Expansion (Block Volume),<br/><br/>Topology Aware (Block Volume)

    Storage Classes:
      * sc2-01-vc16c01-wcp-mgmt

    To perform a FIO test, run-
      ./kubestr fio -s <storage class>

kubestr can also run storage tests, and it uses FIO to generate the IOs. For example, this is how you can run a basic storage test:

% KUBECONFIG=gc.kubeconfig kubestr fio -s sc2-01-vc16c01-wcp-mgmt -z 10G
PVC created kubestr-fio-pvc-zvdhr
Pod created kubestr-fio-pod-kdbs5
Running FIO test (default-fio) on StorageClass (sc2-01-vc16c01-wcp-mgmt) with a PVC of Size (10G)
Elapsed time- 29.290421119s
FIO test results:

FIO version - fio-3.20
Global options - ioengine=libaio verify=0 direct=1 gtod_reduce=1

JobName: read_iops
blocksize=4K filesize=2G iodepth=64 rw=randread
read:
IOPS=3987.150391 BW(KiB/s)=15965
iops: min=3680 max=4274 avg=3992.034424
bw(KiB/s): min=14720 max=17096 avg=15968.827148

JobName: write_iops
blocksize=4K filesize=2G iodepth=64 rw=randwrite
write:
IOPS=3562.628906 BW(KiB/s)=14267
iops: min=3237 max=3750 avg=3565.896484
bw(KiB/s): min=12950 max=15000 avg=14264.862305

JobName: read_bw
blocksize=128K filesize=2G iodepth=64 rw=randread
read:
IOPS=2988.549316 BW(KiB/s)=383071
iops: min=2756 max=3252 avg=2992.344727
bw(KiB/s): min=352830 max=416256 avg=383056.187500

JobName: write_bw
blocksize=128k filesize=2G iodepth=64 rw=randwrite
write:
IOPS=2754.796143 BW(KiB/s)=353151
iops: min=2480 max=2992 avg=2759.586182
bw(KiB/s): min=317440 max=382976 avg=353242.781250

Disk stats (read/write):
sdd: ios=117160/105647 merge=0/1210 ticks=2100090/2039676 in_queue=4139076, util=99.608589%
- OK

As you can see, a 10G PVC and a FIO pod are created and used for the FIO test. Once the test is complete, the PVC and the FIO pod are deleted automatically.
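
If you want to post-process these results, a small parser helps. The Python sketch below pulls the job name, block size, IOPS, and bandwidth out of the text output and compares the reported bandwidth with IOPS multiplied by block size. SAMPLE is copied from two of the jobs in the output above. The regular expression is my own assumption about the text layout and may need adjusting for other kubestr versions.

```python
import re

SAMPLE = """\
JobName: read_iops
blocksize=4K filesize=2G iodepth=64 rw=randread
read:
IOPS=3987.150391 BW(KiB/s)=15965

JobName: write_bw
blocksize=128k filesize=2G iodepth=64 rw=randwrite
write:
IOPS=2754.796143 BW(KiB/s)=353151
"""

JOB_RE = re.compile(
    r"JobName: (?P<job>\S+)\s+"
    r"blocksize=(?P<bs>\d+)[kK].*?"
    r"IOPS=(?P<iops>[\d.]+) BW\(KiB/s\)=(?P<bw>\d+)",
    re.DOTALL,
)


def parse_jobs(text: str) -> list:
    """Extract job name, block size (KiB), IOPS, and bandwidth per fio job."""
    return [
        {
            "job": m.group("job"),
            "bs_kib": int(m.group("bs")),
            "iops": float(m.group("iops")),
            "bw_kib_s": int(m.group("bw")),
        }
        for m in JOB_RE.finditer(text)
    ]


if __name__ == "__main__":
    for job in parse_jobs(SAMPLE):
        implied = job["iops"] * job["bs_kib"]  # KiB/s implied by IOPS x block size
        print(f'{job["job"]}: reported {job["bw_kib_s"]} KiB/s, '
              f'IOPS x block size = {implied:.0f} KiB/s')
```

The two values should agree closely. A large gap usually means the block size or the units were misread.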

I hope it was useful. Cheers!

Saturday, November 28, 2020

Storage performance benchmarking of Tanzu Kubernetes Clusters

Benchmarking IT infrastructure is standard practice and is usually done before putting it into production. It gives you baseline values for different performance aspects of the system or solution under test. These benchmarking principles apply to Kubernetes clusters too, but the test cases and evaluation criteria may vary slightly compared to benchmarking traditional IT infrastructure.

Following are some of the test considerations:

Performance of PVCs:
    Time to provision PVCs.
    Read/write IOPS and latency of PVCs.
Pod startup latency.
The time consumed to complete the deployment of different K8s objects:
    StatefulSet
    Deployment, etc.
Performance behavior of sample application workloads.
Network performance and connectivity between different K8s nodes.

In this article, I will explain a quick and easy way to benchmark the storage system used by the Kubernetes cluster to provision PVCs for application workloads. I am using FIO to generate storage IOs. You can use the following YAML file to deploy FIO pods as a statefulset. Note that I am using a PowerFlex VVOL datastore as Cloud Native Storage (CNS) for Tanzu K8s clusters, hence the storage class "powerflex-storage-policy". This may differ in your case, and you might need to modify it to match the storage class available in your setup.

This YAML file will deploy a statefulset with 15 FIO pods (as per the number of replicas mentioned) and will start the storage IO stress test (8k block size, 70% random reads, 30% random writes, 2 jobs, 16 iodepth) on the attached PVC as soon as each pod starts. A total of 15 PVCs will be created in this case, and one PVC will be attached to each FIO pod.
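
For reference, the workload described above maps onto standard fio options roughly as in the Python sketch below. It is not the content of the YAML file. The job name, IO engine, direct flag, file path, file size, and runtime are assumptions added only to make the command line complete.

```python
import shlex


def fio_args(block_size: str, read_pct: int, jobs: int, iodepth: int,
             filename: str, size: str, runtime_s: int) -> list:
    """Return fio arguments for a mixed random read/write job."""
    return [
        "fio",
        "--name=randrw_test",           # hypothetical job name
        "--rw=randrw",                  # mixed random reads and writes
        f"--rwmixread={read_pct}",      # read share; the rest are writes
        f"--bs={block_size}",
        f"--numjobs={jobs}",
        f"--iodepth={iodepth}",
        "--ioengine=libaio",            # assumption: Linux async IO engine
        "--direct=1",                   # assumption: bypass the page cache
        "--time_based",
        f"--runtime={runtime_s}",
        f"--size={size}",
        f"--filename={filename}",
    ]


if __name__ == "__main__":
    # 8k blocks, 70% reads / 30% writes, 2 jobs, iodepth 16, as described above.
    # The file path, size, and runtime are placeholders, not values from the YAML.
    print(shlex.join(fio_args("8k", 70, 2, 16, "/data/fio-test.file", "10G", 600)))
```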

Note: If you get the error "forbidden: unable to validate against any pod security policy" after applying the above statefulset, the pods will not be created. You will first need to create and apply a Pod Security Policy (PSP) to the Tanzu Kubernetes Cluster.

Following is an overview of my vSphere with Tanzu setup:

Tanzu K8s control plane nodes/ master VMs: 3

Tanzu K8s worker nodes/ VMs: 15

Contexts, Tanzu K8s cluster nodes, and storage class.

Create a statefulset using the above YAML file.

kubectl apply -f https://gist.githubusercontent.com/vineethac/7c9f6ce2b72868b8832a4404b79ebba2/raw/980f9d6c24c10b1b7b39b20d80c15a9f2ee6c4f1/fio_ss.yaml -n <namespace name>

You can see that it took roughly 6 minutes to deploy 15 FIO pods and corresponding PVCs. The time may vary depending on whether the FIO image is locally available on the nodes, available resources on the nodes, etc.
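
If you would rather compute the deployment time than read it off the terminal, the Python sketch below derives it from pod timestamps. It assumes input in the shape returned by kubectl get pods -n <namespace name> -o json and uses the creationTimestamp and Ready condition fields of the Kubernetes Pod API. The pod names and timestamps in SAMPLE are made up for illustration.

```python
from datetime import datetime
from typing import Optional


def _ts(value: str) -> datetime:
    """Parse a Kubernetes timestamp such as 2020-11-28T10:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def _ready_time(pod: dict) -> Optional[datetime]:
    """Time at which the pod's Ready condition last became True, if any."""
    for cond in pod.get("status", {}).get("conditions", []):
        if cond["type"] == "Ready" and cond["status"] == "True":
            return _ts(cond["lastTransitionTime"])
    return None


def seconds_to_ready(pod: dict) -> Optional[float]:
    """Seconds from pod creation until it became Ready (None if not Ready)."""
    ready = _ready_time(pod)
    if ready is None:
        return None
    return (ready - _ts(pod["metadata"]["creationTimestamp"])).total_seconds()


def rollout_seconds(pod_list: dict) -> float:
    """Seconds from the first pod creation to the last pod becoming Ready."""
    pods = pod_list["items"]
    first = min(_ts(p["metadata"]["creationTimestamp"]) for p in pods)
    last = max(t for t in map(_ready_time, pods) if t is not None)
    return (last - first).total_seconds()


def _pod(name: str, created: str, ready: str) -> dict:
    """Build a minimal pod object for the example below."""
    return {
        "metadata": {"name": name, "creationTimestamp": created},
        "status": {"conditions": [
            {"type": "Ready", "status": "True", "lastTransitionTime": ready},
        ]},
    }


# Hypothetical data in the shape of `kubectl get pods -o json`.
SAMPLE = {"items": [
    _pod("fiopod-0", "2020-11-28T10:00:00Z", "2020-11-28T10:00:25Z"),
    _pod("fiopod-1", "2020-11-28T10:00:25Z", "2020-11-28T10:00:48Z"),
]}

if __name__ == "__main__":
    for pod in SAMPLE["items"]:
        print(pod["metadata"]["name"], seconds_to_ready(pod), "s")
    print("total rollout:", rollout_seconds(SAMPLE), "s")
```

Note that lastTransitionTime reflects the most recent change of the Ready condition, so pods that have restarted will skew the numbers.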

As each pod is created, FIO automatically starts the IO stress in it. IOs are read from and written to the attached PVCs. As I mentioned earlier, I am using the storage class "powerflex-storage-policy", which is associated with a VVOL datastore backed by a PowerFlex storage pool. In this case, all the PVCs are created in the PowerFlex VVOL datastore.

You can also see multiple volumes in the PowerFlex UI. All the volumes whose names start with "vasa" are externally managed by the PowerFlex VASA provider. The performance of each volume can also be monitored using the PowerFlex UI.

If you would like to see the historical performance data, you can use vROps. Dell EMC has recently released a vROps management pack for PowerFlex systems. It is a monitoring and alerting solution that provides extensive visibility into the PowerFlex infrastructure. For monitoring K8s clusters and resources, you can use the vROps management pack for container monitoring.

Note: When the duration specified in the FIO test is over, the pods will restart and the IO stress will start again. To modify the FIO parameters, run kubectl edit statefulset fiopod-statefulset-multipod -n <namespace name>, change the required parameters, and save. The new settings are applied automatically after saving. Once you are done with the testing, you can delete the statefulset and the corresponding PVCs using the kubectl delete command. This method is useful when you want to test something quickly or if you have only a few test profiles. If you have many test profiles with varying block sizes, iodepth, etc., you will need to build a small script to automate the process.
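
As a starting point for such a script, the Python sketch below generates every combination of a few FIO parameters. The parameter values and the naming scheme are arbitrary examples. How each profile is then applied to the statefulset is left out, since that depends on your setup.

```python
from itertools import product

# Example values only; adjust to the workloads you care about.
BLOCK_SIZES = ["4k", "8k", "64k"]
IODEPTHS = [1, 16, 32]
READ_PCTS = [100, 70, 0]


def generate_profiles():
    """Yield one dict per combination of block size, iodepth, and read mix."""
    for bs, depth, read_pct in product(BLOCK_SIZES, IODEPTHS, READ_PCTS):
        yield {
            "name": f"bs{bs}_qd{depth}_r{read_pct}",
            "bs": bs,
            "iodepth": depth,
            "rwmixread": read_pct,
        }


if __name__ == "__main__":
    for p in generate_profiles():
        # Print the fio options that differ between profiles.
        print(f"{p['name']}: --rw=randrw --bs={p['bs']} "
              f"--iodepth={p['iodepth']} --rwmixread={p['rwmixread']}")
```

Three values per parameter already produce 27 profiles, so it is worth trimming the lists to the combinations that resemble your real workloads.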

Hope it was useful. Cheers!

Monitoring Tanzu Kubernetes cluster using Prometheus and Grafana
Visualize your Kubernetes clusters and workloads using Octant
Tanzu Kubernetes Grid (TKG) on vSphere 6.7 U3 - Part3 - Deploy FIO pod with persistent storage
vSAN performance benchmarking

References

https://volumes.blog/2020/07/09/dell-technologies-powerflex-integration-with-vmware-tanzu-kubernetes-grid-tkg/

https://thenewstack.io/k-bench-a-benchmark-to-measure-kubernetes-control-and-data-plane-performance/

https://rguske.github.io/post/vsphere-7-with-kubernetes-supercharged-helm-harbor-tkg/

Thursday, July 9, 2020

Tanzu Kubernetes Grid (TKG) on vSphere 6.7 U3 - Part3

In this blog, I will explain how to deploy an FIO application pod with persistent storage on your Tanzu Kubernetes workload cluster.

Step 1: Deploy a K8s workload cluster

tkg create cluster <cluster name> --plan=dev

Now the workload K8s cluster is deployed with a Master, LB, and Worker node.

vSAN performance benchmarking

In this article, I will briefly explain performance benchmarking considerations, factors affecting performance, and some best practices. We do performance benchmarking to understand the capabilities and bottlenecks of a system. The system could be a storage system, CPU, GPU, network switch, etc. Now let's consider a VMware vSAN cluster infrastructure. It includes multiple components, and each of these contributes to the performance. In this case, the vSAN cluster is the solution under test. We have to conduct performance benchmarking to understand the storage performance behavior of the cluster, that is, the IOPS, latency, and throughput that the cluster can deliver under varying loads.

The goal of benchmarking

Identify bottlenecks
    Hardware bottleneck
    Software bottleneck
    Application bottleneck
Compare tradeoffs
Manage expectations
Make decisions

In a real-world scenario, benchmarking is usually done once the cluster is deployed and ready, before it starts hosting production workloads. Because these benchmark values define the performance maximums, they help you decide when to scale or upgrade the cluster before it hits a bottleneck.

Fundamental factors of vSAN performance

Server hardware
    Compatibility as per vSAN HCL

Host
    Number of hosts in the cluster
    Power settings
    CPU - number of cores and frequency

Storage
    Hybrid or All-flash
    NVMe, SAS, or SATA
    Number of disk groups per host
    Storage controller configuration
    Compatibility of hardware devices as per vSAN HCL

Network
    10/25/40 GbE
    MTU
    LAG

SPBM policy
    FTT (Failures To Tolerate)
    FTM (Mirroring/Erasure coding)
    Thin or Thick provision

Security
    Encryption
    Checksum

Other
    Stripe width
    Flash read cache reservation
    IOPS limit for object

All of the above factors will affect performance. So you should know the benefits and tradeoffs.
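
As one example of such a tradeoff, the FTT and FTM settings determine how much raw capacity each usable GB consumes. The Python sketch below uses the commonly documented multipliers for the vSAN original storage architecture (mirroring keeps FTT + 1 full copies, RAID-5 uses 3 data + 1 parity, RAID-6 uses 4 data + 2 parity). It ignores witness components and other overheads, and the function name is my own.

```python
from fractions import Fraction

# Raw capacity consumed per unit of usable data, keyed by (FTM, FTT).
OVERHEAD = {
    ("RAID-1", 1): Fraction(2),     # two full copies
    ("RAID-1", 2): Fraction(3),     # three full copies
    ("RAID-5", 1): Fraction(4, 3),  # 3 data + 1 parity
    ("RAID-6", 2): Fraction(6, 4),  # 4 data + 2 parity
}


def raw_capacity_gb(usable_gb: int, ftm: str, ftt: int) -> float:
    """Raw capacity in GB needed to store usable_gb under a given policy."""
    return float(usable_gb * OVERHEAD[(ftm, ftt)])


if __name__ == "__main__":
    for (ftm, ftt) in OVERHEAD:
        print(f"{ftm} FTT={ftt}: {raw_capacity_gb(100, ftm, ftt):.1f} GB raw per 100 GB usable")
```

Erasure coding saves capacity compared to mirroring at the same FTT, but it generally costs more IO per write, which is why the policy should be part of what you benchmark.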

Benchmarking methodology

Image credit: VMware

Storage benchmarking tools

IO load generation tools
    FIO
    DISKSPD
    Iometer
    HCIBench

Application-specific tools
    HammerDB (MSSQL, Oracle)
    Jetstress (MS Exchange)
    SLOB (Oracle)
    DBGen (MSSQL, Oracle)

Best practices

Understand the production performance metrics.
Test what you plan to deploy.
Workload modeling.
Plan for use case testing.
Choose an appropriate size for benchmarking.
Choose the right tool.
Pre-allocate blocks while testing.
Test for a longer time duration.
Deploy multiple VMs with multiple VMDKs.

References

Best Practices for HCI Performance Benchmarking (HCI1891BU)

A Guide to vSAN Performance: A VM-Centric Approach (HCI2185BU)

Optimize vSAN performance using vRealize Operations and Reinforcement Learning (HCI1650BU)
