Showing posts with label datastore. Show all posts
Showing posts with label datastore. Show all posts

Friday, September 6, 2024

Revisiting Storage Performance Benchmarking

Few years ago, I had the opportunity to explore the intricacies of storage performance benchmarking using tools like FIO, DISKSPD, and Iometer. Those studies provided valuable insights into the performance characteristics of various storage solutions, shaping my understanding and approach to storage performance analysis. As I prepare for an upcoming project in this domain, I find it essential to revisit my previous work, reflect on the lessons learned, and share my experiences. This blog post aims to provide a comprehensive overview of my benchmarking journey and the evolving landscape of storage performance studies.


Recent advancements 

The field of storage technology has seen significant advancements in recent years. The rise of NVMe and storage-class memory technologies has also redefined high-end storage performance, offering unprecedented speed and efficiency. These advancements highlight the dynamic nature of storage performance benchmarking and underscore the importance of staying updated with the latest tools and methodologies.

Challenges

Benchmarking storage performance is not without its challenges. One of the primary difficulties is ensuring a consistent and controlled testing environment, as variations in hardware, software, and network conditions can significantly impact results. Another challenge is the selection of appropriate benchmarks that accurately reflect real-world workloads, which requires a deep understanding of the specific use cases and performance metrics. Additionally, interpreting the results can be complex, as it involves analyzing multiple metrics such as IOPS, throughput, and latency, and understanding their interplay. These challenges necessitate meticulous planning and a thorough understanding of both the benchmarking tools and the storage systems being tested.

Prior works

Following are some of the articles on storage benchmarking that I’ve published in the past:

Custom storage benchmarking framework

While there are numerous storage benchmarking tools available, such as VMFleet and HCIBench, I wanted to highlight a custom framework I developed a few years ago. Here are some reasons why we created this custom tool:

  • Great learning experience: It provided valuable insights into how things work.
  • Customization: Being a custom framework, it allows you to add or remove features as needed.
  • Flexibility: You can modify multiple parameters to suit your requirements.
  • Custom test profiles: You can create tailored storage test profiles.
  • No IP assignment needed: There’s no need for IP assignment or DHCP for the stress test VMs.
  • Centralized log collection: It offers centralized log collection for detailed analysis.


You can access the scripts and readme on my GitHub repository:

https://github.com/vineethac/vsan_cluster_storage_benchmarking_with_diskspd


Here is an overview.

  • Profile Manifest: All storage test profiles are listed in profile_manifest.psd1. You can define as many profiles as you want.
  • VM Template: A Windows VM template should be present in the vCenter server.
  • Benchmarking Manifest: Details of vCenter, cluster name, VM template, number of stress test VMs per host, etc., are provided in benchmarking_manifest.psd1.
  • Deploy Test VMs: deploy_test_vms.ps1 will deploy all the test VMs with pre-configured parameters.
  • Start Stress Test: start_stress_test.ps1 will initiate the storage stress test process for all the profiles mentioned in profile_manifest.psd1 one by one.
  • Log Collection: All log files will be automatically copied to a central location on the host from where these scripts are running.
  • Cleanup: Use delete_test_vms.ps1 to clean up the stress test VMs from the cluster.


Note:
 These scripts were created about five years ago, and I haven’t had the opportunity to refactor them according to current best practices and new PowerShell scripting standards. I plan to enhance them in the coming months!

This overview should provide you with a clear understanding of the overall process and workflow involved in the storage benchmarking process. I hope it was useful. Cheers!

Saturday, April 18, 2020

vSAN performance benchmarking

In this article, I will explain briefly on performance benchmarking considerations, factors affecting performance, and some of the best practices. We do performance benchmarking to understand the capabilities and bottlenecks of a system. When I say system it could be a storage system, CPU, GPU, network switch, etc. Now let's consider a VMware vSAN cluster infrastructure. It includes multiple components and each of these contributes to the performance. In this case, the vSAN cluster is the solution under test. We will have to conduct performance benchmarking to understand the storage performance behavior of the cluster. When I say storage behavior it includes the IOPS, latency, and throughput that the cluster can produce under varying loads.

The goal of benchmarking
  • Identify bottlenecks
    • Hardware bottleneck
    • Software bottleneck
    • Application bottleneck
  • Compare tradeoffs
  • Manage expectations
  • Make decisions

Usually in a real-world scenario, benchmarking will be done once the cluster is deployed/ ready and before starting to host production workload on top of it. As these benchmark values define the performance maximums it will be helpful to decide on when to scale or upgrade the cluster before it hits a bottleneck.

Fundamental factors of vSAN performance

Server hardware
  • Compatibility as per vSAN HCL
Host
  • Number of hosts in the cluster
  • Power settings
  • CPU - number of cores and frequency 
Storage
  • Hybrid or All-flash
  • NVMe, SAS, or SATA
  • Number of disk groups per host
  • Storage controller configuration
  • Compatibility of hardware devices as per vSAN HCL
Network
  • 10/ 25/ 40 GbE
  • MTU 
  • LAG
SPBM policy
  • FTT (Failures To Tolerate)
  • FTM (Mirroring/ Erasure coding)
  • Thin or Thick provision
Security
  • Encryption
  • Checksum
Other
  • Stripe width
  • Flash read cache reservation
  • IOPS limit for object
All of the above factors will affect performance. So you should know the benefits and tradeoffs. 

Benchmarking methodology

Image credit: VMware

Storage benchmarking tools

IO load generation tools
Application-specific tools
  • HammerDB (MSSQL, Oracle)
  • Jetstress (MS Exchange)
  • SLOB (Oracle)
  • DBGen (MSSQL, Oracle)

Best practices

  • Understand the production performance metrics.
  • Test what you plan to deploy.
  • Workload modeling.
  • Plan for use case testing.
  • Choose an appropriate size for benchmarking
  • Choose the right tool.
  • Pre-allocate blocks while testing.
  • Test for a longer time duration.
  • Deploy multiple VMs with multiple VMDKs.

References

Saturday, March 21, 2020

vRealize Operations Manager 8 - Custom views and reports

vROps has a number of inbuilt views and reports templates. In this post, I will explain how to create custom views and reports. 

To create a new view: Dashboards - Views - New view

Provide a name and description for the new view. Here, for example, I will create a view that shows datastore write performance stats.


Select List.


Select Datastore as subject and group it by cluster compute resource.


Drag and drop the selected metrics or properties to include in the view.
In the following screenshot, I selected a property "Is Local" and included in the view.


Added another property "Type" to the view.


Similarly, I added some metrics like Write IOPS, Latency and Throughout to the view.
I selected "Maximum" from transformation to get the peak value of the selected metrics.


In the below screenshot, you can see I selected "Maximum" and "Average" transformation for Write IOPS, Latency and Throughput metrics. You can also select the units as per your requirement. Here, I selected "us" for latency and "MBps" for throughput.

Click Save. 


Similarly, I created three custom views that show:
  • "Datastore read performance stats"
  • "Datastore write performance stats" 
  • "Datastore capacity usage"


To create a new report template: Dashboards - Reports - New template


Provide a name and description for the new report template.


From the views and dashboards, find the three custom datastore views that we created earlier, drag and drop them to the right pane as shown below.


Select PDF and CSV.


Select all the layout options if you like to and click Save.


Now the custom report template is created. You can select it and click Run.


Select vSpehre Hosts and Clusters and then select vSphere World and click ok.


The report will run in the background and will be available to download under the "Generated Reports" tab. You can select it and download the PDF or CSV file.


You can even configure a schedule to generate a report and email it or save it to a location automatically based on your requirements. Hope it was useful. Cheers!

Related posts


Monday, October 7, 2019

VMware PowerCLI 101 - Part5 - Real time storage IOPS and latency

It is very important to monitor and analyze the performance of storage subsystem components as it direcly affects the application performance. In this article, I will briefly explain how to use PowerCLI to get real time storage IOPS and latency of the following: 

              • Virtual disk
              • Datastore
              • Disk/ LUN 
              • Storage adapter
              • Storage path
Connect to vCenter server using:
Connect-VIServer <IP address of vCenter>

To understand the list of all available stats for a specific entity, you can use Get-StatType. For example, to list all real time stats for a virtual machine you can use:
Get-StatType -Entity <VM name> -Realtime | sort

Virtual disk

To get real-time IOPS and latency of all virtual disks of a VM named 'lustre01':
Get-Stat -Entity lustre01 -Realtime -MaxSamples 1 -Stat virtualDisk.numberReadAveraged.average,virtualDisk.numberWriteAveraged.average,virtualDisk.totalReadLatency.average,virtualDisk.totalWriteLatency.average | sort Instance,MetricId | select MetricId, Value, Unit, Instance




Datastore

To get real-time IOPS and latency of a datastore (with Uuid: 5bea72bb-5d72ed6a-1d85-246e96792988) from an ESXi host (IP: 192.168.105.10):
Get-Stat -Entity 192.168.105.10 -Stat datastore.numberReadAveraged.average,datastore.numberWriteAveraged.average,datastore.totalReadLatency.average,datastore.totalWriteLatency.average -Realtime -MaxSamples 1 -Instance 5bea72bb-5d72ed6a-1d85-246e96792988 | Select MetricId, Value, Unit, Instance | Sort-Object MetricId

Note: You can get Uuid of a datastore using (Get-Datastore vol01).ExtensionData.Info.Vmfs.Uuid


Refer my article "Real time VMware datastore performance monitoring using PowerShell" for monitoring the real time performance statistics of multiple shared VMFS datastores which are part of a multi-node VMware ESXi cluster.

Disk/ LUN

To get real-time IOPS and latency of a disk (eui.387de1af35b93f6ff0a9bef000000000): 
Get-Stat -Entity 192.168.105.10 -Disk -Realtime -Instance eui.387de1af35b93f6ff0a9bef000000000 -MaxSamples 1 -Stat disk.numberWriteAveraged.average,disk.numberReadAveraged.average,disk.totalWriteLatency.average,disk.totalReadLatency.average | Select MetricId, Value, Unit, Instance


Storage adapter

To get real-time IOPS and latency of a storage adapter: 
Get-Stat -Entity 192.168.105.10 -Realtime -MaxSamples 1 -Stat storageAdapter.totalReadLatency.average, storageAdapter.totalWriteLatency.average, storageAdapter.numberReadAveraged.average, storageAdapter.numberWriteAveraged.average -Instance vmhba64 | Select-Object MetricId, Value, Unit, Instance | Sort-Object MetricId


Storage Path

To get real-time IOPS and latency of a storage path:
Get-Stat -Entity 192.168.105.10 -Realtime -MaxSamples 1 -Stat storagePath.totalReadLatency.average, storagePath.totalWriteLatency.average, storagePath.numberReadAveraged.average, storagePath.numberWriteAveraged.average -Instance fc.300fb123ba76519c:b436362bae5b217-fc.300fb123ba76519c:b436362bae5b217-eui.387de1af35b93f6ff0a9beec00000001 | Select MetricId,Value,Unit,Instance | Sort-Object MetricId


Wednesday, September 18, 2019

vRealize Operations Manager 7.5 - Part7 - vSAN monitoring and troubleshooting

In this article, I will walk you through how to use vROps for vSAN monitoring and performance troubleshooting. It is always recommended to follow a systematic and established approach to troubleshoot problems. Before we start here is a link to one of my article which explains the scientific method of troubleshooting

Given below are some very useful content from VMware that talks about vSAN performance troubleshooting.

Performance Troubleshooting – Understanding the Different Levels of vSAN Performance Metrics
Performance Troubleshooting – Which vSAN Performance Metrics Should be Looked at First?
Troubleshooting vSAN performance

Performance is all relative and sometimes performance issues can be because of the wrong perception. So it is always good to validate it with actual numbers. Compare with a benchmark value or verify all relevant metrics before and after the issue has been reported. Now assume there is a storage issue in the environment. Given below is a systematic order to approach the problem, identify it correctly, isolate it and finally take necessary steps to resolve it. 

vSAN performance troubleshooting approach
  1. Infrastructure: Perform vSAN cluster health check
  2. Virtual machine level: Is there a storage issue observed at the application level?
  3. Virtual machine level: Is there a storage issue per vmdk level?
    1. Latency (vmdk)
    2. IOPS (vmdk)
  4. Cluster level: Look at operations overview at the cluster level
    1. Latency
    2. IOPS
  5. Host level: Identify the IO type that has a performance issue
    1. Read IO
    2. Write IO
  6. Host level: Collect/ analyze metrics of the storage objects
    1. Storage adapter (vmhba)
    2. Disk groups
    3. Cache disk
    4. Capacity disk 
  7. Host level: Collect/ analyze metrics of the network objects
    1. Physical adapter (vmnic)
    2. vSAN network (vmk)
At this point, you have a clearly defined workflow in identifying and resolving the issue. So let's have a look at the various vROps dashboards that provides you end to end visibility of your stack and helps you easily identify and isolate the issue. If there is a problem or abnormality or unusual performance behavior in your vSAN environment, vROps will notify that with alerts based on various metric values it monitors using its inbuilt intelligence and analytics capabilities. Alert generation is based on symptom and alert definitions and this will finally affect the health, risk or efficiency badge of the respective object. Status of the badges, symptoms, alerts, recommendations, historical performance data and time stamps will be very useful in the process of troubleshooting and quickly finding the actual problem.

Infrastructure: Perform vSAN cluster health check

As a starting point, you can make use of integrated health checks from vCenter to verify your vSAN infrastructure.


To understand in-depth about vSAN health checks refer: https://vxplanet.com/2019/01/30/vsan-health-checks-explained-part-1/

Now to get a high-level overview, let's have a look into the health, risk and efficiency badges of vSAN cluster in vROps. Please refer to this blog article from VMware to get a detailed understanding of badges.

Health badge


Risk badge


Alerts


Virtual machine level: Is there a storage issue observed at the application level?

You can make use of application aware operations feature in vROps 7.5 to get full stack visibility. Given below are the list of applications that can be currently monitored using vROps 7.5.


Reference to application aware monitoring: https://blogs.vmware.com/management/2019/05/application-aware-operations-with-vrealize-operations-7-5.html


If your application is not supported or if application aware monitoring is not configured, then you can go with native application performance counters/ methods to identify whether the application itself is observing/ affected by storage latency, low IOPS, etc.

Virtual machine level: Is there a storage issue per vmdk level?

As a first step, you can use the "Troubleshoot a VM" dashboard to understand and track resource usage of a virtual machine.

Troubleshoot a VM - a

Troubleshoot a VM - b

Select the VM object to get more details. Below screenshot shows metrics related to a virtual disk.


Cluster level: Look at operations overview at the cluster level

vSAN operations overview dashboard


Troubleshooting vSAN dashboard

Troubleshooting vSAN - a

Troubleshooting vSAN - b

Troubleshooting vSAN - c

Host level: Identify the IO type that has a performance issue

Host level storage metrics


Host level: Collect/ analyze metrics of the storage objects

Metrics related to a disk group


Read cache and write buffer metrics of a disk group


Performance metrics of a capacity disk


Host level: Collect/ analyze metrics of the network objects

Metrics related to vmnic (physical NIC) and vSAN vmk


Metrics related to network objects will help to determine whether the performance issue is due to resource contention, network misconfiguration, hardware issue, etc.  


References:





Friday, May 24, 2019

VMware PowerCLI 101 - Part2 - Working with vCenter server

In my previous blog, we discussed how to install PowerCLI module and connect to a stand-alone ESXi host. Now let's start with connecting to the vCenter server.

Connect-VIServer <IP of vCenter server>

Get the list of data centers present under this vCenter server: Get-Datacenter


Get the list of clusters under a datacenter: Get-Datacenter DC01 | Get-Cluster

To verify the status of HA, DRS, and vSAN of a cluster:
(Get-Cluster Cluster01) | select Name,DrsEnabled,DrsAutomationLevel,HAEnabled,HAFailoverLevel,VsanEnabled


Get the list of ESXi hosts part of a specific cluster: 
Get-Datacenter DC01 | Get-Cluster Cluster01 | Get-VMHost


To get the list of VMs hosted on a specific ESXi host:
Get-Datacenter DC01 | Get-Cluster Cluster01 | Get-VMHost 192.168.105.11 | Get-VM


To get a list of powered on and powered off VMs:
(Get-Cluster Cluster01 | Get-VM).where{$PSItem.PowerState -eq "PoweredOn"}
(Get-Cluster Cluster01 | Get-VM).where{$PSItem.PowerState -eq "PoweredOff"}


An efficient way of doing it in a single go is given below:
$vmson, $vmsoff = (Get-Cluster Cluster01 | Get-VM).where({$PSItem.PowerState -eq "PoweredOn"}, 'split')

$vmson will have the list of VMs that are PoweredOn and $vmsoff will have the list of VMs that are PoweredOff.


Cluster level inventory:
Get-Cluster | Get-Inventory

Datastore details of a cluster:
Get-Cluster | Get-Datastore | select Name, FreeSpaceGB, CapacityGB, FileSystemVersion, State


Hope this was useful. Cheers! In the next post, we will talk about performing basic VM operations using PowerCLI.

Friday, April 12, 2019

VMware VVols: Integrating Dell EMC Unity with vSphere environment

This article provides the step by step procedure to configure a Virtual Volume (VVol) based vSphere environment using Dell EMC Unity SAN storage. You can go through my previous post to get an understanding of the differences between VMFS and VVol.

Step1: Register a new storage provider in vCenter

Note: Registering a storage provider exposes all the array capabilities to vCenter through VASA API.

Select: vCenter server -> Configure -> Storage providers -> Register a new storage provider (+)

Register new storage provides in vCenter

After successful registration of storage provider


Step2: Create a VVol datastore on the storage array (Unity)

Create VVol datastore (storage container)

Provide a name

Select capability profiles for the VVol datastore

Note: Each pool has its own characteristics and is associated with a specific capability profile. Adding capability profiles to a VVOL datastore basically adds the corresponding pools to that VVOL construct. In the above figure we have added 3 capability profiles, which means pool_01/02/03 are now part of VVOL_Datastore_01.

Configure access to ESXi hosts

Summary

VVol datastore (storage container) creation completed

The storage container (VVOL datastore) has been created on the array and the next step is to add it to ESXi hosts.

Step3: Add a new datastore in the vSphere environment through vCenter

Add new VVol datastore

Select the VVol datastore and provide a name

VVol datastore creation complete

VVol datastore

Step4: Create VM storage policies

Select VM Storage Policies

Create VM storage policy

Select vCenter and provide a name



Storage type and rule

Select service level

Select the compatible storage


Now the storage policy (Platinum) has been created. Similarly, I have created Silver and Bronze policies which are shown below.

Sample storage policies

Step5: Migrate virtual machines to VVol datastore

Migrate VM01 to VVol datastore

While migrating the VM, you can choose the disk format and the VM storage policy and it will display the compatible VVOL datastores.


Select compatible storage


If you would like to apply storage policies at disk level you can edit the VM storage policies setting of the VM and apply policy as per the requirement as shown below.


Select VM storage policy per VMDK
Hope this was helpful. Cheers!

References: