
Saturday, April 18, 2020

vSAN performance benchmarking

In this article, I will briefly explain performance benchmarking considerations, the factors that affect performance, and some of the best practices. We do performance benchmarking to understand the capabilities and bottlenecks of a system. The system could be a storage system, a CPU, a GPU, a network switch, etc. Now let's consider a VMware vSAN cluster infrastructure. It includes multiple components, and each of them contributes to the overall performance. In this case, the vSAN cluster is the solution under test, and we conduct performance benchmarking to understand the storage performance behavior of the cluster: the IOPS, latency, and throughput it can deliver under varying loads.

The goal of benchmarking
  • Identify bottlenecks
    • Hardware bottleneck
    • Software bottleneck
    • Application bottleneck
  • Compare tradeoffs
  • Manage expectations
  • Make decisions

Usually in a real-world scenario, benchmarking is done once the cluster is deployed and ready, and before production workloads are hosted on top of it. As these benchmark values define the performance maximums, they help you decide when to scale or upgrade the cluster before it hits a bottleneck.

Fundamental factors of vSAN performance

Server hardware
  • Compatibility as per vSAN HCL
Host
  • Number of hosts in the cluster
  • Power settings
  • CPU - number of cores and frequency 
Storage
  • Hybrid or All-flash
  • NVMe, SAS, or SATA
  • Number of disk groups per host
  • Storage controller configuration
  • Compatibility of hardware devices as per vSAN HCL
Network
  • 10/ 25/ 40 GbE
  • MTU 
  • LAG
SPBM policy
  • FTT (Failures To Tolerate)
  • FTM (Mirroring/ Erasure coding)
  • Thin or Thick provision
Security
  • Encryption
  • Checksum
Other
  • Stripe width
  • Flash read cache reservation
  • IOPS limit for object
All of the above factors affect performance, so you should understand the benefits and tradeoffs of each.
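
You can review most of these settings up front with PowerCLI. The snippet below is only a minimal sketch (it assumes you are already connected to vCenter using Connect-VIServer, and "MyVM" is just a placeholder VM name):
#List each storage policy with its rule sets (FTT, FTM, stripe width, IOPS limit, etc.)
Get-SpbmStoragePolicy | Select-Object Name, Description, AnyOfRuleSets | Format-List

#Check which policy is applied to a given VM and its objects
Get-VM -Name MyVM | Get-SpbmEntityConfiguration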

Benchmarking methodology

Image credit: VMware

Storage benchmarking tools

IO load generation tools
Application-specific tools
  • HammerDB (MSSQL, Oracle)
  • Jetstress (MS Exchange)
  • SLOB (Oracle)
  • DBGen (MSSQL, Oracle)

Best practices

  • Understand the production performance metrics.
  • Test what you plan to deploy.
  • Workload modeling.
  • Plan for use case testing.
  • Choose an appropriate size for benchmarking.
  • Choose the right tool.
  • Pre-allocate blocks while testing (see the diskspd example after this list).
  • Test for a longer time duration.
  • Deploy multiple VMs with multiple VMDKs.
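
For example, a single diskspd run that follows several of these practices (a pre-created test file, a longer duration, and a mixed random read/ write pattern) could look like the line below. The file path, size, block size, write percentage, threads, and outstanding IOs are placeholders; adjust them to model your own workload:
#30-minute run, 4K blocks, 30% writes, random IO, 4 threads, 32 outstanding IOs, pre-created 20GB test file, caching disabled, latency stats enabled
C:\diskspd.exe -c20G -b4K -d1800 -t4 -o32 -r -w30 -h -L D:\io_stress.dat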

Monday, October 7, 2019

VMware PowerCLI 101 - Part5 - Real time storage IOPS and latency

It is very important to monitor and analyze the performance of the storage subsystem components as it directly affects application performance. In this article, I will briefly explain how to use PowerCLI to get real-time storage IOPS and latency of the following:

  • Virtual disk
  • Datastore
  • Disk/ LUN
  • Storage adapter
  • Storage path
Connect to vCenter server using:
Connect-VIServer <IP address of vCenter>

To understand the list of all available stats for a specific entity, you can use Get-StatType. For example, to list all real time stats for a virtual machine you can use:
Get-StatType -Entity <VM name> -Realtime | sort

Virtual disk

To get real-time IOPS and latency of all virtual disks of a VM named 'lustre01':
Get-Stat -Entity lustre01 -Realtime -MaxSamples 1 -Stat virtualDisk.numberReadAveraged.average,virtualDisk.numberWriteAveraged.average,virtualDisk.totalReadLatency.average,virtualDisk.totalWriteLatency.average | sort Instance,MetricId | select MetricId, Value, Unit, Instance




Datastore

To get real-time IOPS and latency of a datastore (with Uuid: 5bea72bb-5d72ed6a-1d85-246e96792988) from an ESXi host (IP: 192.168.105.10):
Get-Stat -Entity 192.168.105.10 -Stat datastore.numberReadAveraged.average,datastore.numberWriteAveraged.average,datastore.totalReadLatency.average,datastore.totalWriteLatency.average -Realtime -MaxSamples 1 -Instance 5bea72bb-5d72ed6a-1d85-246e96792988 | Select MetricId, Value, Unit, Instance | Sort-Object MetricId

Note: You can get Uuid of a datastore using (Get-Datastore vol01).ExtensionData.Info.Vmfs.Uuid


Refer to my article "Real time VMware datastore performance monitoring using PowerShell" for monitoring the real-time performance statistics of multiple shared VMFS datastores that are part of a multi-node VMware ESXi cluster.

Disk/ LUN

To get real-time IOPS and latency of a disk (eui.387de1af35b93f6ff0a9bef000000000): 
Get-Stat -Entity 192.168.105.10 -Disk -Realtime -Instance eui.387de1af35b93f6ff0a9bef000000000 -MaxSamples 1 -Stat disk.numberWriteAveraged.average,disk.numberReadAveraged.average,disk.totalWriteLatency.average,disk.totalReadLatency.average | Select MetricId, Value, Unit, Instance


Storage adapter

To get real-time IOPS and latency of a storage adapter: 
Get-Stat -Entity 192.168.105.10 -Realtime -MaxSamples 1 -Stat storageAdapter.totalReadLatency.average, storageAdapter.totalWriteLatency.average, storageAdapter.numberReadAveraged.average, storageAdapter.numberWriteAveraged.average -Instance vmhba64 | Select-Object MetricId, Value, Unit, Instance | Sort-Object MetricId


Storage Path

To get real-time IOPS and latency of a storage path:
Get-Stat -Entity 192.168.105.10 -Realtime -MaxSamples 1 -Stat storagePath.totalReadLatency.average, storagePath.totalWriteLatency.average, storagePath.numberReadAveraged.average, storagePath.numberWriteAveraged.average -Instance fc.300fb123ba76519c:b436362bae5b217-fc.300fb123ba76519c:b436362bae5b217-eui.387de1af35b93f6ff0a9beec00000001 | Select MetricId,Value,Unit,Instance | Sort-Object MetricId


Wednesday, September 18, 2019

vRealize Operations Manager 7.5 - Part7 - vSAN monitoring and troubleshooting

In this article, I will walk you through how to use vROps for vSAN monitoring and performance troubleshooting. It is always recommended to follow a systematic and established approach to troubleshooting. Before we start, here is a link to one of my articles which explains the scientific method of troubleshooting.

Given below are some very useful resources from VMware that talk about vSAN performance troubleshooting.

Performance Troubleshooting – Understanding the Different Levels of vSAN Performance Metrics
Performance Troubleshooting – Which vSAN Performance Metrics Should be Looked at First?
Troubleshooting vSAN performance

Performance is relative, and sometimes a performance issue is really a matter of perception. So it is always good to validate it with actual numbers: compare against a benchmark value, or verify all the relevant metrics before and after the issue was reported. Now assume there is a storage issue in the environment. Given below is a systematic order to approach the problem, identify it correctly, isolate it, and finally take the necessary steps to resolve it.

vSAN performance troubleshooting approach
  1. Infrastructure: Perform vSAN cluster health check
  2. Virtual machine level: Is there a storage issue observed at the application level?
  3. Virtual machine level: Is there a storage issue per vmdk level?
    1. Latency (vmdk)
    2. IOPS (vmdk)
  4. Cluster level: Look at operations overview at the cluster level
    1. Latency
    2. IOPS
  5. Host level: Identify the IO type that has a performance issue
    1. Read IO
    2. Write IO
  6. Host level: Collect/ analyze metrics of the storage objects
    1. Storage adapter (vmhba)
    2. Disk groups
    3. Cache disk
    4. Capacity disk 
  7. Host level: Collect/ analyze metrics of the network objects
    1. Physical adapter (vmnic)
    2. vSAN network (vmk)
At this point, you have a clearly defined workflow for identifying and resolving the issue. So let's have a look at the various vROps dashboards that provide end-to-end visibility of your stack and help you easily identify and isolate the issue. If there is a problem, abnormality, or unusual performance behavior in your vSAN environment, vROps will notify you with alerts based on the various metric values it monitors, using its built-in intelligence and analytics capabilities. Alert generation is based on symptom and alert definitions, and these ultimately affect the health, risk, or efficiency badge of the respective object. The status of the badges, symptoms, alerts, recommendations, historical performance data, and timestamps will be very useful in the process of troubleshooting and quickly finding the actual problem.

Infrastructure: Perform vSAN cluster health check

As a starting point, you can make use of integrated health checks from vCenter to verify your vSAN infrastructure.


To understand in-depth about vSAN health checks refer: https://vxplanet.com/2019/01/30/vsan-health-checks-explained-part-1/

Now, to get a high-level overview, let's have a look at the health, risk, and efficiency badges of the vSAN cluster in vROps. Please refer to this blog article from VMware to get a detailed understanding of badges.

Health badge


Risk badge


Alerts


Virtual machine level: Is there a storage issue observed at the application level?

You can make use of the application-aware operations feature in vROps 7.5 to get full-stack visibility. Given below is the list of applications that can currently be monitored using vROps 7.5.


Reference to application aware monitoring: https://blogs.vmware.com/management/2019/05/application-aware-operations-with-vrealize-operations-7-5.html


If your application is not supported, or if application-aware monitoring is not configured, then you can use the application's native performance counters/ methods to identify whether the application itself is experiencing storage latency, low IOPS, etc.

Virtual machine level: Is there a storage issue per vmdk level?

As a first step, you can use the "Troubleshoot a VM" dashboard to understand and track resource usage of a virtual machine.

Troubleshoot a VM - a

Troubleshoot a VM - b

Select the VM object to get more details. Below screenshot shows metrics related to a virtual disk.


Cluster level: Look at operations overview at the cluster level

vSAN operations overview dashboard


Troubleshooting vSAN dashboard

Troubleshooting vSAN - a

Troubleshooting vSAN - b

Troubleshooting vSAN - c

Host level: Identify the IO type that has a performance issue

Host level storage metrics


Host level: Collect/ analyze metrics of the storage objects

Metrics related to a disk group


Read cache and write buffer metrics of a disk group


Performance metrics of a capacity disk


Host level: Collect/ analyze metrics of the network objects

Metrics related to vmnic (physical NIC) and vSAN vmk


Metrics related to network objects will help determine whether the performance issue is due to resource contention, network misconfiguration, a hardware issue, etc.



Saturday, October 20, 2018

Real time VMware datastore performance monitoring using PowerShell

I had a scenario where we had to monitor the real-time performance statistics of multiple shared VMFS datastores that are part of a multi-node VMware ESXi cluster. So this post is about a short script that uses VMware PowerCLI to get read/ write IOPS and latency details of these shared datastores across all nodes in the cluster. The output format is shown below; it refreshes automatically every few seconds.

Prerequisites:
  • VMware.PowerCLI module should be installed on the node from which you are running the script
  • You can verify using: Get-Module -Name VMware.PowerCLI -ListAvailable
  • If not installed, you can find the latest version from the PSGallery: Find-Module -Name VMware.PowerCLI
  • Install the module: Install-Module -Name VMware.PowerCLI
Note:
  • I am using PowerCLI Version 11.0.0.10380590


Latest version of the project and code available at: github.com/vineethac/datastore_perfmon
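
The project code does a bit more, but the core idea is a simple polling loop like the sketch below (this is not the actual project code, just a minimal sketch; it assumes you are already connected to vCenter, and the cluster name and refresh interval are placeholders):
$Hosts = Get-Cluster -Name "MyCluster" | Get-VMHost
$Stats = "datastore.numberReadAveraged.average",
         "datastore.numberWriteAveraged.average",
         "datastore.totalReadLatency.average",
         "datastore.totalWriteLatency.average"

while ($true) {
    Clear-Host
    foreach ($VMHost in $Hosts) {
        #Latest real-time sample of read/ write IOPS and latency per datastore instance on this host
        Get-Stat -Entity $VMHost -Realtime -MaxSamples 1 -Stat $Stats |
            Select-Object Entity, MetricId, Value, Unit, Instance |
            Sort-Object Instance, MetricId |
            Format-Table -AutoSize
    }
    Start-Sleep -Seconds 20
}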

Sample screenshot of output:


Hope it was useful. Cheers!


Thursday, May 11, 2017

Benchmarking Hyper-Converged Storage Spaces Direct (S2D) Cluster

Finally, I managed to write some PowerShell code, as I am completely inspired by my new PS geek friends. The scripts can be used to generate load and stress test your S2D as well as a traditional Hyper-V 2016 cluster. They are functionally similar to VM Fleet. There are 7 scripts in total.
  1. create_clustered_testvms.ps1 : this script creates virtual machines on the cluster nodes which will be used for stress testing
  2. start_all_testvms.ps1 : start all those clustered VMs that you just created
  3. io_stress_trigger.ps1 : to trigger IO stress on all VMs using diskspd 
  4. rebalance_all_testvms.ps1 : this script is originally from winblog, I just made a small change so that it will use live migration while moving the clustered VMs back to their owner node
  5. watch_iops_live.ps1 : to view read, write and total iops of each CSV disk on the S2D cluster 
  6. stop_all_testvms.ps1 : to shutdown all the clustered VMs
  7. wipeoff_testvms.ps1 : to delete all VMs that you created using the first script
The PS version I am using is given below.


Now I will briefly explain how to use these scripts, along with a few prerequisites. Say you have a 4-node hyper-converged S2D cluster.


As there are 4 nodes, you should have 4 cluster shared volumes (CSV). Assign one CSV to each cluster node as shown below. This means that when you create VMs on NODE-01, they will be placed on CSV Volume AA, VMs on NODE-02 will be placed on Volume BB, and so on.


Volume AA is C:\ClusterStorage\Volume1


Similarly,
Volume BB is C:\ClusterStorage\Volume2
Volume CC is C:\ClusterStorage\Volume3
Volume DD is C:\ClusterStorage\Volume4
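
You can confirm this mapping (and the current owner of each CSV) from any cluster node with:
#Show each CSV, its owner node and its mount point under C:\ClusterStorage
Get-ClusterSharedVolume | Select-Object Name, OwnerNode, @{Name="Path"; Expression={$_.SharedVolumeInfo.FriendlyVolumeName}}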


Your cluster shared volumes are ready now. Create 2 folders inside Volume1 as shown below.


Copy all the 7 PS scripts to scripts folder. Code of each script is given at the end.


You now need a template VHDX, and it needs to be copied to the template folder.
Note: It should be named "template"


This template is nothing but a Windows Server 2016 VM created on a dynamically expanding disk. So you just have to create a VM with a dynamically expanding VHDX, install Windows Server 2016, and set the local administrator password to "Pass1234". Download diskspd from Microsoft, unzip it, and copy diskspd.exe to the C drive of the VM you just created.
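
If you prefer to prepare that part from PowerShell as well, a rough sketch is given below (the paths, sizes, and VM name are placeholders; installing Windows Server 2016 from your ISO and copying diskspd.exe remain manual steps):
#Dynamically expanding disk for the template
New-VHD -Path "C:\temp\template.vhdx" -Dynamic -SizeBytes 60GB

#Generation 2 VM using that disk; attach your Windows Server 2016 ISO and install the OS manually
New-VM -Name "template-vm" -MemoryStartupBytes 4GB -Generation 2 -VHDPath "C:\temp\template.vhdx"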


Shut down the VM. No need to sysprep it. Copy the VHDX disk of the VM to the template folder and rename it to "template". The disk will be around 9.5 GB in size. Once the template is copied, you are all set to start.

Step 1: 
Run create_clustered_testvms.ps1 
This will create clustered testvms on each of the nodes. It will be done in such a way that VMs on NODE-01 will be stored on Volume1, VMs on NODE-02 will be stored on Volume2 and so on 

Step 2:
Run start_all_testvms.ps1 
This will start all the testvms

Step 3:
Wait for a few seconds to ensure all the testvms are booted properly; then run io_stress_trigger.ps1 and provide the necessary input parameters.



Step 4:
You can watch IOPS of the cluster using watch_iops_live.ps1


If you would like to live migrate some testvms while the stress test is running, you can try it and observe the IO variations. But before running io_stress_trigger.ps1 again, you have to move/ migrate all those testvms back to their preferred owners. This can be done using rebalance_all_testvms.ps1.

If any testvms are not running on their preferred owner, then io_stress_trigger.ps1 will fail for those VMs. Here, testvms running on NODE-01 have NODE-01 as their preferred owner, and similarly for all other testvms. So you have to make sure all the testvms are running on their preferred owner before starting the IO stress script.
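
A quick way to spot any testvms that are not on their preferred owner before triggering the stress run (this is essentially the same check rebalance_all_testvms.ps1 performs):
#List VM groups whose current owner differs from their first preferred owner
Get-ClusterGroup | Where-Object {$_.IsCoreGroup -eq $false} | ForEach-Object {
    $Preferred = ($_ | Get-ClusterOwnerNode).OwnerNodes | Select-Object -First 1
    if ($Preferred -and ($_.OwnerNode.Name -ne $Preferred.Name)) {
        "{0} is on {1} but its preferred owner is {2}" -f $_.Name, $_.OwnerNode.Name, $Preferred.Name
    }
}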

Use stop_all_testvms.ps1 to shut down all the clustered testvms that you created in step 1. To delete all the testvms, you can use wipeoff_testvms.ps1.

NOTE: While running scripts 2, 3, 6 and 7, please make sure all the testvms are running on their preferred owner. Use rebalance_all_testvms.ps1 to assign all testvms back to their preferred owner! Also, please run all these scripts in PowerShell with elevated privileges after logging in directly to any of the cluster nodes.

All the code is given below. It might not be optimal, but I am pretty sure it works! Cheers!
--------------------------------------------------------------------------------------------------------------------------

#BEGIN_create_clustered_testvms.ps1
#Get cluster info
$Cluster_name = (Get-Cluster).name
$Nodes_name = (Get-ClusterNode).name
$Node_count = (Get-ClusterNode).count

#Input VM config
#Cast to int so the per-node VM count can be used in arithmetic
[int]$VM_count = Read-Host "Enter number of VMs/ node"
$Cluster_VM_count = $VM_count*$Node_count
[int64]$RAM = Read-Host "Enter memory for each VM in MB Eg: 4096"
$RAM = 1MB*$RAM
$CPU = Read-Host "Enter CPU for each VM"

#Creds to Enter-PSSession
$pass = convertto-securestring -asplaintext -force -string Pass1234
$cred = new-object -typename system.management.automation.pscredential -argumentlist "administrator", $pass

#Loop for each node in cluster
for($i=1; $i -le $Node_count; $i++){
    $VM_path = "C:\ClusterStorage\Volume$i"
    $Node = $Nodes_name[$i-1]

    #Remote session to each node
    $S1 = New-PSSession -ComputerName $Node -Credential $cred

    #Loop for creating new testvms on each node
    for($j=1; $j -le $VM_count; $j++){
        $VM_name = "testvm-$Node-$j"
        new-vm -name $VM_name -computername $Node -memorystartupbytes $RAM -generation 2 -Path $VM_path
        set-vm -name $VM_name -ProcessorCount $CPU -ComputerName $Node
        New-Item -path $VM_path\$VM_name -name "Virtual Hard Disks" -type directory

        #Copy template disk
        Copy-Item "C:\ClusterStorage\Volume1\template\template.vhdx" -Destination "$VM_path\$VM_name\Virtual Hard Disks" -Verbose
        Add-VMHardDiskDrive -VMName $VM_name -ComputerName $Node -path "$VM_path\$VM_name\Virtual Hard Disks\template.vhdx" -Verbose

        #Create new fixed test disk
        New-VHD -Path "$VM_path\$VM_name\Virtual Hard Disks\test_disk.vhdx" -Fixed -SizeBytes 40GB
        Add-VMHardDiskDrive -VMName $VM_name -ComputerName $Node -path "$VM_path\$VM_name\Virtual Hard Disks\test_disk.vhdx" -Verbose

        Get-VM -ComputerName $Node -VMName $VM_name | Start-VM
        Add-ClusterVirtualMachineRole -VirtualMachine $VM_name
        Set-ClusterOwnerNode -Group $VM_name -owner $Node
        Start-Sleep -S 10

        #Remote session to each testvm on the node to initialize and format test disk (drive D:)
        Invoke-Command -Session $S1 -ScriptBlock {param($VM_name2,$cred2) Invoke-Command -VMName $VM_name2 -Credential $cred2 -ScriptBlock {
            Initialize-Disk -Number 1 -PartitionStyle MBR
            New-Partition -DiskNumber 1 -UseMaximumSize -DriveLetter D
            Get-Volume | where DriveLetter -eq D | Format-Volume -FileSystem NTFS -NewFileSystemLabel Test_disk -confirm:$false
            }} -ArgumentList $VM_name,$cred

        Start-Sleep -S 5
        Get-VM -ComputerName $Node -VMName $VM_name | Stop-VM -Force
        }
    }
#END_create_clustered_testvms.ps1


--------------------------------------------------------------------------------------------------------------------------

#BEGIN_start_all_testvms.ps1
#Get cluster info
$Cluster_name = (Get-Cluster).name
$Nodes_name = (Get-ClusterNode).name
$Node_count = (Get-ClusterNode).count

for($i=1; $i -le $Node_count; $i++){
    $Node = $Nodes_name[$i-1]
    Get-VM -ComputerName $Node -VMName "testvm-$Node*" | Start-VM -AsJob

    }
#END_start_all_testvms.ps1


--------------------------------------------------------------------------------------------------------------------------

#BEGIN_io_stress_trigger.ps1
#Get cluster info
$Cluster_name = (Get-Cluster).name
$Nodes_name = (Get-ClusterNode).name
$Node_count = (Get-ClusterNode).count

#Creds to Enter-PSSession
$pass = convertto-securestring -asplaintext -force -string Pass1234
$cred = new-object -typename system.management.automation.pscredential -argumentlist "administrator", $pass

$time = Read-Host "Enter duration of stress in seconds (Eg: 300)"
$block_size = Read-Host "Enter block size (Eg: 4K)"
$writes = Read-Host "Enter write percentage (Eg: 20)"
$OIO = Read-Host "Enter number of outstanding IOs (Eg: 16)"
$threads = Read-Host "Enter number of threads (Eg: 2)"

#Loop for each node in cluster
for($i=1; $i -le $Node_count; $i++){
    $VM_path = "C:\ClusterStorage\Volume$i"
    $Node = $Nodes_name[$i-1]

    #Remote session to each node
    $S1 = New-PSSession -ComputerName $Node -Credential $cred

    $VM_count = (Get-VM -ComputerName $Node -VMName "testvm-$Node*").Count

    #Loop to trigger IO stress on each testvm on the node
    for($j=1; $j -le $VM_count; $j++){
        $VM_name = "testvm-$Node-$j"

        #Remote session to each testvm
        Invoke-Command -Session $S1 -ScriptBlock {param($VM_name2,$cred2,$time1,$block_size1,$writes1,$OIO1,$threads1) Invoke-Command -VMName $VM_name2 -Credential $cred2 -ScriptBlock {param($time2,$block_size2,$writes2,$OIO2,$threads2)
        C:\diskspd.exe -"b$block_size2" -"d$time2" -"t$threads2" -"o$OIO2" -h -r -"w$writes2" -L -Z500M -c38G D:\io_stress.dat
        } -AsJob -ArgumentList $time1,$block_size1,$writes1,$OIO1,$threads1 } -ArgumentList $VM_name,$cred,$time,$block_size,$writes,$OIO,$threads


        }
    }
#END_io_stress_trigger.ps1


--------------------------------------------------------------------------------------------------------------------------

#BEGIN_rebalance_all_testvms.ps1
$clustergroups = Get-ClusterGroup | Where-Object {$_.IsCoreGroup -eq $false}
 foreach ($cg in $clustergroups)
 {
     $CGName = $cg.Name
     Write-Host "`nWorking on $CGName"
     $CurrentOwner = $cg.OwnerNode.Name
     $POCount = (($cg | Get-ClusterOwnerNode).OwnerNodes).Count
     if ($POCount -eq 0)
     {
         Write-Host "Info: $CGName doesn't have a preferred owner!" -ForegroundColor Magenta
     }
     else
     {
         $PreferredOwner = ($cg | Get-ClusterOwnerNode).Ownernodes[0].Name
         if ($CurrentOwner -ne $PreferredOwner)
         {
             Write-Host "Moving resource to $PreferredOwner, please wait..."
             $cg | Move-ClusterVirtualMachineRole -MigrationType Live -Node $PreferredOwner
         }
         else
         {
             write-host "Resource is already on preferred owner! ($PreferredOwner)"
         }
     }
 }
 Write-Host "`n`nFinished. Current distribution: "

 Get-ClusterGroup | Where-Object {$_.IsCoreGroup -eq $false} 
#END_rebalance_all_testvms.ps1


--------------------------------------------------------------------------------------------------------------------------

#BEGIN_watch_iops_live.ps1
#Get cluster info
$Cluster_name = (Get-Cluster).name
$Nodes_name = (Get-ClusterNode).name

while($true)
{

    [int]$total_IO = 0
    [int]$total_readIO = 0
    [int]$total_writeIO = 0
 
    clear
     
    "{0,-15} {1,-15} {2,-15} {3,-15} {4, -15} {5, -15}" -f "Host", "Total IOPS", "Reads/Sec", "Writes/Sec", "Read Q Length", "Write Q Length"

    for($j=1; $j -le $Nodes_name.count; $j++){

        $Node = $Nodes_name[$j-1]

        $Data = Get-CimInstance -ClassName Win32_PerfFormattedData_CsvFsPerfProvider_ClusterCSVFS -ComputerName $Node | Where Name -like Volume$j

        [int]$T = $Data.ReadsPerSec+$Data.WritesPerSec
     
        "{0,-15} {1,-15} {2,-15} {3,-15} {4,-15} {5, -15}" -f "$Node", "$T", $Data.ReadsPerSec, $Data.WritesPerSec, $Data.CurrentReadQueueLength, $Data.CurrentWriteQueueLength

        $total_IO = $total_IO+$T
        $total_readIO = $total_readIO+$Data.ReadsPerSec
        $total_writeIO = $total_writeIO+$Data.WritesPerSec
        }

    echo `n
    "{0,-15} {1,-15} {2,-15} {3,-15} " -f "Cluster IOPS", "$total_IO", "$total_readIO", "$total_writeIO"

    Start-Sleep -Seconds 3

}
#END_watch_iops_live.ps1

--------------------------------------------------------------------------------------------------------------------------

#BEGIN_stop_all_testvms.ps1
#Get cluster info
$Cluster_name = (Get-Cluster).name
$Nodes_name = (Get-ClusterNode).name
$Node_count = (Get-ClusterNode).count

for($i=1; $i -le $Node_count; $i++){
    $Node = $Nodes_name[$i-1]
    Get-VM -ComputerName $Node -VMName "testvm-$Node*" | Stop-VM -Force -AsJob
    }
#END_stop_all_testvms.ps1

--------------------------------------------------------------------------------------------------------------------------

#BEGIN_wipeoff_testvms.ps1
#Get cluster info
$Cluster_name = (Get-Cluster).name
$Nodes_name = (Get-ClusterNode).name
$Node_count = (Get-ClusterNode).count

#Loop for each node in cluster
for($i=1; $i -le $Node_count; $i++){
    $VM_path = "C:\ClusterStorage\Volume$i"
    $Node = $Nodes_name[$i-1]

    $VM_count = (get-vm -ComputerName $Node -Name "testvm-$Node-*").Count

    #Loop to delete testvm on each node
    for($j=1; $j -le $VM_count; $j++){

        $VM_name = "testvm-$Node-$j"
        $full_path = "$VM_path\$VM_name"

        Get-VM -Computername $Node -VMname $VM_name | stop-vm -force
        Get-ClusterGroup $VM_name | Remove-ClusterGroup -Force -RemoveResources
        Get-VM -Computername $Node -VMname $VM_name | remove-vm -force
        Remove-Item $full_path -Force -Recurse -ErrorAction SilentlyContinue -Verbose
        }

    }
#END_wipeoff_testvms.ps1

--------------------------------------------------------------------------------------------------------------------------



Saturday, March 18, 2017

Real time disk IOPS and latency monitoring using PowerShell

The code below collects disk IOPS and average IO latency from a list of servers and outputs the real-time values.

Note: Here we are monitoring only logical disk E of testvm-01, testvm-02 and testvm-03

while($true)
{

$Servers = "testvm-01","testvm-02","testvm-03"

$R_io = '\LogicalDisk(E:)\Disk Reads/sec'
$W_io = '\LogicalDisk(E:)\Disk Writes/sec'
$Lat = '\LogicalDisk(E:)\Avg. Disk sec/Transfer'

$Reads =  (Get-Counter -Counter $R_io -ComputerName $Servers).CounterSamples.CookedValue
$Writes = (Get-Counter -Counter $W_io -ComputerName $Servers).CounterSamples.CookedValue
$Latency = (Get-Counter -Counter $Lat -ComputerName $Servers).CounterSamples.CookedValue

clear

"{0,-15} {1,-15} {2,-15} {3,-15} {4, -15}" -f "Host", "Reads/Sec", "Writes/Sec", "Total IOPS", "Avg Latency (ms)"

echo `n

for($i=0; $i -le 2; $i+=1){

[int]$R = $Reads[$i]
[int]$W = $Writes[$i]
[int]$T = $R+$W
[int]$L = ($Latency[$i])*1000
[String]$N = $Servers[$i]

"{0,-15} {1,-15} {2,-15} {3,-15} {4,-15}" -f "$N", "$R", "$W", "$T", "$L"

}

}




Tuesday, February 28, 2017

Stress test your storage system using iometer

In this article, I will briefly explain how to use iometer to simulate I/O load. In my test, a 600GB LUN is provisioned from a Compellent storage array and connected to an ESXi 6.5 server as a datastore. I've created a VM with 2 disks (C and E, for OS and data) on the 600GB datastore. We will be using drive E (40 GB) for the test, with an NTFS partition having a 4KB allocation unit size.

Parameters:

Disk Workers - 2 (that means 2 worker threads will run simultaneously during the test)
Disk Targets - Select drive E on both disk workers
Maximum Disk Size - 0 (entire drive E will be used for the test)
Rest all values - default


Network Targets - leave the settings as they are (as we are not doing a network stress test)

Access Specifications - this specifies the read/ write block size, the percentage of reads/ writes, random/ sequential access, etc.

Max IOPS at - 512 bytes transfer request size, 100% sequential, 100% reads
Max throughput at - 64K transfer request size, 100% sequential, 100% reads

In a real-world situation, it totally depends on the type of workload. Here we are considering 4KB blocks, with 90% writes, 10% reads, and 100% random access. These values are close to, and applicable to, most virtualization workloads.


Note: Make sure you add access specifications to all the disk workers

Transfer Request Size - 4KB
Percent Random/ Sequential Distribution - 100% Random
Percent Read/ Write Distribution - 90% Write
Rest all values - default


Once all the above settings are configured, you can click the green flag at the top to start the test. When you start it, you can see that a test file (iobw.tst) is created in drive E, which grows to the entire size of the disk (approx. 40GB). This is shown in the screenshots below.

Note: Creating the 40GB test file takes a few minutes

On the results display tab, set the update frequency to 1 second to view real-time results. You can also use the performance monitor tools to view disk reads and writes and cross-check them with the values from iometer.

Total I/Os per second = disk reads/ sec + disk writes/ sec
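
If you prefer pulling the same counters from PowerShell (inside the guest) instead of the perfmon GUI, a quick cross-check could look like this; drive E is just the drive iometer is targeting in this example:
$Samples = (Get-Counter -Counter '\LogicalDisk(E:)\Disk Reads/sec','\LogicalDisk(E:)\Disk Writes/sec').CounterSamples
#Sum of reads/sec and writes/sec should be close to the IOPS reported by iometer
"{0:N0} total IOPS" -f ($Samples | Measure-Object -Property CookedValue -Sum).Sum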


You can also monitor the read/ write operations of your datastore as shown below. These values should also match the results obtained from iometer and perfmon (as there is only one VM on this datastore). In the graph below you can see the datastore was almost idle as there were no operations on it. The moment I started iometer, it began creating the iobw.tst file, which is basically a 40GB write operation on drive E.


4KB reads and writes will happen on this test file (iobw.tst).



There is one more way you can monitor the IOPS value. While iometer is running, open Resource Monitor and observe the disk activity generated by dynamo.exe. Make a note of the total bytes, convert it to kilobytes, and divide it by 4. This gives you the total IOPS, which should also be close to the value reported by iometer, as shown below.


The ESXi performance graph is also shown below.


Final results:

iometer - 3799 IOPS
Perfmon - 372 + 3344 = 3716 IOPS
Disk activity monitor - (15291932 Bytes) 14933KB/ 4KB = 3733 IOPS
ESXi performance monitor - 373 + 3365 = 3738 IOPS

Note: To simulate a complex real-world scenario or to benchmark your storage system, you can provision multiple LUNs from the storage array, host a few virtual machines on those LUNs, and run iometer with different access specifications.

Hope this article was useful to you. Cheers.

Monday, November 14, 2016

Best practices while virtualizing Microsoft SQL servers using Hyper-V

-Limit min and max memory for SQL server
-Use fixed size VHDX (see the PowerShell sketch after this list)
-Split data and log files into separate VHDX disks
-Use multiple SCSI controllers
-Right sizing and not over allocating resources
-Making use of multiple RAID disk groups (for sequential and random access)
-For tier-1 mission critical applications use RAID 10 for data, log files, and tempDB for best performance and availability
-For lower tier SQL workloads when cost is a concern, data and tempDB can be on RAID 5
-Use of SSDs or tiered storage for higher IOPS
-If using VMQ on Hyper-V environment, on the guest OS side you can enable vRSS for processing network load across multiple CPUs
-I prefer using fixed size memory for SQL VMs
-Exclude SQL DB related files (*.mdf, *.ldf, *.ndf, *.bak, *.trn) from on-access antivirus scans
-Disable content indexing on SQL data/ log/ tempDB drives
-Enable lock pages in memory (group policy setting)
-DB IFI (Instant File Initialization)
-Set OS power plan to high performance
-OS performance options - visual effects - adjust for best performance
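
A rough PowerShell sketch covering the fixed VHDX, separate data/ log disks, and multiple SCSI controllers points is given below. The VM name, paths, and sizes are placeholders, and the VM should be powered off while adding the controller:
$VMName = "sql-vm01"

#Fixed size VHDX files for data and log, kept on separate volumes
New-VHD -Path "D:\VMs\$VMName\sql_data.vhdx" -Fixed -SizeBytes 200GB
New-VHD -Path "E:\VMs\$VMName\sql_log.vhdx" -Fixed -SizeBytes 100GB

#Additional SCSI controller so data and log IO go through separate controllers
Add-VMScsiController -VMName $VMName

Add-VMHardDiskDrive -VMName $VMName -ControllerType SCSI -ControllerNumber 0 -Path "D:\VMs\$VMName\sql_data.vhdx"
Add-VMHardDiskDrive -VMName $VMName -ControllerType SCSI -ControllerNumber 1 -Path "E:\VMs\$VMName\sql_log.vhdx"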

Saturday, August 8, 2015

How to calculate total IOPS supported by a disk array

IOPS stands for input/ output operations per second. 

Consider a RAID array with 4 disks in RAID 5. Each disk is a 4TB 15K SAS drive. We can use the formula below to calculate the maximum IOPS supported by the RAID array.

Raw IOPS = Disk Speed IOPS * Number of disks

Functional IOPS = (Raw IOPS * Write % / RAID Penalty) + (RAW IOPS * Read %)

Number of disks = 4 (4TB 15K SAS)
IOPS of a single 15K SAS disk = 175 - 210
RAID penalty = 4 (for RAID 5)
Read - Write ratio = 2:1 (say we have 66% reads and 33% writes)

From the above details :

Raw IOPS = 175 * 4 = 700
Functional IOPS = (700 * 0.33 / 4) + (700 * 0.66) = 519.75

Therefore, the maximum IOPS supported by this RAID array  = 520
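
The same calculation in PowerShell, so you can plug in your own disk count, per-disk IOPS, read/ write mix, and RAID penalty:
$DiskCount = 4        #number of disks in the array
$DiskIops = 175       #IOPS of a single 15K SAS disk (conservative value)
$RaidPenalty = 4      #write penalty for RAID 5
$ReadPercent = 0.66
$WritePercent = 0.33

$RawIops = $DiskIops * $DiskCount
$FunctionalIops = ($RawIops * $WritePercent / $RaidPenalty) + ($RawIops * $ReadPercent)
"Raw IOPS: {0}   Functional IOPS: {1}" -f $RawIops, [math]::Round($FunctionalIops)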

The above calculation is entirely based on the assumption that the read-write ratio is 2:1. In real-world scenarios it may vary depending on the type of workload. That means workload characterization is also important while calculating the maximum IOPS value supported by your RAID array. It also depends on the type of RAID, as the write penalty differs for each RAID level. The disk type (SSD, SAS, SATA, etc.), disk RPM, and the number of disks also affect the total IOPS value.

From this we can conclude that if your current IOPS usage is close to the maximum supported IOPS, you have to be very cautious, as heavy workloads may lead to I/O contention, causing high latency and performance degradation on your storage server.