
Thursday, August 1, 2024

A decade of tech - My professional journey so far

Laying the Groundwork

My professional career commenced in February 2014, as a Trainee IT Services Engineer at Alamy Images. During my initial days, I was tasked with daily maintenance activities such as running tape backups, setting up Active Directory user accounts, mailboxes, and desktops for new employees. I also handled general IT support, troubleshooting various user issues within the organization.

After a few months, I had the opportunity to set up a lab infrastructure project using old decommissioned servers as part of a continuous learning initiative. This hands-on experience involved racking, stacking, and cabling physical servers, installing and configuring ESXi and Hyper-V hypervisors, FreeNAS storage servers, and deploying highly available clusters. Additionally, I gained exposure to configuring L2 network switches. This project significantly contributed to building my IT infrastructure foundation.

A year later, I was promoted to Junior IT Services Engineer, where I focused on virtualization projects. I spearheaded the migration of over 20 Dev/ Test/ UAT virtual machines from VMware to a Hyper-V cluster, enhancing system flexibility and cost-efficiency. I deployed a high-availability Hyper-V failover cluster in production and contributed to the planning and execution of an iSCSI storage server migration project.

Beyond virtualization, I worked on network infrastructure, executing a seamless L2 switch replacement and upgrade project with minimal operational disruption. Furthermore, I assisted in capacity planning initiatives to optimize resource utilization across both physical and virtual environments. These experiences refined my technical skills and problem-solving abilities. During this time, I developed a passion for infrastructure management and optimization, shaping my future career path.

From Junior IT Services Engineer to Storage Solutions Engineer

In January 2017, I transitioned to a Systems Development Engineer role at Dell EMC, specializing in Solutions Engineering. This marked a significant career shift as I immersed myself in the world of storage and virtualization solutions integration/development.

My daily responsibilities encompassed the installation and testing of various components, progressing from integration to validating system reliability and performance at scale. I designed and deployed multiple PowerFlex software-defined storage clusters for customer demos and proof-of-concepts, showcasing the product's performance and auto-rebuild capabilities. A notable achievement was automating storage performance benchmarking using PowerShell, FIO, and the ELK stack, reducing process time from weeks to days.

I led the engineering efforts for developing a vROps management pack for PowerFlex, ensuring seamless integration and visibility. Additionally, I mastered vSphere Virtual Volumes (vVols), successfully executing integration projects between Dell storage solutions and VMware environments.

To streamline operations, I created a PowerShell module for managing PowerFlex using REST APIs and developed an Ansible playbook for automated deployment of a Kubernetes cluster with the PowerFlex CSI driver. My expertise extended beyond systems engineering and automation as I authored and published whitepapers on disaster recovery using VMware SRM and hardware lifecycle management with Dell OME.

This period solidified my reputation as a virtualization and storage solutions expert, providing me with a deep understanding of storage architecture, performance optimization, and automation. I developed a passion for building scalable and reliable hyperconverged solutions.

From Storage Solutions Engineer to Site Reliability Engineer

In July 2021, I transitioned to a Site Reliability Engineer (SRE) role at VMware, focusing on ensuring the reliability and scalability of a Kubernetes-as-a-Service offering built on the vSphere with Tanzu platform.


Managing a vast infrastructure of Kubernetes clusters, I honed my skills in incident response, GitOps pipelines, automation, and monitoring. I played a crucial role in maintaining platform availability, collaborating closely with multiple internal teams and stakeholders to resolve issues and enhance service delivery. My proficiency in Python and PowerShell was instrumental in automating tasks and building custom monitoring solutions. During this time, I prepared diligently, practiced extensively, and successfully qualified for the CKA exam.

Beyond core SRE responsibilities, I explored emerging technologies. I successfully deployed and evaluated open-source language models on Kubernetes using Python, Ollama, and LangChain. In addition, I contributed to developing custom metrics for the Kubernetes-as-a-Service platform using Python, Prometheus, Grafana, and Helm.

This role deepened my expertise and ability to bridge the gap between development and operations, fostering a culture of reliability and efficiency. It has been an exciting journey of learning and growth, positioning me as a versatile IT professional with a strong foundation in both infrastructure and cloud-native technologies.

Gratitude

"This journey has been immensely fulfilling, made possible by the support and encouragement of exceptional organisations, inspiring managers, talented colleagues, friends, and family. I am truly grateful for the opportunities to learn, grow, and contribute meaningfully to driving success and making a positive impact."

The journey continues...

Tuesday, July 18, 2017

Best practices while building a Hyper-V host

This article briefly explains some of the best practice considerations while building a stand-alone Hyper-V 2012 R2/ 2016 host. You can use any compatible hardware, but here I will be referring to Dell PowerEdge servers as I work with them every day.

  1. Select proper hardware

    You have to be really careful before purchasing a server. Analyze the requirements first and work a bit on capacity planning too. For example, the PowerEdge R630 will be a good choice to start with as it is a 1U system with 2 processors. It can have up to 1.5 TB of memory, but most SMB customers generally go with somewhere around 128 GB. Choosing the right network controller is also very important as it directly impacts the data transfer performance of the virtual machines. If you are planning to use converged networking on the host, select an appropriate adapter. I recommend using one 10G dual port network card at the minimum for a converged network configuration. If you are looking for redundancy at the network card level, then you can go with two 10G dual port cards. You can also go with multiple dual port or quad port 1G cards as per your requirements in case of budget limitations.

  2. Number of disks, type of disks and RAID controller

    On a stand-alone host, customers will mostly be using local drives, and if they need additional storage it will be provisioned from a SAN. The number of disks and the type of disks (SSD, SAS, NL-SAS etc.) will have a direct impact on performance at the storage level. Also, it is very important to select the right RAID controller. Supported RAID types, size of controller cache, read/ write policies etc. are some of the major parameters that you have to consider while choosing the RAID controller card. Dell uses PERC (PowerEdge RAID Controller). For example, you can select the PERC H730P, which has 2 GB cache memory and a BBU and supports RAID levels 0, 1, 5, 6, 10, 50 and 60. The H730P also supports 4 KB block size disk drives. For more info please have a look at my article RAID configuration using PERC.

  3. Always use and follow HCL (Hardware Compatibility List)
  4. Configure out-of-band management. Dell uses iDRAC which helps you to manage your server remotely
  5. Update BIOS, firmware and drivers to the latest and greatest version
  6. Make sure you install all the necessary Windows updates
  7. Partition style, file system and AUS (Allocation Unit Size) of the drive where VM files will be saved

    Say you are going to save all the VM files in drive D. While creating this drive, select the GPT partition style. If you are running Hyper-V 2012 R2, use the NTFS file system with a 64 KB AUS. If you are running Hyper-V 2016, use ReFS with a 4 KB AUS. Please refer to this MSFT article for more info. A quick PowerShell sketch of this step (and of the NIC teaming in the next item) is given after this list.

  8. Always create a NIC team and select right teaming modes

    The most widely used teaming mode is switch independent + dynamic load balancing as it is the least complicated in terms of configuration and has no dependency on your switches. But if your switches support VLT (for Dell) or vPC (for Cisco) technology then the best teaming mode will be LACP + dynamic load balancing which provides you redundancy as well as aggregated throughput of all the active links in the team.

  9. Use separate VLANs for different types of traffic

    It is recommended to use separate VLANs for management, VM traffic, iSCSI, live migration and iDRAC.

  10. If using converged networking, assign proper minimum bandwidths. A reference article with QoS recommendations is linked here.
  11. Use a minimum number of vSwitches
  12. Configure MPIO if adding additional storage from a SAN
  13. Set jumbo frames for iSCSI interfaces 
  14. Disable NetBIOS over TCP/ IP and DNS registration on iSCSI NICs

    Please check my post: Best practice recommendations for iSCSI network adapters

  15. Enable shared nothing live migration. For more info please check my previous post 
  16. If using 10G NICs make use of VMQ

    Please have a look at this MSFT article which provides you tips on VMQ CPU assignment

  17. Add exclusions for VHD/ VHDX files from scanning if antivirus is installed
  18. Run necessary stress tests to benchmark the system

    Benchmarking helps you get an overview of the IOPS numbers in the best/ worst case scenarios, so that you can provision your workload accordingly and avoid IO congestion at the storage level. You can use synthetic benchmarking tools like iometer, diskspd etc. for conducting stress tests. Also go through my article: How to calculate total IOPS supported by a disk array. But please note that these calculations don't take into account the effect of controller cache. That means the actual IOPS values observed while benchmarking the system will be higher than the values you get from the formula, because of the effect of cache. When you select a write-back policy, all write IOs land directly on the controller cache and are acknowledged; later they are flushed to the disk array. The larger the cache size, the higher the IO performance. This shows the importance of choosing the right RAID controller.

  19. Choose OS power plan High Performance
  20. Make sure PSRemoting is enabled
  21. Enable proper monitoring either using a monitoring tool or custom scripts
  22. As a DR plan you can consider using Hyper-V replica
  23. Enable RDP
  24. I strongly recommend creating a diagram to visualize the connectivity of the host to your network
  25. Organize VM files and folders properly as shown below
Here all VMs are stored in E:\VM folder

Virtual hard disks, VM config files, snapshot files etc. of each VM are organized in a proper folder structure
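
As a quick reference, here is a minimal PowerShell sketch of items 7 and 8 above. The disk number, drive letter D, volume label and NIC names are assumptions; adjust them to your own hardware.

#Initialize the disk that will hold the VM files with GPT and format it
#Hyper-V 2012 R2: NTFS with 64 KB AUS; Hyper-V 2016: ReFS with 4 KB AUS
Initialize-Disk -Number 1 -PartitionStyle GPT
New-Partition -DiskNumber 1 -UseMaximumSize -DriveLetter D
Format-Volume -DriveLetter D -FileSystem NTFS -AllocationUnitSize 65536 -NewFileSystemLabel "VMStore"
#On Hyper-V 2016 use: Format-Volume -DriveLetter D -FileSystem ReFS -AllocationUnitSize 4096 -NewFileSystemLabel "VMStore"

#Switch independent team with dynamic load balancing from two 10G ports
New-NetLbfoTeam -Name "VMTeam" -TeamMembers "NIC1","NIC2" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic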

I hope this will be helpful if you are totally new to Hyper-V, and please feel free to let me know if you have any other best practice suggestions which I missed mentioning here. Cheers!

Friday, June 30, 2017

Some of the coolest features/ enhancements in Hyper-V 2016

VM compute resiliency: This helps provide resiliency to transient issues, like a temporary disconnection of a cluster node due to network problems or the cluster service itself crashing on the node. The VMs will continue working "Unmonitored" even if the node falls out of cluster membership into an isolated state. Here the unmonitored state of the VM implies that it is no longer monitored by the cluster service. The default resiliency period is 4 minutes. This means the Unmonitored VMs will be allowed to run on that isolated node for 4 minutes, after which the VMs will be failed over to a suitable node/ nodes in the cluster, and the isolated node is moved to a down state. The cluster service itself is no longer a necessary dependency for a VM to run. As long as connectivity exists, the VM will continue working.

Node quarantine: If a cluster node is isolated a certain number of times (default is 3) within an hour, it will be moved to a quarantined state and the VMs running on it (if any) will be failed over to another suitable node/ nodes in the cluster.

Event 1 - cluster service stopped on node A - node A isolated (down) - cluster service restarted - node A online
Event 2 - cluster service stopped on node A - node A isolated (down) - cluster service restarted - node A online
Event 3 - cluster service stopped on node A - node A down - node A quarantined

The node will be quarantined for a period of 2 hours by default. But the administrator can manually start the cluster service on that node to join it back to the cluster.
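
These resiliency and quarantine settings are exposed as cluster common properties, so you can view or tune them with PowerShell. A minimal sketch (the values shown are just the defaults mentioned above):

#View the current compute resiliency and quarantine settings of the cluster
Get-Cluster | Format-List ResiliencyLevel, ResiliencyDefaultPeriod, QuarantineThreshold, QuarantineDuration

#Example: set the resiliency period and quarantine duration (both in seconds) back to their defaults
(Get-Cluster).ResiliencyDefaultPeriod = 240
(Get-Cluster).QuarantineDuration = 7200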

VM storage resiliency: If there is a storage interruption, the VM identifies it and pauses all IOs for a certain duration; once the storage is available, all IO operations are resumed. This is very helpful in case of transient storage issues, saving the VM from blue screening or crashing. If the storage path is not back online after a certain period of time, the VM itself will be paused, and once the storage comes back it auto resumes.

VM memory run time resize: You can now increase/ decrease RAM of a running VM.

Hot add/ remove VM network adapters: VM network adapters can also be added or removed on the fly.
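
A minimal PowerShell sketch of these per-VM settings and hot operations (the VM name, switch name and values are assumptions):

#Storage resiliency: pause the VM on a critical storage error and define how long to wait (minutes)
Set-VM -Name "VM01" -AutomaticCriticalErrorAction Pause -AutomaticCriticalErrorActionTimeout 30

#Runtime memory resize of a running VM
Set-VM -Name "VM01" -MemoryStartupBytes 8GB

#Hot add a network adapter to a running generation 2 VM
Add-VMNetworkAdapter -VMName "VM01" -SwitchName "vSwitch01"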

Cluster OS rolling upgrades: With this feature you can upgrade your Hyper-V 2012 R2 cluster to a Hyper-V 2016 cluster without shutting down the cluster. You can upgrade your existing cluster in 2 ways. Either you add new 2016 nodes to the 2012 R2 cluster, migrate the workload to the new 2016 nodes and evict the old nodes. Or you evict one of the existing 2012 R2 nodes, do a clean installation of 2016, add it back to the cluster and repeat for the rest of the nodes. Once all the nodes are on 2016, you can update the cluster functional level to 2016.
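
Once every node is on 2016, checking and raising the functional level is a one-liner each. A quick sketch:

#Check the current cluster functional level (8 = 2012 R2, 9 = 2016)
Get-Cluster | Select-Object Name, ClusterFunctionalLevel

#Raise the functional level once every node runs 2016 (this step cannot be reversed)
Update-ClusterFunctionalLevel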

Thursday, May 11, 2017

Benchmarking Hyper-Converged Storage Spaces Direct (S2D) Cluster

Finally I managed to write some PowerShell code, as I was completely inspired by my new PS geek friends. The scripts can be used to generate load and stress test your S2D as well as traditional Hyper-V 2016 clusters. These are functionally similar to VM Fleet. There are 7 scripts in total.
  1. create_clustered_testvms.ps1 : this script creates virtual machines on the cluster nodes which will be used for stress testing
  2. start_all_testvms.ps1 : start all those clustered VMs that you just created
  3. io_stress_trigger.ps1 : to trigger IO stress on all VMs using diskspd 
  4. rebalance_all_testvms.ps1 : this script is originally from winblog, I just made a small change so that it will use live migration while moving the clustered VMs back to their owner node
  5. watch_iops_live.ps1 : to view read, write and total iops of each CSV disk on the S2D cluster 
  6. stop_all_testvms.ps1 : to shutdown all the clustered VMs
  7. wipeoff_testvms.ps1 : to delete all VMs that you created using the first script
PS Version which I am using is given below.


Now I will briefly explain how to use these scripts and a few prerequisites. Say, you have a 4 node hyper-converged S2D cluster.


As there are 4 nodes, you should have 4 cluster shared volumes (CSV). Assign one CSV to each cluster node as shown below. This means when you create VMs on NODE-01, they will be placed on CSV Volume AA; for NODE-02, VMs will be placed on Volume BB, and so on.


Volume AA is C:\ClusterStorage\Volume1


Similarly,
Volume BB is C:\ClusterStorage\Volume2
Volume CC is C:\ClusterStorage\Volume3
Volume DD is C:\ClusterStorage\Volume4
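
The CSV-to-node assignment itself can be done with Move-ClusterSharedVolume. A minimal sketch, assuming the CSV resource names match the volume labels above:

#Assign each CSV to its node (repeat for Volume BB/CC/DD)
Move-ClusterSharedVolume -Name "Volume AA" -Node "NODE-01"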


Your cluster shared volumes are ready now. Create 2 folders inside Volume1 as shown below.


Copy all the 7 PS scripts to scripts folder. Code of each script is given at the end.


You now need a template VHDX, and that needs to be copied to the template folder.
Note: It should be named "template"


This template is nothing but a Windows Server 2016 VM created on a dynamically expanding disk. So you just have to create a VM with a dynamically expanding VHDX, install Windows Server 2016 and set the local administrator password to "Pass1234". Download diskspd from Microsoft, unzip it and copy diskspd.exe to the C drive of the VM you just created.


Shut down the VM. No need to sysprep it. Copy the VHDX disk of the VM to the template folder and rename it to "template". The disk will be around 9.5 GB in size. Once the template is copied, you are all set to start.

Step 1: 
Run create_clustered_testvms.ps1 
This will create clustered testvms on each of the nodes. It will be done in such a way that VMs on NODE-01 will be stored on Volume1, VMs on NODE-02 will be stored on Volume2 and so on 

Step 2:
Run start_all_testvms.ps1 
This will start all the testvms

Step 3:
Wait for a few seconds to ensure all the testvms are booted properly; then run io_stress_trigger.ps1 and provide necessary input parameters 



Step 4:
You can watch IOPS of the cluster using watch_iops_live.ps1


If you would like to live migrate some testvms while the stress test is running you can try it and observe the IO variations. But before running the io_stress_trigger.ps1 again you have to move/ migrate all those testvms back to their preferred owners. This can be done using rebalance_all_testvms.ps1 .  

If any testvms are not running on their preferred owner, then io_stress_trigger.ps1 will fail for those VMs. Here, testvms running on NODE-01 have preferred owner NODE-01, and similarly for all other testvms. So you have to make sure all the testvms are running on their preferred owner before starting the IO stress script.

Use stop_all_testvms.ps1 to shutdown all the clustered testvms that you created on step 1. To delete all the testvms, you can use wipeoff_testvms.ps1 .

NOTE: While running scripts 2, 3, 6 and 7 please make sure all the testvms are running on their preferred owner. Use rebalance_all_testvms.ps1 to assign all testvms back to their preferred owners! Also, please run all these scripts in an elevated PowerShell session after logging in directly to any of the cluster nodes.

All the code is given below. It might not be optimal, but I am pretty sure it works! Cheers!
--------------------------------------------------------------------------------------------------------------------------

#BEGIN_create_clustered_testvms.ps1
#Get cluster info
$Cluster_name = (Get-Cluster).name
$Nodes_name = (Get-ClusterNode).name
$Node_count = (Get-ClusterNode).count

#Input VM config
$VM_count = Read-Host "Enter number of VMs/ node"
$Cluster_VM_count = $VM_count*$Node_count
[int64]$RAM = Read-Host "Enter memory for each VM in MB Eg: 4096"
$RAM = 1MB*$RAM
$CPU = Read-Host "Enter CPU for each VM"

#Creds to Enter-PSSession
$pass = convertto-securestring -asplaintext -force -string Pass1234
$cred = new-object -typename system.management.automation.pscredential -argumentlist "administrator", $pass

#Loop for each node in cluster
for($i=1; $i -le $Node_count; $i++){
    $VM_path = "C:\ClusterStorage\Volume$i"
    $Node = $Nodes_name[$i-1]

    #Remote session to each node
    $S1 = New-PSSession -ComputerName $Node -Credential $cred

    #Loop for creating new testvms on each node
    for($j=1; $j -le $VM_count; $j++){
        $VM_name = "testvm-$Node-$j"
        new-vm -name $VM_name -computername $Node -memorystartupbytes $RAM -generation 2 -Path $VM_path
        set-vm -name $VM_name -ProcessorCount $CPU -ComputerName $Node
        New-Item -path $VM_path\$VM_name -name "Virtual Hard Disks" -type directory

        #Copy template disk
        Copy-Item "C:\ClusterStorage\Volume1\template\template.vhdx" -Destination "$VM_path\$VM_name\Virtual Hard Disks" -Verbose
        Add-VMHardDiskDrive -VMName $VM_name -ComputerName $Node -path "$VM_path\$VM_name\Virtual Hard Disks\template.vhdx" -Verbose

        #Create new fixed test disk
        New-VHD -Path "$VM_path\$VM_name\Virtual Hard Disks\test_disk.vhdx" -Fixed -SizeBytes 40GB
        Add-VMHardDiskDrive -VMName $VM_name -ComputerName $Node -path "$VM_path\$VM_name\Virtual Hard Disks\test_disk.vhdx" -Verbose

        Get-VM -ComputerName $Node -VMName $VM_name | Start-VM
        Add-ClusterVirtualMachineRole -VirtualMachine $VM_name
        Set-ClusterOwnerNode -Group $VM_name -owner $Node
        Start-Sleep -S 10

        #Remote session to each testvm on the node to initialize and format test disk (drive D:)
        Invoke-Command -Session $S1 -ScriptBlock {param($VM_name2,$cred2) Invoke-Command -VMName $VM_name2 -Credential $cred2 -ScriptBlock {
            Initialize-Disk -Number 1 -PartitionStyle MBR
            New-Partition -DiskNumber 1 -UseMaximumSize -DriveLetter D
            Get-Volume | where DriveLetter -eq D | Format-Volume -FileSystem NTFS -NewFileSystemLabel Test_disk -confirm:$false
            }} -ArgumentList $VM_name,$cred

        Start-Sleep -S 5
        Get-VM -ComputerName $Node -VMName $VM_name | Stop-VM -Force
        }
    }
#END_create_clustered_testvms.ps1


--------------------------------------------------------------------------------------------------------------------------

#BEGIN_start_all_testvms.ps1
#Get cluster info
$Cluster_name = (Get-Cluster).name
$Nodes_name = (Get-ClusterNode).name
$Node_count = (Get-ClusterNode).count

for($i=1; $i -le $Node_count; $i++){
    $Node = $Nodes_name[$i-1]
    Get-VM -ComputerName $Node -VMName "testvm-$Node*" | Start-VM -AsJob

    }
#END_start_all_testvms.ps1


--------------------------------------------------------------------------------------------------------------------------

#BEGIN_io_stress_trigger.ps1
#Get cluster info
$Cluster_name = (Get-Cluster).name
$Nodes_name = (Get-ClusterNode).name
$Node_count = (Get-ClusterNode).count

#Creds to Enter-PSSession
$pass = convertto-securestring -asplaintext -force -string Pass1234
$cred = new-object -typename system.management.automation.pscredential -argumentlist "administrator", $pass

$time = Read-Host "Enter duration of stress in seconds (Eg: 300)"
$block_size = Read-Host "Enter block size (Eg: 4K)"
$writes = Read-Host "Enter write percentage (Eg: 20)"
$OIO = Read-Host "Enter number of outstanding IOs (Eg: 16)"
$threads = Read-Host "Enter number of threads (Eg: 2)"

#Loop for each node in cluster
for($i=1; $i -le $Node_count; $i++){
    $VM_path = "C:\ClusterStorage\Volume$i"
    $Node = $Nodes_name[$i-1]

    #Remote session to each node
    $S1 = New-PSSession -ComputerName $Node -Credential $cred

    $VM_count = (Get-VM -ComputerName $Node -VMName "testvm-$Node*").Count

    #Loop for creating new testvms on each node
    for($j=1; $j -le $VM_count; $j++){
        $VM_name = "testvm-$Node-$j"

        #Remote session to each testvm
        Invoke-Command -Session $S1 -ScriptBlock {param($VM_name2,$cred2,$time1,$block_size1,$writes1,$OIO1,$threads1) Invoke-Command -VMName $VM_name2 -Credential $cred2 -ScriptBlock {param($time2,$block_size2,$writes2,$OIO2,$threads2)
        C:\diskspd.exe -"b$block_size2" -"d$time2" -"t$threads2" -"o$OIO2" -h -r -"w$writes2" -L -Z500M -c38G D:\io_stress.dat
        } -AsJob -ArgumentList $time1,$block_size1,$writes1,$OIO1,$threads1 } -ArgumentList $VM_name,$cred,$time,$block_size,$writes,$OIO,$threads


        }
    }
#END_io_stress_trigger.ps1


--------------------------------------------------------------------------------------------------------------------------

#BEGIN_rebalance_all_testvms.ps1
$clustergroups = Get-ClusterGroup | Where-Object {$_.IsCoreGroup -eq $false}
 foreach ($cg in $clustergroups)
 {
     $CGName = $cg.Name
     Write-Host "`nWorking on $CGName"
     $CurrentOwner = $cg.OwnerNode.Name
     $POCount = (($cg | Get-ClusterOwnerNode).OwnerNodes).Count
     if ($POCount -eq 0)
     {
         Write-Host "Info: $CGName doesn't have a preferred owner!" -ForegroundColor Magenta
     }
     else
     {
         $PreferredOwner = ($cg | Get-ClusterOwnerNode).Ownernodes[0].Name
         if ($CurrentOwner -ne $PreferredOwner)
         {
             Write-Host "Moving resource to $PreferredOwner, please wait..."
             $cg | Move-ClusterVirtualMachineRole -MigrationType Live -Node $PreferredOwner
         }
         else
         {
             write-host "Resource is already on preferred owner! ($PreferredOwner)"
         }
     }
 }
 Write-Host "`n`nFinished. Current distribution: "

 Get-ClusterGroup | Where-Object {$_.IsCoreGroup -eq $false} 
#END_rebalance_all_testvms.ps1


--------------------------------------------------------------------------------------------------------------------------

#BEGIN_watch_iops_live.ps1
#Get cluster info
$Cluster_name = (Get-Cluster).name
$Nodes_name = (Get-ClusterNode).name

while($true)
{

    [int]$total_IO = 0
    [int]$total_readIO = 0
    [int]$total_writeIO = 0
 
    clear
     
    "{0,-15} {1,-15} {2,-15} {3,-15} {4, -15} {5, -15}" -f "Host", "Total IOPS", "Reads/Sec", "Writes/Sec", "Read Q Length", "Write Q Length"

    for($j=1; $j -le $Nodes_name.count; $j++){

        $Node = $Nodes_name[$j-1]

        $Data = Get-CimInstance -ClassName Win32_PerfFormattedData_CsvFsPerfProvider_ClusterCSVFS -ComputerName $Node | Where Name -like Volume$j

        [int]$T = $Data.ReadsPerSec+$Data.WritesPerSec
     
        "{0,-15} {1,-15} {2,-15} {3,-15} {4,-15} {5, -15}" -f "$Node", "$T", $Data.ReadsPerSec, $Data.WritesPerSec, $Data.CurrentReadQueueLength, $Data.CurrentWriteQueueLength

        $total_IO = $total_IO+$T
        $total_readIO = $total_readIO+$Data.ReadsPerSec
        $total_writeIO = $total_writeIO+$Data.WritesPerSec
        }

    echo `n
    "{0,-15} {1,-15} {2,-15} {3,-15} " -f "Cluster IOPS", "$total_IO", "$total_readIO", "$total_writeIO"

    Start-Sleep -Seconds 3

}
#END_watch_iops_live.ps1

--------------------------------------------------------------------------------------------------------------------------

#BEGIN_stop_all_testvms.ps1
#Get cluster info
$Cluster_name = (Get-Cluster).name
$Nodes_name = (Get-ClusterNode).name
$Node_count = (Get-ClusterNode).count

for($i=1; $i -le $Node_count; $i++){
    $Node = $Nodes_name[$i-1]
    Get-VM -ComputerName $Node -VMName "testvm-$Node*" | Stop-VM -Force -AsJob
    }
#END_stop_all_testvms.ps1

--------------------------------------------------------------------------------------------------------------------------

#BEGIN_wipeoff_testvms.ps1
#Get cluster info
$Cluster_name = (Get-Cluster).name
$Nodes_name = (Get-ClusterNode).name
$Node_count = (Get-ClusterNode).count

#Loop for each node in cluster
for($i=1; $i -le $Node_count; $i++){
    $VM_path = "C:\ClusterStorage\Volume$i"
    $Node = $Nodes_name[$i-1]

    $VM_count = (get-vm -ComputerName $Node -Name "testvm-$Node-*").Count

    #Loop to delete testvm on each node
    for($j=1; $j -le $VM_count; $j++){

        $VM_name = "testvm-$Node-$j"
        $full_path = "$VM_path\$VM_name"

        Get-VM -Computername $Node -VMname $VM_name | stop-vm -force
        Get-ClusterGroup $VM_name | Remove-ClusterGroup -Force -RemoveResources
        Get-VM -Computername $Node -VMname $VM_name | remove-vm -force
        Remove-Item $full_path -Force -Recurse -ErrorAction SilentlyContinue -Verbose
        }

    }
#END_wipeoff_testvms.ps1

--------------------------------------------------------------------------------------------------------------------------



Monday, November 14, 2016

Best practices while virtualizing Microsoft SQL servers using Hyper-V

-Limit min and max memory for SQL server
-Use fixed size VHDX
-Split data and log files into separate VHDX disks
-Use multiple SCSI controllers (see the sketch after this list)
-Right sizing and not over allocating resources
-Making use of multiple RAID disk groups (for sequential and random access)
-For tier-1 mission critical applications use RAID 10 for data, log files, and tempDB for best performance and availability
-For lower tier SQL workloads when cost is a concern, data and tempDB can be on RAID 5
-Use of SSDs or tiered storage for higher IOPS
-If using VMQ on Hyper-V environment, on the guest OS side you can enable vRSS for processing network load across multiple CPUs
-I prefer using fixed size memory for SQL VMs
-Exclude SQL DB related files (*.mdf, *.ldf, *.ndf, *.bak, *.trn) from on-access antivirus scans
-Disable content indexing on SQL data/ log/ tempDB drives
-Enable lock pages in memory (group policy setting)
-DB IFI (Instant File Initialization)
-Set OS power plan to high performance
-OS performance options - visual effects - adjust for best performance
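
As a minimal PowerShell sketch of the first few points above (the VM name, paths, sizes and memory value are only examples, not sizing recommendations):

#Fixed size VHDX disks for data and log files
New-VHD -Path "E:\VM\SQL01\Virtual Hard Disks\sql_data.vhdx" -Fixed -SizeBytes 200GB
New-VHD -Path "E:\VM\SQL01\Virtual Hard Disks\sql_log.vhdx" -Fixed -SizeBytes 100GB

#Add a second SCSI controller and spread the data and log disks across controllers
Add-VMScsiController -VMName "SQL01"
Add-VMHardDiskDrive -VMName "SQL01" -ControllerType SCSI -ControllerNumber 0 -Path "E:\VM\SQL01\Virtual Hard Disks\sql_data.vhdx"
Add-VMHardDiskDrive -VMName "SQL01" -ControllerType SCSI -ControllerNumber 1 -Path "E:\VM\SQL01\Virtual Hard Disks\sql_log.vhdx"

#Fixed (static) memory for the SQL VM
Set-VM -Name "SQL01" -StaticMemory -MemoryStartupBytes 16GB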

Wednesday, October 26, 2016

3 Node Hyper-V 2012 R2 Cluster Design

The diagram below shows a traditional 3-tier highly available cluster with 3 Hyper-V nodes and 2 shared storage nodes (storage in Active-Passive/ Active-Active mode), all connected via network switches.

Compute

*Hyper-V nodes are running on DELL PowerEdge R630 rack servers

Networking

*Shared storage is accessed via MPIO over two separate VLANs (61 and 62)
*VM traffic is over NIC teaming (Switch independent and dynamic)
*Live migration/ cluster network is also teamed together

Storage

*We are using DELL PowerEdge R320 with Open-E DSS V7 as storage nodes
*Each node has 8 x 10K SAS drives with RAID 5
*Storage can be either in Active-Passive/ Active-Active cluster mode
*In Active-Passive mode, only one of the storage nodes will be active. That means resources of only one storage node will be utilized at a time
*In Active-Active mode, both servers will be active and serving storage traffic 


Tuesday, August 9, 2016

Hyper-V VM deployment using powershell and VHDX templates

The following PowerShell script can be used to deploy virtual machines on a Hyper-V host.

CODE:

#Start
#VM name
[string]$vmname = Read-Host "Name of VM"
$vmcheck = Get-VM -name $vmname

#To check for duplicate VM on the host
if(!$vmcheck)
{
Write-Host "Above warning can be ignroed as there is no duplicate VM. Please proceed and enter following details. `n"
[int32]$gen = Read-Host "Generation type"
[int32]$cpu = Read-Host "Number of vCPU"
[string]$vmpath = Read-Host "Enter path for VM configuration files (Eg: E:\VM)"
[string]$dynamic = $null

while("yes","no" -notcontains $dynamic)
{
$dynamic = Read-Host "Will this VM use dynamic memory? (yes/no)"
}

#Dynamic memory parameters
if($dynamic -eq "yes")
{
[int64]$minRAM = Read-Host "Minimum memory (MB)"
[int64]$maxRAM = Read-Host "Maximum memory (MB)"
[int64]$startRAM = Read-Host "Starting memory (MB) [Note: Specify value between $minRAM and $maxRAM]"
$minRAM = 1MB*$minRAM
$maxRAM = 1MB*$maxRAM
$startRAM = 1MB*$startRAM

#Creating the VM with dynamic RAM
New-VM -Name $vmname -Path $vmpath -Generation $gen
Set-VM -Name $vmname -DynamicMemory -MemoryStartupBytes $startRAM -MemoryMinimumBytes $minRAM -MemoryMaximumBytes $maxRAM
}

else
{
#Creating the VM with static RAM
[int64]$staticRAM = Read-Host "Static memory (MB)"
$staticRAM = 1MB*$staticRAM
New-VM -Name $vmname -Path $vmpath -Generation $gen -MemoryStartupBytes $staticRAM
}

#Setting VM auto start to none and auto stop to shutdown
Set-VM -Name $vmname -ProcessorCount $cpu -AutomaticStartAction Nothing -AutomaticStopAction ShutDown

#Creating VM hard disk directory
New-Item -path $vmpath\$vmname -name "Virtual Hard Disks" -type directory

#Enabling processor compatibility configuration for migration
Set-VMProcessor $vmname -CompatibilityForMigrationEnabled $true
}#vmcheck ends here

else
{
Write-Host "A VM named $vmname already exists"
}
#End

Now the VM is created, but it doesn't have a virtual hard disk (VHDX file) yet. Assuming you already have a sysprepped VHDX template, copy that template to the virtual hard disk folder of the VM that you just created and rename it as per your standard. Now attach the disk to the SCSI controller if Gen 2, or to the IDE controller if Gen 1. Change the boot order and select the hard drive as the first boot entry. Connect the NIC to a vSwitch. Now you can start your VM. A minimal sketch of these finishing steps is given below.
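
For a generation 2 VM, the finishing steps could look like the sketch below. The template path and vSwitch name are assumptions, and it reuses the $vmname and $vmpath variables from the script above.

#Copy the sysprepped template disk into the VM folder and attach it to the SCSI controller
Copy-Item "E:\Templates\template.vhdx" -Destination "$vmpath\$vmname\Virtual Hard Disks\$vmname.vhdx"
Add-VMHardDiskDrive -VMName $vmname -ControllerType SCSI -Path "$vmpath\$vmname\Virtual Hard Disks\$vmname.vhdx"

#Set the new disk as the first boot device (generation 2 only)
Set-VMFirmware -VMName $vmname -FirstBootDevice (Get-VMHardDiskDrive -VMName $vmname)

#Connect the NIC to a vSwitch and start the VM
Connect-VMNetworkAdapter -VMName $vmname -SwitchName "vSwitch01"
Start-VM -Name $vmname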


Reference:

techthoughts
starwindsoftware

Saturday, July 9, 2016

Anatomy of Hyper-V cluster debug log

  • Get-ClusterLog dumps the events to a text file
  • Location: C:\Windows\Cluster\Reports\Cluster.log
  • It captures last 72 hours log
  • Cluster log is in GMT (because of geographically spanned multi-site clusters)
  • Usage: Get-ClusterLog -TimeSpan x (gives the last "x" minutes of logs)
  • You can also set the levels of logs
  • Set-ClusterLog -Level 3 (level 3 is default)
  • It can be from level 0 to level 5 (increasing level of logging has performance impact)
  • Level 5 will provide the highest level of detail
  • Log format:
    [ProcessID] [ThreadID] [Date/Time] [INFO/WARN/ERR/DBG] [ResourceType] [ResourceName] [Description]
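
A quick sketch of typical usage (the destination folder is just an example):

#Dump the last 15 minutes of the cluster log from all nodes to a local folder
Get-ClusterLog -TimeSpan 15 -Destination C:\Temp

#Increase the logging detail (0 to 5, default is 3)
Set-ClusterLog -Level 5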

Troubleshooting Live Migration issues on Hyper-V

  1. Check whether enough resources (CPU, RAM) are available at the destination host
  2. Make sure all nodes in the cluster follow same naming standard for vSwitches
  3. Check whether NUMA spanning is enabled or not. If NUMA spanning is disabled, the VM must fit entirely within a single physical NUMA node, or it will not start, be restored or migrated
  4. Constrained delegation should be configured for all servers in the cluster if you are using the Kerberos authentication protocol for live migration
  5. Check that the live migration setting is enabled in Hyper-V settings
  6. Verify Hyper-V-High-Availability logs in event viewer
  7. Finally check cluster debug log (Get-Clusterlog -timespan) in C:\Windows\Cluster\Reports\Cluster.log 

How Live Migration works on Hyper-V

1.Live migration setup

  • Source host  creates TCP connection with destination host
  • VM configuration data is transferred to destination host
  • A skeleton VM is setup at destination host
  • Physical memory is allocated to that VM
2.Memory pages are transferred from the source to destination host

  • The in-use memory (working set) of the VM will be transferred first
  • Default page size is 4 KB
  • All utilized pages will be copied to destination
  • Modified pages are tracked by source and marked as being modified
  • Several iterations of the copy process will take place
3.Remaining modified pages will be transferred to destination host

  • VM is then registered and the device state is transferred
  • Fewer modified pages implies a faster migration
  • Total working set is copied to destination
4.Move storage handle from source to destination

  • Until this step, the VM at the destination host is not online
5.Once control of storage is transferred, VM will be online and resumed at destination

  • Now the VM is completely migrated and running on destination host
6.Network cleanup

  • A message is sent to the physical network switch, causing it to relearn the MAC address of the migrated VM

Tuesday, March 29, 2016

Hyper-V on Nutanix

This article briefly explains Hyper-V on the Nutanix virtual computing platform. The figure below shows an 'N' node Nutanix architecture, where each node is an independent server unit with a hypervisor, processor, memory and local storage (a combination of SSD and HDD). Along with this there is a CVM (controller VM) through which storage resources are accessed.

'N' node Hyper-V over Nutanix architecture
Local storage from all the nodes is combined to form a virtualized and unified storage pool by the Nutanix Distributed File System (NDFS). This can be considered an advanced NAS which delivers a unified storage pool across the cluster, with features like striping, replication, error detection, failover, automatic recovery etc. From this pool, shared storage resources are presented to the hypervisors using containers.


NDFS - Logical view of storage pool and containers


As mentioned above, each Hyper-V node has its own CVM which is shown below.
Hyper-V node with a CVM

PRISM console

View of storage pool in PRISM
View of containers in PRISM
Containers are mounted to Hyper-V as SMB 3.0 based file shares where virtual machine files are stored. This is shown below.

SMB 3.0 share path to store VHDs and VM configuration files

Once the share path is given properly as mentioned above, you can create virtual machines on your Hyper-V server.
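
As a minimal sketch, creating a VM directly on such a container share could look like this (the share path and VM name are placeholders):

#Create a VM whose configuration and disks live on the Nutanix SMB 3.0 container share
New-VM -Name "testvm01" -MemoryStartupBytes 4GB -Generation 2 -Path "\\nutanix-cluster\container1"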

Saturday, November 28, 2015

Shared Nothing Live Migration

Shared Nothing Live Migration is a Hyper-V 3.0 feature that helps us live migrate virtual machines from one Hyper-V server to another without shared storage or cluster membership.
 
Note : Failover clustering provides HA, but shared nothing live migration is a mobility solution that gives flexibility in a planned movement of VMs between Hyper-V hosts without downtime.

Hyper-V settings for Live migrations 

As a prerequisite for this, we need to standardize network connectivity on the Hyper-V host machines (e.g. vSwitches should have the same names for VM traffic, iSCSI traffic etc.). For the shared nothing live migration traffic we can use a separate VLAN (say, VLAN 90) so that it won't affect the local LAN.

Separate VLAN for live migration traffic


Also we need to configure constrained delegation on Hyper-V servers to use Kerberos authentication protocol when managing the servers remotely. This is shown below.

Use Kerberos

Delegation to specified services
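
The equivalent PowerShell, as a minimal sketch (the VM name, destination host and storage path are assumptions):

#Enable live migration and use Kerberos authentication on each Hyper-V host
Enable-VMMigration
Set-VMHost -VirtualMachineMigrationAuthenticationType Kerberos

#Shared nothing live migration of a VM and its storage to another host
Move-VM -Name "testvm01" -DestinationHost "HV-HOST-02" -IncludeStorage -DestinationStoragePath "D:\VM\testvm01"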


 

Saturday, October 24, 2015

Enabling jumbo frame on Hyper-V 2012 R2 Core NIC using powershell

Enabling jumbo frame on a NIC using powershell is shown below :

Enabling jumbo frames using powershell


To list advanced properties of NIC3 : Get-NetAdapterAdvancedProperty -Name "NIC3"

To change jumbo mtu to 9000 : Get-NetAdapterAdvancedProperty -Name "NIC3" -RegistryKeyword "*jumbopacket" | Set-NetAdapterAdvancedProperty -RegistryValue 9000


Creating VHD/ VHDX templates

Templates can save you a lot of time compared to building a server from scratch. Following are the steps to create a VHD/ VHDX template that can be used on Hyper-V servers.

1.Create a virtual machine (say Windows Server 2008 R2 with 80 GB hard drive)
2.Make sure you create fixed disk
3.Install OS
4.Install Windows updates
5.Install Hyper-V integration services
6.Install any applications as per your requirement

Once you are done with all the above steps, it's time to sysprep your machine. This is shown below:

sysprep

Make sure you check the Generalize and Shutdown options. Once the system completes the sysprep operation, it will automatically shut down. You can now take a copy of this VHD disk, rename it as you wish and save it to a location where you keep your templates. Also, make sure to mark this file as read-only so that you can avoid booting it up accidentally.
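
If you prefer the command line over the GUI, the same generalize-and-shutdown run can be done like this (a minimal sketch):

#Generalize the image and shut the VM down once sysprep completes
C:\Windows\System32\Sysprep\sysprep.exe /generalize /oobe /shutdown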

Note : You can create VHD templates for OS earlier than Windows Server 2012 and VHDX templates for Windows Server 2012 and later.

Monday, March 30, 2015

How to run Hyper-V on a VM in Hyper-V

If you try to install the Hyper-V role on a VM in Hyper-V, you will get an error message saying "Hyper-V cannot be installed: A hypervisor is already running". Follow the steps given in this blog to get it installed using PowerShell commands.
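
As a side note, on Windows Server 2016 / Windows 10 and later, nested virtualization is natively supported, and the key step is simply exposing the virtualization extensions to the VM (a minimal sketch; the VM name is a placeholder):

#Expose the host processor's virtualization extensions to the VM, then install the Hyper-V role inside it
Set-VMProcessor -VMName "nested-vm" -ExposeVirtualizationExtensions $true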

Note:
It is NOT recommended in a production environment. Use this only for self study and testing in labs. 

How to run Hyper-V on a VM in ESXi 5.5

You can now virtualize Hyper-V as a VM running on VMware ESXi 5.5. Initially, if you try to install the Hyper-V role, you will get an error message saying "Hyper-V cannot be installed". Follow the steps below to get it installed.

1. Enable CPU/ MMU virtualization in VM settings.


2. Power off the Hyper-V VM and remove it from inventory. Browse the datastore, download the VMX configuration file of the VM and save it to your local machine. Open it using WordPad and change the guestOS line to:

guestOS = “winhyperv”

Save the file and upload it back to the datastore. Now right click on the VMX file and add it to inventory. Power on the VM and install the Hyper-V role. It should now install without any error.

Note :
It is NOT recommended in a production environment. Use this only for self study and testing in labs.