Showing posts with label hyperconverged. Show all posts

Friday, October 23, 2020

VMware PowerCLI 101 - part8 - Working with vSAN

This article explains how to work with vSAN resources using PowerCLI. 

Note: I am using the following versions:
PowerShell: 5.1.14393.3866
VMware PowerCLI: 12.1.0.17009493


Connect to vCenter:
Connect-VIServer <IP of vCenter server>

List all vSAN get cmdlets:
Get-Command Get-Vsan*


vSAN runtime info:
$c = Get-Cluster Cluster01
Get-VsanRuntimeInfo -Cluster $c


vSAN space usage:
Get-VsanSpaceUsage


vSAN cluster configuration:
Get-VsanClusterConfiguration


vSAN disk details:
Get-VsanDisk


View all properties of a disk:
(Get-VsanDisk)[31] | select *


View disk vendor, model, firmware revision, physical location, operational state:
(Get-VsanDisk)[31].ExtensionData
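
Indexing into the result, as in (Get-VsanDisk)[31], only works if the cluster happens to have that many disks. As a minimal sketch of an alternative, the helper below counts cache and capacity disks from whatever objects are piped into it. The function name Get-DiskRoleSummary is hypothetical, and the code assumes each disk object exposes a Boolean IsCacheDisk property, which is how I recall the objects returned by Get-VsanDisk; confirm with Get-VsanDisk | Get-Member before relying on it.

```powershell
# Hypothetical helper: summarise piped disk objects by role.
# Assumption: each object has a Boolean IsCacheDisk property.
function Get-DiskRoleSummary {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $Disk
    )
    begin {
        $cache = 0
        $capacity = 0
    }
    process {
        # Count the disk as cache or capacity depending on its role flag
        if ($Disk.IsCacheDisk) { $cache++ } else { $capacity++ }
    }
    end {
        [pscustomobject]@{
            CacheDisks    = $cache
            CapacityDisks = $capacity
        }
    }
}
```

With a vCenter connection in place this could be called as: Get-VsanDisk | Get-DiskRoleSummary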


vSAN disk group details:
Get-VsanDiskGroup


Get all properties of a disk group:

Saturday, June 30, 2018

Introduction to Nutanix cluster components

In this article I will briefly explain the different components of a Nutanix cluster. The major components are listed below.

Nutanix cluster components
  1. Stargate: Data I/O manager for the cluster.
  2. Medusa: Access interface for Cassandra.
  3. Cassandra: Distributed metadata store.
  4. Curator: Handles MapReduce based cluster management and cleanup.
  5. Zookeeper: Manages cluster configuration.
  6. Zeus: Access interface for Zookeeper.
  7. Prism: Management interface for Nutanix UI, nCLI and APIs.
Stargate
  • Responsible for all data management and I/O operations.
  • It is the main point of contact for a Nutanix cluster.
  • Workflow: Read/write from VM <-> Hypervisor <-> Stargate.
  • Stargate works closely with Curator to ensure data is protected and optimized.
  • It also depends on Medusa to gather metadata and Zeus to gather cluster configuration data.
Medusa
  • Medusa is the Nutanix abstraction layer that sits in front of the DB that holds the cluster metadata.
  • Stargate and Curator communicate with Cassandra through Medusa.
Cassandra
  • It is a distributed, high performance and scalable DB.
  • It stores all metadata about all VMs stored in a Nutanix datastore.
  • It needs verification from at least one other Cassandra node to commit its operations.
  • Cassandra depends on Zeus for cluster configuration.
Curator
  • Curator constantly assesses the environment and is responsible for managing and distributing data throughout the cluster.
  • It does disk balancing and information life cycle management.
  • A Curator master node is elected, and it manages task and job delegation.
  • The master node coordinates periodic scans of the metadata DB and identifies cleanup and optimization tasks that Stargate or other components should perform.
  • It is also responsible for analyzing the metadata; this work is shared across all Curator nodes using a MapReduce algorithm.
Zookeeper
  • It runs on 3 nodes in the cluster.
  • This can be increased to 5 nodes in the cluster.
  • Zookeeper coordinates and distributes services.
  • One Zookeeper node is elected as leader.
  • All Zookeeper nodes can process reads.
  • The leader handles cluster configuration write requests and forwards them to its peers.
  • If leader fails to respond, a new leader is elected.
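
The 3 node and 5 node options make sense once you look at majority voting, which is how a Zookeeper ensemble elects a leader and accepts writes. The sketch below is only the arithmetic, assuming a simple majority quorum; the function name Get-QuorumInfo is hypothetical and not part of any Nutanix tool.

```powershell
# Hypothetical helper: majority quorum arithmetic for an ensemble of N voting nodes.
function Get-QuorumInfo {
    param(
        [Parameter(Mandatory = $true)]
        [ValidateRange(1, 99)]
        [int]$NodeCount
    )
    # A majority is more than half of the voting nodes
    $quorum = [int][math]::Floor([double]$NodeCount / 2) + 1
    [pscustomobject]@{
        Nodes             = $NodeCount
        Quorum            = $quorum
        FailuresTolerated = $NodeCount - $quorum
    }
}

# 3 nodes tolerate 1 failure, 5 nodes tolerate 2
3, 5 | ForEach-Object { Get-QuorumInfo -NodeCount $_ }
```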
Zeus
  • Zeus is the Nutanix library interface which all other components use to access cluster configuration information.
  • It is responsible for cluster configuration and leadership locks.
  • If Zeus goes down, all goes down!
Prism
  • Prism is the central entity for viewing activity inside the cluster.
  • It is the management gateway for administrators to configure and monitor a Nutanix cluster.
  • It also elects a leader node.
  • Prism depends on data stored in Zookeeper and Cassandra.

Note: All the information provided above is based on the Nutanix 4.5 Platform Professional (NPP) administration course.

Thursday, November 30, 2017

Software Defined Storage using ScaleIO

In this article I will briefly explain ScaleIO and the various options available for deploying a ScaleIO software defined storage (SDS) solution.

ScaleIO is a very good option for customers who are moving towards software defined storage and hyperconverged infrastructure. Because ScaleIO supports multiple hypervisors and operating systems (VMware ESXi, Hyper-V, RHEL, Windows etc.), customers with a heterogeneous IT infrastructure get the most benefit out of it. It also offers multiple deployment modes: hyperconverged, two layer and mixed mode. Most of you will be familiar with the term hyperconverged, where compute and storage run together on the same box and you scale both together by adding more nodes to your cluster. Two layer mode is a storage only configuration in which the storage resources scale separately from compute; it is essentially a virtual SAN infrastructure implemented using ScaleIO SDS. Mixed mode usually occurs while transitioning from a storage only configuration to hyperconverged.

Now I will give an overview of how to deploy ScaleIO on VMware and RHEL platforms. ScaleIO has tight integration with VMware, and a PowerShell script and a vCenter plugin are provided to simplify the deployment. On RHEL you can use the Installation Manager (IM), which is part of the ScaleIO Gateway, for quick and easy deployment of a ScaleIO cluster. Customers have multiple options to consume ScaleIO. They can buy the ScaleIO software alone and use commodity x86 hardware to build the cluster (not a great idea for production deployments, as they have to work out and use validated/qualified hardware and software components themselves to ensure seamless operation and proper support), or they can buy ScaleIO Ready Nodes, which are prevalidated, preconfigured and optimized PowerEdge servers for deploying a ScaleIO cluster. Apart from that there is another offering, VxRack System Flex, which is a rack-scale hyperconverged solution built on Dell EMC PowerEdge servers with integrated Cisco networking and ScaleIO software.

Let's have a look at the major components of ScaleIO. The figure below shows a 5 node hyperconverged ScaleIO cluster running on a highly available VMware platform. The three main components of ScaleIO are:

  • SDC - ScaleIO Data Client
  • SDS - ScaleIO Data Server
  • MDM - Meta Data Manager


In this scenario, all 5 nodes have ESXi installed and clustered, and every node has local hard disks. It is the responsibility of the ScaleIO software to pool the hard disks from all 5 nodes into a distributed virtual SAN.

SDC is a lightweight driver which is responsible for presenting LUNs provisioned from the ScaleIO system. SDS is responsible for managing the local disks present in each node. MDM contains all the metadata required for system operation and configuration changes. It manages the metadata, SDC, SDS, system capacity, device mappings, volumes, data protection, errors/failures, rebuild and rebalance operations etc. ScaleIO supports 3 node and 5 node MDM clusters. The figure above shows a 5 node MDM cluster: there are 3 manager MDMs, of which one is the master and two are slaves, plus two Tie-Breakers (TB) which help in deciding the master MDM by maintaining a majority in the cluster. In a production environment with 5 or more nodes, it is recommended to use a 5 node MDM cluster as it can tolerate 2 MDM failures.

ScaleIO uses a distributed two way mesh mirror scheme to protect data against disk or node failures. For QoS, you can limit bandwidth as well as IOPS for each volume provisioned from a ScaleIO cluster. Regarding scalability, a single ScaleIO cluster supports up to 1024 nodes. In very large ScaleIO deployments it is highly recommended to configure separate protection domains and fault sets to minimize the impact of multiple simultaneous failures.
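
A practical consequence of two way mirroring is that every block is stored twice, so usable capacity is at most half of the raw capacity. The sketch below is a rough estimate only. The function name Get-MirroredUsableCapacity and the SparePercent parameter are hypothetical: the spare value stands for whatever share of raw capacity you reserve for rebuilds in your own design, and the node count and disk sizes in the example are made up.

```powershell
# Hypothetical helper: rough usable capacity under two way mirroring.
# Assumption: SparePercent is the share of raw capacity kept free for rebuilds.
function Get-MirroredUsableCapacity {
    param(
        [Parameter(Mandatory = $true)]
        [double]$RawTB,
        [ValidateRange(0, 100)]
        [double]$SparePercent = 0
    )
    # Remove the spare reservation, then halve because each block has two copies
    $RawTB * (100 - $SparePercent) / 200
}

# Example with made up numbers: 5 nodes x 4 TB raw, 20 percent kept as spare
Get-MirroredUsableCapacity -RawTB 20 -SparePercent 20   # -> 8
```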

You can download ScaleIO software for free to test and play around in your lab environment.

References:
Dell EMC ScaleIO Basic Architecture
Dell EMC ScaleIO Design Considerations And Best Practices
Dell EMC ScaleIO Ready Node

Thursday, August 31, 2017

Benchmarking hyper-converged vSphere environment using HCIBench

In this article I will briefly explain how to use HCIBench for storage benchmarking of vSphere deployments. I recently had a chance to try HCIBench on a ScaleIO cluster running on a 3 node ESXi 6.0 cluster. The latest version of the tool can be downloaded from here.

I conducted the tests with HCIBench version 1.6.5.1. The installation steps are very straightforward. You can install it on one of the ESXi nodes using the "Deploy OVF Template" option.

*Browse and select the template.
*Provide a name and select location.
*Select a resource to host HCIBench.
*Review details and click Next.
*Accept license agreements.
*Select a datastore to store HCIBench files.
*Select networks (as shown below).


Note:
Public network is the network through which you can access management web GUI of HCIBench.
Private network is the network where HCIBench test VMs will be deployed.

*Provide management IP details (I am using static IP)  and root password for HCIBench  as shown below.


*Click Next and Finish.
*Once the deployment is complete, you can power on the HCIBench VM.
*Web GUI can be accessed by: <IP address>:8080
*Provide root credentials to access the configuration page.

In the configuration page fill the necessary details as given below.

*vCenter IP.
*vCenter username and password.
*Datacenter name.
*Cluster name.
*Network name (the network on which HCIBench test VMs will be deployed).

Note:
The network on which the HCIBench test VMs are deployed should have a DHCP server to provide IPs to the test VMs. If you do not have a DHCP server available in that network, select "Set Static IP for Vdbench VMs" as shown below. In this case HCIBench will provide IPs to the test VMs through the private network that you chose in the earlier installation step. Here, I don't have a DHCP server in the VMNet-SIO network, so I connected eth1 of HCIBench to VMNet-SIO (private network) and checked the option "Set Static IP for Vdbench VMs".

*Provide datastore names.


*Provide ESXi host username and password.
*Total number of VMs.
*Number of data disks per VM.
*Size of each data disk.

Note:
Make sure to keep a ratio between the total number of datastores, number of ESXi nodes and the number of test VMs. Here I have 3 ESXi nodes and 3 datastores in the cluster. So if I deploy 9 VMs for the test, each ESXi node will host 3 test VMs, one on each datastore. 
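
The ratio described in the note above can be sketched as a simple round robin plan. This only illustrates the arithmetic and is not how HCIBench itself places VMs; the function name New-TestVmPlan and the host, datastore and VM names are hypothetical.

```powershell
# Hypothetical helper: spread test VMs evenly over hosts and datastores.
function New-TestVmPlan {
    param(
        [Parameter(Mandatory = $true)][string[]]$Hosts,
        [Parameter(Mandatory = $true)][string[]]$Datastores,
        [Parameter(Mandatory = $true)][int]$VmCount
    )
    for ($i = 0; $i -lt $VmCount; $i++) {
        # Hosts rotate on every VM; the datastore advances after each full pass over the hosts
        $pass = ($i - ($i % $Hosts.Count)) / $Hosts.Count
        [pscustomobject]@{
            VM        = "testvm-$($i + 1)"
            EsxiHost  = $Hosts[$i % $Hosts.Count]
            Datastore = $Datastores[$pass % $Datastores.Count]
        }
    }
}

# 3 hosts, 3 datastores, 9 VMs: each host gets 3 VMs, one on each datastore
New-TestVmPlan -Hosts 'esx1', 'esx2', 'esx3' -Datastores 'ds1', 'ds2', 'ds3' -VmCount 9
```

If the VM count is not a multiple of hosts times datastores, some hosts or datastores carry more load than others, which skews the results.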


*Give a name for the test.
*Select the test parameters.
*You can use the Generate button to customize the parameters (shown in next screenshot).


*Input parameters as per your requirement and submit.


*If you are doing this the very first time, you will find the below option at the end of the page to upload Vdbench ZIP file. You can upload the file if you have one, or you can download it from the Oracle website using the Download button.
*After providing necessary inputs, you can save the configuration and validate. If the validation passes, you can click on test to begin the benchmarking test.


*Once the test is complete, click on the Result button and it will take you to the directory where result files are stored. 

Hope the article was useful to you. Happy benchmarking. Cheers!

Thursday, May 11, 2017

Benchmarking Hyper-Converged Storage Spaces Direct (S2D) Cluster

Finally I managed to write some PowerShell code as I am completely inspired by my new PS geek friends. The scripts can be used to generate load and stress test your S2D as well as traditional Hyper-V 2016 cluster. These are functionally similar to VM Fleet. There are 7 scripts in total.
  1. create_clustered_testvms.ps1 : this script creates virtual machines on the cluster nodes which will be used for stress testing
  2. start_all_testvms.ps1 : start all those clustered VMs that you just created
  3. io_stress_trigger.ps1 : to trigger IO stress on all VMs using diskspd 
  4. rebalance_all_testvms.ps1 : this script is originally from winblog, I just made a small change so that it will use live migration while moving the clustered VMs back to their owner node
  5. watch_iops_live.ps1 : to view read, write and total iops of each CSV disk on the S2D cluster 
  6. stop_all_testvms.ps1 : to shutdown all the clustered VMs
  7. wipeoff_testvms.ps1 : to delete all VMs that you created using the first script
PS Version which I am using is given below.


Now I will briefly explain how to use these scripts, along with a few prerequisites. Say you have a 4 node hyper-converged S2D cluster.


As there are 4 nodes, you should have 4 cluster shared volumes (CSV). Assign one CSV to each cluster node as shown below. This means that VMs created on NODE-01 will be placed on CSV Volume AA, VMs created on NODE-02 will be placed on Volume BB, and so on.


Volume AA is C:\ClusterStorage\Volume1


Similarly,
Volume BB is C:\ClusterStorage\Volume2
Volume CC is C:\ClusterStorage\Volume3
Volume DD is C:\ClusterStorage\Volume4
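
All the scripts rely on this convention: the first node returned by Get-ClusterNode uses Volume1, the second uses Volume2, and so on. A minimal sketch of that mapping is given below so you can check it before creating any VMs. The function name Get-NodeVolumeMap is hypothetical, and the node names in the example are placeholders.

```powershell
# Hypothetical helper: show which CSV path each node's testvms will use.
# Mirrors the "C:\ClusterStorage\Volume$i" convention used by the scripts.
function Get-NodeVolumeMap {
    param(
        [Parameter(Mandatory = $true)][string[]]$NodeNames,
        [string]$Root = 'C:\ClusterStorage'
    )
    for ($i = 1; $i -le $NodeNames.Count; $i++) {
        [pscustomobject]@{
            Node   = $NodeNames[$i - 1]
            VmPath = "$Root\Volume$i"
        }
    }
}

# Placeholder names; on a cluster node you could pass (Get-ClusterNode).Name instead
Get-NodeVolumeMap -NodeNames 'NODE-01', 'NODE-02', 'NODE-03', 'NODE-04'
```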


Your cluster shared volumes are ready now. Create 2 folders inside Volume1 as shown below.


Copy all the 7 PS scripts to scripts folder. Code of each script is given at the end.


You now need a template VHDX and that needs to be copied to template folder.
Note: The file should be named "template" (template.vhdx).


This template is nothing but a Windows Server 2016 VM created on a dynamically expanding disk. So you just have to create a VM with a dynamically expanding VHDX, install Windows Server 2016 and set the local administrator password to "Pass1234". Download diskspd from Microsoft, unzip it and copy diskspd.exe to the C drive of the VM you just created.


Shut down the VM. There is no need to sysprep it. Copy the VHDX disk of the VM to the template folder and rename it to "template". The disk will be around 9.5 GB in size. Once the template is copied, you are all set to start.

Step 1: 
Run create_clustered_testvms.ps1 
This will create clustered testvms on each of the nodes. It will be done in such a way that VMs on NODE-01 will be stored on Volume1, VMs on NODE-02 will be stored on Volume2 and so on 

Step 2:
Run start_all_testvms.ps1 
This will start all the testvms

Step 3:
Wait for a few seconds to ensure all the testvms are booted properly; then run io_stress_trigger.ps1 and provide necessary input parameters 
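
To make the input parameters concrete, the sketch below builds the same diskspd argument list that io_stress_trigger.ps1 passes inside each testvm. The defaults are the example values suggested by the script's prompts. The function name Get-DiskSpdArguments is hypothetical; the flags themselves are the ones used in the script.

```powershell
# Hypothetical helper: assemble the diskspd arguments used by io_stress_trigger.ps1.
function Get-DiskSpdArguments {
    param(
        [int]$DurationSec = 300,
        [string]$BlockSize = '4K',
        [int]$WritePercent = 20,
        [int]$OutstandingIO = 16,
        [int]$Threads = 2
    )
    @(
        "-b$BlockSize"        # block size
        "-d$DurationSec"      # test duration in seconds
        "-t$Threads"          # threads per target file
        "-o$OutstandingIO"    # outstanding IOs per thread
        '-h'                  # disable software caching and hardware write caching
        '-r'                  # random IO
        "-w$WritePercent"     # percentage of writes
        '-L'                  # collect latency statistics
        '-Z500M'              # 500 MB random write source buffer
        '-c38G'               # create a 38 GB test file
        'D:\io_stress.dat'    # target file on the test disk
    )
}

(Get-DiskSpdArguments) -join ' '
# -b4K -d300 -t2 -o16 -h -r -w20 -L -Z500M -c38G D:\io_stress.dat
```

The 38 GB file fits inside the 40 GB fixed test disk that create_clustered_testvms.ps1 attaches as drive D.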



Step 4:
You can watch IOPS of the cluster using watch_iops_live.ps1


If you would like to live migrate some testvms while the stress test is running you can try it and observe the IO variations. But before running the io_stress_trigger.ps1 again you have to move/ migrate all those testvms back to their preferred owners. This can be done using rebalance_all_testvms.ps1 .  

If any testvms are not running on their preferred owner, then io_stress_trigger.ps1 will fail for those VMs. Here, testvms running on NODE-01 have NODE-01 as their preferred owner, and similarly for all other testvms. So you have to make sure all the testvms are running on their preferred owner before starting the IO stress script.

Use stop_all_testvms.ps1 to shut down all the clustered testvms that you created in step 1. To delete all the testvms, you can use wipeoff_testvms.ps1.

NOTE: While running scripts 2, 3, 6 and 7, please make sure all the testvms are running on their preferred owner. Use rebalance_all_testvms.ps1 to move all testvms back to their preferred owners! Also, please run all these scripts in PowerShell with elevated privileges after logging in directly to any of the cluster nodes.

All the code is given below. It might not be optimal but I am pretty sure it works! Cheers!
--------------------------------------------------------------------------------------------------------------------------

#BEGIN_create_clustered_testvms.ps1
#Get cluster info
$Cluster_name = (Get-Cluster).name
$Nodes_name = (Get-ClusterNode).name
$Node_count = (Get-ClusterNode).count

#Input VM config (Read-Host returns strings, so cast to numbers before doing arithmetic)
[int]$VM_count = Read-Host "Enter number of VMs/ node"
$Cluster_VM_count = $VM_count*$Node_count
[int64]$RAM = Read-Host "Enter memory for each VM in MB Eg: 4096"
$RAM = 1MB*$RAM
[int]$CPU = Read-Host "Enter CPU for each VM"

#Creds to Enter-PSSession
$pass = convertto-securestring -asplaintext -force -string Pass1234
$cred = new-object -typename system.management.automation.pscredential -argumentlist "administrator", $pass

#Loop for each node in cluster
for($i=1; $i -le $Node_count; $i++){
    $VM_path = "C:\ClusterStorage\Volume$i"
    $Node = $Nodes_name[$i-1]

    #Remote session to each node
    $S1 = New-PSSession -ComputerName $Node -Credential $cred

    #Loop for creating new testvms on each node
    for($j=1; $j -le $VM_count; $j++){
        $VM_name = "testvm-$Node-$j"
        new-vm -name $VM_name -computername $Node -memorystartupbytes $RAM -generation 2 -Path $VM_path
        set-vm -name $VM_name -ProcessorCount $CPU -ComputerName $Node
        New-Item -path $VM_path\$VM_name -name "Virtual Hard Disks" -type directory

        #Copy template disk
        Copy-Item "C:\ClusterStorage\Volume1\template\template.vhdx" -Destination "$VM_path\$VM_name\Virtual Hard Disks" -Verbose
        Add-VMHardDiskDrive -VMName $VM_name -ComputerName $Node -path "$VM_path\$VM_name\Virtual Hard Disks\template.vhdx" -Verbose

        #Create new fixed test disk
        New-VHD -Path "$VM_path\$VM_name\Virtual Hard Disks\test_disk.vhdx" -Fixed -SizeBytes 40GB
        Add-VMHardDiskDrive -VMName $VM_name -ComputerName $Node -path "$VM_path\$VM_name\Virtual Hard Disks\test_disk.vhdx" -Verbose

        Get-VM -ComputerName $Node -VMName $VM_name | Start-VM
        Add-ClusterVirtualMachineRole -VirtualMachine $VM_name
        Set-ClusterOwnerNode -Group $VM_name -owner $Node
        Start-Sleep -S 10

        #Remote session to each testvm on the node to initialize and format test disk (drive D:)
        Invoke-Command -Session $S1 -ScriptBlock {param($VM_name2,$cred2) Invoke-Command -VMName $VM_name2 -Credential $cred2 -ScriptBlock {
            Initialize-Disk -Number 1 -PartitionStyle MBR
            New-Partition -DiskNumber 1 -UseMaximumSize -DriveLetter D
            Get-Volume | where DriveLetter -eq D | Format-Volume -FileSystem NTFS -NewFileSystemLabel Test_disk -confirm:$false
            }} -ArgumentList $VM_name,$cred

        Start-Sleep -S 5
        Get-VM -ComputerName $Node -VMName $VM_name | Stop-VM -Force
        }

    #Close the remote session to this node once all its testvms are created
    Remove-PSSession $S1
    }
#END_create_clustered_testvms.ps1


--------------------------------------------------------------------------------------------------------------------------

#BEGIN_start_all_testvms.ps1
#Get cluster info
$Cluster_name = (Get-Cluster).name
$Nodes_name = (Get-ClusterNode).name
$Node_count = (Get-ClusterNode).count

for($i=1; $i -le $Node_count; $i++){
    $Node = $Nodes_name[$i-1]
    Get-VM -ComputerName $Node -VMName "testvm-$Node*" | Start-VM -AsJob

    }
#END_start_all_testvms.ps1


--------------------------------------------------------------------------------------------------------------------------

#BEGIN_io_stress_trigger.ps1
#Get cluster info
$Cluster_name = (Get-Cluster).name
$Nodes_name = (Get-ClusterNode).name
$Node_count = (Get-ClusterNode).count

#Creds to Enter-PSSession
$pass = convertto-securestring -asplaintext -force -string Pass1234
$cred = new-object -typename system.management.automation.pscredential -argumentlist "administrator", $pass

$time = Read-Host "Enter duration of stress in seconds (Eg: 300)"
$block_size = Read-Host "Enter block size (Eg: 4K)"
$writes = Read-Host "Enter write percentage (Eg: 20)"
$OIO = Read-Host "Enter number of outstanding IOs (Eg: 16)"
$threads = Read-Host "Enter number of threads (Eg: 2)"

#Loop for each node in cluster
for($i=1; $i -le $Node_count; $i++){
    $VM_path = "C:\ClusterStorage\Volume$i"
    $Node = $Nodes_name[$i-1]

    #Remote session to each node
    $S1 = New-PSSession -ComputerName $Node -Credential $cred

    $VM_count = (Get-VM -ComputerName $Node -VMName "testvm-$Node*").Count

    #Loop over the existing testvms on each node
    for($j=1; $j -le $VM_count; $j++){
        $VM_name = "testvm-$Node-$j"

        #Remote session to each testvm
        Invoke-Command -Session $S1 -ScriptBlock {param($VM_name2,$cred2,$time1,$block_size1,$writes1,$OIO1,$threads1) Invoke-Command -VMName $VM_name2 -Credential $cred2 -ScriptBlock {param($time2,$block_size2,$writes2,$OIO2,$threads2)
        C:\diskspd.exe -"b$block_size2" -"d$time2" -"t$threads2" -"o$OIO2" -h -r -"w$writes2" -L -Z500M -c38G D:\io_stress.dat
        } -AsJob -ArgumentList $time1,$block_size1,$writes1,$OIO1,$threads1 } -ArgumentList $VM_name,$cred,$time,$block_size,$writes,$OIO,$threads


        }
    }
#END_io_stress_trigger.ps1


--------------------------------------------------------------------------------------------------------------------------

#BEGIN_rebalance_all_testvms.ps1
$clustergroups = Get-ClusterGroup | Where-Object {$_.IsCoreGroup -eq $false}
 foreach ($cg in $clustergroups)
 {
     $CGName = $cg.Name
     Write-Host "`nWorking on $CGName"
     $CurrentOwner = $cg.OwnerNode.Name
     $POCount = (($cg | Get-ClusterOwnerNode).OwnerNodes).Count
     if ($POCount -eq 0)
     {
         Write-Host "Info: $CGName doesn't have a preferred owner!" -ForegroundColor Magenta
     }
     else
     {
         $PreferredOwner = ($cg | Get-ClusterOwnerNode).Ownernodes[0].Name
         if ($CurrentOwner -ne $PreferredOwner)
         {
             Write-Host "Moving resource to $PreferredOwner, please wait..."
             $cg | Move-ClusterVirtualMachineRole -MigrationType Live -Node $PreferredOwner
         }
         else
         {
             write-host "Resource is already on preferred owner! ($PreferredOwner)"
         }
     }
 }
 Write-Host "`n`nFinished. Current distribution: "

 Get-ClusterGroup | Where-Object {$_.IsCoreGroup -eq $false} 
#END_rebalance_all_testvms.ps1


--------------------------------------------------------------------------------------------------------------------------

#BEGIN_watch_iops_live.ps1
#Get cluster info
$Cluster_name = (Get-Cluster).name
$Nodes_name = (Get-ClusterNode).name

while($true)
{

    [int]$total_IO = 0
    [int]$total_readIO = 0
    [int]$total_writeIO = 0
 
    clear
     
    "{0,-15} {1,-15} {2,-15} {3,-15} {4, -15} {5, -15}" -f "Host", "Total IOPS", "Reads/Sec", "Writes/Sec", "Read Q Length", "Write Q Length"

    for($j=1; $j -le $Nodes_name.count; $j++){

        $Node = $Nodes_name[$j-1]

        $Data = Get-CimInstance -ClassName Win32_PerfFormattedData_CsvFsPerfProvider_ClusterCSVFS -ComputerName $Node | Where Name -like Volume$j

        [int]$T = $Data.ReadsPerSec+$Data.WritesPerSec
     
        "{0,-15} {1,-15} {2,-15} {3,-15} {4,-15} {5, -15}" -f "$Node", "$T", $Data.ReadsPerSec, $Data.WritesPerSec, $Data.CurrentReadQueueLength, $Data.CurrentWriteQueueLength

        $total_IO = $total_IO+$T
        $total_readIO = $total_readIO+$Data.ReadsPerSec
        $total_writeIO = $total_writeIO+$Data.WritesPerSec
        }

    echo `n
    "{0,-15} {1,-15} {2,-15} {3,-15} " -f "Cluster IOPS", "$total_IO", "$total_readIO", "$total_writeIO"

    Start-Sleep -Seconds 3

}
#END_watch_iops_live.ps1

--------------------------------------------------------------------------------------------------------------------------

#BEGIN_stop_all_testvms.ps1
#Get cluster info
$Cluster_name = (Get-Cluster).name
$Nodes_name = (Get-ClusterNode).name
$Node_count = (Get-ClusterNode).count

for($i=1; $i -le $Node_count; $i++){
    $Node = $Nodes_name[$i-1]
    Get-VM -ComputerName $Node -VMName "testvm-$Node*" | Stop-VM -Force -AsJob
    }
#END_stop_all_testvms.ps1

--------------------------------------------------------------------------------------------------------------------------

#BEGIN_wipeoff_testvms.ps1
#Get cluster info
$Cluster_name = (Get-Cluster).name
$Nodes_name = (Get-ClusterNode).name
$Node_count = (Get-ClusterNode).count

#Loop for each node in cluster
for($i=1; $i -le $Node_count; $i++){
    $VM_path = "C:\ClusterStorage\Volume$i"
    $Node = $Nodes_name[$i-1]

    $VM_count = (get-vm -ComputerName $Node -Name "testvm-$Node-*").Count

    #Loop to delete testvm on each node
    for($j=1; $j -le $VM_count; $j++){

        $VM_name = "testvm-$Node-$j"
        $full_path = "$VM_path\$VM_name"

        Get-VM -Computername $Node -VMname $VM_name | stop-vm -force
        Get-ClusterGroup $VM_name | Remove-ClusterGroup -Force -RemoveResources
        Get-VM -Computername $Node -VMname $VM_name | remove-vm -force
        Remove-Item $full_path -Force -Recurse -ErrorAction SilentlyContinue -Verbose
        }

    }
#END_wipeoff_testvms.ps1

--------------------------------------------------------------------------------------------------------------------------



Monday, November 30, 2015

Nutanix : a web-scale hyper converged infrastructure solution for enterprise datacenters

Nutanix is an industry leader in hyper converged infrastructure and software defined storage that is optimized for virtual workloads. You can even think of it as a cluster-in-a-box solution with compute, storage and hypervisor consolidated together into a 1U or 2U enclosure. It is interesting that in a Nutanix architecture there is no RAID and no need for SAN storage either. Storage is totally local, using direct attached local disks (a combination of SSD and SAS disks) for storing data.

What does it look like?


Front and rear side of a Nutanix appliance (eg : NX-1000)


Each Nutanix box contains 4 independent nodes which are clustered together. This is shown in the figure below.

Nutanix box with 4 nodes

Each of these nodes operates independently with its own CPU, RAM, HDDs etc., and all the nodes are clustered together. So each time you want to increase the compute and storage capacity, you can add more boxes (with 1, 2 or 4 nodes depending on the need) to the cluster. The detailed logical architecture of a single node is given below.

Single Nutanix node architecture

You can see that in each node there are SSDs as well as SAS HDDs for storage. There is also a controller VM, which is a virtual storage controller that runs on every node to improve scalability and resiliency while preventing performance bottlenecks. This controller VM is something like a VSA, but it does more than that: it is more intelligent than a traditional VSA and is capable of automated tiering, data locality, de-duplication and much more. All storage controllers in a cluster communicate with each other to form the Nutanix distributed file system. For each read, there are 3 levels of cache: an in-memory cache within each node, then a hot tier (SSDs) and finally a cold tier (SAS HDDs). The hypervisor communicates with the controller VM just like it would communicate with a physical storage controller. When a write operation happens, the VM contacts the virtual storage controller and the data is written first to the local SSDs. To protect it, the data is then replicated to other nodes in the cluster, so that it is always available even if a node fails. We can have RF2 (2 way replication) or RF3 (3 way replication). It is an auto healing system: if a node fails and only one copy of some data is left, the system identifies this automatically using MapReduce style analytics and replicates the data to other nodes.
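
The write path described above (local copy first, then replicas on other nodes) can be illustrated with a small sketch. This is not Nutanix's actual placement algorithm: the function name Get-ReplicaPlacement is hypothetical, the node names are placeholders, and picking the next nodes in list order is simply the easiest rule to show. What it does capture is that RF2 keeps 2 copies and survives 1 node failure, while RF3 keeps 3 copies and survives 2.

```powershell
# Hypothetical helper: pick the nodes that hold the copies of a write.
# Assumption: replicas go to the next nodes in list order, purely for illustration.
function Get-ReplicaPlacement {
    param(
        [Parameter(Mandatory = $true)][string[]]$Nodes,
        [Parameter(Mandatory = $true)][string]$LocalNode,
        [ValidateRange(2, 3)][int]$ReplicationFactor = 2
    )
    $start = [array]::IndexOf($Nodes, $LocalNode)
    if ($start -lt 0) { throw "Node '$LocalNode' is not in the node list." }
    if ($ReplicationFactor -gt $Nodes.Count) { throw "Not enough nodes for RF$ReplicationFactor." }

    # First copy is local (data locality), the rest go to other nodes
    $copies = for ($k = 0; $k -lt $ReplicationFactor; $k++) {
        $Nodes[($start + $k) % $Nodes.Count]
    }
    [pscustomobject]@{
        Copies                = $copies
        NodeFailuresTolerated = $ReplicationFactor - 1
    }
}

Get-ReplicaPlacement -Nodes 'A', 'B', 'C', 'D' -LocalNode 'C' -ReplicationFactor 3
```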

If you want to add more nodes, all you have to do is connect them to the network and power them on; they will be discovered automatically using an auto discovery protocol which runs on top of IPv6. So it is very easy to add a new node to a cluster. You can dynamically expand your cluster resources by adding more boxes without shutting down the cluster. Rolling upgrades can be done without downtime by updating the controller VMs in a cluster one by one. Each node is clustered at the Nutanix architecture level, and you can cluster the nodes at the hypervisor level too (say, a VMware ESXi cluster using vCenter Server), providing a highly available web-scale hyper converged solution.

DELL and Nutanix partnered together and they have introduced DELL XC Series appliances optimized for virtual workloads.

References :
www.nutanix.com