
Friday, September 6, 2024

Revisiting Storage Performance Benchmarking

A few years ago, I had the opportunity to explore the intricacies of storage performance benchmarking using tools like FIO, DISKSPD, and Iometer. Those studies provided valuable insights into the performance characteristics of various storage solutions, shaping my understanding of and approach to storage performance analysis. As I prepare for an upcoming project in this domain, I find it essential to revisit my previous work, reflect on the lessons learned, and share my experiences. This blog post aims to provide a comprehensive overview of my benchmarking journey and the evolving landscape of storage performance studies.


Recent advancements 

The field of storage technology has seen significant advancements in recent years. The rise of NVMe and storage-class memory technologies has redefined high-end storage performance, offering unprecedented speed and efficiency. These advancements highlight the dynamic nature of storage performance benchmarking and underscore the importance of staying current with the latest tools and methodologies.

Challenges

Benchmarking storage performance is not without its challenges. One of the primary difficulties is ensuring a consistent and controlled testing environment, as variations in hardware, software, and network conditions can significantly impact results. Another challenge is the selection of appropriate benchmarks that accurately reflect real-world workloads, which requires a deep understanding of the specific use cases and performance metrics. Additionally, interpreting the results can be complex, as it involves analyzing multiple metrics such as IOPS, throughput, and latency, and understanding their interplay. These challenges necessitate meticulous planning and a thorough understanding of both the benchmarking tools and the storage systems being tested.
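
To make these metrics concrete, here is an example DiskSpd invocation run from PowerShell (the parameter values are arbitrary illustrations rather than a recommended profile, and the test file path is a placeholder):

  # 60-second run: 4 KiB blocks, 70% read / 30% write, random I/O, 4 threads,
  # 32 outstanding I/Os per thread, caching disabled, latency statistics captured,
  # against a 10 GiB test file created by DiskSpd itself
  .\diskspd.exe -c10G -b4K -d60 -t4 -o32 -r -w30 -Sh -L D:\testfile.dat

The interplay between the metrics also becomes easier to reason about with a run like this: at a fixed block size, throughput is roughly IOPS multiplied by block size, so a result of around 50,000 IOPS at 4 KiB corresponds to roughly 200 MB/s, and pushing the queue depth higher typically raises IOPS at the cost of higher latency.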

Prior works

Following are some of the articles on storage benchmarking that I’ve published in the past:

Custom storage benchmarking framework

While there are numerous storage benchmarking tools available, such as VMFleet and HCIBench, I wanted to highlight a custom framework I developed a few years ago. Here are some reasons why we created this custom tool:

  • Great learning experience: It provided valuable insights into how things work.
  • Customization: Being a custom framework, it allows you to add or remove features as needed.
  • Flexibility: You can modify multiple parameters to suit your requirements.
  • Custom test profiles: You can create tailored storage test profiles.
  • No IP assignment needed: There’s no need for IP assignment or DHCP for the stress test VMs.
  • Centralized log collection: It offers centralized log collection for detailed analysis (see the sketch after this list for one way this can work).
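
To illustrate the last two points: PowerCLI can run commands inside a guest and copy files back out through VMware Tools, so the stress test VMs never need an IP address or DHCP. The snippet below is a simplified sketch of that approach, not the actual framework code; the VM naming pattern, credentials, and paths are placeholders:

  # Requires VMware PowerCLI and an existing Connect-VIServer session
  $guestCred = Get-Credential                      # local credentials inside the test VMs
  $testVMs   = Get-VM -Name 'stress-vm-*'          # placeholder naming pattern

  foreach ($vm in $testVMs) {
      # Run DiskSpd inside the guest via VMware Tools (no guest networking required)
      Invoke-VMScript -VM $vm -GuestCredential $guestCred -ScriptType PowerShell `
          -ScriptText 'C:\tools\diskspd.exe -c10G -b4K -d60 -t4 -o32 -r -w30 -Sh -L C:\test\io.dat > C:\test\result.txt'

      # Copy the result file to a central location on the machine running the script
      Copy-VMGuestFile -VM $vm -GuestCredential $guestCred -GuestToLocal `
          -Source 'C:\test\result.txt' -Destination "C:\benchmark-logs\$($vm.Name)-result.txt"
  }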


You can access the scripts and readme on my GitHub repository:

https://github.com/vineethac/vsan_cluster_storage_benchmarking_with_diskspd


Here is an overview.

  • Profile Manifest: All storage test profiles are listed in profile_manifest.psd1. You can define as many profiles as you want (a hypothetical example entry is sketched after this list).
  • VM Template: A Windows VM template should be present in the vCenter server.
  • Benchmarking Manifest: Details of vCenter, cluster name, VM template, number of stress test VMs per host, etc., are provided in benchmarking_manifest.psd1.
  • Deploy Test VMs: deploy_test_vms.ps1 will deploy all the test VMs with pre-configured parameters.
  • Start Stress Test: start_stress_test.ps1 will initiate the storage stress test process for all the profiles mentioned in profile_manifest.psd1 one by one.
  • Log Collection: All log files will be automatically copied to a central location on the machine from which these scripts are run.
  • Cleanup: Use delete_test_vms.ps1 to clean up the stress test VMs from the cluster.
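
For context, a .psd1 manifest is just a PowerShell data file. The fragment below is a hypothetical illustration of what a test profile entry could look like; the key names are invented for this example, so refer to the repository for the actual schema:

  # profile_manifest.psd1 (illustrative only; key names are hypothetical)
  @{
      Profiles = @(
          @{
              Name          = '4K-Random-70Read'
              BlockSize     = '4K'
              WritePercent  = 30
              Threads       = 4
              OutstandingIO = 32
              DurationSec   = 300
              RandomIO      = $true
          },
          @{
              Name          = '64K-Sequential-Write'
              BlockSize     = '64K'
              WritePercent  = 100
              Threads       = 2
              OutstandingIO = 8
              DurationSec   = 300
              RandomIO      = $false
          }
      )
  }

A file like this can be read with Import-PowerShellDataFile, and each entry can then be translated into a DiskSpd command line by the test runner.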


Note:
 These scripts were created about five years ago, and I haven’t had the opportunity to refactor them according to current best practices and new PowerShell scripting standards. I plan to enhance them in the coming months!

This overview should give you a clear understanding of the overall benchmarking process and workflow. I hope it was useful. Cheers!

Saturday, June 30, 2018

Introduction to Nutanix cluster components

In this article I will briefly explain the different components of a Nutanix cluster. The major components are listed below.

Nutanix cluster components
  1. Stargate: Data I/O manager for the cluster.
  2. Medusa: Access interface for Cassandra.
  3. Cassandra: Distributed metadata store.
  4. Curator: Handles MapReduce-based cluster management and cleanup.
  5. Zookeeper: Manages cluster configuration.
  6. Zeus: Access interface for Zookeeper.
  7. Prism: Management interface for Nutanix UI, nCLI and APIs.
Stargate
  • Responsible for all data management and I/O operations.
  • It is the main point of contact for a Nutanix cluster.
  • Workflow: read/write from VM <-> Hypervisor <-> Stargate.
  • Stargate works closely with Curator to ensure data is protected and optimized.
  • It also depends on Medusa to gather metadata and Zeus to gather cluster configuration data.
Medusa
  • Medusa is the Nutanix abstraction layer that sits in front of the database that holds the cluster metadata.
  • Stargate and Curator communicate with Cassandra through Medusa.
Cassandra
  • It is a distributed, high-performance, scalable database.
  • It stores metadata about all VMs stored in a Nutanix datastore.
  • It needs verification from at least one other Cassandra node to commit its operations.
  • Cassandra depends on Zeus for cluster configuration.
Curator
  • Curator constantly monitors the environment and is responsible for managing and distributing data throughout the cluster.
  • It handles disk balancing and information lifecycle management.
  • A Curator master node is elected, and it manages task and job delegation.
  • The master node coordinates periodic scans of the metadata database and identifies cleanup and optimization tasks that Stargate or other components should perform.
  • It is also responsible for analyzing the metadata; this analysis is distributed across all Curator nodes using a MapReduce algorithm.
Zookeeper
  • It runs on three nodes in the cluster.
  • This can be increased to five nodes (an odd number is kept so that a majority quorum is always possible).
  • Zookeeper coordinates and distributes services.
  • One node is elected as the leader.
  • All Zookeeper nodes can process reads.
  • The leader handles cluster configuration write requests and forwards them to its peers.
  • If the leader fails to respond, a new leader is elected.
Zeus
  • Zeus is the Nutanix library interface which all other components use to access cluster configuration information.
  • It is responsible for cluster configuration and leadership logs.
  • If Zeus goes down, all goes down!
Prism
  • Prism is the central entity for viewing activity inside the cluster.
  • It is the management gateway for administrators to configure and monitor a Nutanix cluster.
  • Prism also elects a leader node.
  • Prism depends on data stored in Zookeeper and Cassandra.

Note: All the information provided above is based on the Nutanix 4.5 Platform Professional (NPP) administration course.

Tuesday, December 27, 2016

Highly Available SOFS On Clustered Storage Spaces

This article briefly explains the design and steps required to deploy a highly available scale-out file server (SOFS) on clustered storage spaces using Windows Server 2012 R2.

Note: Here we are using only a single JBOD. But there are several Storage Spaces certified JBODs available in the market (e.g., DataON) that are enclosure aware. That means when you use multiple JBODs together, you also get data redundancy at the JBOD level.


  1. JBOD is connected to both file servers using shared SAS HBA connectors
  2. MPIO is enabled on both servers (SERVER 01/ 02)
  3. Make sure JBOD disks are available to both servers
  4. In this case we have a total of 24 disks (6 SSD and 18 SAS HDD)
  5. Create new storage pool
    1. By default, all available disks are included in a pool named the primordial pool
    2. If the primordial pool is not listed under storage spaces, it indicates that the storage doesn't meet the requirements of storage spaces
    3. If you select the primordial pool, all available disks will be listed under physical disks
    4. Select option create new storage pool
    5. Give it a name
    6. Select physical disks you want to be in the pool
    7. Hot spares can be configured too at this stage
  6. Verify new storage pool is listed under storage pools
  7. Next step is to create virtual disks (these are not VHDX files; they are called spaces)
    1. Now select the new storage pool that you have just created in step 5
    2. Click on tasks, select new virtual disk
    3. Give a name
    4. Select the storage layout (Mirror, Parity, Simple)
    5. Choose provisioning type (Thin, Fixed)
    6. Specify size
  8. Now create volumes
    1. Right click the virtual disk (space) that you have just created in the previous step and select new volume
    2. Select server name and then the virtual disk name
    3. Specify volume size, drive letter, file system type, allocation unit size and volume label 
  9. Create a failover cluster using the two file servers
    1. Provide a cluster name
    2. Volumes that you created in step 8 will be listed as available volumes
    3. Add those as cluster shared volumes
    4. Now it appears in C:\ClusterStorage\
  10. Next step is to create a highly available SOFS
    1. In Failover Cluster Manager: Roles > New Clustered Role > File Server > File Server for scale-out application data (SOFS)
    2. Provide client access point name (eg: SMB-FS01)
    3. Right click on SOFS role in failover cluster manager and select add shared folder
    4. Choose the SMB Share - Applications profile
    5. Provide a name (eg: DATA01)
    6. Local path to share (C:\ClusterStorage\Volume1\Shares\DATA01)
    7. Remote path to share (\\SMB-FS01\DATA01)
Now you have a highly available file share (\\SMB-FS01\DATA01), built on clustered storage spaces, where you can store your virtual machine files.
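
If you prefer PowerShell over the GUI wizards, the same flow can be approximated with the Storage and Failover Clustering cmdlets. The snippet below is a simplified sketch of the steps above; the pool, virtual disk, cluster, IP address, share, and account names, as well as the mirror layout, fixed provisioning, and 2TB size, are example values rather than the configuration used in this setup:

  # 1. Create a storage pool from all disks that are eligible for pooling
  $poolDisks = Get-PhysicalDisk -CanPool $true
  New-StoragePool -FriendlyName 'Pool01' `
      -StorageSubSystemFriendlyName '*Storage Spaces*' -PhysicalDisks $poolDisks

  # 2. Create a virtual disk (a "space"), then partition and format it
  New-VirtualDisk -StoragePoolFriendlyName 'Pool01' -FriendlyName 'VD01' `
      -ResiliencySettingName Mirror -ProvisioningType Fixed -Size 2TB
  $disk = Get-VirtualDisk -FriendlyName 'VD01' | Get-Disk
  Initialize-Disk -Number $disk.Number -PartitionStyle GPT
  New-Partition -DiskNumber $disk.Number -UseMaximumSize -AssignDriveLetter |
      Format-Volume -FileSystem NTFS -NewFileSystemLabel 'Data01' -Confirm:$false

  # 3. Build the failover cluster from the two file servers and add the volume as a CSV
  New-Cluster -Name 'FS-CLUSTER' -Node 'SERVER01','SERVER02' -StaticAddress '192.168.10.50'
  Get-ClusterAvailableDisk | Add-ClusterDisk
  Add-ClusterSharedVolume -Name 'Cluster Disk 1'    # name as shown in Failover Cluster Manager

  # 4. Add the SOFS role and create a continuously available SMB share
  Add-ClusterScaleOutFileServerRole -Name 'SMB-FS01'
  New-Item -ItemType Directory -Path 'C:\ClusterStorage\Volume1\Shares\DATA01' | Out-Null
  New-SmbShare -Name 'DATA01' -Path 'C:\ClusterStorage\Volume1\Shares\DATA01' `
      -FullAccess 'DOMAIN\HyperV-Hosts' -ContinuouslyAvailable $true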

Reference: Microsoft 

Monday, December 5, 2016

Generic Storage System Architecture


The above diagram shows a generic stand-alone storage system architecture, where a storage OS is installed on a bare-metal server, thus making it a storage server. Here I am using an enterprise-class storage OS named Open-E DSS V7, installed on a Dell PowerEdge R720xd. The R720xd can have up to twelve 3.5" drives at the front plus two 2.5" drives at the rear. In the diagram, the last two disks (Disk Group03) are the 2.5" drives installed at the rear, used as the OS drive in RAID1. Apart from that, we have 7.2K and 15K SAS disks that are grouped into two RAID groups. Comparing disk IOPS, 'Disk Group01' can be considered fast and 'Disk Group02' slow.

As I haven't mentioned the size of each SAS disk, let's assume a 10TB RAID5 virtual disk (VD01) is created using 'Disk Group01' and a 12TB RAID5 VD (VD02) is created using 'Disk Group02'. You can configure hot spares for each disk group if you have additional disks. As mentioned above, for OS installation we have created a 10GB RAID1 VD using 'Disk Group03'. After installation of the OS (Open-E DSS V7), it scans and shows the 10TB and 12TB VDs as available storage units.
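
As a quick illustration of where such capacities could come from (the disk counts and sizes here are purely assumed, since they aren't specified above): RAID5 usable capacity is roughly (n − 1) × disk size, so six 2TB disks would yield about (6 − 1) × 2TB = 10TB usable, while seven 2TB disks would yield (7 − 1) × 2TB = 12TB.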

Now the next step is to create volume groups. Here we created two volume groups (VG00 and VG01) to differentiate fast and slow storage. 

  • VG00 uses VD01
  • VG01 uses VD02
Once volume groups are created, you can now carve out LUNs separately based on your requirements. For example, if you want a LUN that is going to be used as a datastore to store your virtual machines, then you can create it on VG00 (fast), or if you need it for storing some general purpose backup files, then you can create it on VG01 (slow). So depending on your requirement you can decide where to create your LUN.

Note: Here I classified the RAID disk groups based on speed. You could also divide them based on reads and writes, choosing a RAID10 disk group for write-intensive operations and a RAID5 disk group for reads. They can even be divided based on access type, meaning one disk group exclusively for sequential access (e.g., SQL logs) and another for random access (SQL data, VM datastores, etc.).