Showing posts with label PowerFlex. Show all posts

Thursday, August 1, 2024

A decade of tech - My professional journey so far

Laying the Groundwork

My professional career commenced in February 2014, as a Trainee IT Services Engineer at Alamy Images. During my initial days, I was tasked with daily maintenance activities such as running tape backups, setting up Active Directory user accounts, mailboxes, and desktops for new employees. I also handled general IT support, troubleshooting various user issues within the organization.

After a few months, I had the opportunity to set up a lab infrastructure project using old decommissioned servers as part of a continuous learning initiative. This hands-on experience involved racking, stacking, and cabling physical servers, installing and configuring ESXi and Hyper-V hypervisors, FreeNAS storage servers, and deploying highly available clusters. Additionally, I gained exposure to configuring L2 network switches. This project significantly contributed to building my IT infrastructure foundation.

A year later, I was promoted to Junior IT Services Engineer, where I focused on virtualization projects. I spearheaded the migration of over 20 Dev/ Test/ UAT virtual machines from VMware to a Hyper-V cluster, enhancing system flexibility and cost-efficiency. I deployed a high-availability Hyper-V failover cluster in production and contributed to the planning and execution of an iSCSI storage server migration project.

Beyond virtualization, I worked on network infrastructure, carrying out an L2 switch replacement and upgrade project with minimal operational disruption. Furthermore, I assisted in capacity planning initiatives to optimize resource utilization across both physical and virtual environments. These experiences refined my technical skills and problem-solving abilities. During this time, I developed a passion for infrastructure management and optimization, shaping my future career path.

From Junior IT Services Engineer to Storage Solutions Engineer

In January 2017, I transitioned to a Systems Development Engineer role at Dell EMC, specializing in Solutions Engineering. This marked a significant career shift as I immersed myself in the world of storage and virtualization solutions integration/development.

My daily responsibilities encompassed the installation and testing of various components, progressing from integration to validating system reliability and performance at scale. I designed and deployed multiple PowerFlex software-defined storage clusters for customer demos and proof-of-concepts, showcasing the product's performance and auto rebuild capabilities. A notable achievement was automating the storage performance benchmarking using PowerShell, FIO, and ELK stack, reducing process time from weeks to days.

I led the engineering efforts for developing a vROps management pack for PowerFlex, ensuring seamless integration and visibility. Additionally, I mastered vSphere Virtual Volumes (vVols), successfully executing integration projects between Dell storage solutions and VMware environments.

To streamline operations, I created a PowerShell module for managing PowerFlex using REST APIs and developed an Ansible playbook for automated deployment of a Kubernetes cluster with the PowerFlex CSI driver. My expertise extended beyond systems engineering and automation as I authored and published whitepapers on disaster recovery using VMware SRM and hardware lifecycle management with Dell OME.

This period solidified my reputation as a virtualization and storage solutions expert, providing me with a deep understanding of storage architecture, performance optimization, and automation. I developed a passion for building scalable and reliable hyperconverged solutions.

From Storage Solutions Engineer to Site Reliability Engineer

In July 2021, I transitioned to a Site Reliability Engineer (SRE) role at VMware, focusing on ensuring the reliability and scalability of a Kubernetes-as-a-Service project based on the vSphere with Tanzu platform.


Managing a vast infrastructure of Kubernetes clusters, I honed my skills in incident response, GitOps pipelines, automation, and monitoring. I played a crucial role in maintaining platform availability, collaborating closely with multiple internal teams and stakeholders to resolve issues and enhance service delivery. My proficiency in Python and PowerShell was instrumental in automating tasks and building custom monitoring solutions. During this time, I prepared diligently, practiced extensively, and successfully qualified for the CKA exam.

Beyond core SRE responsibilities, I explored emerging technologies. I successfully deployed and evaluated open-source language models on Kubernetes using Python, Ollama, and LangChain. In addition, I contributed to developing custom metrics for the Kubernetes-as-a-Service platform using Python, Prometheus, Grafana, and Helm.

This role deepened my expertise and ability to bridge the gap between development and operations, fostering a culture of reliability and efficiency. It has been an exciting journey of learning and growth, positioning me as a versatile IT professional with a strong foundation in both infrastructure and cloud-native technologies.

Gratitude

"This journey has been immensely fulfilling, made possible by the support and encouragement of exceptional organisations, inspiring managers, talented colleagues, friends, and family. I am truly grateful for the opportunities to learn, grow, and contribute meaningfully to driving success and making a positive impact."

The journey continues...

Friday, January 1, 2021

Dell EMC PowerFlex MP for vROps 8.x - Part7 - Create custom reports

In March 2020, I published a blog on how to create custom views and reports in vROps 8.x. This article explains how to create a custom storage report for Dell EMC PowerFlex using the PowerFlex Management Pack for vROps 8.x. 

A sample PowerFlex Storage Report PDF and template are available in my GitHub repo for download. You can use them as a starting point and modify them as per your requirements.

To create a new view: Dashboards - Views - Add.

Provide a name and description for the new view. Here, for example, I will create a view that shows PowerFlex Protection Domain Info.



Select List.


Select Protection Domain as subject and group it by PowerFlex Rack/ Appliance System.


Double click or drag and drop the selected metrics or properties to include in the view. In the following screenshot, I selected 4 capacity metrics to include in the view.


You can also select and change the units and transformation as per requirements. Once it is done, click Save.

Now a view is created. Similarly, you can create multiple views for the different PowerFlex resource kinds. The next step is to include this view in an existing template or in a new template. 

To create a new report template: Dashboards - Reports - Add.

  • Provide a name and description for the new report template.
  • From the views and dashboards, find the PowerFlex Protection Domain Info view that we created earlier, and double-click or drag and drop it to the right pane. You can add multiple views to this report template.
  • Select PDF and CSV.
  • Select all the layout options if you like and click Save.
  • Now the custom report template is created. You can select it and click Run.

Select PowerFlex, then select PowerFlex World, and click OK.


The report will run in the background and will be available to download under the "Generated Reports" tab. You can select it and download the PDF or CSV file. You can even configure a schedule to generate a report and email it or save it to a location automatically based on your requirements. Hope it was useful. Cheers!


Tuesday, December 15, 2020

Dell EMC PowerFlex MP for vROps 8.x - Part6 - Create custom alerts

In this post, we will take a look at creating custom alerts for PowerFlex by adding symptom definitions and alert definitions. Refer to my previous blog post to understand more about the alerting aspects in vROps. Here we will take an example scenario and see how we can create custom symptom definitions and alert definitions.

Scenario


The user is running latency-sensitive, business-critical applications on PowerFlex storage. Below are the symptoms that the user would like to define. Alerts should be produced for these symptoms, and they should affect the "Health" badge of the PowerFlex volume object.


Step1: Add Symptom Definitions


Go to Alerts - Symptom Definitions - Click Add.

Select base object type: Expand PowerFlex Adapter - Select Volume.

  • Select the metric User Data SDC Read Latency (ms): double click on it twice so that you can define both warning and critical symptoms.
  • Select the metric User Data SDC Write Latency (ms): double click on it twice so that you can define both warning and critical symptoms.

Now, fill in all the required fields as per the conditions we defined earlier.


Click Save. As you can see below, the 4 symptom definitions are now created.


Step2: Add Alert Definitions


Go to Alerts - Alert Definitions - Click Add.

  • Provide alert name, select the base object type and advanced settings and click Next.

  • Filter and search the symptoms that we created earlier. Drag and drop the two volume read latency related symptoms and select Any. Click Next.

  • If you want to provide any recommendations, you can add them in this step and click Next.
  • Select vSphere Solution's Default Policy and click Next and click Create.
Similarly, you can create an alert definition for PowerFlex Volume Write Latency too.
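
The "Any" condition used in the alert definition can be pictured with a small Python sketch. This is only an illustration of the logic, not how vROps evaluates symptoms internally. The 1 ms warning threshold follows the read latency example in this post; the 5 ms critical threshold and the function names are hypothetical.

```python
from typing import Optional

# Thresholds in milliseconds. WARNING_MS follows the 1 ms example in this post;
# CRITICAL_MS is a made-up value for illustration only.
WARNING_MS = 1.0
CRITICAL_MS = 5.0


def symptom_level(latency_ms: float) -> Optional[str]:
    """Return the most severe symptom triggered by a latency sample, if any."""
    if latency_ms > CRITICAL_MS:
        return "critical"
    if latency_ms > WARNING_MS:
        return "warning"
    return None


def alert_fires(latency_ms: float) -> bool:
    """With 'Any' selected, one triggered symptom is enough to raise the alert."""
    symptoms = [latency_ms > WARNING_MS, latency_ms > CRITICAL_MS]
    return any(symptoms)
```

With these values, a volume reporting 1.2 ms read latency triggers the warning symptom and the alert fires, while a volume at exactly 1.0 ms does not.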


Now, we are all done. Let's test the alerts! I am using FIO to generate IO load on one of the PowerFlex volumes.


You can see the Read Latency for this volume is greater than 1 ms, and so a warning alert should be produced for this specific volume.




Hope it was useful. Cheers!

Related posts


Part1: Install
Part2: Configure
Part3: Dashboards
Part4: Resource kinds and relationships
Part5: Collection interval 





Friday, December 4, 2020

Dell EMC PowerFlex MP for vROps 8.x - Part5 - Collection interval

In this post, we will take a look at modifying the collection interval of PowerFlex Adapter instances. The PowerFlex Management Pack for vROps supports the following 4 instance types:

  • PowerFlex Gateway
  • PowerFlex Networking
  • PowerFlex Manager
  • PowerFlex Nodes

The default collection interval for all these adapter instances is set to 5 minutes. In most cases, you don't need to modify this. But, say you want to get PowerFlex storage performance metrics more frequently, then you have to change the collection interval of the PowerFlex Gateway instance. You can set it to as low as 1 minute. As per the testing that I have done in the lab, a PowerFlex Gateway adapter instance is able to complete the collection process of a PowerFlex storage cluster in less than a minute.

Note: If you are modifying the collection interval from the default value, make sure to verify that the collection process is able to complete successfully within the new time interval.

Administration - Inventory - Adapter Instances - PowerFlex Adapter Instance

Note: In the product guide it is recommended to configure not more than 40 Cisco switches in one PowerFlex Networking instance. So, if you have 80 switches in your PowerFlex system, you will need to configure 2 PowerFlex Networking instances where each instance will connect/ query/ collect details from 40 switches. This is based on the default collection interval of 5 minutes.

This simply means, in 5 minutes one PowerFlex Networking adapter instance can complete the collection from a max of 40 switches only. So, in 1 minute, it can complete the collection of a maximum of 8 switches. This is a rough calculation and it depends on factors like REST API response, switch firmware/ OS version, etc. So if you change the default interval, always make sure to monitor it (the collection cycle) for some time and verify whether the collection process is able to complete successfully within the new time interval. 
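
The rough calculation above can be written down as a small Python sketch. It assumes that collection time scales linearly with the number of switches, which is a simplification; the function names are my own and are not part of the management pack.

```python
import math

# Figures from the product guide note above: 40 switches per instance
# at the default 5-minute collection interval.
SWITCHES_PER_INSTANCE = 40
DEFAULT_INTERVAL_MIN = 5


def max_switches(interval_min: float) -> int:
    """Rough upper bound on switches one instance can collect per cycle,
    assuming collection time scales linearly with switch count."""
    rate = SWITCHES_PER_INSTANCE / DEFAULT_INTERVAL_MIN  # switches per minute
    return math.floor(rate * interval_min)


def instances_needed(total_switches: int,
                     interval_min: float = DEFAULT_INTERVAL_MIN) -> int:
    """Number of PowerFlex Networking instances for a given switch count."""
    per_instance = max_switches(interval_min)
    if per_instance < 1:
        raise ValueError("interval too short to collect even one switch")
    return math.ceil(total_switches / per_instance)
```

For example, instances_needed(80) gives 2 at the default interval, and instances_needed(80, 1) gives 10 at a 1-minute interval. Treat these as starting points and verify them by watching the actual collection cycle.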

Hope it was useful. Cheers!

Related posts


Part1 - Install
Part2 - Configure
Part3 - Dashboards
Part4 - Resource kinds and relationships




Saturday, November 28, 2020

Storage performance benchmarking of Tanzu Kubernetes Clusters

Benchmarking of IT infrastructure is standard practice and is usually done before putting it into a production environment. It gives you baseline values about different performance aspects of the system/ solution under test. These benchmarking principles are applicable for Kubernetes clusters too. But the test cases and evaluation criteria may slightly vary compared to benchmarking a traditional IT infrastructure. 

Following are some of the test considerations:

  • Performance of PVCs.
    • Time to provision PVCs.
    • Read/ Write IOPS and Latency of PVCs.
  • Pod startup latency.
  • The time consumed to complete the deployment of different K8s objects.
    • Statefulset
    • Deployment etc.
  • Performance behavior of sample application workloads.
  • Network performance and connectivity between different K8s nodes.

In this article, I will explain a quick and easy way to benchmark the storage system used by the Kubernetes cluster to provision PVCs for application workloads. I am using FIO to generate storage IOs. You can use the following YAML file to deploy FIO pods as a statefulset. Note that here I am using a PowerFlex VVOL datastore as Cloud Native Storage (CNS) for Tanzu K8s clusters, and so the storage class is "powerflex-storage-policy". This may differ in your case, and you might need to modify it to match the storage class available in your setup.


This YAML file will deploy a statefulset with 15 FIO pods (as per the number of replicas mentioned) and will start the storage IO stress test (8k block size, 70% random reads, 30% random writes, 2 jobs, 16 iodepth) on the attached PVC as and when the pod is started. Total 15 PVCs will be created in this case, and one PVC will get attached to one FIO pod. 
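
To make the test profile concrete, here is a minimal Python sketch that builds an FIO command line for the same workload (8k block size, 70% random reads, 30% random writes, 2 jobs, 16 iodepth). It is not taken from the YAML file. The job name, file path, file size, runtime, and IO engine are assumptions for illustration and may differ from what the statefulset actually runs.

```python
def fio_args(block_size="8k", read_pct=70, jobs=2, iodepth=16,
             runtime_s=3600, filename="/data/fio-test", size="10G"):
    """Build an fio command line for a random mixed read/write test."""
    return [
        "fio",
        "--name=pvc-stress",          # hypothetical job name
        f"--filename={filename}",     # assumed path on the mounted PVC
        f"--size={size}",             # assumed test file size
        "--ioengine=libaio",          # assumed; a common choice on Linux
        "--direct=1",                 # bypass the page cache
        "--rw=randrw",                # random mixed reads and writes
        f"--rwmixread={read_pct}",    # remaining percentage is writes
        f"--bs={block_size}",
        f"--numjobs={jobs}",
        f"--iodepth={iodepth}",
        "--time_based",
        f"--runtime={runtime_s}",     # assumed duration in seconds
        "--group_reporting",
    ]
```

Calling " ".join(fio_args()) gives a single command string that could be used as the container command in a pod spec.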

Note: If you get an error "forbidden: unable to validate against any pod security policy" after applying the above statefulset, then the pods will not get created. You will need to first create and apply a Pod Security Policy (PSP) to the Tanzu Kubernetes Cluster.


Following is an overview of my vSphere with Tanzu setup:

Tanzu K8s control plane nodes/ master VMs: 3
Tanzu K8s worker nodes/ VMs: 15


Contexts, Tanzu K8s cluster nodes, and storage class.


Create a statefulset using the above YAML file.
kubectl apply -f https://gist.githubusercontent.com/vineethac/7c9f6ce2b72868b8832a4404b79ebba2/raw/980f9d6c24c10b1b7b39b20d80c15a9f2ee6c4f1/fio_ss.yaml -n <namespace name>


You can see that it took roughly 6 minutes to deploy 15 FIO pods and corresponding PVCs. The time may vary depending on whether the FIO image is locally available on the nodes, available resources on the nodes, etc.  
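
If you want a number instead of reading the time off the screen, the following Python sketch computes it from the output of kubectl get pods -n <namespace name> -o json. It measures from the creation of the first pod to the moment the last pod became Ready. The function names are my own. Note that the Ready timestamp reflects the most recent transition, so take the measurement before any pod restarts.

```python
import json
from datetime import datetime


def _parse(ts: str) -> datetime:
    # Kubernetes timestamps look like 2020-11-28T10:15:30Z
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def rollout_seconds(pod_list_json: str) -> float:
    """Seconds from the first pod's creation to the last pod becoming Ready."""
    pods = json.loads(pod_list_json)["items"]
    created = [_parse(p["metadata"]["creationTimestamp"]) for p in pods]
    ready = [
        _parse(c["lastTransitionTime"])
        for p in pods
        for c in p["status"].get("conditions", [])
        if c["type"] == "Ready" and c["status"] == "True"
    ]
    if len(ready) < len(pods):
        raise ValueError("not every pod is Ready yet")
    return (max(ready) - min(created)).total_seconds()
```

Save the kubectl output to a file, read it as a string, and pass it to rollout_seconds to get the elapsed time in seconds.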


As and when each pod is created, FIO will automatically start IO stress on it. IOs will be read/ written into the attached PVCs. As I mentioned earlier, I am using a storage class "powerflex-storage-policy" and this is associated with a VVOL datastore backed by a PowerFlex storage pool. In this case, all the PVCs are created in a PowerFlex VVOL datastore.


You can also see multiple volumes in the PowerFlex UI. All the volume names starting with "vasa" are externally managed by the PowerFlex VASA provider. The performance of each volume can also be monitored using the PowerFlex UI.


If you would like to see the historical performance data, you can use vROps. Dell EMC has recently released a vROps management pack for PowerFlex systems. It is a monitoring and alerting solution that provides extensive visibility into the PowerFlex infrastructure. For monitoring K8s clusters and resources, you can use the vROps management pack for container monitoring.


Note: When the duration mentioned in the FIO test is over, the pods will get restarted and the IO stress will start again. To modify the FIO parameters, you can use kubectl edit statefulset fiopod-statefulset-multipod -n <namespace name>, modify the required parameters, and save. After saving, the new changes will get applied automatically. Once you are done with the testing, you can delete the statefulset and the corresponding PVCs using the kubectl delete command. This method is useful when you want to test something quickly or if you have only a few test profiles. If you have many test profiles with varying block sizes, iodepth, etc., then you will need to build a small script to automate the process.
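
As a starting point for such automation, here is a minimal Python sketch that generates a matrix of test profiles. The block sizes, iodepths, and read percentages listed are example values that I picked for illustration, and the naming scheme is hypothetical. Applying each profile to the cluster is left out.

```python
from itertools import product

# Example values only; adjust to the profiles you want to cover.
BLOCK_SIZES = ["4k", "8k", "64k"]
IODEPTHS = [1, 16, 32]
READ_PCTS = [100, 70, 0]


def iter_profiles():
    """Yield one dict per combination of block size, iodepth, and read mix."""
    for bs, depth, read_pct in product(BLOCK_SIZES, IODEPTHS, READ_PCTS):
        yield {
            "name": f"bs{bs}-qd{depth}-r{read_pct}",
            "bs": bs,
            "iodepth": depth,
            "rwmixread": read_pct,
        }
```

Each generated profile can then be used to patch the FIO parameters in the statefulset, run the test for a fixed duration, and collect the results before moving on to the next one.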

Hope it was useful. Cheers!




Sunday, November 8, 2020

Dell EMC PowerFlex MP for vROps 8.x - Part4 - Resource kinds and relationships

In this post, we will take a look at the different resource kinds that are part of the Dell EMC PowerFlex Management Pack. Following is a very high-level logical representation of the PowerFlex Adapter resource kinds and their relationships:


Go to Environment - All objects - PowerFlex Adapter


You can also get a PowerFlex system level view in vROps using the PowerFlex rack/ appliance system resource kind. This system view is making use of the system name field that we provided while configuring each PowerFlex Adapter instance type. The system name is used to group all the logical components of one PowerFlex system. 


This view provides end-to-end visibility of the PowerFlex infrastructure components, which is useful for understanding the relationship between different layers of the stack. It also helps to identify and troubleshoot issues.

Hope it was useful. Cheers!

Related posts


Part1 - Install
Part2 - Configure
Part3 - Dashboards


Wednesday, November 4, 2020

Dell EMC PowerFlex MP for vROps 8.x - Part3 - Dashboards

We have covered the installation and configuration of the PowerFlex Management Pack in the previous posts. In this post, we will have a look at the different dashboards that are part of the MP. Following are the 13 dashboards you will get after installing the MP:

Overview
  • PowerFlex System Overview
PowerFlex Manager
  • PowerFlex Manager Details
Management Controller 
  • PowerFlex Management Controller
Compute
  • PowerFlex ESXi Cluster Usage
  • PowerFlex ESXi Host Usage
  • PowerFlex SVM Utilization
Networking
  • PowerFlex Networking Environment
  • PowerFlex Networking Performance
Storage
  • PowerFlex Summary
  • PowerFlex Details
  • PowerFlex Replication Details
Server Hardware
  • PowerFlex Node Summary
  • PowerFlex Node Details

Now, let's have a quick look at some of these dashboards and their functionality.

PowerFlex Node Summary


This dashboard shows the health of all PowerFlex nodes being monitored by the MP. You can see the classification of nodes as Compute Only, Storage Only, Hyperconverged, and Management Controller along with a relationship between a node and its corresponding hardware components.


PowerFlex Summary


This dashboard shows the health status of all the logical components of the PowerFlex storage system. It also has a parent-child relationship between different objects of the storage system. You can also see widgets for capacity usage trend forecasting, alerts, top storage pools by capacity usage, top volumes by size, etc.


PowerFlex Details


This dashboard shows all PowerFlex storage performance KPIs like IOPS, Bandwidth, Latency, etc.


PowerFlex Networking Environment


You can see the health status of Cisco networking components and the relationship between network interfaces, nodes, switch ports, VLANs, port-channels, etc.


PowerFlex Networking Performance


This dashboard shows the switch and switch port KPIs like Throughput, Errors, Packet discards, etc.


PowerFlex Manager


You can see the service deployment details like service health, RCM compliance status, deployment status, etc. in this dashboard.


Hope it was useful. Cheers!



Monday, November 2, 2020

Dell EMC PowerFlex MP for vROps 8.x - Part2 - Configure

In this post, I will explain how to configure the PowerFlex Management Pack for vROps.


Before getting into the configuration, I would like to provide a high-level view of my lab setup. I have two separate PowerFlex rack systems that I will be monitoring using the management pack. The two systems are named RAMS and VIKINGS and have the following components.



The PowerFlex Management Pack supports the following 4 instance types:
  • PowerFlex Networking - queries and collects networking details from Cisco switches
  • PowerFlex Gateway - queries and collects storage details from PowerFlex Gateway
  • PowerFlex Nodes - queries and collects server hardware health details from iDRACs
  • PowerFlex Manager - queries and collects service deployment details from PowerFlex Manager

Note: The default collection interval for all PowerFlex Adapter instance types is set to 5 minutes.

I have already configured the controller VCSA and customer VCSA of both (RAMS and VIKINGS) clusters as shown below. This makes use of the native vSphere Adapter and vSAN Adapter present in vROps.


Note: The PowerFlex MP is already installed in vROps. Please see the previous post on how to install it.

Now we can start adding required accounts for the PowerFlex Adapter to connect to the different REST endpoints.

PowerFlex Networking


Click add account.


Select the PowerFlex Adapter.


Let's configure the account for monitoring Cisco TOR switches of the RAMS cluster.

Provide the following details:

  • Name
  • Management IP address of Cisco TOR switches

Select the instance type as "PowerFlex Networking" and provide a system name. 
In this case, these TOR switches are part of RAMS. So I have given the system name as RAMS.



Add a new credential. Select the credential kind as "PowerFlex Networking Adapter Credentials". 
Provide a credential name, username and password. Click OK.


Click VALIDATE CONNECTION.


If everything is fine, you will get a test connection successful message. Click OK.


Click ADD to save the account. You will see the account we just created under the other accounts page.
Initially, the status will be warning, but it will turn to OK in a few seconds.




Note: In the product guide it is recommended to configure not more than 40 Cisco switches in one PowerFlex Networking instance. So, if you have 80 switches in your PowerFlex system, you will need to configure 2 PowerFlex Networking instances where each instance will connect/ query/ collect details from 40 switches.

PowerFlex Gateway



PowerFlex Nodes



Make sure to provide the PowerFlex Management Controller vCenter details in the advanced settings. If you have configured the native adapter with vCenter IP address, then you have to provide the IP address in the advanced settings. In this case, I have configured the native adapter with the vCenter hostname/ FQDN, so in the advanced settings, I have provided the same FQDN. This field will be used to identify and classify the PowerFlex Management Controller nodes.

Note: In the product guide it is recommended to configure 30 iDRACs or less in one PowerFlex Node instance. So, if you have 120 nodes in your PowerFlex system, you will need to configure 4 PowerFlex Node instances where each instance will connect/ query/ collect details from 30 iDRACs.
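
The grouping described in the note can be sketched in Python as follows. The limit of 30 comes from the product guide note above. The function name and the IP addresses are made up for illustration.

```python
MAX_IDRACS_PER_INSTANCE = 30  # limit quoted from the product guide note above


def split_into_instances(idrac_ips: list[str],
                         limit: int = MAX_IDRACS_PER_INSTANCE) -> list[list[str]]:
    """Chunk iDRAC addresses into groups, one group per PowerFlex Nodes instance."""
    return [idrac_ips[i:i + limit] for i in range(0, len(idrac_ips), limit)]


# Example with 120 placeholder addresses from documentation ranges.
example_ips = [f"192.0.2.{i}" for i in range(1, 121)]
groups = split_into_instances(example_ips)
```

Here groups contains 4 lists of 30 addresses each, which matches the 4 PowerFlex Node instances mentioned in the note.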

PowerFlex Manager



Note: While adding the credentials for the PowerFlex Manager, it is mandatory to provide the PowerFlex Manager Domain Name. VXFMLOCAL is the domain name for the default admin user.

Verify the status of all accounts.



Now we have finished creating all the required accounts to monitor the RAMS system. Similarly, you can add multiple PowerFlex systems and monitor them using the management pack. In my case, I have one more PowerFlex system named VIKINGS and I have added all the required accounts as given in the following screenshot. As you can see below, for the VIKINGS system I have configured separate instances for CO, SO, and Controller nodes. This is because the iDRAC credentials for CO, SO, and Controller nodes are different.


In the dashboards section, you can see all the 13 dashboards. Depending on the number of components/ size of the PowerFlex system, it may take 15-20 minutes for the data to get populated in the respective dashboards. 



In the next part, we will go through the different dashboards and other capabilities of the management pack. Hope it was useful. Cheers!
