
Sunday, December 3, 2023

Kubernetes mini project

In this mini project, we are going to learn the following:

  • Deploy a simple Python-based web application on a Kubernetes cluster.
  • We will use Helm to deploy this app.
  • This web app uses FastAPI and exposes some metrics using the Prometheus Python client.
  • To store and visualize these metrics we will deploy Prometheus and Grafana in the K8s cluster.
  • We will also deploy and use an ingress controller for exposing the web app, Prometheus, and Grafana to external users.
  • For logging, we will deploy and use the Grafana Loki stack.


Full project is available in my GitHub

High-level steps to complete this project

Step 1: Write the Python app.

Step 2: Create the Dockerfile for the app.
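
Here is a rough sketch of what such a Dockerfile could look like for a FastAPI app served with uvicorn. The module path "main:app", port 8000, and the requirements.txt file are assumptions, so adjust them to match your app.

cat > Dockerfile <<'EOF'
# minimal sketch; base image, module path, and port are assumptions
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
EOF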

Step 3: Create the container image.

Step 4: Push the container image to an image registry like Docker Hub.
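
As a rough sketch for steps 3 and 4 (the image name "myuser/fastapi-metrics" is just a placeholder for your own Docker Hub namespace and repository):

docker build -t myuser/fastapi-metrics:0.1.0 .    # build the container image from the Dockerfile
docker login                                      # authenticate against Docker Hub
docker push myuser/fastapi-metrics:0.1.0          # push the image to the registry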

Step 5: Get access to a K8s cluster.

Step 6: Deploy an ingress controller.
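
One option is the ingress-nginx controller, which can be installed using its community Helm chart, roughly like this:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx -n ingress-nginx --create-namespace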

Step 7: Create the Helm chart for your app and deploy it to the K8s cluster.
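
A minimal sketch, assuming the chart and release are both named "fastapi-metrics" and the app goes into a "demo" namespace (placeholder names):

helm create fastapi-metrics    # scaffold a chart, then edit values.yaml (image repository/tag, service port, ingress host)
helm install fastapi-metrics ./fastapi-metrics -n demo --create-namespace
helm ls -n demo                # verify the release is deployed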

Step 8: Deploy Prometheus stack on the K8s cluster using Helm.
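
A minimal sketch using the community kube-prometheus-stack chart (the release name "monitoring" is my choice here):

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack -n monitoring --create-namespace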

Step 9: Create a ServiceMonitor resource, which defines the target to be monitored by Prometheus.
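
The following is a hedged example of what such a ServiceMonitor could look like. The names, labels, namespaces, and port name are placeholders and must match your app's Service and your Prometheus release (by default, kube-prometheus-stack selects ServiceMonitors carrying its release label).

kubectl apply -f - <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: fastapi-metrics
  namespace: monitoring
  labels:
    release: monitoring                           # must match the Prometheus release's ServiceMonitor selector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: fastapi-metrics    # labels on the app's Service
  namespaceSelector:
    matchNames:
      - demo                                     # namespace where the app is deployed
  endpoints:
    - port: http                                 # name of the Service port that serves /metrics
      path: /metrics
      interval: 30s
EOF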

Step 10: Verify targets and service discovery in Prometheus.
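
One way to check this is to port-forward the Prometheus service locally; the exact service name depends on your release name and chart version, so list the services first.

kubectl get svc -n monitoring    # find the Prometheus service created by the chart
kubectl port-forward svc/monitoring-kube-prometheus-prometheus 9090:9090 -n monitoring
# then open http://localhost:9090/targets and confirm the app endpoint shows as UP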

Step 11: Configure Grafana dashboard and verify.

Step 12: Deploy the Grafana Loki stack using Helm.
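
A minimal sketch using the grafana/loki-stack chart (the release name "loki" and the "logging" namespace are my choices):

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install loki grafana/loki-stack -n logging --create-namespace
# in Grafana, add a Loki data source pointing at http://loki.logging:3100 (the service name follows the release name)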


Hope it was useful. Cheers!

Sunday, July 23, 2023

Kubernetes 101 - Part11 - Find Kubernetes nodes with DiskPressure

Following are two quick and easy ways to find Kubernetes nodes with disk pressure:

jq:


kubectl get nodes -o json | jq -r '.items[] | select(.status.conditions[].reason=="KubeletHasDiskPressure") | .metadata.name'


jsonpath:


kubectl get nodes -o jsonpath='{range .items[*]} {.metadata.name} {" "} {.status.conditions[?(@.type=="DiskPressure")].status} {" "} {"\n"}'


❯ kubectl get no
NAME                                 STATUS   ROLES                  AGE     VERSION
tkc-btvsm-72hz2                      Ready    control-plane,master   124d    v1.23.8+vmware.3
tkc-btvsm-79xtn                      Ready    control-plane,master   124d    v1.23.8+vmware.3
tkc-btvsm-klmjz                      Ready    control-plane,master   124d    v1.23.8+vmware.3
tkc-workers-2cmvm-5bfcc5c9cd-gmv6m   Ready    <none>                 5d17h   v1.23.8+vmware.3
tkc-workers-2cmvm-5bfcc5c9cd-m44sq   Ready    <none>                 5d17h   v1.23.8+vmware.3
tkc-workers-2cmvm-5bfcc5c9cd-mjjlk   Ready    <none>                 5d17h   v1.23.8+vmware.3
tkc-workers-2cmvm-5bfcc5c9cd-wflrl   Ready    <none>                 5d17h   v1.23.8+vmware.3
tkc-workers-2cmvm-5bfcc5c9cd-xnqvk   Ready    <none>                 5d17h   v1.23.8+vmware.3
❯
❯
❯ kubectl get nodes -o json | jq -r '.items[] | select(.status.conditions[].reason=="KubeletHasDiskPressure") | .metadata.name'
tkc-workers-2cmvm-5bfcc5c9cd-m44sq
tkc-workers-2cmvm-5bfcc5c9cd-wflrl
❯
❯ kubectl get nodes -o jsonpath='{range .items[*]} {.metadata.name} {" "} {.status.conditions[?(@.type=="DiskPressure")].status} {" "} {"\n"}'
 tkc-btvsm-72hz2   False
 tkc-btvsm-79xtn   False
 tkc-btvsm-klmjz   False
 tkc-workers-2cmvm-5bfcc5c9cd-gmv6m   False
 tkc-workers-2cmvm-5bfcc5c9cd-m44sq   True
 tkc-workers-2cmvm-5bfcc5c9cd-mjjlk   False
 tkc-workers-2cmvm-5bfcc5c9cd-wflrl   True
 tkc-workers-2cmvm-5bfcc5c9cd-xnqvk   False
❯

Hope it was useful. Cheers!

Sunday, June 27, 2021

vSphere with Tanzu using NSX-T - Part9 - Monitoring

In the previous posts we discussed the following:

Part1 - Prerequisites

Part2 - Configure NSX

Part3 - Edge Cluster

Part4 - Tier-0 Gateway and BGP peering

Part5 - Tier-1 Gateway and Segments

Part6 - Create tags, storage policy, and content library

Part7 - Enable workload management


In this article, I will explain some of the popular tools used for monitoring Kubernetes clusters that provide insight into the different objects in K8s, their status, metrics, logs, and so on.

  • Lens
  • Octant
  • Prometheus and Grafana
  • vROps and Kubernetes Management Pack
  • Kubebox


-Lens-

Download the Lens binary file from: https://k8slens.dev/


I am installing it on a Windows server. Once the installation is complete, the first thing you have to do is to provide the Kube config file details so that Lens can connect to the Kubernetes cluster and start monitoring it.

Add Cluster

Click File - Add Cluster


You can either browse and select the Kube config file or you can paste the content of your Kube config file as text. I am just pasting it as text.

 

Once you have pasted your Kube config file contents, make sure to select the context, and then click Add cluster.


Deploy Prometheus stack

If you aren't seeing CPU and memory metrics, you will need to install the Prometheus stack on your K8s cluster. And Lens has a feature that deploys the Prometheus stack on your K8s cluster with the click of a button!

Select the cluster icon and click Settings.


Scroll all the way to the end, and under Features, you will find an Install button. In my case, I've already installed it, which is why it's showing the Uninstall button.


Once you click the Install button, Lens will go ahead and install the Prometheus stack on the selected K8s cluster. After a few minutes, you should be able to see all the metrics.

You can see a namespace called "lens-metrics" and under that, the Prometheus stack components are deployed.


Following are the service objects that are created as part of the Prometheus stack deployment.


And, here is the PVC that is attached to the Prometheus pod.


Terminal access

Click on Terminal to get access directly to the K8s cluster.


Pod metrics, SSH to the pod, and container logs



Scaling
 
Note: In a production environment, it is always a best practice to apply configuration changes to your K8s cluster objects through a version control system.


You can also see the Service Accounts, Roles, Role Bindings, and PSPs under the Access Control tab. For more details see https://docs.k8slens.dev/main/.


-Octant-

https://vineethac.blogspot.com/2020/08/visualize-your-kubernetes-clusters-and.html


-Prometheus and Grafana-
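
As a quick sketch, assuming the community kube-prometheus-stack Helm chart with a release named "monitoring" (secret and service names follow the release name, so verify them with kubectl get svc,secret -n monitoring):

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install monitoring prometheus-community/kube-prometheus-stack -n monitoring --create-namespace
kubectl get secret monitoring-grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 -d    # default Grafana admin password
kubectl port-forward svc/monitoring-grafana 3000:80 -n monitoring    # Grafana UI at http://localhost:3000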



-vROps and Kubernetes Management Pack-

https://blogs.vmware.com/management/2020/12/announcing-the-vrealize-operations-management-pack-for-kubernetes-1-5-1.html

https://rudimartinsen.com/2021/03/07/vrops-kubernetes-mgmt-pack/

https://www.brockpeterson.com/post/vrops-management-pack-for-kubernetes


-Kubebox-


curl -Lo kubebox https://github.com/astefanutti/kubebox/releases/download/v0.9.0/kubebox-linux && chmod +x kubebox
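
Once downloaded, you just run the binary; it connects to the cluster in your current kubeconfig context.

./kubebox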


Select namespace


Select Pod

This will show the selected pod metrics and logs.


Note: Kubebox relies on cAdvisor to retrieve the resource usage metrics. It's recommended to use the provided cadvisor.yaml file, which is tested to work with Kubebox.

kubectl apply -f https://raw.github.com/astefanutti/kubebox/master/cadvisor.yaml

Kubebox: https://github.com/astefanutti/kubebox

Hope it was useful. Cheers!

Saturday, January 23, 2021

Benchmarking Kubernetes infrastructure using K-Bench

K-Bench is a framework to benchmark the control and data plane aspects of a Kubernetes cluster. More details are available at https://github.com/vmware-tanzu/k-bench. In my case, I am going to conduct this benchmarking study on a Tanzu Kubernetes cluster which is provisioned using Tanzu Kubernetes Grid service on a vSphere 7 U1 cluster.

Step 1: Clone the K-Bench repo

git clone https://github.com/vmware-tanzu/k-bench.git


Step 2: Install

./install.sh


Once the installation is done, it will say "Completed k-bench installation."

Step 3: Run the benchmark

./run.sh


If you don't specify any test, it is going to conduct the default set of tests. All sets of tests are defined under the config directory. If you browse to the config directory and list its contents, there are separate folders specific to each test. Folders starting with cp and dp refer to control plane and data plane related tests respectively.


If no specific test is mentioned, then it is going to run all that is defined in the default directory. You can also see details of the test and results in the logs. The directories starting with "results" will have log files corresponding to each test run.
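
For example, something like the following can help skim a run's output; the directory names here are illustrative, so check your actual results folders.

ls -d results*                        # one directory per test run
grep -ri "latency" results* | less    # skim the latency summary lines from the logs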


Following is a sample log that shows a summary of pod creation throughput, pod creation average latency, pod startup total latency, list/ update/ delete pod latency, etc.


Now, if you want to run a specific test case, you can do it as follows:

Usage: ./run.sh -r <run-tag> [-t <comma-separated-tests> -o <output-dir>]

DP network internode test

For example, you can run a data plane test to check the network performance between two nodes as shown below.

./run.sh -r "kbench-run-on-tkg-cluster-02" -t "dp_network_internode" -o "./"


As soon as you run the above command, two pods will be created inside "kbench-pod-namespace" on two worker nodes as you see below.


It will then start an "iperf3" process inside those two pods to create a network load following a client-server model as per the actions defined in the config.json file.


Sample logs are given below. It shows details like the amount of data transferred, transfer rate, network latency, etc.


Once the test run is complete, the pods and other resources created will be automatically deleted. Similarly, you can select the other sets of tests that are pre-defined in the framework. I believe you have the flexibility to define custom test cases too as per your requirements. I hope it was useful. Cheers!

Related posts


Storage performance benchmarking of Tanzu Kubernetes clusters
Monitoring Tanzu Kubernetes cluster using Prometheus and Grafana




Friday, January 1, 2021

Dell EMC PowerFlex MP for vROps 8.x - Part7 - Create custom reports

In March 2020, I published a blog on how to create custom views and reports in vROps 8.x. This article explains how to create a custom storage report for Dell EMC PowerFlex using the PowerFlex Management Pack for vROps 8.x. 

A sample PowerFlex Storage Report PDF and template are available in my GitHub repo for download. You can use them as a starting point and modify them as per your requirements.

To create a new view: Dashboards - Views - Add.

Provide a name and description for the new view. Here, for example, I will create a view that shows PowerFlex Protection Domain Info.



Select List.


Select Protection Domain as subject and group it by PowerFlex Rack/ Appliance System.


Double click or drag and drop the selected metrics or properties to include in the view. In the following screenshot, I selected 4 capacity metrics to include in the view.


You can also select and change the units and transformation as per requirements. Once it is done, click Save.

Now a view is created. Similarly, you can create multiple views for the different PowerFlex resource kinds. The next step is to include this view in an existing template or in a new template. 

To create a new report template: Dashboards - Reports - Add.

  • Provide a name and description for the new report template.
  • From the views and dashboards, find the PowerFlex Protection Domain Info view that we created earlier, then double-click or drag and drop it to the right pane. You can add multiple views to this report template.
  • Select PDF and CSV.
  • Select all the layout options if you like, and click Save.
  • Now the custom report template is created. You can select it and click Run.

Select PowerFlex and then select PowerFlex World and click OK.


The report will run in the background and will be available to download under the "Generated Reports" tab. You can select it and download the PDF or CSV file. You can even configure a schedule to generate a report and email it or save it to a location automatically based on your requirements. Hope it was useful. Cheers!


Tuesday, December 15, 2020

Dell EMC PowerFlex MP for vROps 8.x - Part6 - Create custom alerts

In this post, we will take a look at creating custom alerts for PowerFlex by adding symptom definitions and alert definitions. Refer to my previous blog post to understand more about the alerting aspects in vROps. Here we will take an example scenario and see how we can create custom symptom definitions and alert definitions.

Scenario


The user is running some latency-sensitive, business-critical applications using PowerFlex storage. Below are the symptoms they would like to define; alerts should be produced for these, and the alerts should affect the "Health" badge of the PowerFlex volume object.


Step1: Add Symptom Definitions


Go to Alerts - Symptom Definitions - Click Add.

Select base object type: Expand PowerFlex Adapter - Select Volume.

  • Select the metric User Data SDC Read Latency (ms): double-click it twice so that you can define both warning and critical symptoms.
  • Select the metric User Data SDC Write Latency (ms): double-click it twice so that you can define both warning and critical symptoms.

Now, fill all the required fields as per the conditions we defined earlier.


Click Save. As you can see below, the 4 symptom definitions are now created.


Step2: Add Alert Definitions


Go to Alerts - Alert Definitions - Click Add.

  • Provide alert name, select the base object type and advanced settings and click Next.

  • Filter and search the symptoms that we created earlier. Drag and drop the two volume read latency related symptoms and select Any. Click Next.

  • If you want to provide any recommendations, you can add them in this step and click Next.
  • Select vSphere Solution's Default Policy, click Next, and then click Create.
Similarly, you can create an alert definition for PowerFlex Volume Write Latency too.


Now, we are all done. Let's test the alerts! I am using FIO to generate IO load on one of the PowerFlex volumes.


You can see the Read Latency for this volume is greater than 1 ms, and so a warning alert should be produced for this specific volume.




Hope it was useful. Cheers!

Related posts


Part1: Install
Part2: Configure
Part3: Dashboards
Part4: Resource kinds and relationships
Part5: Collection interval 




Friday, December 4, 2020

Dell EMC PowerFlex MP for vROps 8.x - Part5 - Collection interval

In this post, we will take a look at modifying the collection interval of PowerFlex Adapter instances. The PowerFlex Management Pack for vROps supports 4 instance types.

  • PowerFlex Gateway
  • PowerFlex Networking
  • PowerFlex Manager
  • PowerFlex Nodes

The default collection interval for all these adapter instances is set to 5 minutes. In most cases, you don't need to modify this. But if, say, you want to get PowerFlex storage performance metrics more frequently, then you have to change the collection interval of the PowerFlex Gateway instance. You can set it to as low as 1 minute. As per the testing that I have done in the lab, a PowerFlex Gateway adapter instance is able to complete the collection process of a PowerFlex storage cluster in less than a minute.

Note: If you are modifying the collection interval from the default value, make sure to verify that the collection process is able to complete successfully within the new time interval.

Administration - Inventory - Adapter Instances - PowerFlex Adapter Instance

Note: The product guide recommends configuring not more than 40 Cisco switches in one PowerFlex Networking instance. So, if you have 80 switches in your PowerFlex system, you will need to configure 2 PowerFlex Networking instances, where each instance will connect to, query, and collect details from 40 switches. This is based on the default collection interval of 5 minutes.

This simply means, in 5 minutes one PowerFlex Networking adapter instance can complete the collection from a max of 40 switches only. So, in 1 minute, it can complete the collection of a maximum of 8 switches. This is a rough calculation and it depends on factors like REST API response, switch firmware/ OS version, etc. So if you change the default interval, always make sure to monitor it (the collection cycle) for some time and verify whether the collection process is able to complete successfully within the new time interval. 

Hope it was useful. Cheers!

Related posts


Part1 - Install
Part2 - Configure
Part3 - Dashboards
Part4 - Resource kinds and relationships

