Friday, April 8, 2022

Working with Kubernetes using Python - Part 03 - Get nodes

The following code snippet uses the kubeconfig Python module to switch context and the Python client for the Kubernetes API to get cluster node details. It takes the default kubeconfig file, switches to the required context, and gets the node info of the respective cluster.

kubectl commands: 

kubectl config get-contexts
kubectl config current-context
kubectl config use-context <context_name>
kubectl get nodes -o json

Code: 
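Here is a minimal sketch of what such a script could look like, combining the kubeconfig module (to switch context) with the kubernetes client (to list nodes); the context name 'my-cluster-context' is a placeholder for your own context.

from kubeconfig import KubeConfig
from kubernetes import client, config

# Switch to the required context in the default kubeconfig file (~/.kube/config)
conf = KubeConfig()
conf.use_context('my-cluster-context')  # placeholder: replace with your context name
print('Current context: ' + str(conf.current_context()))

# Load the (now switched) kubeconfig and query the nodes of the respective cluster
config.load_kube_config()
v1 = client.CoreV1Api()
nodes = v1.list_node()

# Print basic node details: name, kubelet version, and internal IP
for node in nodes.items:
    addresses = {addr.type: addr.address for addr in node.status.addresses}
    print(node.metadata.name, node.status.node_info.kubelet_version, addresses.get('InternalIP'))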

Reference:

https://kubeconfig-python.readthedocs.io/en/latest/
https://github.com/kubernetes-client/python

Hope it was useful. Cheers!

Saturday, March 19, 2022

Working with Kubernetes using Python - Part 02 - Switch context

The following code snippet uses the kubeconfig Python module. It takes the default kubeconfig file and switches to a new context.

kubectl commands: 

kubectl config get-contexts
kubectl config current-context
kubectl config use-context <context_name>

Code: 
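Here is a minimal sketch of what such a script could look like using the kubeconfig module; 'my-new-context' is a placeholder for the context you want to switch to.

from kubeconfig import KubeConfig

# Uses the default kubeconfig file (~/.kube/config)
conf = KubeConfig()
print('Current context: ' + str(conf.current_context()))

# Switch to the required context ('my-new-context' is a placeholder)
conf.use_context('my-new-context')
print('Current context: ' + str(conf.current_context()))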

Note: If you want to use a specific kubeconfig file, instead of conf = KubeConfig(), you can use conf = KubeConfig('path-to-your-kubeconfig').

Reference:

https://kubeconfig-python.readthedocs.io/en/latest/

Hope it was useful. Cheers!

Friday, March 4, 2022

Working with Kubernetes using Python - Part 01 - List contexts

The following code snippet uses the Python client for the Kubernetes API. It takes the default kubeconfig file and lists the available contexts and the active context.

kubectl commands: 

kubectl config get-contexts
kubectl config current-context

Code: 
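Here is a minimal sketch of what such a script could look like using the kubernetes client's config module against the default kubeconfig file.

from kubernetes import config

# Reads the default kubeconfig file (~/.kube/config)
contexts, active_context = config.list_kube_config_contexts()

# Print the names of all available contexts
for context in contexts:
    print(context['name'])

# Print the currently active context
print('Active context: ' + active_context['name'])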

Note: If you want to use a specific kubeconfig file, instead of  

contexts, active_context = config.list_kube_config_contexts() 

you can use  

contexts, active_context = config.list_kube_config_contexts(config_file="path-to-kubeconfig")

Reference:

https://github.com/kubernetes-client/python

Hope it was useful. Cheers!

Saturday, February 19, 2022

Kubernetes 101 - Part5 - Removing namespaces stuck in terminating state

Namespaces getting stuck in the Terminating state is one of the common issues I have seen while working with K8s. Here is an example of a namespace in the Terminating state, and you can see there are no resources left under it. In this case we remove the namespace by setting its finalizers to null (kg below is an alias for kubectl get).

% kg ns rohitgu-intelligence-cluster-6
NAME                             STATUS        AGE
rohitgu-intelligence-cluster-6   Terminating   188d

% kg pods,tkc,all,cluster-api -n rohitgu-intelligence-cluster-6
No resources found in rohitgu-intelligence-cluster-6 namespace.

% kg ns rohitgu-intelligence-cluster-6 -o json > rohitgu-intelligence-cluster-6-json

% jq '.spec.finalizers = [] | .metadata.finalizers = []' rohitgu-intelligence-cluster-6-json > rohitgu-intelligence-cluster-6-json-nofinalizer

% cat rohitgu-intelligence-cluster-6-json-nofinalizer
{
  "apiVersion": "v1",
  "kind": "Namespace",
  "metadata": {
    "annotations": {
      "calaxxxxx.xxxx.com/ccsrole-created": "true",
      "calaxxxxx.xxxx.com/owner": "rohitgu",
      "calaxxxxx.xxxx.com/user-namespace": "rohitgu",
      "ls_id-0": "e28442b5-ace0-4e20-b5a0-c32bc72427d9",
      "ncp/extpoolid": "domain-c1034:02cde809-99d1-423e-aac9-014889740308-ippool-10-186-120-1-10-186-123-254",
      "ncp/router_id": "t1_87d44fc8-ac60-441a-8e35-509ff31a4eba_rtr",
      "ncp/snat_ip": "10.186.120.40",
      "ncp/subnet-0": "172.29.1.144/28",
      "vmware-system-resource-pool": "resgroup-663809",
      "vmware-system-vm-folder": "group-v663810"
    },
    "creationTimestamp": "2021-08-26T09:11:39Z",
    "deletionTimestamp": "2022-03-02T07:16:27Z",
    "labels": {
      "kubernetes.io/metadata.name": "rohitgu-intelligence-cluster-6",
      "vSphereClusterID": "domain-c1034"
    },
    "name": "rohitgu-intelligence-cluster-6",
    "resourceVersion": "1133900371",
    "selfLink": "/api/v1/namespaces/rohitgu-intelligence-cluster-6",
    "uid": "87d44fc8-ac60-441a-8e35-509ff31a4eba",
    "finalizers": []
  },
  "spec": {
    "finalizers": []
  },
  "status": {
    "conditions": [
      {
        "lastTransitionTime": "2022-03-02T07:16:32Z",
        "message": "Discovery failed for some groups, 1 failing: unable to retrieve the complete list of server APIs: data.packaging.carvel.dev/v1alpha1: the server is currently unable to handle the request",
        "reason": "DiscoveryFailed",
        "status": "True",
        "type": "NamespaceDeletionDiscoveryFailure"
      },
      {
        "lastTransitionTime": "2022-03-02T07:16:56Z",
        "message": "All legacy kube types successfully parsed",
        "reason": "ParsedGroupVersions",
        "status": "False",
        "type": "NamespaceDeletionGroupVersionParsingFailure"
      },
      {
        "lastTransitionTime": "2022-03-02T07:16:56Z",
        "message": "All content successfully deleted, may be waiting on finalization",
        "reason": "ContentDeleted",
        "status": "False",
        "type": "NamespaceDeletionContentFailure"
      },
      {
        "lastTransitionTime": "2022-03-02T07:23:22Z",
        "message": "All content successfully removed",
        "reason": "ContentRemoved",
        "status": "False",
        "type": "NamespaceContentRemaining"
      },
      {
        "lastTransitionTime": "2022-03-02T07:23:22Z",
        "message": "All content-preserving finalizers finished",
        "reason": "ContentHasNoFinalizers",
        "status": "False",
        "type": "NamespaceFinalizersRemaining"
      }
    ],
    "phase": "Terminating"
  }
}

% kubectl replace --raw "/api/v1/namespaces/rohitgu-intelligence-cluster-6/finalize" -f rohitgu-intelligence-cluster-6-json-nofinalizer

% kg ns rohitgu-intelligence-cluster-6
Error from server (NotFound): namespaces "rohitgu-intelligence-cluster-6" not found

Hope it was useful. Cheers!

Sunday, January 30, 2022

vSphere with Tanzu using NSX-T - Part14 - Testing TKC storage using kubestr

In the previous posts we discussed the following:

This article is about using kubestr to test the storage options of a Tanzu Kubernetes Cluster (TKC). Following are the steps to install kubestr on macOS:

  • wget https://github.com/kastenhq/kubestr/releases/download/v0.4.31/kubestr_0.4.31_MacOS_amd64.tar.gz
  • tar -xvf kubestr_0.4.31_MacOS_amd64.tar.gz 
  • chmod +x kubestr
  • mv kubestr /usr/local/bin

 

Now, let's run kubestr help.

% kubestr help
kubestr is a tool that will scan your k8s cluster
        and validate that the storage systems in place as well as run
        performance tests.

Usage:
  kubestr [flags]
  kubestr [command]

Available Commands:
  browse      Browse the contents of a CSI PVC via file browser
  csicheck    Runs the CSI snapshot restore check
  fio         Runs an fio test
  help        Help about any command

Flags:
  -h, --help             help for kubestr
  -e, --outfile string   The file where test results will be written
  -o, --output string    Options(json)

Use "kubestr [command] --help" for more information about a command.

 

I am going to use the following TKC for testing.

% KUBECONFIG=gc.kubeconfig kubectl get nodes                                            
NAME                               STATUS   ROLES                  AGE    VERSION
gc-control-plane-pwngg             Ready    control-plane,master   103d   v1.20.9+vmware.1
gc-workers-wrknn-f675446b6-cz766   Ready    <none>                 103d   v1.20.9+vmware.1
gc-workers-wrknn-f675446b6-f6zqs   Ready    <none>                 103d   v1.20.9+vmware.1
gc-workers-wrknn-f675446b6-rsf6n   Ready    <none>                 103d   v1.20.9+vmware.1

 

Let's run kubestr against the cluster now.

% KUBECONFIG=gc.kubeconfig kubestr                                      

**************************************
  _  ___   _ ___ ___ ___ _____ ___
  | |/ / | | | _ ) __/ __|_   _| _ \
  | ' <| |_| | _ \ _|\__ \ | | |   /
  |_|\_\\___/|___/___|___/ |_| |_|_\

Explore your Kubernetes storage options
**************************************
Kubernetes Version Check:
  Valid kubernetes version (v1.20.9+vmware.1)  -  OK

RBAC Check:
  Kubernetes RBAC is enabled  -  OK

Aggregated Layer Check:
  The Kubernetes Aggregated Layer is enabled  -  OK

W0130 14:17:16.937556   87541 warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver
Available Storage Provisioners:

  csi.vsphere.xxxx.com:
    Can't find the CSI snapshot group api version.
    This is a CSI driver!
    (The following info may not be up to date. Please check with the provider for more information.)
    Provider:            vSphere
    Website:             https://github.com/kubernetes-sigs/vsphere-csi-driver
    Description:         A Container Storage Interface (CSI) Driver for VMware vSphere
    Additional Features: Raw Block, Expansion (Block Volume), Topology Aware (Block Volume)

    Storage Classes:
      * sc2-01-vc16c01-wcp-mgmt

    To perform a FIO test, run-
      ./kubestr fio -s <storage class>

 

 

You can run storage tests using kubestr, which uses FIO to generate I/O. For example, this is how you can run a basic storage test.

% KUBECONFIG=gc.kubeconfig kubestr fio -s sc2-01-vc16c01-wcp-mgmt -z 10G
PVC created kubestr-fio-pvc-zvdhr
Pod created kubestr-fio-pod-kdbs5
Running FIO test (default-fio) on StorageClass (sc2-01-vc16c01-wcp-mgmt) with a PVC of Size (10G)
Elapsed time- 29.290421119s
FIO test results:
 
FIO version - fio-3.20
Global options - ioengine=libaio verify=0 direct=1 gtod_reduce=1

JobName: read_iops
  blocksize=4K filesize=2G iodepth=64 rw=randread
read:
  IOPS=3987.150391 BW(KiB/s)=15965
  iops: min=3680 max=4274 avg=3992.034424
  bw(KiB/s): min=14720 max=17096 avg=15968.827148

JobName: write_iops
  blocksize=4K filesize=2G iodepth=64 rw=randwrite
write:
  IOPS=3562.628906 BW(KiB/s)=14267
  iops: min=3237 max=3750 avg=3565.896484
  bw(KiB/s): min=12950 max=15000 avg=14264.862305

JobName: read_bw
  blocksize=128K filesize=2G iodepth=64 rw=randread
read:
  IOPS=2988.549316 BW(KiB/s)=383071
  iops: min=2756 max=3252 avg=2992.344727
  bw(KiB/s): min=352830 max=416256 avg=383056.187500

JobName: write_bw
  blocksize=128k filesize=2G iodepth=64 rw=randwrite
write:
  IOPS=2754.796143 BW(KiB/s)=353151
  iops: min=2480 max=2992 avg=2759.586182
  bw(KiB/s): min=317440 max=382976 avg=353242.781250

Disk stats (read/write):
  sdd: ios=117160/105647 merge=0/1210 ticks=2100090/2039676 in_queue=4139076, util=99.608589%
  -  OK

As you can see, a 10G PVC and an FIO pod are created and used for the FIO test. Once the test is complete, the PVC and the FIO pod are deleted automatically.

I hope it was useful. Cheers!


Saturday, December 18, 2021

vSphere with Tanzu using NSX-T - Part13 - Export WCP admin kubeconfig

In the previous posts we discussed the following:

This article shows the steps to export the WCP admin kubeconfig file from the Supervisor control plane VM. This admin kubeconfig file can be used to manage the Supervisor/WCP K8s cluster.

Step1: SSH as root to the vCenter server.

Step2: Run the script /usr/lib/vmware-wcp/decryptK8Pwd.py and make a note of the IP and PWD.

Step3: SSH as root to the IP that you noted down from previous step, and then provide the password that you got from step2.

Step4: You can now copy the admin kubeconfig file /etc/kubernetes/admin.conf to your local machine. Make sure to change the server: https://127.0.0.1:6443 field in your local admin.conf to the IP you got from step2 (server: https://IP_from_step2:6443).

Note: If you are managing multiple WCP clusters, you can merge all the kubeconfig files. Refer to this blog by Jacob Tomlinson for more details.

Hope it was useful. Cheers!

Friday, December 10, 2021

ESXi in a HA cluster fails to Enter Maintenance Mode and gets stuck

Recently we came across a situation where, when we tried to put an ESXi host into Maintenance Mode, it got stuck at a certain percentage. These ESXi nodes were part of a vSphere with Tanzu 7 U3 cluster. While troubleshooting, we noticed there were some orphaned or inaccessible VMs running on the host. Once we deleted those VMs, the ESXi node entered Maintenance Mode successfully.

You can use VMware PowerCLI to list those orphaned and inaccessible VMs.

(Get-VMHost <host_fqdn> | Get-VM | Where {$_.ExtensionData.Summary.Runtime.ConnectionState -eq "orphaned"}) | select Name,Id,PowerState

(Get-VMHost <host_fqdn> | Get-VM | Where {$_.ExtensionData.Summary.Runtime.ConnectionState -eq "inaccessible"}) | select Name,Id,PowerState

You can try deleting those orphaned and inaccessible VMs with the Remove-VM cmdlet.

Remove-VM -VM <vm_name> -DeletePermanently 

If that does not work, you can try with dcli.

dcli> com vmware vcenter vm delete --vm <vm-id>

Hope it was useful.