Showing posts with label ssh. Show all posts
Showing posts with label ssh. Show all posts

Saturday, June 22, 2024

vSphere with Tanzu using NSX-T - Part31 - Troubleshooting inaccessible TKC with expired control plane certs

In the course of managing multiple Tanzu Kubernetes Clusters (TKC), I encountered an unexpected issue: the control plane certificates had expired, preventing us from accessing the cluster using the kubeconfig file. To make matters worse, we were unable to SSH into the TKC control plane Virtual Machines (VMs) due to the vmware-system-user password expiring in accordance with STIG Hardening.

The recommended workaround for updating the vmware-system-user password expiry involves applying a specific daemonset on Guest Clusters. However, this approach requires access to the TKC using its admin kubeconfig file, which was unavailable due to the expired certificates.

Warning: In case of critical production issues that affect the accessibility of your Tanzu Kubernetes Cluster (TKC), it is strongly advised to submit a product support request to our team for assistance. This will ensure that you receive expert guidance and a timely resolution to help minimize the impact on your environment.

To resolve this issue, I followed an alternative workaround: I reset the root password of the TKC control plane VMs through the vCenter VM console, as outlined in this knowledge base article. Once the root password was reset, I was able to log directly into the TKC control plane VM using the VM console.




After gaining access to the TKC control plane VM, I proceeded to renew the control plane certificates using kubeadm, as detailed in this blog post. It's essential to apply this process to all control plane nodes in your cluster to ensure proper functionality.

root [ /etc/kubernetes ]# kubeadm certs check-expiration

root [ /etc/kubernetes ]# kubeadm certs renew all
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.

Although this workaround required some additional steps, it ultimately allowed us to regain access to our Tanzu Kubernetes Cluster and maintain its security and functionality.

Hope it was useful. Cheers!

Friday, June 9, 2023

vSphere with Tanzu using NSX-T - Part26 - Jumpbox kubectl plugin to SSH to TKC node

For troubleshooting TKC (Tanzu Kubernetes Cluster) you may need to ssh into the TKC nodes. For doing ssh, you will need to first create a jumpbox pod under the supervisor namespace and from there you can ssh to the TKC nodes.

Here is the manual procedure: https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-587E2181-199A-422A-ABBC-0A9456A70074.html


Following kubectl plugin creats a jumpbox pod under a supervisor namespace. You can exec into this jumpbox pod to ssh into the TKC VMs.

kubectl-jumpbox

#!/bin/bash

Help()
{
   # Display Help
   echo "Description: This plugin creats a jumpbox pod under a supervisor namespace. You can exec into this jumpbox pod to ssh into the TKC VMs."
   echo "Usage: kubectl jumpbox SVNAMESPACE TKCNAME"
   echo "Example: k exec -it jumpbox-tkc1 -n svns1 -- /usr/bin/ssh vmware-system-user@VMIP"
}

# Get the options
while getopts ":h" option; do
   case $option in
      h) # display Help
         Help
         exit;;
     \?) # incorrect option
         echo "Error: Invalid option"
         exit;;
   esac
done

kubectl create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: jumpbox-$2
  namespace: $1           #REPLACE
spec:
  containers:
  - image: "photon:3.0"
    name: jumpbox
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "yum install -y openssh-server; mkdir /root/.ssh; cp /root/ssh/ssh-privatekey /root/.ssh/id_rsa; chmod 600 /root/.ssh/id_rsa; while true; do sleep 30; done;" ]
    volumeMounts:
      - mountPath: "/root/ssh"
        name: ssh-key
        readOnly: true
    resources:
      requests:
        memory: 2Gi
  
  volumes:
    - name: ssh-key
      secret:
        secretName: $2-ssh     #REPLACE YOUR-CLUSTER-NAME-ssh 

  
EOF

Usage

  • Place the plugin in the system executable path.
  • I placed it in $HOME/.krew/bin directory in my laptop.
  • Once you copied the plugin to the proper path, you can make it executable by: chmod 755 kubectl-jumpbox
  • After that you should be able to run the plugin as: kubectl jumpbox SUPERVISORNAMESPACE TKCNAME


 

Example

❯ kg tkc -n vineetha-dns1-test
NAME               CONTROL PLANE   WORKER   TKR NAME                           AGE    READY   TKR COMPATIBLE   UPDATES AVAILABLE
tkc                1               3        v1.21.6---vmware.1-tkg.1.b3d708a   213d   True    True             [1.22.9+vmware.1-tkg.1.cc71bc8]
tkc-using-cci-ui   1               1        v1.23.8---vmware.3-tkg.1           37d    True    True

❯ kg po -n vineetha-dns1-test
NAME         READY   STATUS    RESTARTS   AGE
nginx-test   1/1     Running   0          29d


❯ kubectl jumpbox vineetha-dns1-test tkc
pod/jumpbox-tkc created

❯ kg po -n vineetha-dns1-test
NAME          READY   STATUS    RESTARTS   AGE
jumpbox-tkc   0/1     Pending   0          8s
nginx-test    1/1     Running   0          29d

❯ kg po -n vineetha-dns1-test
NAME          READY   STATUS    RESTARTS   AGE
jumpbox-tkc   1/1     Running   0          21s
nginx-test    1/1     Running   0          29d

❯ k jumpbox -h
Description: This plugin creats a jumpbox pod under a supervisor namespace. You can exec into this jumpbox pod to ssh into the TKC VMs.
Usage: kubectl jumpbox SVNAMESPACE TKCNAME
Example: k exec -it jumpbox-tkc1 -n svns1 -- /usr/bin/ssh vmware-system-user@VMIP

❯ kg vm -n vineetha-dns1-test -o wide
NAME                                                              POWERSTATE   CLASS               IMAGE                                                       PRIMARY-IP      AGE
tkc-control-plane-8rwpk                                           poweredOn    best-effort-small   ob-18900476-photon-3-k8s-v1.21.6---vmware.1-tkg.1.b3d708a   172.29.0.7      133d
tkc-using-cci-ui-control-plane-z8fkt                              poweredOn    best-effort-small   ob-20953521-tkgs-ova-photon-3-v1.23.8---vmware.3-tkg.1      172.29.13.130   37d
tkc-using-cci-ui-tkg-cluster-nodepool-9nf6-n6nt5-b97c86fb45mvgj   poweredOn    best-effort-small   ob-20953521-tkgs-ova-photon-3-v1.23.8---vmware.3-tkg.1      172.29.13.131   37d
tkc-workers-zbrnv-6c98dd84f9-52gn6                                poweredOn    best-effort-small   ob-18900476-photon-3-k8s-v1.21.6---vmware.1-tkg.1.b3d708a   172.29.0.6      133d
tkc-workers-zbrnv-6c98dd84f9-d9mm7                                poweredOn    best-effort-small   ob-18900476-photon-3-k8s-v1.21.6---vmware.1-tkg.1.b3d708a   172.29.0.8      133d
tkc-workers-zbrnv-6c98dd84f9-kk2dg                                poweredOn    best-effort-small   ob-18900476-photon-3-k8s-v1.21.6---vmware.1-tkg.1.b3d708a   172.29.0.3      133d

❯ k exec -it jumpbox-tkc -n vineetha-dns1-test -- /usr/bin/ssh vmware-system-user@172.29.0.7
The authenticity of host '172.29.0.7 (172.29.0.7)' can't be established.
ECDSA key fingerprint is SHA256:B7ptmYm617lFzLErJm7G5IdT7y4SJYKhX/OenSgguv8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '172.29.0.7' (ECDSA) to the list of known hosts.
Welcome to Photon 3.0 (\m) - Kernel \r (\l)
 13:06:06 up 133 days,  4:46,  0 users,  load average: 0.23, 0.33, 0.27

36 Security notice(s)
Run 'tdnf updateinfo info' to see the details.
vmware-system-user@tkc-control-plane-8rwpk [ ~ ]$ sudo su
root [ /home/vmware-system-user ]#
root [ /home/vmware-system-user ]#


Hope it was useful. Cheers!

Monday, April 30, 2018

Infrastructure testing using Pester - Part 3

In this article, I will explain briefly about how to use Pester to validate your switching infrastructure/ switch configurations. If your switches have incorrect configurations, you will experience several problems like network disconnections, high latency, low throughput, etc. And all these will contribute towards network performance issues. In a hyper-converged infrastructure, incorrect switch configurations will affect both compute and storage performance. So it is very important to make sure your switches are configured in the right way according to best practice recommendations.

Using Pester tests, you can define the expected configuration rules and execute it against your existing switches to verify everything is configured correctly or not.

Here in this example, I will show how to verify the below.
  • Networking OS version (here I am using Dell EMC S5048F-ON switch)
  • The interfaces are Up (given a range)
  • A given set of VLANs are present and Up

Prerequisite PowerShell modules:
  • Pester - Version 4.3.1
  • Posh-SSH - Version 2.0.2

Note: I am using Powershell 5.1.14393.0

#Collect input 
#Provide interface range to verify status
[int]$Start_port = Read-host "Enter starting switch port number"
[int]$End_port = Read-host "Enter ending switch port number"
#Provide VLANs to verify status
[int[]]$vlans = 20,23
$check_vlans = @{}

#New SSH session to the Switch 
$sw_creds = Get-Credential -Message "Enter switch creds"

Write-Host "Creating new SSH session to Switch."
$SWssh = New-SSHSession -ComputerName 192.168.10.4 -Credential $sw_creds -Force -ConnectionTimeout 300
Write-Host "Collecting configuration details from Switch. This will take few seconds."
Start-Sleep -s 3

Write-Host "Collecting VLAN details from Switch. This will take few seconds."
for ($j=0; $j -lt $vlans.Count; $j++) {
    Write-Host "Collecting details of VLAN $($vlans[$j])"
    $cmd_vlan = "show interfaces vlan $($vlans[$j])"
    $check_vlan = invoke-sshcommand -Command $cmd_vlan -SSHSession $SWssh
    Start-Sleep -s 3
    $check_vlans[$j] = $check_vlan.Output
}

#Collecting networking OS info 
$Networking_OS = Invoke-SSHCommand -SSHSession $SWssh -Command "show system"
Start-Sleep -s 3

#Collecting interface status details
$interface_cmd =  "show interfaces twentyFiveGigE 1/$Start_port-1/$End_port"
$interface_status = Invoke-SSHCommand -SSHSession $SWssh -Command $interface_cmd

Write-Host "Configuration verification started.`n"

Describe "System basic checks" {
    Context "Check networking OS version" {
        It "Should be Dell EMC Networking OS Version : 9.12(1.0)" {
            ($Networking_OS.Output) -match 'Dell EMC Networking OS Version : 9\.12\(1\.0\)\s\s$' | Should be $true
        }
    }
}

$Global:i=1

Describe "Interface checks" {
    for ($i=$Start_port; $i -le $End_port; $i++{
        Context "Interface should be UP" {
            It "Interface 1/$i should be UP" {
                 $Global:c1 = "twentyFiveGigE 1/$i is up, line protocol is up"
                 $res = ($interface_status.Output) -match $c1
                 $res | should be $true
            }
        }
    }
}

Describe "VLAN checks" {
    for ($j=0; $j -lt $vlans.Count; $j++) {
        Context "Check VLAN $($vlans[$j])" {
             It "Should contain VLAN $($vlans[$j])" {
                  $check = ($check_vlans[$j]) -match '% Error: No such interface'
                  $check | should be $null
                  Write-host $check
             }
             It "VLAN $($vlans[$j]) should be UP" {
                  $tt = "Vlan $($vlans[$j]) is up, line protocol is up"
                  $t3 = ($check_vlans[$j]) -match $tt
                  $t3 | should be $true
             }
        }
    }
}

Remove-SSHSession -SSHSession $SWssh

#Sample output

You can write custom Pester tests according to your switching infrastructure configuration, where you can verify port channels, VLAN membership of switch ports, VLT configuration etc. Hope this was helpful. Cheers!