Showing posts with label ESXI. Show all posts
Showing posts with label ESXI. Show all posts

Friday, August 15, 2025

Understanding NUMA: Its Impact on VM Performance in ESXi

VMware ESXi hosts use Non-Uniform Memory Access (NUMA) architecture to optimize CPU and memory locality. Each NUMA node consists of a subset of CPUs and memory. Accessing local memory within the same NUMA node is significantly faster than remote memory access. Misaligned NUMA configurations can lead to latency spikes, increased CPU Ready Time, and degraded VM performance.


Key symptoms

The common symptoms for Virtual Machines (VMs) on ESXi that have a misconfigured or misaligned Non-Uniform Memory Access (NUMA) configuration primarily manifest as performance degradation and latency. The main issue caused by NUMA misalignment is that the VM's vCPUs end up frequently having to access memory that belongs to a different physical NUMA node on the ESXi host (known as Remote Access), which is significantly slower than accessing local memory.

The resulting symptoms for the VM include:

  • Overall Slowness and Unresponsiveness: Services and applications running inside the guest OS may respond slowly or intermittently. The entire VM can feel sluggish.

  • High CPU Ready Time (%RDY): This is the most critical ESXi-level metric. CPU Ready Time represents the percentage of time a VM was ready to run but could not be scheduled on a physical CPU. High %RDY times (often above 5% or 10%) can indicate that the VM's vCPUs are struggling to get scheduled efficiently, which happens when they are spread across multiple NUMA nodes (NUMA spanning).

  • Excessive Remote Memory Access: When a VM consumes more vCPUs or memory than is available on a single physical NUMA node, a portion of its memory traffic becomes "remote." You can check this using the esxtop utility on the ESXi host.

Common misconfigurations


Misalignment often occurs when the VM's vCPU and memory settings exceed the resources of a single physical NUMA node on the host. Common causes include:

  • Over-Sized VM: Allocating more vCPUs than the physical cores available in a single physical NUMA node or allocating more memory than the physical memory on a single NUMA node.
  • Hot-Add Features: Enabling CPU Hot-Add or Memory Hot-Add can disable vNUMA (Virtual NUMA) for the VM, preventing the VMkernel from presenting an optimized NUMA topology to the guest OS.
  • Incorrect Cores per Socket Setting: While vSphere 6.5 and later are smarter about vNUMA, configuring the Cores per Socket value manually in a way that doesn't align with the host's physical NUMA topology can still lead to poor scheduling and memory placement, particularly when licensing dictates a low number of virtual sockets.
  • Setting VM Limits: Setting a memory limit on a VM that is lower than its configured memory can force the VMkernel to allocate the remaining memory from a remote NUMA node.

Check NUMA assignments in ESXi

  • SSH into the ESXi node.
  • Issue the esxtop command and press m for memory view, then press f to enable the fields, G to enable NUMA information.
  • You should be able to view the NUMA related information like NRMEM, NLMEM, and N%L.
    • NRMEM (MB): NUMA Remote MEMory
      • This is the current amount of a VM's memory (in MB) that is physically located on a remote NUMA node relative to where the VM's vCPUs are currently running.
      • High NRMEM indicates NUMA locality issues, meaning the vCPUs must cross the high-speed interconnect (like Intel's QPI/UPI or AMD's Infinity Fabric) to access some of their data, which results in slower performance.
    • NLMEM (MB): NUMA Local MEMory
      • This is the current amount of a VM's memory (in MB) that is physically located on the local NUMA node, meaning it's on the same physical node as the vCPUs accessing it.
      • The ESXi NUMA scheduler's goal is to maximize NLMEM to ensure fast memory access.
    • N%L: NUMA % Locality
      • This is the percentage of the VM's total memory that resides on the local NUMA node.
      • A value close to 100% is ideal, indicating excellent memory locality. If this value drops below 80%, the VM may experience poor NUMA locality and potential performance issues due to slower remote memory access.
  • Issue the esxtop command and press v to see the virtual machine screen.
  • From the virtual machine screen note down the GID of the VM under consideration, and press q to exit the screen.
  • Now issue the sched-stats -t numa-clients command. This will list down NUMA details of the VM. Check the groupID column to match the GID of the VM.
  • For example, the GID of the VM I am looking at is 7886858. This is a 112 CPU VM which is running on an 8-socket physical host.

  • You can see the VM is spread/ placed under NUMA nodes 0, 1, 2, and 3.
  • The remoteMem is 0, for each of these NUMA nodes, which means they are accessing all the local memory of the NUMA node.
  • To view physical NUMA details of the ESXI you can use sched-stats -t numa-pnode command. You can see this server has 8 NUMA nodes.
  • To view the NUMA latency, you can use the sched-stats -t numa-latency command.

Verify NUMA node details at guest OS


Windows
  • Easiest way is to go to Task Manager - Performance - CPU
    • Right click on the CPU utilization graph and select Change graph to - NUMA nodes
    • If there only one NUMA node, you may notice the option as greyed out.
  • To get detailed info you can consider using the sysinternals utility coreinfo64.

Linux
  • To view NUMA related details from the Linux guest OS layer, you can use the following commands:
lscpu | grep -i NUMA
dmesg | grep -i NUMA

Remediation


The most common remediation steps for fixing Non-Uniform Memory Access (NUMA) related performance issues in ESXi VMs revolve around right-sizing the VM to align its resources with the physical NUMA boundaries of the host.

The primary goal is to minimize Remote Memory Access (NRMEM) and maximize Local Memory Access (N%L). The vast majority of NUMA issues stem from a VM's resource allocation crossing a physical NUMA node boundary.

  • Right-Size VMs: Keep vCPU count within physical cores of a single NUMA node.
  • Evenly Divide Resources: For monster/ wide VMs, ensure the total vCPUs are configured such that they are evenly divisible by the number of physical NUMA nodes they span.
    • Example: If a VM needs 16 vCPUs on a host with 12-core NUMA nodes, configure the vCPUs to be a multiple of a NUMA node count (e.g., 2 sockets $\times$ 8 cores per socket to create 2 vNUMA nodes, aligning with 2 pNUMA nodes).
  • Cores per Socket Setting (Important for older vSphere/Licensing): While vSphere 6.5 and later automatically present an optimal vNUMA topology, you should still configure the Cores per Socket setting on the VM to create a vNUMA structure that aligns with the physical NUMA boundaries of the host. This helps the guest OS make better scheduling decisions.
  • Disable VM CPU/ Memory Hot-Add: Plan capacity upfront.

NUMA awareness is critical for troubleshooting and optimizing VM performance on ESXi. Misconfigured NUMA placements can severely impact latency-sensitive workloads like databases and analytics. Regular checks at both the hypervisor and guest OS layers ensure memory locality, reduce latency, and improve efficiency.

References


Hope it was useful. Cheers!

Saturday, July 12, 2025

Troubleshooting ESXi PSOD: A Quick Guide for SREs

When an ESXi host hits a Purple Screen of Death (PSOD), it’s more than just a crash - it’s a signal that something critical needs attention. Here’s how to handle it effectively.


What happens during a PSOD?

  • The ESXi server displays a purple diagnostic screen.
  • You’ll see alerts/ incidents for host connectivity, FC/ Ethernet link down, and related alarms.
  • The console screen confirms the purple screen.

Immediate actions

  • Capture screenshots of the PSOD from the console screen.
  • Check server hardware health via the out-of-band management interface like iDRAC/ RMC/ BMC.
  • Observe if the host is stuck or rebooting repeatedly.
    • If ESXi reloads successfully, immediately place the node in Maintenance Mode via vCenter.
    • If it keeps crashing, try to capture all PSOD instances.

Collecting logs

  • Generate a support bundle from vCenter once the host is online.
  • Collect server hardware logs.

Engage support

  • Broadcom/ VMware: Share PSOD screenshots and ESXi support bundle for RCA.
  • Hardware vendor: Attach server hardware logs, screenshots, and context for analysis.

Analyze crash dumps

  • Look for these keywords in the core dump logs: BlueScreen, Backtrace, Exception
  • In the ESXi support bundle you will find the crash dump logs under /var/core directory.
  • Analyzing the core dump files should help you find the root cause of the PSOD event. It could be due to some hardware issue, bugs in ESXi hypervisor, faults in device firmware or drivers, etc.
  • You may notice many vmkernel-zdump files, and to quickly filter out all the BlueScreen events, you can use the following PowerShell code snippet.
------------------------------------------------------------------
param(
    [string]$directoryPath,
    [string[]]$keywords
)

# Function to search for keywords in files
function Search-Files {
    param (
        [string]$path,
        [string[]]$keywords
    )

    # Get all files in the directory and subdirectories
    $files = Get-ChildItem -Path $path -Recurse -File

    # Loop through each file
    foreach ($file in $files) {
        # Read the content of the file
        $content = Get-Content -Path $file.FullName

        # Loop through each line in the file content
        foreach ($line in $content) {
            # Check if the line contains any of the keywords (case-insensitive)
            foreach ($keyword in $keywords) {
                if ($line -match "(?i)$keyword") {
                    # Print the file name and the matching line
                    Write-Output "File: $($file.FullName)"
                    Write-Output "Line: $line"
                    Add-Content -path out.txt -value $line
                    break
                }
            }
        }
    }
}

# Call the function with the provided parameters
Search-Files -path $directoryPath -keywords $keywords
------------------------------------------------------------------
  • Save the above code snippet to a .ps1 file (example: find.ps1) and you can run it as follows:
> .\find.ps1 -directoryPath "C:\esx-esxi1.xre.com-2025-04-18--13.05-2106842\var\core" -keywords "bluescreen"
> .\find.ps1 -directoryPath "C:\esx-esxi1.xre.com-2025-04-18--13.05-2106842\var\core" -keywords "bluescreen", "#PF Exception"
  • All the log lines that include the given keyword or keywords will be saved to out.txt file.
  • A sample output of the above-mentioned code snippet against an ESXi core dump is given below.

  • Once you identify the root cause of the PSOD event, you can start working towards the resolution which may involve replacing a faulty hardware component, updating firmware/ driver/ ESXi, etc.

References

Hope it was useful. Cheers!

Saturday, January 18, 2025

VMware PowerCLI 101 - part12 - Get uptime and boot time of ESXi nodes and VMs

You can use the following PowerCLI commandlets to get the uptime and boot time of ESXi hosts and VMs.

  • Get ESXi boot time.
> Get-Cluster | Get-VMHost | Select Name, @{Name="Boot time";Expression={$_.ExtensionData.Runtime.BootTime}}

Name                                  Boot time
----                                  ---------
esxi1.wxxxxx.com                      2/27/2025 10:13:45 AM
esxi2.wxxxxx.com                      6/18/2025 2:39:38 PM

  • Get ESXi uptime.
> Get-Cluster | Get-VMHost | Select Name, @{Name="Uptime (Days)";Expression={(Get-Date) - $_.ExtensionData.Runtime.BootTime | Select-Object -ExpandProperty Days}}

Name                                  Uptime (Days)
----                                  -------------
esxi1.wxxxxx.com                      116
esxi2.wxxxxx.com                      5

  • Get VM boot time.
> Get-Cluster | Get-VM | Select Name, @{Name="Boot time";Expression={$_.ExtensionData.Runtime.BootTime}}

Name                                      Boot time
----                                      ---------
test-vineeth
rhel84-vineeth
VMware vCenter Server                     2/25/2025 11:28:13 AM
vCLS-50a1b507-5a1d-5e00-8410-7599fc72e5b5 6/18/2025 2:51:59  PM
  • Get VM uptime.
> Get-Cluster | Get-VM | Select Name, @{Name="Uptime (Days)";Expression={(Get-Date) - $_.ExtensionData.Runtime.BootTime | Select-Object -ExpandProperty Days}}

Name                                      Uptime (Days)
----                                      -------------
test-vineeth
rhel84-vineeth
VMware vCenter Server                     118
vCLS-50a1b507-5a1d-5e00-8410-7599fc72e5b5 5

Hope it was useful. Cheers!

Friday, September 20, 2024

VMware PowerCLI 101 - part10 - Verify vSphere cluster state

To verify the most common selected status attributes of the cluster:

Get-Cluster | Get-VMHost | Select Name,@{N='HAState';E={$_.ExtensionData.Runtime.DasHostState.State}},ConnectionState,PowerState,@{N='OverallStatus';E={$_.ExtensionData.OverallStatus}},@{N='ConfigStatus';E={$_.ExtensionData.ConfigStatus}},@{N='InMaintenanceMode';E={$_.ExtensionData.Runtime.InMaintenanceMode}},@{N='RebootRequired';E={$_.ExtensionData.Summary.RebootRequired}},@{N='BootTime';E={$_.ExtensionData.Runtime.BootTime}} | ft

> Get-Cluster | Get-VMHost | Select Name,@{N='HAState';E={$_.ExtensionData.Runtime.DasHostState.State}},ConnectionState,PowerState,@{N='OverallStatus';E={$_.ExtensionData.OverallStatus}},@{N='ConfigStatus';E={$_.ExtensionData.ConfigStatus}},@{N='InMaintenanceMode';E={$_.ExtensionData.Runtime.InMaintenanceMode}},@{N='RebootRequired';E={$_.ExtensionData.Summary.RebootRequired}},@{N='BootTime';E={$_.ExtensionData.Runtime.BootTime}} | ft

Name      HAState           ConnectionState PowerState OverallStatus ConfigStatus InMaintenanceMode RebootRequired BootTime
----      -------           --------------- ---------- ------------- ------------ ----------------- -------------- --------
10.90.1.4 connectedToMaster       Connected  PoweredOn        yellow       yellow             False          False 8/27/2024 7:31:10 AM
10.90.1.5 master                  Connected  PoweredOn        yellow       yellow             False          False 9/6/2024 9:08:09 PM

Hope it was useful. Cheers!

Wednesday, June 26, 2024

vSphere with Tanzu using NSX-T - Part32 - Troubleshooting BGP related issues

This article provides basic guidance on troubleshooting BGP related issues.

Sample diagram showing connectivity between Edge Nodes and TOR switches

Verify Tier-0 Gateway status on NSX-T

  • Status of T0 should be Success.


  • Check the interfaces of T0 to identify which all edge nodes are part of it.


  • Check the status of Edge Transport Nodes.


  • As you can see from the T0 interfaces, Edge01/02/03/04 are part of it and in those edge nodes you should be able to see the SR_TIER0 component. Next step is to login to those Edge nodes that are part of T0 and verify BGP summary.

Verify BGP on all Edge nodes that are part of T0 Gateway  

  • SSH into the edge node as admin user.
  • get logical-router
  • Look for SERVICE_ROUTER_TIER0.
sc2-01-nsxt04-r08edge02> get logical-router
Logical Router
UUID                                   VRF    LR-ID  Name                              Type                        Ports   Neighbors
736a80e3-23f6-5a2d-81d6-bbefb2786666   0      0                                        TUNNEL                      4       22/5000
e6d02207-c51e-4cf8-81a6-44afec5ad277   2      84653  DR-t1-domain-c1034:1de3adfa-0ee   DISTRIBUTED_ROUTER_TIER1    5       9/50000
a590f1da-2d79-4749-8153-7b174d23b069   32     85271  DR-t1-domain-c1034:1de3adfa-0ee   DISTRIBUTED_ROUTER_TIER1    5       5/50000
758d9736-6781-4b3a-906f-3d1b03f0924d   33     88016  DR-t1-domain-c1034:1de3adfa-0ee   DISTRIBUTED_ROUTER_TIER1    4       1/50000
5e7bfe98-0b5e-4620-90b1-204634e99127   37     3      SR-sc2-01-nsxt04-tr               SERVICE_ROUTER_TIER0        6       5/50000
  • vrf <SERVICE_ROUTER_TIER0 VRF>
  • get bgp neighbor summary
  • Note: If everything is working fine State should show Estab.
sc2-01-nsxt04-r08edge02> vrf 37
sc2-01-nsxt04-r08edge02(tier0_sr[37])> get bgp neighbor summary
BFD States: NC - Not configured, DC - Disconnected
            AD - Admin down, DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv4Unicast
Router ID: 10.184.248.2  Local AS: 4259971071

Neighbor                            AS          State Up/DownTime  BFD InMsgs  OutMsgs InPfx  OutPfx

10.184.248.239                      4259970544  Estab 05w1d22h     NC  12641393 12610093 2      568
10.184.248.240                      4259970544  Estab 05w1d23h     NC  12640337 11580431 2      566

  • You should be able to ping to the BGP neighbor IP. If you are unable to ping to neighbor IPs, then there is an issue.
sc2-01-nsxt04-r08edge02(tier0_sr[37])> ping 10.184.248.239
PING 10.184.248.239 (10.184.248.239): 56 data bytes
64 bytes from 10.184.248.239: icmp_seq=0 ttl=255 time=1.788 ms
^C
--- 10.184.248.239 ping statistics ---
2 packets transmitted, 1 packets received, 50.0% packet loss
round-trip min/avg/max/stddev = 1.788/1.788/1.788/0.000 ms

sc2-01-nsxt04-r08edge02(tier0_sr[37])> ping 10.184.248.240
PING 10.184.248.240 (10.184.248.240): 56 data bytes
64 bytes from 10.184.248.240: icmp_seq=0 ttl=255 time=1.925 ms
64 bytes from 10.184.248.240: icmp_seq=1 ttl=255 time=1.251 ms
^C
--- 10.184.248.240 ping statistics ---
3 packets transmitted, 2 packets received, 33.3% packet loss
round-trip min/avg/max/stddev = 1.251/1.588/1.925/0.337 ms

  • Get interfaces | more
sc2-01-nsxt04-r08edge02> vrf 37
sc2-01-nsxt04-r08edge02(tier0_sr[37])> get interfaces | more
Fri Aug 19 2022 UTC 11:07:18.042
Logical Router
UUID                                   VRF    LR-ID  Name                              Type
5e7bfe98-0b5e-4620-90b1-204634e99127   37     3      SR-sc2-01-nsxt04-tr               SERVICE_ROUTER_TIER0
Interfaces (IPv6 DAD Status A-DAD_Success, F-DAD_Duplicate, T-DAD_Tentative, U-DAD_Unavailable)
    Interface     : dd83554d-47c0-5a4e-9fbe-3abb1239a071
    Ifuid         : 335
    Mode          : cpu
    Port-type     : cpu
    Enable-mcast  : false

    Interface     : 008b2b15-17d1-4cc8-9d94-d9c4c2d0eb3a
    Ifuid         : 1000
    Name          : tr-interconnect-edge02
    Fwd-mode      : IPV4_AND_IPV6
    Internal name : uplink-1000
    Mode          : lif
    Port-type     : uplink
    IP/Mask       : 10.184.248.2/24
    MAC           : 02:00:70:51:9d:79
    VLAN          : 1611



Verify BGP on Cisco TOR switches

  • SSH to TOR switch.
  • show ip bgp summary
❯ ssh -o PubkeyAuthentication=no netadmin@sc2-01-r08lswa.xxxxxxxx.com
User Access Verification
(netadmin@sc2-01-r08lswa.xxxxxxxx.com) Password:

Cisco Nexus Operating System (NX-OS) Software

sc2-01-r08lswa# show ip bgp summary
BGP summary information for VRF default, address family IPv4 Unicast
BGP router identifier 10.184.17.248, local AS number 65001.65008
BGP table version is 520374, IPv4 Unicast config peers 10, capable peers 8
5150 network entries and 11372 paths using 2003240 bytes of memory
BGP attribute entries [110/18920], BGP AS path entries [69/1430]
BGP community entries [0/0], BGP clusterlist entries [0/0]
11356 received paths for inbound soft reconfiguration
11356 identical, 0 modified, 0 filtered received paths using 0 bytes

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.184.10.14    4 65011.65000
                        47979514 10570342   520374    0    0     5w1d 4541
10.184.10.78    4 65011.65000
                        47814555 10601750   520374    0    0     5w1d 4541
10.184.248.1    4 65001.65535
                          80831   79447   520374    0    0 02:41:51 566
10.184.248.2    4 65001.65535
                        3215614 3269391   520374    0    0     5w1d 566
10.184.248.3    4 65001.65535
                        3215776 3269344   520374    0    0     1w3d 566
10.184.248.4    4 65001.65535
                        3215676 3269383   520374    0    0 13:51:45 566
10.184.248.5    4 65001.65535
                        3200531 3269384   520374    0    0     5w1d 5
10.184.248.6    4 65001.65535
                        3197752 3266700   520374    0    0     5w1d 5


  • show ip arp
sc2-01-r08lswa# show ip arp 10.184.248.2

Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Re-Originated Peer Sync Entry
       D - Static Adjacencies attached to down interface

IP ARP Table
Total number of entries: 1
Address         Age       MAC Address     Interface       Flags
10.184.248.2    00:06:12  0200.7051.9d79  Vlan1611


  • If you compare this IP and MAC, you can see that its the same of your T0 SR uplink of your edge02 node.
IP/Mask       : 10.184.248.2/24
MAC           : 02:00:70:51:9d:79

For further troubleshooting you can do packet capture from the edge nodes and ESXi server and analyze them using Wireshark.

Packet capture from Edge node

  • Capture packets from the T0 SR uplink interface.
sc2-01-nsxt04-r08edge01(tier0_sr[5])> get interfaces | more
Wed Aug 17 2022 UTC 13:52:48.203
Logical Router
UUID                                   VRF    LR-ID  Name                              Type
fb1ad846-8757-4fdf-9cbb-5c22ba772b52   5      2      SR-sc2-01-nsxt04-tr               SERVICE_ROUTER_TIER0
Interfaces (IPv6 DAD Status A-DAD_Success, F-DAD_Duplicate, T-DAD_Tentative, U-DAD_Unavailable)
    Interface     : c8b80ba1-93fc-5c82-a44f-4f4863b6413c
    Ifuid         : 286
    Mode          : cpu
    Port-type     : cpu
    Enable-mcast  : false

    Interface     : 4915d978-9c9a-58bc-84e2-cafe5442cba4
    Ifuid         : 287
    Mode          : blackhole
    Port-type     : blackhole

    Interface     : 899bcf30-83e2-46bb-9be2-8889ec52b354
    Ifuid         : 833
    Name          : tr-interconnect-edge01
    Fwd-mode      : IPV4_AND_IPV6
    Internal name : uplink-833
    Mode          : lif
    Port-type     : uplink
    IP/Mask       : 10.184.248.1/24
    MAC           : 02:00:70:d1:92:b1
    VLAN          : 1611
    Access-VLAN   : untagged
    LS port       : 15b971e9-7caa-43b7-86c1-96ff50453402
    Urpf-mode     : STRICT_MODE
    DAD-mode      : LOOSE
    RA-mode       : SLAAC_DNS_TRHOUGH_RA(M=0, O=0)
    Admin         : up
    Op_state      : up
    Enable-mcast  : False
    MTU           : 9000
    arp_proxy     :


  • Start a continuous ping from the TOR switches to the edge uplink IP (in this case ping 10.184.248.1 from TOR switches) before starting packet capture.
sc2-01-nsxt04-r08edge01> start capture interface 899bcf30-83e2-46bb-9be2-8889ec52b354 file uplink.pcap


Note:
Find the location of uplink.pcap file on TOR switches and SCP it locally to analyze using Wireshark.

 

Packet capture from ESXi

  • In this example, we are capturing packets of sc2-01-nsxt04-r08edge01 VM from the switchports where its interfaces are connected. sc2-01-nsxt04-r08edge01 VM is running on ESXi node sc2-01-r08esx10.
[root@sc2-01-r08esx10:~] esxcli network vm list | grep edge
18790721  sc2-01-nsxt04-r08edge05                                                 3  , ,
18977245  sc2-01-nsxt04-r08edge01                                                 3  , ,

[root@sc2-01-r08esx10:/tmp] esxcli network vm port list -w 18977245
   Port ID: 67109446
   vSwitch: sc2-01-vc16-dvs
   Portgroup:
   DVPort ID: b60a80c0-ecd6-40bd-8d2b-fbd1f06bb172
   MAC Address: 02:00:70:33:a9:67
   IP Address: 0.0.0.0
   Team Uplink: vmnic1
   Uplink Port ID: 2214592517
   Active Filters:

   Port ID: 67109447
   vSwitch: sc2-01-vc16-dvs
   Portgroup:
   DVPort ID: 6e3d8057-fc23-4180-b0ba-bed90381f0bf
   MAC Address: 02:00:70:d1:92:b1
   IP Address: 0.0.0.0
   Team Uplink: vmnic1
   Uplink Port ID: 2214592517
   Active Filters:

   Port ID: 67109448
   vSwitch: sc2-01-vc16-dvs
   Portgroup:
   DVPort ID: c531df19-294d-4079-b39c-89a3b58e30ad
   MAC Address: 02:00:70:30:c7:01
   IP Address: 0.0.0.0
   Team Uplink: vmnic0
   Uplink Port ID: 2214592519
   Active Filters:



  • Start a continuous ping from the TOR switches to the edge uplink IP (in this case ping 10.184.248.1 from TOR switches) before starting packet capture.
[root@sc2-01-r08esx10:/tmp] pktcap-uw --switchport 67109446 --dir 2 -o /tmp/67109446-02:00:70:33:a9:67.pcap --count 1000 & pktcap-uw --switchport 67109447 --dir 2 -o /tmp/67109447-02:00:70:d1:92:b1.pcap --count 1000 & pktcap-uw --switchport 67109448 --dir 2 -o /tmp/67109448-02:00:70:30:c7:01.pcap --count 1000




Note:
SCP the pcap files to laptop and use Wireshark to analyse them.
You can also do packet capture from physical uplinks (vmnic) of the ESXi node if required.

Hope it was useful. Cheers!

Saturday, May 20, 2023

vSphere with Tanzu using NSX-T - Part25 - Spherelet

The Spherelet is based on the Kubernetes “Kubelet” and enables an ESXi hypervisor to act as a Kubernetes worker node. Sometimes you may notice that the worker nodes of your supervisor cluster are having NotReady,SchedulingDisabled status, and it maybe becuase spherelet is not running on those ESXi nodes.

Following are the steps to verify the status of spherelet service, and restart them if required.

Example:
❯ kubectx wdc-01-vcxx
Switched to context "wdc-01-vcxx".
❯ kubectl get node
NAME                               STATUS                        ROLES                  AGE    VERSION
42019f7e751b2818bb0c659028d49fdc   Ready                         control-plane,master   317d   v1.22.6+vmware.wcp.2
4201b0b21aed78d8e72bfb622bb8b98b   Ready                         control-plane,master   317d   v1.22.6+vmware.wcp.2
4201c53dcef2701a8c36463942d762dc   Ready                         control-plane,master   317d   v1.22.6+vmware.wcp.2
wdc-01-rxxesx04.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx05.xxxxxxxxx.com      NotReady,SchedulingDisabled   agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx06.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx32.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx33.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx34.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx35.xxxxxxxxx.com      Ready,SchedulingDisabled      agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx36.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx37.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx38.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx39.xxxxxxxxx.com      NotReady,SchedulingDisabled   agent                  317d   v1.22.6-sph-db56d46
wdc-01-rxxesx40.xxxxxxxxx.com      Ready                         agent                  317d   v1.22.6-sph-db56d46

Logs

  • ssh into the ESXi worker node.
tail -f /var/log/spherelet.log 


Status

  • ssh into the ESXi worker node and run the following:
etc/init.d/spherelet status
  •  You can check status of spherelet using PowerCLI. Following is an example:
> Connect-VIServer wdc-10-vcxx

> Get-VMHost | Get-VMHostService | where {$_.Key -eq "spherelet"}  | select VMHost,Key,Running | ft

VMHost                        Key       Running
------                        ---       -------
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True
wdc-10-r0xxxxxxxxxxxxxxxxxxxx spherelet    True

Restart

  • ssh into the ESXi worker node and run the following:
/etc/init.d/spherelet restart
  • You can also restart spherelet service using PowerCLI. Following is an example to restart spherelet service on ALL the ESXi worker nodes of a cluster:
> Get-Cluster

Name                           HAEnabled  HAFailover DrsEnabled DrsAutomationLevel
                                          Level
----                           ---------  ---------- ---------- ------------------
wdc-10-vcxxc01                 True       1          True       FullyAutomated

> Get-Cluster -Name wdc-10-vcxxc01 | Get-VMHost | foreach { Restart-VMHostService -HostService ($_ | Get-VMHostService | where {$_.Key -eq "spherelet"}) }

Certificates

You may notice the ESXi worker nodes in NotReady state when the following spherelet certs expire.
  • /etc/vmware/spherelet/spherelet.crt
  • /etc/vmware/spherelet/client.crt
 
An example is given below:
❯ kg no
NAME STATUS ROLES AGE VERSION
420802008ec0d8ccaa6ac84140768375 Ready control-plane,master 70d v1.22.6+vmware.wcp.2
42087a63440b500de6cec759bb5900bf Ready control-plane,master 77d v1.22.6+vmware.wcp.2
4208e08c826dfe283c726bc573109dbb Ready control-plane,master 77d v1.22.6+vmware.wcp.2
wdc-08-rxxesx25.xxxxxxxxx.com NotReady agent 370d v1.22.6-sph-db56d46
wdc-08-rxxesx26.
xxxxxxxxx.com NotReady agent 370d v1.22.6-sph-db56d46
wdc-08-rxxesx23.
xxxxxxxxx.com NotReady agent 370d v1.22.6-sph-db56d46
wdc-08-rxxesx24.
xxxxxxxxx.com NotReady agent 370d v1.22.6-sph-db56d46
wdc-08-rxxesx25.
xxxxxxxxx.com NotReady agent 370d v1.22.6-sph-db56d46
wdc-08-rxxesx26.
xxxxxxxxx.com NotReady agent 370d v1.22.6-sph-db56d46

You can ssh into the ESXi worker nodes and verify the validity of the above mentioned certs. They have a life time of one year.
 
Example:
[root@wdc-08-rxxesx25:~] openssl x509 -enddate -noout -in /etc/vmware/spherelet/spherelet.crt
notAfter=Sep 1 08:32:24 2023 GMT
[root@wdc-08-rxxesx25:~] openssl x509 -enddate -noout -in /etc/vmware/spherelet/client.crt
notAfter=Sep 1 08:32:24 2023 GMT
Depending on your support contract, if its a production environment you may need to open a case with VMware GSS for resolving this issue. 
 
Ref KBs: 


Verify

❯ kubectl get node
NAME                               STATUS   ROLES                  AGE     VERSION
42017dcb669bea2962da27fc2f6c16d2   Ready    control-plane,master   5d20h   v1.23.12+vmware.wcp.1
4201b763c766875b77bcb9f04f8840b3   Ready    control-plane,master   5d21h   v1.23.12+vmware.wcp.1
4201dab068e9b2d3af3b8fde450b3d96   Ready    control-plane,master   5d20h   v1.23.12+vmware.wcp.1
wdc-01-rxxesx04.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx05.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx06.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx32.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx33.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx34.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx35.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx36.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx37.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx38.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx39.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
wdc-01-rxxesx40.xxxxxxxxx.com      Ready    agent                  5d19h   v1.23.5-sph-81ef5d1
   

Hope it was useful. Cheers!

Saturday, December 10, 2022

vSphere with Tanzu using NSX-T - Part21 - Pointers while upgrading the stack

Here are some pointers.

The main workflow while upgrading the entire stack is:

  1. upgrade NSX-T
  2. upgrade vCenter server
  3. upgrade ESXi nodes
  4. upgrade vSAN and vDS (if required)
  5. upgrade WCP

Note: Make sure you follow the compatibility matrix while performing upgrades in a production environment.

NSX-T

While NSX-T is getting upgraded, the host transport nodes will also need to be updated. NSX components will be installed/ updated on the ESXi nodes and may need a reboot also. 

In NSX-T: System > Lifecycle Management > Upgrade > Upgrade NSX > Hosts

Plan

-Upgrade order across groups - Serial

-Pause upgrade condition - When an upgrade unit fails to upgrade

Host Groups

-Upgrade order within group - Serial

-Upgrade mode - Maintenance

 

Notes: 

  • There is an in-place upgrade mode for the host groups, and in this mode the NSX component on the ESXi nodes will get updated first and then the node will be placed in MM and then gets rebooted. But we have had some cases where the NSX component update on the ESXi node fails, and the workload/ TKC VMs running on that ESXi loses network connectivity. To fix this the ESXi node has to be rebooted, and while trying to put the node in MM, the TKC VMs running on it fails to get migrated, and the final option is to force reboot the ESXi and that affects the workload running on it. To avoid this case, the safest option is to choose the upgrade mode as Maintenance, so that the TKCs VMs will get migrated off the ESXi node first before updating the NSX component, and even if the component update fails, we are safe to reboot the node as it is in already in MM. 
  • We have also noticed in some cases the ESXi nodes in a WCP cluster fails to enter MM. Following are some common cases:
  • Orphaned or inaccessible VMs are one of the main causes. You can follow this article to find those and clean them up.
  • I have noticed cases like some VMs (vSphere pods) were present on the ESXi host in poweredoff state. In this case the ESXi was stuck 100% in maintenance mode. You will need to check on those VMs and then delete them if required to proceed further.
  • There are cases where some TKC VMs were present on the ESXi host and they fail to get migrated. Check whether these VMs are still present at the Kubernetes layer. If not, delete those VMs!
  • Check if there are any TKC nodes still present on the ESXi node. There can be cases where the TKC node or nodes are unable to migrate to other available nodes in the cluster due to resource constraints. Check whether the TKC nodes that are unable to migrate is using guaranteed vmclass. The solution is to find out unused resources/ TKCs in the cluster, check with the owner/ user and then delete it if not being used. Another way is to check with the user and change the guaranteed vmclass to besteffort if possible or temporarily reduce the number of worker nodes if the user agrees to do so.

  • Verify whether any pods are running on the respective ESXi worker node:
    kubectl get pods -A -o wide | grep ESXi-FQDN
    Safely drain the node.

Hope it was useful. Cheers!