Showing posts with label Edge VM. Show all posts
Showing posts with label Edge VM. Show all posts

Wednesday, June 26, 2024

vSphere with Tanzu using NSX-T - Part33 - Troubleshooting intermittent connection timeouts to apiserver and workloads

In the realm of managing Tanzu Kubernetes clusters (TKCs), we have encountered several challenges that hindered the smooth functioning of our applications. In this blog post, we will discuss three such cases and the workarounds we employed to resolve them.


Case 1: TKC Control Plane Node Connectivity Issues


Symptoms:
  • TKC apiserver connection timeouts when attempting to connect using the kubeconfig.
  • Traffic was not flowing to two of the control plane nodes.
  • NSX-T web UI LB VS stats indicated this issue.


Case 2: TKC Worker Node Connectivity Issues


Symptoms:
  • Workload (example: PostgreSQL cluster) connection timeouts.
  • Traffic was not flowing to two of the worker nodes in the TKC.
  • NSX-T web UI LB VS stats indicated this issue.


Case 3: Load Balancer Connectivity Issues


Symptoms:
  • Connection timeouts when attempting to connect to a PostgreSQL workload through the load balancer VS IP.
  • This issue was observed only when creating new services of type LoadBalancer in the TKC.
  • We noticed datapath mempool usage for the edge nodes was above the threshold value.


Resolution/ work around

  • Find the T1 router that is attached to the TKC which has connectivity issues. 
  • In an Active - Standby HA configuration, you will see that there will be one Edge node that will be Active and another one in Standby status. 
  • First place the Standby Edge node in NSX MM, reboot it, and then exit it from NSX MM. 
  • Now, place the Active Edge node in NSX MM, there will be a slight network disruption during this failover, once it is in NSX MM, reboot it, and then exit NSX MM. 
  • This should resolved the issue.


In conclusion, these cases illustrate the importance of verifying NSX-T components when managing Tanzu Kubernetes clusters. By identifying the root cause of the issues and employing effective workarounds, we were able to restore functionality and maintain the health of our applications. Stay tuned for more insights and best practices in managing Kubernetes clusters.

Hope it was useful. Cheers!

vSphere with Tanzu using NSX-T - Part32 - Troubleshooting BGP related issues

This article provides basic guidance on troubleshooting BGP related issues.

Sample diagram showing connectivity between Edge Nodes and TOR switches

Verify Tier-0 Gateway status on NSX-T

  • Status of T0 should be Success.


  • Check the interfaces of T0 to identify which all edge nodes are part of it.


  • Check the status of Edge Transport Nodes.


  • As you can see from the T0 interfaces, Edge01/02/03/04 are part of it and in those edge nodes you should be able to see the SR_TIER0 component. Next step is to login to those Edge nodes that are part of T0 and verify BGP summary.

Verify BGP on all Edge nodes that are part of T0 Gateway  

  • SSH into the edge node as admin user.
  • get logical-router
  • Look for SERVICE_ROUTER_TIER0.
sc2-01-nsxt04-r08edge02> get logical-router
Logical Router
UUID                                   VRF    LR-ID  Name                              Type                        Ports   Neighbors
736a80e3-23f6-5a2d-81d6-bbefb2786666   0      0                                        TUNNEL                      4       22/5000
e6d02207-c51e-4cf8-81a6-44afec5ad277   2      84653  DR-t1-domain-c1034:1de3adfa-0ee   DISTRIBUTED_ROUTER_TIER1    5       9/50000
a590f1da-2d79-4749-8153-7b174d23b069   32     85271  DR-t1-domain-c1034:1de3adfa-0ee   DISTRIBUTED_ROUTER_TIER1    5       5/50000
758d9736-6781-4b3a-906f-3d1b03f0924d   33     88016  DR-t1-domain-c1034:1de3adfa-0ee   DISTRIBUTED_ROUTER_TIER1    4       1/50000
5e7bfe98-0b5e-4620-90b1-204634e99127   37     3      SR-sc2-01-nsxt04-tr               SERVICE_ROUTER_TIER0        6       5/50000
  • vrf <SERVICE_ROUTER_TIER0 VRF>
  • get bgp neighbor summary
  • Note: If everything is working fine State should show Estab.
sc2-01-nsxt04-r08edge02> vrf 37
sc2-01-nsxt04-r08edge02(tier0_sr[37])> get bgp neighbor summary
BFD States: NC - Not configured, DC - Disconnected
            AD - Admin down, DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv4Unicast
Router ID: 10.184.248.2  Local AS: 4259971071

Neighbor                            AS          State Up/DownTime  BFD InMsgs  OutMsgs InPfx  OutPfx

10.184.248.239                      4259970544  Estab 05w1d22h     NC  12641393 12610093 2      568
10.184.248.240                      4259970544  Estab 05w1d23h     NC  12640337 11580431 2      566

  • You should be able to ping to the BGP neighbor IP. If you are unable to ping to neighbor IPs, then there is an issue.
sc2-01-nsxt04-r08edge02(tier0_sr[37])> ping 10.184.248.239
PING 10.184.248.239 (10.184.248.239): 56 data bytes
64 bytes from 10.184.248.239: icmp_seq=0 ttl=255 time=1.788 ms
^C
--- 10.184.248.239 ping statistics ---
2 packets transmitted, 1 packets received, 50.0% packet loss
round-trip min/avg/max/stddev = 1.788/1.788/1.788/0.000 ms

sc2-01-nsxt04-r08edge02(tier0_sr[37])> ping 10.184.248.240
PING 10.184.248.240 (10.184.248.240): 56 data bytes
64 bytes from 10.184.248.240: icmp_seq=0 ttl=255 time=1.925 ms
64 bytes from 10.184.248.240: icmp_seq=1 ttl=255 time=1.251 ms
^C
--- 10.184.248.240 ping statistics ---
3 packets transmitted, 2 packets received, 33.3% packet loss
round-trip min/avg/max/stddev = 1.251/1.588/1.925/0.337 ms

  • Get interfaces | more
sc2-01-nsxt04-r08edge02> vrf 37
sc2-01-nsxt04-r08edge02(tier0_sr[37])> get interfaces | more
Fri Aug 19 2022 UTC 11:07:18.042
Logical Router
UUID                                   VRF    LR-ID  Name                              Type
5e7bfe98-0b5e-4620-90b1-204634e99127   37     3      SR-sc2-01-nsxt04-tr               SERVICE_ROUTER_TIER0
Interfaces (IPv6 DAD Status A-DAD_Success, F-DAD_Duplicate, T-DAD_Tentative, U-DAD_Unavailable)
    Interface     : dd83554d-47c0-5a4e-9fbe-3abb1239a071
    Ifuid         : 335
    Mode          : cpu
    Port-type     : cpu
    Enable-mcast  : false

    Interface     : 008b2b15-17d1-4cc8-9d94-d9c4c2d0eb3a
    Ifuid         : 1000
    Name          : tr-interconnect-edge02
    Fwd-mode      : IPV4_AND_IPV6
    Internal name : uplink-1000
    Mode          : lif
    Port-type     : uplink
    IP/Mask       : 10.184.248.2/24
    MAC           : 02:00:70:51:9d:79
    VLAN          : 1611



Verify BGP on Cisco TOR switches

  • SSH to TOR switch.
  • show ip bgp summary
❯ ssh -o PubkeyAuthentication=no netadmin@sc2-01-r08lswa.xxxxxxxx.com
User Access Verification
(netadmin@sc2-01-r08lswa.xxxxxxxx.com) Password:

Cisco Nexus Operating System (NX-OS) Software

sc2-01-r08lswa# show ip bgp summary
BGP summary information for VRF default, address family IPv4 Unicast
BGP router identifier 10.184.17.248, local AS number 65001.65008
BGP table version is 520374, IPv4 Unicast config peers 10, capable peers 8
5150 network entries and 11372 paths using 2003240 bytes of memory
BGP attribute entries [110/18920], BGP AS path entries [69/1430]
BGP community entries [0/0], BGP clusterlist entries [0/0]
11356 received paths for inbound soft reconfiguration
11356 identical, 0 modified, 0 filtered received paths using 0 bytes

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.184.10.14    4 65011.65000
                        47979514 10570342   520374    0    0     5w1d 4541
10.184.10.78    4 65011.65000
                        47814555 10601750   520374    0    0     5w1d 4541
10.184.248.1    4 65001.65535
                          80831   79447   520374    0    0 02:41:51 566
10.184.248.2    4 65001.65535
                        3215614 3269391   520374    0    0     5w1d 566
10.184.248.3    4 65001.65535
                        3215776 3269344   520374    0    0     1w3d 566
10.184.248.4    4 65001.65535
                        3215676 3269383   520374    0    0 13:51:45 566
10.184.248.5    4 65001.65535
                        3200531 3269384   520374    0    0     5w1d 5
10.184.248.6    4 65001.65535
                        3197752 3266700   520374    0    0     5w1d 5


  • show ip arp
sc2-01-r08lswa# show ip arp 10.184.248.2

Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Re-Originated Peer Sync Entry
       D - Static Adjacencies attached to down interface

IP ARP Table
Total number of entries: 1
Address         Age       MAC Address     Interface       Flags
10.184.248.2    00:06:12  0200.7051.9d79  Vlan1611


  • If you compare this IP and MAC, you can see that its the same of your T0 SR uplink of your edge02 node.
IP/Mask       : 10.184.248.2/24
MAC           : 02:00:70:51:9d:79

For further troubleshooting you can do packet capture from the edge nodes and ESXi server and analyze them using Wireshark.

Packet capture from Edge node

  • Capture packets from the T0 SR uplink interface.
sc2-01-nsxt04-r08edge01(tier0_sr[5])> get interfaces | more
Wed Aug 17 2022 UTC 13:52:48.203
Logical Router
UUID                                   VRF    LR-ID  Name                              Type
fb1ad846-8757-4fdf-9cbb-5c22ba772b52   5      2      SR-sc2-01-nsxt04-tr               SERVICE_ROUTER_TIER0
Interfaces (IPv6 DAD Status A-DAD_Success, F-DAD_Duplicate, T-DAD_Tentative, U-DAD_Unavailable)
    Interface     : c8b80ba1-93fc-5c82-a44f-4f4863b6413c
    Ifuid         : 286
    Mode          : cpu
    Port-type     : cpu
    Enable-mcast  : false

    Interface     : 4915d978-9c9a-58bc-84e2-cafe5442cba4
    Ifuid         : 287
    Mode          : blackhole
    Port-type     : blackhole

    Interface     : 899bcf30-83e2-46bb-9be2-8889ec52b354
    Ifuid         : 833
    Name          : tr-interconnect-edge01
    Fwd-mode      : IPV4_AND_IPV6
    Internal name : uplink-833
    Mode          : lif
    Port-type     : uplink
    IP/Mask       : 10.184.248.1/24
    MAC           : 02:00:70:d1:92:b1
    VLAN          : 1611
    Access-VLAN   : untagged
    LS port       : 15b971e9-7caa-43b7-86c1-96ff50453402
    Urpf-mode     : STRICT_MODE
    DAD-mode      : LOOSE
    RA-mode       : SLAAC_DNS_TRHOUGH_RA(M=0, O=0)
    Admin         : up
    Op_state      : up
    Enable-mcast  : False
    MTU           : 9000
    arp_proxy     :


  • Start a continuous ping from the TOR switches to the edge uplink IP (in this case ping 10.184.248.1 from TOR switches) before starting packet capture.
sc2-01-nsxt04-r08edge01> start capture interface 899bcf30-83e2-46bb-9be2-8889ec52b354 file uplink.pcap


Note:
Find the location of uplink.pcap file on TOR switches and SCP it locally to analyze using Wireshark.

 

Packet capture from ESXi

  • In this example, we are capturing packets of sc2-01-nsxt04-r08edge01 VM from the switchports where its interfaces are connected. sc2-01-nsxt04-r08edge01 VM is running on ESXi node sc2-01-r08esx10.
[root@sc2-01-r08esx10:~] esxcli network vm list | grep edge
18790721  sc2-01-nsxt04-r08edge05                                                 3  , ,
18977245  sc2-01-nsxt04-r08edge01                                                 3  , ,

[root@sc2-01-r08esx10:/tmp] esxcli network vm port list -w 18977245
   Port ID: 67109446
   vSwitch: sc2-01-vc16-dvs
   Portgroup:
   DVPort ID: b60a80c0-ecd6-40bd-8d2b-fbd1f06bb172
   MAC Address: 02:00:70:33:a9:67
   IP Address: 0.0.0.0
   Team Uplink: vmnic1
   Uplink Port ID: 2214592517
   Active Filters:

   Port ID: 67109447
   vSwitch: sc2-01-vc16-dvs
   Portgroup:
   DVPort ID: 6e3d8057-fc23-4180-b0ba-bed90381f0bf
   MAC Address: 02:00:70:d1:92:b1
   IP Address: 0.0.0.0
   Team Uplink: vmnic1
   Uplink Port ID: 2214592517
   Active Filters:

   Port ID: 67109448
   vSwitch: sc2-01-vc16-dvs
   Portgroup:
   DVPort ID: c531df19-294d-4079-b39c-89a3b58e30ad
   MAC Address: 02:00:70:30:c7:01
   IP Address: 0.0.0.0
   Team Uplink: vmnic0
   Uplink Port ID: 2214592519
   Active Filters:



  • Start a continuous ping from the TOR switches to the edge uplink IP (in this case ping 10.184.248.1 from TOR switches) before starting packet capture.
[root@sc2-01-r08esx10:/tmp] pktcap-uw --switchport 67109446 --dir 2 -o /tmp/67109446-02:00:70:33:a9:67.pcap --count 1000 & pktcap-uw --switchport 67109447 --dir 2 -o /tmp/67109447-02:00:70:d1:92:b1.pcap --count 1000 & pktcap-uw --switchport 67109448 --dir 2 -o /tmp/67109448-02:00:70:30:c7:01.pcap --count 1000




Note:
SCP the pcap files to laptop and use Wireshark to analyse them.
You can also do packet capture from physical uplinks (vmnic) of the ESXi node if required.

Hope it was useful. Cheers!

Friday, March 19, 2021

vSphere with Tanzu using NSX-T - Part5 - Tier-1 Gateway and Segments

In the previous posts we discussed the following: 

Part1: Prerequisites

Part2: Configure NSX-T

Part3: Edge Cluster

Part4: Tier-0 Gateway and BGP peering


The next step is to create a Tier-1 Gateway and network segments. 

  • Add Tier-1 Gateway.
    • Provide name, select the linked T0 Gateway, and select the route advertisement settings.

  • Add Segment.
    • Provide segment name, connected gateway, transport zone, and subnet.
    • Here we are creating an overlay segment and the subnet CIDR 172.16.10.1/24 will be the gateway IP for this segment.

Now, let's verify whether this segment is being advertised (route advertisement) or not. Following is the screenshot from both edge nodes and you can see that the Tier-0 SR is aware of 172.16.10.0/24 network:


As Tier-0 Gateway is connected to the TOR switches via BGP, we can verify whether the TOR switches are aware about this newly created segment. 


You can see that the TORs are aware of 172.16.10.0/24 network via BGP. Let's connect a VM to this segment, assign an IP address, and test network connectivity. 


You can also view the network topology from NSX-T.


This is the traffic flow: VM - Network segment - Tier-1 Gateway - Tier-0 Gateway - BGP peering - TOR switches - NAT VM - External network.

Hope this was useful. Cheers!

Sunday, February 7, 2021

vSphere with Tanzu using NSX-T - Part4 - Tier-0 Gateway and BGP peering

In the previous posts we discussed the following: 

Part1: Prerequisites
Part2: Configure NSX-T
Part3: Edge Cluster


The next step is to create a Tier-0 Gateway, configure its interfaces, and BGP peer with the L3 TOR switches. Following is a high-level logical representation of this configuration:


Configure Tier-0 Gateway


Before creating the T0-Gateway, we need to create two segments.

  • Add Segments.
    • Create a segment "ls-uplink-v54"
      • VLAN: 54
      • Transport Zone: "edge-vlan-tz"
    • Create a segment "ls-uplink-v55"
      • VLAN: 55
      • Transport Zone: "edge-vlan-tz"

  • Add Tier-0 Gateway.
    • Provide the necessary details as shown below.

    • Add 4 interfaces and configure them as per the logical diagram given above.
      • edge-01-uplink1 - 192.168.54.254/24 - connected via segment ls-uplink-v54
      • edge-01-uplink2 - 192.168.55.254/24 - connected via segment ls-uplink-v55

      • edge-02-uplink1 - 192.168.54.253/24 - connected via segment ls-uplink-v54
      • edge-02-uplink2 - 192.168.55.253/24 - connected via segment ls-uplink-v55
    • Verify the status is showing success for all the 4 interfaces that you added.

  • Routing and multicast settings of T0 are as follows:

    • You can see a static route is configured. The next hop for the default route 0.0.0.0/0 is set to 192.168.54.1. 

    • The next hop configuration is given below.

  • BGP settings of T0 are shown below.

    • BGP Neighbor config:

    • Verify the status is showing success for the two BGP Neighbors that you added.

  • Route re-distribution settings of T0:

    • Add route re-distribution.
    • Set route re-distribution.

Now, the T0 configuration is complete. The next step is to configure BGP on the Dell S4048-ON TOR switches.

Configure TOR Switches


---On TOR A---
conf
router bgp 65500
neighbor 192.168.54.254 remote-as 65400
#peering to T0 edge-01 interface
neighbor 192.168.54.254 no shutdown
neighbor 192.168.54.253 remote-as 65400
#peering to T0 edge-02 interface
neighbor 192.168.54.253 no shutdown
neighbor 192.168.54.3 remote-as 65500
#peering to TOR B in VLAN 54
neighbor 192.168.54.3 no shutdown
maximum-paths ebgp 4
maximum-paths ibgp 4


---On TOR B---
conf
router bgp 65500
neighbor 192.168.55.254 remote-as 65400
#peering to T0 edge-01 interface
neighbor 192.168.55.254 no shutdown
neighbor 192.168.55.253 remote-as 65400
#peering to T0 edge-02 interface
neighbor 192.168.55.253 no shutdown
neighbor 192.168.54.2 remote-as 65500
#peering to TOR A in VLAN 54
neighbor 192.168.54.2 no shutdown
maximum-paths ebgp 4
maximum-paths ibgp 4


---Advertising ESXi mgmt and VM traffic networks in BGP on both TORs---

conf
router bgp 65500
network 192.168.41.0/24
network 192.168.43.0/24


Thanks to my friend and vExpert Harikrishnan @hari5611 for helping me with the T0 configs and BGP peering on TORs. Do check out his blog https://vxplanet.com/


Verify BGP Configurations


The next step is to verify the BGP configs on TORs using the following commands:

show running-config bgp

show ip bgp summary

show ip bgp neighbors


Follow the VMware documentation to verify the BGP connections from a Tier-0 Service Router. In the below screenshot you can see that both Edge nodes have the BGP neighbors 192.168.54.2 and 192.168.55.3 with state Estab.


In the next article, I will talk about adding a T1 Gateway, adding new segments for apps, connecting VMs to the segments, and verify connectivity to different internal and external networks. I hope this was useful. Cheers!

Monday, December 28, 2020

vSphere with Tanzu using NSX-T - Part3 - Edge Cluster

In the previous post, we went through basic NSX-T configurations. The next step is to deploy Edge VMs and create an Edge cluster. Before creating Edge VMs, we need to create two trunk port groups on the VDS using the VCSA web UI. Uplinks of the Edge VMs will be connected to these two trunk port groups.

  • Create 2 trunk port groups on the VDS.
    • "Trunk01-NSXT-Edge"
    • "Trunk02-NSXT-Edge"


  • Add Edge Transport Nodes.
    • Create two Edge VMs "edge-01, edge-02".
    • Click ADD EDGE VM, follow the wizard, and provide all the details.






  • Click Finish. Follow the same process and deploy one more Edge VM.

  • The next step is to create an Edge Cluster "edge-cluster-01and add the two Edge VMs to it.


The NSX-T Edge Cluster is now ready. Next, we have to add a Tier-0 Gateway and configure BGP peering with the router or L3 TOR switches. This will be covered in the next part. Hope it was useful. Cheers!

Related posts


Part1: Prerequisites
Part2: Configure NSX-T

Tuesday, December 8, 2020

vSphere with Tanzu using NSX-T - Part2 - Configure NSX

In this post, we will take a look at the different configuration steps that are required before enabling workload management to establish connectivity to the supervisor cluster, supervisor namespaces, and all objects that run inside the namespaces, such as vSphere pods, and Tanzu Kubernetes Clusters. 

At this point, I assume that the vSAN cluster is up and NSX-T 3.0 is installed. NSX-T appliance is connected to the same management network where the VCSA and ESXi nodes are connected. In my case, it will be through VLAN 41. Note that all the ESXi nodes of the vSAN cluster are connected to one vSphere Distributed Switch and has two uplinks from each node that connects to TOR A and TOR B.


NSX-T configurations

  • Add Compute Managers. I've added the vCenter server here.


  • Add Uplink Profiles.
    • Create a host uplink profile "nsx-uplink-profile-lbsrc" (this is for host TEP using VLAN 52).
    • Create an edge uplink profile "nsx-edge-uplink-profile-lbsrc" (this is for edge TEP using VLAN 53).



  • Add Transport Zones.
    • Create an overlay transport zone "tz-overlay".
    • Create a VLAN transport zone "edge-vlan-tz".




  • Add IP Address Pools.
    • Create an IP address pool for host TEPs "TEP-Pool-01" (this is for host TEP using VLAN 52).
    • Create an IP address pool for edge TEPs "Edge-TEP-Pool-01" (this is for Edge TEP using VLAN 53).



  • Add Transport Node Profiles.


  • Configure Host Transport Nodes. Select the required cluster and click configure NSX to convert all the ESXi nodes as transport nodes.

The next step is to deploy Edge VMs (Edge Transport Nodes) and create a Edge Cluster. We will cover it in the next part. Hope it was useful. Cheers!

Related posts


Part1 - Prerequisites


References