Showing posts with label BGP. Show all posts
Showing posts with label BGP. Show all posts

Wednesday, June 26, 2024

vSphere with Tanzu using NSX-T - Part32 - Troubleshooting BGP related issues

This article provides basic guidance on troubleshooting BGP related issues.

Sample diagram showing connectivity between Edge Nodes and TOR switches

Verify Tier-0 Gateway status on NSX-T

  • Status of T0 should be Success.


  • Check the interfaces of T0 to identify which all edge nodes are part of it.


  • Check the status of Edge Transport Nodes.


  • As you can see from the T0 interfaces, Edge01/02/03/04 are part of it and in those edge nodes you should be able to see the SR_TIER0 component. Next step is to login to those Edge nodes that are part of T0 and verify BGP summary.

Verify BGP on all Edge nodes that are part of T0 Gateway  

  • SSH into the edge node as admin user.
  • get logical-router
  • Look for SERVICE_ROUTER_TIER0.
sc2-01-nsxt04-r08edge02> get logical-router
Logical Router
UUID                                   VRF    LR-ID  Name                              Type                        Ports   Neighbors
736a80e3-23f6-5a2d-81d6-bbefb2786666   0      0                                        TUNNEL                      4       22/5000
e6d02207-c51e-4cf8-81a6-44afec5ad277   2      84653  DR-t1-domain-c1034:1de3adfa-0ee   DISTRIBUTED_ROUTER_TIER1    5       9/50000
a590f1da-2d79-4749-8153-7b174d23b069   32     85271  DR-t1-domain-c1034:1de3adfa-0ee   DISTRIBUTED_ROUTER_TIER1    5       5/50000
758d9736-6781-4b3a-906f-3d1b03f0924d   33     88016  DR-t1-domain-c1034:1de3adfa-0ee   DISTRIBUTED_ROUTER_TIER1    4       1/50000
5e7bfe98-0b5e-4620-90b1-204634e99127   37     3      SR-sc2-01-nsxt04-tr               SERVICE_ROUTER_TIER0        6       5/50000
  • vrf <SERVICE_ROUTER_TIER0 VRF>
  • get bgp neighbor summary
  • Note: If everything is working fine State should show Estab.
sc2-01-nsxt04-r08edge02> vrf 37
sc2-01-nsxt04-r08edge02(tier0_sr[37])> get bgp neighbor summary
BFD States: NC - Not configured, DC - Disconnected
            AD - Admin down, DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv4Unicast
Router ID: 10.184.248.2  Local AS: 4259971071

Neighbor                            AS          State Up/DownTime  BFD InMsgs  OutMsgs InPfx  OutPfx

10.184.248.239                      4259970544  Estab 05w1d22h     NC  12641393 12610093 2      568
10.184.248.240                      4259970544  Estab 05w1d23h     NC  12640337 11580431 2      566

  • You should be able to ping to the BGP neighbor IP. If you are unable to ping to neighbor IPs, then there is an issue.
sc2-01-nsxt04-r08edge02(tier0_sr[37])> ping 10.184.248.239
PING 10.184.248.239 (10.184.248.239): 56 data bytes
64 bytes from 10.184.248.239: icmp_seq=0 ttl=255 time=1.788 ms
^C
--- 10.184.248.239 ping statistics ---
2 packets transmitted, 1 packets received, 50.0% packet loss
round-trip min/avg/max/stddev = 1.788/1.788/1.788/0.000 ms

sc2-01-nsxt04-r08edge02(tier0_sr[37])> ping 10.184.248.240
PING 10.184.248.240 (10.184.248.240): 56 data bytes
64 bytes from 10.184.248.240: icmp_seq=0 ttl=255 time=1.925 ms
64 bytes from 10.184.248.240: icmp_seq=1 ttl=255 time=1.251 ms
^C
--- 10.184.248.240 ping statistics ---
3 packets transmitted, 2 packets received, 33.3% packet loss
round-trip min/avg/max/stddev = 1.251/1.588/1.925/0.337 ms

  • Get interfaces | more
sc2-01-nsxt04-r08edge02> vrf 37
sc2-01-nsxt04-r08edge02(tier0_sr[37])> get interfaces | more
Fri Aug 19 2022 UTC 11:07:18.042
Logical Router
UUID                                   VRF    LR-ID  Name                              Type
5e7bfe98-0b5e-4620-90b1-204634e99127   37     3      SR-sc2-01-nsxt04-tr               SERVICE_ROUTER_TIER0
Interfaces (IPv6 DAD Status A-DAD_Success, F-DAD_Duplicate, T-DAD_Tentative, U-DAD_Unavailable)
    Interface     : dd83554d-47c0-5a4e-9fbe-3abb1239a071
    Ifuid         : 335
    Mode          : cpu
    Port-type     : cpu
    Enable-mcast  : false

    Interface     : 008b2b15-17d1-4cc8-9d94-d9c4c2d0eb3a
    Ifuid         : 1000
    Name          : tr-interconnect-edge02
    Fwd-mode      : IPV4_AND_IPV6
    Internal name : uplink-1000
    Mode          : lif
    Port-type     : uplink
    IP/Mask       : 10.184.248.2/24
    MAC           : 02:00:70:51:9d:79
    VLAN          : 1611



Verify BGP on Cisco TOR switches

  • SSH to TOR switch.
  • show ip bgp summary
❯ ssh -o PubkeyAuthentication=no netadmin@sc2-01-r08lswa.xxxxxxxx.com
User Access Verification
(netadmin@sc2-01-r08lswa.xxxxxxxx.com) Password:

Cisco Nexus Operating System (NX-OS) Software

sc2-01-r08lswa# show ip bgp summary
BGP summary information for VRF default, address family IPv4 Unicast
BGP router identifier 10.184.17.248, local AS number 65001.65008
BGP table version is 520374, IPv4 Unicast config peers 10, capable peers 8
5150 network entries and 11372 paths using 2003240 bytes of memory
BGP attribute entries [110/18920], BGP AS path entries [69/1430]
BGP community entries [0/0], BGP clusterlist entries [0/0]
11356 received paths for inbound soft reconfiguration
11356 identical, 0 modified, 0 filtered received paths using 0 bytes

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.184.10.14    4 65011.65000
                        47979514 10570342   520374    0    0     5w1d 4541
10.184.10.78    4 65011.65000
                        47814555 10601750   520374    0    0     5w1d 4541
10.184.248.1    4 65001.65535
                          80831   79447   520374    0    0 02:41:51 566
10.184.248.2    4 65001.65535
                        3215614 3269391   520374    0    0     5w1d 566
10.184.248.3    4 65001.65535
                        3215776 3269344   520374    0    0     1w3d 566
10.184.248.4    4 65001.65535
                        3215676 3269383   520374    0    0 13:51:45 566
10.184.248.5    4 65001.65535
                        3200531 3269384   520374    0    0     5w1d 5
10.184.248.6    4 65001.65535
                        3197752 3266700   520374    0    0     5w1d 5


  • show ip arp
sc2-01-r08lswa# show ip arp 10.184.248.2

Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Re-Originated Peer Sync Entry
       D - Static Adjacencies attached to down interface

IP ARP Table
Total number of entries: 1
Address         Age       MAC Address     Interface       Flags
10.184.248.2    00:06:12  0200.7051.9d79  Vlan1611


  • If you compare this IP and MAC, you can see that its the same of your T0 SR uplink of your edge02 node.
IP/Mask       : 10.184.248.2/24
MAC           : 02:00:70:51:9d:79

For further troubleshooting you can do packet capture from the edge nodes and ESXi server and analyze them using Wireshark.

Packet capture from Edge node

  • Capture packets from the T0 SR uplink interface.
sc2-01-nsxt04-r08edge01(tier0_sr[5])> get interfaces | more
Wed Aug 17 2022 UTC 13:52:48.203
Logical Router
UUID                                   VRF    LR-ID  Name                              Type
fb1ad846-8757-4fdf-9cbb-5c22ba772b52   5      2      SR-sc2-01-nsxt04-tr               SERVICE_ROUTER_TIER0
Interfaces (IPv6 DAD Status A-DAD_Success, F-DAD_Duplicate, T-DAD_Tentative, U-DAD_Unavailable)
    Interface     : c8b80ba1-93fc-5c82-a44f-4f4863b6413c
    Ifuid         : 286
    Mode          : cpu
    Port-type     : cpu
    Enable-mcast  : false

    Interface     : 4915d978-9c9a-58bc-84e2-cafe5442cba4
    Ifuid         : 287
    Mode          : blackhole
    Port-type     : blackhole

    Interface     : 899bcf30-83e2-46bb-9be2-8889ec52b354
    Ifuid         : 833
    Name          : tr-interconnect-edge01
    Fwd-mode      : IPV4_AND_IPV6
    Internal name : uplink-833
    Mode          : lif
    Port-type     : uplink
    IP/Mask       : 10.184.248.1/24
    MAC           : 02:00:70:d1:92:b1
    VLAN          : 1611
    Access-VLAN   : untagged
    LS port       : 15b971e9-7caa-43b7-86c1-96ff50453402
    Urpf-mode     : STRICT_MODE
    DAD-mode      : LOOSE
    RA-mode       : SLAAC_DNS_TRHOUGH_RA(M=0, O=0)
    Admin         : up
    Op_state      : up
    Enable-mcast  : False
    MTU           : 9000
    arp_proxy     :


  • Start a continuous ping from the TOR switches to the edge uplink IP (in this case ping 10.184.248.1 from TOR switches) before starting packet capture.
sc2-01-nsxt04-r08edge01> start capture interface 899bcf30-83e2-46bb-9be2-8889ec52b354 file uplink.pcap


Note:
Find the location of uplink.pcap file on TOR switches and SCP it locally to analyze using Wireshark.

 

Packet capture from ESXi

  • In this example, we are capturing packets of sc2-01-nsxt04-r08edge01 VM from the switchports where its interfaces are connected. sc2-01-nsxt04-r08edge01 VM is running on ESXi node sc2-01-r08esx10.
[root@sc2-01-r08esx10:~] esxcli network vm list | grep edge
18790721  sc2-01-nsxt04-r08edge05                                                 3  , ,
18977245  sc2-01-nsxt04-r08edge01                                                 3  , ,

[root@sc2-01-r08esx10:/tmp] esxcli network vm port list -w 18977245
   Port ID: 67109446
   vSwitch: sc2-01-vc16-dvs
   Portgroup:
   DVPort ID: b60a80c0-ecd6-40bd-8d2b-fbd1f06bb172
   MAC Address: 02:00:70:33:a9:67
   IP Address: 0.0.0.0
   Team Uplink: vmnic1
   Uplink Port ID: 2214592517
   Active Filters:

   Port ID: 67109447
   vSwitch: sc2-01-vc16-dvs
   Portgroup:
   DVPort ID: 6e3d8057-fc23-4180-b0ba-bed90381f0bf
   MAC Address: 02:00:70:d1:92:b1
   IP Address: 0.0.0.0
   Team Uplink: vmnic1
   Uplink Port ID: 2214592517
   Active Filters:

   Port ID: 67109448
   vSwitch: sc2-01-vc16-dvs
   Portgroup:
   DVPort ID: c531df19-294d-4079-b39c-89a3b58e30ad
   MAC Address: 02:00:70:30:c7:01
   IP Address: 0.0.0.0
   Team Uplink: vmnic0
   Uplink Port ID: 2214592519
   Active Filters:



  • Start a continuous ping from the TOR switches to the edge uplink IP (in this case ping 10.184.248.1 from TOR switches) before starting packet capture.
[root@sc2-01-r08esx10:/tmp] pktcap-uw --switchport 67109446 --dir 2 -o /tmp/67109446-02:00:70:33:a9:67.pcap --count 1000 & pktcap-uw --switchport 67109447 --dir 2 -o /tmp/67109447-02:00:70:d1:92:b1.pcap --count 1000 & pktcap-uw --switchport 67109448 --dir 2 -o /tmp/67109448-02:00:70:30:c7:01.pcap --count 1000




Note:
SCP the pcap files to laptop and use Wireshark to analyse them.
You can also do packet capture from physical uplinks (vmnic) of the ESXi node if required.

Hope it was useful. Cheers!

Saturday, April 24, 2021

vSphere with Tanzu using NSX-T Blog Series


Part1 - Prerequisites
Part2 - Configure NSX
Part3 - Edge Cluster
Part4 - Tier-0 Gateway and BGP peering
Part5 - Tier-1 Gateway and Segments
Part6 - Create tags, storage policy, and content library
Part7 - Enable workload management
Part8 - Create namespace and deploy Tanzu Kubernetes Cluster
Part9 - Monitoring
Part10 - Upgrade Tanzu Kubernetes Cluster
Part11 - Troubleshooting Tanzu Kubernetes Cluster
Part12 - Deploy application on TKC and access it
Part13 - Export WCP admin kubeconfig
Part14 - Testing TKC storage using kubestr
Part15 - Working with etcd on TKC with one control plane
Part16 - Troubleshooting content library related issues
Part17 - Troubleshooting TKC stuck at updating phase
Part18 - Troubleshooting vSphere pods with ProviderFailed status
Part19 - Troubleshooting TKC stuck at creating phase
Part20 - Safely deleting NotReady nodes from a TKC
Part21 - Pointers while upgrading the stack
Part22 - Working with NGINX Ingress Controller
Part23 - Supervisor cluster certificates expiry
Part24 - Kubernetes component certs in TKC
Part25 - Spherelet
Part26 - Jumpbox kubectl plugin to SSH to TKC node
Part27 - nullfinalizer kubectl plugin
Part28 - Create a custom VM Class
Part29 - Logging using Loki stack
Part30 - Troubleshooting inaccesssible TKC with server pool members missing in the LB VS
Part31 - Troubleshooting inaccessible TKC with expired control plane certs
Part32 - Troubleshooting BGP related issues
Part33 - Troubleshooting intermittent connection timeouts to apiserver and workloads
Part34 - CPU and Memory utilization of a supervisor cluster
Part35 - Monitoring supervisor cluster health with Python and vCenter APIs

Friday, March 19, 2021

vSphere with Tanzu using NSX-T - Part5 - Tier-1 Gateway and Segments

In the previous posts we discussed the following: 

Part1: Prerequisites

Part2: Configure NSX-T

Part3: Edge Cluster

Part4: Tier-0 Gateway and BGP peering


The next step is to create a Tier-1 Gateway and network segments. 

  • Add Tier-1 Gateway.
    • Provide name, select the linked T0 Gateway, and select the route advertisement settings.

  • Add Segment.
    • Provide segment name, connected gateway, transport zone, and subnet.
    • Here we are creating an overlay segment and the subnet CIDR 172.16.10.1/24 will be the gateway IP for this segment.

Now, let's verify whether this segment is being advertised (route advertisement) or not. Following is the screenshot from both edge nodes and you can see that the Tier-0 SR is aware of 172.16.10.0/24 network:


As Tier-0 Gateway is connected to the TOR switches via BGP, we can verify whether the TOR switches are aware about this newly created segment. 


You can see that the TORs are aware of 172.16.10.0/24 network via BGP. Let's connect a VM to this segment, assign an IP address, and test network connectivity. 


You can also view the network topology from NSX-T.


This is the traffic flow: VM - Network segment - Tier-1 Gateway - Tier-0 Gateway - BGP peering - TOR switches - NAT VM - External network.

Hope this was useful. Cheers!

Sunday, February 7, 2021

vSphere with Tanzu using NSX-T - Part4 - Tier-0 Gateway and BGP peering

In the previous posts we discussed the following: 

Part1: Prerequisites
Part2: Configure NSX-T
Part3: Edge Cluster


The next step is to create a Tier-0 Gateway, configure its interfaces, and BGP peer with the L3 TOR switches. Following is a high-level logical representation of this configuration:


Configure Tier-0 Gateway


Before creating the T0-Gateway, we need to create two segments.

  • Add Segments.
    • Create a segment "ls-uplink-v54"
      • VLAN: 54
      • Transport Zone: "edge-vlan-tz"
    • Create a segment "ls-uplink-v55"
      • VLAN: 55
      • Transport Zone: "edge-vlan-tz"

  • Add Tier-0 Gateway.
    • Provide the necessary details as shown below.

    • Add 4 interfaces and configure them as per the logical diagram given above.
      • edge-01-uplink1 - 192.168.54.254/24 - connected via segment ls-uplink-v54
      • edge-01-uplink2 - 192.168.55.254/24 - connected via segment ls-uplink-v55

      • edge-02-uplink1 - 192.168.54.253/24 - connected via segment ls-uplink-v54
      • edge-02-uplink2 - 192.168.55.253/24 - connected via segment ls-uplink-v55
    • Verify the status is showing success for all the 4 interfaces that you added.

  • Routing and multicast settings of T0 are as follows:

    • You can see a static route is configured. The next hop for the default route 0.0.0.0/0 is set to 192.168.54.1. 

    • The next hop configuration is given below.

  • BGP settings of T0 are shown below.

    • BGP Neighbor config:

    • Verify the status is showing success for the two BGP Neighbors that you added.

  • Route re-distribution settings of T0:

    • Add route re-distribution.
    • Set route re-distribution.

Now, the T0 configuration is complete. The next step is to configure BGP on the Dell S4048-ON TOR switches.

Configure TOR Switches


---On TOR A---
conf
router bgp 65500
neighbor 192.168.54.254 remote-as 65400
#peering to T0 edge-01 interface
neighbor 192.168.54.254 no shutdown
neighbor 192.168.54.253 remote-as 65400
#peering to T0 edge-02 interface
neighbor 192.168.54.253 no shutdown
neighbor 192.168.54.3 remote-as 65500
#peering to TOR B in VLAN 54
neighbor 192.168.54.3 no shutdown
maximum-paths ebgp 4
maximum-paths ibgp 4


---On TOR B---
conf
router bgp 65500
neighbor 192.168.55.254 remote-as 65400
#peering to T0 edge-01 interface
neighbor 192.168.55.254 no shutdown
neighbor 192.168.55.253 remote-as 65400
#peering to T0 edge-02 interface
neighbor 192.168.55.253 no shutdown
neighbor 192.168.54.2 remote-as 65500
#peering to TOR A in VLAN 54
neighbor 192.168.54.2 no shutdown
maximum-paths ebgp 4
maximum-paths ibgp 4


---Advertising ESXi mgmt and VM traffic networks in BGP on both TORs---

conf
router bgp 65500
network 192.168.41.0/24
network 192.168.43.0/24


Thanks to my friend and vExpert Harikrishnan @hari5611 for helping me with the T0 configs and BGP peering on TORs. Do check out his blog https://vxplanet.com/


Verify BGP Configurations


The next step is to verify the BGP configs on TORs using the following commands:

show running-config bgp

show ip bgp summary

show ip bgp neighbors


Follow the VMware documentation to verify the BGP connections from a Tier-0 Service Router. In the below screenshot you can see that both Edge nodes have the BGP neighbors 192.168.54.2 and 192.168.55.3 with state Estab.


In the next article, I will talk about adding a T1 Gateway, adding new segments for apps, connecting VMs to the segments, and verify connectivity to different internal and external networks. I hope this was useful. Cheers!