This article provides basic guidance on troubleshooting BGP-related issues.
Sample diagram showing connectivity between Edge Nodes and TOR switches
Verify Tier-0 Gateway status on NSX-T
- Status of T0 should be Success.
- Check the interfaces of T0 to identify which edge nodes are part of it.
- Check the status of Edge Transport Nodes.
- In this example, the T0 interfaces show that Edge01/02/03/04 are part of it, and each of those edge nodes should show the SR_TIER0 component. The next step is to log in to the edge nodes that are part of T0 and verify the BGP summary.
Verify BGP on all Edge nodes that are part of T0 Gateway
- SSH into the edge node as admin user.
- get logical-router
- Look for SERVICE_ROUTER_TIER0.
sc2-01-nsxt04-r08edge02> get logical-router
Logical Router
UUID                                   VRF    LR-ID  Name                              Type                        Ports   Neighbors
736a80e3-23f6-5a2d-81d6-bbefb2786666   0      0                                        TUNNEL                      4       22/5000
e6d02207-c51e-4cf8-81a6-44afec5ad277   2      84653  DR-t1-domain-c1034:1de3adfa-0ee   DISTRIBUTED_ROUTER_TIER1    5       9/50000
a590f1da-2d79-4749-8153-7b174d23b069   32     85271  DR-t1-domain-c1034:1de3adfa-0ee   DISTRIBUTED_ROUTER_TIER1    5       5/50000
758d9736-6781-4b3a-906f-3d1b03f0924d   33     88016  DR-t1-domain-c1034:1de3adfa-0ee   DISTRIBUTED_ROUTER_TIER1    4       1/50000
5e7bfe98-0b5e-4620-90b1-204634e99127   37     3      SR-sc2-01-nsxt04-tr               SERVICE_ROUTER_TIER0        6       5/50000
- vrf <SERVICE_ROUTER_TIER0 VRF>
- get bgp neighbor summary
- Note: If everything is working fine State should show Estab.
sc2-01-nsxt04-r08edge02> vrf 37
sc2-01-nsxt04-r08edge02(tier0_sr[37])> get bgp neighbor summary
BFD States: NC - Not configured, DC - Disconnected
            AD - Admin down, DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv4Unicast
Router ID: 10.184.248.2  Local AS: 4259971071
Neighbor          AS           State  Up/DownTime  BFD  InMsgs    OutMsgs   InPfx  OutPfx
10.184.248.239    4259970544   Estab  05w1d22h     NC   12641393  12610093  2      568
10.184.248.240    4259970544   Estab  05w1d23h     NC   12640337  11580431  2      566
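When several edge nodes need checking, the State column can be scanned with a short script instead of by eye. The following is a minimal Python sketch, assuming the summary text has been saved from the CLI and that the column order matches the sample above. `parse_neighbors` and `not_established` are hypothetical helper names, not part of any NSX tooling.

```python
import re

# Neighbor rows copied from the `get bgp neighbor summary` sample in this article.
SAMPLE = """\
Router ID: 10.184.248.2  Local AS: 4259971071
Neighbor          AS           State  Up/DownTime  BFD  InMsgs    OutMsgs   InPfx  OutPfx
10.184.248.239    4259970544   Estab  05w1d22h     NC   12641393  12610093  2      568
10.184.248.240    4259970544   Estab  05w1d23h     NC   12640337  11580431  2      566
"""

IPV4 = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def parse_neighbors(text):
    """Return one dict per row whose first field looks like an IPv4 address.

    Assumption: the columns are Neighbor, AS, State, ... as in the sample.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and IPV4.fullmatch(fields[0]):
            rows.append({"neighbor": fields[0], "as": int(fields[1]), "state": fields[2]})
    return rows


def not_established(text):
    """Return the neighbor IPs whose State column is anything other than Estab."""
    return [row["neighbor"] for row in parse_neighbors(text) if row["state"] != "Estab"]


print("Neighbors needing attention:", not_established(SAMPLE))
```

An empty list means every neighbor in the pasted output reports Estab. Any IP that is listed is a candidate for the ping and ARP checks described in this article.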
- You should be able to ping the BGP neighbor IPs. If you cannot, there is a connectivity issue between the T0 SR uplink and the TOR switch.
sc2-01-nsxt04-r08edge02(tier0_sr[37])> ping 10.184.248.239
PING 10.184.248.239 (10.184.248.239): 56 data bytes
64 bytes from 10.184.248.239: icmp_seq=0 ttl=255 time=1.788 ms
^C
--- 10.184.248.239 ping statistics ---
2 packets transmitted, 1 packets received, 50.0% packet loss
round-trip min/avg/max/stddev = 1.788/1.788/1.788/0.000 ms
sc2-01-nsxt04-r08edge02(tier0_sr[37])> ping 10.184.248.240
PING 10.184.248.240 (10.184.248.240): 56 data bytes
64 bytes from 10.184.248.240: icmp_seq=0 ttl=255 time=1.925 ms
64 bytes from 10.184.248.240: icmp_seq=1 ttl=255 time=1.251 ms
^C
--- 10.184.248.240 ping statistics ---
3 packets transmitted, 2 packets received, 33.3% packet loss
round-trip min/avg/max/stddev = 1.251/1.588/1.925/0.337 ms
- get interfaces | more
sc2-01-nsxt04-r08edge02> vrf 37
sc2-01-nsxt04-r08edge02(tier0_sr[37])> get interfaces | more
Fri Aug 19 2022 UTC 11:07:18.042
Logical Router
UUID                                   VRF    LR-ID  Name                              Type
5e7bfe98-0b5e-4620-90b1-204634e99127   37     3      SR-sc2-01-nsxt04-tr               SERVICE_ROUTER_TIER0
Interfaces (IPv6 DAD Status A-DAD_Success, F-DAD_Duplicate, T-DAD_Tentative, U-DAD_Unavailable)
    Interface     : dd83554d-47c0-5a4e-9fbe-3abb1239a071
    Ifuid         : 335
    Mode          : cpu
    Port-type     : cpu
    Enable-mcast  : false
    Interface     : 008b2b15-17d1-4cc8-9d94-d9c4c2d0eb3a
    Ifuid         : 1000
    Name          : tr-interconnect-edge02
    Fwd-mode      : IPV4_AND_IPV6
    Internal name : uplink-1000
    Mode          : lif
    Port-type     : uplink
    IP/Mask       : 10.184.248.2/24
    MAC           : 02:00:70:51:9d:79
    VLAN          : 1611
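The VRF lookup from the `get logical-router` step can also be scripted when many edge nodes have to be checked. This is a rough Python sketch, assuming the output has been saved as text and that router names contain no spaces. `tier0_sr_vrfs` is a hypothetical helper name.

```python
# Rows copied from the `get logical-router` sample earlier in this article.
SAMPLE = """\
UUID                                   VRF    LR-ID  Name                              Type                        Ports   Neighbors
736a80e3-23f6-5a2d-81d6-bbefb2786666   0      0                                        TUNNEL                      4       22/5000
e6d02207-c51e-4cf8-81a6-44afec5ad277   2      84653  DR-t1-domain-c1034:1de3adfa-0ee   DISTRIBUTED_ROUTER_TIER1    5       9/50000
5e7bfe98-0b5e-4620-90b1-204634e99127   37     3      SR-sc2-01-nsxt04-tr               SERVICE_ROUTER_TIER0        6       5/50000
"""


def tier0_sr_vrfs(text):
    """Return (vrf, name) for every row whose Type column is SERVICE_ROUTER_TIER0.

    Assumption: fields are UUID, VRF, LR-ID, Name, Type, ... and the name
    has no embedded whitespace, so Type is always the fifth field.
    """
    found = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[4] == "SERVICE_ROUTER_TIER0":
            found.append((int(fields[1]), fields[3]))
    return found


for vrf, name in tier0_sr_vrfs(SAMPLE):
    print(f"run 'vrf {vrf}' to enter {name}")
```

The VRF number differs per edge node (37 on edge02 and 5 on edge01 in this article), so it has to be looked up on each node.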
Verify BGP on Cisco TOR switches
- SSH to TOR switch.
- show ip bgp summary
❯ ssh -o PubkeyAuthentication=no netadmin@sc2-01-r08lswa.xxxxxxxx.com
User Access Verification
(netadmin@sc2-01-r08lswa.xxxxxxxx.com) Password:
Cisco Nexus Operating System (NX-OS) Software
sc2-01-r08lswa# show ip bgp summary
BGP summary information for VRF default, address family IPv4 Unicast
BGP router identifier 10.184.17.248, local AS number 65001.65008
BGP table version is 520374, IPv4 Unicast config peers 10, capable peers 8
5150 network entries and 11372 paths using 2003240 bytes of memory
BGP attribute entries [110/18920], BGP AS path entries [69/1430]
BGP community entries [0/0], BGP clusterlist entries [0/0]
11356 received paths for inbound soft reconfiguration
11356 identical, 0 modified, 0 filtered received paths using 0 bytes
Neighbor        V    AS           MsgRcvd   MsgSent   TblVer  InQ  OutQ  Up/Down   State/PfxRcd
10.184.10.14    4    65011.65000  47979514  10570342  520374  0    0     5w1d      4541
10.184.10.78    4    65011.65000  47814555  10601750  520374  0    0     5w1d      4541
10.184.248.1    4    65001.65535  80831     79447     520374  0    0     02:41:51  566
10.184.248.2    4    65001.65535  3215614   3269391   520374  0    0     5w1d      566
10.184.248.3    4    65001.65535  3215776   3269344   520374  0    0     1w3d      566
10.184.248.4    4    65001.65535  3215676   3269383   520374  0    0     13:51:45  566
10.184.248.5    4    65001.65535  3200531   3269384   520374  0    0     5w1d      5
10.184.248.6    4    65001.65535  3197752   3266700   520374  0    0     5w1d      5
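The TOR switch prints 4-byte AS numbers in dotted (asdot) form, such as 65001.65535, while the edge node prints them as plain integers, such as 4259971071. To confirm that both sides agree on the AS numbers, the dotted form can be converted as high * 65536 + low. The sketch below illustrates that arithmetic; `asdot_to_asplain` is a hypothetical helper name.

```python
def asdot_to_asplain(asn):
    """Convert an asdot AS number ('high.low') to its plain integer form.

    A value without a dot is assumed to be asplain already and is returned as int.
    """
    text = str(asn).strip()
    if "." not in text:
        return int(text)
    high_text, low_text = text.split(".")
    high, low = int(high_text), int(low_text)
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError(f"each half of an asdot AS must fit in 16 bits: {asn!r}")
    return high * 65536 + low


# AS of the edge peers as shown on the TOR, compared with Local AS on the edge node.
print(asdot_to_asplain("65001.65535"))  # 4259971071
# Local AS of the TOR, compared with the neighbor AS seen on the edge node.
print(asdot_to_asplain("65001.65008"))  # 4259970544
```

Both results match the values in the edge node's `get bgp neighbor summary` output, so the AS configuration is consistent on the two sides in this example.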
- show ip arp
sc2-01-r08lswa# show ip arp 10.184.248.2
Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Re-Originated Peer Sync Entry
       D - Static Adjacencies attached to down interface
IP ARP Table
Total number of entries: 1
Address         Age       MAC Address     Interface       Flags
10.184.248.2    00:06:12  0200.7051.9d79  Vlan1611
- If you compare this IP and MAC, you can see that they are the same as those of the T0 SR uplink on the edge02 node.
IP/Mask : 10.184.248.2/24
MAC     : 02:00:70:51:9d:79
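The two devices print the same MAC in different notations: 0200.7051.9d79 on the TOR switch and 02:00:70:51:9d:79 on the edge node. A small sketch of a comparison that ignores the separators follows; `normalize_mac` and `same_mac` are hypothetical helper names.

```python
def normalize_mac(mac):
    """Return a MAC address as lowercase colon-separated pairs.

    Accepts dotted (0200.7051.9d79), colon, or hyphen separated input.
    """
    hex_digits = "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ":".join(hex_digits[i:i + 2] for i in range(0, 12, 2))


def same_mac(left, right):
    """True when both strings describe the same MAC address."""
    return normalize_mac(left) == normalize_mac(right)


# ARP entry on the TOR switch versus the T0 SR uplink MAC on edge02.
print(same_mac("0200.7051.9d79", "02:00:70:51:9d:79"))  # True
```

If the comparison fails, the TOR switch has learned a different device for that IP, which is worth investigating before looking at BGP itself.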
For further troubleshooting, you can capture packets on the edge nodes and the ESXi server and analyze them using Wireshark.
Packet capture from Edge node
- Capture packets from the T0 SR uplink interface.
sc2-01-nsxt04-r08edge01(tier0_sr[5])> get interfaces | more
Wed Aug 17 2022 UTC 13:52:48.203
Logical Router
UUID                                   VRF    LR-ID  Name                              Type
fb1ad846-8757-4fdf-9cbb-5c22ba772b52   5      2      SR-sc2-01-nsxt04-tr               SERVICE_ROUTER_TIER0
Interfaces (IPv6 DAD Status A-DAD_Success, F-DAD_Duplicate, T-DAD_Tentative, U-DAD_Unavailable)
    Interface     : c8b80ba1-93fc-5c82-a44f-4f4863b6413c
    Ifuid         : 286
    Mode          : cpu
    Port-type     : cpu
    Enable-mcast  : false
    Interface     : 4915d978-9c9a-58bc-84e2-cafe5442cba4
    Ifuid         : 287
    Mode          : blackhole
    Port-type     : blackhole
    Interface     : 899bcf30-83e2-46bb-9be2-8889ec52b354
    Ifuid         : 833
    Name          : tr-interconnect-edge01
    Fwd-mode      : IPV4_AND_IPV6
    Internal name : uplink-833
    Mode          : lif
    Port-type     : uplink
    IP/Mask       : 10.184.248.1/24
    MAC           : 02:00:70:d1:92:b1
    VLAN          : 1611
    Access-VLAN   : untagged
    LS port       : 15b971e9-7caa-43b7-86c1-96ff50453402
    Urpf-mode     : STRICT_MODE
    DAD-mode      : LOOSE
    RA-mode       : SLAAC_DNS_TRHOUGH_RA(M=0, O=0)
    Admin         : up
    Op_state      : up
    Enable-mcast  : False
    MTU           : 9000
    arp_proxy     :
- Start a continuous ping from the TOR switches to the edge uplink IP (in this case ping 10.184.248.1 from TOR switches) before starting packet capture.
sc2-01-nsxt04-r08edge01> start capture interface 899bcf30-83e2-46bb-9be2-8889ec52b354 file uplink.pcap
Note:
Find the location of the uplink.pcap file on the edge node and SCP it to your local machine to analyze it using Wireshark.
Packet capture from ESXi
- In this example, we capture packets for the sc2-01-nsxt04-r08edge01 VM at the switchports its interfaces are connected to. The VM is running on ESXi node sc2-01-r08esx10.
[root@sc2-01-r08esx10:~] esxcli network vm list | grep edge
18790721  sc2-01-nsxt04-r08edge05  3  , ,
18977245  sc2-01-nsxt04-r08edge01  3  , ,
[root@sc2-01-r08esx10:/tmp] esxcli network vm port list -w 18977245
Port ID: 67109446
   vSwitch: sc2-01-vc16-dvs
   Portgroup:
   DVPort ID: b60a80c0-ecd6-40bd-8d2b-fbd1f06bb172
   MAC Address: 02:00:70:33:a9:67
   IP Address: 0.0.0.0
   Team Uplink: vmnic1
   Uplink Port ID: 2214592517
   Active Filters:
Port ID: 67109447
   vSwitch: sc2-01-vc16-dvs
   Portgroup:
   DVPort ID: 6e3d8057-fc23-4180-b0ba-bed90381f0bf
   MAC Address: 02:00:70:d1:92:b1
   IP Address: 0.0.0.0
   Team Uplink: vmnic1
   Uplink Port ID: 2214592517
   Active Filters:
Port ID: 67109448
   vSwitch: sc2-01-vc16-dvs
   Portgroup:
   DVPort ID: c531df19-294d-4079-b39c-89a3b58e30ad
   MAC Address: 02:00:70:30:c7:01
   IP Address: 0.0.0.0
   Team Uplink: vmnic0
   Uplink Port ID: 2214592519
   Active Filters:
- Start a continuous ping from the TOR switches to the edge uplink IP (in this case ping 10.184.248.1 from TOR switches) before starting packet capture.
[root@sc2-01-r08esx10:/tmp] pktcap-uw --switchport 67109446 --dir 2 -o /tmp/67109446-02:00:70:33:a9:67.pcap --count 1000 & \
pktcap-uw --switchport 67109447 --dir 2 -o /tmp/67109447-02:00:70:d1:92:b1.pcap --count 1000 & \
pktcap-uw --switchport 67109448 --dir 2 -o /tmp/67109448-02:00:70:30:c7:01.pcap --count 1000
Note:
SCP the pcap files to your laptop and use Wireshark to analyze them.
You can also capture packets on the physical uplinks (vmnic) of the ESXi node if required.
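Typing one `pktcap-uw` command per switchport is error-prone when a VM has several ports. As a rough sketch, the command lines used above can be generated from the saved `esxcli network vm port list` output. The flags are copied from the commands in this article; `parse_ports` and `build_commands` are hypothetical helper names, and the script only prints the commands, it does not run them.

```python
# Trimmed copy of the `esxcli network vm port list -w 18977245` sample above.
SAMPLE = """\
Port ID: 67109446
   MAC Address: 02:00:70:33:a9:67
   Uplink Port ID: 2214592517
Port ID: 67109447
   MAC Address: 02:00:70:d1:92:b1
   Uplink Port ID: 2214592517
Port ID: 67109448
   MAC Address: 02:00:70:30:c7:01
   Uplink Port ID: 2214592519
"""


def parse_ports(text):
    """Return (port_id, mac) pairs, one per 'Port ID' block in the output."""
    ports = []
    port_id = None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Port ID":  # exact match, so 'Uplink Port ID' is skipped
            port_id = value
        elif key == "MAC Address" and port_id is not None:
            ports.append((port_id, value))
            port_id = None
    return ports


def build_commands(ports, count=1000, out_dir="/tmp"):
    """Build one pktcap-uw command line per port, using the article's flags."""
    return [
        f"pktcap-uw --switchport {port_id} --dir 2 "
        f"-o {out_dir}/{port_id}-{mac}.pcap --count {count}"
        for port_id, mac in ports
    ]


# Join with ' & ' so the captures start together, as in the article.
print(" & ".join(build_commands(parse_ports(SAMPLE))))
```

Naming each file after the port ID and MAC, as the article does, makes it easy to tell in Wireshark which capture belongs to the T0 SR uplink (02:00:70:d1:92:b1 in this example).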
Hope it was useful. Cheers!