Date created: Monday, December 7, 2015 2:15:58 PM. Last modified: Friday, June 29, 2018 9:17:14 AM
6500/7600 Forwarding Debug
In this example MPLS is being run to the CPE (global routing table address 10.50.4.82) using an eBGP VPNv4 session. Pinging the customer inside one MPLS L3 VPN (a specific VRF, CUST1) fails from every PE router other than the one the customer is connected to. On the PE the customer is connected to everything is fine (labelled traffic is flowing as expected). So this is packet loss in a single VRF/VPN from all PEs except the customer-connected one. Traffic inside all other VRFs/VPNs is fine.
On the customer-connecting PE (abr1) we can see that all the routing information is present and correct: the CPE has advertised the route 10.254.253.70/32 inside the VRF CUST1, a label is advertised with that route, both the route and label are in the FIB and LFIB, and the MLS and CEF adjacencies (which should then be programmed down into hardware by the PFC) are present and valid:
! Ping to customer from the customer facing PE is fine
abr1#ping vrf CUST1 10.254.253.70
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.254.253.70, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms

! Valid route inside VRF, shows label 150 towards customer CPE learnt via eBGP VPNv4 (MPLS Option B)
abr1#show ip route vrf CUST1 10.254.253.70
Routing Table: CUST1
Routing entry for 10.254.253.70/32
  Known via "bgp 55555", distance 20, metric 0
  Tag 64536, type external
  Last update from 10.50.4.82 2d22h ago
  Routing Descriptor Blocks:
  * 10.50.4.82 (default), from 10.50.4.82, 2d22h ago, recursive-via-conn
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 64536
      MPLS label: 150
      MPLS Flags: MPLS Required

! Here is the LFIB entry
abr1#show mpls forwarding-table vrf CUST1 10.254.253.70
Local      Outgoing   Prefix               Bytes Label   Outgoing   Next Hop
Label      Label      or Tunnel Id         Switched      interface
91953      150        10.254.253.70/32[V]  \
                                           43334779      Vl213      10.50.4.82

! Here we can see the VRF is using per-prefix labelling so the other service provider PEs are getting a label (91953 as above) from this PE which will be specific to this route
abr1#show ip vrf detail CUST1
VRF CUST1 (VRF Id = 44); default RD 55555:2047; default VPNID
  Interfaces:
    Gi4/5.2046    Gi4/6.701     Vl2047
    Vl323         Vl312         Vl271
    Vl270         Vl819         Vl817
    Vl810         Vl822         Gi4/21.1223
    Vl1220        Vl1310        Vl1364
    Vl1366        Vl1215        Vl1217
    Vl1368        Vl1370        Vl1372
    Vl802
VRF Table ID = 44
  Export VPN route-target communities
    RT:55555:2047   RT:55555:2500
  Import VPN route-target communities
    RT:55555:2047   RT:64515:2047   RT:65200:2047
    RT:64516:2047   RT:64520:2047   RT:55555:2501
    RT:55555:2500   RT:99999:2047
  No import route-map
  No global export route-map
  Export route-map: plc_SNMP
  VRF label distribution protocol: not configured
  VRF label allocation mode: per-prefix
    vrf-conn-aggr for connected and BGP aggregates (Label 61)
  Prefix protection with additional path enabled

! Cef has created the following adjacency
abr1#show ip cef vrf CUST1 exact-route 10.254.253.1 10.254.253.70
10.254.253.1 -> 10.254.253.70 => label 150 label implicit-null TAG adj out of Vlan213, addr 10.50.4.82

abr1#show ip cef vrf CUST1 10.254.253.70 internal
10.254.253.70/32, epoch 1, flags rib defined all labels, RIB[B], refcount 6, per-destination sharing
  sources: RIB, Adj, LTE
  feature space:
   IPRM: 0x00018000
   Broker: linked, distributed at 3rd priority
   LFD: 10.254.253.70/32 1 local label
   local label info: other/91953
        contains path extension list
        disposition chain 0x3AC7E2B8
        label switch chain 0x3AC7E2B8
  subblocks:
   Adj source: IP adj out of Vlan2047, addr 10.254.253.70 476C46A0
    Dependent covered prefix type adjfib, cover 10.254.253.0/24
  ifnums: (none)
  path 46A88324, path list 4E251080, share 1/1, type recursive, for IPv4, flags must-be-labelled, recursive-via-connected
    MPLS short path extensions: MOI flags = 0x0 label 150
  recursive via 10.50.4.82[IPv4:Default] label 150, fib 3E9C1080, 1 terminal fib, v4:Default:10.50.4.82/32
    path 4D862364, path list 4E253E70, share 1/1, type attached host, for IPv4
      MPLS short path extensions: MOI flags = 0x1 label implicit-null
    attached to Vlan213, adjacency IP adj out of Vlan213, addr 10.50.4.82 3E3E5680
  output chain: label 150 label implicit-null TAG adj out of Vlan213, addr 10.50.4.82 3E3E54E0

abr1#show mls cef vrf CUST1 10.254.253.70 det
Codes: M - mask entry, V - value entry, A - adjacency index, P - priority bit
       D - full don't switch, m - load balancing modnumber, B - BGP Bucket sel
       V0 - Vlan 0,C0 - don't comp bit 0,V1 - Vlan 1,C1 - don't comp bit 1
       RVTEN - RPF Vlan table enable, RVTSEL - RPF Vlan table select
Format: IPV4_DA - (8 | xtag vpn pi cr recirc tos prefix)
Format: IPV4_SA - (9 | xtag vpn pi cr recirc prefix)
M(10490  ): E | 1 FFF 0 0 0 0   255.255.255.255
V(10490  ): 8 | 1 264 0 0 0 0   10.254.253.70      (A:394943 ,P:1,D:0,m:0 ,B:0 )

abr1#show mls cef adjacency entry 394943 det
Index: 394943  smac: e8b7.482a.2c00, dmac: c89c.1dd1.e880
               mtu: 1568, vlan: 213, dindex: 0x0, l3rw_vld: 1
               format: MPLS, flags: 0x1000208518
               label0: 0, exp: 0, ovr: 0
               label1: 0, exp: 0, ovr: 0
               label2: 150, exp: 0, ovr: 0
               op: PUSH_LABEL2
               packets: 0, bytes: 0

abr1#show mls cef adjacency flags 000208518 43 TTL_DEC NO_CACHE TCP_SEQ L3_RW_VALID L3_RW SAME_VLAN IQI IQO MPLS_EOM

! Lets look at the adjacency on the interface facing another PE
abr1#show adjacency te6/5
Protocol Interface                 Address
IP       TenGigabitEthernet6/5     46.x.x.249(148822)
TAG      TenGigabitEthernet6/5     46.x.x.249(20)

abr1#show adjacency te6/5 detail
Protocol Interface                 Address
IP       TenGigabitEthernet6/5     46.x.x.249(148821)
                                   2212727861 packets, 1464480757355 bytes
                                   epoch 0
                                   sourced in sev-epoch 187
                                   Encap length 14
                                   C471FE025E00E8B7482A2C000800
                                   L2 destination address byte offset 0
                                   L2 destination address byte length 6
                                   Link-type after encap: ip
                                   ARP
TAG      TenGigabitEthernet6/5     46.x.x.249(20)
                                   469816658 packets, 232961648002 bytes
                                   epoch 0
                                   sourced in sev-epoch 187
                                   Encap length 14
                                   C471FE025E00E8B7482A2C008847
                                   L2 destination address byte offset 0
                                   L2 destination address byte length 6
                                   Link-type after encap: tagswitch
                                   ARP

! No ACL applied to the inter-PE interface:
abr1#show tcam interface tenGigabitEthernet 6/5 acl in ip detail
* Global Defaults shared
-------------------------------------------------------------------------------------------------------------------
DPort - Destination Port   SPort - Source Port        TCP-F - U -URG   Pro - Protocol
I     - Inverted LOU       TOS   - TOS Value                - A -ACK   rtr - Router
MRFM  - M -MPLS Packet     TN    - T -Tcp Control           - P -PSH   COD - C -Bank Care Flag
      - R -Recirc. Flag          - N -Non-cachable          - R -RST       - I -OrdIndep. Flag
      - F -Fragment Flag   CAP   - Capture Flag             - S -SYN       - D -Dynamic Flag
      - M -More Fragments  F-P   - FlowMask-Prior.          - F -FIN   T   - V(Value)/M(Mask)/R(Result)
X     - XTAG               (*)   - Bank Priority
-------------------------------------------------------------------------------------------------------------------
Interface: 4080   label: 1537   lookup_type: 0
protocol: IP   packet-type: 0
+-+-----+---------------+---------------+---------------+---------------+-------+---+----+-+---+--+---+---+
|T|Index|  Dest Ip Addr | Source Ip Addr|     DPort     |     SPort     | TCP-F |Pro|MRFM|X|TOS|TN|COD|F-P|
+-+-----+---------------+---------------+---------------+---------------+-------+---+----+-+---+--+---+---+
 V 17823         0.0.0.0         0.0.0.0       P=0             P=0       ------   0 ----  0   0 -- C--  0-0
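The two "Encap length 14" strings in the `show adjacency te6/5 detail` output are simply the Ethernet header that will be prepended to each packet. As a minimal sketch (the function names `cisco_mac` and `decode_encap` are made up for this illustration and are not part of any Cisco tooling), the string can be split into destination MAC, source MAC and ethertype to confirm that the TAG adjacency really rewrites with ethertype 0x8847 (MPLS unicast) and the source MAC seen in the MLS adjacency entry:

```python
# Hypothetical helper: split a 14-byte adjacency encap string into its fields.
# Assumes an untagged Ethernet II header (6 bytes dmac, 6 bytes smac, 2 bytes ethertype).

ETHERTYPES = {0x0800: "IPv4", 0x8847: "MPLS unicast"}


def cisco_mac(raw: bytes) -> str:
    """Format 6 raw bytes in Cisco dotted notation, e.g. e8b7.482a.2c00."""
    digits = raw.hex()
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def decode_encap(encap_hex: str) -> dict:
    """Decode the hex encap string printed by 'show adjacency ... detail'."""
    raw = bytes.fromhex(encap_hex)
    if len(raw) != 14:
        raise ValueError("expected a 14-byte untagged Ethernet header")
    ethertype = int.from_bytes(raw[12:14], "big")
    return {
        "dmac": cisco_mac(raw[0:6]),
        "smac": cisco_mac(raw[6:12]),
        "ethertype": f"0x{ethertype:04X}",
        "payload": ETHERTYPES.get(ethertype, "unknown"),
    }


if __name__ == "__main__":
    # The IP and TAG adjacency encap strings from the output above.
    for encap in ("C471FE025E00E8B7482A2C000800", "C471FE025E00E8B7482A2C008847"):
        print(decode_encap(encap))
```

Both adjacencies share the same MAC pair and differ only in the ethertype, which is what one would expect for an IP and a label-switched adjacency towards the same neighbour.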
Some commands can be run to check the overall operating health of the device:
show mls cef exception status
show mls cef maximum-routes
show platform hardware capacity forwarding
show platform hardware pfc mode
show fabric utilization all
show mls cef hardware
show cef state
show cef error
show cef background detail
show platform eobc all
remote command switch show mls rate-limit hw-det
remote command module 5 show platform hardware earl status | beg Forwarding statistics for kuma
remote command switch show platform hardware earl status
remote command switch show platform hardware earl statistics
Everything looks fine from this PE (abr1), as we can even ping the CPE inside the CUST1 VRF. But the 10.254.253.70 IP inside the CUST1 VRF cannot be pinged from any other PE, even though all other PEs have the same information: a valid route (via the loopback0 of this PE, abr1, due to next-hop-self on the iBGP VPNv4 peerings) and a valid label. Traffic inside all other VRFs towards this CPE is fine from all PEs.
At this point one needs to investigate a hardware programming issue. Traffic will be coming in label switched from other PEs. abr1 is originating label 91953 to other PEs, so traffic originated from other PEs must arrive at abr1 with label 91953, which is then swapped for 150 (as above) and forwarded on to the MPLS-enabled CPE. For traffic originating from abr1, label 150 is pushed on; there is no pop-and-push (swap), so perhaps something about incoming traffic labelled with 91953 is wrong:
! Some basic hardware info
show diagnostic events
show diagnostic result module 6 !(SUP/RSP module in this example)

# sudo tcpdump -nlASX -s 0 -vvv -i eth3 'mpls and host 10.254.253.70 and icmp and not host 192.168.30.204'

# CAPTURE ON 1st 7600 PE connected to CPE, ping from 2nd 7600 PE is failing
# RUN AN ELAM CAPTURE WITH MPLS TAG 91953, SOURCE IP 10.254.254.1, DEST IP 10.254.253.70, IP PROTO = 1 (ICMP)
#
# 0 = DestMAC
# 0 = SrcMAC
# 0 = Dot1Q tag
# 0x8847 = Ethertype 0x8847 = MPLS Unicast
# 1673 0x1 = First MPLS label (label 91953)
# 1 = MPLS BoS bit
# 00 = MPLS TTL
# 4500 0x00000000 0x00000001 0x00000AFE 0xFE010AFE 0xFD460000 = IPv4 header
show platform capture elam trigger dbus others if data = 0 0 0 0x88471673 0x11004500 0x00000000 0x00000001 0x00000AFE 0xFE010AFE 0xFD460000 [ 0 0 0 0xffffffff 0xff00ff00 0 0x0000000f 0x0000ffff 0xffffffff 0xffff0000 ]

abr1#show platform capture elam start

abr1#show platform capture elam status
active ELAM info:
Slot Cpu   Asic   Inst Ver PB Elam
---- --- -------- ---- --- -- ----
6    0   ST_SMAN  0    3.2    Y
DBUS trigger: FORMAT=OTHERS DATA = 0 0 0 0X88471673 0X11004500 0X00000000 0X00000001 0X00000AFE 0XFE010AFE 0XFD460000 [ 0 0 0 0XFFFFFFFF 0XFF00FF00 0 0X0000000F 0X0000FFFF 0XFFFFFFFF 0XFFFF0000 ]
elam capture in progress

abr1#show platform capture elam status
active ELAM info:
Slot Cpu   Asic   Inst Ver PB Elam
---- --- -------- ---- --- -- ----
6    0   ST_SMAN  0    3.2    Y
DBUS trigger: FORMAT=OTHERS DATA = 0 0 0 0X88471673 0X11004500 0X00000000 0X00000001 0X00000AFE 0XFE010AFE 0XFD460000 [ 0 0 0 0XFFFFFFFF 0XFF00FF00 0 0X0000000F 0X0000FFFF 0XFFFFFFFF 0XFFFF0000 ]
elam capture completed

abr1#show platform capture elam data
DBUS data:
SEQ_NUM          [5] = 0xB
QOS              [3] = 0
QOS_TYPE         [1] = 0
TYPE             [4] = 0 [ETHERNET]
STATUS_BPDU      [1] = 0
IPO              [1] = 1
NO_ESTBLS        [1] = 0
RBH              [3] = b000
CR               [1] = 0
TRUSTED          [1] = 1
NOTIFY_IL        [1] = 0
NOTIFY_NL        [1] = 0
DISABLE_NL       [1] = 0
DISABLE_IL       [1] = 0
DONT_FWD         [1] = 0
INDEX_DIRECT     [1] = 0
DONT_LEARN       [1] = 0
COND_LEARN       [1] = 0
BUNDLE_BYPASS    [1] = 0
QOS_TIC          [1] = 0
INBAND           [1] = 0
IGNORE_QOSO      [1] = 0
IGNORE_QOSI      [1] = 0
IGNORE_ACLO      [1] = 0
IGNORE_ACLI      [1] = 0
PORT_QOS         [1] = 1
CACHE_CNTRL      [2] = 0 [NORMAL]
VLAN             [12] = 4080
SRC_FLOOD        [1] = 0
SRC_INDEX        [19] = 0x144
LEN              [16] = 122
FORMAT           [2] = 3 [OTHERS]
PACKET_TYPE      [3] = 0 [ETHERNET]
L3_PROTOCOL      [4] = 15 [INVALID]
L3_PT            [8] = 0
FF               [1] = 0
MN               [1] = 0
RF               [1] = 0
SC               [1] = 0
CARD_TYPE        [4] = 0x0
ISL              [16] = 0x0
DATA             [592]
0000: E8 B7 48 2A 2C 00 C4 71 FE 02 5E 00 88 47 16 73 "..H*,..q..^..G.s"
0010: 11 FF 45 00 00 64 10 16 00 00 FF 01 9A 3E 0A FE "..E..d.......>.."
0020: FE 01 0A FE FD 46 08 00 C9 A6 01 71 00 00 00 00 ".....F.....q...."
0030: 00 01 02 31 B1 00 AB CD AB CD AB CD AB CD AB CD "...1............"
0040: AB CD AB CD AB CD AB CD AB CD ".........."
CRC              [16] = 0x77E8 ".........."

RBUS data:
SEQ_NUM          [5] = 0xB
CCC              [3] = b101 [L2_POLICE]
CAP1             [1] = 0
CAP2             [1] = 0
QOS              [3] = 0
EGRESS           [1] = 0
DT               [1] = 1 [GENERIC]
TL               [1] = 0 [B32]
FLOOD            [1] = 0
DEST_INDEX       [19] = 0x7FFF
VLAN             [12] = 4080
RBH              [3] = b001
RDT              [1] = 1
GENERIC          [1] = 0
EXTRA_CICLE      [1] = 0
FABRIC_PRIO      [1] = 0
L2               [1] = 0
FCS1             [8] = 0x1
DELTA_LEN        [8] = 0
REWRITE_INFO
 i0 - no rewrite.
FCS2             [8] = 0x0
Control signals:
rb_stat          [3] = 0x7

! The DBUS data in the ELAM capture shows the source interface is 0x144 and source VLAN 4080
! (this will be an internal VLAN since the PE links are point-to-point L3 links)
abr1#remote command switch test mcast ltl-info index 144
index 0x144 contain ports 6/5

! Te6/5 is the link facing the other PE so the packet has come in the expected interface
abr1#show vlan internal usage | i 4080
4080 TenGigabitEthernet6/5

! However the RBUS data shows [L2_POLICE] which indicates the packet has been forwarded to a hardware rate limiter
! or other policing/dropping mechanism, if this packet was destined to the RSP this might be a sign CoPP is dropping the packet.
! The destination interface index 0x7FFF is a drop index though. Under normal operations one would expect CCC = [L3_RW]
abr1#remote command switch test mcast ltl-info index 7FFF
index 0x7FFF contain ports * empty *

! We can see that MLS hardware rate limiters are configured
abr1#show run | in mls rate
mls rate-limit multicast ipv4 fib-miss 2000 10
mls rate-limit multicast ipv4 non-rpf 10 10
mls rate-limit multicast ipv4 igmp 2000 10
mls rate-limit multicast ipv4 partial 2000 10
mls rate-limit unicast cef glean 200 50
mls rate-limit unicast ip rpf-failure 10 10
mls rate-limit unicast ip icmp redirect 0
mls rate-limit unicast ip icmp unreachable no-route 10 10
mls rate-limit unicast ip icmp unreachable acl-drop 10 10
mls rate-limit unicast ip errors 10 10
mls rate-limit all ttl-failure 200 50
mls rate-limit all mtu-failure 10 10
mls rate-limit layer2 pdu 20 20

! However this destination index 0x7FFF isn't related to them (it's not in the list below), it's a reserved drop index
abr1#remote command switch show mls rate-limit hw-det
Hw ID   Status     Packets/s   Burst   Index    Type
-----   --------   ---------   -----   ------   ----
0       Disabled   0           0       0x0      -
1       Enabled    200         50      0x7F05   LTL
2       Enabled    10          10      0x3      ADJ
3       Enabled    10          10      0x1A     ADJ
4       Enabled    200         50      0x1B     ADJ
5       Enabled    10          10      0x7F0A   LTL
6       Enabled    2000        1       0x7E05   LTL
7       Enabled    2000        10      0x0      ADJ
8       Disabled   0           0       0x0      -
9       Enabled    10          1       0x7F08   LTL
10      Enabled    2000        10      0x7F09   LTL
11      Enabled    20          20      0x7F0C   LTL
12      Enabled    2000        10      0x2000   LTL
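The ten 32-bit DATA words of the ELAM trigger are tedious to build by hand. The following is a minimal sketch of how they could be derived, assuming (as the DATA hex dump in the capture suggests) that the words map directly onto the first 40 bytes of an untagged Ethernet frame carrying one MPLS label and an IPv4 header without options. The function name `elam_trigger_words` is hypothetical, and the mask words in square brackets are not generated here.

```python
import ipaddress
import struct


def elam_trigger_words(label: int, src_ip: str, dst_ip: str, proto: int = 1) -> list:
    """Return ten 32-bit words covering Ethernet + one MPLS label + IPv4 header.

    Fields that the trigger does not care about (MACs, TTLs, lengths,
    checksum) are left as zero. Assumes no 802.1Q tag and a single label
    with the bottom-of-stack bit set.
    """
    frame = bytearray(40)                                # 10 words, zero filled
    frame[12:14] = struct.pack("!H", 0x8847)             # ethertype: MPLS unicast
    # MPLS shim: label (20 bits) | EXP (3 bits) | S (1 bit) | TTL (8 bits)
    frame[14:18] = struct.pack("!I", (label << 12) | (1 << 8))
    frame[18] = 0x45                                     # IPv4, header length 5 words
    frame[27] = proto                                    # IP protocol (1 = ICMP)
    frame[30:34] = ipaddress.IPv4Address(src_ip).packed  # source address
    frame[34:38] = ipaddress.IPv4Address(dst_ip).packed  # destination address
    return [f"0x{word:08X}" for word in struct.unpack("!10I", frame)]


if __name__ == "__main__":
    words = elam_trigger_words(91953, "10.254.254.1", "10.254.253.70")
    print(" ".join(words))
```

For label 91953 (0x16731) and the addresses used in this example, the last seven words come out as the 0x88471673 ... 0xFD460000 sequence used in the trigger above; the three leading words are zero because the MAC addresses are not matched.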
Everything seems fine in software, but in hardware packets from other PEs are being sent to a bit-bucket destination interface index (0x7FFF). In the end, hard clearing the BGP session to the CPE fixed the issue. This caused the route to be dropped and re-learnt and a new label to be associated, which triggered a hardware update that pushed the new forwarding and adjacency information into the hardware. A new ping from another PE and a new ELAM capture showed that everything was working as expected again.
! Just for educational purposes we can see which destination indexes have the value 7FFF and would drop packets
abr1#remote command switch show platform hardware tycho register 0 1794 | i 7FFF
0x005A: NF_HW_ACC_DATA_8   = 0xFF7FFFEF [4286578671]
0x0060: NF_HW_ACC_DATA_14  = 0xFF7FFFFF [4286578687]
0x008F: SE_MASK_NT_0_ALL_1 = 0x007FFFFF [8388607   ]
0x0181: PP_RF_SRC_IDX2     = 0x00007FFF [32767     ]
0x044F: RED_PLC            = 0x00007FFF [32767     ]
0x0455: RED_FD_IDX         = 0x00007FFF [32767     ]
0x0457: RED_BBKT_IDX       = 0x00007FFF [32767     ]
0x045C: RED_MPLS_ERR_IDX   = 0x00007FFF [32767     ]
0x049C: TY_DE_AJ_RMT6      = 0x0007FFFF [524287    ]
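The `| i 7FFF` filter is a plain substring match, so it also returns registers such as 0xFF7FFFEF that merely contain 7FFF. A small sketch of filtering the saved output for registers whose value is exactly 0x7FFF is shown below; the regular expression assumes the "offset: NAME = 0xVALUE [decimal]" layout seen in the output above, and `registers_equal_to` is a name invented for this example.

```python
import re

# Register dump lines copied from the command output above.
DUMP = """
0x005A: NF_HW_ACC_DATA_8 = 0xFF7FFFEF [4286578671]
0x0060: NF_HW_ACC_DATA_14 = 0xFF7FFFFF [4286578687]
0x008F: SE_MASK_NT_0_ALL_1 = 0x007FFFFF [8388607 ]
0x0181: PP_RF_SRC_IDX2 = 0x00007FFF [32767 ]
0x044F: RED_PLC = 0x00007FFF [32767 ]
0x0455: RED_FD_IDX = 0x00007FFF [32767 ]
0x0457: RED_BBKT_IDX = 0x00007FFF [32767 ]
0x045C: RED_MPLS_ERR_IDX = 0x00007FFF [32767 ]
0x049C: TY_DE_AJ_RMT6 = 0x0007FFFF [524287 ]
"""

# offset, register name, hex value; the decimal value in brackets is ignored.
REGISTER_LINE = re.compile(r"^(0x[0-9A-Fa-f]+):\s+(\S+)\s+=\s+(0x[0-9A-Fa-f]+)")


def registers_equal_to(dump: str, value: int) -> list:
    """Return (offset, name) pairs for registers whose value equals `value`."""
    hits = []
    for line in dump.splitlines():
        match = REGISTER_LINE.match(line.strip())
        if match and int(match.group(3), 16) == value:
            hits.append((match.group(1), match.group(2)))
    return hits


if __name__ == "__main__":
    for offset, name in registers_equal_to(DUMP, 0x7FFF):
        print(offset, name)
```

With the output above this leaves five registers (PP_RF_SRC_IDX2 and the four RED_* registers), which narrows down the candidates that hold the drop index.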
For packets being dropped we can actually update those values to point to the interface index of another interface and run a packet capture on that interface:
! If one wanted to redirect packets out of interface Te3/2, we must find its index,
! one can do this by looking at the index of the internal VLAN for that interface
abr1#show vlan internal usage | i 3/2
4027 TenGigabitEthernet3/2

abr1#remote command switch test mcast ltl-info vlan 4027
routed interface src index 0x81 contain ports 3/2
multicast flood index 0xCFBB for vlan 4027 contain ports 3/2, 6/R

! 0x81 is the Te3/2 index. If one knows the interface index already we can double check with the following
abr1-thnlon.core#remote command switch test mcast ltl-info index 81
index 0x81 contain ports 3/2

! Now the register pointing to a drop (0x7FFF) or HWRL index can be updated to point to the real interface (RED_MPLS_ERR_IDX from above)
remote command switch show platform hardware tycho poke 45c 81