Date created: 12/07/15 14:15:58. Last modified: 06/29/18 09:17:14

6500/7600 Forwarding Debug

In this example MPLS is being run to the CPE (global routing table address 10.50.4.82) over an eBGP VPNv4 session. Pinging the customer inside one MPLS L3 VPN (a specific VRF, CUST1) fails from every PE router other than the one the customer is connected to. On the PE the customer is connected to everything is fine (labelled traffic is flowing as expected). So this is packet loss in one single VRF/VPN from all PEs except the customer-connected one. Traffic inside all other VRFs/VPNs is fine.

On the customer-connecting PE (abr1) we can see that all the routing information is present and correct: the CPE has advertised the route 10.254.253.70/32 inside the VRF CUST1, a label is advertised with that route, both the route and the label are in the FIB and LFIB, and the MLS and CEF adjacencies (which should then be programmed down into hardware by the PFC) are present and valid:

! Ping to customer from the customer facing PE is fine
abr1#ping vrf CUST1 10.254.253.70
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.254.253.70, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms


! Valid route inside VRF, shows label 150 towards customer CPE learnt via eBGP VPNv4 (MPLS Option B)
abr1#show ip route vrf CUST1 10.254.253.70

Routing Table: CUST1
Routing entry for 10.254.253.70/32
  Known via "bgp 55555", distance 20, metric 0
  Tag 64536, type external
  Last update from 10.50.4.82 2d22h ago
  Routing Descriptor Blocks:
  * 10.50.4.82 (default), from 10.50.4.82, 2d22h ago, recursive-via-conn
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 64536
      MPLS label: 150
      MPLS Flags: MPLS Required



! Here is the LFIB entry
abr1#show mpls forwarding-table vrf CUST1 10.254.253.70
Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop    
Label      Label      or Tunnel Id     Switched      interface              
91953      150        10.254.253.70/32[V]   \
                                       43334779      Vl213      10.50.4.82  


! Here we can see the VRF is using per-prefix labelling so the other service provider PEs are getting a label (91953 as above) from this PE which will be specific to this route
abr1#show ip vrf detail CUST1
VRF CUST1 (VRF Id = 44); default RD 55555:2047; default VPNID 
  Interfaces:
    Gi4/5.2046               Gi4/6.701                Vl2047                  
    Vl323                    Vl312                    Vl271                   
    Vl270                    Vl819                    Vl817                   
    Vl810                    Vl822                    Gi4/21.1223             
    Vl1220                   Vl1310                   Vl1364                  
    Vl1366                   Vl1215                   Vl1217                  
    Vl1368                   Vl1370                   Vl1372                  
    Vl802                   
VRF Table ID = 44
  Export VPN route-target communities
    RT:55555:2047            RT:55555:2500           
  Import VPN route-target communities
    RT:55555:2047            RT:64515:2047            RT:65200:2047
    RT:64516:2047            RT:64520:2047            RT:55555:2501
    RT:55555:2500            RT:99999:2047          
  No import route-map
  No global export route-map
  Export route-map: plc_SNMP
  VRF label distribution protocol: not configured
  VRF label allocation mode: per-prefix
    vrf-conn-aggr for connected and BGP aggregates (Label 61)
  Prefix protection with additional path enabled


! CEF has created the following adjacency
abr1#show ip cef vrf CUST1 exact-route 10.254.253.1 10.254.253.70
10.254.253.1 -> 10.254.253.70 => label 150 label implicit-null TAG adj out of Vlan213, addr 10.50.4.82


abr1#show ip cef vrf CUST1 10.254.253.70 internal
10.254.253.70/32, epoch 1, flags rib defined all labels, RIB[B], refcount 6, per-destination sharing
  sources: RIB, Adj, LTE 
  feature space:
   IPRM: 0x00018000
   Broker: linked, distributed at 3rd priority
   LFD: 10.254.253.70/32 1 local label
   local label info: other/91953
        contains path extension list
        disposition chain 0x3AC7E2B8
        label switch chain 0x3AC7E2B8
  subblocks:
   Adj source: IP adj out of Vlan2047, addr 10.254.253.70 476C46A0
    Dependent covered prefix type adjfib, cover 10.254.253.0/24
  ifnums: (none)
  path 46A88324, path list 4E251080, share 1/1, type recursive, for IPv4, flags must-be-labelled, recursive-via-connected
    MPLS short path extensions: MOI flags = 0x0 label 150
  recursive via 10.50.4.82[IPv4:Default] label 150, fib 3E9C1080, 1 terminal fib, v4:Default:10.50.4.82/32
    path 4D862364, path list 4E253E70, share 1/1, type attached host, for IPv4
      MPLS short path extensions: MOI flags = 0x1 label implicit-null
    attached to Vlan213, adjacency IP adj out of Vlan213, addr 10.50.4.82 3E3E5680
  output chain: label 150 label implicit-null TAG adj out of Vlan213, addr 10.50.4.82 3E3E54E0


abr1#show mls cef vrf CUST1 10.254.253.70 det

Codes: M - mask entry, V - value entry, A - adjacency index, P - priority bit
       D - full don't switch, m - load balancing modnumber, B - BGP Bucket sel
       V0 - Vlan 0,C0 - don't comp bit 0,V1 - Vlan 1,C1 - don't comp bit 1
       RVTEN - RPF Vlan table enable, RVTSEL - RPF Vlan table select
Format: IPV4_DA - (8 | xtag vpn pi cr recirc tos prefix)
Format: IPV4_SA - (9 | xtag vpn pi cr recirc prefix)
M(10490  ): E | 1 FFF  0 0 0 0   255.255.255.255
V(10490  ): 8 | 1 264  0 0 0 0   10.254.253.70      (A:394943 ,P:1,D:0,m:0 ,B:0 )


abr1#show mls cef adjacency entry 394943 det

Index: 394943  smac: e8b7.482a.2c00, dmac: c89c.1dd1.e880
               mtu: 1568, vlan: 213, dindex: 0x0, l3rw_vld: 1
               format: MPLS, flags: 0x1000208518 
               label0: 0, exp: 0, ovr: 0
               label1: 0, exp: 0, ovr: 0
               label2: 150, exp: 0, ovr: 0
               op: PUSH_LABEL2
               packets: 0, bytes: 0


abr1#show mls cef adjacency flags 000208518 43

TTL_DEC NO_CACHE TCP_SEQ L3_RW_VALID L3_RW SAME_VLAN 
IQI IQO MPLS_EOM
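
The manual comparison above (the same outgoing label 150 in the RIB, in CEF and in the MLS hardware adjacency) can be scripted once the command outputs are saved as text. The following is only a minimal Python sketch: the function names are made up for this note and the regular expressions were written against the sample output shown above, so they are assumptions rather than a tested parser for every IOS release. The LFIB output is left out because of its wrapped-line layout.

```python
import re

# Hypothetical helpers: each one pulls the outgoing label out of the saved
# text of one show command. Patterns are assumptions based on this note.

def label_from_rib(text):
    """'show ip route vrf X <prefix>' -> label from the 'MPLS label:' line."""
    m = re.search(r"MPLS label:\s*(\d+)", text)
    return int(m.group(1)) if m else None

def label_from_cef(text):
    """'show ip cef vrf X exact-route ...' -> first label after '=>'."""
    m = re.search(r"=>\s*label\s+(\d+)", text)
    return int(m.group(1)) if m else None

def label_from_mls_adj(text):
    """'show mls cef adjacency entry N det' -> the label slot named by 'op:'."""
    op = re.search(r"op:\s*PUSH_LABEL(\d)", text)
    if not op:
        return None
    m = re.search(r"label%s:\s*(\d+)" % op.group(1), text)
    return int(m.group(1)) if m else None

def labels_agree(rib, cef, mls):
    """True only when all three sources report the same, non-missing label."""
    found = {label_from_rib(rib), label_from_cef(cef), label_from_mls_adj(mls)}
    return len(found) == 1 and None not in found
```

Note that agreement here only covers the label pushed for locally originated traffic. It says nothing about how the hardware handles packets arriving with the local label 91953, which is the path that turns out to be broken in this example.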



! Let's look at the adjacency on the interface facing another PE
abr1#show adjacency te6/5
Protocol Interface                 Address
IP       TenGigabitEthernet6/5     46.x.x.249(148822)
TAG      TenGigabitEthernet6/5     46.x.x.249(20)


abr1#show adjacency te6/5 detail
Protocol Interface                 Address
IP       TenGigabitEthernet6/5     46.x.x.249(148821)
                                   2212727861 packets, 1464480757355 bytes
                                   epoch 0
                                   sourced in sev-epoch 187
                                   Encap length 14
                                   C471FE025E00E8B7482A2C000800
                                   L2 destination address byte offset 0
                                   L2 destination address byte length 6
                                   Link-type after encap: ip
                                   ARP
TAG      TenGigabitEthernet6/5     46.x.x.249(20)
                                   469816658 packets, 232961648002 bytes
                                   epoch 0
                                   sourced in sev-epoch 187
                                   Encap length 14
                                   C471FE025E00E8B7482A2C008847
                                   L2 destination address byte offset 0
                                   L2 destination address byte length 6
                                   Link-type after encap: tagswitch
                                   ARP
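
The 14-byte encap strings in the adjacency detail above are plain Ethernet headers: destination MAC, source MAC, then the ethertype. As a small illustration, the Python sketch below splits such a string into its parts. The helper name `decode_encap` is invented for this note and it assumes an untagged 14-byte rewrite as shown above.

```python
def decode_encap(hex_string):
    """Split a 14-byte Ethernet rewrite string into (dmac, smac, ethertype)."""
    raw = bytes.fromhex(hex_string)
    if len(raw) != 14:
        raise ValueError("expected 14 bytes (untagged Ethernet header)")

    def cisco_mac(b):
        # Format six bytes in Cisco dotted notation, e.g. e8b7.482a.2c00
        h = b.hex()
        return ".".join(h[i:i + 4] for i in range(0, 12, 4))

    return cisco_mac(raw[0:6]), cisco_mac(raw[6:12]), int.from_bytes(raw[12:14], "big")


# TAG adjacency from the output above: ethertype 0x8847 (MPLS unicast)
dmac, smac, ethertype = decode_encap("C471FE025E00E8B7482A2C008847")
print(dmac, smac, hex(ethertype))
```

For the TAG adjacency this gives a source MAC of e8b7.482a.2c00, the same smac seen in the MLS CEF adjacency entry earlier, and ethertype 0x8847. The IP adjacency differs only in its ethertype, 0x0800.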



! No ACL applied to the inter-PE interface:
abr1#show tcam interface tenGigabitEthernet 6/5 acl in ip detail

* Global Defaults shared

-------------------------------------------------------------------------------------------------------------------
DPort - Destination Port   SPort - Source Port        TCP-F - U -URG             Pro   - Protocol
I     - Inverted LOU       TOS   - TOS Value                - A -ACK             rtr   - Router
MRFM  - M -MPLS Packet     TN    - T -Tcp Control           - P -PSH             COD   - C -Bank Care Flag
      - R -Recirc. Flag          - N -Non-cachable          - R -RST                   - I -OrdIndep. Flag
      - F -Fragment Flag   CAP   - Capture Flag             - S -SYN                   - D -Dynamic Flag
      - M -More Fragments  F-P   - FlowMask-Prior.          - F -FIN             T     - V(Value)/M(Mask)/R(Result)
X     - XTAG               (*)   - Bank Priority
-------------------------------------------------------------------------------------------------------------------




Interface: 4080   label: 1537   lookup_type: 0
protocol: IP   packet-type: 0

+-+-----+---------------+---------------+---------------+---------------+-------+---+----+-+---+--+---+---+
|T|Index|  Dest Ip Addr | Source Ip Addr|     DPort     |     SPort     | TCP-F |Pro|MRFM|X|TOS|TN|COD|F-P|
+-+-----+---------------+---------------+---------------+---------------+-------+---+----+-+---+--+---+---+
 V 17823         0.0.0.0         0.0.0.0       P=0             P=0        ------   0 ---- 0   0 -- C-- 0-0

Some commands that can be run to check the overall operating health of the device:

show mls cef exception status
show mls cef maximum-routes
show platform hardware capacity forwarding
show platform hardware pfc mode
show fabric utilization all
show mls cef hardware
show cef state
show cef error
show cef background detail
show platform eobc all remote command switch show mls rate-limit hw-det
remote command module 5 show platform hardware earl status | beg Forwarding statistics for kuma
remote command switch show platform hardware earl status
remote command switch show platform hardware earl statistics

Everything looks fine from this PE (abr1), as we can even ping the CPE inside the CUST1 VRF. But the 10.254.253.70 IP inside the CUST1 VRF cannot be pinged from any other PE, even though all other PEs have the same information: a valid route (via loopback0 of this PE, abr1, due to next-hop-self on the iBGP VPNv4 peerings) and a valid label. Traffic inside all other VRFs towards this CPE is fine from all PEs.

At this point one needs to investigate a hardware programming issue. Traffic will be coming in label switched from other PEs. abr1 is advertising label 91953 to the other PEs, so traffic originated from other PEs must arrive at abr1 with label 91953, which is then swapped for 150 (as above) and forwarded on to the MPLS-enabled CPE. For traffic originating from abr1, label 150 is pushed on; there is no pop-and-push (swap), so perhaps something about incoming traffic labelled with 91953 is wrong:

! Some basic hardware info
show diagnostic events
show diagnostic result module 6 !(SUP/RSP module in this example)


# sudo tcpdump -nlASX -s 0 -vvv -i eth3 'mpls and host 10.254.253.70 and icmp and not host 192.168.30.204'
# CAPTURE ON 1st 7600 PE connected to CPE, ping from 2nd 7600 PE is failing
# RUN AN ELAM CAPTURE WITH MPLS TAG 91953, SOURCE IP 10.254.254.1, DEST IP 10.254.253.70, IP PROTO = 1 (ICMP)
#


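# 0 0 0 = DestMAC + SrcMAC (3 x 32-bit words = 12 bytes, masked out; the frame carries no Dot1Q tag)
# 0x8847 = Ethertype 0x8847 = MPLS Unicast
# 1673 1 = First MPLS label (0x16731 = label 91953)
# 1 = MPLS EXP bits (000) and BoS bit (1)
# 00 = MPLS TTL (masked out)
# 4500 0x00000000 0x00000001 0x00000AFE 0xFE010AFE 0xFD460000 = IPv4 header (protocol 1, source 10.254.254.1, destination 10.254.253.70)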

show platform capture elam trigger dbus others if data = 0 0 0 0x88471673 0x11004500 0x00000000 0x00000001 0x00000AFE 0xFE010AFE 0xFD460000 [ 0 0 0 0xffffffff 0xff00ff00 0 0x0000000f 0x0000ffff 0xffffffff 0xffff0000 ]
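
Building the data and mask words by hand is error prone, so here is a minimal Python sketch that derives them from the label, the IP addresses and the IP protocol. It is an illustration only: the function name `elam_mpls_trigger` is invented for this note, and it assumes an untagged Ethernet frame with exactly one MPLS label (bottom of stack set) in front of an IPv4 header without options, which is the layout used by the trigger above. It mirrors the hand-built mask, including matching only the low nibble of the protocol byte; using 0xFF there would be a stricter match.

```python
import ipaddress
import struct

def elam_mpls_trigger(label, src_ip, dst_ip, proto):
    """Build (data, mask) 32-bit word lists for one MPLS label over untagged Ethernet."""
    data = bytearray(40)   # 10 x 32-bit words; dmac/smac (bytes 0-11) stay zero
    mask = bytearray(40)   # zero mask bits mean "do not compare"

    # Ethertype 0x8847 (MPLS unicast), then label / EXP=0 / S=1. TTL is not matched.
    struct.pack_into("!H", data, 12, 0x8847)
    struct.pack_into("!I", data, 14, (label << 12) | (1 << 8))
    mask[12:17] = b"\xff" * 5

    ip = 18                                   # offset of the IPv4 header
    data[ip] = 0x45                           # version 4, IHL 5
    mask[ip] = 0xFF
    data[ip + 9] = proto                      # IP protocol number
    mask[ip + 9] = 0x0F                       # low nibble only, as in the trigger above
    data[ip + 12:ip + 16] = ipaddress.IPv4Address(src_ip).packed
    data[ip + 16:ip + 20] = ipaddress.IPv4Address(dst_ip).packed
    mask[ip + 12:ip + 20] = b"\xff" * 8

    def words(buf):
        return [w[0] for w in struct.iter_unpack("!I", bytes(buf))]

    return words(data), words(mask)


data, mask = elam_mpls_trigger(91953, "10.254.254.1", "10.254.253.70", 1)
print("data =", " ".join("0x%08X" % w for w in data))
print("mask =", " ".join("0x%08X" % w for w in mask))
```

With the values from this example the sketch reproduces the ten data words and ten mask words of the trigger command above (zero words are printed as 0x00000000 rather than 0).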




abr1#show platform capture elam start


abr1#show platform capture elam status
active ELAM info:
Slot Cpu   Asic   Inst Ver PB Elam
---- --- -------- ---- --- -- ----
6    0   ST_SMAN  0    3.2    Y
DBUS trigger: FORMAT=OTHERS DATA = 0 0 0 0X88471673 0X11004500 0X00000000 0X00000001 0X00000AFE 0XFE010AFE 0XFD460000 [ 0 0 0 0XFFFFFFFF 0XFF00FF00 0 0X0000000F 0X0000FFFF 0XFFFFFFFF 0XFFFF0000 ]
elam capture in progress


abr1#show platform capture elam status
active ELAM info:
Slot Cpu   Asic   Inst Ver PB Elam
---- --- -------- ---- --- -- ----
6    0   ST_SMAN  0    3.2    Y
DBUS trigger: FORMAT=OTHERS DATA = 0 0 0 0X88471673 0X11004500 0X00000000 0X00000001 0X00000AFE 0XFE010AFE 0XFD460000 [ 0 0 0 0XFFFFFFFF 0XFF00FF00 0 0X0000000F 0X0000FFFF 0XFFFFFFFF 0XFFFF0000 ]
elam capture completed

abr1#show platform capture elam data
DBUS data:
SEQ_NUM                          [5] = 0xB
QOS                              [3] = 0
QOS_TYPE                         [1] = 0
TYPE                             [4] = 0 [ETHERNET]
STATUS_BPDU                      [1] = 0
IPO                              [1] = 1
NO_ESTBLS                        [1] = 0
RBH                              [3] = b000
CR                               [1] = 0
TRUSTED                          [1] = 1
NOTIFY_IL                        [1] = 0
NOTIFY_NL                        [1] = 0
DISABLE_NL                       [1] = 0
DISABLE_IL                       [1] = 0
DONT_FWD                         [1] = 0
INDEX_DIRECT                     [1] = 0
DONT_LEARN                       [1] = 0
COND_LEARN                       [1] = 0
BUNDLE_BYPASS                    [1] = 0
QOS_TIC                          [1] = 0
INBAND                           [1] = 0
IGNORE_QOSO                      [1] = 0
IGNORE_QOSI                      [1] = 0
IGNORE_ACLO                      [1] = 0
IGNORE_ACLI                      [1] = 0
PORT_QOS                         [1] = 1
CACHE_CNTRL                      [2] = 0 [NORMAL]
VLAN                             [12] = 4080
SRC_FLOOD                        [1] = 0
SRC_INDEX                        [19] = 0x144
LEN                              [16] = 122
FORMAT                           [2] = 3 [OTHERS]
PACKET_TYPE                      [3] = 0 [ETHERNET]
L3_PROTOCOL                      [4] = 15 [INVALID]
L3_PT                            [8] = 0
FF                               [1] = 0
MN                               [1] = 0
RF                               [1] = 0
SC                               [1] = 0
CARD_TYPE                        [4] = 0x0
ISL                              [16] = 0x0
DATA [592]
0000:  E8 B7 48 2A 2C 00 C4 71 FE 02 5E 00 88 47 16 73   "..H*,..q..^..G.s"
0010:  11 FF 45 00 00 64 10 16 00 00 FF 01 9A 3E 0A FE   "..E..d.......>.."
0020:  FE 01 0A FE FD 46 08 00 C9 A6 01 71 00 00 00 00   ".....F.....q...."
0030:  00 01 02 31 B1 00 AB CD AB CD AB CD AB CD AB CD   "...1............"
0040:  AB CD AB CD AB CD AB CD AB CD                     ".........."
CRC                              [16] = 0x77E8                   ".........."


RBUS data:
SEQ_NUM                          [5] = 0xB
CCC                              [3] = b101 [L2_POLICE]
CAP1                             [1] = 0
CAP2                             [1] = 0
QOS                              [3] = 0
EGRESS                           [1] = 0
DT                               [1] = 1 [GENERIC]
TL                               [1] = 0 [B32]
FLOOD                            [1] = 0
DEST_INDEX                       [19] = 0x7FFF
VLAN                             [12] = 4080
RBH                              [3] = b001
RDT                              [1] = 1
GENERIC                          [1] = 0
EXTRA_CICLE                      [1] = 0
FABRIC_PRIO                      [1] = 0
L2                               [1] = 0
FCS1                             [8] = 0x1
DELTA_LEN                        [8] = 0
REWRITE_INFO
 i0  - no rewrite.
FCS2                             [8] = 0x0

Control signals:
rb_stat                          [3] = 0x7


! The DBUS data in the ELAM capture shows the source interface is 0x144 and source VLAN 4080
! (this will be an internal VLAN since the PE links are point-to-point L3 links)
abr1#remote command switch test mcast ltl-info index  144
index 0x144 contain ports 6/5


! Te6/5 is the link facing the other PE so the packet has come in the expected interface
abr1#show vlan internal usage | i 4080
4080 TenGigabitEthernet6/5


! However the RBUS data shows CCC = [L2_POLICE], which indicates the packet has been handed to a hardware rate limiter
! or other policing/dropping mechanism. If this packet were destined to the RSP this might be a sign CoPP is dropping it.
! The destination index 0x7FFF is a drop index though. Under normal operation one would expect CCC = [L3_RW]
abr1#remote command switch test mcast ltl-info index 7FFF
index 0x7FFF contain ports * empty *
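
When repeating ELAM captures it helps to pull just the forwarding result out of the output. The following Python sketch extracts CCC and DEST_INDEX from the RBUS section and flags the drop case. It is a rough illustration: `rbus_verdict` is a made-up name, the patterns are based only on the output shown above, and treating 0x7FFF as the drop index is taken from this example rather than from any platform reference.

```python
import re

DROP_INDEX = 0x7FFF   # treated as a drop index in this note (assumption from this example)

def rbus_verdict(elam_text):
    """Summarise the RBUS section of saved 'show platform capture elam data' output."""
    _, found, rbus = elam_text.partition("RBUS data:")
    if not found:
        raise ValueError("no RBUS section in the supplied text")
    ccc = re.search(r"^CCC\s+\[\d+\]\s*=\s*\S+\s*\[(\w+)\]", rbus, re.M)
    dest = re.search(r"^DEST_INDEX\s+\[\d+\]\s*=\s*0x([0-9A-Fa-f]+)", rbus, re.M)
    dest_index = int(dest.group(1), 16) if dest else None
    return {
        "ccc": ccc.group(1) if ccc else None,
        "dest_index": dest_index,
        "dropped": dest_index == DROP_INDEX,
    }
```

Fed the capture above, this reports CCC L2_POLICE with destination index 0x7FFF and marks the packet as dropped.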


! We can see that MLS hardware rate limiters are configured
abr1#show run | in mls rate
mls rate-limit multicast ipv4 fib-miss 2000 10
mls rate-limit multicast ipv4 non-rpf 10 10
mls rate-limit multicast ipv4 igmp 2000 10
mls rate-limit multicast ipv4 partial 2000 10
mls rate-limit unicast cef glean 200 50
mls rate-limit unicast ip rpf-failure 10 10
mls rate-limit unicast ip icmp redirect 0 
mls rate-limit unicast ip icmp unreachable no-route 10 10
mls rate-limit unicast ip icmp unreachable acl-drop 10 10
mls rate-limit unicast ip errors 10 10
mls rate-limit all ttl-failure 200 50
mls rate-limit all mtu-failure 10 10
mls rate-limit layer2 pdu 20 20


! However this destination index 0x7FFF isn't related to them (it's not in the list below); it's a reserved drop index
abr1#remote command switch show mls rate-limit hw-det

    Hw ID   Status   Packets/s   Burst    Index    Type
    -----  --------  ---------  -------  -------  ------
      0    Disabled          0      0    0x0       -    
      1    Enabled         200     50    0x7F05    LTL  
      2    Enabled          10     10    0x3       ADJ  
      3    Enabled          10     10    0x1A      ADJ  
      4    Enabled         200     50    0x1B      ADJ  
      5    Enabled          10     10    0x7F0A    LTL  
      6    Enabled        2000      1    0x7E05    LTL  
      7    Enabled        2000     10    0x0       ADJ  
      8    Disabled          0      0    0x0       -    
      9    Enabled          10      1    0x7F08    LTL  
     10    Enabled        2000     10    0x7F09    LTL  
     11    Enabled          20     20    0x7F0C    LTL  
     12    Enabled        2000     10    0x2000    LTL  
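
Checking a destination index against the rate limiter table by eye is easy to get wrong, so here is a minimal Python sketch that collects the indexes of all enabled rows. The function name is invented for this note and the column layout is assumed to match the output above (six whitespace-separated fields per row).

```python
def rate_limiter_indexes(hw_det_text):
    """Map each index used by an enabled hardware rate limiter to its Hw IDs."""
    indexes = {}
    for line in hw_det_text.splitlines():
        fields = line.split()
        # Data rows look like: <id> Enabled <pps> <burst> <0xINDEX> <LTL|ADJ>
        if len(fields) == 6 and fields[1] == "Enabled":
            indexes.setdefault(int(fields[4], 16), []).append(int(fields[0]))
    return indexes
```

With the table above, `0x7FFF in rate_limiter_indexes(output)` is False, which matches the conclusion that the drop is not caused by one of the configured rate limiters.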

Everything seems fine in software, but in hardware packets from other PEs are being sent to a bit-bucket destination index (0x7FFF). In the end, hard clearing the BGP session to the CPE fixed the issue. This caused the route to be dropped and re-learnt and a new label to be associated with it, which triggered a hardware update that pushed the new forwarding and adjacency information into the hardware. A new ping from another PE and a new ELAM capture showed that everything was working as expected again.

! Just for educational purposes we can see which destination indexes have the value 7FFF and would drop packets
abr1#remote command switch show platform hardware tycho register 0 1794 | i 7FFF
 0x005A:          NF_HW_ACC_DATA_8 = 0xFF7FFFEF [4286578671]
 0x0060:         NF_HW_ACC_DATA_14 = 0xFF7FFFFF [4286578687]
 0x008F:        SE_MASK_NT_0_ALL_1 = 0x007FFFFF [8388607   ]
 0x0181:            PP_RF_SRC_IDX2 = 0x00007FFF [32767     ]
 0x044F:                   RED_PLC = 0x00007FFF [32767     ]
 0x0455:                RED_FD_IDX = 0x00007FFF [32767     ]
 0x0457:              RED_BBKT_IDX = 0x00007FFF [32767     ]
 0x045C:          RED_MPLS_ERR_IDX = 0x00007FFF [32767     ]
 0x049C:             TY_DE_AJ_RMT6 = 0x0007FFFF [524287    ]
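
The `| i 7FFF` filter above is a substring match, so it also returns registers such as 0xFF7FFFEF or 0x007FFFFF that merely contain the digits. A minimal Python sketch that keeps only the registers whose whole value equals 0x7FFF is shown below; the function name is made up for this note and the line format is assumed to be the one in the output above.

```python
import re

def registers_equal_to(register_dump, value):
    """Names of registers whose whole value equals `value` (not a substring match)."""
    names = []
    for line in register_dump.splitlines():
        # Expected shape: ' 0x045C:   RED_MPLS_ERR_IDX = 0x00007FFF [32767 ]'
        m = re.match(r"\s*0x[0-9A-Fa-f]+:\s+(\w+)\s*=\s*0x([0-9A-Fa-f]+)", line)
        if m and int(m.group(2), 16) == value:
            names.append(m.group(1))
    return names
```

Applied to the nine lines above with `value=0x7FFF`, this leaves PP_RF_SRC_IDX2, RED_PLC, RED_FD_IDX, RED_BBKT_IDX and RED_MPLS_ERR_IDX.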

For packets being dropped we can actually update those values to point to the interface index of another interface and run a packet capture on that interface:

! If one wanted to redirect packets out of interface Te3/2, we must find its index,
! one can do this by looking at the index of the internal VLAN for that interface
abr1#show vlan internal usage | i 3/2
4027 TenGigabitEthernet3/2

abr1#remote command switch test mcast ltl-info vlan 4027

routed interface
src index 0x81 contain ports 3/2
multicast flood index 0xCFBB for vlan 4027 contain ports 3/2, 6/R


! 0x81 is the Te3/2 index. If one knows the interface index already we can double check with the following
abr1#remote command switch test mcast ltl-info index 81

index 0x81 contain ports 3/2


! Now the register pointing to a drop (0x7FFF) or HWRL index can be updated to point to the real interface (RED_MPLS_ERR_IDX from above)
remote command switch show platform hardware tycho poke 45c 81