Date created: Wednesday, August 26, 2015 9:03:33 AM. Last modified: Thursday, August 11, 2022 8:39:07 AM

IOS-XR Troubleshooting and Diagnostics

For ASR9000 Series Routers

References:
https://supportforums.cisco.com/document/135941/asr9000xr-understanding-platform-diags-3-puntfabricdatapathfailed#What_to_collect_if_there_is_still_an_issue
BRKARC-2017 - Packet Journey Inside ASR 9000
BRKSPG-3612 - Troubleshooting IOS-XR
http://www.cisco.com/c/en/us/support/docs/routers/asr-9000-series-aggregation-services-routers/116999-problem-line-card-00.html

Contents:
System/OS and RSP
Forwarding Plane
Line Card / MPA Level
NP / Bridge / FIA Level
PHY / MAC / Interface / NP Level

 

System/OS and RSP:

Top 10 processes by CPU usage; by default this command refreshes every second for 5 iterations. Check other variants with "run top --help" and "run top_procs --help":

RP/0/RSP0/CPU0:abr1#run top_procs -D -d 1 -i 5 -c -n 10
Wed Dec 23 14:35:00.634 UTC
Computing times...
node0_RSP0_CPU0: 343 procs, 4 cpus, 1.04 delta, 418:20:11 uptime
Memory: 8192 MB total, 5.395 GB free, sample time: Wed Dec 23 14:35:01 2015
cpu 0 idle: 73.10%, cpu 1 idle: 50.29%, cpu 2 idle: 54.48%, cpu 3 idle: 52.78%, total idle: 57.66%, kernel:  1.26%

      pid   mem MB   user cpu kernel cpu   delta  % ker  % tot name
 12689629  325.039 547565.102  42929.053   0.836   0.77  20.81 bgp
 12681378   15.496 318671.569  15964.698   0.515   0.24  12.82 mpls_lsd
 12124317    3.792 165476.455   4056.820   0.124   0.02   3.08 fib_mgr
   192592    8.167 145014.753  22639.398   0.047   0.12   1.17 gsp
   565552  200.871  62974.333   1847.591   0.040   0.02   0.99 ipv4_rib
   241797    6.503  12548.085   1469.881   0.024   0.02   0.59 netio
    57385   53.511  64834.412  10031.991   0.017   0.12   0.42 eth_server
   192607    3.027  30455.305   1447.194   0.014   0.02   0.34 cluster_clm_rp
   241793    1.578  10269.683   1606.126   0.010   0.02   0.24 nrssvr
   192608    1.375   9321.330   1919.364   0.005   0.04   0.12 cluster_dlm_rp

One can view and restart processes with:

show processes fib_mgr location 0/0/CPU0
process restart fib_mgr location 0/0/CPU0

or 

process restart bgp location all
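
After a restart it is worth confirming that the process respawned cleanly. A quick check, re-using the command above (exact field names vary slightly between releases):

! A new PID and an incremented respawn count confirm the restart
show processes fib_mgr location 0/0/CPU0
! Fields of interest include the process state, PID and respawn count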

Process/thread/job debugging:

In this example BGP is stuck in a loop, chewing up one of the four CPU cores at nearly 100% (25% overall):

RP/0/RSP0/CPU0:abr1#show proc cpu | ex 0%
Mon Dec 21 13:53:20.237 UTC

CPU utilization for one minute: 61%; five minutes: 54%; fifteen minutes: 53%

PID    1Min    5Min    15Min Process
57385    1%      1%       1% eth_server
176174   1%      1%       1% sysdb_svr_admin
192592   6%      3%       3% gsp
192605   2%      1%       1% sysdb_mc
192607   1%      1%       1% cluster_clm_rp
241776   1%      1%       1% sysdb_shared_nc
565552   1%      1%       1% ipv4_rib
569666   1%      1%       1% snmpd
569671   2%      2%       2% mibd_route
12124317   6%      6%       6% fib_mgr
12681378  14%     14%      14% mpls_lsd
12689629  23%     23%      23% bgp


RP/0/RSP0/CPU0:abr1#show processes distribution bgp all
Mon Dec 21 14:00:55.321 UTC
1 process found
NODE            PID        JID  #THR  TYPE PROGRAM
0/RSP0/CPU0   12689629       1049    23  RP   bgp


RP/0/RSP0/CPU0:abr1#show processes bgp loc 0/RSP0/CPU0

Mon Dec 21 13:48:53.521 UTC
                  Job Id: 1049
                     PID: 12689629
         Executable path: /disk0/iosxr-routing-4.3.4.sp10-1.0.0/0x100000/bin/bgp


RP/0/RSP0/CPU0:abr1#show processes threadname 1049 location 0/RSP0/CPU0
Mon Dec 21 13:50:27.921 UTC
JID    TID  ThreadName      pri state    TimeInState       NAME
1049   1    bgp-io-control  10 Receive        0:00:01:0425 bgp
1049   2    cdm_evm_thread  10 Receive      161:22:27:0247 bgp
1049   3                    10 Receive      161:22:27:0228 bgp
1049   4    bgp-rpki        10 Receive        0:00:00:0376 bgp
1049   5    spp_nsr_client_conn_monitor 10 Receive      161:22:25:0691 bgp
1049   6                    10 Sigwaitinfo  161:22:26:0585 bgp
1049   7    bgp-label       10 Reply          0:00:00:0000 bgp
1049   8    bgp-mgmt        10 Receive        0:01:17:0202 bgp
1049   9    bgp-rib-upd-0   10 Receive        0:00:00:0113 bgp
1049   10   lspv_lib BGPv4  10 Nanosleep      0:00:01:0148 bgp
1049   11   bgp-rib-upd-1   10 Receive        0:00:04:0836 bgp
1049   12   bgp-io-read     10 Receive        0:00:00:0001 bgp
1049   13   bgp-io-write    10 Receive        0:00:00:0001 bgp
1049   14   bgp-router      10 Receive        0:00:00:0000 bgp
1049   15   bgp-import      10 Receive        0:00:00:0000 bgp
1049   16   bgp-upd-gen     10 Receive        0:00:00:0153 bgp
1049   17   bgp-sync-active 10 Receive        0:01:04:0920 bgp
1049   18   bgp-crit-event  10 Receive        0:00:10:0717 bgp
1049   19   bgp-event       10 Receive        0:00:37:0026 bgp
1049   20   bgp-mib-trap    10 Receive      161:22:25:0509 bgp
1049   21   bgp-io-ka       10 Receive        0:00:01:0609 bgp
1049   22   bgp-l2vpn-thr   10 Receive        0:01:04:0847 bgp
1049   23   async           10 Receive      161:22:24:0439 bgp


RP/0/RSP0/CPU0:abr1#show processes blocked
Mon Dec 21 13:52:49.491 UTC
  Jid       Pid Tid            Name State   TimeInState    Blocked-on
65551     16399   1             ksh Reply  369:37:44:0916   16395  devc-conaux
   97     53279   2      umass-enum Reply  369:37:49:0704       1  kernel
   97     53279   6      umass-enum Reply  369:37:47:0517   57380  io-usb
   97     53279   7      umass-enum Reply  369:37:47:0517       1  kernel
65568     65568   2      devb-umass Reply    0:00:00:0041   57380  io-usb
   52     90154   2         attachd Reply  369:37:46:0151   57385  eth_server
   52     90154   3         attachd Reply  369:37:46:0108   24595  mqueue
   89     90155   6            qnet Reply    0:00:03:0533   57385  eth_server
   51     90161   2   attach_server Reply  369:37:45:0843   24595  mqueue
  432    192602   1     tftp_server Reply  161:37:06:0385   24595  mqueue
  315    557276   2         lpts_fm Reply    0:00:00:0239  245879  lpts_pa
 1049  12689629   7             bgp Reply    0:00:00:0002 12681378  mpls_lsd
  326    569668   7     mibd_entity Reply    0:00:00:0003  192605  sysdb_mc
65870  21451086   1            exec Reply    0:00:00:0258       1  kernel
65877  21598549   1            more Reply    0:00:00:0010   24593  pipe
65878  21598550   1  show_processes Reply    0:00:00:0000       1  kernel

RP/0/RSP0/CPU0:abr1#show proc pidin | in bgp

RP/0/RSP0/CPU0:abr1#follow process 12689629

RP/0/RSP0/CPU0:abr1#follow process 12689629 thread 7 verbose location 0/RSP0/CPU0


Below the "ce_switch" process (the EOBC process in IOS-XR) is stuck on the active RSP in an ASR9006 chewing up one of our CPU cores at 100% (so 25% overall):

RP/0/RSP1/CPU0:ASR9006#show processes distribution ce_switch all
Wed Sep 20 15:19:31.209 BST
2 processes found
NODE            PID        JID  #THR  TYPE PROGRAM
0/RSP0/CPU0     106542         54    17  RP   ce_switch
0/RSP1/CPU0     106542         54    17  RP   ce_switch


RP/0/RSP1/CPU0:ASR9006#show proc cpu location 0/RSP1/CPU0 | ex 0%
Wed Sep 20 15:24:14.247 BST

CPU utilization for one minute: 26%; five minutes: 26%; fifteen minutes: 26%

PID    1Min    5Min    15Min Process
106542  25%     25%      25% ce_switch




RP/0/RSP1/CPU0:ASR9006#sh proc block location 0/RSP1/CPU0
Wed Sep 20 15:20:35.944 BST
  Jid       Pid Tid            Name State   TimeInState    Blocked-on
65548     12300   1             ksh Reply 18114:21:39:0973   12299  devc-conaux
   95     36892   2      umass-enum Reply 18114:21:41:0191       1  kernel
   95     36892   6      umass-enum Reply 18114:21:38:0829  106546  io-usb
   95     36892   7      umass-enum Reply 18114:21:38:0829       1  kernel
   53    102433   2         attachd Reply 18114:21:41:0542   98343  eth_server
   53    102433   3         attachd Reply 18114:21:41:0543   16399  mqueue
   87    102441   6            qnet Reply    0:00:00:0023   98343  eth_server
   52    106543   2   attach_server Reply 18114:21:41:0507   16399  mqueue
65587    106547   2      devb-umass Reply    0:00:08:0132  106546  io-usb
  443    307290   1     tftp_server Reply  912:30:15:0642   16399  mqueue
  212    340088   1          envmon Mutex    0:00:01:0948  340088-05 #1
  326    667887   2         lpts_fm Reply    0:00:00:0051  352397  lpts_pa
 1181    676137  13       l2vpn_mgr Reply 6021:12:51:0240  680280  lspv_server
 1048    680281  12        mpls_ldp Reply 6021:12:51:0251  680280  lspv_server
65882 542187866   1            exec Reply    0:00:00:0077       1  kernel
 1054    684387  11             bgp Reply 18114:17:00:0032  680280  lspv_server
65899 542634347   1            more Reply    0:00:00:0013   16397  pipe
65901 542392685   1            exec Reply    0:05:26:0223  667876  devc-vty
65902 542634350   1  show_processes Reply    0:00:00:0000       1  kernel


RP/0/RSP1/CPU0:ASR9006#sh processes threadname location 0/RSP1/CPU0 | i "NAME|ce_switch"
Wed Sep 20 15:21:09.067 BST
JID    TID  ThreadName      pri state    TimeInState       NAME
54     1    main            10 Receive        0:00:01:0634 ce_switch
54     2                    10 Receive     18114:21:30:0688 ce_switch
54     3    bcmDPC          50 Running        0:00:00:0000 ce_switch
54     4    bcmCNTR.0       10 Nanosleep      0:00:00:0069 ce_switch
54     5    bcmTX           56 Sem         18114:22:09:0575 ce_switch
54     6    bcmXGS3AsyncTX  56 Sem         18114:22:09:0573 ce_switch
54     7    bcmLINK.0       50 Nanosleep      0:00:00:0039 ce_switch
54     8    pfm_svr         10 Receive        0:00:00:0451 ce_switch
54     9    interruptThread 56 Intr           0:00:00:0000 ce_switch
54     10   udld_thread     56 Receive        0:00:00:0027 ce_switch
54     11   clm punch counter 10 Receive        0:00:00:0009 ce_switch
54     12   bcmRX           56 Nanosleep      0:00:00:0004 ce_switch
54     13   clm_eth_server_rx_thread 10 Receive        0:00:00:0399 ce_switch
54     14   clm_timer_event 10 Receive     18114:21:30:0688 ce_switch
54     15   clm status thread 10 Receive        0:00:00:0057 ce_switch
54     16   clm_active_eobc_periodic_update_thread 10 Nanosleep      0:00:00:0460 ce_switch
54     17   async           10 Receive        5:48:11:0857 ce_switch


RP/0/RSP1/CPU0:ASR9006#show proc pidin location 0/RSP1/CPU0 | in "STATE|ce_switch"
Wed Sep 20 15:23:32.012 BST
     pid tid name               prio STATE       Blocked
  106542   1 pkg/bin/ce_switch   10r RECEIVE     1
  106542   2 pkg/bin/ce_switch   10r RECEIVE     5
  106542   3 pkg/bin/ce_switch   50r SEM
  106542   4 pkg/bin/ce_switch   10r NANOSLEEP
  106542   5 pkg/bin/ce_switch   56r SEM         f9dec890
  106542   6 pkg/bin/ce_switch   56r SEM         f9dec8a4
  106542   7 pkg/bin/ce_switch   50r NANOSLEEP
  106542   8 pkg/bin/ce_switch   10r RECEIVE     8
  106542   9 pkg/bin/ce_switch   56r INTR
  106542  10 pkg/bin/ce_switch   56r RECEIVE     15
  106542  11 pkg/bin/ce_switch   10r RECEIVE     13
  106542  12 pkg/bin/ce_switch   56r NANOSLEEP
  106542  13 pkg/bin/ce_switch   10r RECEIVE     23
  106542  14 pkg/bin/ce_switch   10r RECEIVE     19
  106542  15 pkg/bin/ce_switch   10r RECEIVE     26
  106542  16 pkg/bin/ce_switch   10r NANOSLEEP
  106542  17 pkg/bin/ce_switch   10r RECEIVE     29



RP/0/RSP1/CPU0:ASR9006#follow process 106542 stackonly iteration 1

RP/0/RSP1/CPU0:ASR9006#top
Wed Sep 20 15:26:41.362 BST
Computing times...
365 processes; 1934 threads;
CPU states: 69.0% idle, 30.7% user, 0.1% kernel
Memory: 6144M total, 3286M avail, page size 4K
Time: Wed Sep 20 15:28:33.814 BST

   JID   TID LAST_CPU PRI STATE  HH:MM:SS      CPU  COMMAND
    54     3   2      50 Run  1487:43:37    17.47% ce_switch
    54     9   1      56 Intr  464:55:26     5.74% ce_switch
   426     3   2      10 Rcv    16:17:58     0.66% sysdb_mc
    61    15   1      10 Rcv    42:37:35     0.57% eth_server
   251    11   1      10 CdV    28:25:13     0.45% gsp
   251    13   1      10 CdV    32:14:29     0.44% gsp
   251    12   3      10 CdV    27:40:11     0.44% gsp
    61     1   1      10 Rcv    42:35:16     0.43% eth_server
    54     4   2      10 NSlp   46:58:51     0.40% ce_switch
   232     1   1      10 Rcv   199:31:01     0.38% fiarsp


RP/0/RSP1/CPU0:ASR9006#follow process 106542 stackonly thread 3 verbose location 0/RSP1/CPU0
Wed Sep 20 15:29:53.565 BST

Attaching to process pid = 106542 (pkg/bin/ce_switch)

Iteration 1 of 5
------------------------------

Current process = "pkg/bin/ce_switch", PID = 106542 TID = 3 (bcmDPC)
EAX 0x00000000  EBX 0x00000000  ECX 0x0415df50  EDX 0x08297252
ESI 0xffffffff  EDI 0x10000368  EIP 0x08297252
ESP 0x0415df50  EBP 0x0415df6c  EFL 0x00001046
PC  0x08297252  FP  0x0415df6c  RA  0x087805aa
Priority: 50     real_priority: 50
Last system call: 85
pid: 106542
State: Running
Elapsed Time(h:m:s:ms): 1487:44:33:0051


trace_back: #0 0x087c1222 [soc_schan_op]

ENDOFSTACKTRACE

One can check whether CEF is OOR (out of resources) with "show cef platform resource summary location 0/0/CPU0" or "show cef platform oor location 0/0/CPU0":

abr1#show cef platform oor location 0/0/CPU0
The "PD USAGE" column should not be relied upon
for accurate tracking of the PD resources.
This is due to Asynchronous nature of CEF programming
between PD and PRM.

OBJECT              PD USAGE(MAX)       PRM USAGE(MAX)      PRM CREDITS

RPF_STRICT(0)       0(262144)           0(262144)           262144
IPv4_LEAF_P(1)      573380(4194304)     573380(4194304)     3620924
IPv6_LEAF_P(2)      37962(2097152)      286690(2097152)     1810462
LEAF(3)             573539(4194304)     573539(4194304)     3620765
TX_ADJ(4)           799(524288)         799(524288)         523489
NR_LDI(5)           1336(2097152)       1336(2097152)       2095816
TE_NH_ADJ(6)        0(65536)            0(65536)            65536
RX_ADJ(7)           71(131072)          71(131072)          131001
R_LDI(8)            901(131072)         901(131072)         130171
L2VPN_LDI(9)        2(32768)            2(32768)            32766
EXT_LSPA(10)        0(524288)           630(524288)         523658
IPv6_LL_LEAF_P(11)  0(262144)           0(262144)           262144


RP/0/RSP0/CPU0:abr1#show cef platform resource location 0/0/CPU0
Wed Dec 23 12:11:12.785 UTC

        Node: 0/0/CPU0
----------------------------------------------------------------

IP_LEAF_P usage is same on all NPs

NP: 0  struct 23: IP_LEAF_P       (maps to ucode stru = 9)

Used Entries: 11066 Max Entries: 4194304
 -------------------------------------------------------------

IP_LEAF_P usage is same on all NPs

NP: 0  struct 24: IP_LEAF_P       (maps to ucode stru = 42)

Used Entries: 483 Max Entries: 2097152
 -------------------------------------------------------------

NP: 0  struct 4: IP_LEAF         (maps to ucode stru = 11)

Used Entries: 573471 Max Entries: 4194304
 -------------------------------------------------------------

NP: 1  struct 4: IP_LEAF         (maps to ucode stru = 11)

Used Entries: 573470 Max Entries: 4194304
 -------------------------------------------------------------

RP/0/RSP0/CPU0:abr1#show cef resource location 0/0/cPU0
Wed Dec 23 12:01:33.203 UTC
CEF resource availability summary state: GREEN
CEF will work normally
  ipv4 shared memory resource: GREEN
  ipv6 shared memory resource: GREEN
  mpls shared memory resource: GREEN
  common shared memory resource: GREEN
  DATA_TYPE_TABLE_SET hardware resource: GREEN
  DATA_TYPE_TABLE hardware resource: GREEN
  DATA_TYPE_IDB hardware resource: GREEN
  DATA_TYPE_IDB_EXT hardware resource: GREEN
  DATA_TYPE_LEAF hardware resource: GREEN
  DATA_TYPE_LOADINFO hardware resource: GREEN
  DATA_TYPE_PATH_LIST hardware resource: GREEN
  DATA_TYPE_NHINFO hardware resource: GREEN
  DATA_TYPE_LABEL_INFO hardware resource: GREEN
  DATA_TYPE_FRR_NHINFO hardware resource: GREEN
  DATA_TYPE_ECD hardware resource: GREEN
  DATA_TYPE_RECURSIVE_NH hardware resource: GREEN
  DATA_TYPE_TUNNEL_ENDPOINT hardware resource: GREEN
  DATA_TYPE_LOCAL_TUNNEL_INTF hardware resource: GREEN
  DATA_TYPE_ECD_TRACKER hardware resource: GREEN
  DATA_TYPE_ECD_V2 hardware resource: GREEN
  DATA_TYPE_ATTRIBUTE hardware resource: GREEN
  DATA_TYPE_LSPA hardware resource: GREEN
  DATA_TYPE_LDI_LW hardware resource: GREEN
  DATA_TYPE_LDSH_ARRAY hardware resource: GREEN
  DATA_TYPE_TE_TUN_INFO hardware resource: GREEN
  DATA_TYPE_DUMMY hardware resource: GREEN
  DATA_TYPE_IDB_VRF_LCL_CEF hardware resource: GREEN
  DATA_TYPE_TABLE_UNRESOLVED hardware resource: GREEN
  DATA_TYPE_MOL hardware resource: GREEN
  DATA_TYPE_MPI hardware resource: GREEN
  DATA_TYPE_SUBS_INFO hardware resource: GREEN
  DATA_TYPE_GRE_TUNNEL_INFO hardware resource: GREEN
  DATA_TYPE_LISP_RLOC hardware resource: GREEN
  DATA_TYPE_LSM_ID hardware resource: GREEN
  DATA_TYPE_INTF_LIST hardware resource: GREEN
  DATA_TYPE_TUNNEL_ENCAP_STR hardware resource: GREEN
  DATA_TYPE_LABEL_RPF hardware resource: GREEN
  DATA_TYPE_L2_SUBS_INFO hardware resource: GREEN
  DATA_TYPE_LISP_IID_MAPPING hardware resource: GREEN
  DATA_TYPE_LISP_RLOC_TBL hardware resource: GREEN

Trident-based line cards can support a maximum of 512,000 Layer 3 (L3) prefixes by default. Typhoon line cards support a maximum of four million IPv4 and two million IPv6 prefixes by default. Both can be tuned by changing the scaling profile.
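
For illustration, on classic IOS-XR the scale profile is changed from admin configuration mode and only takes effect after a reload. This is a sketch only; the available profile names depend on the hardware and IOS-XR release:

! From admin config mode (example profile name "l3xl", which favours L3 FIB scale)
admin
configure
hw-module profile scale l3xl
commit
! The new profile only takes effect after reloading the chassis or the affected line cards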

IPv4/VPNv4/IPv6/VPNv6 current and max prefix limits:

#show cef misc | inc Num cef entries
Thu Oct 12 08:57:28.193 UTC
    Num cef entries : 653811 gbl, 22108 vrf ! IPv4 GRT prefixes, VPNv4 prefixes
    Num cef entries : 40883 gbl, 707 vrf    ! IPv6 GRT prefixes, VPNv6 prefixes

#show controllers np struct IPV4-LEAF-FAST-P np0 | in Entries
Reserved Entries: 0, Used Entries: 653842, Max Entries: 4194304   ! IPv4 GRT prefixes, IPv4 GRT and VPNv4 routes share the same 4M limit

#show controllers np struct IPV4-LEAF-P np0 | in Entries
Reserved Entries: 0, Used Entries: 22291, Max Entries: 4194304    ! VPNv4 prefixes, IPv4 GRT and VPNv4 routes share the same 4M limit

#show controllers np struct IPV6-LEAF-P np0 | in Entries
Reserved Entries: 0, Used Entries: 41591, Max Entries: 2097152    ! There is one shared counter for IPv6 GRT and VPNv6 prefixes; they also share the same TCAM space as IPv4 but use two TCAM entries per IPv6 prefix, so the max is 2M on Typhoon

#show controllers np struct Label-UFIB np0 | in Entries
Reserved Entries: 0, Used Entries: 216475, Max Entries: 2097152   ! MPLS TCAM space used and free; divide both numbers in half: ~108k labels in use and 1M entries available. Typhoon TCAM holds in+out label entries, so the 1M label space is displayed as 2M entries.
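
To make the halving in the last comment concrete, using the numbers above (arithmetic only):

! Label-UFIB on Typhoon counts in + out entries, so halve both figures:
! 216475 used entries / 2 ~= 108k labels actually allocated
! 2097152 max entries / 2  = 1048576, i.e. a 1M label space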

 

Forwarding Plane:

When working on issues such as routes not being programmed into hardware, label recycling, stale CEF entries etc., the following actions can be performed and should be non-service-affecting:

! Restart the IPv4 RIB manager process; this does not stop forwarding
process restart ipv4_rib
! wait 10-20 seconds, then restart it again
process restart ipv4_rib

! Then re-download CEF onto the line card / MPA
clear cef linecard location 0/0/CPU0

! Restart the MPLS Label Switch Database (LSD) manager process; as above, this does not remove the entries or stop forwarding since it is only the manager process
process restart mpls_lsd
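
After these restarts the state can be sanity-checked; a minimal sketch using generally available commands:

! Confirm the restarted processes are running again
show processes ipv4_rib
show processes mpls_lsd
! Confirm CEF has been re-downloaded to the line card and the counts look sane
show cef summary location 0/0/CPU0
show route summary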

 

One can view which features are enabled at the hardware level to check for a discrepancy between what is configured in software and what has been pushed down into the hardware:

RP/0/RSP0/CPU0:abr1#show uidb data location 0/0/CPU0 TenGigE 0/0/2/0 ingress | i Enable.*0x1
  QOS Enable                       0x1
  MPLS Enable                      0x1

RP/0/RSP0/CPU0:abr1#show uidb data shadow location 0/0/CPU0 TenGigE 0/0/2/0 ingress | i Enable.*0x1
  QOS Enable                       0x1
  MPLS Enable                      0x1

RP/0/RSP0/CPU0:abr1#show uidb data compare location 0/0/CPU0 Te0/0/2/0 ingress
--------------------------------------------------------------------------
  Location = 0/0/CPU0
  Ifname = TenGigE0_0_2_0
  Index = 19
  INGRESS table
------------
  Layer 3
------------
  No differences were detected between hardware and shadow memory.

One can see interface indexes with "show uidb index", for example:
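
A sketch only; the exact arguments and output columns vary by release:

! List the ingress/egress uIDB indexes per interface on a line card
show uidb index location 0/0/CPU0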

 

Line Card / MPA Level:

To clear stale CEF information and trigger a reprogram, use "clear cef linecard location ...".

To show ASIC errors on a line card, use "show asic-errors all location 0/0/CPU0".

To reboot a line card, use "hw-module location 0/0/cpu0 reload".

Check MPA online status with "admin show shelfmgr status location 0/1/CPU0".
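
A broader first check of card and MPA state is "show platform" (and "admin show platform"); a healthy card typically shows "IOS XR RUN" and a healthy MPA/bay shows "OK":

! Check line card, RSP and MPA state from the active RSP
show platform
admin show platform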

Check the fabric connectivity of the line card:
show asic-errors arbiter 0 all location 0/RSP0/CPU0
show asic-errors crossbar 0 all location 0/RSP0/CPU0
show asic-errors fia 0 all location 0/1/CPU0

Show the prefix-carrying capacity of the line card with "show tbm ipv4 unicast dual detail location <loc>".

Show the number of free and used pages per memory channel (memory usage for prefixes) with "show plu server summary ingress location 0/0/cpu0".

 

NP / Bridge / FIA Level:

Check overall NP health:
show controllers np summary all
show controllers np counters all

Check which NP a port is attached to using "show controllers np ports all".
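
Each row of that output maps an NP (plus its FIA, and Bridge on Trident cards) to the range of front-panel ports it serves; a typical workflow is to find the NP for a port and then target the NP-specific commands at it:

! Map the port to an NP, then query that NP directly
show controllers np ports all location 0/0/CPU0
show controllers np counters np0 location 0/0/CPU0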

View NP statistics (the meanings of some of these statistics are documented in the references listed at the top of this document; if the version of IOS-XR supports it, they should all be described in the output of "show controllers np descriptions location 0/0/CPU0"):

RP/0/RSP0/CPU0:abr1#show controllers np counters np0 location 0/0/CPU0
Mon Dec 21 12:11:01.109 UTC

                Node: 0/0/CPU0:
----------------------------------------------------------------

Show global stats counters for NP0, revision v2

Read 99 non-zero NP counters:
Offset  Counter                                         FrameValue   Rate (pps)
-------------------------------------------------------------------------------
   0  NULL_STAT_0                                          3170536           6
  16  MDF_TX_LC_CPU                                       54875532         109
  17  MDF_TX_WIRE                                      97615034464      246143
  21  MDF_TX_FABRIC                                    88731095971      240104
  29  PARSE_FAB_RECEIVE_CNT                            97604054631      246124
  33  PARSE_INTR_RECEIVE_CNT                              12816383          22
  37  PARSE_INJ_RECEIVE_CNT                               30960923          30
  41  PARSE_ENET_RECEIVE_CNT                           88918940116      240595
  45  PARSE_TM_LOOP_RECEIVE_CNT                           41453197         135
  49  PARSE_TOP_LOOP_RECEIVE_CNT                         878705137        3259
  57  PARSE_ING_DISCARD                                    5305245           9
  63  PRS_HEALTH_MON                                       2788521           5
  68  INTR_FRAME_TYPE_3                                     185276           0
  72  INTR_FRAME_TYPE_7                                    6875512          12
...

The ASR9000 series NPUs have thousands of registers that store information; some are counters, some are settings, some are flags, etc. This text file shows all the NPU registers on an ASR9001 (Typhoon NPU). It also shows, at the start, the "blocks" that the registers are grouped into. One can either query the value of a specific register using "show controllers np register <reg-id> np<np-id>" or a group/block of registers using "show controllers np register block <block-id> np<np-id>".
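
Hypothetical usage of the two forms (the register and block IDs below are placeholders only; real IDs come from the register dump for the NPU in question):

! Query a single register by ID on NP0
show controllers np register 1000 np0
! Query a whole block of registers on NP0
show controllers np register block 10 np0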

View bridge statistics with "show controllers fabric fia bridge stats location 0/0/CPU0" (the bridge is non-blocking, so drops are very rare here; any drops are likely due to QoS back-pressure downstream).

View Fabric Interconnect ASIC statistics with "show controllers fabric fia stats location 0/0/CPU0".

Check for drops across the NPs, Bridges and FIAs (reset the stats with "clear controller np counters all"):

RP/0/RSP0/CPU0:abr1#show drops location 0/0/CPU0
Mon Dec 21 14:54:19.769 UTC

                Node: 0/0/CPU0:
----------------------------------------------------------------

NP 0 Drops:
----------------------------------------------------------------
PARSE_EGR_INJ_PKT_TYP_UNKNOWN                                503011
PARSE_DROP_IN_UIDB_TCAM_MISS                                 174899472
PARSE_DROP_IN_UIDB_DOWN                                      37
PARSE_DROP_IPV6_DISABLED                                     343125
PARSE_L3_TAGGED_PUNT_DROP                                    2043450
UNKNOWN_L2_ON_L3_DISCARD                                     3009072
RSV_DROP_ING_BFD                                             5
RSV_DROP_ING_IFIB_OPT                                        1
RSV_DROP_MPLS_RXADJ_DROP                                     7
RSV_DROP_IPV4_NRLDI_NOT_LOCAL                                1
RSV_DROP_EGR_LAG_NO_MATCH                                    1
RSV_DROP_IPV4_URPF_CHK                                       159339
RSV_DROP_MPLS_LEAF_NO_MATCH                                  5618
RSV_DROP_IN_L3_NOT_MYMAC                                     2
RSV_ING_VPWS_ERR_DROP                                        56
PUNT_NO_MATCH_EXCD                                           8406
PUNT_IPV4_ADJ_NULL_RTE_EXCD                                  299166
MDF_PUNT_POLICE_DROP                                         307572
MODIFY_PUNT_REASON_MISS_DROP                                 1
----------------------------------------------------------------

NP 1 Drops:
----------------------------------------------------------------
PARSE_EGR_INJ_PKT_TYP_UNKNOWN                                18531
PARSE_DROP_IN_UIDB_TCAM_MISS                                 147
PARSE_DROP_IN_UIDB_DOWN                                      18
PARSE_DROP_IPV6_DISABLED                                     49315
PARSE_L3_TAGGED_PUNT_DROP                                    1459380
UNKNOWN_L2_ON_L3_DISCARD                                     19469
RSV_DROP_ING_BFD                                             1
RSV_DROP_IPV4_NRLDI_NOT_LOCAL                                10
RSV_DROP_IPV4_URPF_CHK                                       170600
RSV_DROP_MPLS_LEAF_NO_MATCH                                  3
RSV_ING_VPWS_ERR_DROP                                        1
PUNT_IPV4_ADJ_NULL_RTE_EXCD                                  22351
MDF_PUNT_POLICE_DROP                                         22351
MODIFY_PUNT_REASON_MISS_DROP                                 1
----------------------------------------------------------------

No Bridge 0 Drops
----------------------------------------------------------------

No Bridge 1 Drops
----------------------------------------------------------------

FIA 0 Drops:
----------------------------------------------------------------
Total drop:                                                  6
Ingress drop:                                                6
Ingress sp0 align fail                                       3
Ingress sp1 align fail                                       3
----------------------------------------------------------------

FIA 1 Drops:
----------------------------------------------------------------
Total drop:                                                  7
Ingress drop:                                                7
Ingress sp0 align fail                                       3
Ingress sp1 crc err                                          1
Ingress sp1 align fail                                       3
----------------------------------------------------------------

One can look into the hardware structures (structs) that impose hard limits on the NPs/LCs:

! Show prefix usage for compressed ACL tables
show controller np struct SACL-PREFIX summary loc 0/0/CPU0

! Show the number of VRRP MAC entries
show controller np struct 33 det all-entries np0 loc 0/0/CPU0


! Below, show TCAM usage

RP/0/RSP0/CPU0:abr1#show controllers np struct TCAM-Results summary
Mon Dec 21 12:19:28.977 UTC

                Node: 0/0/CPU0:
----------------------------------------------------------------
NP: 0  Struct 0: TCAM_RESULTS
Struct is a PHYSICAL entity
Reserved Entries: 0, Used Entries: 351, Max Entries: 524288

NP: 1  Struct 0: TCAM_RESULTS
Struct is a PHYSICAL entity
Reserved Entries: 0, Used Entries: 303, Max Entries: 524288


! Show IPv6 usage in search memory
show controllers np struct IPV6-LEAF-P np0

! Show IPv4 usage in search memory
show controllers np struct IPV4-LEAF-FAST-P np0
show controllers np struct IPV4-LEAF-P np0

! Show MPLS label usage in search memory
show controllers np struct Label-UFIB summary

! Show LFIB summary
show mpls forwarding summary

! Show all search memory usage
show cef platform resource summary location 0/0/CPU0

One can look for errors in the NP drivers because not all NP errors trigger syslog messages, alarms, etc., meaning they are "silent" unless the NP driver log is explicitly checked:

show controllers np drvlog
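
The driver log can be long, so the usual IOS-XR output modifiers help; the pattern below is just an example:

show controllers np drvlog | include ERROR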

One can also check for NP interrupts; the count "Cnt" should be zero for all of them under normal working conditions:

show controllers np interrupts all location all

 

PHY / MAC / Interface / NP Level:

Show interface statistics with "show controllers te0/0/1/0 stats". Show live interface statistics with "monitor interface te0/0/1/0".

Show interface controller/device/driver configuration and state with "show controllers te0/0/1/0 all|control|internal|mac|phy|regs|xgxs" and "show ethernet infra internal all trunks".

View the interface uIDB (Micro Interface Descriptor Block), the NP uIDB for a specific interface, with "show uidb data location 0/0/CPU0 Gi0/0/0/6.100 ingress | ex 0x0". This will show which features are enabled in hardware for this interface. This is also linked to LPTS: if a feature such as BGP is missing, for example, then BGP packets destined to the control plane on this interface will be filtered.
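
To cross-check the LPTS side, the pre-IFIB entries programmed into the line card hardware can be listed; a sketch (column layout and flow names vary by release):

! LPTS pre-IFIB entries programmed into the LC hardware (what the NP will punt/police)
show lpts pifib hardware entry brief location 0/0/CPU0
! Software view of the pre-IFIB and socket bindings
show lpts pifib brief
show lpts bindings brief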

Similarly, the command "show im database interface Hu0/1/0/3.3003 [detail]" shows information about the interface flags, encapsulation type, the protocols enabled on the interface, and the MTU of those protocols.