Date created: Saturday, July 23, 2016 11:01:27 AM. Last modified: Sunday, December 3, 2023 4:28:59 PM

Forwarding Hardware

References:
http://www.cisco.com/networkers/nw04/presos/docs/RST-2311.pdf
https://en.wikipedia.org/wiki/Content-addressable_memory#Ternary_CAMs
http://www.firstpr.com.au/ip/sram-ip-forwarding/router-fib/#Cisco_others_TCAM
http://www.cisco.com/c/dam/global/sr_rs/assets/events/expo_08/pdfs/Arhitektura_C7600_Aleksandar_Vidakovic.pdf
https://supportforums.cisco.com/document/63276/acl-tcam-and-lous-catalyst-6500
BRKARC-2350 - IOS Routing Internals (2013)
Inside Cisco IOS Software Architecture

Contents:
Linear Lookups
CAM vs TCAM vs RAM
Cisco TCAM Specifics
Cisco EARL (TCAM)
Task Optimised Processors (TOPs)

Linear Lookups

A standard ACL consisting of n ACEs takes O(n) time in the worst case, because the entries are read linearly until one matches. Compiled ACLs, which hash the packet to generate a lookup ID into a table of stored hashed ACEs, provide O(1) lookup time.
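As a rough illustration of the difference, the sketch below compares a linear scan with a hashed table lookup. It is a simplification that only handles exact-match entries (no wildcards), and the names Ace, linear_lookup, compile_acl and hashed_lookup are hypothetical, not taken from any Cisco implementation.

```python
from typing import Dict, List, NamedTuple, Tuple


class Ace(NamedTuple):
    src: str
    dst: str
    proto: int
    action: str


FlowKey = Tuple[str, str, int]


def linear_lookup(acl: List[Ace], key: FlowKey) -> str:
    """Walk the ACL top to bottom: O(n) comparisons in the worst case."""
    for ace in acl:
        if (ace.src, ace.dst, ace.proto) == key:
            return ace.action
    return "deny"  # implicit deny at the end of the ACL


def compile_acl(acl: List[Ace]) -> Dict[FlowKey, str]:
    """Build a hash table once; setdefault keeps first-match semantics."""
    table: Dict[FlowKey, str] = {}
    for ace in acl:
        table.setdefault((ace.src, ace.dst, ace.proto), ace.action)
    return table


def hashed_lookup(table: Dict[FlowKey, str], key: FlowKey) -> str:
    """A single hash lookup: O(1) on average, regardless of ACL length."""
    return table.get(key, "deny")


ACL = [
    Ace("10.0.0.1", "192.0.2.1", 6, "permit"),
    Ace("10.0.0.2", "192.0.2.1", 17, "deny"),
    Ace("10.0.0.1", "192.0.2.1", 6, "deny"),  # shadowed by the first entry
]
TABLE = compile_acl(ACL)
```

Both functions return the same answer for any key; only the amount of work per lookup differs.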

The same performance limitation applies to forwarding lookups if a flat prefix list is used: the lookup is O(n). If the list is "ordered", roughly 50% of lookups would match no later than halfway through the table, giving an average of about n/2 comparisons, which is still O(n).

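Using a binary trie for forwarding lookups instead of a flat table improves efficiency because each step discards half of the remaining tree: the first branch from the root is on a 0 or 1 bit, and so on down the trie. This produces an upper bound of O(log(n)) for a well-populated trie. More precisely, the depth is bounded by the address length (at most 32 steps for IPv4) rather than by the number of prefixes.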
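A minimal sketch of a binary trie doing longest prefix match is shown here, assuming IPv4 only. The class and method names (TrieNode, BinaryTrie, insert, lookup) are illustrative and not from any vendor code.

```python
import ipaddress


class TrieNode:
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]  # index 0 = bit 0, index 1 = bit 1
        self.next_hop = None          # set only where a prefix ends


class BinaryTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1  # most significant bit first
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, dst):
        bits = int(ipaddress.ip_address(dst))
        node = self.root
        best = node.next_hop  # default route, if one was installed
        for i in range(32):
            node = node.children[(bits >> (31 - i)) & 1]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop  # remember the longest match so far
        return best


trie = BinaryTrie()
trie.insert("0.0.0.0/0", "default")
trie.insert("10.0.0.0/8", "A")
trie.insert("10.1.0.0/16", "B")
```

The lookup never takes more than 32 steps, however many prefixes are installed.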

Cisco devices with distributed control and forwarding planes (6500/7600s, Nexus 7000s, ASR9000s etc.) take routing information from multiple protocols (administrative distance is the tie breaker amongst protocols offering the same route) and build a RIB in software. The RIB contains the next hop (the RIB is usually a hash index table). The route with its next hop info, egress interface info and egress layer 2 rewrite info is then built into the FIB, which sits in memory (software CEF, often as a trie). This means all the required forwarding information is provided by a single lookup in the FIB. In high end routers the FIB is then programmed into hardware (hardware CEF) in TCAMs for line rate lookups.

 

Cisco uses a proprietary form of trie code to create what they call M-Tries (for the software FIB/software CEF tables). Many Cisco routers that use M-Tries use 256-way mtrie structures, with each node having 256 children, so after 4 "levels" all 2^32 IPv4 addresses are covered. The child of each of the final octet nodes is a CEF adjacency table entry (next hop details) for that node (IP route).
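The 256-way idea can be sketched as follows. This is a simplified illustration and not Cisco's actual mtrie: it assumes IPv4, keeps a separate leaf array and child array per node, and expands prefixes that do not end on an octet boundary across all the slots they cover. All names (MtrieNode, Mtrie) are hypothetical.

```python
import ipaddress


class MtrieNode:
    def __init__(self):
        self.leaves = [None] * 256    # (prefix_len, next_hop) or None
        self.children = [None] * 256  # MtrieNode or None


class Mtrie:
    """8-8-8-8 stride: one node level per octet of the address."""

    def __init__(self):
        self.root = MtrieNode()

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        octets = net.network_address.packed
        remaining = net.prefixlen
        node, level = self.root, 0
        while remaining > 8:  # descend one level per full octet
            o = octets[level]
            if node.children[o] is None:
                node.children[o] = MtrieNode()
            node = node.children[o]
            remaining -= 8
            level += 1
        span = 1 << (8 - remaining)  # slots covered at this level
        start = octets[level] & ~(span - 1) & 0xFF
        for slot in range(start, start + span):
            current = node.leaves[slot]
            # never let a shorter prefix overwrite a longer one
            if current is None or current[0] <= net.prefixlen:
                node.leaves[slot] = (net.prefixlen, next_hop)

    def lookup(self, dst):
        node, best = self.root, None
        for o in ipaddress.ip_address(dst).packed:  # at most 4 steps
            if node.leaves[o] is not None:
                best = node.leaves[o][1]  # deeper levels hold longer prefixes
            node = node.children[o]
            if node is None:
                break
        return best


fib = Mtrie()
fib.insert("0.0.0.0/0", "default")
fib.insert("10.0.0.0/8", "A")
fib.insert("10.64.0.0/10", "B")
fib.insert("10.64.1.1/32", "C")
```

The trade-off against a binary trie is memory for speed: each node is much larger, but a lookup touches at most four nodes instead of up to 32.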

In the output below we can see a Cisco ASR920 is using a stride pattern of 8-8-8-8 for its MTRIE:

ASR920#show ip cef tree
Table Default tree information:
 MTRIE/RTREE storing IPv4 addresses
 21 entries (21/0 fwd/non-fwd)
 Forwarding tree:
  Forwarding lookup routine: IPv4 mtrie 8-8-8-8 optimized
  69 inserts, 48 deletes
  8-8-8-8 stride pattern
  short mask protection enabled for <= 4 bits without process suspension
  21 leaves (588 bytes), 22 nodes (22880 bytes)
  25132 total bytes
  leaf ops: 69 inserts, 48 deletes
  leaf ops with short mask protection: 4 inserts, 2 deletes
  per-prefix length stats: lookup off, insert off, delete off
  refcounts: 5686 leaf, 5632 node
  node pools:
   pool[C/8 bits]: 22 allocated (0 failed), 22880 bytes
 Non-Forwarding tree:
  115 inserts, 115 deletes
  0 leaves (0 bytes), 0 nodes (0 bytes)
  0 total bytes

 

CAM vs TCAM vs RAM

Content-addressable memory (CAM) is a special type of computer memory used in certain very-high-speed searching applications. It compares input search data against a table of stored data, and returns the address of matching data (or in the case of associative memory, the matching data). Unlike standard computer memory (RAM) in which the user supplies a memory address and the RAM returns the data word stored at that address, a CAM is designed such that the user supplies a data word and the CAM searches its entire memory to see if that data word is stored anywhere in it. If the data word is found, the CAM returns a list of one or more storage addresses where the word was found (and in some architectures, it also returns the contents of that storage address, or other associated pieces of data). Thus a CAM is the hardware embodiment of what in software terms would be called an associative array.

Because a CAM is designed to search its entire memory in a single operation, it is much faster than RAM in virtually all search applications. There are cost disadvantages to CAM however. Unlike a RAM chip, which has simple storage cells, each individual memory bit in a fully parallel CAM must have its own associated comparison circuit to detect a match between the stored bit and the input bit. Additionally, match outputs from each cell in the data word must be combined to yield a complete data word match signal. The additional circuitry increases the physical size of the CAM chip which increases manufacturing cost. The extra circuitry also increases power dissipation since every comparison circuit is active on every clock cycle. Consequently, CAM is only used in specialized applications where searching speed cannot be accomplished using a less costly method. To achieve a different balance between speed, memory size and cost, some implementations emulate the function of CAM by using standard tree search or hashing designs in hardware, using hardware tricks like replication or pipelining to speed up effective performance.

Binary CAM is the simplest type of CAM which uses data search words consisting entirely of 1s and 0s. Ternary CAM (TCAM) allows a third matching state of "X" or "don't care" for one or more bits in the stored data word, thus adding flexibility to the search. For example, a ternary CAM might have a stored word of "10XX0" which will match any of the four search words "10000", "10010", "10100", or "10110". The added search flexibility comes at an additional cost over binary CAM as the internal memory cell must now encode three possible states instead of the two of binary CAM. This additional state is typically implemented by adding a mask bit ("care" or "don't care" bit) to every memory cell.
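The "10XX0" example can be checked with a few lines of code. The sketch below models a ternary entry as a value plus a mask of "care" bits, which is one common way to encode the third state. The function names are illustrative only.

```python
def encode(stored):
    """Turn a ternary string such as '10XX0' into (value, care_mask)."""
    value = int(stored.replace("X", "0"), 2)
    mask = int("".join("0" if c == "X" else "1" for c in stored), 2)
    return value, mask


def tcam_match(value, mask, key):
    """A key matches when it agrees with the value on every 'care' bit."""
    return (key & mask) == value


VALUE, MASK = encode("10XX0")

# Try every possible 5-bit search word against the single stored entry.
MATCHES = [format(k, "05b") for k in range(32) if tcam_match(VALUE, MASK, k)]
```

MATCHES ends up holding exactly the four search words listed above.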

Ternary CAMs are often used in network devices, where each address has two parts: the network address, which can vary in size depending on the subnet configuration, and the host address, which occupies the remaining bits. Each subnet has a network mask that specifies which bits of the address are the network address and which bits are the host address. Routing is performed by consulting a routing table maintained by the device which contains each known destination network address, the associated network mask, and the information needed to route packets to that destination. Without CAM, the router compares the destination address of the packet to be routed with each entry in the routing table, performing a logical AND with the network mask and comparing it with the network address. If they are equal, the corresponding routing information is used to forward the packet. This is a basic LPM match which gives an upper bound of O(n). Using a ternary CAM for the routing table makes the lookup process very efficient. The addresses are stored using "don't care" for the host part of the address, so looking up the destination address in the CAM immediately retrieves the correct routing entry; both the masking and comparison are done by the CAM hardware. This works if (a) the entries are stored in order of decreasing network mask length, and (b) the hardware returns only the first matching entry. The match with the longest network mask (longest prefix match) is therefore the one used, and it is found much faster than by iterating through a flat table.
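Conditions (a) and (b) can be modelled in software as below. This is only a sketch under the assumption of IPv4 prefixes: the loop in tcam_lookup stands in for what the hardware does in parallel across all entries, and the function names are hypothetical.

```python
import ipaddress


def build_tcam(routes):
    """Store (value, mask, result) entries, longest mask first (condition a)."""
    entries = []
    for prefix, result in routes:
        net = ipaddress.ip_network(prefix)
        entries.append((int(net.network_address), int(net.netmask), result))
    entries.sort(key=lambda e: bin(e[1]).count("1"), reverse=True)
    return entries


def tcam_lookup(entries, dst):
    """Return the first matching entry only (condition b)."""
    key = int(ipaddress.ip_address(dst))
    for value, mask, result in entries:
        if (key & mask) == value:  # host bits are "don't care"
            return result
    return None


TCAM = build_tcam([
    ("0.0.0.0/0", "default"),
    ("10.0.0.0/8", "A"),
    ("10.1.0.0/16", "B"),
])
```

Because the table is ordered by mask length, the first hit is always the longest prefix match, which is also why inserting a new prefix may force existing entries to be moved.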

Updates to TCAM hardware can be expensive for the CPU though, because TCAM efficiency relies heavily on prefix information being stored in order.

The TCAMs often return a result which is a location in SRAM. That location in SRAM stores the egress interface details (or alternatively an internal punt interface ID or drop interface ID) and the rewrite information for the egress path. SRAM is significantly faster than DRAM (it is the typical kind of RAM used for CPU caches) and doesn't need refreshing, but both TCAM and SRAM are expensive in terms of manufacturing cost and power consumption. This in turn means that even though "memory is cheap", an RSP720-3CXL for a Cisco 7600 chassis has a fixed limit of 1 million IPv4 prefixes in TCAM/SRAM, and the Cisco ASR9001 has a TCAM limit of 4 million IPv4 prefixes.

Various Cisco products (6500/7600, ASR9000s) use TCAM forwarding and can use multiple TCAMs: one for the FIB (packet forwarding lookups), one for ACLs (packet filtering lookups), one for QoS rules, and so on. The ASR9000 1st and 2nd generation line cards use the EZchip NP-3C ASICs for forwarding lookups (with external RAM, more detail below) and the 3rd generation line cards use the Broadcom Tomahawk ASIC. Various Juniper routers don't use TCAMs; the M120 and MX960 both instead use a chip called the I-chip with RLDRAM ("Reduced Latency" RAM, which is faster than DRAM but slower than SRAM). Some Juniper switches however, like the EXs, use Marvell chips for TCAM-like operations.

Juniper also have the Trio chipset for the MX range (I-Chip/DPC cards being phased out, DPCE too?) which, like the I-Chip, uses external RLDRAM. The Trio chipset contains a "Lookup Block". A key feature of the Lookup Block is that it supports Deep Packet Inspection (DPI) and is able to look over 256 bytes into the packet. As packets are received by the Buffering Block, the packet headers are sent to the Lookup Block for additional processing. All processing is completed in one pass through the Lookup Block regardless of the complexity of the workflow. Once the Lookup Block has finished processing, it sends the modified packet headers back to the Buffering Block to send the packet to its final destination. In order to process data at line rate, the Lookup Block has a large bucket of reduced-latency dynamic random access memory (RLDRAM) that is essential for packet processing.

RAM typically stores data in single octet cells that are usually individually addressable. A four byte integer is stored in four contiguous cells in RAM, for example. TCAM tends to store data in fixed blocks that are large enough to hold an entire route in one block or "cell". Typical sizes are 72 bits on the RSP720 and ES+ NP-3C for 7600s (L2 TCAM on the Superman ASIC) and on the Wahoo TCAM in 12000s. Double sized blocks (144 bits) are used for IPv6 and multicast routing entries on these platforms (288 bit blocks are even used on the Wahoo TCAM). ASR920s use 80 bit blocks on the Cylon ASIC TCAM, and 80 bit blocks/cells are again used on the NILE ASIC TCAM in the ME3600X/ME3800X. Each block of e.g. 72 bits/9 bytes (or whatever size is used) falls under a single address to read from. 72-bit TCAM blocks are 8 byte (64-bit) blocks + 1 byte of parity. Thus 144-bit blocks are 2x 8 byte blocks each with 1 byte of parity, etc.

 

 

 

Neither TCAMs nor tries in RLDRAM/SRAM are universally the better solution; they offer different performance advantages in different scenarios:

Cisco use a funnelling approach in which the protocol RIBs feed into a central RIB, which is then compiled into a FIB with the forwarding and adjacency information included, which is in turn programmed into a hardware TCAM. Because of this, learning the same routes from multiple sources doesn't increase usage within the TCAM. For example, if 3 eBGP peers send the same routes into the BGP RIB, BGP will choose the single best path amongst the three to offer into the central RIB. And/or if other protocols are offering the same routes into the central RIB, only the single best path is chosen. This is compiled into the FIB and programmed into hardware. This is shown in the diagram below:

Exceptions to this include bonded links at layer 2 (LACP etc.) and (un-)ECMP at layer 3 (BGP multipath etc.). In these cases, where multiple forwarding paths exist, a hash function is used to return a different result from the TCAM lookup (by adding some variance to the lookup key), typically so that different flows are spread across the available paths.
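One way to picture the hash step is the sketch below, which picks one of several next hops from a hash of the flow's 5-tuple. It is an assumption-laden illustration: the choice of CRC32, the key layout and the name pick_path are all hypothetical, and real platforms use their own hash inputs and functions.

```python
import zlib


def pick_path(paths, src, dst, proto, sport, dport):
    """Hash the flow's 5-tuple and use it to index the list of paths.

    The same flow always hashes to the same path, so packets within a
    flow are not reordered, while different flows can land on
    different paths.
    """
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    return paths[zlib.crc32(key) % len(paths)]


PATHS = ["Te1/1", "Te1/2", "Te1/3"]  # hypothetical egress interfaces
FLOW = ("10.0.0.1", "192.0.2.1", 6, 40000, 80)
CHOSEN = pick_path(PATHS, *FLOW)
```

Hashing per flow rather than per packet is the usual design choice because it keeps a TCP session on a single path.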

A mixture of TCAM and tree memory is very typical, using TCAMs for ACLs and QoS and trees for forwarding:

 

 

 

TCAM In Detail:

Since TCAM scaling is costly compared to using software algorithms (like hashing, tries, LPM etc.), some vendors mix the methods. Cisco, for example, implement their M-Trie in hardware:

 

 

Cisco TCAM Specifics:

Cisco often use TCAMs for ACL and QoS lookups. In this example the Cisco 6500/7600 TCAM is explored. For ACLs inside the TCAM on a 6500/7600, the TCAM is split into Masks and Values. A Value is an ACE which usually contains the source IP address, destination IP address, IP protocol number, source port and destination port (other values are supported depending on the TCAM in use). The Mask is a bitmask (like a subnet mask in IP addressing) which says which parts of the Value need to be checked.

For example, assume the Value is 0x0A000001 0x00000000 0x06 0x0000 0x0000 (the field order is as above, so the source IP here is 10.0.0.1, the destination 0.0.0.0 is any destination, IP protocol 0x06 is TCP, and 0x0000 is any source and destination port). The Mask is a 104 bit field (the Value field is 13 bytes long). If this ACL entry is to match a source IP of 10.0.0.1 to any destination where the IP protocol is TCP, the Mask value should be 0xFFFFFFFF 0x00000000 0xFF 0x0000 0x0000.
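The Value/Mask comparison for this example can be written out as below. It assumes the 104-bit layout described above (source IP, destination IP, protocol, source port, destination port, with a full 32 bits of zeros for the destination field of the mask). The helper names pack_key and ace_matches are illustrative only.

```python
import ipaddress


def pack_key(src, dst, proto, sport, dport):
    """Pack the five fields into one 104-bit integer, source IP first."""
    return (
        int(ipaddress.ip_address(src)) << 72
        | int(ipaddress.ip_address(dst)) << 40
        | proto << 32
        | sport << 16
        | dport
    )


# Value: source 10.0.0.1, any destination, TCP, any ports.
VALUE = pack_key("10.0.0.1", "0.0.0.0", 6, 0, 0)
# Mask: check all 32 source bits and all 8 protocol bits, nothing else.
MASK = 0xFFFFFFFF << 72 | 0xFF << 32


def ace_matches(key):
    """Only the bits set in MASK take part in the comparison."""
    return (key & MASK) == (VALUE & MASK)
```

For instance, a packet from 10.0.0.1 to 192.0.2.9 over TCP matches whatever its ports are, while the same packet over UDP does not.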

In the 6500/7600 specifically, eight Values can be assigned to each Mask. Each Value has its own Result (permit or deny, for example). This helps to compress ACEs with matching properties: six ACEs that allow TCP port 25 from the Internet to six different individual internal host IPs would all share the same Mask, each Value would have its own Result, and they would all be permit in this case.
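The effect of sharing Masks can be estimated with a small calculation. The sketch below simply groups ACEs by their Mask and counts how many Mask slots are needed at eight Values per Mask; the mask labels are placeholders and the function name masks_needed is hypothetical.

```python
from collections import defaultdict

VALUES_PER_MASK = 8  # per the 6500/7600 description above


def masks_needed(aces):
    """aces is a list of (value, mask, result) tuples."""
    per_mask = defaultdict(int)
    for _value, mask, _result in aces:
        per_mask[mask] += 1
    # ceiling division: a ninth Value with the same Mask needs a second slot
    return sum(-(-count // VALUES_PER_MASK) for count in per_mask.values())


# Six host entries for TCP port 25 all share one mask (placeholder label).
SMTP_ACES = [(f"host-{i}", "host+tcp+dport", "permit") for i in range(6)]
```

With this input the six ACEs consume six Value entries but only a single Mask.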

The TCAM also supports Logical Operation Units (LOUs), which can be logical conditions like greater than, equal to, less than, not equal to or within a specific range (again, the available instructions vary by TCAM). The 6500 TCAM has 128 hardware LOUs, which are a shared global resource used to store the values for LOUs in more complex ACEs (like a port range statement, "TCP port equal to/greater than 25 and less than/equal to 30"). The LOUs are also used for matching TCP flags.

The 128 LOUs in the 6500 TCAM are divided in half for source and destination port operations (64 each), and 2 LOUs are needed to match a port range, so in total 32 source port ranges and 32 destination port ranges can be programmed into the hardware (this is the total aggregate across all ACLs, not across ACEs within a single ACL). For the 6500/7600 any single ACL can use a maximum of 10 LOUs.

If there are more port ranges in the ACL than the available LOUs and L4OPs (layer 4 operations), they are expanded in software into equivalent entries. This is known as PORT EXPANSION.
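To give a feel for what such an expansion costs, the sketch below uses the standard range-to-prefix technique to turn a port range into value/mask pairs. Whether the 6500/7600 software uses exactly this algorithm is an assumption; the function name expand_port_range is hypothetical.

```python
def expand_port_range(lo, hi, width=16):
    """Split [lo, hi] into aligned power-of-two blocks as (value, mask)."""
    full = (1 << width) - 1
    entries = []
    while lo <= hi:
        # largest block that is aligned on lo...
        size = lo & -lo if lo else 1 << width
        # ...shrunk until it no longer overshoots hi
        while size > hi - lo + 1:
            size >>= 1
        entries.append((lo, full & ~(size - 1)))
        lo += size
    return entries


def port_matches(entries, port):
    return any((port & mask) == value for value, mask in entries)


RANGE_25_30 = expand_port_range(25, 30)
```

The range 25 to 30 from the example above expands into four entries (25, 26-27, 28-29 and 30), which shows why a single range in an ACE can consume several TCAM entries when no LOU is available.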

In the below outputs from two different Cisco 6500 series setups, the TCAM entries and SRAM references can be seen for ACL entries and IPv4 forwarding rules:

! Cisco 6513 Non-E-Chassis with SUP720, WS-X6748-GE-TX using DFC3B

6513#show mod
Mod Ports Card Type                              Model              Serial No.
--- ----- -------------------------------------- ------------------ -----------
  2    6  Firewall Module                        WS-SVC-FWM-1       XXXXXXXXXXX
  3    1  Application Control Engine Module      ACE30-MOD-K9       XXXXXXXXXXX
  7    2  Supervisor Engine 720 (Active)         WS-SUP720-3B       XXXXXXXXXXX
  9   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     XXXXXXXXXXX
 10    4  CEF720 4 port 10-Gigabit Ethernet      WS-X6704-10GE      XXXXXXXXXXX

Mod  Sub-Module                  Model              Serial       Hw     Status
---- --------------------------- ------------------ ----------- ------- -------
  3  ACE Expansion Card  1       ACEMOD-EXPN-DC     XXXXXXXXXXX  1.1    Ok
  3  ACE Expansion Card  2       ACEMOD-EXPN-DC     XXXXXXXXXXX  1.1    Ok
  7  Policy Feature Card 3       WS-F6K-PFC3B       XXXXXXXXXXX  2.3    Ok
  7  MSFC3 Daughterboard         WS-SUP720          XXXXXXXXXXX  2.6    Ok
  9  Distributed Forwarding Card WS-F6700-DFC3B     XXXXXXXXXXX  4.9    Ok
 10  Centralized Forwarding Card WS-F6700-CFC       XXXXXXXXXXX  4.2    Ok


6513#show tcam counts
           Used        Free        Percent Used       Reserved
           ----        ----        ------------       --------
 Labels:(in)  6        4066            0                    24
 Labels:(eg)  1        4071            0                    24

ACL_TCAM
--------
  Masks:     94        4002            2                    72
Entries:     79       32689            0                   576

QOS_TCAM
--------
  Masks:     34        4062            0                    25
Entries:     42       32726            0                   200

    LOU:      0         128            0
  ANDOR:      0          16            0
  ORAND:      0          16            0
    ADJ:      3        2045            0


6513#show platform hardware capacity acl
ACL/QoS TCAM Resources
  Key: ACLent - ACL TCAM entries, ACLmsk - ACL TCAM masks, AND - ANDOR,
       QoSent - QoS TCAM entries, QOSmsk - QoS TCAM masks, OR - ORAND,
       Lbl-in - ingress label, Lbl-eg - egress label, LOUsrc - LOU source,
       LOUdst - LOU destination, ADJ - ACL adjacency

  Module ACLent ACLmsk QoSent QoSmsk Lbl-in Lbl-eg LOUsrc LOUdst  AND  OR  ADJ
  7          1%     2%     1%     1%     1%     1%     0%     0%   0%  0%   1%
  9          1%     2%     1%     1%     1%     1%     0%     0%   0%  0%   1%


6513-dfc9#show tcam details acl
ACL_TCAM
               Lbl     DestAddr            SrcAddr    Dpt  Spt  L4OP TOS Frag More  prot type bank lkup Rec Xtag Mpls  Rslt/Cnt
17818 Entry :    1       10.192.1.0          0.0.0.0    0    0    0   0   0    0     0    0    0    1    0    0    0   C0080003
17819 Mask  : 4095    255.255.255.0          0.0.0.0    0    0    0   0   0    0     F    0    0    1    0    0    0      1




! Cisco 6506-E chassis, SUP2T, WS-X6816-10GE with DFC4...

6506-E#show mod
Mod Ports Card Type                              Model              Serial No.
--- ----- -------------------------------------- ------------------ -----------
  1   16  CEF720 16 port 10GE                    WS-X6816-10GE      XXXXXXXXXXX
  2   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     XXXXXXXXXXX
  3    3  ASA Service Module                     WS-SVC-ASA-SM1     XXXXXXXXXXX
  4    6  Firewall Module                        WS-SVC-FWM-1       XXXXXXXXXXX
  6    5  Supervisor Engine 2T 10GE w/ CTS (Acti VS-SUP2T-10G       XXXXXXXXXXX

Mod  Sub-Module                  Model              Serial       Hw     Status
---- --------------------------- ------------------ ----------- ------- -------
  1  Distributed Forwarding Card WS-F6K-DFC4-EXL    XXXXXXXXXXX  1.2    Ok
  2  Centralized Forwarding Card WS-F6700-CFC       XXXXXXXXXXX  3.1    Ok
 3/0 ASA Application Processor   SVC-APP-PROC-1     XXXXXXXXXXX  1.0    Ok
  6  Policy Feature Card 4       VS-F6K-PFC4XL      XXXXXXXXXXX  3.0    Ok
  6  CPU Daughterboard           VS-F6K-MSFC5       XXXXXXXXXXX  3.0    Ok


6506-E#remote login module 1

6506-E-dfc1#show platform hardware acl lou
Dumping h/w lou values

Index  lou_mux_sel  A_Opcode  A_Value  B_Opcode  B_Value
-----  -----------  --------  -------  --------  -------
    0            7       NEQ        0       NEQ        0
    1            0       NEQ        0       NEQ        0
    2            0       NEQ        0       NEQ        0
....


6506-E-dfc1#show platform hardware acl tcam
----------------------------------------------------------------------------------------------------------------------------------------
ENTRY TYPE:
A - ARP   I - IPv4   M - MPLS   O - MAC Entry   S - IPv6(Six)   C - Compaction    L - L2V4
Suffix: D - dynamic entry  E - exception entry R - reserved entry

FIELDS:
FS  - first_seen/from_rp   ACOS  - acos/group_id   F - ip_frag               FF  - frag_flag
DPORT - dest_port          SPORT - src_port        LM  - L2_miss             GP  - gpid_present
ETYPE - enc_etype          CEVLD - ce_vlan_valid   MM - mpls_mcast           FN - exp_from_null
IV - ip_hdr_vld            MV - mpls_valid         E_CAU - exception_cause   UK - U_key
ACO - acos                 A/R - arp_rarp          RR - req_repl             GM - global_acl_fmt_match
D-S-S-A - dest_mac_bcast, src_snd_mac_same, snd_tar_mac_same, arp_rarp_vld   OM - ofe_mode
SVLAN - Src_vlan
----------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------------------------
I     INDEX  LABEL FS ACOS    AS           IP_SA           IP_DA  L4OP F FF L4PROT  DPORT  SPORT               RSLT                  CNT
S     INDEX  LABEL FS ACOS    AS      IPV6_SA_ID      IPV6_DA_ID  L4OP F FF L4PROT  DPORT  SPORT               RSLT                  CNT
C     INDEX                                                                            IPV6_ADDR               RSLT                  CNT
O     INDEX  LABEL FS ACOS    LM GP ETYPE          SRC_MAC         DEST_MAC CEVLAN CEVLD DSCP_ID               RSLT                  CNT
A     INDEX  LABEL     A/R  RR           IP_SA           IP_DA         SRC_MAC  D-S-S-A GM LM OM               RSLT                  CNT
M     INDEX  LABEL    ACOS GP FS MM EXP_0 FN MPLS_LABEL EXP STK IV      L4OP MV E_CAU UK LM  ACO               RSLT                  CNT
L     INDEX          SPORT          MAC_ADDR         IP_ADDR  IS1Q COS1Q SVLAN CE1Q CEQOS CEVLAN               RSLT                  CNT
------------------------------------------------------------------------------------------------------------------------------------------

 TCAM A bank 0 entries
IR V      1   6139  0    0     0      127.0.0.16      127.0.0.17 0x000 0  0      2     67     68 0x007FE00FFF7ED7DB                    1type     = 0


! Also see: show platform hardware cef tcam utilization detail
! Also see: show platform hardware cef tcam keys count
6506-E#show platform hardware cef tcam shadow detail

Util summary for Pool 0: 524288 keys, 1024 segs, 36 Mb

 Type  KeyUse   SegKeyUse  SegUse  Util  Free
 0     333453   333453     682     95    337
 1     20       20         3       1     337
 2     4        4          2       0     336
 3     0        0          0       0     252
 4     0        0          0       0     336
 Tot   333477   333477     687     94    337

Util summary for Pool 1: 524288 keys, 1024 segs, 36 Mb

 Type  KeyUse   SegKeyUse  SegUse  Util  Free
 0     263148   263148     672     76    352
 1     0        0          0       0     352
 2     0        0          0       0     352
 3     0        0          0       0     264
 4     0        0          0       0     352
 Tot   263148   263148     672     76    352

Util summary for Pool 4: 1048576 keys, 2048 segs, 72 Mb

 Type  KeyUse   SegKeyUse  SegUse  Util  Free
 0     596601   596601     1354    86    689
 1     20       20         3       1     689
 2     4        4          2       0     688
 3     0        0          0       0     516
 4     0        0          0       0     688
 Tot   596625   596625     1359    85    689


Key counts per Class per Pool:
                             Pri   Pri
  Class          ID   Pool   >=    <=    Key-Cnt
  IPv4           0    *      0     68    596599
    .            .    0      0     42    333451
    .            .    1      42    68    263148
  IPv4-Mcast     1    *      0     68    9
    .            .    0      0     68    9
    .            .    1      68    68    0
  MPLS           2    *      0     17    1
    .            .    0      0     17    1
    .            .    1      17    17    0
  EOMPLS         3    *      0     19    1
    .            .    0      0     19    1
    .            .    1      19    19    0
  MPLS-VPN       4    *      0     9     0
    .            .    0      0     9     0
    .            .    1      9     9     0
  Diags          5    *      0     5     0
    .            .    0      0     5     0
    .            .    1      5     5     0
  IPv6-Local     6    *      0     390   0
    .            .    0      0     390   0
    .            .    1      390   390   0
  IPv6-Mcast     7    *      0     261   1
    .            .    0      0     261   1
    .            .    1      261   261   0
  IPv6-Global    8    *      0     390   1
    .            .    0      0     390   1
    .            .    1      390   390   0
  VPLSv4-Mcast   9    *      0     69    0
    .            .    0      0     69    0
    .            .    1      69    69    0
  VPLSv6-Mcast   10   *      0     261   0
    .            .    0      0     261   0
    .            .    1      261   261   0

Key counts per Pri:
  Class          ID   Pool   Pri   Key-Cnt
  IPv4           0    0      34    120
    .            .    .      35    4
    .            .    .      36    52
    .            .    .      37    7
    .            .    .      38    9
    .            .    .      39    19
    .            .    .      40    4
    .            .    .      41    1
    .            .    .      42    333235
  IPv4           0    1      42    7
    .            .    .      43    58696
    .            .    .      44    66610
    .            .    .      45    38914
    .            .    .      46    37051
    .            .    .      47    24661
    .            .    .      48    12666
    .            .    .      49    7680
    .            .    .      50    13081
    .            .    .      51    1768
    .            .    .      52    1051
    .            .    .      53    514
    .            .    .      54    269
    .            .    .      55    101
    .            .    .      56    35
    .            .    .      57    13
    .            .    .      58    22
    .            .    .      62    6
    .            .    .      67    3
  IPv4-Mcast     1    0      3     6
    .            .    .      10    3
  MPLS           2    0      17    1
  EOMPLS         3    0      19    1
  IPv6-Mcast     7    0      115   1
  IPv6-Global    8    0      390   1

Priorities in exception:
  Class          ID   Pri (>=)   Max   Key-Cnt   Pri-Cnt
  IPv4           0    68         68    0
  IPv4-Mcast     1    68         68    0
  MPLS           2    17         17    0
  EOMPLS         3    19         19    0
  MPLS-VPN       4    9          9     0
  Diags          5    5          5     0
  IPv6-Local     6    390        390   0
  IPv6-Mcast     7    261        261   0
  IPv6-Global    8    390        390   0
  VPLSv4-Mcast   9    69         69    0
  VPLSv6-Mcast   10   261        261   0

 

In the output below it can be seen that Cisco IOS actively tries to optimise the ACLs and ACEs before they are programmed into the TCAM (for example, since eight Value fields can be assigned to one Mask). In the first TCAM output below, the entire ACL has been squashed to just one "permit ip any any" style rule, because all of the statements are permit rules followed by a "permit ip any any", which has the same effect as if the preceding statements weren't present. In the second set of TCAM outputs, where some explicit deny statements have been added before the "permit ip any any" statement, it can be seen that only these deny statements are programmed into the TCAM. The other permit statements are not required because of the explicit "permit ip any any" at the end.

interface TenGigabitEthernet1/1
 ip address x.x.x.x 255.255.255.252
 ip access-group core-ipv4-iacl-in in
 ip access-group core-ipv4-iacl-out out
end


7606#show run | sec access
ip access-list extended core-ipv4-iacl-out
 permit ip 10.0.0.0 0.255.255.255 any
 permit ip 172.0.0.0 0.255.255.255 any
 permit ip 192.0.0.0 0.255.255.255 any
 permit ip any any
 
 
7606#show tcam int ten1/1 acl out ip detail 

* Global Defaults shared

-------------------------------------------------------------------------------------------------------------------
DPort - Destination Port   SPort - Source Port        TCP-F - U -URG             Pro   - Protocol         
I     - Inverted LOU       TOS   - TOS Value                - A -ACK             rtr   - Router           
MRFM  - M -MPLS Packet     TN    - T -Tcp Control           - P -PSH             COD   - C -Bank Care Flag
      - R -Recirc. Flag          - N -Non-cachable          - R -RST                   - I -OrdIndep. Flag
      - F -Fragment Flag   CAP   - Capture Flag             - S -SYN                   - D -Dynamic Flag  
      - M -More Fragments  F-P   - FlowMask-Prior.          - F -FIN             T     - V(Value)/M(Mask)/R(Result)
X     - XTAG               (*)   - Bank Priority      
-------------------------------------------------------------------------------------------------------------------
          
Interface: 1017   label: 1   lookup_type: 3
protocol: IP   packet-type: 0

+-+-----+---------------+---------------+---------------+---------------+-------+---+----+-+---+--+---+---+
|T|Index|  Dest Ip Addr | Source Ip Addr|     DPort     |     SPort     | TCP-F |Pro|MRFM|X|TOS|TN|COD|F-P|
+-+-----+---------------+---------------+---------------+---------------+-------+---+----+-+---+--+---+---+
 V 17823         0.0.0.0         0.0.0.0       P=0             P=0        ------   0 ---- 0   0 -- C-- 0-0   
 M 17828         0.0.0.0         0.0.0.0         0               0        ------   0 ---- 0   0              
 R rslt: PERMIT_RESULT (*)             rtr_rslt: PERMIT_RESULT (*)                   hit_cnt=0   

 V 18400         0.0.0.0         0.0.0.0       P=0             P=0        ------   0 ---- 0   0 -- --- 0-0   
 M 18404         0.0.0.0         0.0.0.0         0               0        ------   0 ---- 0   0              
 R rslt: L3_DENY_RESULT                rtr_rslt: L3_DENY_RESULT                      hit_cnt=0   

 V 36832         0.0.0.0         0.0.0.0       P=0             P=0        ------   0 ---- 0   0 -- --- 0-0   
 M 36836         0.0.0.0         0.0.0.0         0               0        ------   0 ---- 0   0              
 R rslt: L3_DENY_RESULT (*)            rtr_rslt: L3_DENY_RESULT (*)                  hit_cnt=1   


 


7606(config)#ip access-list extended core-ipv4-iacl-out
7606(config-ext-nacl)#5 deny ip any 1.1.1.1 0.0.0.0
7606(config-ext-nacl)#6 deny ip host 1.1.1.1 any 
7606(config-ext-nacl)#^Z
7606#show tcam int ten1/1 acl out ip detail 

* Global Defaults shared

-------------------------------------------------------------------------------------------------------------------
DPort - Destination Port   SPort - Source Port        TCP-F - U -URG             Pro   - Protocol         
I     - Inverted LOU       TOS   - TOS Value                - A -ACK             rtr   - Router           
MRFM  - M -MPLS Packet     TN    - T -Tcp Control           - P -PSH             COD   - C -Bank Care Flag
      - R -Recirc. Flag          - N -Non-cachable          - R -RST                   - I -OrdIndep. Flag
      - F -Fragment Flag   CAP   - Capture Flag             - S -SYN                   - D -Dynamic Flag  
      - M -More Fragments  F-P   - FlowMask-Prior.          - F -FIN             T     - V(Value)/M(Mask)/R(Result)
X     - XTAG               (*)   - Bank Priority      
-------------------------------------------------------------------------------------------------------------------
       
Interface: 1017   label: 1   lookup_type: 3
protocol: IP   packet-type: 0

+-+-----+---------------+---------------+---------------+---------------+-------+---+----+-+---+--+---+---+
|T|Index|  Dest Ip Addr | Source Ip Addr|     DPort     |     SPort     | TCP-F |Pro|MRFM|X|TOS|TN|COD|F-P|
 V 17809         0.0.0.0         1.1.1.1       P=0             P=0        ------   0 ---- 0   0 -- C-- 0-0   
 M 17810         0.0.0.0 255.255.255.255         0               0        ------   0 ---- 0   0              
 R rslt: L3_DENY_RESULT (*)            rtr_rslt: L3_DENY_RESULT (*)                  hit_cnt=0   

 V 17818         1.1.1.1         0.0.0.0       P=0             P=0        ------   0 ---- 0   0 -- C-- 0-0   
 M 17819 255.255.255.255         0.0.0.0         0               0        ------   0 ---- 0   0              
 R rslt: L3_DENY_RESULT (*)            rtr_rslt: L3_DENY_RESULT (*)                  hit_cnt=0   

 V 17823         0.0.0.0         0.0.0.0       P=0             P=0        ------   0 ---- 0   0 -- C-- 0-0   
 M 17828         0.0.0.0         0.0.0.0         0               0        ------   0 ---- 0   0              
 R rslt: PERMIT_RESULT (*)             rtr_rslt: PERMIT_RESULT (*)                   hit_cnt=0   

 V 18400         0.0.0.0         0.0.0.0       P=0             P=0        ------   0 ---- 0   0 -- --- 0-0   
 M 18404         0.0.0.0         0.0.0.0         0               0        ------   0 ---- 0   0              
 R rslt: L3_DENY_RESULT                rtr_rslt: L3_DENY_RESULT                      hit_cnt=0   

 V 36832         0.0.0.0         0.0.0.0       P=0             P=0        ------   0 ---- 0   0 -- --- 0-0   
 M 36836         0.0.0.0         0.0.0.0         0               0        ------   0 ---- 0   0              
 R rslt: L3_DENY_RESULT (*)            rtr_rslt: L3_DENY_RESULT (*)                  hit_cnt=1   
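
As a rough illustration of how the V (value) / M (mask) / R (result) rows above are evaluated, the following Python sketch models a ternary first-match lookup. It assumes that a mask bit of 1 means "compare this bit" (consistent with the host entries showing a 255.255.255.255 mask, the inverse of the ACL wildcard 0.0.0.0) and that the lowest index match wins. The names TcamEntry, wildcard_to_care_mask and lookup are hypothetical, only the two address fields are modelled, and real TCAM hardware compares all entries in parallel rather than looping:

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


def ip(addr: str) -> int:
    """Convert a dotted-quad IPv4 address to an integer."""
    return int(ipaddress.IPv4Address(addr))


def wildcard_to_care_mask(wildcard: str) -> int:
    """ACL wildcard bits (1 = ignore) inverted into a care mask (1 = compare)."""
    return ip(wildcard) ^ 0xFFFFFFFF


@dataclass
class TcamEntry:
    index: int
    dst_value: int
    dst_mask: int
    src_value: int
    src_mask: int
    result: str

    def matches(self, dst: int, src: int) -> bool:
        # Only bits set in the mask take part in the comparison.
        return ((dst & self.dst_mask) == (self.dst_value & self.dst_mask)
                and (src & self.src_mask) == (self.src_value & self.src_mask))


HOST = wildcard_to_care_mask("0.0.0.0")  # 255.255.255.255, every bit compared
ANY = 0                                  # 0.0.0.0, no bit compared

# Address fields and results copied from the "show tcam" output above.
TABLE = [
    TcamEntry(17809, ip("0.0.0.0"), ANY, ip("1.1.1.1"), HOST, "L3_DENY_RESULT"),
    TcamEntry(17818, ip("1.1.1.1"), HOST, ip("0.0.0.0"), ANY, "L3_DENY_RESULT"),
    TcamEntry(17823, ip("0.0.0.0"), ANY, ip("0.0.0.0"), ANY, "PERMIT_RESULT"),
    TcamEntry(18400, ip("0.0.0.0"), ANY, ip("0.0.0.0"), ANY, "L3_DENY_RESULT"),
]


def lookup(dst: str, src: str) -> Optional[str]:
    """Return the result of the lowest-index matching entry, if any."""
    d, s = ip(dst), ip(src)
    for entry in sorted(TABLE, key=lambda e: e.index):
        if entry.matches(d, s):
            return entry.result
    return None


if __name__ == "__main__":
    print(lookup("2.2.2.2", "1.1.1.1"))  # source is host 1.1.1.1
    print(lookup("1.1.1.1", "3.3.3.3"))  # destination is host 1.1.1.1
    print(lookup("4.4.4.4", "3.3.3.3"))  # neither host entry applies
```

In this model the catch-all entry at index 18400 is never reached because the permit-any entry at 17823 matches first, which is in line with its hit_cnt staying at 0 in the output.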

 

Cisco EARL (TCAM)

For Cisco, the EARL (Encoded Address Resolution Logic) chipsets are one of the chipset families that provide the TCAM features (as well as other features such as MAC learning and port relationships). The Nexus 7K and the SUP2T modules for 6500s use the EARL8 chipset. The 6500/7600 using SUP720/RSP720 modules use the EARL7. The DFC modules for line cards in the 6500/7600 range include an EARL7. 7600 ES+ line cards use EARL 7.5, which is compatible with the RSP720. The Sup6-E (used in Cat4500 chassis) uses what Cisco call the "TCAM4" chipset; this is also used in ASR1000 series routers and the ME3600X and ME3800X switches. The 12000 series routers use the Wahoo TCAM module.

 

Task Optimised Processors (TOPs)

The following is from the book Network Processors - Architecture, Programming, and Implementation by Ran Giladi. The book is specifically about the EZChip Technologies Ltd. range of network processors; however, TOPs are quite general across many manufacturers, including Cisco and Juniper, who both use EZChip ASICs:

There are four types of such Task-Optimized Processors (TOP engines) employed to perform the four main tasks of packet processing, that is, parsing, searching, resolving, and modifying. Each type of TOP processor employs architecture with a customized, function specific data path, functional units, and instruction set that supports the minimum number of machine cycles required for complex seven layer packet processing.

Each TOP processor is based on a four-stage instruction pipeline that effectively results in an instruction executed every clock cycle (1 IPC). Although the TOP engines are specific for each stage, they share a common instruction set that requires only minor adjustments for operating the specific functional units and capabilities of each of the TOPs. Since there are multiple instances of the TOPs in each stage of the pipeline, the overall architecture resembles super-scalar architecture of a high degree, with the degree depending on the stage (the number of TOPs at each stage).

There are two stages in the NP pipeline that contain multiple instances of embedded search engines, also based on the TOPs architecture. These are used for lookups in a combination of tables with potentially millions of entries for implementing diverse applications in layer 2 to 4 switching and routing and layer 5 to 7 deep packet processing. Three types of lookup tables can be used by the NP: direct access tables, hash tables, and trees—each flexibly defined and used for various applications. Tables may be used for forwarding and routing, flow classification, access control, and so on. Numerous tables of each type can be defined, stored in embedded memory and/or external memory, and searched through for each packet. The key size, result size (i.e., associated data), and number of entries are all user-defined per table. Variable length entries, longest match as well as first match and random wildcards are supported.

As mentioned before, there are four types of TOP engines, each tailored to perform its respective function in the various stages of packet processing, that is, parsing, lookup and classify, forwarding and decisions, and packet modifications. TOPparse identifies and extracts various packet headers, tags, addresses, ports, protocols, fields, patterns, and keywords throughout the frame. TOPparse can parse packets of any format, encapsulation method, proprietary tags, and so on. TOPsearch uses the parsed fields as keys for performing lookups in the relevant routing, classification, and policy tables. TOPresolve makes forwarding and QoS decisions, and updates tables and session state information. TOPsearchII optionally performs additional lookups after the TOPresolve decisions have been made. TOPmodify modifies packet contents, and performs overwrite, add or insert operations anywhere in the packet. The architecture of the NP is shown in Figure 10.2.

Multiple instances of each TOP type at the same pipeline stage enable simultaneous processing of multiple packets.

*End of EZChip book extract*.
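
To make the stage ordering in the extract concrete, here is a minimal Python sketch of a parse, search, resolve, modify pipeline. It is only a software analogy under stated assumptions: the Context class, the ROUTES table, the stage function names and the TTL decrement are all hypothetical, the optional TOPsearchII stage is omitted, and a real NP runs many instances of each stage in parallel on different packets:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Context:
    """Per-packet state handed from one stage to the next (hypothetical)."""
    frame: Dict[str, Any]
    keys: Dict[str, Any] = field(default_factory=dict)
    results: Dict[str, Any] = field(default_factory=dict)
    decision: Optional[str] = None


# Hypothetical lookup table standing in for the NP's search structures.
ROUTES = {"192.0.2.1": "TenGigE0/0/2/0"}


def top_parse(ctx: Context) -> Context:
    # Extract the fields that later stages use as lookup keys.
    ctx.keys["dst"] = ctx.frame["dst"]
    return ctx


def top_search(ctx: Context) -> Context:
    # Use the parsed key to look up an egress interface.
    ctx.results["egress"] = ROUTES.get(ctx.keys["dst"])
    return ctx


def top_resolve(ctx: Context) -> Context:
    # Turn the lookup result into a forwarding decision.
    ctx.decision = "forward" if ctx.results["egress"] else "drop"
    return ctx


def top_modify(ctx: Context) -> Context:
    # Rewrite the packet only if it is going to be forwarded.
    if ctx.decision == "forward":
        ctx.frame["ttl"] -= 1
    return ctx


PIPELINE = [top_parse, top_search, top_resolve, top_modify]


def process(frame: Dict[str, Any]) -> Context:
    ctx = Context(frame=dict(frame))  # copy so the caller's dict is untouched
    for stage in PIPELINE:
        ctx = stage(ctx)
    return ctx


if __name__ == "__main__":
    done = process({"dst": "192.0.2.1", "ttl": 64})
    print(done.decision, done.results["egress"], done.frame["ttl"])
```

Each stage only reads what an earlier stage produced, which is what lets the hardware dedicate a differently specialised engine type to each step.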

Below are outputs showing utilisation stats for some parts of the Typhoon NPU on an ASR9001. There is more info on the ASR9000 NP architecture and TOPs here: /index.php?page=asr9000-series-hardware-overview#np-architecture

! IOS-XR 5.3.3 on ASR9001

RP/0/RSP0/CPU0:abr1#show controllers np ports all

                Node: 0/0/CPU0:
----------------------------------------------------------------

NP Bridge Fia                       Ports
-- ------ --- ---------------------------------------------------
0  0      0   GigabitEthernet0/0/0/0 - GigabitEthernet0/0/0/19
0  0      0   TenGigE0/0/2/0, TenGigE0/0/2/1
1  1      1   TenGigE0/0/1/0 - TenGigE0/0/1/3
1  1      1   TenGigE0/0/2/2, TenGigE0/0/2/3

RP/0/RSP0/CPU0:abr1#run

# attach 0/0/CPU0

attach: Starting session 1 to node 0/0/CPU0

# np_perf -e1 -f
Fabric interface average utilization

             RX PPS       RX gbps      TX PPS       TX gbps
----------------------------------------------------------------
ILKN 0                n/a          n/a          n/a          n/a
ILKN 1              54541       0.2832       106665       0.7267
ILKN 2              49498       0.4054        49517       0.2011

# np_perf -e1 -l
NP engine load

parse    resolve  modify   pps rate
-------------------------------------
      0%       0%       0%     236899

NP1 search thread utilization summary

Group  Free   Usage
--------------------
     0     23     4%
     1     23     4%
     2     23     4%
     3     23     4%
     4     23     4%
     5     23     4%
     6     23     4%
     7     23     4%

Total Free average: 184
Total Utilization:  4%

# np_perf -e0 -s
TOPsearch engine schmoo

Groups     Packets/s
---------------------
 8             388581
 7             376864
 6             386964
 5             362936
 4             362799
 3             388922
 2             377396
 1             369282

TOPsearch context schmoo

Contexts   Packets/s
---------------------
24             367906
23             376623
22             419413
21             418647
20             430571
19             431915
18             381925
17             394736
16             386776
15             389609
14             407298
13             398723
12             375517
11             371825
10             343673
 9             358955
 8             343795
 7             360151
 6             373494
 5             362459
 4             356575
 3             373847
 2             348161
 1             418218

TOPresolve engine schmoo

Engines    Packets/s
---------------------
32             393119
31             353874
30             417807
29             365656
28             419051
27             388641
26             373994
25             394134
24             326924
23             370288
22             360729
21             369059
20             354754
19             355619
18             410171
17             371541
16             358454
15             391014
14             389342
13             355862
12             334330
11             367828
10             340210
 9             342984
 8             340151
 7             330883
 6             310596
 5             341946
 4             354317
 3             385271
 2             372225
 1             369239

TOPmodify engine schmoo

Engines    Packets/s
---------------------
32             333307
31             333798
30             345171
29             350981
28             338295
27             327597
26             360886
25             346606
24             334926
23             326030
22             330583
21             313526
20             348284
19             341778
18             335951
17             321818
16             343373
15             371739
14             353275
13             375070
12             355199
11             355713
10             360545
 9             359599
 8             351959
 7             345543
 6             337246
 5             348807
 4             354029
 3             383284
 2             381524
 1             375551

TOPparse engine schmoo

Engines    Packets/s
---------------------
32             365697
31             384796
30             351258
29             356521
28             344785
27             374156
26             335971
25             321786
24             313367
23             346504
22             321895
21             336797
20             323037
19             318640
18             327259
17             341193
16             353686
15             354764
14             396907
13             367175
12             348639
11             349437
10             357534
 9             359265
 8             364936
 7             411514
 6             383595
 5             355361
 4             354243
 3             377395
 2             383685
 1             395822

# np_perf -e0 -R
RFD per-port usage and flow-control average time

                         FC    RFD Used    Guaranteed   |                           FC    RFD Used    Guaranteed
Interface                Pct   Avg   Peak  Avg   Peak   |  Interface                Pct   Avg   Peak  Avg   Peak
--------------------------------------------------------+--------------------------------------------------------
NetIF  0                     0     0    18     0    19  |  NetIF  1                     0     0     0     0     0
NetIF  2                     0     2    25     2    25  |  NetIF  3                     0     0     0     0     0
NetIF  4                     0     0     0     0     0  |  NetIF  5                     0     0     0     0     0
NetIF  6                     0     0     0     0     0  |  NetIF  7                     0     0     0     0     0
TenGigE0_0_2_0               0     0    11     0     8  |  TenGigE0_0_2_1               0     2    32     2    29
SGMII  0                     0     0     2     0     0  |  SGMII  1                     0     0     0     0     0
SGMII  2                     0     0     0     0     0  |  SGMII  3                     0     0     0     0     0
SGMII  4                     0     0    12     0     0  |  SGMII  5                     0     0     2     0     0
GigabitEthernet0_0_0_0       0     0     0     0     0  |  GigabitEthernet0_0_0_1       0     0     0     0     0
GigabitEthernet0_0_0_2       0     0     0     0     0  |  GigabitEthernet0_0_0_3       0     0     0     0     0
GigabitEthernet0_0_0_8       0     0     9     0     0  |  GigabitEthernet0_0_0_9       0     0     0     0     0
SGMII 12                     0     0     0     0     0  |  SGMII 13                     0     0     0     0     0
GigabitEthernet0_0_0_10      0     0     0     0     0  |  GigabitEthernet0_0_0_11      0     0     0     0     0
GigabitEthernet0_0_0_12      0     0     0     0     0  |  GigabitEthernet0_0_0_13      0     0     0     0     0
GigabitEthernet0_0_0_18      0     0     0     0     0  |  GigabitEthernet0_0_0_19      0     0     0     0     0
SGMII 20                     0     0     0     0     0  |  SGMII 21                     0     0     0     0     0
SGMII 22                     0     0     0     0     0  |  SGMII 23                     0     0     0     0     0
SGMII 24                     0     0     0     0     0  |  SGMII 25                     0     0     0     0     0
SGMII 26                     0     0     0     0     0  |  SGMII 27                     0     0     0     0     0
SGMII 28                     0     0     0     0     0  |  SGMII 29                     0     0     0     0     0
GigabitEthernet0_0_0_4       0     0     0     0     0  |  GigabitEthernet0_0_0_5       0     0     0     0     0
GigabitEthernet0_0_0_6       0     0     0     0     0  |  GigabitEthernet0_0_0_7       0     0     0     0     0
SGMII 34                     0     0     0     0     0  |  SGMII 35                     0     0     0     0     0
SGMII 36                     0     0     0     0     0  |  SGMII 37                     0     0     0     0     0
GigabitEthernet0_0_0_14      0     0     0     0     0  |  GigabitEthernet0_0_0_15      0     0     0     0     0
GigabitEthernet0_0_0_16      0     0     0     0     0  |  GigabitEthernet0_0_0_17      0     0     0     0     0
SGMII 42                     0     0     0     0     0  |  SGMII 43                     0     0     0     0     0
SGMII 44                     0     0     0     0     0  |  SGMII 45                     0     0     0     0     0
SGMII 46                     0     0     0     0     0  |  SGMII 47                     0     0     0     0     0
HSGMII 0                     0     0     0     0     0  |  HSGMII 1                     0     0     0     0     0
Host  HP                     0     0     0     0     0  |  Host  LP                     0     0     0     0     0
CPU 0 HP                     0     0     0     0     0  |  CPU 0 LP                     0     0     0     0     0
CPU 1 HP                     0     0     0     0     0  |  CPU 1 LP                     0     0     0     0     0
Lpbk   0                     0     0     0     0     0  |  Lpbk   1                     0     0     0     0     0

Internal Frame Memory Bank Utilization

Bank           Avg RFD Usage  Peak RFD Usage
--------------------------------------------
RFD Bank 0                 72             86
RFD Bank 1                 65             82
RFD Bank 2                 78             88
RFD Bank 3                 74             90

ICFD/OCFD Queue fullness and peak level

Queue          Avg Level      Peak Level
--------------------------------------------
ICFDQ                       0              2
OCFDQ                       0              2
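
The figures in the "NP1 search thread utilization summary" above can be cross-checked with some simple arithmetic. The sketch below assumes each of the 8 groups has 24 search contexts; this is inferred from the 24 rows in the TOPsearch context schmoo and from 23 free contexts corresponding to roughly 4% usage, and is not stated by the output itself:

```python
GROUPS = 8
CONTEXTS_PER_GROUP = 24          # assumption, see the note above
free_per_group = [23] * GROUPS   # "Free" column from the np_perf output

total_contexts = GROUPS * CONTEXTS_PER_GROUP
total_free = sum(free_per_group)
used = total_contexts - total_free

per_group_usage = 100 * (CONTEXTS_PER_GROUP - 23) / CONTEXTS_PER_GROUP
total_utilization = 100 * used / total_contexts

if __name__ == "__main__":
    print(f"Total free:        {total_free}")
    print(f"Per-group usage:   {per_group_usage:.2f}%")
    print(f"Total utilization: {total_utilization:.2f}%")
```

Under that assumption the total of 184 free contexts and the 4% utilisation reported by np_perf both follow from one busy context per group.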

 

The following diagram, which shows what happens when a TOP inside a Cisco ASR9000 NPU experiences a lock-up, shows the TOPs design and their tasks:

 

The following slides show the TOPs for the 7600 ES+ line cards:

 

