Date created: Wednesday, September 18, 2024 5:54:46 PM. Last modified: Tuesday, October 15, 2024 11:39:28 AM

7280R3 - Jericho2 ASIC Drops

References

https://www.arista.com/en/support/toi/eos-4-25-2f/14703-r-series-drop-voq-monitoring
https://www.arista.com/en/support/toi/eos-4-22-0f/14215-voq-delete-monitoring
https://arista.my.site.com/AristaCommunity/s/article/troubleshooting-dequeue-deletes-on-7280-7500-devices
https://arista.my.site.com/AristaCommunity/s/question/0D52I00007Ja85tSAB/questions-about-maxquesizemaxbufferquesize-with-voq
https://arista.my.site.com/AristaCommunity/s/article/troubleshooting-egress-queue-drops-on-7280-7500-devices

 

Drop Counters

Packets are dropped for one of three reasons:

  • Adverse drops: packets are dropped because something about the device isn't working properly
  • Congestion: the device is working but does not have enough capacity
  • Packet processor: the device is working and has capacity, but a higher-level issue exists, such as no route, a bad packet checksum, etc.

 

Drops by the ASIC can be seen with "show hardware counter drop":

lab#show hardware counter drop 
Summary:
Total Adverse (A) Drops: 0
Total Congestion (C) Drops: 0
Total Packet Processor (P) Drops: 2663
Type  Chip         CounterName                    :           Count : First Occurrence    : Last Occurrence     
--------------------------------------------------------------------------------------------------------------
P     Fap0         dropVoqInNullRoute             :            2097 : 2024-09-18 14:36:37 : 2024-09-18 15:53:43


lab#show hardware counter drop rates
Type  Chip  CounterName         Count  1-Min  10-Min  1-Hour  1-Day  1-Week
-------------------------------------------------------------------------------------------------------------------
P     Fap0  dropVoqInNullRoute   2186     55     622    1188   2186    2186

 

Sources of Drops

When packets are dropped by the ASIC, they are wrapped in a custom encapsulation that carries the original packet and the drop reason, and are then punted to the CPU.

The CPU punted packets can then be captured with tcpdump to see which packets are being dropped and why.

Note that tcpdump cannot capture every dropped packet if the drop rate (pps) is higher than the CPU can handle.

This command shows the number of dropped packets, for any reason, sent to the control plane:

lab#bash fab dump | grep rxdrop_voq
rxdrop_voq 4219101
Note that only a sample of dropped packets is sent to the host supervisor; it is not
expected that every dropped packet will be visible. This avoids throttling the PCIe bus for
CPU-bound traffic. For reference, the tail-drop thresholds used for drop VOQs under this feature are
128KB and 1000 buffers. Drop VOQs are also rate limited to 80KBps.
This limitation does not affect the counts reported by "show hardware counter drop".
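
To get a feel for how small the sample is, the rate limit can be turned into a rough packets-per-second ceiling. The sketch below is back-of-the-envelope arithmetic only. It assumes that "80KBps" means 80,000 bytes per second and that the limit is applied to the punted packet size, which this page gives as at most 176 bytes after truncation. Neither assumption is confirmed by the source, and the name max_punted_pps is hypothetical.

```python
RATE_LIMIT_BYTES_PER_SEC = 80 * 1000  # "80KBps"; assumes 1 KB = 1000 bytes
MAX_PUNTED_PACKET_BYTES = 176         # truncated size, including the 32-byte trailer


def max_punted_pps(packet_bytes=MAX_PUNTED_PACKET_BYTES):
    """Rough upper bound on punted packets per second for a given punted size."""
    return RATE_LIMIT_BYTES_PER_SEC // packet_bytes


print(max_punted_pps())     # full-size (truncated) punted packets
print(max_punted_pps(134))  # the 134-byte frame seen in the et14 capture on this page
```

Under these assumptions the ceiling is a few hundred packets per second, so at higher drop rates tcpdump will show only a fraction of what "show hardware counter drop" counts.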

Also note that packets dropped for the following reasons aren't punted to the CPU for tcpdump'ing:

  • dropVoqInMcastEmptyMcid
  • dropVoqInAcl
  • dropVoqInMcastNoCpu
  • dropVoqInLagDiscarding

 

The CPU punted packets related to ASIC drops have the EtherType set to 0x1044.

It is not obvious in advance which interface to run tcpdump on. One approach is to try every interface, which can be sped up by looping over all interfaces:

lab#bash for intf in $(ip l | grep -oE 'et[0-9]+(_[0-9]+)*' | sort | uniq); do echo "intf: $intf"; timeout 3 tcpdump -c 1 -i "$intf" ether proto 0x1044; done

Another way to find the right interface is to run tcpdump on all interfaces (-i any) filtering for EtherType 0x002a, note which interface the packets arrive on, then listen for 0x1044 on that specific interface:

lab#bash tcpdump -e -i any -c 5 ether proto 0x002a
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
18:33:43.749384 et14  In  ifindex 43 3c:fd:fe:11:22:33 (oui Unknown) ethertype ETH_P_ARISTA (0x002a), length 140: 
	0x0000:  0001 0800 4500 0054 3987 4000 4001 5b8b  ....E..T9.@.@.[.
	0x0010:  0ac8 c802 0ac8 c804 0800 0b4a 7d58 2893  ...........J}X(.
	0x0020:  071d eb66 0000 0000 8a73 0b00 0000 0000  ...f.....s......
	0x0030:  1011 1213 1415 1617 1819 1a1b 1c1d 1e1f  ................
	0x0040:  2021 2223 2425 2627 2829 2a2b 2c2d 2e2f  .!"#$%&'()*+,-./
	0x0050:  3031 3233 3435 3637 8100 0000 0000 0000  01234567........
	0x0060:  0000 0000 0000 000f ef10 0100 0000 0000  ................
	0x0070:  4a00 0001 0000 0101                      J.......

^C

r2-lab2-de#bash tcpdump -i et14 ether proto 0x1044
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on et14, link-type EN10MB (Ethernet), snapshot length 262144 bytes
18:35:16.933369 3c:fd:fe:a9:33:b0 (oui Unknown) > e8:ae:c5:29:39:79 (oui Unknown), ethertype Arista Drop VOQ Monitoring (0x1044), length 134: vlan 1, p 0, ethertype IPv4, 10.200.200.2 > 10.200.200.4: ICMP echo request, id 32088, seq 10478, length 64
	Drop VOQ trailer:
		VOQ: inNullRoute (VOQID 74)
		out_fap_port: 0
		outlif: 0
		lif_outlif: 0
		eei_outlif: 0
		fwd_sys_vsi: 4079
		inlif_orientation: 16
		traffic_class: 1
		ftmh_dp: 0
		dscp: 0
		cpucode: 0
		fwd_code: 1
		fwd_hdr_offset: 0
		eei_type: 0
		dscp_rewrite: 1
		dsp_ext_present: 1

In the output above, the drop reason is a null route (VOQ inNullRoute), which matches the dropVoqInNullRoute counter reported by "show hardware counter drop". The packet causing the drop counter to increase has therefore been found.
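
When a capture contains many punted packets, the drop reasons can be tallied from the tcpdump text output. This is a minimal Python sketch, assuming the "VOQ: <reason> (VOQID <n>)" line format shown in the decoded trailer above. The function name tally_drop_reasons is illustrative.

```python
import re
from collections import Counter

# Matches trailer lines such as "VOQ: inNullRoute (VOQID 74)".
VOQ_LINE = re.compile(r"VOQ:\s*(?P<reason>\S+)\s*\(VOQID\s+(?P<voqid>\d+)\)")


def tally_drop_reasons(tcpdump_text):
    """Count punted packets per (drop reason, VOQ ID) in saved tcpdump output."""
    counts = Counter()
    for match in VOQ_LINE.finditer(tcpdump_text):
        counts[(match["reason"], int(match["voqid"]))] += 1
    return counts
```

The result can be compared against the counter names in "show hardware counter drop" to check that the captured sample covers the counters that are incrementing.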

 

Save the following Wireshark dissector (a Lua plugin) in "~/.local/lib/wireshark/plugins/": https://github.com/mpergament/voqmonitor

Local mirror: arista-voq-dissector.lua

Then one can decode the custom Arista header directly in Wireshark:

ssh -q labrouter "bash tcpdump -s 0 -U -n -w - -i et14 'ether proto 0x1044' 2>/dev/null" | wireshark -k -i -

 

One can filter for a specific drop reason by setting the tcpdump filter to check the drop code value in the Arista trailer. Packets which have been dropped and punted are truncated. The maximum packet size is 176 bytes, including the 32-byte trailer. The example below filters for the null route drop code 74 (0x4a). Offset 168 is 8 bytes before the end of a full 176-byte packet, so if the packet was not truncated (because it was smaller than 144 bytes) then the offset 168 will be incorrect:

bash tcpdump -i et4_1 -c 1 -s 0 -nlASXev ether proto 0x1044 and ether[168] == 0x4a
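
For packets that were not truncated, the offset has to be adjusted. The Python sketch below assumes the drop code always sits 8 bytes before the end of the punted frame, which is what the ether[168] filter implies for a 176-byte frame. This also agrees with the hex dump earlier on this page, where 0x4a is the eighth byte from the end. The function names are hypothetical, and the layout is inferred from this page rather than from Arista documentation.

```python
MAX_PUNTED_FRAME_BYTES = 176  # maximum punted size, including the trailer
VOQ_ID_FROM_END = 176 - 168   # 8 bytes, implied by the ether[168] filter


def voq_id_offset(frame_len):
    """Byte offset of the drop code for a punted frame of the given length."""
    captured = min(frame_len, MAX_PUNTED_FRAME_BYTES)
    return captured - VOQ_ID_FROM_END


def voq_filter(frame_len, voq_id):
    """Build a tcpdump filter expression matching one drop code."""
    offset = voq_id_offset(frame_len)
    return f"ether proto 0x1044 and ether[{offset}] == {voq_id:#04x}"


print(voq_filter(176, 74))  # truncated, full-size punted frame
print(voq_filter(134, 74))  # the 134-byte ICMP frame captured on et14
```

For the 134-byte frame from the et14 capture, this places the drop code at ether[126] rather than ether[168].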

 

The meaning of some of the drop counters is documented at the following URL; however, most of them are undocumented (TAC confirmed this): https://www.arista.com/en/support/toi/eos-4-15-3f/13754-drop-counters

 

Drop Counter Meaning

dropVoqInNullRoute: Packet is dropped because it matched a route to the null destination.

dropVoqInAcl: Packet is dropped by an ingress ACL.

dropVoqInIpv4ChecksumError: Packet is dropped due to IPv4 checksum error.

DeqDeletePktCnt: DeqDelete drops indicate stale packets in the VOQ, i.e., packets which have been in the VOQ for more than 500ms without getting credits. These packets are deleted, and the drops appear as DeqDeletePktCnt.

EpniAlignerError: This error counter does not appear to be tied to a single cause. There are potentially multiple triggers that could hit this error counter and these triggers could be specific to a configuration or action.
https://www.arista.com/en/support/software-bug-portal/bugdetail?bug_id=905726 EpniAlignerError hardware drop counter can sometimes increment with no adverse effect.
https://www.arista.com/en/support/software-bug-portal/bugdetail?bug_id=875777 Packets that violate the MTU will cause the counter EpniAlignerError to increment. There is no impact on traffic.
https://www.arista.com/en/support/software-bug-portal/bugdetail?bug_id=849796 The split horizon packet filtering mechanism in the hardware could lead to spurious increments of the EpniAlignerError counter. This mechanism is used when features such as EVPN VXLAN L2 DCI or OISM are configured. These are not adverse drops and can be ignored.
https://www.arista.com/en/support/software-bug-portal/bugdetail?bug_id=806749 When the 'show forwarding destination' command is run, the EpniAlignerError counter will be incremented.

 

