IOS to IOS-XR eBGP MTU

Below it can be seen that an IOS device and an IOS-XR device have a flapping eBGP session. The session won't stay up for longer than 3 minutes (the default hold time, which is 3x the default 60-second keepalive interval).

Note that the 7606 IOS device has received 1 prefix from the IOS-XR device, an ASR9001. It is also sending many prefixes to the ASR9001; however, the ASR9001 shows 0 received prefixes. Also note that the OutQ on the 7606 is not 0; it shows 53:

7606#sh bgp vpnv4 unicast vrf PS summary | i Nei|64900
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.0.138   4        64900       4      21 109282978    0   53 00:01:04        1

7606#sh bgp vpnv4 unicast  vrf PS neighbors 10.0.0.138 advertised-routes | i Total
Total number of prefixes 978


RP/0/RSP0/CPU0:ASR9001#show bgp vrf KPS summary | i "Nei|10.0.0.139"
Neighbor        Spk    AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down  St/PfxRcd
10.0.0.139        0 64901     701    1666  1220690    0    0 00:02:51          0

RP/0/RSP0/CPU0:ASR9001#show run int gi0/0/0/16 | i mtu
 mtu 2010

RP/0/RSP0/CPU0:ASR9001#show run int gi0/0/0/16.1630 | i mtu
! No MTU configured under sub-interface


7606#show run int gi2/11 | i mtu
 mtu 1996

7606#show run int gi2/11.1630 | i mtu
! No MTU configured under sub-interface

On the face of it the MTUs are the same at each side: 1996 in IOS + 14 bytes of Ethernet headers (which are included in the IOS-XR MTU configuration) == 2010.

At the IP level this configuration has taken effect:

RP/0/RSP0/CPU0:ASR9001#show int gi0/0/0/16 | i MTU
  MTU 2010 bytes, BW 1000000 Kbit (Max: 1000000 Kbit)

RP/0/RSP0/CPU0:ASR9001#show int gi0/0/0/16.1630 | i MTU
  MTU 2014 bytes, BW 1000000 Kbit (Max: 1000000 Kbit)

RP/0/RSP0/CPU0:ASR9001#show im database interface gigabitEthernet 0/0/0/16.1630 | i ipv4
  ipv4            ipv4 (up, 1996)

! Note that IOS-XR output above is adding 4 bytes for the single VLAN tagged sub-interface automatically. It can be seen that the IP MTU is 1996 on the sub-interface (2014 - 14 - 4).


7606#show int gi2/11 | i MTU
  MTU 1996 bytes, BW 1000000 Kbit/sec, DLY 10 usec,

7606#show int gi2/11.1630 | i MTU
  MTU 1996 bytes, BW 1000000 Kbit/sec, DLY 10 usec,

7606#show ip interface gi2/11.1630 | i MTU
  MTU is 1996 bytes

! Note above that IOS doesn't show the 14 bytes of Ethernet headers on the PHY interface or the additional VLAN tag 4 bytes on the sub-interface. 1996 bytes are shown in all command outputs.
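
The relationship between the IOS and IOS-XR MTU values shown above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in this article; the function names are made up for the example, and the header sizes assume standard Ethernet II framing with 802.1Q tags.

```python
# Assumed header sizes: Ethernet II header and one 802.1Q VLAN tag.
ETHERNET_HEADER = 14
VLAN_TAG = 4


def ios_to_iosxr_phy_mtu(ios_mtu: int) -> int:
    """IOS-XR physical interface MTU that matches an IOS interface MTU."""
    return ios_mtu + ETHERNET_HEADER


def iosxr_subif_mtu(phy_mtu: int, vlan_tags: int = 1) -> int:
    """MTU that IOS-XR reports on a tagged sub-interface."""
    return phy_mtu + VLAN_TAG * vlan_tags


def iosxr_ip_mtu(subif_mtu: int, vlan_tags: int = 1) -> int:
    """IP MTU left once the layer 2 headers are removed."""
    return subif_mtu - ETHERNET_HEADER - VLAN_TAG * vlan_tags


phy = ios_to_iosxr_phy_mtu(1996)   # value configured on gi0/0/0/16
subif = iosxr_subif_mtu(phy)       # value shown on gi0/0/0/16.1630
print(phy, subif, iosxr_ip_mtu(subif))
```

With the values from this article the three numbers match the configured 2010, the 2014 shown on the sub-interface, and the 1996 IP MTU.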

The PHY and IP level MTUs seem fine. The problem must be elsewhere. Looking at the TCP MSS for this BGP session reveals an issue:

7606#show bgp vpnv4 unicast vrf PS neighbors 10.0.0.138 | i Data
Datagrams (max data segment is 1956 bytes): 


RP/0/RSP0/CPU0:ASR9001#show tcp brief | i 10.0.0.139
0x502126d8 0x6000001f      0      0  10.0.0.138:179      10.0.0.139:17906    ESTAB 

RP/0/RSP0/CPU0:ASR9001#show tcp detail pcb 0x50212b20  | i Data
Datagrams (in bytes): MSS 1240, peer MSS 1380, min MSS 1240, max MSS 1240 

The ASR9001 is using a smaller MSS than the 7606. The "show bgp ... summary" commands at the top of this page show that prefixes are being sent from the ASR9001 and received by the 7606 (it is only sending one prefix), but the ~980 routes from the 7606 are not being received by the ASR9001. Both devices will pack the BGP UPDATE messages to fill their MSS, meaning the segments from the 7606 to the ASR9001 will be too big and dropped. Because TCP delivers data in order, any KEEPALIVE queued behind those undelivered segments would not reach BGP on the ASR9001 either. After the hold time (3x the keepalive interval) passes without a valid KEEPALIVE or UPDATE being received, the ASR9001 resets the session:

RP/0/RSP0/CPU0:Sep 21 11:33:22.229 UTC: bgp[1054]: %ROUTING-BGP-5-ADJCHANGE_DETAIL : neighbor 10.0.0.139 Up (VRF: KPS; AFI/SAFI: 1/1) (AS: 64901)
RP/0/RSP0/CPU0:Sep 21 11:36:22.282 UTC: bgp[1054]: %ROUTING-BGP-5-ADJCHANGE_DETAIL : neighbor 10.0.0.139 Down - BGP Notification sent, hold time expired (VRF: KPS; AFI/SAFI: 1/1) (AS: 64901)
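
The two log timestamps above can be checked against the default timers. A minimal Python sketch, assuming both lines were logged on the same day and that the 60-second keepalive interval and 3x multiplier mentioned earlier apply; the function name is illustrative only.

```python
from datetime import datetime

KEEPALIVE_SECONDS = 60   # default keepalive interval quoted in this article
HOLD_MULTIPLIER = 3      # hold time = 3 x keepalive


def seconds_between(up: str, down: str) -> float:
    """Seconds between two HH:MM:SS.mmm timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(down, fmt) - datetime.strptime(up, fmt)
    return delta.total_seconds()


hold_time = KEEPALIVE_SECONDS * HOLD_MULTIPLIER
elapsed = seconds_between("11:33:22.229", "11:36:22.282")

# The session lasted almost exactly one hold time before the notification.
print(f"hold time {hold_time}s, session lasted {elapsed:.3f}s")
```

The session lasting one hold time to within a fraction of a second is consistent with the ASR9001 receiving nothing valid from the 7606 after the session came up.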

One possible explanation for this mismatch between the interface MTU and the MSS allocated for the BGP TCP session on the ASR9001 is that the MTU was changed on the PHY interface of both devices after the session was originally established over the sub-interfaces, and the sub-interfaces were never bounced or tested. At some point after this the session or link flapped on the 7606 side, and the issue then revealed itself.

After shutting down both sub-interfaces and ensuring any lingering TCP sessions are cleared, the sub-interfaces are enabled again:

7606#show ip interface gi2/11.1630 | i MTU
  MTU is 1996 bytes


RP/0/RSP0/CPU0:ASR9001#show im database interface gigabitEthernet 0/0/0/16.1630 | i ipv4
Thu Sep 21 09:33:22.434 UTC
  ipv4            ipv4 (up, 1996)

! The interface MTU values have not changed, which is to be expected ^


7606#show bgp vpnv4 unicast vrf PS neighbors 10.0.0.138 | i Data
Datagrams (max data segment is 1956 bytes):


RP/0/RSP0/CPU0:ASR9001#show tcp brief | i 10.0.0.139
0x502eeac0 0x6000001f      0      0  10.0.0.138:179      10.0.0.139:37177    ESTAB

RP/0/RSP0/CPU0:ASR9001#show tcp detail pcb 0x5029830c | i Data
Datagrams (in bytes): MSS 1956, peer MSS 1956, min MSS 1956, max MSS 1956

! Now both devices agree on the MSS which is correct.
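
The 1956 byte MSS follows directly from the 1996 byte IP MTU. The sketch below shows the arithmetic in Python, assuming IPv4 and TCP headers of 20 bytes each with no options; the function names are illustrative.

```python
IPV4_HEADER = 20  # assumed: no IP options
TCP_HEADER = 20   # assumed: no TCP options


def expected_mss(ip_mtu: int) -> int:
    """Largest TCP payload that fits in one IP packet of size ip_mtu."""
    return ip_mtu - IPV4_HEADER - TCP_HEADER


def implied_ip_mtu(mss: int) -> int:
    """IP packet size produced by a full segment of the given MSS."""
    return mss + IPV4_HEADER + TCP_HEADER


print(expected_mss(1996))    # MSS expected for the 1996 byte IP MTU
print(implied_ip_mtu(1240))  # IP packet size for the earlier 1240 byte MSS
```

By the same arithmetic, the 1240 byte MSS seen before the sub-interfaces were bounced corresponds to 1280 byte IP packets, well below the configured IP MTU.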

Despite fixing the MSS mismatch the BGP sessions are still flapping every 3 minutes and BGP UPDATES are still only being received one-way (from the ASR9001 to the 7606). Some large pings with the DF bit set reveal the problem:

! 7606 PHY interface MTU 1996:

7606#ping vrf PS 10.0.0.138 size 1990 df-bit
Type escape sequence to abort.
Sending 5, 1990-byte ICMP Echos to 10.0.0.138, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/4/4 ms


7606#ping vrf PS 10.0.0.138 size 1991 df-bit
Type escape sequence to abort.
Sending 5, 1991-byte ICMP Echos to 10.0.0.138, timeout is 2 seconds:
Packet sent with the DF bit set
.....
Success rate is 0 percent (0/5)



! ASR9001 PHY interface MTU 2010:

RP/0/RSP0/CPU0:ASR9001#ping vrf KPS 10.0.0.139 size 1990 df-bit
Type escape sequence to abort.
Sending 5, 1990-byte ICMP Echos to 10.0.0.139, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 6/59/107 ms


RP/0/RSP0/CPU0:ASR9001#ping vrf KPS 10.0.0.139 size 1991 df-bit
Type escape sequence to abort.
Sending 5, 1991-byte ICMP Echos to 10.0.0.139, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

The ping command on IOS expects a size value up to the MTU value configured on the interface. Just as the interface MTU value is exclusive of layer 2 headers and VLAN tags, so is the ping command.

The working IOS ping output which specifies a size of 1990 bytes sends 1990 bytes (20 bytes IP + 8 bytes ICMP + 1962 ICMP payload) of layer 2 payload + 14 bytes of Ethernet headers + 4 byte VLAN tag == 2008 bytes.

The second, failing ping on IOS is one byte more: 1991 bytes of layer 2 payload + 14 bytes of Ethernet headers + 4 byte VLAN tag == 2009 bytes. This fails despite being smaller than the 2014 bytes allowed by the MTU configured on the interface (1996 + 14 + 4), so the underlying carrier link must only support 2008 bytes.

In the ASR9001 output, the ping command on IOS-XR expects a size value that is less than the MTU value configured on the interface, because the size excludes the layer 2 headers. The working ping output shows a size value of 1990 bytes + 14 bytes of Ethernet headers + 4 byte VLAN tag, which is 2008 bytes on the wire.
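
The on-the-wire sizes for the ping tests can be reproduced with a short Python sketch. It assumes a 14 byte Ethernet header, one 4 byte VLAN tag, a 20 byte IPv4 header and an 8 byte ICMP header; the function names are illustrative.

```python
ETHERNET_HEADER = 14
VLAN_TAG = 4
IPV4_HEADER = 20
ICMP_HEADER = 8


def frame_bytes(ip_packet_bytes: int, vlan_tags: int = 1) -> int:
    """Frame size for an IP packet, counting Ethernet header and VLAN tags."""
    return ip_packet_bytes + ETHERNET_HEADER + VLAN_TAG * vlan_tags


def icmp_payload_bytes(ping_size: int) -> int:
    """ICMP data bytes when the ping size is the full IP packet size."""
    return ping_size - IPV4_HEADER - ICMP_HEADER


for size in (1990, 1991, 1996):
    print(size, icmp_payload_bytes(size), frame_bytes(size))
```

The 1990 byte ping gives a 2008 byte frame and passes, the 1991 byte ping gives 2009 bytes and fails, and a full 1996 byte IP packet would need 2014 bytes.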

Despite fixing the MSS mismatch on the ASR9001 side by bouncing the sub-interfaces and clearing the TCP sessions on both devices, the configured MTU size on both devices was too large for the underlying link. This means that the 7606 > ASR9001 BGP UPDATES still weren't getting through, resulting in the ASR9001 flapping the session every 3 minutes (due to default timers). Even before fixing the MSS mismatch on the ASR9001 side, the BGP KEEPALIVE messages could get through using the smaller MSS, and even the BGP UPDATE message from the ASR9001 was received by the 7606 because it only had 1 prefix to send. These packets were small enough to fit in the 1240 byte TCP MSS.
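
The conclusion can be summarised as a quick check of which full-sized TCP segments fit through the link. This Python sketch assumes the usable IP MTU of the path is 1990 bytes (the largest DF ping that succeeded) and that IPv4 and TCP headers are 20 bytes each with no options. The 1950 byte figure it prints is derived from that assumption and is not a value taken from either device.

```python
PATH_IP_MTU = 1990  # assumed: largest DF-bit ping size that succeeded
IPV4_HEADER = 20    # assumed: no IP options
TCP_HEADER = 20     # assumed: no TCP options


def segment_fits(mss: int, path_ip_mtu: int = PATH_IP_MTU) -> bool:
    """True if a full segment of this MSS fits in one packet on the path."""
    return mss + IPV4_HEADER + TCP_HEADER <= path_ip_mtu


def largest_mss(path_ip_mtu: int = PATH_IP_MTU) -> int:
    """Largest MSS whose full segments still fit on the path."""
    return path_ip_mtu - IPV4_HEADER - TCP_HEADER


for mss in (1240, 1956):
    print(mss, "fits" if segment_fits(mss) else "too large")
print("largest MSS for this path:", largest_mss())
```

Full segments at the negotiated 1956 byte MSS produce 1996 byte IP packets, which exceed the path, while anything sent within the earlier 1240 byte MSS fits comfortably.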