Date created: Tuesday, June 27, 2023 5:01:19 PM. Last modified: Sunday, December 3, 2023 4:35:47 PM

EVPN with VXLAN and NVGRE

References:

https://www.rfc-editor.org/rfc/rfc7348 - Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks
https://www.rfc-editor.org/rfc/rfc7432 - BGP MPLS-Based Ethernet VPN
https://www.rfc-editor.org/rfc/rfc7637 - NVGRE: Network Virtualization Using Generic Routing Encapsulation
https://www.rfc-editor.org/rfc/rfc8365 - A Network Virtualization Overlay Solution Using Ethernet VPN (EVPN)

 

See Also

EVPN with MPLS
VXLAN Recap

 

EVPN Forwarding Plane

NVGRE Forwarding

From https://www.rfc-editor.org/rfc/rfc7637#section-3.2

3.2.  NVGRE Frame Format

   Outer Ethernet Header:
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                (Outer) Destination MAC Address                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |(Outer)Destination MAC Address |  (Outer)Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  (Outer) Source MAC Address                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Optional Ethertype=C-Tag 802.1Q| Outer VLAN Tag Information    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Ethertype 0x0800        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer IPv4 Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  HL   |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live | Protocol 0x2F |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      (Outer) Source Address                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  (Outer) Destination Address                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   GRE Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| |1|0|   Reserved0     | Ver |   Protocol Type 0x6558        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Virtual Subnet ID (VSID)        |    FlowID     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Inner Ethernet Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                (Inner) Destination MAC Address                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |(Inner)Destination MAC Address |  (Inner)Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  (Inner) Source MAC Address                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Ethertype 0x0800        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Inner IPv4 Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  HL   |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Original IP Payload                      |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 1: GRE Encapsulation Frame Format
…

   In the GRE header:

   o  The C (Checksum Present) and S (Sequence Number Present) bits in
      the GRE header MUST be zero.

   o  The K (Key Present) bit in the GRE header MUST be set to one.  The
      32-bit Key field in the GRE header is used to carry the Virtual
      Subnet ID (VSID) and the FlowID:

      -  Virtual Subnet ID (VSID): This is a 24-bit value that is used
         to identify the NVGRE-based Virtual Layer 2 Network.

      -  FlowID: This is an 8-bit value that is used to provide per-flow
         entropy for flows in the same VSID.  The FlowID MUST NOT be
         modified by transit devices.  The encapsulating NVE SHOULD
         provide as much entropy as possible in the FlowID.  If a FlowID
         is not generated, it MUST be set to all zeros.

   o  The Protocol Type field in the GRE header is set to 0x6558
      (Transparent Ethernet Bridging).

 

VXLAN Forwarding

From https://www.rfc-editor.org/rfc/rfc7348#section-5

5.  VXLAN Frame Format

Note: VXLAN can be carried over IPv4 and IPv6 but only the IPv4 head stack is shown below for brevity.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

   Outer Ethernet Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Outer Destination MAC Address                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Outer Destination MAC Address | Outer Source MAC Address      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Outer Source MAC Address                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |OptnlEthtype = C-Tag 802.1Q    | Outer.VLAN Tag Information    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Ethertype = 0x0800            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer IPv4 Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |Protocl=17(UDP)|   Header Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Outer Source IPv4 Address               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Outer Destination IPv4 Address              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer UDP Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Source Port         |       Dest Port = VXLAN Port  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           UDP Length          |        UDP Checksum           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   VXLAN Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R|R|R|R|I|R|R|R|            Reserved                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                VXLAN Network Identifier (VNI) |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Inner Ethernet Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Inner Destination MAC Address                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Inner Destination MAC Address | Inner Source MAC Address      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Inner Source MAC Address                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |OptnlEthtype = C-Tag 802.1Q    | Inner.VLAN Tag Information    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Payload:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Ethertype of Original Payload |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                                  Original Ethernet Payload    |
   |                                                               |
   |(Note that the original Ethernet Frame's FCS is not included)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Frame Check Sequence:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   New FCS (Frame Check Sequence) for Outer Ethernet Frame     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure 1: VXLAN Frame Format with IPv4 Outer Header
   VXLAN Header:  This is an 8-byte field that has:

      - Flags (8 bits): where the I flag MUST be set to 1 for a valid
        VXLAN Network ID (VNI).  The other 7 bits (designated "R") are
        reserved fields and MUST be set to zero on transmission and
        ignored on receipt.

      - VXLAN Segment ID/VXLAN Network Identifier (VNI): this is a
        24-bit value used to designate the individual VXLAN overlay
        network on which the communicating VMs are situated.  VMs in
        different VXLAN overlay networks cannot communicate with each
        other.

      - Reserved fields (24 bits and 8 bits): MUST be set to zero on
        transmission and ignored on receipt.

   Outer UDP Header:  This is the outer UDP header with a source port
      provided by the VTEP and the destination port being a well-known
      UDP port.

      -  Destination Port: IANA has assigned the value 4789 for the
         VXLAN UDP port, and this value SHOULD be used by default as the
         destination UDP port.  Some early implementations of VXLAN have
         used other values for the destination port.  To enable
         interoperability with these implementations, the destination
         port SHOULD be configurable.

      -  Source Port:  It is recommended that the UDP source port number
         be calculated using a hash of fields from the inner packet --
         one example being a hash of the inner Ethernet frame's headers.
         This is to enable a level of entropy for the ECMP/load-
         balancing of the VM-to-VM traffic across the VXLAN overlay.
         When calculating the UDP source port number in this manner, it
         is RECOMMENDED that the value be in the dynamic/private port
         range 49152-65535.

      -  UDP Checksum: It SHOULD be transmitted as zero.  When a packet
         is received with a UDP checksum of zero, it MUST be accepted
         for decapsulation.  Optionally, if the encapsulating end point
         includes a non-zero UDP checksum, it MUST be correctly
         calculated across the entire packet including the IP header,
         UDP header, VXLAN header, and encapsulated MAC frame.  When a
         decapsulating end point receives a packet with a non-zero
         checksum, it MAY choose to verify the checksum value.  If it
         chooses to perform such verification, and the verification
         fails, the packet MUST be dropped.  If the decapsulating
         destination chooses not to perform the verification, or
         performs it successfully, the packet MUST be accepted for
         decapsulation.

 

EVPN Control Plane

EVPN with VXLAN

The RFC8365 Approach "A Network Virtualization Overlay Solution Using Ethernet VPN (EVPN)" – BGP Signalled EVPN with VXLAN or NVGRE Transport

    VXLAN encapsulation is based on UDP, with an 8-byte header following
   the UDP header.  VXLAN provides a 24-bit VNI, which typically
   provides a one-to-one mapping to the tenant VID, as described in
   [RFC7348].  In this scenario, the ingress VTEP does not include an
   inner VLAN tag on the encapsulated frame, and the egress VTEP
   discards the frames with an inner VLAN tag.  This mode of operation
   in [RFC7348] maps to VLAN-Based Service in [RFC7432], where a tenant
   VID gets mapped to an EVI.

   VXLAN also provides an option of including an inner VLAN tag in the
   encapsulated frame, if explicitly configured at the VTEP.  This mode
   of operation can map to VLAN Bundle Service in [RFC7432] because all
   the tenant's tagged frames map to a single bridge table / MAC-VRF,
   and the inner VLAN tag is not used for lookup by the disposition PE
   when performing VXLAN decapsulation as described in Section 6 of
   [RFC7348].

   [RFC7637] encapsulation is based on GRE encapsulation, and it
   mandates the inclusion of the optional GRE Key field, which carries
   the VSID.  There is a one-to-one mapping between the VSID and the
   tenant VID, as described in [RFC7637].  The inclusion of an inner
   VLAN tag is prohibited.  This mode of operation in [RFC7637] maps to
   VLAN Based Service in [RFC7432].

   As described in the next section, there is no change to the encoding
   of EVPN routes to support VXLAN or NVGRE encapsulation, except for
   the use of the BGP Encapsulation Extended Community to indicate the
   encapsulation type (e.g., VXLAN or NVGRE).  However, there is
   potential impact to the EVPN procedures depending on where the NVE is
   located (i.e., in hypervisor or ToR) and whether multihoming
   capabilities are required.
5.1.2. Virtual Identifiers to EVI Mapping Just like in [RFC7432], where two options existed for mapping broadcast domains (represented by VLAN IDs) to an EVI, when the EVPN control plane is used in conjunction with VXLAN (or NVGRE encapsulation), there are also two options for mapping broadcast domains represented by VXLAN VNIs (or NVGRE VSIDs) to an EVI: Option 1: A Single Broadcast Domain per EVI In this option, a single Ethernet broadcast domain (e.g., subnet) represented by a VNI is mapped to a unique EVI. This corresponds to the VLAN-Based Service in [RFC7432], where a tenant-facing interface, logical interface (e.g., represented by a VID), or physical interface gets mapped to an EVI. As such, a BGP Route Distinguisher (RD) and Route Target (RT) are needed per VNI on every NVE. The advantage of this model is that it allows the BGP RT constraint mechanisms to be used in order to limit the propagation and import of routes to only the NVEs that are interested in a given VNI. The disadvantage of this model may be the provisioning overhead if the RD and RT are not derived automatically from the VNI. In this option, the MAC-VRF table is identified by the RT in the control plane and by the VNI in the data plane. In this option, the specific MAC-VRF table corresponds to only a single bridge table. Option 2: Multiple Broadcast Domains per EVI In this option, multiple subnets, each represented by a unique VNI, are mapped to a single EVI. For example, if a tenant has multiple segments/subnets each represented by a VNI, then all the VNIs for that tenant are mapped to a single EVI; for example, the EVI in this case represents the tenant and not a subnet. This corresponds to the VLAN-aware bundle service in [RFC7432]. The advantage of this model is that it doesn't require the provisioning of an RD/RT per VNI. However, this is a moot point when compared to Option 1 where auto- derivation is used. The disadvantage of this model is that routes would be imported by NVEs that may not be interested in a given VNI. In this option, the MAC-VRF table is identified by the RT in the control plane; a specific bridge table for that MAC-VRF is identified by the <RT, Ethernet Tag ID> in the control plane. In this option, the VNI in the data plane is sufficient to identify a specific bridge table. 5.1.2.1. Auto-Derivation of RT This section describes how to auto-derive the RT from the VNI. 5.1.3. Constructing EVPN BGP Routes In EVPN, an MPLS label, for instance, identifying the forwarding table is distributed by the egress PE via the EVPN control plane and is placed in the MPLS header of a given packet by the ingress PE. This label is used upon receipt of that packet by the egress PE for disposition of that packet. This is very similar to the use of the VNI by the egress NVE, with the difference being that an MPLS label has local significance while a VNI typically has global significance. Accordingly, and specifically to support the option of locally assigned VNIs, the MPLS Label1 field in the MAC/IP Advertisement route, the MPLS label field in the Ethernet A-D per EVI route, and the MPLS label field in the P-Multicast Service Interface (PMSI) Tunnel attribute of the Inclusive Multicast Ethernet Tag (IMET) route are used to carry the VNI. For the balance of this memo, the above MPLS label fields will be referred to as the VNI field. The VNI field is used for both local and global VNIs; for either case, the entire 24-bit field is used to encode the VNI value. For the VLAN-Based Service (a single VNI per MAC-VRF), the Ethernet Tag field in the MAC/IP Advertisement, Ethernet A-D per EVI, and IMET route MUST be set to zero just as in the VLAN-Based Service in [RFC7432]. For the VLAN-Aware Bundle Service (multiple VNIs per MAC-VRF with each VNI associated with its own bridge table), the Ethernet Tag field in the MAC Advertisement, Ethernet A-D per EVI, and IMET route MUST identify a bridge table within a MAC-VRF; the set of Ethernet Tags for that EVI needs to be configured consistently on all PEs within that EVI. For locally assigned VNIs, the value advertised in the Ethernet Tag field MUST be set to a VID just as in the VLAN-aware bundle service in [RFC7432]. Such setting must be done consistently on all PE devices participating in that EVI within a given domain. For global VNIs, the value advertised in the Ethernet Tag field SHOULD be set to a VNI as long as it matches the existing semantics of the Ethernet Tag, i.e., it identifies a bridge table within a MAC-VRF and the set of VNIs are configured consistently on each PE in that EVI. In order to indicate which type of data-plane encapsulation (i.e., VXLAN, NVGRE, MPLS, or MPLS in GRE) is to be used, the BGP Encapsulation Extended Community defined in [RFC5512] is included with all EVPN routes (i.e., MAC Advertisement, Ethernet A-D per EVI, Ethernet A-D per ESI, IMET, and Ethernet Segment) advertised by an egress PE. Five new values have been assigned by IANA to extend the list of encapsulation types defined in [RFC5512]; they are listed in Section 11. The MPLS encapsulation tunnel type, listed in Section 11, is needed in order to distinguish between an advertising node that only supports non-MPLS encapsulations and one that supports MPLS and non-MPLS encapsulations. An advertising node that only supports MPLS encapsulation does not need to advertise any encapsulation tunnel types; i.e., if the BGP Encapsulation Extended Community is not present, then either MPLS encapsulation or a statically configured encapsulation is assumed. The Next Hop field of the MP_REACH_NLRI attribute of the route MUST be set to the IPv4 or IPv6 address of the NVE. The remaining fields in each route are set as per [RFC7432]. Note that the procedure defined here -- to use the MPLS Label field to carry the VNI in the presence of a Tunnel Encapsulation Extended Community specifying the use of a VNI -- is aligned with the procedures described in Section 8.2.2.2 of [TUNNEL-ENCAP] ("When a Valid VNI has not been Signaled").
6. EVPN with Multiple Data-Plane Encapsulations The use of the BGP Encapsulation Extended Community per [RFC5512] allows each NVE in a given EVI to know each of the encapsulations supported by each of the other NVEs in that EVI. ... When a PE advertises multiple supported encapsulations, it MUST advertise encapsulations that use the same EVPN procedures including procedures associated with split-horizon filtering described in Section 8.3.1. For example, VXLAN and NVGRE (or MPLS and MPLS over GRE) encapsulations use the same EVPN procedures; thus, a PE can advertise both of them and can support either of them or both of them simultaneously. However, a PE MUST NOT advertise VXLAN and MPLS encapsulations together because (a) the MPLS field of EVPN routes is set to either an MPLS label or a VNI, but not both and (b) some EVPN procedures (such as split-horizon filtering) are different for VXLAN/ NVGRE and MPLS encapsulations.
8. Multihoming NVEs - NVE Residing in ToR Switch In this section, we discuss the scenario where the NVEs reside in the ToR switches AND the servers (where VMs are residing) are multihomed to these ToR switches. The multihoming NVE operates in All-Active or Single-Active redundancy mode. ... 8.1.1. Multihomed ES Auto-Discovery EVPN NVEs (or PEs) connected to the same ES (e.g., the same server via Link Aggregation Group (LAG)) can automatically discover each other with minimal to no configuration through the exchange of BGP routes. 8.1.2. Fast Convergence and Mass Withdrawal EVPN defines a mechanism to efficiently and quickly signal, to remote NVEs, the need to update their forwarding tables upon the occurrence of a failure in connectivity to an ES (e.g., a link or a port failure). This is done by having each NVE advertise an Ethernet A-D route per ES for each locally attached segment. Upon a failure in connectivity to the attached segment, the NVE withdraws the corresponding Ethernet A-D route. This triggers all NVEs that receive the withdrawal to update their next-hop adjacencies for all MAC addresses associated with the ES in question. If no other NVE had advertised an Ethernet A-D route for the same segment, then the NVE that received the withdrawal simply invalidates the MAC entries for that segment. Otherwise, the NVE updates the next-hop adjacency list accordingly. 8.1.3. Split-Horizon If a server is multihomed to two or more NVEs (represented by an ES ES1) and operating in an All-Active redundancy mode, sends a BUM (i.e., Broadcast, Unknown unicast, or Multicast) packet to one of these NVEs, then it is important to ensure the packet is not looped back to the server via another NVE connected to this server. The filtering mechanism on the NVE to prevent such loop and packet duplication is called "split-horizon filtering".
8.1.4. Aliasing and Backup Path In the case where a station is multihomed to multiple NVEs, it is possible that only a single NVE learns a set of the MAC addresses associated with traffic transmitted by the station. This leads to a situation where remote NVEs receive MAC Advertisement routes, for these addresses, from a single NVE even though multiple NVEs are connected to the multihomed station. As a result, the remote NVEs are not able to effectively load-balance traffic among the NVEs connected to the multihomed ES. For example, this could be the case when the NVEs perform data-path learning on the access and the load- balancing function on the station hashes traffic from a given source MAC address to a single NVE. Another scenario where this occurs is when the NVEs rely on control-plane learning on the access (e.g., using ARP), since ARP traffic will be hashed to a single link in the LAG. To alleviate this issue, EVPN introduces the concept of "Aliasing". This refers to the ability of an NVE to signal that it has reachability to a given locally attached ES, even when it has learned no MAC addresses from that segment. The Ethernet A-D route per EVI is used to that end. Remote NVEs that receive MAC Advertisement routes with non-zero ESIs should consider the MAC address as reachable via all NVEs that advertise reachability to the relevant Segment using Ethernet A-D routes with the same ESI and with the Single-Active flag reset. Backup Path is a closely related function, albeit one that applies to the case where the redundancy mode is Single-Active. In this case, the NVE signals that it has reachability to a given locally attached ES using the Ethernet A-D route as well. Remote NVEs that receive the MAC Advertisement routes, with non-zero ESI, should consider the MAC address as reachable via the advertising NVE. Furthermore, the remote NVEs should install a Backup Path, for said MAC, to the NVE that had advertised reachability to the relevant segment using an Ethernet A-D route with the same ESI and with the Single-Active flag set. 8.1.5. DF Election If a host is multihomed to two or more NVEs on an ES operating in All-Active redundancy mode, then, for a given EVI, only one of these NVEs, termed the "Designated Forwarder" (DF) is responsible for sending it broadcast, multicast, and, if configured for that EVI, unknown unicast frames. This is required in order to prevent duplicate delivery of multi- destination frames to a multihomed host or VM, in case of All-Active redundancy. In NVEs where frames tagged as IEEE 802.1Q [IEEE.802.1Q] are received from hosts, the DF election should be performed based on host VIDs per Section 8.5 of [RFC7432]. Furthermore, multihoming PEs of a given ES MAY perform DF election using configured IDs such as VNI, EVI, normalized VIDs, and etc., as along the IDs are configured consistently across the multihoming PEs. In GWs where VXLAN-encapsulated frames are received, the DF election is performed on VNIs. Again, it is assumed that, for a given Ethernet segment, VNIs are unique and consistent (e.g., no duplicate VNIs exist).
8.3. Impact on EVPN Procedures Two cases need to be examined here, depending on whether the NVEs are operating in Single-Active or in All-Active redundancy mode. First, let's consider the case of Single-Active redundancy mode, where the hosts are multihomed to a set of NVEs; however, only a single NVE is active at a given point of time for a given VNI. In this case, the Aliasing is not required, and the split-horizon filtering may not be required, but other functions such as multihomed ES auto-discovery, fast convergence and mass withdrawal, Backup Path, and DF election are required. Second, let's consider the case of All-Active redundancy mode. In this case, out of all the EVPN multihoming features listed in Section 8.1, the use of the VXLAN or NVGRE encapsulation impacts the split-horizon and Aliasing features, since those two rely on the MPLS client layer. Given that this MPLS client layer is absent with these types of encapsulations, alternative procedures and mechanisms are needed to provide the required functions. Those are discussed in detail next. 8.3.1. Split Horizon In EVPN, an MPLS label is used for split-horizon filtering to support All-Active multihoming where an ingress NVE adds a label corresponding to the site of origin (aka an ESI label) when encapsulating the packet. The egress NVE checks the ESI label when attempting to forward a multi-destination frame out an interface, and if the label corresponds to the same site identifier (ESI) associated with that interface, the packet gets dropped. This prevents the occurrence of forwarding loops. Since VXLAN and NVGRE encapsulations do not include the ESI label, other means of performing the split-horizon filtering function must be devised for these encapsulations. The following approach is recommended for split-horizon filtering when VXLAN (or NVGRE) encapsulation is used. Every NVE tracks the IP address(es) associated with the other NVE(s) with which it has shared multihomed ESs. When the NVE receives a multi-destination frame from the overlay network, it examines the source IP address in the tunnel header (which corresponds to the ingress NVE) and filters out the frame on all local interfaces connected to ESs that are shared with the ingress NVE. With this approach, it is required that the ingress NVE perform replication locally to all directly attached Ethernet segments (regardless of the DF election state) for all flooded traffic ingress from the access interfaces (i.e., from the hosts). This approach is referred to as "Local Bias", and has the advantage that only a single IP address need be used per NVE for split-horizon filtering, as opposed to requiring an IP address per Ethernet segment per NVE. In order to allow proper operation of split-horizon filtering among the same group of multihoming PE devices, a mix of PE devices with MPLS over GRE encapsulations running the procedures from [RFC7432] for split-horizon filtering on the one hand and VXLAN/NVGRE encapsulation running local-bias procedures on the other on a given Ethernet segment MUST NOT be configured. 8.3.2. Aliasing and Backup Path The Aliasing and the Backup Path procedures for VXLAN/NVGRE encapsulation are very similar to the ones for MPLS. In the case of MPLS, Ethernet A-D route per EVI is used for Aliasing when the corresponding ES operates in All-Active multihoming, and the same route is used for Backup Path when the corresponding ES operates in Single-Active multihoming. In the case of VXLAN/NVGRE, the same route is used for the Aliasing and the Backup Path with the difference that the Ethernet Tag and VNI fields in Ethernet A-D per EVI route are set as described in Section 5.1.3. 8.3.3. Unknown Unicast Traffic Designation In EVPN, when an ingress PE uses ingress replication to flood unknown unicast traffic to egress PEs, the ingress PE uses a different EVPN MPLS label (from the one used for known unicast traffic) to identify such BUM traffic. The egress PEs use this label to identify such BUM traffic and, thus, apply DF filtering for All-Active multihomed sites. In absence of an unknown unicast traffic designation and in the presence of enabling unknown unicast flooding, there can be transient duplicate traffic to All-Active multihomed sites under the following condition: the host MAC address is learned by the egress PE(s) and advertised to the ingress PE; however, the MAC Advertisement has not been received or processed by the ingress PE, resulting in the host MAC address being unknown on the ingress PE but known on the egress PE(s). Therefore, when a packet destined to that host MAC address arrives on the ingress PE, it floods it via ingress replication to all the egress PE(s), and since they are known to the egress PE(s), multiple copies are sent to the All-Active multihomed site. It should be noted that such transient packet duplication only happens when a) the destination host is multihomed via All-Active redundancy mode, b) flooding of unknown unicast is enabled in the network, c) ingress replication is used, and d) traffic for the destination host is arrived on the ingress PE before it learns the host MAC address via BGP EVPN advertisement. If it is desired to avoid occurrence of such transient packet duplication (however low probability that may be), then VXLAN-GPE encapsulation needs to be used between these PEs and the ingress PE needs to set the BUM Traffic Bit (B bit) [VXLAN-GPE] to indicate that this is an ingress- replicated BUM traffic.

 


Previous page: BGP-ORR
Next page: EVPN with MPLS