EVPN with VXLAN and NVGRE

Date created: Tuesday, June 27, 2023 5:01:19 PM. Last modified: Monday, June 10, 2024 12:07:17 PM

References:

https://www.rfc-editor.org/rfc/rfc7348 - Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks
https://www.rfc-editor.org/rfc/rfc7432 - BGP MPLS-Based Ethernet VPN
https://www.rfc-editor.org/rfc/rfc7637 - NVGRE: Network Virtualization Using Generic Routing Encapsulation
https://www.rfc-editor.org/rfc/rfc8365 - A Network Virtualization Overlay Solution Using Ethernet VPN (EVPN)

EVPN Forwarding Plane

NVGRE Forwarding

From https://www.rfc-editor.org/rfc/rfc7637#section-3.2

3.2.  NVGRE Frame Format

   Outer Ethernet Header:
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                (Outer) Destination MAC Address                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |(Outer)Destination MAC Address |  (Outer)Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  (Outer) Source MAC Address                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Optional Ethertype=C-Tag 802.1Q| Outer VLAN Tag Information    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Ethertype 0x0800        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer IPv4 Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  HL   |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live | Protocol 0x2F |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      (Outer) Source Address                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  (Outer) Destination Address                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   GRE Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| |1|0|   Reserved0     | Ver |   Protocol Type 0x6558        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Virtual Subnet ID (VSID)        |    FlowID     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Inner Ethernet Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                (Inner) Destination MAC Address                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |(Inner)Destination MAC Address |  (Inner)Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  (Inner) Source MAC Address                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Ethertype 0x0800        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Inner IPv4 Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  HL   |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Original IP Payload                      |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 1: GRE Encapsulation Frame Format
…

   In the GRE header:

   o  The C (Checksum Present) and S (Sequence Number Present) bits in
      the GRE header MUST be zero.

   o  The K (Key Present) bit in the GRE header MUST be set to one.  The
      32-bit Key field in the GRE header is used to carry the Virtual
      Subnet ID (VSID) and the FlowID:

      -  Virtual Subnet ID (VSID): This is a 24-bit value that is used
         to identify the NVGRE-based Virtual Layer 2 Network.

      -  FlowID: This is an 8-bit value that is used to provide per-flow
         entropy for flows in the same VSID.  The FlowID MUST NOT be
         modified by transit devices.  The encapsulating NVE SHOULD
         provide as much entropy as possible in the FlowID.  If a FlowID
         is not generated, it MUST be set to all zeros.

   o  The Protocol Type field in the GRE header is set to 0x6558
      (Transparent Ethernet Bridging).
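The GRE header rules above fit in eight bytes. The following Python sketch shows one way to build and check them. It assumes a big-endian `struct` layout that mirrors the figure. The function names are illustrative and do not come from RFC 7637.

```python
import struct

GRE_C = 0x8000           # Checksum Present (bit 0): MUST be zero
GRE_K = 0x2000           # Key Present (bit 2): MUST be one
GRE_S = 0x1000           # Sequence Number Present (bit 3): MUST be zero
GRE_PROTO_TEB = 0x6558   # Transparent Ethernet Bridging


def build_nvgre_header(vsid: int, flow_id: int = 0) -> bytes:
    """8-byte GRE header for NVGRE: flags/version, protocol type, 32-bit Key."""
    if not 0 <= vsid < (1 << 24):
        raise ValueError("VSID must fit in 24 bits")
    if not 0 <= flow_id < (1 << 8):
        raise ValueError("FlowID must fit in 8 bits")
    key = (vsid << 8) | flow_id   # Key field = VSID (24 bits) | FlowID (8 bits)
    return struct.pack("!HHI", GRE_K, GRE_PROTO_TEB, key)


def parse_nvgre_header(hdr: bytes) -> tuple:
    """Return (VSID, FlowID) after validating the NVGRE-specific GRE rules."""
    flags_ver, proto, key = struct.unpack("!HHI", hdr[:8])
    if flags_ver & (GRE_C | GRE_S):
        raise ValueError("C and S bits must be zero")
    if not flags_ver & GRE_K:
        raise ValueError("K bit must be set")
    if proto != GRE_PROTO_TEB:
        raise ValueError("protocol type is not 0x6558")
    return key >> 8, key & 0xFF
```

For example, VSID 5000 with FlowID 0x2A encodes as `20 00 65 58 00 13 88 2a`. A transit device that parses the header must leave the FlowID byte untouched.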

VXLAN Forwarding

From https://www.rfc-editor.org/rfc/rfc7348#section-5

5.  VXLAN Frame Format

Note: VXLAN can be carried over IPv4 or IPv6, but only the IPv4 header stack is shown below for brevity.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

   Outer Ethernet Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Outer Destination MAC Address                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Outer Destination MAC Address | Outer Source MAC Address      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Outer Source MAC Address                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |OptnlEthtype = C-Tag 802.1Q    | Outer.VLAN Tag Information    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Ethertype = 0x0800            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer IPv4 Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |Protocl=17(UDP)|   Header Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Outer Source IPv4 Address               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Outer Destination IPv4 Address              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer UDP Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Source Port         |       Dest Port = VXLAN Port  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           UDP Length          |        UDP Checksum           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   VXLAN Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R|R|R|R|I|R|R|R|            Reserved                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                VXLAN Network Identifier (VNI) |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Inner Ethernet Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Inner Destination MAC Address                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Inner Destination MAC Address | Inner Source MAC Address      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Inner Source MAC Address                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |OptnlEthtype = C-Tag 802.1Q    | Inner.VLAN Tag Information    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Payload:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Ethertype of Original Payload |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                                  Original Ethernet Payload    |
   |                                                               |
   |(Note that the original Ethernet Frame's FCS is not included)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Frame Check Sequence:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   New FCS (Frame Check Sequence) for Outer Ethernet Frame     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure 1: VXLAN Frame Format with IPv4 Outer Header

   VXLAN Header:  This is an 8-byte field that has:

      - Flags (8 bits): where the I flag MUST be set to 1 for a valid
        VXLAN Network ID (VNI).  The other 7 bits (designated "R") are
        reserved fields and MUST be set to zero on transmission and
        ignored on receipt.

      - VXLAN Segment ID/VXLAN Network Identifier (VNI): this is a
        24-bit value used to designate the individual VXLAN overlay
        network on which the communicating VMs are situated.  VMs in
        different VXLAN overlay networks cannot communicate with each
        other.

      - Reserved fields (24 bits and 8 bits): MUST be set to zero on
        transmission and ignored on receipt.

   Outer UDP Header:  This is the outer UDP header with a source port
      provided by the VTEP and the destination port being a well-known
      UDP port.

      -  Destination Port: IANA has assigned the value 4789 for the
         VXLAN UDP port, and this value SHOULD be used by default as the
         destination UDP port.  Some early implementations of VXLAN have
         used other values for the destination port.  To enable
         interoperability with these implementations, the destination
         port SHOULD be configurable.

      -  Source Port:  It is recommended that the UDP source port number
         be calculated using a hash of fields from the inner packet --
         one example being a hash of the inner Ethernet frame's headers.
         This is to enable a level of entropy for the ECMP/load-
         balancing of the VM-to-VM traffic across the VXLAN overlay.
         When calculating the UDP source port number in this manner, it
         is RECOMMENDED that the value be in the dynamic/private port
         range 49152-65535.

      -  UDP Checksum: It SHOULD be transmitted as zero.  When a packet
         is received with a UDP checksum of zero, it MUST be accepted
         for decapsulation.  Optionally, if the encapsulating end point
         includes a non-zero UDP checksum, it MUST be correctly
         calculated across the entire packet including the IP header,
         UDP header, VXLAN header, and encapsulated MAC frame.  When a
         decapsulating end point receives a packet with a non-zero
         checksum, it MAY choose to verify the checksum value.  If it
         chooses to perform such verification, and the verification
         fails, the packet MUST be dropped.  If the decapsulating
         destination chooses not to perform the verification, or
         performs it successfully, the packet MUST be accepted for
         decapsulation.
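The VXLAN header and outer UDP rules above can be sketched in Python. This is a minimal illustration and not a full encapsulator. It builds only the UDP and VXLAN headers and skips the outer IP and Ethernet layers. RFC 7348 only recommends hashing inner-frame fields for the source port. The CRC-32 hash over the first 14 bytes of the inner frame used here is an assumption.

```python
import struct
import zlib

VXLAN_PORT = 4789    # IANA-assigned VXLAN destination port
FLAG_I = 0x08        # I flag: fifth bit of the first octet (|R|R|R|R|I|R|R|R|)


def vxlan_header(vni: int) -> bytes:
    """8-byte VXLAN header: flags (I=1), 24 reserved bits, 24-bit VNI, 8 reserved bits."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", FLAG_I << 24, vni << 8)


def parse_vxlan_header(hdr: bytes) -> int:
    """Return the VNI; R bits and reserved fields are ignored on receipt."""
    word0, word1 = struct.unpack("!II", hdr[:8])
    if not (word0 >> 24) & FLAG_I:
        raise ValueError("I flag not set: no valid VNI")
    return word1 >> 8


def source_port(inner_frame: bytes) -> int:
    """Entropy port in 49152-65535 from a hash of the inner Ethernet header.

    Hashing the first 14 bytes (destination MAC, source MAC, EtherType/TPID)
    with CRC-32 is an assumption; any hash of inner-frame fields would do.
    """
    return 49152 + zlib.crc32(inner_frame[:14]) % 16384


def encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Outer UDP header + VXLAN header + inner frame (without its FCS)."""
    payload = vxlan_header(vni) + inner_frame
    udp = struct.pack("!HHHH",
                      source_port(inner_frame),
                      VXLAN_PORT,
                      8 + len(payload),   # UDP length includes its own 8-byte header
                      0)                  # checksum SHOULD be transmitted as zero
    return udp + payload
```

Frames from the same inner flow hash to the same source port, so they stay on one ECMP path. Different flows spread across the underlay.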

EVPN Control Plane

EVPN with VXLAN

The RFC 8365 approach, "A Network Virtualization Overlay Solution Using Ethernet VPN (EVPN)": BGP-signalled EVPN with VXLAN or NVGRE transport

From https://www.rfc-editor.org/rfc/rfc8365

   VXLAN encapsulation is based on UDP, with an 8-byte header following
   the UDP header.  VXLAN provides a 24-bit VNI, which typically
   provides a one-to-one mapping to the tenant VID, as described in
   [RFC7348].  In this scenario, the ingress VTEP does not include an
   inner VLAN tag on the encapsulated frame, and the egress VTEP
   discards the frames with an inner VLAN tag.  This mode of operation
   in [RFC7348] maps to VLAN-Based Service in [RFC7432], where a tenant
   VID gets mapped to an EVI.

   VXLAN also provides an option of including an inner VLAN tag in the
   encapsulated frame, if explicitly configured at the VTEP.  This mode
   of operation can map to VLAN Bundle Service in [RFC7432] because all
   the tenant's tagged frames map to a single bridge table / MAC-VRF,
   and the inner VLAN tag is not used for lookup by the disposition PE
   when performing VXLAN decapsulation as described in Section 6 of
   [RFC7348].

   [RFC7637] encapsulation is based on GRE encapsulation, and it
   mandates the inclusion of the optional GRE Key field, which carries
   the VSID.  There is a one-to-one mapping between the VSID and the
   tenant VID, as described in [RFC7637].  The inclusion of an inner
   VLAN tag is prohibited.  This mode of operation in [RFC7637] maps to
   VLAN Based Service in [RFC7432].

   As described in the next section, there is no change to the encoding
   of EVPN routes to support VXLAN or NVGRE encapsulation, except for
   the use of the BGP Encapsulation Extended Community to indicate the
   encapsulation type (e.g., VXLAN or NVGRE).  However, there is
   potential impact to the EVPN procedures depending on where the NVE is
   located (i.e., in hypervisor or ToR) and whether multihoming
   capabilities are required.

5.1.2.  Virtual Identifiers to EVI Mapping

   Just like in [RFC7432], where two options existed for mapping
   broadcast domains (represented by VLAN IDs) to an EVI, when the EVPN
   control plane is used in conjunction with VXLAN (or NVGRE
   encapsulation), there are also two options for mapping broadcast
   domains represented by VXLAN VNIs (or NVGRE VSIDs) to an EVI:

      Option 1: A Single Broadcast Domain per EVI

   In this option, a single Ethernet broadcast domain (e.g., subnet)
   represented by a VNI is mapped to a unique EVI.  This corresponds to
   the VLAN-Based Service in [RFC7432], where a tenant-facing interface,
   logical interface (e.g., represented by a VID), or physical interface
   gets mapped to an EVI.  As such, a BGP Route Distinguisher (RD) and
   Route Target (RT) are needed per VNI on every NVE.  The advantage of
   this model is that it allows the BGP RT constraint mechanisms to be
   used in order to limit the propagation and import of routes to only
   the NVEs that are interested in a given VNI.  The disadvantage of
   this model may be the provisioning overhead if the RD and RT are not
   derived automatically from the VNI.

   In this option, the MAC-VRF table is identified by the RT in the
   control plane and by the VNI in the data plane.  In this option, the
   specific MAC-VRF table corresponds to only a single bridge table.

      Option 2: Multiple Broadcast Domains per EVI

   In this option, multiple subnets, each represented by a unique VNI,
   are mapped to a single EVI.  For example, if a tenant has multiple
   segments/subnets each represented by a VNI, then all the VNIs for
   that tenant are mapped to a single EVI; for example, the EVI in this
   case represents the tenant and not a subnet.  This corresponds to the
   VLAN-aware bundle service in [RFC7432].  The advantage of this model
   is that it doesn't require the provisioning of an RD/RT per VNI.
   However, this is a moot point when compared to Option 1 where auto-
   derivation is used.  The disadvantage of this model is that routes
   would be imported by NVEs that may not be interested in a given VNI.

   In this option, the MAC-VRF table is identified by the RT in the
   control plane; a specific bridge table for that MAC-VRF is identified
   by the <RT, Ethernet Tag ID> in the control plane.  In this option,
   the VNI in the data plane is sufficient to identify a specific bridge
   table.

5.1.2.1.  Auto-Derivation of RT

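RFC 8365 allows the RT to be auto-derived from the VNI when a single VNI is mapped to each EVI (Option 1). This removes the per-VNI RT provisioning overhead noted above.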
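The following Python sketch follows the RT layout in RFC 8365 Section 5.1.2.1, which applies to 2-octet AS numbers only. The Global Administrator field holds the AS number. The 4-octet Local Administrator field packs these subfields:

- A: 1 bit, 0 = auto-derived.
- Type: 3 bits, 1 = VXLAN, 2 = NVGRE.
- D-ID: 4 bits, a domain ID that defaults to 0.
- Service ID: 24 bits, the VNI or VSID.

The function name and the `AS:number` string output are illustrative choices.

```python
TYPE_VXLAN = 1
TYPE_NVGRE = 2


def auto_rt(asn: int, service_id: int, rt_type: int = TYPE_VXLAN, domain_id: int = 0) -> str:
    """Auto-derive an EVPN RT as '<AS>:<Local Administrator>'.

    Local Administrator (32 bits) = | A (1) | Type (3) | D-ID (4) | Service ID (24) |
    """
    if not 0 < asn < (1 << 16):
        raise ValueError("auto-derivation applies to 2-octet AS numbers only")
    if not 0 <= rt_type < 8 or not 0 <= domain_id < 16:
        raise ValueError("Type is 3 bits and D-ID is 4 bits")
    if not 0 <= service_id < (1 << 24):
        raise ValueError("service ID (VNI/VSID) must fit in 24 bits")
    a_bit = 0  # 0 = auto-derived
    local_admin = (a_bit << 31) | (rt_type << 28) | (domain_id << 24) | service_id
    return f"{asn}:{local_admin}"
```

For AS 65000 and VNI 100, this yields `65000:268435556`, that is, 0x10000064. Because the Type subfield is part of the value, a VXLAN VNI and an NVGRE VSID with the same number still produce different RTs.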


5.1.3.  Constructing EVPN BGP Routes

   In EVPN, an MPLS label, for instance, identifying the forwarding
   table is distributed by the egress PE via the EVPN control plane and
   is placed in the MPLS header of a given packet by the ingress PE.
   This label is used upon receipt of that packet by the egress PE for
   disposition of that packet.  This is very similar to the use of the
   VNI by the egress NVE, with the difference being that an MPLS label
   has local significance while a VNI typically has global significance.
   Accordingly, and specifically to support the option of locally
   assigned VNIs, the MPLS Label1 field in the MAC/IP Advertisement
   route, the MPLS label field in the Ethernet A-D per EVI route, and
   the MPLS label field in the P-Multicast Service Interface (PMSI)
   Tunnel attribute of the Inclusive Multicast Ethernet Tag (IMET) route
   are used to carry the VNI.  For the balance of this memo, the above
   MPLS label fields will be referred to as the VNI field.  The VNI
   field is used for both local and global VNIs; for either case, the
   entire 24-bit field is used to encode the VNI value.
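To contrast the two uses of the same 3-octet field, the following Python sketch encodes it both ways. RFC 7432 places a 20-bit MPLS label in the high-order bits. RFC 8365 uses all 24 bits for the VNI. The low-order 4 bits of the MPLS case are left as zero here, which is a simplification. The function names are illustrative.

```python
def encode_mpls_label_field(label: int) -> bytes:
    """RFC 7432: 20-bit label in the high-order bits of the 3-octet field."""
    if not 0 <= label < (1 << 20):
        raise ValueError("MPLS label must fit in 20 bits")
    return (label << 4).to_bytes(3, "big")


def encode_vni_field(vni: int) -> bytes:
    """RFC 8365: the entire 24-bit field carries the VNI."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    return vni.to_bytes(3, "big")


def decode_label_field(field: bytes, carries_vni: bool) -> int:
    value = int.from_bytes(field, "big")
    return value if carries_vni else value >> 4
```

Decoding a VNI-carrying field as an MPLS label would silently shift away its low four bits. For this reason, the receiver must learn from the advertised encapsulation how to read the field.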

   For the VLAN-Based Service (a single VNI per MAC-VRF), the Ethernet
   Tag field in the MAC/IP Advertisement, Ethernet A-D per EVI, and IMET
   route MUST be set to zero just as in the VLAN-Based Service in
   [RFC7432].

   For the VLAN-Aware Bundle Service (multiple VNIs per MAC-VRF with
   each VNI associated with its own bridge table), the Ethernet Tag
   field in the MAC Advertisement, Ethernet A-D per EVI, and IMET route
   MUST identify a bridge table within a MAC-VRF; the set of Ethernet
   Tags for that EVI needs to be configured consistently on all PEs
   within that EVI.  For locally assigned VNIs, the value advertised in
   the Ethernet Tag field MUST be set to a VID just as in the VLAN-aware
   bundle service in [RFC7432].  Such setting must be done consistently
   on all PE devices participating in that EVI within a given domain.
   For global VNIs, the value advertised in the Ethernet Tag field
   SHOULD be set to a VNI as long as it matches the existing semantics
   of the Ethernet Tag, i.e., it identifies a bridge table within a
   MAC-VRF and the set of VNIs are configured consistently on each PE in
   that EVI.
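The Ethernet Tag rules above reduce to a small decision. The Python sketch below shows it; the `Service` enum and the function signature are hypothetical.

```python
from enum import Enum
from typing import Optional


class Service(Enum):
    VLAN_BASED = "single VNI per MAC-VRF"
    VLAN_AWARE_BUNDLE = "multiple VNIs per MAC-VRF"


def ethernet_tag(service: Service, vni: int, vid: Optional[int], global_vni: bool) -> int:
    """Ethernet Tag for MAC/IP Advertisement, Ethernet A-D per EVI, and IMET routes."""
    if service is Service.VLAN_BASED:
        return 0      # MUST be zero, as in the RFC 7432 VLAN-Based Service
    if global_vni:
        return vni    # SHOULD be the VNI when VNIs are globally significant
    if vid is None:
        raise ValueError("locally assigned VNIs require a VID configured consistently on all PEs")
    return vid        # MUST be a VID, as in the RFC 7432 VLAN-aware bundle service
```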

   In order to indicate which type of data-plane encapsulation (i.e.,
   VXLAN, NVGRE, MPLS, or MPLS in GRE) is to be used, the BGP
   Encapsulation Extended Community defined in [RFC5512] is included
   with all EVPN routes (i.e., MAC Advertisement, Ethernet A-D per EVI,
   Ethernet A-D per ESI, IMET, and Ethernet Segment) advertised by an
   egress PE.  Five new values have been assigned by IANA to extend the
   list of encapsulation types defined in [RFC5512]; they are listed in
   Section 11.

   The MPLS encapsulation tunnel type, listed in Section 11, is needed
   in order to distinguish between an advertising node that only
   supports non-MPLS encapsulations and one that supports MPLS and
   non-MPLS encapsulations.  An advertising node that only supports MPLS
   encapsulation does not need to advertise any encapsulation tunnel
   types; i.e., if the BGP Encapsulation Extended Community is not
   present, then either MPLS encapsulation or a statically configured
   encapsulation is assumed.
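The BGP Encapsulation Extended Community from RFC 5512 is an 8-octet transitive opaque extended community with these fields:

- Type 0x03.
- Sub-type 0x0c.
- 4 reserved octets.
- A 2-octet Tunnel Type.

The tunnel types assigned for RFC 8365 are 8 (VXLAN), 9 (NVGRE), 10 (MPLS), 11 (MPLS in GRE), and 12 (VXLAN GPE). The Python sketch below encodes the community and reads the advertised encapsulations from a route. It reduces the "MPLS or statically configured" fallback to plain MPLS, which is a simplification.

```python
import struct

TUNNEL_TYPES = {8: "VXLAN", 9: "NVGRE", 10: "MPLS", 11: "MPLS in GRE", 12: "VXLAN GPE"}


def encap_ext_community(tunnel_type: int) -> bytes:
    """Type 0x03, sub-type 0x0c, 4 reserved octets, 2-octet Tunnel Type."""
    return struct.pack("!BB4xH", 0x03, 0x0C, tunnel_type)


def advertised_encaps(ext_communities: list) -> set:
    """Collect encapsulations from a route's 8-octet extended communities."""
    found = set()
    for ec in ext_communities:
        ec_type, subtype, tunnel_type = struct.unpack("!BB4xH", ec)
        if (ec_type, subtype) == (0x03, 0x0C):  # skip RTs and other communities
            found.add(TUNNEL_TYPES.get(tunnel_type, f"unknown({tunnel_type})"))
    # No Encapsulation Extended Community: MPLS (or a static config) is assumed
    return found or {"MPLS"}
```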

   The Next Hop field of the MP_REACH_NLRI attribute of the route MUST
   be set to the IPv4 or IPv6 address of the NVE.  The remaining fields
   in each route are set as per [RFC7432].

   Note that the procedure defined here -- to use the MPLS Label field
   to carry the VNI in the presence of a Tunnel Encapsulation Extended
   Community specifying the use of a VNI -- is aligned with the
   procedures described in Section 8.2.2.2 of [TUNNEL-ENCAP] ("When a
   Valid VNI has not been Signaled").

6.  EVPN with Multiple Data-Plane Encapsulations

   The use of the BGP Encapsulation Extended Community per [RFC5512]
   allows each NVE in a given EVI to know each of the encapsulations
   supported by each of the other NVEs in that EVI.
...
   When a PE advertises multiple supported encapsulations, it MUST
   advertise encapsulations that use the same EVPN procedures including
   procedures associated with split-horizon filtering described in
   Section 8.3.1.  For example, VXLAN and NVGRE (or MPLS and MPLS over
   GRE) encapsulations use the same EVPN procedures; thus, a PE can
   advertise both of them and can support either of them or both of them
   simultaneously.  However, a PE MUST NOT advertise VXLAN and MPLS
   encapsulations together because (a) the MPLS field of EVPN routes is
   set to either an MPLS label or a VNI, but not both and (b) some EVPN
   procedures (such as split-horizon filtering) are different for VXLAN/
   NVGRE and MPLS encapsulations.
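The advertisement rule above amounts to a subset check. The following Python sketch covers only the four encapsulations named in the text. The family names are hypothetical.

```python
# Encapsulations that share the same EVPN procedures (VNI vs. label, split horizon)
VNI_FAMILY = {"VXLAN", "NVGRE"}
MPLS_FAMILY = {"MPLS", "MPLS in GRE"}


def valid_advertisement(encaps: set) -> bool:
    """True if a PE may advertise this set of encapsulations together."""
    return encaps <= VNI_FAMILY or encaps <= MPLS_FAMILY
```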

8.  Multihoming NVEs - NVE Residing in ToR Switch

   In this section, we discuss the scenario where the NVEs reside in the
   ToR switches AND the servers (where VMs are residing) are multihomed
   to these ToR switches.  The multihoming NVE operates in All-Active or
   Single-Active redundancy mode.
...

8.1.1.  Multihomed ES Auto-Discovery

   EVPN NVEs (or PEs) connected to the same ES (e.g., the same server
   via Link Aggregation Group (LAG)) can automatically discover each
   other with minimal to no configuration through the exchange of BGP
   routes.

8.1.2.  Fast Convergence and Mass Withdrawal

   EVPN defines a mechanism to efficiently and quickly signal, to remote
   NVEs, the need to update their forwarding tables upon the occurrence
   of a failure in connectivity to an ES (e.g., a link or a port
   failure).  This is done by having each NVE advertise an Ethernet A-D
   route per ES for each locally attached segment.  Upon a failure in
   connectivity to the attached segment, the NVE withdraws the
   corresponding Ethernet A-D route.  This triggers all NVEs that
   receive the withdrawal to update their next-hop adjacencies for all
   MAC addresses associated with the ES in question.  If no other NVE
   had advertised an Ethernet A-D route for the same segment, then the
   NVE that received the withdrawal simply invalidates the MAC entries
   for that segment.  Otherwise, the NVE updates the next-hop adjacency
   list accordingly.
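Mass withdrawal is efficient because remote NVEs associate MAC next hops with an ESI. A single route withdrawal therefore updates every MAC on that ES. The following Python sketch illustrates this with a hypothetical model. It simplifies by using the set of NVEs that currently advertise an Ethernet A-D route per ES as each MAC's next-hop list.

```python
from collections import defaultdict


class RemoteNveView:
    """Hypothetical remote-NVE state for MACs behind multihomed ESs.

    mac_esi: MAC -> ESI it was advertised with
    es_nves: ESI -> NVEs currently advertising an Ethernet A-D route per ES
    """

    def __init__(self):
        self.mac_esi = {}
        self.es_nves = defaultdict(set)

    def add_ad_per_es(self, esi, nve):
        self.es_nves[esi].add(nve)

    def learn_mac(self, mac, esi):
        self.mac_esi[mac] = esi

    def next_hops(self, mac):
        esi = self.mac_esi.get(mac)
        return set(self.es_nves.get(esi, set()))

    def withdraw_ad_per_es(self, esi, nve):
        """A single withdrawal updates every MAC on the ES at once."""
        self.es_nves[esi].discard(nve)
        if not self.es_nves[esi]:
            # No NVE left on this ES: invalidate all of its MAC entries
            del self.es_nves[esi]
            for mac in [m for m, e in self.mac_esi.items() if e == esi]:
                del self.mac_esi[mac]
```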

8.1.3.  Split-Horizon

   If a server is multihomed to two or more NVEs (represented by an ES
   ES1) and operating in an All-Active redundancy mode, sends a BUM
   (i.e., Broadcast, Unknown unicast, or Multicast) packet to one of
   these NVEs, then it is important to ensure the packet is not looped
   back to the server via another NVE connected to this server.  The
   filtering mechanism on the NVE to prevent such loop and packet
   duplication is called "split-horizon filtering".

8.1.4.  Aliasing and Backup Path

   In the case where a station is multihomed to multiple NVEs, it is
   possible that only a single NVE learns a set of the MAC addresses
   associated with traffic transmitted by the station.  This leads to a
   situation where remote NVEs receive MAC Advertisement routes, for
   these addresses, from a single NVE even though multiple NVEs are
   connected to the multihomed station.  As a result, the remote NVEs
   are not able to effectively load-balance traffic among the NVEs
   connected to the multihomed ES.  For example, this could be the case
   when the NVEs perform data-path learning on the access and the load-
   balancing function on the station hashes traffic from a given source
   MAC address to a single NVE.  Another scenario where this occurs is
   when the NVEs rely on control-plane learning on the access (e.g.,
   using ARP), since ARP traffic will be hashed to a single link in the
   LAG.

   To alleviate this issue, EVPN introduces the concept of "Aliasing".
   This refers to the ability of an NVE to signal that it has
   reachability to a given locally attached ES, even when it has learned
   no MAC addresses from that segment.  The Ethernet A-D route per EVI
   is used to that end.  Remote NVEs that receive MAC Advertisement
   routes with non-zero ESIs should consider the MAC address as
   reachable via all NVEs that advertise reachability to the relevant
   Segment using Ethernet A-D routes with the same ESI and with the
   Single-Active flag reset.

   Backup Path is a closely related function, albeit one that applies to
   the case where the redundancy mode is Single-Active.  In this case,
   the NVE signals that it has reachability to a given locally attached
   ES using the Ethernet A-D route as well.  Remote NVEs that receive
   the MAC Advertisement routes, with non-zero ESI, should consider the
   MAC address as reachable via the advertising NVE.  Furthermore, the
   remote NVEs should install a Backup Path, for said MAC, to the NVE
   that had advertised reachability to the relevant segment using an
   Ethernet A-D route with the same ESI and with the Single-Active flag
   set.
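The following Python sketch shows how a remote NVE might turn Ethernet A-D routes into Aliasing or Backup Path next hops for a MAC advertised with a non-zero ESI. The `AdRoute` record and the `resolve_paths` function are hypothetical simplifications. In particular, the sketch assumes that all NVEs on an ES report the same redundancy mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdRoute:
    nve: str
    esi: str
    single_active: bool   # Single-Active flag as signalled for this ES


def resolve_paths(mac_nve: str, mac_esi: str, ad_routes: list) -> tuple:
    """Return (primary next hops, backup next hops) for a MAC with a non-zero ESI."""
    same_es = [r for r in ad_routes if r.esi == mac_esi]
    all_active = {r.nve for r in same_es if not r.single_active}
    if all_active:
        # Aliasing: reachable via every NVE on the ES with the flag reset
        return all_active | {mac_nve}, set()
    # Backup Path: primary is the advertising NVE, others on the ES are backups
    backups = {r.nve for r in same_es if r.single_active and r.nve != mac_nve}
    return {mac_nve}, backups
```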

8.1.5.  DF Election

   If a host is multihomed to two or more NVEs on an ES operating in
   All-Active redundancy mode, then, for a given EVI, only one of these
   NVEs, termed the "Designated Forwarder" (DF) is responsible for
   sending it broadcast, multicast, and, if configured for that EVI,
   unknown unicast frames.

   This is required in order to prevent duplicate delivery of multi-
   destination frames to a multihomed host or VM, in case of All-Active
   redundancy.

   In NVEs where frames tagged as IEEE 802.1Q [IEEE.802.1Q] are received
   from hosts, the DF election should be performed based on host VIDs
   per Section 8.5 of [RFC7432].  Furthermore, multihoming PEs of a
   given ES MAY perform DF election using configured IDs such as VNI,
   EVI, normalized VIDs, etc., as long as the IDs are configured
   consistently across the multihoming PEs.

   In GWs where VXLAN-encapsulated frames are received, the DF election
   is performed on VNIs.  Again, it is assumed that, for a given
   Ethernet segment, VNIs are unique and consistent (e.g., no duplicate
   VNIs exist).
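The default DF election in Section 8.5 of RFC 7432 is modulus-based. The PEs attached to an ES are ordered by increasing IP address with ordinals starting at 0. For service ID V, the DF is the PE with ordinal V mod N. The following Python sketch applies that procedure with the VNI as V. The function name is illustrative.

```python
import ipaddress


def elect_df(pe_addresses: list, service_id: int) -> str:
    """Modulus-based DF for one ES, keyed on a VNI (or VID)."""
    # Sort numerically, not as strings ("10.0.0.9" < "10.0.0.10")
    ordered = sorted(pe_addresses, key=ipaddress.ip_address)
    return ordered[service_id % len(ordered)]
```

With PEs 10.0.0.1 and 10.0.0.2, even VNIs elect 10.0.0.1 and odd VNIs elect 10.0.0.2. This election only gives consistent results if every PE on the ES sees the same VNIs, which is why they must be unique and consistent.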

8.3.  Impact on EVPN Procedures

   Two cases need to be examined here, depending on whether the NVEs are
   operating in Single-Active or in All-Active redundancy mode.

   First, let's consider the case of Single-Active redundancy mode,
   where the hosts are multihomed to a set of NVEs; however, only a
   single NVE is active at a given point of time for a given VNI.  In
   this case, the Aliasing is not required, and the split-horizon
   filtering may not be required, but other functions such as multihomed
   ES auto-discovery, fast convergence and mass withdrawal, Backup Path,
   and DF election are required.

   Second, let's consider the case of All-Active redundancy mode.  In
   this case, out of all the EVPN multihoming features listed in
   Section 8.1, the use of the VXLAN or NVGRE encapsulation impacts the
   split-horizon and Aliasing features, since those two rely on the MPLS
   client layer.  Given that this MPLS client layer is absent with these
   types of encapsulations, alternative procedures and mechanisms are
   needed to provide the required functions.  Those are discussed in
   detail next.

8.3.1.  Split Horizon

   In EVPN, an MPLS label is used for split-horizon filtering to support
   All-Active multihoming where an ingress NVE adds a label
   corresponding to the site of origin (aka an ESI label) when
   encapsulating the packet.  The egress NVE checks the ESI label when
   attempting to forward a multi-destination frame out an interface, and
   if the label corresponds to the same site identifier (ESI) associated
   with that interface, the packet gets dropped.  This prevents the
   occurrence of forwarding loops.

   Since VXLAN and NVGRE encapsulations do not include the ESI label,
   other means of performing the split-horizon filtering function must
   be devised for these encapsulations.  The following approach is
   recommended for split-horizon filtering when VXLAN (or NVGRE)
   encapsulation is used.

   Every NVE tracks the IP address(es) associated with the other NVE(s)
   with which it has shared multihomed ESs.  When the NVE receives a
   multi-destination frame from the overlay network, it examines the
   source IP address in the tunnel header (which corresponds to the
   ingress NVE) and filters out the frame on all local interfaces
   connected to ESs that are shared with the ingress NVE.  With this
   approach, it is required that the ingress NVE perform replication
   locally to all directly attached Ethernet segments (regardless of the
   DF election state) for all flooded traffic ingress from the access
   interfaces (i.e., from the hosts).  This approach is referred to as
   "Local Bias", and has the advantage that only a single IP address
   need be used per NVE for split-horizon filtering, as opposed to
   requiring an IP address per Ethernet segment per NVE.
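Local bias can be sketched in Python as two flooding decisions on one NVE. The `LocalBiasNve` class is hypothetical, and DF filtering on ESs that are not shared with the ingress NVE is left out.

```python
from dataclasses import dataclass, field


@dataclass
class LocalBiasNve:
    # ESI -> IPs of the other NVEs sharing this ES (empty set if single-homed)
    es_peers: dict = field(default_factory=dict)

    def flood_from_access(self, in_esi: str) -> set:
        """Ingress side: replicate to every other local ES, regardless of DF state."""
        return {esi for esi in self.es_peers if esi != in_esi}

    def flood_from_overlay(self, tunnel_src_ip: str) -> set:
        """Egress side: drop on every local ES shared with the ingress NVE."""
        return {esi for esi, peers in self.es_peers.items() if tunnel_src_ip not in peers}
```

The only state needed is one IP address per peer NVE. The outer source IP of the tunnel therefore identifies which ESs to skip.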

   In order to allow proper operation of split-horizon filtering among
   the same group of multihoming PE devices, a mix of PE devices with
   MPLS over GRE encapsulations running the procedures from [RFC7432]
   for split-horizon filtering on the one hand and VXLAN/NVGRE
   encapsulation running local-bias procedures on the other on a given
   Ethernet segment MUST NOT be configured.

8.3.2.  Aliasing and Backup Path

   The Aliasing and the Backup Path procedures for VXLAN/NVGRE
   encapsulation are very similar to the ones for MPLS.  In the case of
   MPLS, Ethernet A-D route per EVI is used for Aliasing when the
   corresponding ES operates in All-Active multihoming, and the same
   route is used for Backup Path when the corresponding ES operates in
   Single-Active multihoming.  In the case of VXLAN/NVGRE, the same
   route is used for the Aliasing and the Backup Path with the
   difference that the Ethernet Tag and VNI fields in Ethernet A-D per
   EVI route are set as described in Section 5.1.3.

8.3.3.  Unknown Unicast Traffic Designation

   In EVPN, when an ingress PE uses ingress replication to flood unknown
   unicast traffic to egress PEs, the ingress PE uses a different EVPN
   MPLS label (from the one used for known unicast traffic) to identify
   such BUM traffic.  The egress PEs use this label to identify such BUM
   traffic and, thus, apply DF filtering for All-Active multihomed
   sites.  In absence of an unknown unicast traffic designation and in
   the presence of enabling unknown unicast flooding, there can be
   transient duplicate traffic to All-Active multihomed sites under the
   following condition: the host MAC address is learned by the egress
   PE(s) and advertised to the ingress PE; however, the MAC
   Advertisement has not been received or processed by the ingress PE,
   resulting in the host MAC address being unknown on the ingress PE but
   known on the egress PE(s).  Therefore, when a packet destined to that
   host MAC address arrives on the ingress PE, it floods it via ingress
   replication to all the egress PE(s), and since they are known to the
   egress PE(s), multiple copies are sent to the All-Active multihomed
   site.  It should be noted that such transient packet duplication only
   happens when a) the destination host is multihomed via All-Active
   redundancy mode, b) flooding of unknown unicast is enabled in the
   network, c) ingress replication is used, and d) traffic for the
   destination host arrives at the ingress PE before it learns the
   host MAC address via BGP EVPN advertisement.  If it is desired to
   avoid occurrence of such transient packet duplication (however low
   probability that may be), then VXLAN-GPE encapsulation needs to be
   used between these PEs and the ingress PE needs to set the BUM
   Traffic Bit (B bit) [VXLAN-GPE] to indicate that this is an ingress-
   replicated BUM traffic.
