Date created: Tuesday, November 21, 2023 3:40:20 PM. Last modified: Sunday, December 3, 2023 4:45:28 PM

VXLAN Recap

References:

https://www.rfc-editor.org/rfc/rfc7348 - Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks

 

Pure VXLAN

   VXLAN is a Layer 2 overlay scheme on a Layer 3 network.  Each overlay
   is termed a VXLAN segment.  Only VMs within the same VXLAN segment can
   communicate with each other.  Each VXLAN segment is identified through
   a 24-bit segment ID, termed the "VXLAN Network Identifier (VNI)".
   This allows up to 16 M (2^24) VXLAN segments to coexist within the
   same administrative domain.

   The VNI identifies the scope of the inner MAC frame originated by the
   individual VM.  Thus, you could have overlapping MAC addresses across
   segments but never have traffic "cross over" since the traffic is
   isolated using the VNI.  The VNI is in an outer header that
   encapsulates the inner MAC frame originated by the VM.  In the
   following sections, the term "VXLAN segment" is used interchangeably
   with the term "VXLAN overlay network".
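   As a rough illustration of where the VNI sits, the sketch below packs
   and unpacks the 8-byte VXLAN header as laid out in RFC 7348, Section 5
   (8 flag bits with the "I" bit set, 24 reserved bits, the 24-bit VNI,
   and 8 reserved bits).  The function names and the VNI value 5000 are
   made up for this example.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08   # the "I" flag: the VNI field is valid
MAX_VNI = (1 << 24) - 1       # 24-bit VNI, so 16,777,216 possible segments


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given VNI."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError(f"VNI {vni} does not fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Return the VNI from an 8-byte VXLAN header, checking the I flag."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("I flag not set; VNI is not valid")
    return word2 >> 8


header = pack_vxlan_header(5000)   # 5000 is an arbitrary example VNI
```

   The inner MAC frame would follow these 8 bytes unchanged, which is why
   the same MAC address can appear in two segments without conflict.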

   Due to this encapsulation, VXLAN could also be called a tunneling
   scheme to overlay Layer 2 networks on top of Layer 3 networks.  The
   tunnels are stateless, so each frame is encapsulated according to a
   set of rules.  The end point of the tunnel (VXLAN Tunnel End Point or
   VTEP) discussed in the following sections is located within the
   hypervisor on the server that hosts the VM.  Thus, the VNI- and
   VXLAN-related tunnel / outer header encapsulation are known only to
   the VTEP -- the VM never sees it (see Figure 1 of RFC 7348).  Note
   that VTEPs could also be on a physical switch or physical server and
   could be implemented in software or hardware.  One use case where the
   VTEP is a physical switch is discussed in Section 6 of RFC 7348, on
   VXLAN deployment scenarios.

   The following paragraphs walk through typical traffic flows in a
   VXLAN environment using one type of control scheme -- data plane
   learning.  Here, the association of a VM's MAC address with a VTEP's
   IP address is discovered via source-address learning.  Multicast is
   used for carrying unknown-destination, broadcast, and multicast
   frames.

   In addition to a learning-based control plane, there are other
   schemes possible for the distribution of the VTEP IP to VM MAC
   mapping information.


   Consider a VM within a VXLAN overlay network.  This VM is unaware of
   VXLAN.  To communicate with a VM on a different host, it sends a MAC
   frame destined to the target as normal.  The VTEP on the physical
   host looks up the VNI to which this VM is associated.  It then
   determines if the destination MAC is on the same segment and if there
   is a mapping of the destination MAC address to the remote VTEP.  If
   so, an outer header comprising an outer MAC header, outer IP header,
   outer UDP header, and VXLAN header (see Figure 1 in Section 5 of
   RFC 7348 for the frame format) is prepended to the original MAC
   frame.  The encapsulated packet is
   forwarded towards the remote VTEP.  Upon reception, the remote VTEP
   verifies the validity of the VNI and whether or not there is a VM on
   that VNI using a MAC address that matches the inner destination MAC
   address.  If so, the packet is stripped of its encapsulating headers
   and passed on to the destination VM.  The destination VM never knows
   about the VNI or that the frame was transported with a VXLAN
   encapsulation.
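   The two lookups described above (on the sending VTEP, and again on
   the receiving VTEP) can be sketched with a toy model.  The class,
   method, and port names are hypothetical, and the IP and MAC addresses
   are documentation examples; a real VTEP would also build and strip
   the actual headers.

```python
from typing import Optional, Tuple


class Vtep:
    """Toy VTEP state; all names here are illustrative, not from RFC 7348."""

    def __init__(self, ip: str) -> None:
        self.ip = ip
        self.port_vni = {}        # local VM port name -> VNI
        self.local_macs = set()   # (VNI, MAC) pairs of locally attached VMs
        self.remote_macs = {}     # (VNI, MAC) -> remote VTEP IP

    def attach_vm(self, port: str, vni: int, mac: str) -> None:
        self.port_vni[port] = vni
        self.local_macs.add((vni, mac))

    def egress_lookup(self, port: str, dst_mac: str) -> Optional[Tuple[int, str]]:
        """Return (VNI, remote VTEP IP) for a frame sent by a local VM,
        or None when the destination MAC is not known on that segment."""
        vni = self.port_vni[port]
        remote_ip = self.remote_macs.get((vni, dst_mac))
        return None if remote_ip is None else (vni, remote_ip)

    def ingress_accept(self, vni: int, inner_dst_mac: str) -> bool:
        """Accept a decapsulated frame only if a local VM on this VNI
        uses the inner destination MAC."""
        return (vni, inner_dst_mac) in self.local_macs


vtep_a = Vtep("192.0.2.1")
vtep_b = Vtep("192.0.2.2")
vtep_a.attach_vm("vm1", 5000, "00:00:5e:00:53:01")
vtep_b.attach_vm("vm2", 5000, "00:00:5e:00:53:02")
# Assume vtep_a already knows where vm2 lives.
vtep_a.remote_macs[(5000, "00:00:5e:00:53:02")] = vtep_b.ip

decision = vtep_a.egress_lookup("vm1", "00:00:5e:00:53:02")
```

   Keying every table on the (VNI, MAC) pair rather than on the MAC alone
   is what keeps segments isolated from one another in this model.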

   In addition to forwarding the packet to the destination VM, the
   remote VTEP learns the mapping from inner source MAC to outer source
   IP address.  It stores this mapping in a table so that when the
   destination VM sends a response packet, there is no need for an
   "unknown destination" flooding of the response packet.
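   A minimal sketch of that learning step, assuming the table is a plain
   dictionary keyed on (VNI, inner source MAC).  The function name and
   the addresses are invented for the example; aging and table limits
   are left out.

```python
def learn_from_packet(mac_table, vni, inner_src_mac, outer_src_ip):
    """Record which VTEP a MAC was last seen behind, per VXLAN segment."""
    mac_table[(vni, inner_src_mac)] = outer_src_ip


mac_table = {}

# Two packets carrying the same inner source MAC but on different VNIs:
# the entries do not collide because the VNI is part of the key.
learn_from_packet(mac_table, 5000, "00:00:5e:00:53:01", "192.0.2.1")
learn_from_packet(mac_table, 6000, "00:00:5e:00:53:01", "192.0.2.3")

# When the local VM replies on VNI 5000, the VTEP can go straight to unicast.
reply_target = mac_table.get((5000, "00:00:5e:00:53:01"))
```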

   Determining the MAC address of the destination VM prior to
   transmission by the source VM is performed as in non-VXLAN
   environments, except as described in Section 4.2 of RFC 7348.
   Broadcast frames are used but are encapsulated within a multicast
   packet, as described next.

   Consider the VM on the source host attempting to communicate with the
   destination VM using IP.  Assuming that they are both on the same
   subnet, the VM sends out an Address Resolution Protocol (ARP)
   broadcast frame.  In the non-VXLAN environment, this frame would be
   sent out using MAC broadcast across all switches carrying that VLAN.

   With VXLAN, a header including the VXLAN VNI is inserted at the
   beginning of the packet along with the IP header and UDP header.
   However, this broadcast packet is sent out to the IP multicast group
   on which that VXLAN overlay network is realized.

   To effect this, we need to have a mapping between the VXLAN VNI and
   the IP multicast group that it will use.  This mapping is done at the
   management layer and provided to the individual VTEPs through a
   management channel.  Using this mapping, the VTEP can provide IGMP
   membership reports to the upstream switch/router to join/leave the
   VXLAN-related IP multicast groups as needed.  This will enable
   pruning of the leaf nodes for specific multicast traffic addresses
   based on whether a member is available on this host using the
   specific multicast address (see [RFC4541]).  In addition, use of
   multicast routing protocols like Protocol Independent Multicast -
   Sparse Mode (PIM-SM; see [RFC4601]) will provide efficient multicast
   trees within the Layer 3 network.
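   The join/leave bookkeeping can be sketched as follows.  The VNI-to-
   group table and its addresses are hypothetical stand-ins for whatever
   the management layer provides.  The sketch only computes which groups
   to join or leave and builds the ip_mreq payload; on a typical IP stack
   it is the setsockopt() call shown in the comment that causes the
   kernel to send the IGMP membership report.

```python
import ipaddress
import socket

# Hypothetical mapping, as a management layer might push it to a VTEP.
VNI_TO_GROUP = {5000: "239.1.1.1", 5001: "239.1.1.2"}


def groups_needed(local_vnis):
    """Multicast groups for the VNIs that have at least one local VM."""
    return {VNI_TO_GROUP[vni] for vni in local_vnis}


def membership_changes(old_vnis, new_vnis):
    """Return (groups to join, groups to leave) after local VMs change."""
    old, new = groups_needed(old_vnis), groups_needed(new_vnis)
    return sorted(new - old), sorted(old - new)


def membership_request(group, local_ip):
    """Build the ip_mreq payload (group address + local interface address)."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    return socket.inet_aton(group) + socket.inet_aton(local_ip)


# The last VM on VNI 5000 went away and the first VM on VNI 5001 came up.
to_join, to_leave = membership_changes({5000}, {5001})
mreq = membership_request(to_join[0], "192.0.2.10")

# On a real VTEP socket the join would then be requested with:
#   sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
# and the leave with socket.IP_DROP_MEMBERSHIP.
```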

   The VTEP will use (*,G) joins.  This is needed as the set of VXLAN
   tunnel sources is unknown and may change often, as the VMs come up /
   go down across different hosts.  A side note here is that since each
   VTEP can act as both the source and destination for multicast
   packets, a protocol like bidirectional PIM (BIDIR-PIM -- see
   [RFC5015]) would be more efficient.

   The destination VM sends a standard ARP response as a unicast frame.
   This frame will be encapsulated back to the VTEP connecting the
   originating VM using IP unicast VXLAN encapsulation.  This is
   possible since the mapping of the ARP response's destination MAC to
   the VXLAN tunnel end point IP was learned earlier through the ARP
   request.

   Note that multicast frames and "unknown MAC destination" frames are
   also sent using the multicast tree, similar to the broadcast frames.
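   Putting the unicast and multicast cases together, the choice of outer
   destination IP address might look like the sketch below.  The function
   names, table layout, and addresses are assumptions made for this
   example.

```python
def is_group_mac(mac: str) -> bool:
    """True for broadcast and multicast MACs (group bit of first octet set)."""
    return bool(int(mac.split(":")[0], 16) & 0x01)


def outer_destination(vni, dst_mac, mac_table, vni_to_group):
    """Pick the outer destination IP for a frame leaving a local VM."""
    if not is_group_mac(dst_mac):
        remote_ip = mac_table.get((vni, dst_mac))
        if remote_ip is not None:
            return remote_ip          # known unicast: send to that VTEP
    # Broadcast, multicast, or unknown unicast: use the segment's group.
    return vni_to_group[vni]


vni_to_group = {5000: "239.1.1.1"}                       # hypothetical
mac_table = {(5000, "00:00:5e:00:53:02"): "192.0.2.2"}   # learned earlier

arp_request = outer_destination(5000, "ff:ff:ff:ff:ff:ff", mac_table, vni_to_group)
known = outer_destination(5000, "00:00:5e:00:53:02", mac_table, vni_to_group)
unknown = outer_destination(5000, "00:00:5e:00:53:77", mac_table, vni_to_group)
```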

 

