Multicast Recap

References:
- Cisco Systems' Solution for Multicast in BGP/MPLS IP VPNs
- Source-Specific Multicast for IP
- BGP Encodings and Procedures for Multicast in MPLS/BGP IP VPNs

(*,G) is Any Source Multicast (ASM) mode; the join is sent from the CPE to the PE using IGMP or MLD (Multicast Listener Discovery). (S,G) is Source Specific Multicast (SSM) mode; the join is also sent from the CPE to the PE using IGMP or MLD. RFC4607 reserves 232.0.0.0/8 as the IPv4 and FF3x::/32 as the IPv6 Source Specific Multicast destination/group address range. ASM is simpler for the receiver whereas SSM is simpler for the network.

When multiple sources are transmitting to the same multicast group address, a receiver that has sent an ASM join (*,G) will receive all of the multicast IP streams destined to that group address simultaneously. If a receiver/member sends a Source Specific Multicast join request (S,G) and the group address is not inside one of the SSM ranges reserved by RFC4607 (232.0.0.0/8 or FF3x::/32), all streams will still be received simultaneously.

Only for an SSM join request (S,G) from a receiver/member, where the multicast group address is inside the 232.0.0.0/8 or FF3x::/32 range, is the member prevented from receiving all of the multicast flows to the same group address. This requires IGMPv3 between the receiver/member and its attachment PE, and the join request must contain a "channel" (a unique S,G pair) whose group address is inside the 232.0.0.0/8 or FF3x::/32 range. The multicast tree is then extended to that PE directly towards the source for only the unique (S,G) specified (SSM does not use an RP).
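The SSM range test described above can be sketched in a few lines of Python. This is an illustrative helper (not from the source text) that checks whether a group address falls inside the RFC 4607 SSM ranges:

```python
import ipaddress

# RFC 4607 IPv4 SSM range.
SSM_V4 = ipaddress.ip_network("232.0.0.0/8")

def is_ssm_group(addr: str) -> bool:
    """Return True if addr is inside the reserved SSM group range."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return ip in SSM_V4
    # FF3x::/32: first byte 0xFF, flags nibble 0x3 (any scope "x"),
    # and the remaining 16 bits of the /32 prefix zero.
    b = ip.packed
    return b[0] == 0xFF and (b[1] >> 4) == 0x3 and b[2] == 0 and b[3] == 0

print(is_ssm_group("232.1.2.3"))     # inside 232.0.0.0/8 -> True
print(is_ssm_group("239.255.0.1"))   # admin-scoped ASM group -> False
print(is_ssm_group("ff3e::8000:1"))  # inside FF3x::/32 -> True
```

A first-hop router performing a similar check would only honour an (S,G)-specific join when this test passes.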

Multicast Distribution Trees (MDT) in PIM-SM (Source vs. Shared Trees)
When using PIM-SM, when a multicast source starts to transmit to a multicast destination within the PIM-SM group range, the attachment PE will install two mroutes: a (*,G) mroute and an (S,G) mroute. The (*,G) mroute will be rooted at the RP for this group (the OIL will initially be empty and the IIL will initially list the interface facing the multicast source); this is the initial forming of a shared tree. The (S,G) mroute represents the beginning of a source tree rooted at the attachment PE (the OIL will initially be empty and the IIL will initially list the interface facing the multicast source).

Initially the attachment PE of the source has an empty Outgoing Interface List. The attachment PE of the source will forward multicast packets to the RP (in the form of PIM register messages), allowing the RP to create two mroute entries of its own. The RP will create an (S,G) mroute with the Incoming Interface List containing the interface facing the source attachment PE. This extends the source tree. A (*,G) mroute is also added with no incoming or outgoing interfaces, until a member/receiver joins. This extends the shared tree. The RP sends a register-stop to the source attachment PE so that it stops sending PIM register messages. The source attachment PE at this point is dropping any incoming multicast traffic as there are no active members/receivers (its OIL is empty).

The source tree that is being built is a non-shared MDT that will be built from source to receiver and is unique per (S,G) pair. It is a unidirectional tree from a specific sender to all receivers of this specific source.
The shared tree that is being built is a per-group (*,G) MDT which is built between all senders and receivers of this group and is rooted at the RP (rendezvous point). A shared tree could be bidirectional.
When a member/receiver sends an IGMP/MLD join to its attachment PE for a (*,G) "channel", that PE will in turn send a PIM join request to the RP. The RP will send a PIM join to the source attachment PE. The source attachment PE will update its OIL to include the interface facing the RP, and the RP will update its OIL to include the interface facing the member/receiver attachment PE. The member/receiver attachment PE will create an mroute entry for the (*,G) route with the OIL set to the interface the receiver is attached to and the IIL containing the link to the RP. This forms a shared tree rooted at the RP (the receiver attachment PE is ignorant of the multicast source). Initially multicast data packets might route via the RP and then reconverge onto the IGP shortest path between sender and receiver if the RP wasn't on the shortest path already.
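The sequence of mroute state changes above can be modelled as a toy simulation. This is not a PIM implementation; the interface names (`eth_src`, `eth_rp`, `eth_rx_pe`, etc.) are invented for the sketch, and each mroute is just an incoming/outgoing interface list:

```python
# Toy model of the PIM-SM state described above: each node holds a table of
# mroutes keyed by (source, group), each with an IIL and OIL set.
rp = {}       # mroute table on the RP
src_pe = {}   # mroute table on the source attachment PE

# 1. Source starts sending: source PE installs (S,G) with an empty OIL
#    (traffic is dropped) and registers with the RP, which builds its own state.
src_pe[("S", "G")] = {"iil": {"eth_src"}, "oil": set()}
rp[("S", "G")] = {"iil": {"eth_src_pe"}, "oil": set()}   # from the register
rp[("*", "G")] = {"iil": set(), "oil": set()}            # shared tree, no members

# 2. A receiver joins (*,G): its PE sends a PIM join to the RP, and the RP
#    joins the source tree towards the source attachment PE.
rp[("*", "G")]["oil"].add("eth_rx_pe")
rp[("S", "G")]["oil"].add("eth_rx_pe")
src_pe[("S", "G")]["oil"].add("eth_rp")

# Traffic now flows: source PE forwards towards the RP, RP towards the receiver.
print(src_pe[("S", "G")]["oil"])  # {'eth_rp'}
print(rp[("S", "G")]["oil"])      # {'eth_rx_pe'}
```

Before the join, both OILs are empty, which is exactly why the source attachment PE drops traffic after the register-stop.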

When a member/receiver sends an IGMP/MLD join to its attachment PE for an (S,G) channel (SSM must be configured/enabled), that PE will send a PIM join along the IGP shortest path towards the sender. The OIL will be updated on each router along the IGP shortest path between sender and receiver and a source tree is built.

PIM-Dense Mode only supports the source tree model. PIM-Sparse Mode uses a shared tree initially and can re-converge (optimise) to a source tree.

Both source trees and shared trees are loop-free. Messages are replicated only where the tree branches.
Source trees have the advantage of creating the optimal path between the source and the receivers. This advantage guarantees the minimum amount of network latency for forwarding multicast traffic. However, this optimization comes at a cost: The routers must maintain path information for each source. In a network that has thousands of sources and thousands of groups, this overhead can quickly become a resource issue on the routers. Memory consumption from the size of the multicast routing table is a factor that network designers must take into consideration.

Shared trees have the advantage of requiring the minimum amount of state in each router. This advantage lowers the overall memory requirements for a network that only allows shared trees. The disadvantage of shared trees is that under certain circumstances the paths between the source and receivers might not be the optimal paths, which might introduce some latency in packet delivery.
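The state trade-off between source and shared trees can be made concrete with some back-of-the-envelope arithmetic (the numbers below are invented for illustration): source trees need one (S,G) entry per active source per group, while shared trees need only one (*,G) entry per group.

```python
# Hypothetical network: 1000 groups, 10 active sources per group.
groups = 1000
sources_per_group = 10

source_tree_entries = groups * sources_per_group  # one (S,G) per source per group
shared_tree_entries = groups                      # one (*,G) per group

print(source_tree_entries)  # 10000
print(shared_tree_entries)  # 1000
```

A tenfold difference in mroute table size is why memory consumption becomes a design factor when many sources are active.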

GRE Full-Mesh Scaling
Simply encapsulating multicast frames inside GRE tunnels doesn't scale well. A full mesh of GRE tunnels would be required between all devices to service any possible multicast topology. This scales at a rate of (N*(N-1))/2 or O(N*N).
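The quadratic growth of a full tunnel mesh is easy to see numerically; a quick sketch:

```python
# A full mesh of point-to-point tunnels between N devices needs
# N*(N-1)/2 tunnels, i.e. O(N^2) growth.
def full_mesh_tunnels(n: int) -> int:
    return n * (n - 1) // 2

for n in (10, 50, 100):
    print(n, full_mesh_tunnels(n))  # 10 -> 45, 50 -> 1225, 100 -> 4950
```

Doubling the number of devices roughly quadruples the number of tunnels to configure and maintain.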

IGMP messages can't be sent directly between two CE devices over the service provider network, which means that with a full mesh of GRE tunnels between CEs, the CEs would fail the RPF lookup. This is because PIM doesn't exchange multicast topology information; multicast forwarding decisions are made locally on each node using the multicast forwarding state information each node has locally available.
A multicast packet could be sent from CE1 to CE2 over a GRE tunnel built between these two devices; however, the unicast path between the two devices is over the service provider network, not the GRE tunnel. Incoming packets would fail the RPF check of the source address and be dropped. This can be hacked around with static multicast routes, but static routes also don't scale well.
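The RPF failure described above can be sketched as a toy check (the routing table entry and interface names are made up for the example): a multicast packet passes RPF only if it arrives on the interface the router would use to reach the packet's source.

```python
# Made-up unicast routing table: the route back to CE1's subnet
# points at the service-provider-facing interface, not the tunnel.
unicast_routes = {"10.0.0.0/24": "eth0"}

def rpf_check(source_prefix: str, arrival_interface: str) -> bool:
    """RPF passes only if the packet arrived on the interface used
    to reach its source via the unicast routing table."""
    return unicast_routes.get(source_prefix) == arrival_interface

# Multicast from CE1 arrives over the GRE tunnel, but the unicast route
# to CE1 points at eth0: RPF fails and the packet is dropped.
print(rpf_check("10.0.0.0/24", "tunnel0"))  # False
print(rpf_check("10.0.0.0/24", "eth0"))     # True
```

A static mroute effectively overrides the lookup so that `tunnel0` becomes the accepted RPF interface, which is the hack mentioned above.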

The full mesh of GRE tunnels would need to be implemented on the PEs, which may have scaling issues if many different multicast customers connect to the same PE(s). Also, the service provider must configure and maintain these tunnels; customers who manage their own CEs can't easily add or remove nodes to/from the mesh. The PEs would then need to run a multicast protocol like IGMP towards the CPE(s).
To support a full mesh of GRE tunnels between PEs, they would have to be manually configured, meaning the signalling is manual, which is inefficient. The forwarding would also be inefficient with a full mesh of GRE tunnels between PEs. A multicast packet that enters a PE first needs to be encapsulated inside the relevant GRE tunnel headers. Then the multicast-inside-unicast GRE packet would need to be replicated inside the PE and each copy forwarded on to the relevant GRE destination in a unicast fashion.
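The head-end replication cost described above can be contrasted with mGRE numerically (a sketch with invented numbers, not a measurement): with point-to-point GRE the ingress PE unicasts one copy per remote PE, while with mGRE it sends a single copy into the multicast underlay and the core replicates it.

```python
# Copies the ingress PE must transmit for one incoming multicast packet.
def p2p_gre_copies(remote_pes_with_receivers: int) -> int:
    return remote_pes_with_receivers  # one unicast copy per remote PE

def mgre_copies(remote_pes_with_receivers: int) -> int:
    return 1 if remote_pes_with_receivers else 0  # core does the replication

print(p2p_gre_copies(20))  # 20 copies out of the ingress PE
print(mgre_copies(20))     # 1 copy; replication happens at tree branches
```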

A more efficient GRE forwarding method is to use multicast GRE tunnels or mGRE. This means that multicast packets are encapsulated inside a GRE tunnel with a multicast destination address and the forwarding paradigm becomes multicast over multicast. The MP-BGP SAFI 66 (address-family "MDT SAFI", RFC6037) can be used to signal the PE loopback IP and the multicast IP associated with that loopback between PE nodes. CEs can use MP-BGP SAFI 2 to signal multicast routes to the PE dynamically.
The downside to this approach is that the service provider core now has to be multicast aware to build a multicast underlay; however, it is much simpler than a full tunnel mesh.

The PE devices must now support VRF aware PIM. This is to differentiate between multicast routing inside the global routing table of PE devices for the underlay multicast default tree, and the overlay MDT inside a customer dedicated VRF. Assuming that protocol independent multicast sparse mode is being used, the global routing table and each multicast enabled VRF may have a different rendezvous point.

BGP AFI 1/2 SAFI 5 is the NG-MVPN (MCAST-VPN) IPv4/IPv6 SAFI. BGP AFI 1/2 SAFI 129 is the L3VPN IPv4/IPv6 Multicast SAFI. Both are defined in RFC6514. SAFI 5 carries the MVPN signalling routes (auto-discovery and customer multicast joins), while SAFI 129 carries per customer/VRF multicast routes used for RPF checks towards sources inside the VPN.

TE tunnels are unidirectional, and multicast traffic received over a TE tunnel has to skip the RPF check (the unicast return path does not follow the tunnel).
