Problem Background / Current State of Affairs

N.B.: The area of concern here is the case where a RR computes BGP best paths using the IGP metric to the next-hop node. All other steps of the BGP best path selection algorithm are assumed to be equal (e.g. two full-table Internet transit feeds without any modification to MED, local preference, AS-path etc.).

  • When using route reflectors, by default only the single BGP best path is sent to each route-reflector client.

  • A few centrally located in-band RRs might have a better view of the network IGP than a few centrally located out-of-band RRs. These in-band RRs might advertise more optimal best paths (as in hot-potato routing), but these are still unlikely to be the best paths for all RR clients.

  • Placing many RRs into the network, such as at the PoP or aggregation levels (in or out of band), increases the likelihood that a topologically nearby client receives an optimal BGP best path, but this is not guaranteed for all clients.

  • Placing a small number of RRs into the core provides sub-optimal routing. Placing RRs into all or most PoPs doesn't scale well physically and is costly. A middle ground is to place RRs at key ingress/egress points in the network so that those RRs advertise the BGP best path using the ingress/egress nodes they are topologically adjacent to as the next hop. If the strategic ingress/egress traffic locations for the network change, the RRs need moving, which is very inefficient.

  • This idea can be logically implemented using MPLS-TE Autoroute Announce or IGP Shortcut LSPs between LERs/PEs and the RR to create the illusion that a RR node is topologically closer to a strategic exit node than it actually is. This results in the RR calculating its BGP best paths using that strategic exit node. If the strategic exit node location changes, the LSPs can be moved/reconfigured to the new exit node. This requires a lot of overhead at scale though, and still doesn't guarantee that all RR clients receive optimal routes.

  • In the case of VPNv4/VPNv6 (AFI 1 / 2 with SAFI 128), using unique route distinguishers for two different paths to the same remote prefix gives the RRs the impression that these are two different routes. This means that the RRs will not compare these two paths using the BGP best path selection algorithm, and both paths are sent to the RR clients, which can then make a local BGP best path decision.

  • In the case of IPv4/IPv6 unicast (AFI 1 / 2 with SAFI 1) and IPv4/IPv6 labelled-unicast including 6PE (AFI 1 / 2 with SAFI 4), route distinguishers aren't available, so by default the RRs will compare paths and select a single BGP best path to advertise to RR clients.

  • RRs could be configured to send all routes to all RR clients; however, this doesn't scale well and adds a significant amount of state to edge nodes.

  • BGP Add-Path can be used to send N paths to RR clients; however, there is still no guarantee that any of the paths being sent is the most optimal path for every RR client.
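
The core problem in the list above, a RR choosing an exit by its own IGP metric and sending that single choice to every client, can be illustrated with a minimal sketch. The node names and metric values below are invented for illustration and do not come from any real network.

```python
# Hypothetical IGP metrics from each node to two candidate exit routers.
# All names and numbers are assumptions made up for this illustration.
IGP_METRIC = {
    "RR":  {"exit_A": 10, "exit_B": 30},
    "PE1": {"exit_A": 15, "exit_B": 40},
    "PE2": {"exit_A": 50, "exit_B": 5},
}


def best_exit(node):
    """Return the exit with the lowest IGP metric as seen from `node`."""
    metrics = IGP_METRIC[node]
    return min(metrics, key=metrics.get)


def suboptimal_clients(rr, clients):
    """List clients whose own closest exit differs from the RR's choice."""
    advertised = best_exit(rr)  # the single best path the RR reflects
    return [c for c in clients if best_exit(c) != advertised]


print(best_exit("RR"))                           # exit_A
print(suboptimal_clients("RR", ["PE1", "PE2"]))  # ['PE2']
```

Here PE2 is closest to exit_B but only ever learns the path via exit_A, because that is the best path from the RR's position in the IGP.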


All of the considerations above are independent of whether the RRs are physical or virtual; however, virtual RRs are assumed to be out-of-band RRs. As virtual RRs become cheaper, quicker to deploy and more scalable than physical routers, it is important that any solution works for out-of-band RRs.

Another point of note is that not only do RRs have a different IGP view of the network than the RR clients, the clients may also have local route policies applied that vary by PE type/function/PoP/area/region etc., so even with the same IGP view the RRs still might choose sub-optimal paths for some RR clients.

This draft proposes a change only on the RR, so that no client software updates or configuration changes are required. The clients don't need to support BGP Add-Path, nor do they need enough memory to carry all routes inside the iBGP domain.

An operator must first enable the IGP on a RR to share its link state database with BGP. Because each IGP node in an IS-IS or OSPF domain carries the full LSDB, the SPT can be computed locally from the view/perspective of any node in the IGP domain. An operator can then configure a virtual group on the RR and assign 1 to N peers to that group. The RR will compute the BGP best path for all peers in that virtual group using the same IGP perspective/view.
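
The idea of computing the SPT from another node's perspective can be sketched as follows. This is a minimal illustration, not how any particular implementation works: the LSDB is modelled as a plain dictionary, and the topology, metrics, and the names `spf` and `best_next_hop` are all assumptions made for the example.

```python
import heapq

# Hypothetical LSDB: node -> {neighbour: link metric}. Links are symmetric
# here for simplicity; names and metrics are invented for illustration.
LSDB = {
    "RR":     {"P1": 10},
    "P1":     {"RR": 10, "PE1": 10, "ASBR_A": 5, "P2": 20},
    "P2":     {"P1": 20, "PE2": 10, "ASBR_B": 5},
    "PE1":    {"P1": 10},
    "PE2":    {"P2": 10},
    "ASBR_A": {"P1": 5},
    "ASBR_B": {"P2": 5},
}


def spf(lsdb, root):
    """Dijkstra: shortest IGP cost from `root` to every reachable node."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, metric in lsdb[node].items():
            new_cost = cost + metric
            if new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                heapq.heappush(heap, (new_cost, nbr))
    return dist


def best_next_hop(lsdb, location, next_hops):
    """Pick the BGP next hop that is IGP-closest to the given location."""
    dist = spf(lsdb, location)
    return min(next_hops, key=lambda nh: dist.get(nh, float("inf")))


NEXT_HOPS = ["ASBR_A", "ASBR_B"]
print(best_next_hop(LSDB, "RR", NEXT_HOPS))   # ASBR_A (the RR's own view)
print(best_next_hop(LSDB, "PE2", NEXT_HOPS))  # ASBR_B (virtual location PE2)
```

Rooting the calculation at PE2 instead of at the RR changes the selected next hop, which is the effect the virtual group configuration is meant to achieve for all peers assigned to that group.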

"...Implementations considered compliant with this document allow the configuration of a logical location from which the best path will be computed, on the basis of either a peer, a peer group, or an entire routing instance...".

"...service providers may configure IGP based optimal route reflection or policy based optimal route reflection. It is also possible to configure both approaches together. In cases where both are configured together, policy based optimal route reflection will be applied first to select the candidate paths, then IGP based optimal route reflection will be applied on top of the candidate paths to select the final path to advertise to the client.".

"...With IGP based optimal route reflection, even though the virtual IGP location could be specified on a per route reflector basis or per peer/update group basis or per peer basis, in reality, it's most likely to be specified per peer/update group basis. All clients with the same or similar IGP location can be grouped into the same peer/update group. A virtual IGP location is then specified for the peer/update group. The virtual location is usually specified as the location of one of the clients from the peer group or an ABR to the area where clients are located...".

It is expected that using BGP-ORR on a RR would increase CPU usage on the RR, because for any BGP UPDATE it processes it would first need to calculate the SPT from the perspective of a virtual IGP location, then perform policy-based path selection (if any), and then IGP-based path selection. This process then needs to be repeated for each virtual IGP location configured. However, it is expected that optimisations such as partial and incremental SPF may be used to reduce the SPT calculation overhead on the RR.
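
The per-location processing order described above (policy-based selection first, then IGP-based selection, repeated for every configured virtual location) can be sketched roughly as below. The policy is modelled as a simple filter function, which is an assumption about what a policy does; the IGP costs are taken as given rather than derived from an SPT, and all names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str


# Hypothetical per-location state: the IGP cost to each next hop (as would
# come from an SPT rooted at that virtual location) and an optional policy.
LOCATIONS = {
    "west": {"igp_cost": {"ASBR_A": 5, "ASBR_B": 40, "ASBR_C": 15},
             "policy": None},
    "east": {"igp_cost": {"ASBR_A": 40, "ASBR_B": 5, "ASBR_C": 15},
             "policy": lambda p: p.next_hop != "ASBR_B"},  # made-up policy
}


def select_path(paths, igp_cost, policy=None):
    """Apply the policy filter first, then pick the IGP-closest candidate."""
    candidates = [p for p in paths if policy is None or policy(p)]
    if not candidates:
        return None  # the policy rejected every path
    return min(candidates,
               key=lambda p: igp_cost.get(p.next_hop, float("inf")))


def process_update(paths, locations):
    """Repeat the selection once per configured virtual location."""
    return {name: select_path(paths, loc["igp_cost"], loc["policy"])
            for name, loc in locations.items()}


PATHS = [Path("ASBR_A"), Path("ASBR_B"), Path("ASBR_C")]
result = process_update(PATHS, LOCATIONS)
print({name: p.next_hop for name, p in result.items()})
# {'west': 'ASBR_A', 'east': 'ASBR_C'}
```

The loop in `process_update` is where the extra CPU cost comes from: the work grows with the number of virtual locations, on top of whatever it costs to keep each location's SPT current.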

