Date created: Monday, March 22, 2021 9:56:57 AM. Last modified: Sunday, December 3, 2023 4:31:33 PM

Routing and Forwarding Information

Throughout this document, any references to IS-IS refer to Integrated IS-IS and any references to OSPF refer to both OSPFv2 and OSPFv3 unless explicitly stated otherwise.


Forwarding Decision Making Information

Efficient and reliable IP routing requires that forwarding decision making information be successfully distributed and [at least partially] synchronised between routing devices within the topology. Multiple types of information need to be distributed; the two most common are topology information and address information, with address information typically broken down into two subtypes: address availability and address reachability. In all modern day networks that address information takes the form of an IPv4 or IPv6 prefix. In the context of this document, the three types of forwarding decision making information are thus network topology, prefix availability and prefix reachability.


In classical IP routing, which uses destination address based forwarding (although source address based routing is of course possible for IP), information must be shared amongst all devices because routing decisions are distributed. IGPs such as IS-IS, OSPF, EIGRP etc. all expect to have the complete destination routing and topology information flooded and synchronised throughout the IGP domain. This allows each device to independently compute the same result as every other device for a given destination IP prefix, which in turn allows a source router to know the path that traffic will take across the network to a given destination address.


In more contemporary routing/forwarding scenarios using MPLS, which uses next-hop address information instead of classical IP’s ultimate source or ultimate destination address information, not all network devices make the same forwarding decision even when given the same forwarding decision making information. Note that here a next-hop “address” refers to an MPLS label. This is partially because labels are locally significant. It is also because newer technologies like segment routing place the routing decision on each source device; intermediary devices along the path towards the destination forward as instructed by the source node regardless of the intermediary device's preferred path to the destination (assuming they even have a path!). SR switches MPLS from a next-hop address based forwarding paradigm to a source address based forwarding paradigm (here source “address” means an SR SID).


Taking this a step further, with centralised SR-TE controllers using BGP-LS and PCEP, the forwarding decision is further abstracted and placed into a centralised controller, and all devices forward based on the instructions received from the centralised controller. Taking this to the extreme, each node needs only to know how to forward traffic to its directly attached neighbours and no further.


In all cases though, the same fundamental types of information are required (topology, address availability, and address reachability) and the same problem of synchronising the information amongst decision makers and forwarders (regardless of whether they are the same or separate devices) still exists.


 

Three Information Types

Topology Information

Topology information [in a typical underlay network] is transmitted between devices to build up the physical connectivity graph. In an IGP, for example, this will be the node-link graph; in BGP this is the ASN graph.


Address Availability

IP prefix (address) availability information is transmitted between devices to inform receivers about an IP prefix being available i.e., a transmitting router informs receivers “I have connectivity to 192.0.2.0/24”. The more routers that advertise this same prefix, the greater the availability of that prefix. Equally with MPLS, label (address) availability information can be communicated. The FEC 192.0.2.0/24 via next-hop X over interface Y may or may not have an MPLS label bound to it. The more label advertisements received for the same FEC the greater the availability of that FEC.


Address Reachability

IP prefix (address) reachability information is transmitted between devices, in addition to prefix availability information, to inform receivers how that prefix is reachable over the given graph topology i.e., “I have connectivity to 192.0.2.0/24, via a 1Gbps connection”. Equally MPLS label (address) reachability information allows for the reachability of a FEC to be communicated “192.0.2.0/24 is via label 1234 and there is 800Mbps of bandwidth available over this LSP”.
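
As a minimal sketch of how the three information types can be kept apart, the following models a single routing update as three separate fields. The Advertisement class, its field names and the sample routers R1 and R2 are illustrative assumptions, not taken from any real protocol implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Advertisement:
    """A hypothetical routing update split into the three information types."""
    originator: str                                  # router sending the update
    links: list = field(default_factory=list)        # topology: (node, node) edges
    prefix: str = ""                                 # availability: "I have connectivity to X"
    attributes: dict = field(default_factory=dict)   # reachability: how X is reachable


def availability(adverts):
    """Map each prefix to the set of routers advertising it."""
    seen = defaultdict(set)
    for ad in adverts:
        seen[ad.prefix].add(ad.originator)
    return dict(seen)


adverts = [
    Advertisement("R1", [("R1", "R2")], "192.0.2.0/24", {"bandwidth_mbps": 1000}),
    Advertisement("R2", [("R2", "R1")], "192.0.2.0/24", {"bandwidth_mbps": 800}),
]
by_prefix = availability(adverts)
```

The more originators that appear in the set for a prefix, the greater the availability of that prefix, independently of the reachability attributes each originator attached.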


Information Types in Practice

Some routing protocols merge these different types of information together, resulting in certain attributes being implicitly inferred e.g., a BGP UPDATE might state that the announcing ASN 64501 is directly connected to another ASN 64502 (topology information), that it has access to prefixes 192.0.2.0/25 and 192.0.2.128/25 (prefix availability), and that the MED for this link is 100 (prefix reachability). In this case the MED implies that the cost to reach both prefixes is 100 via ASN 64501. In the case of a typical eBGP transit peering session, BGP doesn’t reasonably allow the transit customer to provide a MED value per-prefix for the customer’s own routing space (unless they’re a lunatic with individual per-prefix routing policies). Conversely, an OSPF type 1 LSA can provide a “cost” per-prefix for each network/link included in the LSA.

To make this clearer, consider an extreme example from the following extract which is discussing the use of multi-topology IS-IS for dual-stack IPv4 and IPv6 routing (taken from https://www.juniper.net/documentation/en_US/junos/topics/concept/isis-topologies.html):

"You can configure IS-IS to calculate an alternate IPv6 unicast topology, in addition to the normal IPv4 unicast topology… [this is because] the IS-IS interface metrics for the IPv4 topology can be configured independently of the IPv6 metrics." - this means prefix reachability information can be different for IPv4 and IPv6 prefixes, despite using the same underlying topology information.

"You can also selectively disable interfaces from participating in the IPv6 topology while continuing to participate in the IPv4 topology. This enables you to exercise control over the paths that unicast data takes through a network." - This means that IPv4 or IPv6 can be excluded from certain links or nodes in the network graph, to have separate logical topologies over the same physical topology (reminder: just because you can, doesn’t mean you should!).

"A topology is the set of joined nodes. IS-IS evaluates all the paths in a single topology for each IS-IS level and uses the shortest-path-first (SPF) algorithm to determine the best path among all the feasible paths. Topology discovery and SPF calculation is performed in a protocol-neutral fashion because it is done at Layer 2 of the OSI model. If you load the topology with reachability information for a certain protocol (for example, IP), the assumption is that the circuits that are supposed to provide reachability between routing devices can carry the protocol. The SPF algorithm has a per-link orientation, not a per-address family or per-protocol orientation.

Multitopology routing enables you to override this default behavior by enabling a per-address family, per-protocol SPF calculation...The multitopology extensions alter existing type, length, and value (TLV) tuples by adding a topology ID. Each routing device in a given topology maintains its adjacencies and runs a per-topology SPF calculation".

The text above is explaining that multi-topology IS-IS allows two [or more] different address families such as IPv4 and IPv6, using the same physical network of links and devices, to each build a different graph model of the physical network topology. It also allows the prefixes advertised by each protocol to have reachability attributes unique to them, and potentially to be unaffected by any topology changes in the physical network, if those changes aren’t included in the per-protocol topology.
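
The idea of several logical topologies over one physical topology can be sketched as follows. The LINKS table, the topology names and the topology_view helper are all hypothetical; a metric of None is used here to mean the interface has been excluded from that topology.

```python
# Hypothetical physical links; each carries an independent metric per topology.
# None means the interface does not participate in that topology.
LINKS = [
    ("A", "B", {"ipv4": 10, "ipv6": 10}),
    ("B", "C", {"ipv4": 10, "ipv6": None}),  # IPv6 disabled on this link
    ("A", "C", {"ipv4": 50, "ipv6": 30}),    # different metric per address family
]


def topology_view(links, topology):
    """Return the {(node, node): metric} graph seen by one topology."""
    return {
        (a, b): metrics[topology]
        for a, b, metrics in links
        if metrics.get(topology) is not None
    }


ipv4_graph = topology_view(LINKS, "ipv4")
ipv6_graph = topology_view(LINKS, "ipv6")
```

Each view would then be fed into its own SPF calculation, so a failure of the B to C link changes the IPv4 graph but leaves the IPv6 graph untouched.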

 

Reducing or Removing Information

Examples of Information Removal and Reduction

Address and topology information can be reduced or completely removed. One example of this occurs when summarising contiguous IP prefixes into a summary route announcement, and not advertising the constituent prefixes. The summary route provides address availability information for a different IP prefix and the constituent prefixes are no longer available. This reduction in address availability information also includes a reduction in address reachability information because the constituent IP prefixes may have slightly varying reachability attributes.
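
A small sketch of this reduction, using Python's standard ipaddress module. The two constituent prefixes and their metrics are made-up values, and choosing the worst constituent metric for the summary is only one assumed policy.

```python
import ipaddress

# Hypothetical constituent routes with per-prefix metrics (reachability).
constituents = {
    ipaddress.ip_network("192.0.2.0/25"): 10,
    ipaddress.ip_network("192.0.2.128/25"): 40,
}

# Availability is reduced: two contiguous prefixes collapse into one summary.
summary = list(ipaddress.collapse_addresses(constituents))

# Reachability is reduced too: one metric must now stand in for both
# constituents. Taking the highest metric is an assumption made for this sketch.
summary_metric = max(constituents.values())
```

After summarisation a receiver sees only 192.0.2.0/24 with a single metric, and can no longer tell that one half of the range was cheaper to reach than the other.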


An example of completely removing information occurs when redistributing between routing protocols, and not all of the source protocol information is translatable to the destination protocol e.g., there is no equivalent of the BGP MED attribute in IS-IS, and there is no clean method to translate the AS path of a prefix in BGP into a set of nodes and links within an IS-IS graph topology. Intra-AS topology information is completely removed when redistributing an IGP into iBGP, inter-AS topology information is completely removed when redistributing from eBGP into IGP*, and inter-AS topology information is also completely removed when stripping private ASNs from an outbound eBGP announcement into the DFZ. (* another “just because you can, doesn’t mean you should”).


Advantages and Disadvantages of Removing or Reducing Information

There can be advantages to reducing information or completely removing it. Route summarisation is an example of a reduction of address availability information, which can have the benefits of reduced route scale saving device memory, reducing IGP flooding/BGP UPDATE time, and reducing convergence time. It can also reduce state amplification issues like a flapping prefix; this is an example of a positive feedback loop, which might trigger an undesirable reconvergence event each time a summary route constituent prefix is advertised or withdrawn.


When removing or reducing address reachability information, again using route summarisation as an example, an ASBR device can prevent metric or route attribute changes from being flooded or propagated. It can also prevent path changes when a more preferred path becomes available for an existing constituent prefix of a summary route.


There are also disadvantages to reducing or removing information. BGP route reflectors hide address availability information by default. Without ORR or BGP Add-Path an RR only advertises what it considers to be the single best path. This reduction in address availability information reduces the receiver’s resilience against network outages, the advertised path may not be the most optimal path for the receiver across the IGP/underlay, it can prevent the receiver from implementing features like fast reroute, and it prevents the receiver from making its own forwarding decisions.


Another common source of the disadvantages induced by reducing or removing forwarding decision making information occurs when routing redistribution is used, and redistribution loops form when mutual protocol redistribution is configured. This occurs for example when BGP and IS-IS are being redistributed into each other without additional safeguards, which need to be explicitly configured. The result is that BGP prefixes redistributed into IS-IS will be redistributed back into BGP and then back into IS-IS ad infinitum.
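
The loop, and the kind of safeguard that breaks it, can be sketched abstractly. Here each route carries a set of tags recording the protocols it has already passed through; the redistribute and bounce functions and the tag scheme are hypothetical simplifications of what route tags or BGP communities are commonly used for in real policies.

```python
def redistribute(routes, target, use_tags):
    """Copy routes into `target`, optionally refusing routes that came from it."""
    out = []
    for prefix, tags in routes:
        if use_tags and target in tags:
            continue  # route has already been in `target`: do not feed it back
        out.append((prefix, tags | {target}))
    return out


def bounce(route, rounds, use_tags):
    """Mutually redistribute between IS-IS and BGP, counting successful passes."""
    routes, passes = [route], 0
    for i in range(rounds):
        target = "isis" if i % 2 == 0 else "bgp"
        routes = redistribute(routes, target, use_tags)
        if not routes:
            break
        passes += 1
    return passes


bgp_route = ("198.51.100.0/24", frozenset({"bgp"}))
unsafe_passes = bounce(bgp_route, rounds=6, use_tags=False)
safe_passes = bounce(bgp_route, rounds=6, use_tags=True)
```

Without the tag check the route is re-imported on every round for as long as the simulation runs; with it, the route enters IS-IS once and is refused when it tries to return to BGP.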


Independent Information Removal

Topology information reduction and address [availability and reachability] information reduction can be implemented independently of each other. For example, a BGP ASBR can announce an aggregate IP prefix (address availability information) which can include AS_SETs as the AS_PATH segments within the AS_PATH attribute (topology information). This will aggregate multiple contiguous IP prefixes into a single IP prefix, reducing address availability information, without losing any topology information, thus preventing routing loops, but the final summary route only provides one set of address reachability values e.g., a single MED value for the aggregate IP prefix. The final result is a reduction in address availability information (a single aggregate IP prefix), a reduction in reachability information (one MED value now covers all constituent IP prefixes), but no reduction in topology information (all constituent route AS_PATHs maintained as AS_SETs within the aggregate route AS_PATH).
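
A rough sketch of that aggregation follows. It is deliberately simplified: every ASN from every constituent AS_PATH is folded into a single AS_SET, whereas a real implementation would normally keep any common leading AS_SEQUENCE. The sample routes and the aggregate function are assumptions for illustration.

```python
import ipaddress

# Hypothetical constituent routes: (prefix, AS_PATH, MED).
routes = [
    ("192.0.2.0/25", (64502, 64510), 50),
    ("192.0.2.128/25", (64503, 64511), 200),
]


def aggregate(routes, med):
    """Return (summary prefixes, AS_SET, MED) for a set of constituent routes."""
    networks = [ipaddress.ip_network(prefix) for prefix, _, _ in routes]
    summary = list(ipaddress.collapse_addresses(networks))
    as_set = set()
    for _, as_path, _ in routes:
        as_set.update(as_path)  # topology information is preserved, unordered
    return summary, as_set, med


summary, as_set, med = aggregate(routes, med=100)
```

The result has one prefix and one MED, but every ASN the constituents traversed is still present, so a router in any of those ASNs can still recognise the aggregate as a potential loop.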


An OSPF ABR provides an example of the reverse scenario: reducing topology information but without reducing address availability or address reachability information. An OSPF ABR may perform no route summarisation at the area boundary and advertise all prefixes it has into the neighbouring area, meaning there is no reduction in address availability or reachability, but because the prefixes are advertised as type 3 LSAs they all appear as leaves in the IGP graph directly connected to the ABR, reducing topology information only. Type 3 LSAs are called Summary LSAs, but it is topology information they summarise, not address information.


Adding or Increasing Information

Information isn’t only removed, it can also be added. An OSPF ABR which receives a type 1 LSA from an OSPF ASBR will generate a type 4 LSA, and flood that type 4 LSA into other areas. This allows routers in other areas to learn about the OSPF ASBR; this is an example of adding topology information (the ASBR is now a leaf attached to the ABR in the OSPF graph). This happens because type 1 LSAs aren’t normally flooded between areas (topology information hiding), which means that routers in other areas wouldn’t normally learn about the presence of the OSPF ASBR. The OSPF ASBR can now generate type 5 and type 7 LSAs (address availability and reachability information) and flood those LSAs towards the ABR. The ABR can in turn flood these LSAs into other areas, and other receiving routers will have both the topology information (the type 4 LSA) and address information (the type 5 and type 7 LSAs) required in order to make forwarding decisions.


An example of adding address information instead of topology information is the use of unique route distinguishers for L3VPNs, which increases address availability. A LAN may have two connections to a common L3VPN, using two L3VPN gateway devices (a LAN with dual CPEs connecting to different PEs), and each L3VPN PE router might use unique RDs for the same L3VPN/VRF (PE1 creates the L3VPN route 10.0.0.1:64501:192.0.2.0/24 and PE2 creates the L3VPN route 10.0.0.2:64501:192.0.2.0/24). This implies that both connections to the same shared LAN prefix are actually two connections to two unique LANs, even though they may have exactly the same reachability information and topology information. The availability information could be increased like this to allow for fast failover, for example.
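
The effect of unique RDs can be sketched by keying routes on (RD, prefix), which is how a receiver tells VPN routes apart. The vpn_routes helper and the shared RD value 64501:1 are hypothetical; the unique RDs are the ones used in the example above.

```python
def vpn_routes(advertisements):
    """Group advertising PEs by the (RD, prefix) key that identifies a VPN route."""
    table = {}
    for pe, rd, prefix in advertisements:
        table.setdefault((rd, prefix), []).append(pe)
    return table


unique_rd = vpn_routes([
    ("PE1", "10.0.0.1:64501", "192.0.2.0/24"),
    ("PE2", "10.0.0.2:64501", "192.0.2.0/24"),
])
shared_rd = vpn_routes([
    ("PE1", "64501:1", "192.0.2.0/24"),
    ("PE2", "64501:1", "192.0.2.0/24"),
])
```

With a shared RD both PEs compete for one route, so anything that advertises only a single best path per route passes on just one of them. With unique RDs there are two distinct routes, and both survive to the receiver.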

 


Forwarding Information in Protocol Implementations

Distance Vector and Path Vector

RIP is a distance vector routing protocol and doesn’t carry topology information in routing updates, only address reachability and address availability i.e., “I can reach 192.0.2.0/24 in 4 hops”. This lack of topology information is what leads to the count to infinity problem (assuming no split horizon or route poisoning, for example). BGP is technically a path-vector protocol, although effectively it is a distance vector protocol like RIP because AS_PATH length is right near the top of the BGP path selection process. Unlike RIP though, BGP does carry some topology information to compare paths (the NLRI AS_PATH attribute) and this is what prevents routing loops and the count to infinity problem from occurring in BGP.
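
The difference can be reduced to two acceptance checks. This is a simplified sketch: rip_accepts and bgp_accepts are hypothetical helpers, and a real implementation of either protocol applies many more rules before installing a route.

```python
RIP_INFINITY = 16  # RIP treats a metric of 16 as unreachable


def rip_accepts(advertised_hops):
    """Distance vector: only a hop count is visible, so any finite metric is believed."""
    return advertised_hops + 1 < RIP_INFINITY


def bgp_accepts(local_asn, as_path):
    """Path vector: reject any update whose AS_PATH already contains the local ASN."""
    return local_asn not in as_path


# A stale route that has already passed through this router once.
rip_result = rip_accepts(4)
bgp_result = bgp_accepts(64501, (64502, 64501, 64503))
```

RIP has no way to tell that the 4 hop route leads back through itself, so it accepts it and the metric creeps upwards until it reaches 16. BGP sees its own ASN in the path and discards the update immediately.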


The topology information in BGP is fairly “vague” though, which means that a BGP receiver can suffer from BGP path hunting before locally withdrawing an external route. This is because the first withdrawal message to be received (and each subsequent withdrawal) doesn’t explicitly contain the full topology between the receiver node and the withdrawn address origin node; it’s not clear to the BGP receiver that all these withdrawal messages relate to the same remote address and that there are no alternative paths, knowledge of which would prevent the path hunting.


This lack of detail within the BGP topology information also prevents BGP from supporting a FRR LFA feature which IGPs can support. There are vendor specific implementations of this feature, such as Cisco’s BGP PIC-Edge which enables a BGP receiver to install the next-best-path to a destination address learned via BGP, into the FIB, as a FRR backup path. In this case the IGP path to the BGP next-hop for the BGP next-best-path is known to be loop free but, externally beyond the local AS border, the BGP path could be pointing to the same failed device as the primary BGP path. For DC fabrics which wouldn’t typically have primary and backup paths (as provided by FRR LFA) but multiple ECMP paths instead, BGP has been extended to support fast-rehash. But in this case detailed topology information isn’t required. ECMP is implemented on the local BGP node to the directly connected neighbour.
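
For comparison, the check an IGP node can make from its own link-state database is sketched below, using the basic loop-free condition from RFC 5286: a neighbour N of source S is a loop-free alternate for destination D if dist(N, D) < dist(N, S) + dist(S, D). The node names and distance values are invented for illustration.

```python
def is_loop_free_alternate(dist, s, n, d):
    """RFC 5286 basic loop-free condition for neighbour n of s towards d."""
    return dist[(n, d)] < dist[(n, s)] + dist[(s, d)]


# Hypothetical shortest-path distances taken from an IGP's SPF results.
dist = {
    ("S", "D"): 10,
    ("N1", "S"): 5, ("N1", "D"): 8,    # N1 has its own path to D
    ("N2", "S"): 5, ("N2", "D"): 15,   # N2's best path to D runs back through S
}

n1_is_lfa = is_loop_free_alternate(dist, "S", "N1", "D")
n2_is_lfa = is_loop_free_alternate(dist, "S", "N2", "D")
```

The check needs the distance from each neighbour to the destination, which an IGP can compute from the flooded topology. BGP has no equivalent data for anything beyond the local AS border, which is why the next-best-path it installs can't be proven loop free end to end.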


Link-State

Compared to BGP, the topology information carried in OSPF and IS-IS is much more detailed. Despite the availability of this detailed topology information, it isn’t used for loop avoidance in these IGPs. IGP routing domains are expected to be much more densely meshed than BGP routing domains, and implicitly contain many topology loops for increased resiliency and capacity. OSPF and IS-IS carry rich address reachability data and this reachability data is used to calculate a shortest (most preferred) path tree for intra-area routing. Only the calculated shortest path is used for forwarding between two endpoints (or a subset of paths in the case of ECMP) within the same area.


Each IGP node builds a shortest path tree from itself to all other IGP nodes using only the best path to each node, and the resulting tree is loop free. All other paths are implicitly “costed out” by the IGP, avoiding any loops. For example, given a 1Gbps link and 10Gbps link between the same pair of routers, a basic physical topology loop is created but, in a default setup where link speed is the metric, the 1Gbps link won’t be present in the calculated SPF tree.
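
The parallel link example can be sketched with a small Dijkstra implementation. The reference bandwidth value, the link names and the third router R3 are assumptions made for this sketch; real default metrics vary by vendor and protocol.

```python
import heapq

REFERENCE_BW_MBPS = 100_000  # assumed reference bandwidth for deriving link cost

# Hypothetical links: (node, node, speed in Mbps, link name).
LINKS = [
    ("R1", "R2", 1_000, "1G"),
    ("R1", "R2", 10_000, "10G"),
    ("R2", "R3", 10_000, "10G-b"),
]


def shortest_path_tree(links, root):
    """Return {node: (cost, name of the link used to reach it)} via Dijkstra."""
    graph = {}
    for a, b, mbps, name in links:
        cost = REFERENCE_BW_MBPS // mbps
        graph.setdefault(a, []).append((b, cost, name))
        graph.setdefault(b, []).append((a, cost, name))
    best = {root: (0, None)}
    queue = [(0, root)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > best[node][0]:
            continue  # stale queue entry, a cheaper path was already found
        for neighbour, cost, name in graph[node]:
            nd = d + cost
            if neighbour not in best or nd < best[neighbour][0]:
                best[neighbour] = (nd, name)
                heapq.heappush(queue, (nd, neighbour))
    return best


tree = shortest_path_tree(LINKS, "R1")
links_in_tree = {name for _, name in tree.values() if name}
```

Both R1 to R2 links exist in the database, but only the 10Gbps link ends up in the tree, so the physical loop between the two routers never becomes a forwarding loop.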


For inter-area routing both IGPs implement the concept of a backbone area to implement split-horizon flooding of topology and address information. At the inter-area level both IGPs fall back to a simple distance-vector based forwarding paradigm, because area-level loops are forbidden, and information between areas is summarised and/or reduced to improve scale.


Micro-loops do of course occur in IGPs; no routing protocol is perfect, and time must elapse in order for routing updates to be exchanged and processed when network failures occur. However, IGPs do have the required data to examine the alternative path(s) and calculate a new loop-free path towards the destination (rebuild their local tree), or decide that the destination is unreachable, which BGP cannot do.


Comparing Link-State and Path-Vector Protocols

Link-State IGPs like OSPF and IS-IS can compare the detailed reachability information carried in flooded updates to implement ECMP. One of the BGP reachability data points is MED but, by default, BGP doesn’t implement ECMP, and even when explicitly instructed to enable ECMP, BGP doesn’t compare eBGP paths for ECMP if they have different MED values or come from different ASNs. Vendors have extended BGP, for example Cisco’s “bgp deterministic-med” and “bgp always-compare-med” commands can be used to implement ECMP across paths from different ASNs with the same AS path length or multiple paths from the same ASN. Unlike an IGP though, BGP can’t be sure that these multiple paths aren’t actually converging through the same eBGP neighbour.
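
A simplified sketch of the eBGP multipath eligibility test described above. Real implementations compare more attributes (LOCAL_PREF, origin, IGP cost to the next-hop and so on); the Path class, the multipath_eligible function and the relax_as flag are hypothetical stand-ins, with relax_as representing whichever vendor knob permits paths from different neighbouring ASNs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    neighbour_as: int
    as_path: tuple
    med: int


def multipath_eligible(best, candidate, relax_as=False):
    """Return True if `candidate` may share load with `best` (simplified rules)."""
    if len(candidate.as_path) != len(best.as_path):
        return False
    if candidate.med != best.med:
        return False
    if not relax_as and candidate.neighbour_as != best.neighbour_as:
        return False
    return True


best = Path(64502, (64502, 64510), 100)
same_as = Path(64502, (64502, 64511), 100)
other_as = Path(64503, (64503, 64510), 100)
other_med = Path(64502, (64502, 64510), 50)
```

Note that nothing in the check can tell whether two eligible paths converge on the same device further downstream; the AS_PATH only records which ASNs were crossed, not which links or routers.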


The IGP process of allowing each node to have the full database of topology and address information is what allows each node to build a full topology tree. BGP creates “information boundaries” (either at the eBGP/ASBR node or the iBGP/RR node) which prevent nodes from building a full view of the network. This happens because BGP needs the add-path extension in order to advertise more than just the single best path to a neighbour, whereas IGPs send all information to all nodes (by default).


For both BGP and IGPs though, all reachability data points carried in the routing updates (such as AS_PATH or MED for BGP, or link metric/cost in OSPF and IS-IS) are proxy statistics for inferring the most optimal path. The IGP metric may be link speed but latency may be more important, or vice versa. In BGP, neither MED nor AS_PATH provides any information on the size of a link; operators hope that their transit providers have bigger links than them!


Not only are BGP and IGPs carrying different granularities of topology, availability and reachability information, and not only are they limited by their respective data sets in different ways, but the way the information is altered also differs.


For BGP, route-maps or policy-maps are typically used to modify NLRI attributes (topology and address reachability data) such as LOCAL-AS, AS_PATH, LOCAL_PREF, MED, etc. Prefix lists are typically used in BGP to modify address availability data. Depending on the NOS, route-maps (or policy-maps) and prefix lists can be combined or they can be applied independently to the same BGP neighbour session. For IGPs, prefix lists are typically used if address availability data needs to be modified e.g., Junos has no way of suppressing p2p link-nets by default, unlike Arista and Cisco, which can both limit IGP advertisements to loopback IPs (passive interface IPs) using built-in commands. However, IGP implementations don’t normally offer a method to alter topology or address reachability information through route-maps or policy-maps; any link which runs the IGP should be implicitly included in the LSDB. One notable exception is route redistribution, but this isn’t native IGP operation; this is injecting 3rd party data and supplementing it with the data required to make it appear more “native” in the LSDB.

