Date created: Friday, February 5, 2021 8:36:08 AM. Last modified: Sunday, December 3, 2023 4:25:00 PM

LNS/L2TP/PPP Scaling Issues


Static vs Dynamic Subscriber IP Allocation & Route Advertisements

Providing static IP allocations to subscribers reduces flexibility within the network, because of the additional static-IP related state that must be held. This extra state doesn’t exist when providing only dynamic IP allocations. The following subsections look at scaling considerations when providing static IPs in all-active and active/standby LNS router scenarios.

 

Scaling All-Active LNS Routers with Static Subscriber IPs

Scenario:

  • All subscribers have their static IP(s) set in their RADIUS profile.
  • Either all L2TP tunnel IPs from all of the LNS routers are present in RADIUS with an equal Tunnel-Preference, to allow for all-active load-balancing, or, if using any-cast, the any-cast IP(s) must be in RADIUS, with all LNS routers advertising the any-cast IP with the same BGP or IGP cost.
  • Subscribers are round-robin load-balanced across all LNS routers based on the equal preference in RADIUS.
  • Each active LNS router is announcing a portion of the subscriber assigned IPs.
Advantages:

  • Efficient distribution of subscribers across LNSs by subscriber count.
  • Loss of a single LNS router results in the lowest subscriber count increase on the remaining LNS routers and the fastest reconnection time for subscribers.
  • When using RADIUS based Tunnel-Preference for load-sharing, restoration of the failed router has no impact on failed-over subscriber sessions.
  • A subscriber can move physical location (e.g., across the country) and keep their static IP because all LNS routers are required to advertise all subscriber super-nets.
  • Least-cost capacity increases: capacity can be upgraded incrementally by deploying a single additional LNS at a time (this is assuming additional cost isn’t also incurred elsewhere e.g., opening a new PoP), and these LNSs can be small, because the all-active load distribution means that no LNS must carry all subscribers.

Disadvantages:

  • Can’t distribute based on traffic load.
  • It’s unknown which LNS a subscriber will connect to before they connect, or which LNS they will connect to after a session flap. This means predictable behaviour and failover is unlikely.
  • When using any-cast based load-sharing, restoration of the failed router causes a percentage of subscribers to immediately fail back and experience a 2nd outage.
  • No aggregation of the subscriber-allocated address space is possible. To allow subscribers to connect to any active LNS router, each router must announce individual subscriber /32s and /56s, which means the highest volume of route scale and churn.
  • Potentially wasteful use of IPs, as a percentage of subscribers are always offline and their IPs can’t be used by any other subscriber.
  • Stateful address and/or port translation (e.g., CG-NAT, 464XLAT, MAP-E/T) could hold a large amount of state and/or more complex state to support discontiguous address space.
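
As a rough illustration of the subscriber-count behaviour described above, the following minimal sketch models round-robin placement across equally preferred LNS routers and then the loss of one router. The router names, subscriber names, and counts are hypothetical, and real LAC/RADIUS behaviour will be less tidy than this model.

```python
from collections import Counter
from itertools import cycle


def round_robin(subscribers, lns_routers):
    """Assign each subscriber to the next LNS in turn (equal Tunnel-Preference)."""
    targets = cycle(lns_routers)
    return {sub: next(targets) for sub in subscribers}


def fail_lns(assignment, failed, lns_routers):
    """Reconnect only the failed router's subscribers, round-robin over the survivors."""
    survivors = [lns for lns in lns_routers if lns != failed]
    displaced = [sub for sub, lns in assignment.items() if lns == failed]
    new_assignment = dict(assignment)
    new_assignment.update(round_robin(displaced, survivors))
    return new_assignment


# Hypothetical deployment: four LNS routers and 1200 subscribers.
lns_routers = ["lns1", "lns2", "lns3", "lns4"]
subscribers = [f"sub{i}" for i in range(1200)]

before = round_robin(subscribers, lns_routers)
after = fail_lns(before, "lns1", lns_routers)

print(Counter(before.values()))  # 300 sessions on each of the four routers
print(Counter(after.values()))   # 400 sessions on each of the three survivors
```

In this model each surviving router only gains a third of the failed router's sessions, which is the "lowest subscriber count increase" property; it balances session counts only and says nothing about traffic load.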

 

Scaling All-Active LNS Routers with Dynamic or Sticky Subscriber IPs

In this scenario, most of the advantages and disadvantages of the previous section apply with the following differences:

  • No subscribers have their IP(s) set in their RADIUS profile, but a preferred DHCP pool to use on the LNS may be specified in RADIUS.
  • Subscribers are still round-robin load-balanced across all LNS routers.
  • Each active LNS router is announcing aggregate routes covering the specific set of subscriber sessions it terminates.
Advantages:

  • Aggregation of subscriber IPs is possible; each active LNS has one or multiple DHCP pools and assigns subscriber addresses from these pools. The LNS need only announce the DHCP pool super-nets into the IGP or BGP. This means the lowest volume of route scale and churn.
  • Most efficient usage of IP address ranges because offline subscribers don’t have IPs allocated to them.
  • Stateful address and/or port translation (e.g., CG-NAT, 464XLAT, MAP-E/T) can hold less complex state if address space aggregation is supported.

Disadvantages:

  • If a subscriber moves location they may be served by a different LNS and lose their IP (i.e., the DHCP pool on their new LNS uses a different IP range). The length of the DHCP lease may also mean they have frequent WAN IP changes even when their PPP session flaps against the same LNS router.
  • For IPv6, the public addressing extends into the LAN (i.e., through DHCPv6-PD). When the IPv6 range changes for the subscriber, all their LAN equipment needs to renumber.
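
To make the difference in route scale concrete, here is a small sketch comparing the routes one LNS would announce for the same online subscribers with per-subscriber host routes versus pool super-nets. The pool prefixes are documentation ranges chosen for illustration and the subscriber count is invented.

```python
import ipaddress

# Hypothetical DHCP pools configured on a single LNS.
pools = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]

# Assume 100 online subscribers per pool, each holding one IPv4 address.
online = [addr for pool in pools for addr in list(pool.hosts())[:100]]

# Static IPs: every online subscriber is announced as its own /32.
host_routes = [ipaddress.ip_network(f"{addr}/32") for addr in online]

# Dynamic IPs: only the pool super-nets are announced, however many sessions flap.
pool_routes = [pool for pool in pools if any(addr in pool for addr in online)]

print(len(host_routes))  # 200 routes, each withdrawn and re-announced on a session flap
print(len(pool_routes))  # 2 routes, unaffected by individual session flaps
```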

 

Static IP Workarounds

For business customers, a static IP may sometimes be a requirement. It is possible to use all-active LNS routers and still provide a static IP to business customers. One example method is to use Dynamic BGP Neighbours, which allows eBGP sessions to be established between an LNS and its connected CPEs without knowing their exact WAN IPs. CPEs can be shipped preconfigured with, or can later auto-configure, a static IP on a loopback interface and use that for NAT/PAT/routing, or assign a subnet to a LAN/DMZ interface. The CPEs can then advertise this single IP or IP subnet over the eBGP session with the LNS.

Each LNS router must have an additional loopback interface with the same IP, so that whichever LNS the subscriber session terminates on, the LNS IP is the same and the eBGP neighbour address configured on every CPE is the same. The LNS uses a Dynamic Neighbour subnet range which matches its DHCP pool range(s), so that whatever the CPE’s WAN IP is, the CPE can initiate the eBGP session establishment.

This will introduce an additional layer of complexity and scaling restrictions in the form of thousands of eBGP sessions to each LNS, so it is not necessarily the best method, just an example.
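
The acceptance logic behind dynamic neighbours can be pictured with the sketch below. It is only a model of the idea, not any vendor's implementation: the listen ranges, the session limit, and the function name are all hypothetical, and the limit is included simply to show where the scaling restriction bites.

```python
import ipaddress

# Hypothetical: the LNS DHCP pool ranges double as the dynamic-neighbour listen ranges.
LISTEN_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]
MAX_DYNAMIC_PEERS = 2000  # assumed per-LNS session limit, purely illustrative


def accept_peer(source_ip, established_peers):
    """Accept an inbound eBGP session if the CPE WAN IP is inside a listen
    range and the LNS still has dynamic-peer capacity left."""
    addr = ipaddress.ip_address(source_ip)
    in_range = any(addr in net for net in LISTEN_RANGES)
    return in_range and established_peers < MAX_DYNAMIC_PEERS


print(accept_peer("198.51.100.25", established_peers=10))    # True
print(accept_peer("192.0.2.1", established_peers=10))        # False: outside the pools
print(accept_peer("198.51.100.25", established_peers=2000))  # False: LNS is full
```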

Another method is to implement automatic DNS updates for business customer PTR and A/AAAA records (I don’t mention MX records because businesses hosting their own email server(s) on the end of an xDSL connection probably have bigger problems). When using dynamic IPs, each time a subscriber is assigned an IP address this can trigger an update to the ISP’s DNS name servers. When coupled with “very sticky” DHCP assignments (meaning long lease times which persist across CPE reboots and long outages of several days, without the DHCP lease expiring), this allows the net effect of a static IP to be mostly maintained. Again, this isn’t necessarily the best method, just another example.
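
As a sketch of what such an update might contain, the function below builds the forward and reverse records for a newly assigned address using only the Python standard library. The hostname, TTL, and tuple layout are assumptions for illustration; actually sending the update to a name server (e.g., via a dynamic DNS update mechanism) is left out.

```python
import ipaddress


def dns_updates(hostname, address, ttl=300):
    """Return (owner name, TTL, record type, record data) tuples for the
    forward (A/AAAA) and reverse (PTR) records of an assigned address."""
    ip = ipaddress.ip_address(address)
    rtype = "A" if ip.version == 4 else "AAAA"
    return [
        (hostname, ttl, rtype, str(ip)),
        (ip.reverse_pointer, ttl, "PTR", hostname),
    ]


# Hypothetical business CPE that has just been assigned a WAN address.
for record in dns_updates("cpe1.example.net.", "192.0.2.10"):
    print(record)
```

A short TTL is assumed here so that a changed address propagates quickly; the right value depends on how sticky the DHCP assignments really are.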

 

Scaling Active/Standby LNS Routers with Static Subscriber IPs

The scenario below describes active/standby LNS routers, which at minimum requires a pair of routers and implies that one of the routers is essentially a standby device for probably 99% of the time. It is generally assumed that routers are deployed in symmetrical pairs, meaning always in multiples of two, so that there are the same number of active routers and standby routers, and that both routers in a pair are scaled to handle the same volume of traffic; i.e., if the active LNS failed, it wouldn’t be acceptable to fail over to a lower capacity device.

There are ways to improve the efficiency of the active/standby scenario. One is to have fewer standby routers than active routers: instead of a 1:1 relationship between active and standby routers, use an n:1 model so that many active routers fail over to the same standby router. Equally, the “active” routers could be active for one group of subscribers whilst the “standby” group could actually be active for a different subscriber group, meaning there are no inactive routers.

To clarify further: one LNS or group of LNS routers can advertise an L2TP tunnel IP address which is more preferred for a specific subscriber demographic e.g., residential, or geographically local subscribers. Meanwhile, another LNS router or group of routers can advertise an L2TP tunnel IP for the same subscriber demographic which is less preferred (making them the standby routers for this demographic). This 2nd LNS group can also advertise another L2TP IP address which is more preferred by a different subscriber demographic e.g., business subscribers or geographically remote subscribers, whilst the 1st LNS group also advertises an L2TP IP for this demographic which is less preferred. This provides an active/standby deployment which allows for IP address aggregation within routing announcements and predictable failovers, but has no purely idle routers making inefficient use of money.

  • All subscribers have their static IP(s) set in their RADIUS profile.
  • Either each LNS router has a unique L2TP tunnel IP and all of these tunnel IPs are in RADIUS with increasing Tunnel-Preference values (lowest is most preferred), or all LNS routers advertise the same IP in IGP/BGP with increasing metric/decreasing BGP local preference.
  • Subscribers are always connected to the LNS with the lowest Tunnel-Preference which is online and responding to incoming L2TP call requests (or with the most preferred IGP/BGP metric if using any-cast).
  • Only the active LNS is announcing the subscriber IPs.
Advantages:

  • Aggregation of the subscriber-allocated address space is possible; each LNS router can announce only the subscriber super-nets, which means the lowest volume of route scale and churn.
  • It is known exactly which LNS a subscriber will connect to before they connect, and which LNS they will connect to after a session flap. This means predictable failover and failback behaviour is guaranteed.
  • When using RADIUS based Tunnel-Preference to determine the active LNS router(s), restoration of the failed router has no impact on failed-over subscriber sessions.
  • A subscriber can move physical location (e.g., across the country) and keep their static IP because all LNS routers are required to advertise all subscriber super-nets.
  • Low-cost incremental capacity increases could be allowed for by deploying many pairs of lower scale and thus cheaper LNS routers (this is assuming additional cost isn’t also incurred elsewhere e.g., opening a new PoP).
  • Stateful address and/or port translation (e.g., CG-NAT, 464XLAT, MAP-E/T) can hold less complex state if address space aggregation is supported.
  • IPv6 customers can keep the same prefix delegation after failover, so none of their LAN devices need to renumber.

Disadvantages:

  • No load-sharing of subscribers (either by subscriber count or traffic load).
  • Loss of a single LNS router results in all subscribers being disconnected and having to reconnect to the next most preferred router. At scale this can take a long time.
  • When using any-cast to determine the active LNS router(s), restoration of the failed router causes all subscribers to immediately fail back and experience a 2nd outage.
  • Potentially wasteful use of IPs, as a percentage of subscribers are always offline and their IPs can’t be used by any other subscriber.
  • When using a low number of larger scaled LNS routers, rather than many small ones, additional LNS deployments for capacity upgrades are very costly and any standby LNS routers become a very inefficient use of money because they will be idle for 99% of the time.
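
The preference-based selection, including the variant where each router group is active for one subscriber demographic and standby for another, can be sketched as follows. The tunnel IPs, preference values, and group names are hypothetical stand-ins for what would be held in RADIUS.

```python
from typing import NamedTuple, Optional


class Tunnel(NamedTuple):
    tunnel_ip: str
    preference: int  # lower value is more preferred
    lns: str


# Hypothetical RADIUS data: each demographic prefers a different LNS group,
# so neither group sits purely idle.
TUNNELS = {
    "residential": [Tunnel("10.0.0.1", 10, "lns-a"), Tunnel("10.0.0.2", 20, "lns-b")],
    "business": [Tunnel("10.0.1.2", 10, "lns-b"), Tunnel("10.0.1.1", 20, "lns-a")],
}


def select_lns(group: str, online: set) -> Optional[str]:
    """Return the online LNS with the lowest Tunnel-Preference for this group."""
    candidates = [t for t in TUNNELS[group] if t.lns in online]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.preference).lns


print(select_lns("residential", {"lns-a", "lns-b"}))  # lns-a
print(select_lns("business", {"lns-a", "lns-b"}))     # lns-b
print(select_lns("residential", {"lns-b"}))           # lns-b (failover)
```

Because the outcome depends only on the preference values and on which routers are online, the LNS a subscriber lands on is known in advance, both in steady state and after a failure.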

 

Scaling Active/Standby LNS Routers with Dynamic or Sticky Subscriber IPs

In this scenario, most of the advantages and disadvantages of the previous section apply with the following differences:

  • No subscribers have their IP(s) set in their RADIUS profile, but a preferred DHCP pool may be specified.
  • Subscribers are always connected to the LNS with the lowest Tunnel-Preference which is online and responding to incoming L2TP call requests (or with the most preferred IGP/BGP metric if using any-cast).
  • Only the active LNS is announcing the subscriber IPs.
Disadvantages:

  • Potentially wasteful use of IPs, as a percentage of subscribers are always offline and their IPs can’t be used by any other subscriber.
  • For IPv6, the public addressing extends into the LAN (i.e., through DHCPv6-PD). When the IPv6 range changes for the subscriber, all their LAN equipment needs to renumber.

 

Afterword / Other Issues
 

Scaling LNS Core Connectivity

When using L2TP encapsulation for PPP based subscribers, all subscriber flows between a LAC and LNS will be inside a single L2TP tunnel with the same source and destination IP addresses and UDP port numbers. When using stateless load-balancing such as ECMP routing or LAGs between LNS routers and core P nodes, between P nodes, or between P nodes and LAC devices, every packet in the tunnel hashes to the same link or path, so multiple links or paths cannot be used.

To clarify, consider PPPoE between the CPE and LAC, and L2TP towards an LNS. Most devices which lie between the CPE and LAC can’t look beyond the PPPoE header, so they only have the CPE MAC to hash on, meaning no per-flow hashing (and no visibility of the IP DSCP markings for QoS). Between the LAC and LNS, most devices can’t hash beyond the L2TP header, meaning they only have the LAC and LNS IPs to hash on.

This problem can generally be avoided when using IPoE and BNG devices. The end customer may be connected directly to the BNG using [possibly extended] layer 2, which means that the customer Ethernet and IP headers are available to the BNG and to intermediary devices between the CPE and BNG for stateless load-balancing. Alternatively, layer 2 connectivity between the CPE and BNG may be implemented over a layer 2 MPLS VPN such as a pseudowire, in which case the MPLS label stack can be used for load-balancing: multiple pseudowires can be used to increase label entropy, the ingress PE can implement FAT or Entropy labels, and most devices can read deeper than the label stack and hash on the payload customer Ethernet and/or IP headers.
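
The effect can be illustrated with a toy model of stateless hashing. The hash function here (SHA-256 modulo the link count) is only a stand-in for whatever a real router uses, and the addresses, the four-link bundle, and the use of UDP port 1701 for both source and destination are assumptions made for the example.

```python
import hashlib


def ecmp_link(five_tuple, n_links):
    """Pick an egress link from a hash of the 5-tuple (illustrative hash only)."""
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_links


LAC_IP, LNS_IP = "192.0.2.1", "192.0.2.254"  # hypothetical tunnel endpoints
UDP, L2TP_PORT = 17, 1701


def outer_header(inner_flow):
    """Every subscriber flow rides in the same tunnel, so the outer 5-tuple
    seen by transit routers does not depend on the inner flow at all."""
    return (LAC_IP, LNS_IP, UDP, L2TP_PORT, L2TP_PORT)


# 200 distinct subscriber flows: (src IP, dst IP, protocol, src port, dst port).
inner_flows = [(f"100.64.0.{i}", "203.0.113.80", 6, 40000 + i, 443) for i in range(1, 201)]

links_with_l2tp = {ecmp_link(outer_header(flow), 4) for flow in inner_flows}
links_if_visible = {ecmp_link(flow, 4) for flow in inner_flows}

print(len(links_with_l2tp))   # 1: the whole tunnel follows a single link
print(len(links_if_visible))  # more than 1 when the inner headers can be hashed
```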

One method to avoid this issue with PPP subscribers is to use multiple loopback interfaces on the LNS device and/or the LAC device. Within an Access-Accept message, RADIUS can return multiple IPs on the LNS(s) with the same Tunnel-Preference, allowing the LAC to round-robin subscribers over multiple L2TP tunnels towards the same LNS router. Each tunnel then has a different 5-tuple (and a unique MPLS label if each loopback on the LNS has a different IP address). Additionally, the LAC may use multiple loopback interfaces as multiple L2TP source IPs to create or enhance the same effect.

A similar method is to create multiple sub-interfaces on each physical interface of the LNS router. Each time more capacity is needed, a new physical link to an upstream P or PE device is created, and one or multiple sub-interfaces are created for terminating L2TP tunnels. It is these sub-interface addresses that are again returned to the LAC in the Access-Accept message for load-balancing, instead of multiple loopback IPs.
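
A minimal sketch of the multiple-loopback idea: subscribers are spread round-robin over every LAC source / LNS loopback pair, so transit routers see several distinct outer IP pairs instead of one. All addresses and counts are hypothetical.

```python
from collections import Counter
from itertools import cycle


def assign_tunnels(subscribers, lac_sources, lns_loopbacks):
    """Round-robin subscribers over every (LAC source IP, LNS loopback IP) pair."""
    endpoints = cycle([(src, dst) for src in lac_sources for dst in lns_loopbacks])
    return {sub: next(endpoints) for sub in subscribers}


# Hypothetical addressing: two LAC loopbacks and four loopbacks on one LNS.
lac_sources = ["192.0.2.1", "192.0.2.2"]
lns_loopbacks = ["198.51.100.1", "198.51.100.2", "198.51.100.3", "198.51.100.4"]
subscribers = [f"sub{i}" for i in range(800)]

tunnels = assign_tunnels(subscribers, lac_sources, lns_loopbacks)
per_tunnel = Counter(tunnels.values())

print(len(per_tunnel))           # 8 distinct tunnels, each with its own outer IP pair
print(set(per_tunnel.values()))  # {100}: subscribers are spread evenly across them
```

More tunnels only give the hash more distinct inputs; whether they land on different physical links still depends on the hashing algorithm of each transit device.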

 

Other L2TP/PPP Issues

Another issue with PPPoE is that most devices between the CPE and LNS also can’t implement ACLs, because they can’t read beyond the PPP header. Only the LNS will be able to read the IP headers to implement an ACL meaning that any unwanted traffic is carried all the way across the network to the LNS before it is dropped.

L2TP support for IPv6 underlay routing is limited. Some vendors like Cisco don’t support subscriber sessions over IPv6 L2TP tunnels (on their ASR1K platform). Cisco also don’t support copying of the subscriber DSCP values to the L2TP tunnel IP header for IPv6 subscriber traffic, even with an IPv4 L2TP tunnel underlay (again this is on the ASR1K series, a feature called “IP ToS Reflect” for IPv4 subscriber traffic).

 

Session Steering

The method used for load-sharing subscribers over multiple LNS routers affects the granularity of session steering capabilities and the granularity of failover and failback capabilities.

  • One problem with any-cast is the inability to move specific groups of users between LNS routers or to control the speed of migration, it’s an all-or-nothing hammer. Adjusting the IGP or BGP metric of the LNS IP affects all users of that IP (meaning, all L2TP tunnels targeting that loopback IP).
  • Session steering through the IGP or BGP metric to an IP address requires a network configuration change rather than a RADIUS/application-level change, with the latter likely being less risky and more acceptable as a daytime change.

  • Any-casting / IGP/BGP traffic steering doesn’t easily allow for scalable per subscriber domain steering without allocating a new any-cast IP for every domain. This wouldn’t be a big issue if it wasn’t for the fact that most vendors don’t support L2TP over IPv6, and IPv4 addresses are precious these days.

 

ECMP and MLPPP

When the physical access circuit speed is slow, it may be desirable to logically "bond" or "group" multiple links to the same customer site together into one logical link, in order to increase capacity at the single flow level (as opposed to multiple links with per-flow load-balancing, which is simple and traditional ECMP). One way to do this is to perform per-packet load-balancing over multiple independent PPP sessions between a single CPE and LNS router, with each PPP session being established over a separate physical access circuit. Another way to achieve this is to use MLPPP to bond the multiple PPP sessions between the same CPE and LNS router into a single PPP session. In either case, all the PPP sessions on the same CPE (one for each of the physical access links) must be established with the same LNS router. This requires traffic steering to a single LNS. When using all-active LNS routers, for example, MLPPP or per-packet ECMP subscribers may need to go into separate RADIUS groups which steer their PPP sessions to a preferred LNS.
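
One way to picture the RADIUS-side steering is the sketch below, in which bonded subscribers are pinned to a single LNS while everyone else is offered all of the all-active routers. The usernames, router names, and the dictionary standing in for RADIUS group membership are hypothetical.

```python
ALL_LNS = ["lns1", "lns2", "lns3"]  # hypothetical all-active LNS routers

# Hypothetical RADIUS group: each member link of a bonded site is pinned
# to the same preferred LNS.
BONDED = {
    "site42-link1@example.net": "lns2",
    "site42-link2@example.net": "lns2",
}


def tunnel_candidates(username):
    """Return the LNS routers that would be offered for this PPP session."""
    pinned = BONDED.get(username)
    if pinned is not None:
        return [pinned]   # all member links terminate on one LNS
    return list(ALL_LNS)  # everyone else is load-balanced as usual


print(tunnel_candidates("site42-link1@example.net"))  # ['lns2']
print(tunnel_candidates("site42-link2@example.net"))  # ['lns2']
print(tunnel_candidates("alice@example.net"))         # ['lns1', 'lns2', 'lns3']
```

This sketch deliberately ignores what happens when the pinned LNS fails; a real design would also need a standby choice that is identical for every member link of the bundle.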