Inter-AS MPLS Option B - MPLS Label Usage

References:
http://www.cisco.com/c/en/us/td/docs/switches/metro/me3600x_3800x/software/release/15-5_1_S/configuration/guide/3800x3600xscg/swmplsloadbalancing.html

When building MPLS Inter-AS Option B interconnects, the number of local labels doubles on the ASBRs, consuming LFIB space faster than expected. This means label exhaustion is easily reached, especially if a couple of link flaps occur, for example. This is demonstrated below on ME3600s (22,000 label limit by default; usable labels are 16 to 21999) and ME3800s (30,000 label limit by default; usable labels are 16 to 29499).

Topology:

PE1
|
ASBR2/PE2 == PE3
|
ASBR1

When a PE (for example PE2) receives routes from an iBGP VPNv4 neighbour (for example PE1), the routes arrive with an MPLS label value. This doesn't use any local label space (LFIB) on PE2 unless PE2 advertises those routes on to another iBGP peer PE (such as PE3) or to a CE. In that case PE2 needs to allocate a local label, and it will advertise those routes to PE3 with its local label value in the BGP update (assuming next-hop-self is used, etc.).

If PE1 sends PE2 100 routes, PE2 uses no labels. But if PE2 advertises those 100 routes to PE3, PE2 must allocate 100 local labels, and LFIB usage is now 100 (local) labels.

This is the same for VPNv4/v6 and LDP-learnt routes on PE2: none of its 22,000-label limit is used unless it has to allocate a local label (such as when it advertises the routes on to another PE).

In the case of eBGP VPNv4 (Inter-AS MPLS Option B), PE2 receives 100 routes from PE1 and uses 100 labels because it advertises those 100 routes to PE3, but PE2 now uses another 100 labels because it generates different labels to advertise to the eBGP ASBR peer. So it is using 200 labels for the 100 routes it has received from PE1.
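The arithmetic above can be sanity-checked with a toy model (plain Python; the function name and flags are purely illustrative, nothing here is a Cisco API):

```python
def local_labels_used(routes, advertised_onwards, option_b):
    """Toy model of LFIB (local label) usage on a PE/ASBR.

    - Routes merely received over iBGP consume no local labels.
    - Advertising them onwards costs one local label per route.
    - An Inter-AS Option B ASBR allocates a second, different label
      per route for the eBGP VPNv4 peer, doubling the usage.
    """
    if not advertised_onwards:
        return 0
    return routes * (2 if option_b else 1)

# Plain iBGP PE: 100 routes in, advertised on to PE3 -> 100 labels.
print(local_labels_used(100, advertised_onwards=True, option_b=False))  # 100
# Option B ASBR: the same 100 routes now cost 200 labels.
print(local_labels_used(100, advertised_onwards=True, option_b=True))   # 200
```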

Labels are not released for re-use until a 5 minute holddown timer expires after they are cleared. For example, if the VPNv4 session to PE1 is dropped, PE2 will continue to label-switch traffic received from either PE3 or ASBR1 for a further 5 minutes, until those labels time out of the LFIB.

If PE2 has 12,000 routes and 12,000 local labels in use for those routes, and the BGP session to PE1 flaps (goes down and comes back up within the 5 minute LFIB holddown window), PE2 relearns the routes and allocates a fresh set of labels. At that point it tries to use 24,000 labels (until the first 12,000 time out) and runs out of local label space.
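The flap scenario works out numerically like this (the 22,000 limit is the ME3600X default quoted above; the helper is illustrative only):

```python
LABEL_LIMIT = 22000      # ME3600X default usable LFIB size
HOLDDOWN_SECONDS = 300   # labels held for 5 minutes after release

def peak_labels_after_flap(labels_in_use):
    """If a session flaps inside the holddown window, the old labels
    are still held while a fresh set is allocated, so peak usage is
    roughly double the steady-state usage."""
    return labels_in_use * 2

peak = peak_labels_after_flap(12000)
print(peak, peak > LABEL_LIMIT)  # 24000 True -> label space exhausted
```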

 

One can check the label usage on an ME3600/ME3800 switch using the following:

abr2#show mpls forwarding-table summary

24 total labels

abr2#show sdm prefer current
The current License is AdvancedMetroIPAccess
The current template is "default" template.

Template values:
      number of mac table entries                        =  16000
      number of ipv4 routes                              =  20000
      number of ipv6 routes                              =  6000
      number of routing groups                           =  1000
      number of multicast groups                         =  1000
      number of bridge domains                           =  4096
      number of acl entries                              =  2000
      number of MDT mroutes                              =  0
      number of ipv6 acl entries                         =  1000
      number of ipv4 pbr entries                         =  0



abr2#show platform nile adjmgr all | i EMPLS
EMPLS3LD Total Alloc:204693 Total Free:196801 Usage:7892
EMPLSINTD Total Alloc:326 Total Free:302 Usage:24

abr2#show platform aspdma template | i MPLS
NILE_NUM_EOMPLS_TUNNELS                  =  512
NILE_NUM_ROUTED_EOMPLS_TUNNELS           =  128
NILE_NUM_MPLS_VPN                        =  128
NILE_NUM_MPLS_SERVICES                   =  512
NILE_NUM_MPLS_INGRESS_LABELS             =  22000 ! Ingress + egress share label space
NILE_NUM_MPLS_EGRESS_LABELS              =  28500
MPLSD_TABLE                                   = 34816
EMPLS3LD_TABLE                                = 28672 ! 28k maximum shared pool of ingress+egress labels, of which no more than 22k can be ingress and no more than 28.5k can be egress

Example ME3600X PE that is out of MPLS label space:

! These are the messages in the syslog:

swi1.core#show logging | i mpls|label
Jan 20 13:30:58.653 UTC: nmpls_next_label_check: Label allocation Failed
Jan 20 13:30:58.653 UTC: label allocation failed for fib 172.16.250.112/30 Tbl:34 label val 1054
Jan 20 13:30:58.657 UTC: nmpls_next_label_check: Label allocation Failed
Jan 20 13:30:58.657 UTC: label allocation failed for fib 10.228.254.142/31 Tbl:19 label val 5732
Jan 20 13:31:14.285 UTC: nmpls_next_label_check: Label allocation Failed
Jan 20 13:31:14.285 UTC: label allocation failed for fib 172.16.242.32/29 Tbl:34 label val 4459


! Check the MPLS limits inside the NILE TCAM:

swi1.core#show platform aspdma template | i MPLS
NILE_NUM_EOMPLS_TUNNELS                  =  512
NILE_NUM_ROUTED_EOMPLS_TUNNELS           =  128
NILE_NUM_MPLS_VPN                        =  128
NILE_NUM_MPLS_SERVICES                   =  512
NILE_NUM_MPLS_INGRESS_LABELS             =  22000
NILE_NUM_MPLS_EGRESS_LABELS              =  28500
MPLSD_TABLE                                   = 34816
EMPLS3LD_TABLE                                = 28672  ! << MAX 

! In the last line above, 28672 is the maximum usable label count for L3 VPNs

! Below it can be seen that label usage fluctuates each time the command is run; 28670 labels used (in the 2nd output) is just 2 short of the maximum, so usage is likely fluctuating up to the max and down again as routes come and go:

swi1.core#show platform nile adjmgr all | i EMPLS
EMPLS3LD Total Alloc:11230743 Total Free:11202080 Usage:28663
EMPLSINTD Total Alloc:411 Total Free:394 Usage:17

swi1.core#show platform nile adjmgr all | i EMPLS
EMPLS3LD Total Alloc:11230813 Total Free:11202143 Usage:28670
EMPLSINTD Total Alloc:411 Total Free:394 Usage:17

swi1.core#show platform nile adjmgr all | i EMPLS
EMPLS3LD Total Alloc:11230830 Total Free:11202164 Usage:28666
EMPLSINTD Total Alloc:411 Total Free:394 Usage:17



! 20K out of 20,480 IPv4 unicast routes are used, and this device is using per-prefix labelling for most if not all VRFs:

swi1.core#show platform tcam utilization ucastv4
Nile Tcam Utilization per Application & Region:
ES == Entry size == Number of 80 bit TCAM words
==================================================================
App/Region            Start  Num Avail  ES    Used Range  Num Used
==================================================================
UCASTV4                   0     20480   1
    nile0                                                    20000
    nile1                                                    20000



! The local label range of the switch is 16-21999, so just under 22K usable local label values:

swi1.core#show mpls label range
Downstream Generic label region: Min/Max label: 16/21999


! 18K local labels are used:

swi1.core#show mpls forwarding-table summary
 
18029 total labels


! In the "show platform aspdma template | i MPLS" output above it can be seen that there is space for ~28K labels, and "show platform nile adjmgr all | i EMPLS" shows ~28k in use, not the 18k reported by "show mpls forwarding-table summary". Why the difference?

! MPLS Option B sessions double-allocate labels. The 18k labels used above are 18k local labels assigned by this PE (prefixes it is advertising on to other MPLS-enabled devices, so it needs to assign a local label to create the end-to-end LSP). The 28k labels in use are those 18k local labels plus roughly 10k (28k - 18k) labels for prefixes not advertised on (because this switch is the LER for the LSP, for example). So the switch can hold slightly more labels than it can locally assign.

! Looking back at this command, it can be seen that the switch can store up to 22,000 ingress labels or up to 28,500 egress labels, within a total shared ingress+egress space of 28,672 labels:
swi1.core#show platform aspdma template | i MPLS
NILE_NUM_MPLS_INGRESS_LABELS = 22000
NILE_NUM_MPLS_EGRESS_LABELS = 28500
...
EMPLS3LD_TABLE = 28672
! Even though there are 480 IPv4 unicast route entries left, this switch is out of label space. Since per-prefix labelling is used here, this is likely because of the extensive use of MPLS Option B interconnects on this switch double-allocating labels, so the labels are exhausted just before the prefix count in this example case.
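The three platform limits quoted above can be expressed as a simple feasibility check (the limit values come from the CLI outputs above; the `fits` helper itself is illustrative, not platform code):

```python
# Limits from "show platform aspdma template" on the ME3600X above.
MAX_INGRESS = 22000   # NILE_NUM_MPLS_INGRESS_LABELS
MAX_EGRESS = 28500    # NILE_NUM_MPLS_EGRESS_LABELS
MAX_TOTAL = 28672     # EMPLS3LD_TABLE: shared ingress+egress pool

def fits(ingress, egress):
    """True if a given mix of ingress and egress labels fits within
    the shared EMPLS3LD label table and its per-direction caps."""
    return (ingress <= MAX_INGRESS
            and egress <= MAX_EGRESS
            and ingress + egress <= MAX_TOTAL)

# ~18k local (ingress) labels plus ~10.6k further labels exactly fill
# the shared pool, matching what is seen on this switch:
print(fits(18029, 28672 - 18029))       # True  (exactly at the limit)
print(fits(18029, 28672 - 18029 + 1))   # False (one label over the pool)
```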

The same switch after removing some VRFs to reduce the route count:

! After deleting a load of VRFs:

swi1.core#show platform tcam utilization ucastv4
Nile Tcam Utilization per Application & Region:
ES == Entry size == Number of 80 bit TCAM words
==================================================================
App/Region            Start  Num Avail  ES    Used Range  Num Used
==================================================================
UCASTV4                   0     20480   1
    nile0                                                    15672
    nile1                                                    15672


! ^ Dropped roughly 4k routes


swi1.core#show mpls label range
Downstream Generic label region: Min/Max label: 16/21999


swi1.core#show mpls forwarding summary
 
14065 total labels

! ^ Roughly 4k fewer local labels are assigned due to the loss of 4k routes


swi1.core#show platform nile adjmgr all | i EMPLS
EMPLS3LD Total Alloc:11243064 Total Free:11222973 Usage:20091 ! << 8K drop
EMPLSINTD Total Alloc:411 Total Free:394 Usage:17

! ^ Roughly 8k fewer labels are allocated. Due to Option B's double label allocation, 4k fewer routes means 8k fewer labels (4k fewer local labels plus 4k fewer BGP labels advertised over the Option B interconnects), because per-prefix labelling is used here.
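The before/after figures line up with the double-allocation model. This short calculation uses the approximate numbers from the CLI outputs above (the ratio is close to, not exactly, 2, since not every remaining label belongs to an Option B prefix):

```python
# Figures taken from the CLI outputs above.
local_before, local_after = 18029, 14065   # "show mpls forwarding-table summary"
total_before, total_after = 28666, 20091   # EMPLS3LD "Usage" counter

local_drop = local_before - local_after    # ~4k fewer local labels
total_drop = total_before - total_after    # ~8.5k fewer labels overall

# Per-prefix labelling over Option B means each dropped route frees
# roughly two labels: its local label plus the label advertised to
# the eBGP VPNv4 peer.
print(local_drop, total_drop, round(total_drop / local_drop, 2))  # 3964 8575 2.16
```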