Date created: 03/13/17 19:37:19. Last modified: 02/26/21 08:38:05

Capacity Planning Terminology

The Difference between Congestion, Contention and [Over]Subscription

The subscription ratio is the total bandwidth that has been sold over a link divided by the link's actual capacity. Oversubscription occurs when more bandwidth is sold on a link than there is available capacity. This doesn't by itself guarantee congestion: due to time-based multiplexing, not all customers on a shared backhaul link are using their service at the same time. Oversubscription doesn't only occur on shared links like a PoP backhaul link. If a customer requires 2Gbps of connectivity and they are sold a dedicated access circuit with only 1Gbps of capacity, that link is oversubscribed by 100%, or 2:1, and will be congested whenever the customer tries to use their full requirement, albeit solely by themselves.

The contention ratio describes the impact that one or more customers' bandwidth usage on a shared link has on the service/capacity available to the other customers on the same shared link or infrastructure. Unlike oversubscription, the contention ratio is in practical terms something which only applies to shared infrastructure. This is because the contention ratio of an access circuit dedicated to a single customer is 1:1, which effectively means there is no contention: only that customer impacts the available capacity of their dedicated access circuit (assuming the access link isn't oversubscribed, i.e., the customer needs 1Gbps of bandwidth and they have a 1Gbps link).

Congestion [either as a ratio or percentage] describes the time period when more traffic is trying to pass over a link or device than the link or device has capacity for. A customer trying to send 1.2Gbps of traffic over a 1Gbps link causes congestion. Congestion is the event that occurs when the subscription ratio and/or contention ratio of a service hasn't been well managed and traffic demand exceeds available capacity.
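
To make "congestion as a percentage" concrete, the following is a minimal Python sketch. It assumes a series of offered-load samples (the traffic trying to pass over the link, not the traffic actually carried) taken at equal intervals; the function name and the sample values are hypothetical and purely illustrative.

```python
def congestion_percentage(offered_mbps, capacity_mbps):
    """Return the percentage of samples in which offered load exceeds capacity."""
    if not offered_mbps:
        return 0.0
    congested = sum(1 for load in offered_mbps if load > capacity_mbps)
    return 100.0 * congested / len(offered_mbps)


# Hypothetical offered-load samples (Mbps) for a 1Gbps link, one per interval.
samples = [400, 950, 1200, 1000, 1100, 700, 300, 1250]

# Three of the eight samples exceed 1000Mbps.
print(congestion_percentage(samples, 1000))  # 37.5
```

A sample exactly equal to the link capacity is not counted as congested here, because demand only exceeds capacity above that point.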

 

Congestion, Contention and Oversubscription Examples

For example, a 1Gbps link or NNI which has 20x 100Mbps customer connections aggregated over it has an oversubscription ratio of 2:1.

Oversubscription == total bandwidth sold / NNI speed == (20x100Mbps) / 1000Mbps == 2:1.

This means twice as much bandwidth has been sold as is actually available on the physical link. This doesn't mean the link is congested though. Congestion only occurs when the combined demand exceeds the 1Gbps link capacity, for example if 10 of the 20 customers simultaneously use their full 100Mbps of capacity allowance and an 11th customer tries to send any traffic at all.

The contention ratio for the above example is 10:1.

Contention ratio == NNI speed / individual customer link speed == (1000Mbps/100Mbps) == 10:1.

This means that at least 10 customer connections must be running at full capacity before any one other customer's service is impacted. 10 customers can use their connection at maximum speed without any congestion occurring, if they are the only 10 customers using their connections at that moment in time.

Further examples:
300x 10Mbps circuits over a 1Gbps NNI
Oversubscription ratio 3:1
Contention ratio 100:1

1000x 2Mbps circuits over 40Mbps link
Oversubscription ratio 50:1
Contention ratio 20:1

50x 2Mbps circuits over 2Mbps link
Oversubscription ratio 50:1
Contention ratio 1:1
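
The two ratio formulas above can be sketched in Python as follows. This is a minimal illustration that assumes every circuit on the link is sold at the same speed, as in the examples; the function names are hypothetical.

```python
from fractions import Fraction


def subscription_ratio(circuits, circuit_mbps, link_mbps):
    """Total bandwidth sold divided by link capacity."""
    return Fraction(circuits * circuit_mbps, link_mbps)


def contention_ratio(circuit_mbps, link_mbps):
    """Link capacity divided by the individual customer circuit speed."""
    return Fraction(link_mbps, circuit_mbps)


def as_ratio(value):
    """Format a Fraction in the X:Y notation used above."""
    return f"{value.numerator}:{value.denominator}"


# (number of circuits, circuit speed in Mbps, link speed in Mbps)
examples = [(20, 100, 1000), (300, 10, 1000), (1000, 2, 40), (50, 2, 2)]

for circuits, circuit_mbps, link_mbps in examples:
    sub = as_ratio(subscription_ratio(circuits, circuit_mbps, link_mbps))
    con = as_ratio(contention_ratio(circuit_mbps, link_mbps))
    print(f"{circuits}x {circuit_mbps}Mbps over {link_mbps}Mbps: "
          f"subscription {sub}, contention {con}")
```

Using Fraction keeps the ratios exact and reduces them to lowest terms, so a non-integer ratio such as 1.5 would be printed as 3:2.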

ECMP Load Imbalance Capacity Impact
With LAG/ECMP bundles, if the hashing/load distribution is both static and uneven (for example, the device doesn't adjust its hashing based upon bundle member link load), then the overall capacity of the bundle is reduced, as is the capacity during the various failure scenarios described above (which all assume a "perfect" load distribution). The following applies to a static hash algorithm that is load-unaware.

N is the number of member links in a bundle.
S is the speed of each single member link.
Lmax is the utilisation (%) of the highest utilised member link in a bundle.
Lavg is the average utilisation (%) across all remaining member links (excluding the most used link).
Lrem is the remaining capacity on the highest used member link: S - ((S/100)*Lmax)
I is the load imbalance on the bundle, which is the ratio of traffic on the most used link in the bundle compared to the average link usage across the remaining links: Lmax/Lavg
1/I is the inverse of the load imbalance.
Bcap is the total bundle capacity: N*S
Bcur is the current bundle utilisation: (((N-1)*S) * (Lavg/100)) + ((S/100)*Lmax)
Brem is the current remaining unused capacity in the bundle: Bcap - Bcur, which is (((N-1)*S) * ((100-Lavg)/100)) + ((S/100)*(100-Lmax))
Bmax is the maximum overall bundle capacity that can be achieved with the current load imbalance (if any, when I > 1.0): Bcur + ((N-1)*(Lrem*(1/I))) + Lrem
Blos is the wasted capacity when Bmax is reached (when load imbalance is present): (S*N) - Bmax
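
The definitions above can be sketched as a single Python function. This is a minimal illustration rather than a definitive implementation: the function name and dictionary keys are hypothetical, Brem is computed as Bcap - Bcur, Bmax uses the unrounded inverse 1/I, and Lavg is assumed to be greater than zero.

```python
def bundle_capacity(n, s, lmax, lavg):
    """Evaluate the bundle formulas.

    n    -- number of member links (N)
    s    -- speed of each member link in Mbps (S)
    lmax -- utilisation (%) of the busiest member link (Lmax)
    lavg -- average utilisation (%) of the remaining links (Lavg), > 0
    """
    lrem = s - (s / 100) * lmax          # free capacity on the busiest link
    imbalance = lmax / lavg              # I
    inverse = 1 / imbalance              # 1/I
    bcap = n * s
    bcur = ((n - 1) * s) * (lavg / 100) + (s / 100) * lmax
    brem = bcap - bcur
    # The other links can only grow by 1/I for each unit added to the busiest.
    bmax = bcur + (n - 1) * (lrem * inverse) + lrem
    blos = bcap - bmax
    return {
        "Lrem": lrem, "I": imbalance, "1/I": inverse, "Bcap": bcap,
        "Bcur": bcur, "Brem": brem, "Bmax": bmax, "Blos": blos,
    }


# 8x 1Gbps bundle, busiest link at 75%, remaining links averaging 50%.
for name, value in bundle_capacity(8, 1000, 75, 50).items():
    print(f"{name}: {value:.2f}")
```

Because 1/I is not rounded, the 75%/50% case yields a Bmax of roughly 5,667Mbps; truncating 1/1.5 to 0.66 gives a slightly lower figure.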

The load imbalance I for a bundle is Lmax / Lavg. For example, with an 8 link bundle in which every link is running perfectly equally at 50% utilisation, I == 1.0 (Lmax 50% / Lavg 50%), which is no load imbalance at all. When more traffic is hashed onto a single link, or some elephant flows are hashed to the same link, let's say link 1 of 8, link 1 might become 100% utilised. This now means that Lmax == 100% and Lavg == 50% (assuming there has been no traffic increase on the other 7 member links). The result is that I == 100 / 50 == 2.0. The link which is running at the highest utilisation (link 1 at 100% utilisation) is carrying 2.0 times the average load of the remaining member links (links 2-8).
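
In practice Lmax and Lavg have to be derived from per-member-link utilisation figures. A minimal Python sketch of that step follows; the function name is hypothetical, the input is assumed to be a list of utilisation percentages with one entry per member link, and the average of the remaining links is assumed to be non-zero.

```python
def load_imbalance(utilisation):
    """Return (Lmax, Lavg, I) from per-link utilisation percentages.

    Lavg excludes the single busiest link, as in the definitions above.
    """
    if len(utilisation) < 2:
        raise ValueError("a bundle needs at least two member links")
    ordered = sorted(utilisation, reverse=True)
    lmax = ordered[0]
    remaining = ordered[1:]
    lavg = sum(remaining) / len(remaining)
    return lmax, lavg, lmax / lavg


# Link 1 at 100%, links 2-8 at 50% each.
print(load_imbalance([100, 50, 50, 50, 50, 50, 50, 50]))  # (100, 50.0, 2.0)
```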

S*N is the maximum theoretical capacity of a link bundle. Bmax is the maximum attainable bundle capacity when load imbalance is present (otherwise it is equal to S*N). Bcur is the current bundle utilisation. Brem is the current unused capacity in a bundle. Blos is the unused (wasted) capacity in a bundle when load imbalance is present (I > 1.0) and Bmax is reached.

For example, with an 8 member link bundle where all 8 member links are running perfectly equally at 50% utilisation each, I == 1.0, which means there is no load imbalance and the capacity of Bcap is fully attainable.

N == 8
S == 1,000Mbps
Bcap == (8*1,000Mbps) == 8,000Mbps
Lmax == 50%
Lavg == 50%
Lrem == 1000 - ((1000/100)*50) == 500Mbps (6.25%)
I == (50/50) == 1.0
1/I == 1/1.0 == 1.0
Bcur == (((8-1)*1000)*(50/100)) + ((1000Mbps/100)*50) == 4,000Mbps (50.00%)
Brem == 8,000Mbps - 4,000Mbps == 4,000Mbps (50.00%)
Bmax == 4,000Mbps + ((8-1)*(500Mbps*1.0)) + 500Mbps == 8,000Mbps (100.00%)
Blos == 8,000Mbps - 8,000Mbps == 0Mbps (0.00%)

With link 1 running at 100% utilisation and the remaining links (2-8) running at 50% utilisation, I == 2.0. Even though the other 7 links in the bundle each have 50% free capacity, one member in the bundle is full, which means no more traffic can be sent over that member link. Additional traffic directed to the bundle may be hashed to that same full member link, which will result in traffic loss. Because no more traffic can safely be added to the bundle, the capacity on the remaining member links is wasted. 4,500Mbps is the maximum rate this 8,000Mbps bundle can be used for (Bmax), meaning 3,500Mbps or 43.75% (3500/8000) of capacity is wasted (Blos) due to load imbalance:

N == 8
S == 1,000Mbps
Bcap == (8*1,000Mbps) == 8,000Mbps
Lmax == 100%
Lavg == 50%
Lrem == 1000 - ((1000/100)*100) == 0Mbps (0.00%)
I == (100/50) == 2.0
1/I == 1/2.0 == 0.5
Bcur == (((8-1)*1000)*(50/100)) + ((1000Mbps/100)*100) == 4,500Mbps (56.25%)
Brem == 8,000Mbps - 4,500Mbps == 3,500Mbps (43.75%)
Bmax == 4,500Mbps + ((8-1)*(0Mbps*0.5)) + 0Mbps == 4,500Mbps (56.25%)
Blos == 8,000Mbps - 4,500Mbps == 3,500Mbps (43.75%)

Another example: if link 1 is running at 75% utilisation and links 2-8 are running at 50% utilisation, I == 1.5. The difference here is that more traffic can still be added to this bundle, as the busiest link (link 1) is "only" 75% full. At this point Bcur == 4,250Mbps (53.125%). However, assuming the imbalance remains, link 1 will hit 100% utilisation first, at which point Bmax is reached: approximately 5,667Mbps (70.83%), meaning the wasted bandwidth on the bundle is approximately 2,333Mbps (29.17%):

N == 8
S == 1,000Mbps
Bcap == (8*1,000Mbps) == 8,000Mbps
Lmax == 75%
Lavg == 50%
Lrem == 1000 - ((1000/100)*75) == 250Mbps (3.125%)
I == (75/50) == 1.5
1/I == 1/1.5 ≈ 0.667
Bcur == (((8-1)*1000)*(50/100)) + ((1000Mbps/100)*75) == 4,250Mbps (53.125%)
Brem == 8,000Mbps - 4,250Mbps == 3,750Mbps (46.875%)
Bmax == 4,250Mbps + ((8-1)*(250Mbps*0.667)) + 250Mbps ≈ 5,667Mbps (70.83%)
Blos == 8,000Mbps - 5,667Mbps ≈ 2,333Mbps (29.17%)

A more realistic example with most links averaging less than 50% usage:

N == 8
S == 1,000Mbps
Bcap == (8*1,000Mbps) == 8,000Mbps
Lmax == 50%
Lavg == 25%
Lrem == 1000 - ((1000/100)*50) == 500Mbps (6.25%)
I == (50/25) == 2.0
1/I == 1/2.0 == 0.5
Bcur == (((8-1)*1000)*(25/100)) + ((1000Mbps/100)*50) == 2,250Mbps (28.125%)
Brem == 8,000Mbps - 2,250Mbps == 5,750Mbps (71.875%)
Bmax == 2,250Mbps + ((8-1)*(500Mbps*0.5)) + 500Mbps == 4,500Mbps (56.25%)
Blos == 8,000Mbps - 4,500Mbps == 3,500Mbps (43.75%)
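
As a cross-check on the worked examples, the Bmax formula can be reduced to a closed form. This is a derivation sketch rather than part of the original formulas; it relies on the same assumption the Bmax definition already makes, namely that the imbalance I stays constant as traffic grows. Substituting Lrem = S*(100-Lmax)/100 and I = Lmax/Lavg:

```latex
\begin{align*}
B_{max} &= B_{cur} + (N-1)\,\frac{L_{rem}}{I} + L_{rem} \\
        &= (N-1)\,S\,\frac{L_{avg}}{100} + S\,\frac{L_{max}}{100}
           + (N-1)\,S\,\frac{100 - L_{max}}{100}\cdot\frac{L_{avg}}{L_{max}}
           + S\,\frac{100 - L_{max}}{100} \\
        &= S + (N-1)\,S\,\frac{L_{avg}}{100}\left(1 + \frac{100 - L_{max}}{L_{max}}\right) \\
        &= S + (N-1)\,S\,\frac{L_{avg}}{L_{max}} \\
        &= S + (N-1)\,\frac{S}{I} \\[4pt]
B_{los} &= N S - B_{max} = (N-1)\,S\left(1 - \frac{1}{I}\right)
\end{align*}
```

In other words, the attainable capacity depends only on N, S and I: the busiest link fills completely and every other link reaches 1/I of its capacity. For N == 8 and S == 1,000Mbps this gives 8,000Mbps at I == 1.0 and 4,500Mbps at I == 2.0, matching the examples above.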

