Date created: 10/09/13 17:02:00. Last modified: 01/05/18 11:28:11

95th Percentile and Mean/Median/Mode

See Also:
Notes on 95th percentile accuracy with RRDTools: 95th Percentile Accuracy with RRDTools

Mean:

The mean (mathematical average) gives the most likely central tendency, although it is often not a value that appears in the data set. It works well for data sets that are evenly spread and poorly for data sets that are skewed or contain far-out values.

All the values in the data set added together, then divided by the number of values in the data set.
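
As a rough illustration of how one far-out value drags the mean, here is a minimal Python sketch. The two data sets are made-up values, not measurements from any real link.

```python
from statistics import mean

# Hypothetical data sets: same first four values, different last value.
evenly_spread = [10, 12, 11, 13, 14]
with_outlier = [10, 12, 11, 13, 100]

# Sum of the values divided by the number of values.
mean_even = mean(evenly_spread)      # 60 / 5
mean_outlier = mean(with_outlier)    # 146 / 5

print(mean_even, mean_outlier)
```

The second mean (29.2) is larger than every value in its data set except the outlier itself, which is the weakness described above.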

 

Median (50th percentile):

The median is the exact middle value of a data set containing an odd number of values. For a data set with an even number of values it is the average of the two middle values, so it may not exist in the data set. The median is not skewed by an odd or far-out value.

The exact middle value in a data set arranged in order.
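
A small Python sketch of the odd and even cases, using invented values. The standard library's statistics.median sorts the data itself, so the input does not need to be ordered first.

```python
from statistics import median

# Hypothetical data sets, deliberately unsorted.
odd_count = [7, 1, 5, 3, 100]   # sorted: 1, 3, 5, 7, 100
even_count = [7, 1, 5, 3]       # sorted: 1, 3, 5, 7

median_odd = median(odd_count)    # the single middle value
median_even = median(even_count)  # average of the two middle values, 3 and 5

print(median_odd, median_even)
```

The far-out value 100 has no effect on the odd-count median, and the even-count median (4.0) is a value that does not appear in its data set.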

 

Mode:

The most frequent value in a data set (highest number of occurrences). It can be skewed if the data set contains a large number of identical very high or very low values. There may also be two or more values that are equally the most common.

Highest occurring value(s).
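
To show both the tie case and the skew case, a short Python sketch with invented values follows. It assumes Python 3.8 or later, where statistics.multimode is available and returns every value tied for most frequent.

```python
from statistics import multimode

# Hypothetical data sets.
tied = [1, 2, 2, 3, 3, 4]                     # 2 and 3 both occur twice
skewed = [5, 5, 5, 900, 900, 900, 900, 6, 7]  # a repeated very high value

modes_tied = multimode(tied)      # all values sharing the highest count
modes_skewed = multimode(skewed)  # the repeated high value wins

print(modes_tied, modes_skewed)
```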

 

Percentiles:

A percentile marks the value below which a given percentage of the values in a data set fall, so it describes a slice of the data. A bigger percentile range (90%) includes more of the outlier values, which widens the range of values covered, and is more stable (the top percentile value rises or falls slowly). A smaller percentile range (50%) covers fewer values, represents a smaller subset of the data, and fluctuates more rapidly.

The 95th percentile allows the top 5% of traffic peaks to be dropped, which is what makes burstable billing possible. Committed Information Rate billing gives a fixed pipe size, similar to a 95th percentile commit rate, but no higher bursting is possible (although bursting to a Peak Information Rate is sometimes allowed). Actual throughput billing on mega/giga/tera-bytes per month is more costly for users because the usage is measured and invoiced at a direct 1:1 ratio.
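
Dropping the top 5% can be sketched in Python as below. This assumes the nearest-rank method (sort the samples, discard the top 5%, bill on the highest remaining sample). Providers and graphing tools may interpolate or sample differently, and the function name and sample values here are invented for illustration.

```python
def percentile_95(samples_mbps):
    """Nearest-rank 95th percentile: highest sample left after dropping the top 5%."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # 1-based rank = ceil(0.95 * n), in integer arithmetic to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]

# Hypothetical month reduced to 100 samples (Mbps).
five_bursts = [10] * 95 + [100] * 5  # bursts fit inside the dropped top 5%
six_bursts = [10] * 94 + [100] * 6   # one burst too many

billable_five = percentile_95(five_bursts)
billable_six = percentile_95(six_bursts)

print(billable_five, billable_six)
```

With five burst samples the billable rate stays at 10 Mbps. A sixth burst sample pushes one peak below the cut-off and the billable rate jumps to 100 Mbps.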

On a typical business connection with daytime peaks, there are lows at night and at weekends. When a month's worth of samples is sorted, a reasonable amount of data lies above the 95th percentile and is dropped before billing. For a customer with fairly consistent usage 24x7 and the odd burst here and there, there isn't a large difference between the 95th percentile and the 96th. The 95th percentile is quite high when traffic is steady, which is why average speed billing can be costly. A customer whose very high peaks, relative to their normal usage, fall above the 95th percentile is getting the best-case scenario for bursty traffic billing.
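
The contrast between a steady customer and a bursty one can be put into numbers with a short Python sketch. Both traffic profiles are invented, and the nearest-rank percentile used here is an assumption rather than any particular provider's method.

```python
from statistics import mean

def percentile_95(samples):
    """Nearest-rank 95th percentile (assumed method, see lead-in)."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) without floats
    return ordered[rank - 1]

# Hypothetical profiles, 100 samples each (Mbps).
steady = [80] * 97 + [90] * 3   # consistent 24x7 usage with the odd burst
bursty = [5] * 96 + [200] * 4   # low normal usage with rare, very high peaks

# Summarise each profile as (95th percentile, mean, peak).
steady_stats = (percentile_95(steady), mean(steady), max(steady))
bursty_stats = (percentile_95(bursty), mean(bursty), max(bursty))

print("steady:", steady_stats)
print("bursty:", bursty_stats)
```

The steady profile is billed at 80 Mbps, close to its 90 Mbps peak. The bursty profile is billed at 5 Mbps despite peaking at 200 Mbps, because all of its peaks sit in the dropped top 5%.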