Date created: 11/20/16 22:59:51. Last modified: 09/07/18 09:37:45

Multi-Process Parallel Sorting Comparison

A comparison of the same sorting algorithms run firsty using a single Python process and then split into multiple threads using the multiprocessing library in Python for parallelisation. The integer argument to each script is the length of a list to generate of random numbers which is used for sorting.

This script (single_process_sorts.py) shows the running time for various sorting algorithms run serially in a single Python thread:

[email protected]:~/Python/threading$ python single_process_sorts.py 5000
Bubble Sort:
Duration:  1.64284515381

Selection Sort:
Duration: 0.722052097321 Insertion Sort: Duration: 1.59017801285

 

This script (multi_process_bubble_sort_worse.py) runs a bubble sorting algorithm over multiple processes however it actually takes longer than the standard single threaded version above:

[email protected]:~/Python/threading$ python multi_process_bubble_sort.py 5000
Bubble Sort:
Duration:  2.35310292244

 This script (multi_process_insertion_sort_worse.py) runs an insertion sorting algorithm over multiple processes however it also takes longer that the standard single threaded version:

[email protected]:~/Python/threading$ python multi_process_insertion_sort.py 5000
Insertion Sort:
Duration:  1.67110109329 

 

Taking a closer look at the Bubble sort algorithm, bubble sort has an upper bound performance of O(n^2). When the list is already sorted (the best case scenario) the complexity of bubble sort is only O(n) (at the least the whole input list must be walked to check it is in the required order already). The Bubble sort works by "bubbling" up the largest/smallest number to the top of the list on the first iteration, then the next largest/smallest to the top-1 list index on the second iteration, and so on.

In this first bubble sort multi-processing example the list is split into C chunks where C is the number of CPU cores. On the example machine there are four cores so C == 4.  This means the four quarters of random test data should be sorted in about a quarter of the time the single threaded version sorts the entire list:

[email protected]:~/Python/threading$ python multi_process_bubble_sort_worse.py 5000
Bubble Sort:
Duration:  0.423403024673 
# Break after initial chunk sort ^

[email protected]:~/Python/threading$ python single_process_sorts.py 5000
Bubble Sort:
Duration:  1.63240098953 

That is pretty much a quarter of the sorting time. The reason the overall time is so much longer though is because the four separate lists need to be compiled together again. Bubble sort is not able to comprehend that a list may be partially sorted already or benefit from any partial pre-sorting. In this example code two sorted chunks are concatenated together but now this new joint list needs sorted so the new list is passed to bubble sort in a new process. Each pair of sorted chunks is jointed and a new sorting thread spun up. Those sorted joint lists now need to be joined and sorted, and so on. This is ultimately less efficient. Despite being a multi-process sort the data set is iterated over so many times it actually takes longer.

Now lets compare this to C and factor in branch prediction: http://stackoverflow.com/questions/11227809/why-is-it-faster-to-process-a-sorted-array-than-an-unsorted-array?rq=1