Date created: 05/19/19 08:37:33. Last modified: 05/27/19 09:52:34

Value by Reference

Contents:
Passing Values by Reference
Iteration by Reference and by Value
Passing Reference/Value with Multiple Processes

 

Passing Values by Reference

All variables in Python are names that refer to objects; they behave like pointers. In the following example “rnd_nums” is created and initialised in “main()”. “random.sample()” returns a list, so a list object is stored in memory and the name “rnd_nums” is bound to it as a reference to that list.

When “rnd_nums” is passed to the function “delete_all()”, the value that is passed is the reference to the list object in memory, not a copy of the list. Lists are mutable objects, which means the function can modify the original object using mutating operations such as “rnd_nums.remove()”.

The result is that the function “delete_all()” is able to edit, in place, the list stored at the reference passed from main, and main will see the modifications:

import sys
import random


def delete_all(rnd_nums):

    while (len(rnd_nums) > 0):
        print("deleting {}".format(rnd_nums[0]))
        rnd_nums.remove(rnd_nums[0])

    return


def main():

    rnd_nums = random.sample(range(1, 100), 10)
    print("Start ({}): {}".format(len(rnd_nums), rnd_nums))
    delete_all(rnd_nums)
    print("End ({}): {}".format(len(rnd_nums), rnd_nums))

    return 0  # exit status 0 means success; sys.exit(True) would exit with status 1


if __name__ == '__main__':
    sys.exit(main())

Output:

$ python3 sp_list.py 
Start (10): [36, 64, 2, 97, 50, 15, 88, 47, 77, 96]
deleting 36
deleting 64
deleting 2
deleting 97
deleting 50
deleting 15
deleting 88
deleting 47
deleting 77
deleting 96
End (0): []
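
The function sees the change because it mutates the object that both names refer to. Assigning a new object to the parameter name inside the function does not affect the caller. The following is a minimal sketch of that difference; the function names “clear_in_place()” and “rebind_only()” are hypothetical and are not part of the script above:

```python
def clear_in_place(nums):
    # Mutates the list object that the caller's name also refers to.
    nums.clear()


def rebind_only(nums):
    # Rebinds the local name to a brand-new list object.
    # The caller's list is left untouched.
    nums = []
    return nums


original = [1, 2, 3]

rebind_only(original)
print(original)  # [1, 2, 3]

clear_in_place(original)
print(original)  # []
```

In both calls the same reference is passed in; only the mutating call changes what the caller observes.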

 

Iteration by Reference and by Value

An important point to note in the above script is that the “delete_all()” function uses a while loop:

def delete_all(rnd_nums):

    while (len(rnd_nums) > 0):
        print("deleting {}".format(rnd_nums[0]))
        rnd_nums.remove(rnd_nums[0])

    return

This is fine for the simple case of deleting the list elements in the order they are stored. If, however, a check needs to be made to see whether each list item should be deleted (e.g., if it is divisible by 2), the natural approach is a for loop, as follows:

def delete_all_2(rnd_nums):

    for num in rnd_nums:
        if (num % 2 == 0):
            print("deleting {}".format(num))
            rnd_nums.remove(num)

    return

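The problem with the above for loop is that it iterates over a list which is also being mutated, and this alters the behaviour of the loop. The loop tracks its position in the list by index, so when an item is removed every later item shifts down one place and the item that moves into the current position is skipped. In the output below, 42 and 82 were not deleted even though both are divisible by 2: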

$ python3 sp_list.py
Start (10): [30, 42, 10, 82, 89, 95, 43, 55, 41, 66]
deleting 30
deleting 10
deleting 66
End (7): [42, 82, 89, 95, 43, 55, 41]
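
To make the skipping visible, the sketch below runs the same loop over the list from the output above and records every item the loop actually visits. The function name “remove_even_while_iterating()” and the “visited” list are hypothetical additions for illustration only:

```python
def remove_even_while_iterating(nums):
    """Same loop as delete_all_2(), but records which items are visited."""
    visited = []
    for num in nums:
        visited.append(num)
        if num % 2 == 0:
            # Removing shifts every later item down by one position,
            # so the item that lands in the current slot is never visited.
            nums.remove(num)
    return visited


nums = [30, 42, 10, 82, 89, 95, 43, 55, 41, 66]
visited = remove_even_while_iterating(nums)

print("Visited:", visited)  # [30, 10, 89, 95, 43, 55, 41, 66]
print("Left:   ", nums)     # [42, 82, 89, 95, 43, 55, 41]
```

42 and 82 never appear in “visited”: each one moved into the position the loop had just processed, so the loop stepped past it.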

Iterating over an object which is also being mutated, as in the above example, is in effect iterating over the object by reference. As entries are added or removed, the positions of the remaining entries shift. In this case it is necessary to iterate over the object by value. One way to do this is to iterate over a copy of the object or, in the case of a simple list, over a slice that spans the whole list from start to finish (“rnd_nums[:]”):

def delete_all_3(rnd_nums):

    for num in rnd_nums[:]:
        if (num % 2 == 0):
            print("deleting {}".format(num))
            rnd_nums.remove(num)

    return

This produces the following output in which no numbers divisible by 2 are left in the list:

$ python3 sp_list.py
Start (10): [80, 4, 53, 7, 66, 28, 8, 77, 57, 76]
deleting 80
deleting 4
deleting 66
deleting 28
deleting 8
deleting 76
End (4): [53, 7, 77, 57]
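
An alternative to looping over a copy, shown here only as a sketch and not as the approach used in the script above, is to build the list of items to keep and write it back into the same list object with slice assignment. The function name “delete_even()” is hypothetical:

```python
def delete_even(rnd_nums):
    # Build the list of survivors first, then write it back into the
    # SAME list object, so the caller's reference sees the result.
    rnd_nums[:] = [num for num in rnd_nums if num % 2 != 0]


nums = [80, 4, 53, 7, 66, 28, 8, 77, 57, 76]
delete_even(nums)
print(nums)  # [53, 7, 77, 57]
```

Writing “rnd_nums = [...]” without the slice would only rebind the local name inside the function, and the caller's list would stay unchanged.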

 

Passing Reference/Value with Multiple Processes

When using multiple processes in Python, objects cannot be passed to another process by reference. Arguments passed to a new process are not merely passed by value; the new process receives a full copy of the original object. In addition to variables explicitly passed to a new process, global variables are implicitly copied to the new process (this is the case with the fork start method, which the output further below shows in use). These are also full copies. This means that any changes made to a global variable in a new process are not applied to the same global variable in the parent process. This avoids any locking or blocking when multiple processes try to access the same resource. However, it creates issues when there is a genuine need for a shared resource, such as some form of progress tracking object.
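
The behaviour of global variables can be seen in a small sketch. It assumes the fork start method is available (Linux and other Unix-like systems; it is not available on Windows). The names “counter”, “increment_global()” and “run_child()” are hypothetical:

```python
import multiprocessing

counter = 0  # module-level global in the parent process


def increment_global(result_queue):
    # Runs in the child process: this changes the child's copy only.
    global counter
    counter += 1
    result_queue.put(counter)


def run_child():
    # Assumption: the "fork" start method exists on this platform.
    ctx = multiprocessing.get_context("fork")
    result_queue = ctx.SimpleQueue()
    child = ctx.Process(target=increment_global, args=(result_queue,))
    child.start()
    child_value = result_queue.get()  # value of counter as seen by the child
    child.join()
    return child_value


if __name__ == "__main__":
    print("Child saw counter  =", run_child())  # 1
    print("Parent sees counter =", counter)     # 0
```

The child increments its own copy of “counter”; the parent's global is unchanged after the child exits.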

The code below produces a list of 500,000 random numbers. This list is copied into a list of lists; each sub-list is a copy of one quarter of the original list (125,000 entries in each sub-list). This list of lists, “chunks”, is passed to a pool of new processes. Each new Python process receives a full copy of one of the sub-lists. Each process simply deletes all entries in the sub-list it is passed and then finishes. After all four processes have finished, each will have deleted every entry in its own copy of a sub-list. Finally, the original process prints the length of each sub-list in “chunks”, and it can be seen that they are untouched, with 125,000 entries still in each sub-list. This is because each new process received a full copy of its allocated sub-list:

import math
import multiprocessing
import os
import random
import sys


def delete_all(passed_nums):

    print(
        "Started process: {} ({}) with {} items".format(
            multiprocessing.current_process(), os.getpid(), len(passed_nums)
        )
    )

    while len(passed_nums) > 0:
        passed_nums.remove(passed_nums[0])

    print(
        "Finished process: {} ({})".format(
            multiprocessing.current_process(), os.getpid()
        )
    )

    return passed_nums


def main():

    num_processes = 4
    rnd_nums = random.sample(range(1, 1000000), 500000)
    chunk_size = int(math.ceil(len(rnd_nums) / num_processes))
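    pool = multiprocessing.Pool(num_processes)

    # Split the rnd_nums list into num_processes sub-lists:
    chunks = [rnd_nums[i : i + chunk_size] for i in range(0, len(rnd_nums), chunk_size)]

    print("Number of CPUs:  {}".format(num_processes))
    print("Chunk size per CPU: {}".format(chunk_size))

    # Each worker receives a pickled copy of one chunk. The emptied copies it
    # returns are collected here and are distinct from the lists in chunks.
    emptied_copies = pool.map(delete_all, chunks)
    pool.close()
    pool.join()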

    for index, chunk in enumerate(chunks):
        print(
            "Size of chunk {}: {}".format(
                index,
                len(chunk)
            )
        )

    return 0  # exit status 0 means success; sys.exit(True) would exit with status 1


if __name__ == "__main__":
    sys.exit(main())


Output:

$ python3 ./mp_list.py
Number of CPUs:  4
Chunk size per CPU: 125000
Started process: <ForkProcess(ForkPoolWorker-1, started daemon)> (1480) with 125000 items
Started process: <ForkProcess(ForkPoolWorker-2, started daemon)> (1481) with 125000 items
Started process: <ForkProcess(ForkPoolWorker-3, started daemon)> (1482) with 125000 items
Started process: <ForkProcess(ForkPoolWorker-4, started daemon)> (1483) with 125000 items
Finished process: <ForkProcess(ForkPoolWorker-1, started daemon)> (1480)
Finished process: <ForkProcess(ForkPoolWorker-2, started daemon)> (1481)
Finished process: <ForkProcess(ForkPoolWorker-3, started daemon)> (1482)
Finished process: <ForkProcess(ForkPoolWorker-4, started daemon)> (1483)
Size of chunk 0: 125000
Size of chunk 1: 125000
Size of chunk 2: 125000
Size of chunk 3: 125000
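
Where the parent does need to see what the child processes are doing, such as the progress tracking case mentioned earlier, one option is a value held in shared memory. The following is a minimal sketch using “multiprocessing.Value”, again assuming the fork start method is available; the names “process_chunk()”, “run_workers()” and “progress” are hypothetical and not part of the script above:

```python
import multiprocessing


def process_chunk(chunk, progress):
    for _ in chunk:
        # get_lock() guards the read-modify-write so that increments
        # from different processes are not lost.
        with progress.get_lock():
            progress.value += 1


def run_workers(chunks):
    # Assumption: the "fork" start method exists on this platform.
    ctx = multiprocessing.get_context("fork")
    # A C int in shared memory, wrapped with its own lock.
    progress = ctx.Value("i", 0)

    workers = [
        ctx.Process(target=process_chunk, args=(chunk, progress))
        for chunk in chunks
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()

    return progress.value


if __name__ == "__main__":
    print("Items processed:", run_workers([[1, 2, 3], [4, 5], [6]]))  # 6
```

Unlike the sub-lists above, “progress” is not copied per process, so every increment is visible to the parent. The cost is the lock, which reintroduces the contention that separate copies avoid.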

 

