Multiprocessing for Kids Part 2: Shared Variables

When working with multiprocessing, a common problem is that the different processes have to share their results or need to change the same variables. This is not easy to do, but the script I provide makes it a lot easier.

For installation you can follow this GitHub link. Please make sure you have read the introduction to the basic usage of the script in my previous post.

Example 2: Search the Number
How to add and use shared variables

Multiprocessing is often used in combination with search algorithms. They take a long time to run, and the search processes need to share little to no information with each other. In this task we challenge the program to find a randomly generated number by random guessing.

For this example we need to import the multiprocessing for kids package as well as Python's random and time packages:

import multiprocessing_for_kids as mulki
import random
import time

We now need a function that can be executed by doMultiprocessingLoop(…). It takes “THE_NUMBER” that the algorithm should guess, the “SEARCH_RANGE” in which the number lies, and a shared variable called “result”.

def seach_the_number(_, THE_NUMBER, SEARCH_RANGE, result):
    # The task is to find a given Number between 1 and SEARCH_RANGE
    guess = 0
    while guess != THE_NUMBER:
        guess = random.randrange(1, SEARCH_RANGE)
        if result.value != 0: # check if other process found THE_NUMBER
            break
    else: # only if the loop breaks by its termination condition
        result.value = guess

There are a few things that might need some explanation. First of all, the order of the function parameters matters! The first one is always the current iteration value, which we don’t need in this example, so we name it with an underscore. Next come the constants passed to this function (here: THE_NUMBER and SEARCH_RANGE). They can be used like normal variables (so they are not really constants), but each process has its own copy. Finally the shared variables follow; in this case it’s only “result”.

Up to line 5 everything is simple Python: we execute a while loop that checks if our guess is “THE_NUMBER” and, if not, generates a new random guess. On line 6 we check if the shared variable “result” is set to something other than 0. This terminates the loop when another process has found the result. When working with my mulki script, and with multiprocessing in general, shared variables need to be accessed through the .value property. It’s easy but important to remember, because otherwise you will get an error.

Do not be surprised by the else clause after the loop on line 8. It’s not used very often, but what it does helps us a lot in this case: it prevents the code on line 9 from running when the loop was left with the break statement. So the shared variable is set to the current guess only if the result was found and the termination condition of the loop was met.
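The loop-else behaviour is easy to check in isolation. The following sketch is plain Python and not part of the mulki script; guess_until and its stop_after parameter are made-up names that only stand in for "another process has already found the number".

```python
def guess_until(target, guesses, stop_after=None):
    """Walk through `guesses` until `target` is hit.

    Returns "found" when the while condition ended the loop (the else
    branch ran) and "interrupted" when the loop was left with break.
    """
    outcome = "interrupted"
    position = 0
    guess = None
    while guess != target:
        guess = guesses[position]
        position += 1
        if stop_after is not None and position >= stop_after:
            break            # leaves the loop and skips the else branch
    else:
        outcome = "found"    # runs only when the condition became False
    return outcome


print(guess_until(3, [1, 2, 3]))                # found
print(guess_until(3, [1, 2, 3], stop_after=2))  # interrupted
```

In the first call the loop ends because the guess equals the target, so the else branch runs. In the second call the break fires first and the else branch is skipped.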


OK, now that we have our function, we need to execute it in multiple processes. To do so, we write a function called example2() that uses doMultiprocessingLoop from the ‘MulKi’ package:

def example2():
    t0 = time.time()       # measure time

    SEARCH_RANGE = 100000  # increase this for longer searching times
    THE_NUMBER = random.randrange(1, SEARCH_RANGE)  # search result
    print("Number to guess: ", THE_NUMBER)

    print("Start guessing...")
    result = 0
    mulki.addSharedVars(result) # result as shared variable
    mulki.doMultiprocessingLoop(seach_the_number, range(4), False, 
                                THE_NUMBER, SEARCH_RANGE)

    print("Correct guess:", mulki.getSharedVarsAsValues()[0], "in ", 
          round(time.time() - t0, 2), "s")

As you can see, it’s really easy to add a shared variable that can be used in all processes by calling:

mulki.addSharedVars(result)

or just:

mulki.addSharedVars(0)

You can add as many shared variables as you want, of nearly any type you want. Just separate them with a comma in the addSharedVars function or call it multiple times. Currently supported are int, float, string, list and dict. Queue is also kind of supported, but I would not recommend using it.
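For comparison, this is roughly what a shared number looks like in the standard library. The sketch uses multiprocessing.Value directly and makes no claim about how addSharedVars works internally; it only shows why access goes through the .value property mentioned earlier.

```python
from multiprocessing import Value

counter = Value("i", 0)      # "i" = signed int, initial value 0
ratio = Value("d", 0.5)      # "d" = double precision float

counter.value += 5           # read and write through .value
ratio.value *= 2

try:
    counter + 1              # the wrapper object itself is not a number
except TypeError as error:
    print("TypeError:", error)

print(counter.value, ratio.value)   # 5 1.0
```

Forgetting .value therefore fails loudly with a TypeError instead of silently computing something wrong.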

To get the shared variables after the multiprocessingLoop is done, we can use

mulki.getSharedVarsAsValues()

to get a list of the values of all shared variables. Now we can execute the code by calling example2() and we should get something like this:

Output:
Number to guess: 15522
Start guessing...
Correct guess: 15522 in 3.85 s

To really make use of multiprocessing, the search problem should take a bit longer to run. You can achieve that by increasing the SEARCH_RANGE. Of course random guessing is not the best approach for searching, but this example is simple and serves its purpose of teaching you how shared variables work. Sadly, shared variables are not always safe to use and can lead to serious problems when different processes try to change them at the same time. To understand the limits and workarounds, make sure to work through Example 4.

Example 3: Terminate when result was found

In example 2 we interrupted the execution of the search by changing a shared variable (which is perfectly fine). However, there is another more intuitive way of doing it by using a standard return value inside the executed function. So let’s look at the same example but this time we change the code accordingly. First, the function:

def seach_the_number_ret(_, THE_NUMBER, SEARCH_RANGE):
    # The task is to find a given number between 0 and SEARCH_RANGE - 1
    
    guess = 0
    while guess != THE_NUMBER:
        guess = random.randrange(SEARCH_RANGE)

    return guess

Well, this looks a lot cleaner. To make this work we only have to set the third parameter of the doMultiprocessingLoop function to True instead of False. This tells the script to terminate all processes whenever a value is returned by one of them. We also add a ‘result =’ before the mulki.doMultiprocessingLoop(…) to catch the result.
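The idea of "stop everything as soon as one process returns a value" can also be expressed with the standard library alone. The sketch below is not taken from the mulki script; search and first_result are hypothetical names, and the approach shown (a Pool that is terminated after the first result) is just one possible way to get this behaviour.

```python
import random
from multiprocessing import Pool


def search(task):
    """Guess randomly until the number is hit, then return it."""
    seed, the_number, search_range = task
    rng = random.Random(seed)          # separate generator for each task
    guess = -1
    while guess != the_number:
        guess = rng.randrange(search_range)
    return guess


def first_result(the_number, search_range, workers=4):
    """Start `workers` searches and return the first value that comes back."""
    tasks = [(seed, the_number, search_range) for seed in range(workers)]
    with Pool(workers) as pool:
        for result in pool.imap_unordered(search, tasks):
            return result              # leaving the with block terminates the pool
    return None


if __name__ == "__main__":
    print(first_result(the_number=4242, search_range=100000))
```

imap_unordered yields results in the order in which they finish, so the first item is the fastest search. Leaving the with block calls terminate() on the pool, which stops the workers that are still guessing.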

def example3():
    t0 = time.time()         # measure time

    SEARCH_RANGE = 100000    # modify this for longer searching times
    THE_NUMBER = random.randrange(SEARCH_RANGE) # search result
    print("Number to guess: ", THE_NUMBER)

    print("Start guessing...")
    result = mulki.doMultiprocessingLoop(seach_the_number_ret, 
             range(100), True, THE_NUMBER, SEARCH_RANGE)

    print("Correct guess:",result[0],"in ",round(time.time()-t0, 2),"s")

example3()

Output:
Number to guess: 53721
Start guessing...
Terminating Multiprocessing…
Correct guess: 53721 in 3.24 s

Example 4: Shared counting

This example shows the problem with shared variables and how to fix it. We repeat the counting from example 1, but this time we do not split it into separate parts. All processes work with the same shared variable and increase it by 1 in every loop cycle.
Because we want to fully utilize our computer, we set the iterator range to the number of available CPUs. Since you may have more or fewer than my 4 CPUs, I replaced the 4 with ‘cpu_count()’ from the multiprocessing package. Each process therefore adds 1 to the shared variable ‘GOAL/cpu_count’ times.
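One detail of this split is worth a side note: GOAL/cpu_count is only a whole number when the CPU count divides GOAL evenly. With 6 CPUs, for example, int(100000 / 6) is 16666, and 6 × 16666 is only 99996. A minimal sketch of a split that hands out the remainder as well; split_goal is a hypothetical helper and not part of mulki:

```python
def split_goal(goal, workers):
    """Return one loop count per worker so that the counts sum to `goal`."""
    base, remainder = divmod(goal, workers)
    # the first `remainder` workers take one extra step each
    return [base + 1 if index < remainder else base for index in range(workers)]


print(split_goal(100000, 4))   # [25000, 25000, 25000, 25000]
print(split_goal(100000, 6))   # [16667, 16667, 16667, 16667, 16666, 16666]
```

The example that follows keeps the simpler int(GOAL / cpu_count()) split, which is exact on a machine with 4 CPUs.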

from multiprocessing import cpu_count
def countTo_shared(_, PROCESS_GOAL, PRINT, counter):
    for _ in range(PROCESS_GOAL):
        counter.value += 1
        if PRINT:
            print(counter.value)

def example4(PRINT = False):
    print("Start...")
    t0 = time.time()
    GOAL = 100000          # counting goal = 100,000
    PROCESS_GOAL = int(GOAL / cpu_count())
    mulki.addSharedVars(0) # share the counter variable
    mulki.doMultiprocessingLoop(countTo_shared, range(cpu_count()), 
                                False, PROCESS_GOAL, PRINT)
    print("Finished counting with Multiprocessing in ", 
          round(time.time()-t0, 1), "s")

example4(True)

It takes some time to run and the output will look somewhat similar to this:

Output:

46172
46173
46174
46175
46176
Finished counting with Multiprocessing in 25.3 s

As we can see, the last value is not 100,000! So, what happened?
The processes take the current value, add 1 to it and then write it back into memory. Now it can happen that two (or more) processes take the same current value, e.g. 100, add 1 and each put 101 back into the memory slot. Together they counted 2 steps, but the shared value only increased by 1. They just counted the same step. That's why we reach a little less than half of our GOAL (=100,000).
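The lost step can be replayed by hand in a single process. The following sketch does not use multiprocessing at all: Slot and increment_in_steps are made-up names, and a generator is used only to pause between the read and the write so that the two "processes" can be interleaved deliberately.

```python
class Slot:
    """Stands in for the shared memory slot."""

    def __init__(self):
        self.value = 0


def increment_in_steps(slot):
    """Split `slot.value += 1` into its read half and its write half."""
    current = slot.value      # step 1: read the current value
    yield                     # pause here, another "process" may run now
    slot.value = current + 1  # step 2: write the new value back


slot = Slot()
a = increment_in_steps(slot)
b = increment_in_steps(slot)

next(a)            # process A reads 0
next(b)            # process B reads 0 as well
next(a, None)      # A writes 1
next(b, None)      # B writes 1 again, so A's step is lost

print(slot.value)  # 1, although two increments ran
```

With real processes the interleaving is decided by the operating system, which is why the final count differs from run to run.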

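To avoid this, we simply need to use a Lock from the multiprocessing package. While one process holds the lock, every other process that tries to acquire it has to wait, so only one process at a time can change the shared variable. When we are done, we release the lock again: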

from multiprocessing import cpu_count, Lock
lock = Lock()
def countTo_shared(_, PROCESS_GOAL, PRINT, counter):
    for _ in range(PROCESS_GOAL):
        lock.acquire()     # wait until no other process holds the lock
        counter.value += 1
        lock.release()     # let the next waiting process continue
        if PRINT:
            print(counter.value)

def example4(PRINT = False):
    ...

example4(True)

Output:

99996
99997
99998
99999
100000
Finished counting with Multiprocessing in 28.7 s

It took about 3 seconds longer to run, but now we have the desired result.
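The same acquire and release pair can also be written as a with block, which releases the lock even if the code inside raises an exception. The sketch below uses only the standard library (Process, Value, Lock) instead of mulki, and count_up is a made-up name. The lock is passed to every process as an argument, which is the form the standard library documentation advises for resources shared with child processes.

```python
from multiprocessing import Lock, Process, Value


def count_up(counter, lock, steps):
    for _ in range(steps):
        with lock:                 # acquire here, release when the block ends
            counter.value += 1


if __name__ == "__main__":
    counter = Value("i", 0)
    lock = Lock()
    workers = [Process(target=count_up, args=(counter, lock, 10000))
               for _ in range(4)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    print(counter.value)           # 40000
```

Because every read-modify-write happens while the lock is held, no step can be lost and the result is always 4 × 10000.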

As expected, frequent access to the shared variable slows the program down! For example, counting a variable to 100,000 with a normal Python loop takes only 0.531 s! I used this example just to illustrate the problems that can occur when a shared variable is changed very frequently, and how to fix them using a Lock.
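When the intermediate values are not needed, one way to avoid the slowdown is to count in an ordinary local variable and touch the shared variable only once per process. This is a sketch with the standard library, not with mulki; count_locally is a made-up name, and get_lock() is the lock that a multiprocessing.Value carries by default.

```python
from multiprocessing import Process, Value


def count_locally(counter, steps):
    local_total = 0
    for _ in range(steps):
        local_total += 1           # ordinary variable, no sharing cost
    with counter.get_lock():       # a single locked update per process
        counter.value += local_total


if __name__ == "__main__":
    counter = Value("i", 0)
    workers = [Process(target=count_locally, args=(counter, 25000))
               for _ in range(4)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    print(counter.value)           # 100000
```

Instead of 100,000 locked updates there are now only four, one for each process.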

Conclusion

  • We can add shared variables that can be used in multiple processes at the same time.
  • It’s possible to terminate multiprocessing when your function returns a value by setting the third parameter of doMultiprocessingLoop() to True.
  • Accessing the same variable from different processes is not always safe. If you are not sure what you are doing or did not understand the examples in this post, just always put lock.acquire() before and lock.release() after the variable change.
  • Changing shared variables takes time! If you can avoid frequent changes, do so. Otherwise a normal Python loop might be faster.


The last part of this series (with a more complex example) will follow soon.
