Knapsack Problem

In this tutorial I want to show you two ways of solving the popular Knapsack Problem. You will learn how to use random guessing in combination with multiprocessing to maximize performance. Afterwards I will show you how to use Google OR-Tools to solve this problem in no time. 

If you have trouble understanding how to use the multiprocessing for kids script (that I will use in this example) then check out Part 1 and 2 beforehand.

The Problem

The Knapsack problem is one of Karp’s 21 NP-complete problems. It’s pretty popular but also easy to explain…

So, you are a filmmaker and have a lot of gear but only one knapsack.

Now, each item has it’s own volume (in the literature it’s often called weight but I think volume makes much more sense) and it’s own value for your film trip. You can’t take every item because the knapsack only has a limited amount of space inside of it. The goal is to find the combination of items with maximum value that fits into the knapsack. It’s important to note that the value of each object is independent of the combination with other objects. So even a solution that only puts camera lenses without a camera in the knapsack would count as valid. The reason why the solution is so difficult to calculate is because the number of combinations that need to be checked grows exponentially with the number of items.
The problem appears in real-world decision-making processes e.g. resource allocation or least wasteful way to cut of raw materials.

Solution 1: Random guessing (multiprocessing example 5)

Since computers got really fast and will get much faster in the future (following moore’s law), it’s often not a bad idea to try random guessing or a brute force approach first. In many cases you get at least decent results for nearly any real world problem without a lot of algorithmic knowledge and development time.

So, I wrote a function that dose random guessing in a loop for 60 seconds and updates two shared variables as soon as it found a better solution than the current one. 
Note: I renamed ‘item values’ to PRICES to avoid confusion with other ‘values’ denominators.

The function takes the (unused) iteration value from the multiprocessing loop, the VOLUMES, the PRICES and the KNAPSACK_VOLUME as static variables. It also takes the current ‘solution’ and the ‘solution_price’ as shared variables which represent the current set of items with the highest price (or value) that fit into the knapsack and the actual price.

def knapsack_search(_, VOLUMES, PRICES, KNAPSACK_VOLUME, solution, 
                    solution_price):

Inside the function we save the amount of objects (or items) into the OBJECTS variable to make future code more readable. We also save the current time in t0 and start a while loop that run’s for 40 seconds:

OBJECTS = len(VOLUMES)          # amount of objects
t0 = time.time()
while (time.time() - t0) < 40:  # search for 40 seconds

Inside the loop we create a sample containing all possible items but in a random order. Therefore we first create an array containing the randomly ordered indices or ‘pick_positions’ of the objects. Then we use these to get the volumes of our current pick and save them in ‘current_volumes’

    # Create a random filling of the knapsack:
    pick_positions = random.sample(list(range(OBJECTS)), OBJECTS)
    current_volumes = [VOLUMES[i] for i in pick_positions]

Our current pick contains all available items. Since our knapsack volume is limited this will not fit inside. For me the most intuitive way to create a random sample that fits was to remove items until the sum(current_volumes) gets below the KNAPSACK_VOLUME:

    for i in range(1, OBJECTS):
        current_volumes = current_volumes[:-1]    # remove last element
        if sum(current_volumes) <= KNAPSACK_VOLUME: # fits in ?
            pick_positions = pick_positions[:-i]
            break

As a final step we calculate the value (‘total_price’) of the current pick and check if it’s higher than the current solution. If so, the shared variables will be updated:

    total_price = sum([PRICES[i] for i in pick_positions])
    if total_price > solution_price.value:
        solution_price.value = total_price
        solution.value       = pick_positions
        print("Current best:", total_price)

Now that we’ve got our search function, let’s define our problem and start a multiprocessing loop. First we define the function example5() and create a constant ‘OBJECTS’ containing the amount of items we must choose from to fill the knapsack.

def example5():
    OBJECTS = 24 # amount of items

Since we have no real world values, we create random VOLUMES and PRICES in different ranges. To make the results reproducible we set the random.seed to 42 (or another number of your choice). If you later want to compare the results with them from other algorithms you might want to remove the seed to get a different item set each time you run your code. As the amount of objects can be modified, the KNAPSACK_VOLUME should adapt accordingly. I decided to set it to a quarter of the summed volume of all items.

    random.seed(42) # make volumes reproducible
    VOLUMES = [random.randint(20, 100) for _ in range(OBJECTS)] 
    random.seed(42) # make prices reproducible
    PRICES  = [random.randint(10, 800) for _ in range(OBJECTS)] 
    KNAPSACK_VOLUME = int(sum(VOLUMES) / 4)

As a last step we add the shared variables ‘solution’ and ‘solution_price’ to make them available in every process and start the multiprocessing loop:

    solution = [0]
    solution_price = 0
    mulki.addSharedVars(solution, solution_price)
    mulki.doMultiprocessingLoop(knapsack_search, range(cpu_count()), 
                                False, VOLUMES, PRICES, KNAPSACK_VOLUME)
    
    # print highest price reached:
    print("random guessing solution:", mulki.getSharedVarsAsValues()[1]) 

If we run our solver using:

example5()

we get the following result:

Output:
Current best: 687
Current best: 1904
Current best: 2640
Current best: 2688
Current best: 3010
Current best: 3104
Current best: 4454
Current best: 4601
Current best: 4631
Current best: 4779
Current best: 4926
Current best: 5087
Current best: 5109
random guessing solution: 5109

This is the optimal solution! So without much programming effort we reached the maximum possible price in only 40 seconds of computation. You can get the packed items and their total volume with:

packed_items = mulki.getSharedVarsAsValues()[0]
total_volume = sum([VOLUMES[i] for i in packed_items])



In general we can say: the longer we run our random guessing and the faster our CPU, the closer we get to the optimal solution. To see this effect I’ve run the code multiple times with different amounts of items and plotted the time it took to calculate the optimal solution using random guessing.

Figure 1: Mean (red) of time it took to calculate the optimal solution using random guessing for 20 runs (blue) over different amounts of items. Y-axis in log scale.

As we can see in Figure 1, the time it takes random guessing to find the optimal solution grows exponentially with the amount of items. This tells us that for larger problems with a lot of items the random guessing approach is not feasible for finding the optimal solution. It might however give you a good approximation after a few minutes of runtime.

Of cause there are a lot of faster and better algorithms to solve the knapsack problem. If you are interested in the mathematical theory or in dynamic programming implementations feel free to leave this site and check out Wikipedia or another resource of your choice. I’m a fan of a more practical approach using a library that is already implemented and free to use. Therefore we will look at how to use one of the Google OR-Tools which was developed especially for this problem using branch and bound search.

Solution 2: Google OR-Tools

Google OR-Tools is a library that can be used with C++, C#, Java and Python. It implements solvers for various combinatorial optimization problems. You can install it using pythons package manager (pip):

python -m pip install -U --user ortools

Or, to install system wide:

sudo python -m pip install -U ortools

Now we want to use the ‘pywrapknapsack_solver’ to find the optimal solution to the problem we defined in example5(). Thus we import the package:

from ortools.algorithms import pywrapknapsack_solver

We initialize and run the solver with:

solver = pywrapknapsack_solver.KnapsackSolver(      pywrapknapsack_solver.KnapsackSolver.KNAPSACK_DYNAMIC_PROGRAMMING_SOLVER, 'test')
solver.Init(PRICES, [VOLUMES], [KNAPSACK_VOLUME])
computed_value = solver.Solve()   # max price possible

It’s important to note that the sover.Init() function takes the values (prices) array first, followed by an array of the array of the volumes (or weights). The last parameter is the max volume (or weight) that fits into the knapsack, as an array with only one entry (don’t ask me why they did it this way). If you want to extract the actual solution, use:

packed_items  = [x for x in range(len(PRICES))
                if solver.BestSolutionContains(x)]
total_volume  = sum([VOLUMES[i] for i in packed_items])
packed_prices = [PRICES[i] for i in packed_items]

For 24 items it only takes 0.00021 s to find the optimal solution of 5109. I guess we have to increase the amount of items to test the limits of this algorithm:

As we can see, it takes about 850 items to pass the one second mark and with 2000 items the algorithm takes 12.2 s to find the optimal solution. The increase in computation time dose not grow exponentially as we can see in the log plot on the right. Instead it looks like polynomial growth O(n^2), which is quit impressive. Overall this is probably the fastest and most consistent way to calculate a solution to the knapsack problem.

Conclusion

  • If you’re not sure how to solve a specific problem or you just want to see quick results, random guessing can be a great starting point. It’s really easy to implement and you might get decent approximations to the optimal solution if you give it some time. 
  • Random guessing approaches can also be optimized e.g. by avoiding combinations that have been tried before. It’s also possible to work with something like simulated annealing to decrease the search space over time to focus on the most promising parts of it.
  • However if you need a reliable and fast algorithm e.g. in production, try to find one that is already implemented for you. Google OR-Tools have solutions too many complex problems and are free to use for commercial purposes as well.

If you had trouble understanding how to use google’s knapsack solver make sure to check out this page in their guide.

I’ve uploaded the Code from this example into the same GitHub repo as the multiprocessing for kids script. You can find it inside the ‘multiprocessing_for_kids_examples.py’ file as the last example (example 5).

Schreibe einen Kommentar

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert.