
An introduction to the knapsack problem

The knapsack problem is a very well known problem. The reason for my interest in it is that, although it may sound like a pure computer science problem, it actually arises in many different scenarios: economics, education (especially education technology), asset portfolio management, and many other applied fields too. In this post, I intend to get into the details of this problem, both the theory and practical code and applications. So sit tight and read on.
The first scenario of the problem:
Imagine a burglar entering a house. The burglar, like any human being, has brought a bag of a definite size and can carry only up to a certain weight. This burglar is allowed to steal anything, but each item has two attributes: a weight and a value. From this, three problems arise.
(1) The items are binary in the sense of stealing, i.e. each item can either be stolen whole or not at all, and only one piece of each item is present. This is called the 0–1 knapsack problem.
(2) Each item is present in a finite number of copies. The burglar can steal a certain number of each item, including 0 obviously. This is called the bounded knapsack problem.
(3) Each item is present in infinite supply (not a realistic scenario, but a mathematical one worth considering), and therefore the space of choices is infinite too. This is called the unbounded knapsack problem.
Further generalization of this problem:
Now, as this is a pretty natural-looking problem, one can obviously ask why we should take only a discrete number of objects. That is where the more general scenario comes in. Imagine the bounded problem, but say that instead of books, the thief is hungry and steals from a milk carton; then the weight or volume of the milk stolen becomes non-discrete, a positive real number. As nature does not restrict everything to discrete values, an important generalization of this problem is to consider the amounts of the objects stolen as real numbers, either bounded by some number (in the spirit of the bounded problem) or unbounded.
Background requirements:
The knapsack problem is proved to be NP-complete; i.e. no polynomial-time solution is known for it (and none exists unless P = NP). In practice, dynamic programming is used to solve it, or some useful approximations are implemented to ease the problem. Now, one may wonder what dynamic programming is, but that is a topic of programming in itself; I will not go on to discuss it here but rather suggest reading up on it on GeeksforGeeks.
Study on solutions:
So, where do we start? Well, our first stop is obviously the 0–1 knapsack problem. There will be a painfully slow recursive solution and a more intelligent dynamic programming solution to this problem. Let's dive in. Here I will mention only the algorithms. First we consider the one-feature problem, i.e. weight only; next we will proceed to problems with two or more features.
(1) Stupid solution: the stupid solution is to calculate the value of each possible subset and then find the maximum among the feasible ones. With n items there are 2^n subsets, so we will obviously not do it this way for anything but tiny inputs.
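Still, the brute force is worth seeing once, because every later method must agree with it. Here is a minimal sketch in Python; the function name and the example values/weights are my own illustrative choices, not from any standard library:

```python
from itertools import combinations

def knapsack_bruteforce(values, weights, capacity):
    """Try every subset of items; keep the best one that fits in the bag."""
    n = len(values)
    best = 0
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            weight = sum(weights[i] for i in subset)
            if weight <= capacity:
                best = max(best, sum(values[i] for i in subset))
    return best

# Example: values [60, 100, 120], weights [10, 20, 30], capacity 50
print(knapsack_bruteforce([60, 100, 120], [10, 20, 30], 50))  # → 220
```

The best subset here is the second and third items (weight 20 + 30 = 50, value 100 + 120 = 220); adding the first item would exceed the capacity.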
(2) Recursion: the recursion is based on the following idea. Consider a solution. Each of the items is either picked or not. So we can work from one side, say the n-th item, and do the following:
(i) Consider the maximum value possible with the n−1 remaining objects, excluding the n-th object. Then consider the maximum value possible when the n-th object is included, with the capacity reduced by its weight. These two cases clearly cover both choices.
(ii) To set up the recursion, we call the same function again in each of the above cases, with the parameter reduced to n−1. Thus each call reduces to a smaller case, and since the input is finite, the algorithm definitely terminates.
(iii) As the marginal (base) case, if the number of objects available is 0 or the weight constraint is 0, then the maximum value possible is 0.
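The three steps above can be sketched directly in Python (a plain translation of the idea, with my own function name):

```python
def knapsack_recursive(values, weights, n, capacity):
    """Maximum value achievable using the first n items within the given capacity."""
    # Marginal case: no items left or no capacity left.
    if n == 0 or capacity == 0:
        return 0
    # If the n-th item does not fit, we can only exclude it.
    if weights[n - 1] > capacity:
        return knapsack_recursive(values, weights, n - 1, capacity)
    # Otherwise take the better of excluding or including the n-th item.
    exclude = knapsack_recursive(values, weights, n - 1, capacity)
    include = values[n - 1] + knapsack_recursive(
        values, weights, n - 1, capacity - weights[n - 1])
    return max(exclude, include)

print(knapsack_recursive([60, 100, 120], [10, 20, 30], 3, 50))  # → 220
```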
Now, the problem that arises here is that the same subproblems get calculated multiple times. Therefore the recursive process becomes exponentially slow on larger inputs, even though its answers are correct.
Now, we provide a dynamic programming solution. The main techniques of DP are (1) bottom-up tabulation and (2) memoization. Without going into full detail, but to keep up the flow of the discussion, what needs to be said is that:
(1) Bottom-up means storing values in an array or some other data structure, starting from small subproblems and building up to larger ones.
(2) Memoization is more like creating a memo: it starts from the bigger values, just like a recursive function, but stores each computed value in a table and reuses it later in the process, which decreases the complexity.
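Memoization can be added to the earlier recursion almost for free. A sketch using Python's standard `functools.lru_cache` as the memo table (the function names are my own):

```python
from functools import lru_cache

def knapsack_memo(values, weights, capacity):
    """Top-down 0-1 knapsack: same recursion as before, but cached."""
    @lru_cache(maxsize=None)
    def best(n, cap):
        if n == 0 or cap == 0:
            return 0
        if weights[n - 1] > cap:
            return best(n - 1, cap)
        # Cached results mean each (n, cap) pair is computed only once.
        return max(best(n - 1, cap),
                   values[n - 1] + best(n - 1, cap - weights[n - 1]))

    return best(len(values), capacity)

print(knapsack_memo([60, 100, 120], [10, 20, 30], 50))  # → 220
```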
In this case also, observe that the knapsack value function depends on two parameters in the two-feature case. Therefore, in dynamic programming, we first create an empty two-dimensional array for storing the values of the function, indexed by the integers from 0 to W (the maximum weight) and 0 to n (the number of items available). Then, once the function value for some (n, w) pair is evaluated, it gets stored in the two-dimensional array.
Clearly this requires computing the function at most nW times, so the complexity is O(nW). (Note that this does not contradict NP-completeness: O(nW) is pseudo-polynomial, since it is polynomial in the numeric value of W rather than in the number of bits needed to write W down.)
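A bottom-up sketch of this table-filling in Python (again, the function name is my own):

```python
def knapsack_dp(values, weights, capacity):
    """Bottom-up 0-1 knapsack.

    dp[i][w] = best value using the first i items with capacity w.
    """
    n = len(values)
    dp = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(capacity + 1):
            dp[i][w] = dp[i - 1][w]          # case 1: exclude item i
            if weights[i - 1] <= w:          # case 2: include item i if it fits
                dp[i][w] = max(dp[i][w],
                               values[i - 1] + dp[i - 1][w - weights[i - 1]])
    return dp[n][capacity]

print(knapsack_dp([60, 100, 120], [10, 20, 30], 50))  # → 220
```

Each cell is filled once, which is where the O(nW) bound comes from.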
Now, we get down to the fractional knapsack problem, which can be described as below:
It is a generalization of the 0–1 knapsack problem in which a fraction (< 1) of an object can also be picked. In this case, the solution becomes surprisingly easy. We divide each value by its weight, then sort the items in decreasing order of (value/weight). Then, as clear as water, we fill up the bag with the items from the start, i.e. with the highest per-unit value first, and proceed to the next. For the last item, we take a fractional weight to fill the bag exactly, and that is the best fit. This process is called the greedy method.
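This greedy procedure can be sketched as follows (function name and example numbers are my own illustration):

```python
def fractional_knapsack(values, weights, capacity):
    """Greedy fractional knapsack: sort by value/weight, then fill the bag."""
    items = sorted(zip(values, weights),
                   key=lambda vw: vw[0] / vw[1], reverse=True)
    total = 0.0
    for value, weight in items:
        if capacity >= weight:
            total += value           # take the whole item
            capacity -= weight
        else:
            total += value * capacity / weight  # take only the fraction that fits
            break
    return total

print(fractional_knapsack([60, 100, 120], [10, 20, 30], 50))  # → 240.0
```

With these numbers, the items are already sorted by per-unit value (6, 5, 4): the first two fit entirely (value 160, capacity 20 left), and two-thirds of the last item adds 80, for 240 total. Note the greedy answer 240 beats the 0–1 optimum of 220, which is exactly what allowing fractions buys us.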
In the next post, we will discuss some implementations and non-orthodox methods.
