Generating multiple solutions for linear programs

I had a question the other day on my TURF analysis about generating not just the single best solution for a linear programming problem, but multiple solutions. This is one arena I like the way people typically do genetic algorithms, in that they have a leaderboard with multiple top solutions. Which is nice for exploratory data analysis.

This example though it is pretty easy to amend the linear program to return exact solutions by generating lazy constraints. (And because the program is fast, you might as well do it this way as well to get an exact answer.) The way it works here, we have decision variables for products selected. You run the linear program, and then collect the k products selected, then add a constraint to the model in the form of:

Sum(Product_k) <= k-1

Then rerun the model with this constraint, get the solution, and then repeat. This constraint makes it so the group of decision variables selected cannot be replicated (and in the forbidden set). Python’s pulp makes this quite easy, and here is a function to estimate the TURF model I showed in that prior blog post, but generate this multiple leaderboard:

import pandas as pd
import pulp

# Function to get solution from turf model
def get_solution(prod_dec,prod_ind):
    '''
    prod_dec - decision variables for products
    prod_ind - indices for products
    '''
    picked = []
    for j in prod_ind:
        if prod_dec[j].varValue >= 0.99:
            picked.append(j)
    return picked

# Larger function to set up turf model 
# And generate multiple solutions
def turf(data,k,solutions):
    '''
    data - dataframe with 0/1 selected items
    k - integer for products selected
    solutions - integer for number of potential solutions to return
    '''
    # Indices for Dec variables
    Cust = data.index
    Prod = list(data)
    # Problem and Decision Variables
    P = pulp.LpProblem("TURF", pulp.LpMaximize)
    Cu = pulp.LpVariable.dicts("Customers Reached", [i for i in Cust], lowBound=0, upBound=1, cat=pulp.LpInteger)
    Pr = pulp.LpVariable.dicts("Products Selected", [j for j in Prod], lowBound=0, upBound=1, cat=pulp.LpInteger)
    # Objective Function (assumes weight = 1 for all rows)
    P += pulp.lpSum(Cu[i] for i in Cust)
    # Reached Constraint
    for i in Cust:
        #Get the products selected
        p_sel = data.loc[i] == 1
        sel_prod = list(p_sel.index[p_sel])
        #Set the constraint
        P += Cu[i] <= pulp.lpSum(Pr[j] for j in sel_prod)
    # Total number of products selected constraint
    P += pulp.lpSum(Pr[j] for j in Prod) == k
    # Solve the equation multiple times, add constraints
    res = []
    km1 = k - 1
    for _ in range(solutions):
        P.solve()
        pi = get_solution(Pr,Prod)
        # Lazy constraint
        P += pulp.lpSum(Pr[j] for j in pi) <= km1
        # Add in objective
        pi.append(pulp.value(P.objective))
        res.append(pi)
    # Turn results into nice dataframe
    cols = ['Prod' + str(i+1) for i in range(k)]
    res_df = pd.DataFrame(res, columns=cols+['Reach'])
    res_df['Reach'] = res_df['Reach'].astype(int)
    return res_df

And now we can simply read in our data, turn it into binary 0/1, and then generate the multiple solutions.

#This is simulated data from XLStat
#https://help.xlstat.com/s/article/turf-analysis-in-excel-tutorial?language=en_US
prod_data = pd.read_csv('demoTURF.csv',index_col=0)

#Replacing the likert scale data with 0/1
prod_bin = 1*(prod_data > 3)

# Get Top 10 solutions
top10 = turf(data=prod_bin,k=5,solutions=10)
print(top10)

Which generates the results:

>>> print(top10)
        Prod1       Prod2       Prod3       Prod4       Prod5  Reach
0   Product 4  Product 10  Product 15  Product 21  Product 25    129
1   Product 3  Product 15  Product 16  Product 25  Product 26    129
2   Product 4  Product 14  Product 15  Product 16  Product 26    129
3  Product 14  Product 15  Product 16  Product 23  Product 26    129
4   Product 3  Product 15  Product 21  Product 25  Product 26    129
5  Product 10  Product 15  Product 21  Product 23  Product 25    129
6   Product 4  Product 10  Product 15  Product 16  Product 27    129
7  Product 10  Product 15  Product 16  Product 23  Product 27    129
8   Product 3  Product 10  Product 15  Product 21  Product 25    128
9   Product 4  Product 10  Product 15  Product 21  Product 27    128

I haven’t checked too closely, but this looks to me offhand like the same top 8 solutions (with a reach of 129) as the XLStat website. The last two rows are different, likely because there are further ties.

For TURF analysis, you may actually consider a lazy constraint in the form of:

Product_k = 0 for each k

This is more restrictive, but if you have a very large pool of products, will ensure that each set of products selected are entirely different (so a unique set of 5 products for every row). Or you may consider a constraint in terms of customers reached instead of on products, so if solution 1 reaches 129, you filter those rows out and only count newly reached people in subsequent solutions. This ensures each row of the leaderboard above reaches different people than the prior rows.

11 Comments

by Andy Wheeler on October 24, 2021 • Permalink

Posted in data science, Python

Tagged linear programming, TURF

Posted by Andy Wheeler on October 24, 2021

https://andrewpwheeler.com/2021/10/24/generating-multiple-solutions-for-linear-programs/

11 Comments

arisild
/ November 1, 2021

Thanks again for doing this. I have verified the solution and it is correct. The only problem I am kind-a having is when I ask for more than 10 solutions and have about 10 products and 400 respondents it takes 20 mins to reach all 10 solutions. (i.e. it is somewhat slow). SPSS can give me all possible solutions. (1023 in my case, for seconds, so I guess it is having another approach at this) Anyway, I have this solution and no one really needs to dig through all possible combinations.

Reply
- apwheele
  / November 1, 2021
  
  This does no checking under the hood for the linear program itself. I am mostly going off the assumption that valid solutions exists. For that small of N respondents it should find a valid solutions almost immediately.
  
  Not sure how you get 1023 total unique solutions (10 choose 10 is only 1 potential solution). But if you just wanted to articulate all possible solutions I wouldn’t use the linear program. It is really nice when you have many potential product inputs, and you just want a smaller number of potential combinations. Imagine 100 products choose 5, that is over 72 million potential permutations. This linear programming solution will be better in that scenario.
  
  Reply
  - arisild
    / November 1, 2021
    
    I agree. It is really fast when I ask for the top solution. It does it instantly. I have 1023 all possible combinations as I have 10 products and each of these is either 0 or 1. (Would buy / would not buy) 2 to the power of 10 in 1024 but I do not count the combination where all 10 are 0. So I end up with 1023 possible combinations 🙂
tatrout
/ May 10, 2023

Thank you for this great example! Wondering if you can you provide an example of how you might add in a constraint like…Prod1 must always include 1 of a list of products. And Prod 2 can include any of an included list of products, but not be the same as Prod1?

Reply
- apwheele
  / May 10, 2023
  
  In this particular example, the final outputed Table Prod1, Prod2, …. are just placeholders (they don’t have any particular significance, the are always just in order).
  
  For a constraint that out of a set, at least one of those items needs to be picked:
  
  #####################################
  # Additional consraint pick at least one
  # can do this after total # of products constraints
  pick_one = [‘Product 1′,’Product 10′,’Product 20’]
  P += pulp.lpSum(Pr[j] for j in pick_one) >= 1
  #####################################
  
  I am not quite sure how to interpret your second example. You could limit the number of items potentially selected in this subset, so if you *only wanted to pick a single product* out of “pick_one”, it would be “== 1”, instead of greater than or equal to.
  
  The model itself makes it so you can’t pick an item more than once though. So you never need to worry about an item being picked more than once.
  
  Reply
pranav
/ June 26, 2023

what is the use of ?

# Reached Constraint
for i in Cust:
#Get the products selected
p_sel = data.loc[i] == 1
sel_prod = list(p_sel.index[p_sel])
#Set the constraint
P += Cu[i] <= pulp.lpSum(Pr[j] for j in sel_prod)

Reply
- apwheele
  / June 26, 2023
  
  I present the formulation of the linear programming model in a prior post, https://andrewpwheeler.com/2020/10/01/a-linear-programming-example-for-turf-analysis-in-python/. It will be easier to understand that first, and then understand this post on generating multiple solutions.
  
  That constraint is saying a customer is reached if it is connected to a selected product. It is how the model the optimized in the end — it maximizes the number of customers reached by selecting k products.
  
  Reply
pranav
/ June 26, 2023

When I print P
I see conflicting contraints

_C2: Customers_Reached_2 <= 0
0 <= Customers_Reached_2 <= 1 Integer

Reply
- apwheele
  / June 26, 2023
  
  Customers_Reached_2 = 0 is not conflicting. This will occur if you have a customer who doesn’t like anything in this problem, so they will never be reached.
  
  Please don’t ask a series of coding questions in comments, feel free to get in touch if you would like a quote to help your company with coding. See https://andrewpwheeler.com/consulting/
  
  Reply
  - pranav
    / June 26, 2023
    
    ok – thanks
pranav
/ June 26, 2023

Thanks for the code and quick reply
I read the link u gave

I see the loop is responsible for creating constraints like given below

_C1: Customers_Reached_1 – Products_Selected_ChocChipCookie
– Products_Selected_FriedChicken – Products_Selected_OnionSoup
– Products_Selected_Peas <= 0

When I print P (var used in the code)
I see conflicting contraints

_C2: Customers_Reached_2 <= 0
0 <= Customers_Reached_2 <= 1 Integer

Reply

Andrew Wheeler

Generating multiple solutions for linear programs

11 Comments

arisild

apwheele

arisild

tatrout

apwheele

pranav

apwheele

pranav

apwheele

pranav

pranav

Leave a comment Cancel reply

Recent Posts

Categories

Site RSS Feeds

Follow Blog via Email

Top Posts & Pages

Stack Exchange

Andrew Wheeler

Generating multiple solutions for linear programs

Share this:

Related

11 Comments

arisild

apwheele

arisild

tatrout

apwheele

pranav

apwheele

pranav

apwheele

pranav

pranav

Leave a comment Cancel reply

Recent Posts

Categories

Site RSS Feeds

Follow Blog via Email

Top Posts & Pages

Stack Exchange