An intro to linear programming for criminologists

Erik Alda made the point the other day on twitter that we are one of the few crim folks that do anything related to linear programming. I think it is crazy useful – much more so than say teaching myself some new regression technique or a programming language.

I don’t quite remember the motivation to learn it. I think I kept seeing repeated applications in papers I read, but was also totally baffled by it; I did not understand peoples notation for it at all. In retrospect that was because it is not statistics. You are optimizing a function by estimating some parameters (there is nothing stochastic about it, so there is no statistical inference). So it is more like finding the min/max of a function in calculus.

I think the best way to think about linear programming is in terms of decision analysis. We have a set of options among which we need to choose some action. So we make the choices that either maximize or minimize some objective, but also take into account constraints on the decisions we can make.

For social scientists here is an example that hopefully illustrates the difference between statistics and linear programming. Say we are interested in conducting a hot spots policing randomized experiment. So we define our top 20 crime hot spots in the city, and randomly assign 10 of them to receive the hot spots treatment. Linear programming is basically the obverse of this, given our 20 hot spot areas, which are the best 10 locations to choose for our intervention.

This problem as stated you might be thinking is trivial – just rank each of the 20 hot spots by the total number of crimes, and then choose the top 10. Where linear programming really helps though is if you have constraints on the final choices you make. Say you did not want to choose hot spots that are within 1 mile of each other (to spread out the hot spot interventions throughout the city). There is no simple way to sort your hot spots to obey that constraint, but you can encode that in the linear program and have the computer solve it quite easily.

There is no shortage of ways you could expand the complexity of this example hot spot decision analysis. Say you had two different types of hot spot treatments, and that they had different efficacy in different areas (one was good for property crime, and the other was better for violent crime). You might think of this as doing two separate decision analyses, where a constraint is that an area can only be assigned one of the two interventions.

Here I will provide some code examples in python using the pulp library to illustrate some more examples using data you can see in action, as well as different ways to think about linear programming problems in practice. (Technically the examples I give are all mixed integer linear programs, as the decision variables are binary 0/1.)

Formulating Objectives and Constraints

For this example I will be simulating data, but imagine a case you are an analyst for the IRS, and you want to determine which business tax returns to audit. We want to audit cases that both have a high probability of being fraudulent, as well as cases in which the total amount of the underpayment is large. (If you want a more typical criminology example, imagine assigning criminal cases to detectives, some cases have more costs, e.g. homicide vs burglary, and some cases have different probabilities of being solvable. This type of decision problem is very common in my experience – pretty much any time you have to make a binary choice, and those choices have variable costs/benefits.)

First I start off by simulating some data (the only libraries we need are numpy and pulp). So I simulate 1000 business tax returns, which have an estimate of the probability they are fraud, prob_fraud, and an estimate of the amount they underpayed, underpay_est.

import numpy as np
import pulp

###########################################################
#Simulate data for costs and probabilities

np.random.seed(10)
total_cases = 1000
underpay_est = np.random.uniform(1000,100000,total_cases)
prob_fraud = np.random.uniform(0,1,total_cases)
exp_return = prob_fraud*underpay_est

###########################################################

The objective we will be maximizing then is the expected return of auditing a tax return, exp_return, which is simply the multiplication of the probability of fraud multiplied by the amount of the underpayment. For a simple example, say we have a case where fraud is estimated to be 50%, and the estimate of the underpayment amount is $10,000. So our expected return for auditing that case is $5,000.

We need these two estimates external to our linear programming problem, and they themselves can be informed by predictive models (or simpler estimates, e.g. underpayment is proportional ~30% of deductions or something like that).

Now we have all we need to set up our linear programming problem. I am going to choose 100 cases out of these 1000 to audit. Hopefully that code is documented enough to see creating the decision variables (each tax return either gets a 1 if it is chosen, or a 0 if it is not), the original objective function that we are maximizing, and the results.

#Setting up the problem
case_index = list(range(total_cases))
tot_audit = 100

####################################
#Basic Problem
P = pulp.LpProblem("Choosing Cases to Audit", pulp.LpMaximize)
D = pulp.LpVariable.dicts("Decision Variable", [i for i in case_index], lowBound=0, upBound=1, cat=pulp.LpInteger)
#Objective Function
P += pulp.lpSum( D[i]*exp_return[i] for i in case_index)
#Constraint on total number of cases audited
P += pulp.lpSum( D[i] for i in case_index ) == tot_audit
#Solve the problem
P.solve()
#Get the decision variables
dec_list = [D[i].varValue for i in case_index]
dec_np = np.asarray(dec_list)
#Expected return
print( (dec_np * exp_return).sum() )
#Should be the same
print( pulp.value(P.objective) )
#Hit rate of cases
print( (dec_np * prob_fraud).sum()/tot_audit )
####################################

If you are following along in python, you will see that the total expected return is 7,287,915, and the estimated hit rate (or clearance rate) of the audits is around 0.88.

This example would be no different if we just chose the top 100 cases based on the expected return. Say that you thought the hit rate though of 88% was too low. You will choose cases that are big dollar amounts, but not necessarily a very high probability. So you may say I want my clearance rate to be over 90% overall. That is a simple constraint to add into the above model.

####################################
#Updating the problem to constrain on the hit rate
#Above a particular threshold
hit_rate = 0.9
P += pulp.lpSum( D[i]*prob_fraud[i] for i in case_index ) >= hit_rate*tot_audit
P.solve()
#Get the decision variables
dec_list = [D[i].varValue for i in case_index]
dec_np = np.asarray(dec_list)
#Expected return is slightly lower than before
print( pulp.value(P.objective) )
#Hit rate of cases
print( (dec_np * prob_fraud).sum()/tot_audit )
####################################

So now the total expected return is lower than without the constraint, 7,229,140 (so a reduction of about $60k), but our expected hit rate is just above 90%.

You may be thinking, “why not just eliminate cases with a probability of lower than 90%”, and then amongst those left over select the highest expected return. That meets your constraints, but has a lower expected return than this program! Think of this program as more tit-for-tat. High expected return / lower probability audits can still be selected with this model, but you need to balance them out with some high probability cases in response to tip the scales to meet the overall hit rate objective.

Trade-Offs and the Frontier Curve

So you may be thinking, ok the trade-off to get a 90% clearance was not too bad in terms of total extra taxes collected. So why not set the constraint to 95%. When you create constraints, they always lower the objective function (lower or equal to be more precise). The question then becomes quantifying that trade off.

You can subsequently vary the hit rate constraint, and see how much it changes the total expected return. Here is an example of doing that, each model only takes around a second to complete.

###########################################################
#Drawing the trade-off in hit rate vs expected return

hit_rate = np.linspace(0.85, 0.95, 30)
total_return = []

#Function to estimate the model
def const_hit_rate(er, prob, tot_n, hr):
    c_index = range(len(er))
    Prob = pulp.LpProblem("Choosing Cases to Audit", pulp.LpMaximize)
    Dec = pulp.LpVariable.dicts("Decision Variable", [i for i in c_index], lowBound=0, upBound=1, cat=pulp.LpInteger)
    Prob += pulp.lpSum( Dec[i]*er[i] for i in c_index)
    Prob += pulp.lpSum( Dec[i] for i in c_index ) == tot_n
    Prob += pulp.lpSum( Dec[i]*prob[i] for i in c_index ) >= hr*tot_n
    Prob.solve()
    dec_li = [Dec[i].varValue for i in c_index]
    dec_np = np.asarray(dec_li)
    return pulp.value(Prob.objective), dec_np

for h in hit_rate:
    print(f'Estimating hit rate {h}')
    obj, dec_res = const_hit_rate(exp_return, prob_fraud, 100, h)
    total_return.append(obj)

###########################################################

For this simulated data example, there end up being pretty severe trade-offs in the total return after you get above 91% hit rates, so from this it may not be worth the trade-off to get a much higher hit rate in practice. Just depends on how much you are willing to trade-off one for the other.

There are other ways to formulate this trade off (via bi-objective functions/soft-constraints, or weighted ranking schemes), but the blog post is long enough as is!

Other Potential Applications

So in terms of my work, I have examples of using linear programs to make spatial location decisions, encode fairness constraints into predictive policing, and allocate treatment assignment with network spillovers.

Erik Alda and Joseph Ferrandino have conducted frontier analysis of different criminal justice organizations, which is based on estimating the frontier curve above from data (instead of a pre-specified objective function).

That is about it for criminologists that I know of, but there are plenty of applications towards criminal justice topics using linear programming (or related concepts). It is most popular among operations researchers, of which Laura Albert is one of my favorites. (Criminal Justice as a field might not exist for Albert Blumstein, who was also a very influential operations researcher.)

One of the things that makes this different from more traditional quantitative work in the social sciences is that again it is not statistics – we are not testing hypotheses. The contribution is simply formulating the decision problem in a tractable way that can be solved, and the drawing of the trade-offs I showed above.

It is one of the ways I really like it though – unlike saying how your regression model can be used to inform decisions, this much more explicitly shows the utility of the results of those models in some practice.

Leave a comment

3 Comments

  1. An example of soft constraints in linear programming | Andrew Wheeler
  2. CrimRxiv, Alt-Journal Contributions, and Mike Maltz’s Retrospective | Andrew Wheeler
  3. A linear programming example for TURF analysis in python | Andrew Wheeler

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: