Solving the P-Median model

Wendy Jang, my former student at UTD and now Data Scientist for the Stanislaus County Sheriff’s Dept. writes in:

Hello Andy,

I have some questions about your posting in GitHub. Honestly, I am not good at Python at all. I was playing with your python code and trying to understand the work flow. Then, I can pretty much mimic what you did; however, I encountered some errors along the way.

I was able to follow through lines but then I am stuck on PMed function. Do you know what possibly caused this error? See the screenshot below. Please advise. Thanks!

Specifically, Wendy is asking about my patrol redistricting example with workload inequality constraints. Wendy’s problem here is specifically she likely does not have CPLEX installed. CPLEX is free for academics, but unfortunately costs a bit of money for anyone else (not sure if they have cheaper licensing for public sector, looks like currently about $200 a month).

This was a good opportunity to update some of my code, so now see the two .py files in the DataCreated folder, in particular has a nice class function. I have additionally added in functionality to create a map, and more importantly eliminate subtours (my solution can potentially return disconnected areas). I have also added the ability to use different solvers.

For this problem CPLEX just works much better (takes around 10 minutes), but I was able to get the SCIP solver to return the same solution in around 5 hours. GLPK did not return a solution even letting it churn out for over 12 hours. I tested CBC for shorter time periods, but that appears to be a no-go as well.

So just a brief overview, the libraries I will be using (check out the end of the post for setting up the conda environment):

import pickle
import pulp
import networkx
import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt
import geopandas as gpd

class pmed():
    Ar - list of areas in model
    Di - distance dictionary organized Di[a1][a2] gives the distance between areas a1 and a2
    Co - contiguity dictionary organized Co[a1] gives all the contiguous neighbors of a1 in a list
    Ca - dictionary of total number of calls, Ca[a1] gives calls in area a1
    Ta - integer number of areas to create
    In - float inequality constraint
    Th - float distance threshold to make a decision variables
    def __init__(self,Ar,Di,Co,Ca,Ta,In,Th):

I do not copy-paste the entire pmed class in the blog post. It is long, but is the rewrite of my model from the paper. Next, to load in the objects I need to fit the model (which were created/pickled in the file).

# Loading in data
data_loc = r'D:\Dropbox\Dropbox\PublicCode_Git\PatrolRedistrict\PatrolRedistrict\DataCreated\fin_obj.pkl'
areas, dist_dict, cont_dict, call_dict, nid_invmap, MergeData = pickle.load(open(data_loc,"rb"))

# shapefile for reporting areas
carr_report = gpd.read_file(r'D:\Dropbox\Dropbox\PublicCode_Git\PatrolRedistrict\PatrolRedistrict\DataOrig\ReportingAreas.shp')

And now we can create a pmed object, which has all the info necessary, and gives us a print out of the total number of decision variables and constraints.

# Creating pmed object
pmed12 = pmed(Ar=areas,Di=dist_dict,Co=cont_dict,Ca=call_dict,Ta=12,In=0.1,Th=10)

Writing the class this way allows the user to input their own solver, and you can see it prints out the available solvers on your current machine. So to pass in the SCIOPT solver you could do pmed12.solve(solver=pulp.SCIP_CMD()), which again for this problem does find the solution, but takes 5 hours. So here I still stick with CPLEX to just illustrate the new classes functionality:

pmed12.solve(solver=pulp.CPLEX(timeLimit=30*60,msg=True))     # takes around 10 minutes

This prints out a ton of information at the command line that you do not get if you run from within Jupyter notebooks. So for debugging that is much more useful. timeLimit is in seconds, but does not include the presolve phase I believe, so some of these solvers can just get stuck on the presolve forever.

We can look at the results in a nice map if you do have geopandas installed and pass it a geopandas data frame and the key to match it on:

pmed12.map_plot(carr_report, 'PDGrid')

This map shows the new areas and different colors with a thick border, and the source area as a dot with a white outline. So here you can see that we end up having one disconnected area – a subtour in optimization parlance. I have added some methods to deal with this though:

stres = pmed12.collect_subtours()

And you can see that for one of our areas it identified a subtour. I haven’t written this to automatically resolve for subtours due to the potential length of the solving time. So here we can see the subtours have 0 calls in them. You can assign those areas to wherever and it does not change the objective function. So that is what I did in the paper and showed how in the Jupyter notebook at the main page for the github project.

So if you are using a function that takes 5 hours, I would suggest you manually fix these subtours if there is an obvious to the human eye easy solution. But here with the shorter solve time with CPLEX I can rerun the algorithm with a warm start, and it runs even faster (under 5 minutes). The .collect_subtours() method adds subtour constraints into the pulp model, so you just need to redo the solve() method to eliminate that particular subtour (I do not know if this is guaranteed to converge though for all problems).

So here in this data eliminating that one subtour results in a solution with no subtours:

stres = pmed12.collect_subtours()

And we can replot the result, which shows it chooses the areas that you would have assigned by eye anyway:

pmed12.map_plot(carr_report, 'PDGrid')

So again for this problem if you have CPLEX I would recommend it (have not tried Gurobi). But at least for this particular dataset SCIOPT was able to solve the problem in 5 hours. So if you are a crime analyst or someone else without academic access to CPLEX you can install the sciopt solver and give that a go for your actual data.

Note that this is based off of results for 325 subareas. So if you have more subareas it will take a longer (and if you have fewer it may be quite a bit shorter).

Setting up the Conda environment

So once you have python installed, you can typically do something like:

pip install pulp

Or via conda:

conda install pyscipopt

I often have trouble though, especially when working with the python geospatial libraries, to install geopandas, fiona, etc. So here what I do is create a new conda environment:

conda create -n linprog
conda activate linprog
conda install -c conda-forge python=3 pip pandas numpy networkx scikit-learn dbfread geopandas glpk pyscipopt pulp

And then you can run the pmedian code above in this environment. I suppose I should turn this into a python package, and I see a bunch of folks anymore are doing docker images as well with their packages in complicated environments. This is actually not that bad, minus geopandas makes things a bit tricky.

New preprint: A Gentle Introduction to Creating Optimal Patrol Areas

I have a new preprint posted, A Gentle Introduction to Creating Optimal Patrol Areas. Below is the abstract:

Models to create optimal patrol areas have been in existence for over 45 years, but police departments still regularly construct patrol areas in an ad-hoc fashion. This essay walks the reader through formulating an integer linear program to create a set number of patrol areas that have near equal call load and that are contiguous using simple examples. Then the technique is illustrated using a case study in Carrollton, TX. Creating optimal patrol areas not only have the potential to improve efficiency in response times, but can also encourage hot spots policing. Applications of linear programming can additionally be applied to a wide variety of problems within criminal justice agencies, and this essay provides a gentle introduction to understanding the mathematical notation of linear programming.

In this paper I introduce a very simple integer linear program to create patrol beats, and then build up the complexity into the fuller p-median problem with additional constraints applicable to making patrol areas. The constraints on making the call load equal that I introduce in the paper are the only real novel aspect of the paper (although no doubt someone else has done something similar previously), but I was a bit frustrated reading other linear programs to create patrol areas. Most work was concentrated in operations research journals and in my opinion was totally inaccessible to a typical crime analyst. So I frame the paper as an introduction to integer linear programs, walk though some simplified examples, and then apply that full model in Carrollton. I also provide an extensive walkthrough using the python program PuLP so others can replicate the work with their own data in the supplementary materials.

Here is my end example map of the optimal patrol areas in Carrollton.


You can see that my areas are not as nice and convex, although most applications of correcting for that introduce multiple objective functions and/or non-linear functions, making the problem much harder to estimate in practice (which was part of my pet-peeve with prior publications – none provided code to estimate the models described, with the exception of some of the work of Kevin Curtin).

Part of the reason I tackled this problem is that it comes up all the time on the IACA list-serve — how to make new patrol areas. If you are an analyst interested in applying this in your jurisdiction and would like help always feel free to contact me.