Youtube interview with Manny San Pedro on Crime Analysis and Data Science

I recently did an interview with Manny San Pedro on his YouTube channel, All About Analysis. We discuss various data science projects I conducted while either working as an analyst, or in a researcher/collaborator capacity with different police departments:

Here is an annotated breakdown of the discussion, as well as links to various resources I mention in the interview. This is not a replacement for watching the video, but it is an easier set of notes that links to more material on the particular items I discuss.

0:00 – 1:40, Intro

For a rundown of my career: I did my PhD in Albany (2008-2015). During that time I worked as a crime analyst in Troy, NY, as well as a research analyst for my advisor (Rob Worden) at the Finn Institute. My research focused on quantitative projects with police departments (predictive modeling and operations research). In 2019 I went to the private sector, and I now work as an end-to-end data scientist in the healthcare sector, working with insurance claims.

You can check out my academic and my data science CV on my about page.

I discuss the workshop I did at the IACA conference in 2017 on temporal analysis in Excel.

Long story short, don’t use percent change, use other metrics and line graphs.

7:30 – 13:10, Patrol Beat Optimization

I have the paper and code available to replicate my work with Carrollton PD on patrol beat optimization with workload equality constraints.

For analysts looking to teach themselves linear programming, I suggest Hillier’s book. I also give examples of linear programming on this blog.

It is different from statistical analysis, but I believe it has as much applicability to crime analysis as more typical statistical analysis.
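To give a flavor of what the optimization looks like, here is a minimal sketch in Python with PuLP of the assignment side of beat design: assign reporting areas to beat centers, minimizing workload-weighted distance, while keeping each beat’s workload within bounds. All of the inputs are toy numbers, and this omits the contiguity constraints used in the actual paper.

```python
# Minimal sketch of assigning areas to beats with workload balance constraints.
# Toy inputs; the real problem also includes contiguity constraints.
import pulp

areas = [0, 1, 2, 3, 4, 5]
work = {0: 10, 1: 40, 2: 25, 3: 30, 4: 15, 5: 20}   # e.g. calls for service per area
centers = [1, 4]                                     # candidate beat centers
dist = {(a, c): abs(a - c) for a in areas for c in centers}  # toy travel distances

total = sum(work.values())
lo, hi = 0.4 * total, 0.6 * total   # each beat gets 40% to 60% of total workload

prob = pulp.LpProblem("beat_design", pulp.LpMinimize)
x = pulp.LpVariable.dicts("assign", list(dist.keys()), cat="Binary")

# objective: minimize workload-weighted travel distance
prob += pulp.lpSum(work[a] * dist[a, c] * x[a, c] for (a, c) in dist)

# each area is assigned to exactly one beat
for a in areas:
    prob += pulp.lpSum(x[a, c] for c in centers) == 1

# workload equality (balance) constraints for each beat
for c in centers:
    beat_work = pulp.lpSum(work[a] * x[a, c] for a in areas)
    prob += beat_work >= lo
    prob += beat_work <= hi

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print({a: c for (a, c) in dist if x[a, c].value() == 1})
```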

13:10 – 14:15, Million Dollar Hotspots

There are hotspots of crime so concentrated that the expected labor cost reduction from having an officer assigned full time likely offsets the cost of the position. E.g. if you spend a million dollars in labor addressing crime at that location, and having a full-time officer there reduces crime by 20%, the $200k in saved labor costs roughly breaks even with paying that officer’s salary.

I call these Million dollar hotspots.
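To make that break-even arithmetic explicit, here is the calculation as a tiny Python snippet; the officer cost is an illustrative assumption, the other numbers follow the example above.

```python
# Back-of-the-envelope break-even check for a "million dollar hotspot".
# Numbers are illustrative, following the logic in the text above.
labor_cost_at_location = 1_000_000  # annual labor spent responding to crime there
crime_reduction = 0.20              # assumed effect of a full-time assigned officer
officer_cost = 200_000              # assumed fully loaded annual cost of one officer

expected_savings = labor_cost_at_location * crime_reduction
print(expected_savings, expected_savings >= officer_cost)  # 200000.0 True
```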

14:15 – 28:25, Prioritizing individuals in a group violence intervention

Here I discuss my work on social network algorithms to prioritize individuals to spread the message in a focused deterrence intervention. This is the opposite of how many people view “spreading” in a network: I identify something good I want to spread, and seed the network in a way that optimizes that spread.
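As a toy illustration of the seeding idea (not the exact algorithm from the paper), here is a greedy sketch in Python/networkx that picks seed individuals whose immediate ties cover the most not-yet-reached people; the karate club graph just stands in for a co-offending network built from administrative data.

```python
# Greedy seeding sketch: pick people whose direct ties reach the most
# not-yet-covered individuals. Toy graph stands in for a co-offending network.
import networkx as nx

G = nx.karate_club_graph()
n_seeds = 3

covered, seeds = set(), []
for _ in range(n_seeds):
    best = max(
        (n for n in G.nodes if n not in seeds),
        key=lambda n: len((set(G.neighbors(n)) | {n}) - covered),
    )
    seeds.append(best)
    covered |= set(G.neighbors(best)) | {best}

print(seeds, f"{len(covered)} of {G.number_of_nodes()} people reached directly")
```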

I also have a primer on SNA, which discusses how crime analysts typically define nodes and edges using administrative data.

Listen to the interview as I discuss more general advice – in SNA, what you want to accomplish in the end matters for how you define the network. So I discuss how you may want to define edges via victimization to prevent retaliatory violence (I think that would make sense for violence interrupters to be proactive, for example).

I also give an example of how detective case allocation may make sense to base on SNA – detectives have background with an individual’s network (e.g. a rapport with a family based on prior cases worked).

28:25 – 33:15, Be proactive as an analyst and learn to code

Here Manny asked how analysts can prevent their role from being turned into a more administrative role (just getting requests and running simple reports). I think the solution to this (not just in crime analysis, but also being an analyst in the private sector) is to be proactive. You shouldn’t wait for someone to ask you for specific information; you need to define your own role and conduct analysis on your own.

He also asked about crime analysis being under-used in policing. I think being stronger at computer coding opens up so many opportunities that learning Python, R, and SQL is the area where I would most like to see stronger skills across the industry. It is also a good career investment, as it translates to private sector roles.

33:15 – 37:00, How ChatGPT can be used by crime analysts

I discuss how ChatGPT may be used by crime analysts to summarize qualitative incident data. (Check out this example by Andreas Varotsis for an example.)

To be clear, I think this is possible, but I don’t think the tech is quite up to that standard yet. Also, do not submit LEO-sensitive data to OpenAI!

Also always feel free to reach out if you want to nerd out on similar crime analysis questions!

Machine learning models and the market for lemons

The market for lemons is an economic concept in which buyers of a good cannot distinguish between quality products and poor products (the lemons). This lack of knowledge makes it so that people selling lemons can always underbid people with higher quality products. In the long run, all quality vendors are driven out, and only cheap lemon sellers remain.

I believe people selling predictive models (or machine learning models, or forecasting products, or artificial intelligence, to round out all the SEO terms) are highly susceptible to this. This occurs in markets in which the predictive models cannot be easily evaluated.

What reminded me of this is I recently saw a vendor saying they have the “most accurate” population health predictive models. This is a patently absurd assertion (even if you hosted a Kaggle-style competition, it would only apply to that Kaggle dataset, not support a more general claim about a particular institution’s population). But the majority of buyers (different healthcare systems) likely have no way to evaluate my company’s claims vs. this vendor’s.

ChatGPT is another recent example. Although it can generate what on its face look like “quality” answers, don’t use it to diagnose your illnesses. ChatGPT is very impressive at generating grammatically correct responses, so to a layman it may appear to be high quality, but it is really very superficial in most domains (no different than using Google searches to do anything complicated, which can be useful sometimes but is very superficial).

So what is the solution? From a consumer perspective, here is my advice: ask the vendor to do a demonstration on your own data. So you ask the vendor, “here is my data for 2019, can you forecast the 2020 data?”, or something along those lines where you provide a training set and a test set. Then you have the vendor generate predictions for the test set, and you do the evaluation yourself to see if their predictions are worth the cost of the product.
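As a sketch of what that self-evaluation could look like (file and column names here are hypothetical placeholders), you just join the vendor’s hold-out predictions to what actually happened and compare against a naive baseline:

```python
# Sketch of scoring a vendor's hold-out predictions against your own actuals.
# File and column names are hypothetical placeholders.
import pandas as pd
from sklearn.metrics import mean_absolute_error

actual = pd.read_csv("my_2020_actuals.csv")            # unit_id, actual, last_year_value
vendor = pd.read_csv("vendor_2020_predictions.csv")    # unit_id, prediction
df = actual.merge(vendor, on="unit_id")

vendor_mae = mean_absolute_error(df["actual"], df["prediction"])
naive_mae = mean_absolute_error(df["actual"], df["last_year_value"])  # last year = this year
print(f"vendor MAE {vendor_mae:.2f} vs naive baseline MAE {naive_mae:.2f}")
```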

This is a situation in which academic peer review has some value as well (as well as data competitions). You can see that the method a particular group used was validated by its peers, but ultimately the local tests on your own data will be needed. Even if my recidivism model is accurate for Georgia, it won’t necessarily generalize to your state.

If you are in a situation in which you do not have data to validate the results in the end, you need to rely on outside experts and understanding the methodology used to generate the estimates. A good example of this is people selling aggregate crime data (that literally make numbers up). I have slated a blog post about that in the near future to go into more detail, but in short there is no legitimate seller of second hand crime data in the US currently.

If you are interested in building or evaluating predictive models, please get in touch with my consulting services. While I say that markets for lemons can drive prices down, I still see quite a few ridiculous SaaS prices, like $900k for a black box, unevaluated early intervention system for police.

At least so far many of these firms are using the Joel Spolsky 6 figure sales approach for crappy products. My consulting firm can easily beat a six digit price tag, so the lemons have not driven me out yet.

The limit on the cost efficiency of gun violence interventions

Imagine a scenario where someone came out with technology that would reduce traffic fatalities at a particular curve in a road by 100%. But installation and maintenance of the tech would cost $36 million per 100 feet per year. It is unlikely anyone would invest in such technology – perhaps if you had a very short stretch of road that resulted in a fatality on average once a month it would be worth it. In that case, the tech would cost $36/12 = $3 million to ‘save a life’.

There are unlikely to be any stretches of road with that high a fatality rate though (and this does not consider the opportunity costs of less effective but cheaper interventions). So if we had a location with a fatality once a year, we are then paying $36 million to save one life. We ultimately have upper limits on what society will pay to save a life.

Working on gun violence prevention is very similar. While gun violence has potentially very large costs to society (see Everytown’s estimates of $50k for a nonfatal shooting and $270k for a fatality), preventing that gun violence is another matter.

The translation to gun violence interventions from the traffic scenario is ‘we don’t have people at super high risk of gun violence’ and ‘the interventions are not going to be 100% effective’.

My motivation to write this post is the READI intervention in Chicago, which has a price tag of around $60k per participant per 20 months. What makes this program then ‘worth it’ is the probability of entrants being involved with gun violence multiplied by the efficacy of the program.

Based on other work I have done on predicting gun violence (Wheeler et al., 2019b), I guesstimate that any gun violence predictive instrument spread over a large number of individuals will have at best positive predictive probabilities of 10% over a year. A 10% risk of being involved in gun violence is incredibly high; a typical person has something more on the order of 0.01% to 0.001% risk of being involved with gun violence. So what this means is if you have a group of 100 high risk people, I would expect ~10 of them to be involved in a shooting (either as a victim or offender).

This lines up almost perfectly with READI, where 10% of the control group were shot over 20 months. So I think READI actually did a very good job of referring high risk individuals to the program. I don’t think they could have done any better a job of referring even higher risk people.

This though implies that even with 100% efficacy (i.e. anyone who is in READI goes to 0% risk of involvement in gun violence), you need to treat ~10 people to prevent ~1 shooting victimization. 100% efficacy is not realistic, so let’s go with 50% efficacy (which would still be really good for a crime prevention program, and is probably way optimistic given the null results). Subsequently this implies you need to treat ~20 people to prevent ~1 shooting. This results in a price tag of $1.2 million to prevent 1 shooting victimization. If we only count the cost of proximal gun violence (as per the Everytown estimates earlier), READI is already cost-inefficient from the get go – at 100% efficacy you would still need around 10 people (so $600k) to prevent a single shooting.
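To make the arithmetic explicit, here is the same calculation as a small function, using the numbers from the text; both the risk and efficacy inputs are assumptions you can swap out.

```python
# Cost per shooting prevented = cost per participant / (base risk * efficacy)
def cost_per_shooting_prevented(cost_per_participant, base_risk, efficacy):
    prevented_per_person = base_risk * efficacy  # expected shootings averted per participant
    return cost_per_participant / prevented_per_person

print(cost_per_shooting_prevented(60_000, 0.10, 1.0))  # 600,000 at 100% efficacy
print(cost_per_shooting_prevented(60_000, 0.10, 0.5))  # 1,200,000 at 50% efficacy
```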

The Chicago Crime Lab uses estimates from Cohen & Piquero (2009) to say that READI has a return on investment of 3:1, so each $60k spent saves around $180k. These estimates however count reductions over the life-course, including a person’s lost productivity, not just state/victim costs, which I think makes them quite optimistic for the ROI people actually care about. (Productivity estimates always seem suspect to me; models I have put into production in my career have generated over 8 digits of revenue, but if I did not do that work someone else would have. I am replaceable.)

I think it is likely one can identify other, more cost effective programs to reduce gun violence compared to READI. READI has several components, part of which is a caseworker, cognitive behavior therapy (CBT), and a jobs program. I do not know cost breakdowns for each, but it may be some parts drive up the price without much benefit over the others.

I am not as much on the CBT bandwagon as others (I think it looks quite a bit like the other psych research that has come into question more recently), but I think caseworkers are a good idea. The police department I worked with on the VOID paper had caseworkers as part of their intervention, as did focused deterrence programs I have been involved with (Wheeler et al., 2019a). Wes Skogan even discussed how caseworkers/outreach workers were part of Chicago CEASEFIRE on Jerry Ratcliffe’s podcast. For those not familiar, caseworkers are just social workers assigned to these high risk individuals, and they often help their charges with things like getting an ID/driver’s license and applying to jobs. So just an intervention of caseworkers assigned to high risk people I think is called for.

You may think many of these high risk individuals are not amenable to treatment, but my experience is that a non-trivial number of them are willing to sit down and try to straighten their lives out, and they need help to do that. For those people, caseworkers are a good potential solution.

Although I am a proponent of hot spots policing as well, if we are just talking about shootings, I don’t think hot spots will have a good return on investment either (Drake et al., 2022). Only if you widen the net to other crimes do I think hot spots make sense (Wheeler & Reuter, 2021). And maybe here I am being too harsh; if you reduce other criminal behavior, READI’s cost-benefit ratio likely looks better. But just considering gun violence, I think dropping $60k per person is never going to be worth it in realistic high gun violence risk populations.


Recent Papers on Hot Spots of Crime in Dallas

So I have two different papers that were published recently. Both are on hot spots in Dallas, so might as well discuss them together.

For each I have posted the code to replicate the results (and that spreadsheet has links to preprints as well).

For each, as a bit of background on the motivation for the projects: Dallas has had official hot spots, named TAAG (Target Area Action Grid) areas. These were clearly larger than what would be considered best practice in identifying hot spots (they were more like entire neighborhoods). I realize ‘best practices’ is a bit wishy-washy, but the TAAG areas ended up covering around 20% of the city (a smidge over 65 square miles). Here is a map of the 2017 areas. There were 54 TAAG areas, so on average each covered a little over 1 square mile.

Additionally I knew the Dallas police department was interested in purchasing the RTM software to do hot spots. And a separate group, the Dallas Crime Task Force was interested in using the software as well for non-police related interventions.

So I did these projects on my own (with my colleagues Wouter and Sydney of course). It wasn’t paid work for any of these groups (I asked DPD if they were interested, and had shared my results with folks from CPAL before that task force report came out, but nothing much came of it unfortunately). But my results for Dallas data are very likely to generalize to other places, so hopefully they will be helpful to others.

Machine Learning to Predict and Understand Hot Spots

So I see the appeal for folks who want to use RTM. It is well validated in both theory and practice, and Joel has made a nice software as a service app. But I knew going in that I could likely improve upon the predictions compared to RTM.

RTM tries to find a middle ground between prediction and causality (which isn’t a critique; it is sort of what we are all doing). RTM in the end spits out predictions like “Within 800 feet of a Subway Entrance is Risk Factor 1” and “The Density of Bars within 500 Feet is Risk Factor 2”. So it prefers simple models that have prognostic value for PDs (or other agencies) to identify potential causal reasons why a location is high crime. That subsequently helps not only identify where hot spots are, but also frame the potential interventions an agency may be interested in.

But this simplicity has a few drawbacks. One is that it is a global model, e.g. “within 800 feet of a subway entrance” applies to all subway entrances in the city. Most crime generators have a distribution such that most subway entrances are relatively safe and only a few end up being high crime (for an example). Another is that it forces the way different crime generators predict crime to be a series of step functions, e.g. “within 600 ft” or “a high density within 1000 ft”. In reality, most geographic processes follow a distance decay function. E.g. if you are looking at the relationship between check-cashing stores and street robbery, there are likely to be more robberies very near the store, and they tail off gradually the further away you get.

So I fit a more complicated random forest model that has neither of those limitations and can learn much more complicated functions, both in terms of distance to crime generators as well as spatially varying over the city. But because of that you don’t get the simple model interpretation – they are fundamentally conflicting goals. In terms of predictions either my machine learning model or a simpler comparison of using prior crime = future crime greatly outperforms RTM for several different predictive metrics.
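For a rough sense of the setup (not the exact features or tuning from the paper; the file and column names here are hypothetical), the model looks something like this:

```python
# Sketch: predict future crime per grid cell from distances to crime generators
# plus coordinates and prior crime, letting the forest learn decay shapes and
# spatially varying effects. Feature/file names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

cells = pd.read_csv("grid_cells.csv")
features = ["dist_bar", "dist_checkcash", "dist_subway",
            "x_coord", "y_coord", "prior_crime"]

rf = RandomForestRegressor(n_estimators=500, min_samples_leaf=20, random_state=0)
rf.fit(cells[features], cells["future_crime"])
cells["pred"] = rf.predict(cells[features])
```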

So this shows the predictions are better than RTM no matter how you slice the hot spot areas, but again you lose the prognostic value of RTM. To replace that, I show local interpretability scores for hot spots. I have an online map here for an example. If you click on one of the high crime predicted areas, it gives you a local breakdown of the different variables that contribute to the risk score.

So it is still more complicated than RTM, but gets you a local set of factors that potentially contribute to why places are hot spots. (It is still superficial in terms of causality, but PDs aren’t going to be able to get really well identified causal relationships for these types of predictions.)

Return on Investment for Hot Spots Policing

The second part of this is that Dallas is no doubt in a tight economic bind. And this was even before all the recent discussion about reforming police budgets. Policing academics have been saying for years that PDs should shift many more resources from reactive to proactive policing. But how do you make the argument that it is in a police department’s best interest to shift resources or invest in additional resources?

To do this I aimed to calculate a return on investment on investing in hot spots policing. Priscilla Hunt (from RAND) recently came up with labor cost estimates for crime specifically relevant for police departments. So if an aggravated assault happens PDs (in Texas) typically spend around $8k in labor costs to respond to the crime and investigate (it is $125k for a homicide). So based on this, you can say, if I can prevent 10 agg assaults, I then save $80k in labor costs. I use this logic to estimate a return on investment for PDs to do hot spots policing.

So first I generate hot spots, weighting for the costs of those crimes. Here is an interactive map to check them out, and below is a screenshot of the map.

I have an example of then calculating a return on investment for the hot spot area that captured the most crime. I get this estimate by transforming meta-analysis estimates of hot spots policing, estimating an average crime reduction, and then backing out how much labor costs that would save a police department. So in this hot spot, an ROI for hot spots policing (for 1.5 years) is $350k.
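A simplified version of that back-out calculation looks like the sketch below; the 20% reduction and the per-crime labor costs other than aggravated assault are placeholder assumptions, not the paper’s exact inputs.

```python
# Value prevented crimes in a hot spot using per-crime labor cost estimates.
# The reduction and most cost figures are placeholder assumptions.
baseline_counts = {"agg_assault": 30, "robbery": 20, "burglary": 40}   # 1.5 yr counts
labor_cost = {"agg_assault": 8_000, "robbery": 6_000, "burglary": 2_000}
avg_reduction = 0.20   # assumed average effect of hot spots policing

savings = sum(baseline_counts[c] * avg_reduction * labor_cost[c] for c in baseline_counts)
print(f"Estimated labor-cost savings: ${savings:,.0f}")
```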

That return would justify at least one (probably more like two) full time officers just to be assigned to that specific hot spot. So if you actually hire more officers, it will be around net-zero in terms of labor costs. If you shift around current officers it should be a net gain in labor resources for the PD.

Most of the hot spots I identify in the study, if you do this ROI calculation, likely aren’t hot enough to justify hot spots policing from this perspective (these would probably never justify the intensive overtime typical of crackdown-like interventions). But a few clearly are, and they definitely should be the targets of some type of hot spot intervention.

Setting the threshold for bail decisions

I am at it again on the bail reform stuff. So we have critics of algorithms used in place of cash bail saying these bail reforms are letting too many people out (see Chicago and NYC). We also have folks on the other side saying such systems are punishing too many people (the Philly piece is more about probation, but the critique applies the same to pre-trial algorithms). So how can risk assessment algorithms be both too harsh and too lenient at the same time? The answer is that how harsh or lenient they are is not intrinsic to the algorithm itself; people can choose the threshold to be either.

At its most basic, risk assessment algorithms provide an estimate of future risk over some specific time horizon. For example, after being scored on a pre-trial risk assessment, the algorithm may say an individual has a 30% probability of committing a crime within the next 3 months. Is 30% low risk, and so they should be released? Or is 30% high risk? The majority of folks do not go on to commit a new offense while awaiting trial, so 30% may actually be quite high relative to most folks being assessed. So how exactly do you decide the threshold – at what % is the risk too high, such that they should be detained pre-trial?

For an individual cost-benefit analysis, you weigh the costs of pre-trial detention (the physical costs to detain a person for a specific period of time, as well as the negative externalities of detention for individuals) vs. the cost to society if a future crime occurs. So for the 30% mark, say the average crime cost to society is $10,000 (this ignores that violent crimes cost more than property crimes; in practice you can get individual probability estimates for each and combine them). The expected cost if we let this person go would then be $10,000*0.3 = $3,000. Whether that $3k risk is worth pre-trial detention depends on how exactly we valuate detention. Say we have on average 90 days of pre-trial detention, and the cost is something like $200 in up-front fixed costs plus $50 per day. We would then have a cost to detain this person of $200 + $50*90 = $4,700. From that individual perspective, it is not worth it to detain that person pre-trial. This is a simplified example, e.g. it ignores any negative externality costs of detaining (e.g. lost wages for individuals detained), but setting the threshold risk for detaining vs. releasing on recognizance (ROR) should look something like this.
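That individual-level comparison is just a couple of lines of arithmetic; here is a minimal sketch using the numbers from this example.

```python
# Individual detain vs. release cost comparison from the example above.
p_crime = 0.30                    # predicted probability of a new offense pre-trial
crime_cost = 10_000               # average cost of a crime to society
days = 90                         # average days detained pre-trial
detain_cost = 200 + 50 * days     # fixed cost plus per-day cost = 4,700

expected_cost_if_released = p_crime * crime_cost   # 3,000
print("detain" if expected_cost_if_released > detain_cost else "release")
```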

One of the things I like about this metric is that it places at the forefront how another aspect of bail reform – the time to trial – impacts costs. So if you reduce the time to trial, for those ROR’d you reduce the time horizon of risk. (Those only out 1 month have less time to recidivate than those that have to wait 6 months.) It also reduces the cost of detaining individuals pre-trial (it costs less for both the state and the individual to be in jail 1 month vs. 6 months). It may be that, to make everyone happy, we should reduce the queue so we can get people to trial faster. (But in the short term I think risk assessment tools have a great upside to both increase public safety and reduce imprisonment costs.)

Evaluating Overall Crime Effects

Part of the reason folks are not happy with the current bail reforms is that they think it is increasing overall crime (see this example in Dallas). Here is an example graph though that folks doing these assessments should be providing, both in terms of up-front establishing the threshold for risk, as well as evaluating the efficacy of the risk assessment tool in practice.

I will use the data from my prior posts on false positives to illustrate. For the graph, you place the cumulative number of people detained on the X axis, and on the Y axis the expected number of crimes your model thinks will be committed by the folks who are ROR’d. So a simplified table may be:

Person %crime
   A     0.5
   B     0.4
   C     0.3
   D     0.2
   E     0.1

If we let all of these folks go, we would expect them to commit a total of 1.5 crimes (the sum of the predicted crime column), as forecasted by our risk assessment algorithm. If we detained just person A, we have 1 in the detained column, and then a cumulative risk for the remaining folks of 1.0 (the sum of the predicted crime column for those who are not detained). So we go from the above table to the table below by increasing the number of folks detained one by one.

Detained ExpectedCrimes
   0          1.5
   1          1.0
   2          0.6
   3          0.3
   4          0.1
   5          0
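Here is a minimal sketch of building that table from a vector of predicted probabilities, sorted from highest to lowest risk.

```python
# Expected crimes among those still released, after detaining the top k riskiest.
import numpy as np

p = np.sort(np.array([0.5, 0.4, 0.3, 0.2, 0.1]))[::-1]  # predicted risks, high to low

csum = np.cumsum(np.insert(p, 0, 0.0))   # expected crimes among the detained
remaining = csum[-1] - csum              # expected crimes among those released
for detained, expected in enumerate(remaining):
    print(detained, round(float(expected), 1))   # 0 1.5, 1 1.0, 2 0.6, 3 0.3, 4 0.1, 5 0.0
```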

Here is what that graph looks like using the ProPublica data, applying this strategy to the just under 3,000 cases in my test set from the prior blog post. You can see that if we decided to detain no one, we would expect a total of 1,200 extra crimes. And this curve decreases as you detain more people. So you may say I don’t want more than 200 crimes, for which you would need to detain 1,500 people in this example (which happens to result in a risk threshold of 36% in this sample).

Using historical data, this is good for establishing the overall amount of crime risk you expect to occur from a particular set of bail reform decisions. To apply it to future threshold decision making, you need to assume the total number of people arrested as well as the risk distribution stay the same as in the past (the latter I don’t think is a big issue, and for the former you should be able to make reasonable projections if it is not constant). But this sets up the hypothetical: OK, if we release this many more people ROR, we expect this many more crimes to occur, as an up-front expectation of bail reform. It may be that even if the individual cost-benefit calculation above says release, this would result in a total number of extra crimes folks deem unacceptable when applying that decision to everyone. So we can set the threshold to say we only want 10 extra crimes to happen because of bail reform, or 50 extra, or 100 extra, etc. This example again aggregates all crime together, but you can do the same thing for every individual crime outcome you are interested in.

After the assessment is in place, this is actually monitoring folks should be doing anyway to ensure the model is working as expected. That is, you get an estimate of the number of crimes you think folks who are released would commit per your risk assessment model. If you predict more/fewer than what you observe among those released, your model is not well calibrated and needs to be updated. In practice you can’t just estimate a predictive model once and then use it forever; you need to constantly monitor whether it is still working well in real life. (Actually I showed in my prior blog post that the model was not very good, so this is a pretty big over-estimate of the number of crimes in this sample.)

This should simultaneously quell complaints that bail reform is causing too many crimes. The lack of this information is causing folks to backlash against these predictive algorithms (although I suspect they do better than human judges, so I suspect they can reduce crime overall if used wisely). Offhand, I’m skeptical the recent crime increases in Philly, NYC, and Dallas are tied to these bail reform efforts (they seem too big, or too noisy up/down, to reliably pin to just this), but maybe I am underestimating how many people they are letting out and the cumulative overall risk expected from the current models in those places. On the flip-side, folks are right to question those Chicago stats; I suspect the risk algorithm should be saying that more crimes are occurring than they observed (ignoring what they should or should not be counting as recidivism).

I’d note these metrics I am suggesting here should be pretty simple to produce in practice. It is administrative data that is already collected and should be available in various electronic databases. So in practice I don’t know why this data is not readily available in various jurisdictions.

What about False Positives?

One thing you may notice is that in my prior cost-benefit analysis I didn’t take into consideration false positives. Although my prior post details how you would set this, there is a fundamental problem with monitoring false positives (those detained but who would not go on to recidivate) in practice. In practice, you can’t observe this value (you can only estimate it from historical data). Once you detain an individual, by construction they aren’t given the chance to recidivate. So you don’t get any on-policy feedback about false-positives, only false-negatives (folks who were released and went on to commit a crime pre-trial).

This I think puts a pretty big nail in the coffin of using false positive rates as a policy goal for bail reform in practice. Like I said earlier, you can’t just set a model once and expect it to work forever in the future. But, I actually don’t think that should be considered in the cost-benefit calculus anyway. So traditionally people tend to think of setting the threshold for predictive models like this confusion table, where different outcomes in the table have different costs to individuals and to society:

In this table, those on the bottom row are those detained pre-trial. So in the hypothetical, you may say that if we could somehow know the false positives, we should calculate the extra harm that pre-trial detention causes to those individuals (lost wages, losing a job, health harms, etc.). But for the folks who would have gone on to recidivate, we should just calculate the bare bones cost of detainment.

I think this is the wrong way to think about it though. Those harms are basically across the board for everyone – even if the person was likely to recidivate they still bear those subsequent harms of being incarcerated. Whether you think people deserve the harm does not make it go away.

The main reason I am harping on bail reform so much (folks who know my work will realize it is outside my specific research area) is that the current bail system is grossly inefficient and inequitable. Piling monetary bail costs onto these folks is exactly the wrong way to ensure safety and to promote better outcomes for them.

It is a hard decision to make on who to detain vs. who to let go. But pretending that the current state (judges making these decisions on whatever personal whims they have) leaves us better off than using a cost-benefit approach and algorithmic assessments is just sticking your head in the sand.

Plotting Predictive Crime Curves

Writing some notes on how to evaluate crime prediction models has been on the bucket list for a bit. A recent paper on knife homicides in London is a good motivating use case. In short, when you have continuous model predictions, there are a few different graphs I would typically like to see in place of accuracy tables.

The linked paper does not provide data, so what I do for a similar illustration is grab the lower super output area crime stats from here, and use the 08-17 data to predict homicides in 18-Feb19. I’ve posted the SPSS code I used to do the data munging and graphs here — all the stats could be done in Excel though as well (just involves sorting, cumulative sums, and division). Note this is not quite a replication of the paper, as it includes all cases in the homicide/murder minor crime category, and not just knife crime. There ends up being a total of 147 homicides/murders from 2018 through Feb-2019, so the nature of the task is very similar though, predicting a pretty rare outcome among almost 5,000 lower super output areas (4,831 to be exact).

So the first plot I like to make goes like this. Use whatever metric you want based on historical data to rank your areas; here I used assaults from 08-17. Sort the dataset in descending order based on your prediction, and then calculate the cumulative number of homicides. Then calculate two more columns: the cumulative proportion of homicides your ranking captures, and the cumulative proportion of areas.

Easier to show than to say. So for reference your data might look something like below (pretend we have 100 homicides and 1000 areas for a simpler looking table):

 PriorAssault  CurrHom CumHom PropHom PropArea
 1000          1         1      1/100    1/1000
  987          0         1      1/100    2/1000
  962          2         3      3/100    3/1000
  920          1         4      4/100    4/1000
    .          .         .       .        .
    .          .         .       .        .
    .          .         .       .        .
    0          0       100    100/100 1000/1000

You would sort on the PriorAssault column, and then calculate CumHom (cumulative homicides), PropHom (proportion of all homicides), and PropArea (proportion of all areas). Then you just plot PropArea on the X axis and PropHom on the Y axis. Here is that plot using the London data.
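Here is a minimal pandas sketch of that construction, using the toy numbers from the table above (the full London analysis is in the linked SPSS code).

```python
# Rank areas by the prior-crime metric, then compute cumulative capture columns.
import pandas as pd

df = pd.DataFrame({
    "PriorAssault": [1000, 987, 962, 920, 0],
    "CurrHom":      [1,    0,   2,   1,   0],
})
df = df.sort_values("PriorAssault", ascending=False).reset_index(drop=True)
df["CumHom"] = df["CurrHom"].cumsum()
df["PropHom"] = df["CumHom"] / df["CurrHom"].sum()
df["PropArea"] = (df.index + 1) / len(df)
print(df)
# df.plot(x="PropArea", y="PropHom") gives the cumulative capture curve
```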

Paul Ekblom suggests plotting the ROC curve, and I am too lazy now to show it, but it is very similar to the above graph. Basically you can do a weighted ROC curve (so areas with more than 1 homicide get more weight in the graph). (See Mohler and Porter, 2018 for an academic reference on this point.)

Here is the weighted ROC curve that SPSS spits out; I’ve also superimposed the predictions generated via prior homicides. You can see that prior homicides as the predictor is very near the line of equality, suggesting prior homicides do no better than a coin-flip, whereas using all prior assaults does a little better, although not great. SPSS gives the area under the curve as 0.66 with a standard error of 0.02.

Note that the prediction can be anything, it does not have to be prior crimes. It could be predictions from a regression model (like RTM), see this paper of mine for an example.

So while these do an OK job of showing the overall predictive ability of whatever metric (here they show that using assaults is better than random), it isn’t really great evidence that hot spots are the go-to strategy. Hot spots policing relies on very targeted enforcement of a small number of areas, while the ROC curve shows the entire area. If you need to patrol 1,000 LSOAs to capture enough crimes to make it worth your while, I wouldn’t call that hot spots policing anymore; it is too large.

So another graph you can do is to just plot the cumulative number of crimes you capture versus the total number of areas. Note this is based on the same information as before (rankings based on assaults), just plotting whole numbers instead of proportions. But it drives home a bit better the point that you need to go to quite a large number of areas to capture a substantive number of homicides. Here I zoom in the plot to only show the first 800 areas.

So even though the overall curve shows better than random predictive ability, it is unclear to me if a rare homicide event is effectively concentrated enough to justify hot spots policing. Better than random predictions are not necessarily good enough.

A final metric worth making note of is the Predictive Accuracy Index (PAI). The PAI is often used in evaluating forecast accuracy, see some of the work of Spencer Chainey or Grant Drawve for some examples. The PAI is simply % Crime Captured/% Area, which we have already calculated in our prior graphs. So you want a value much higher than 1.

While those cited examples again use tables with simple cut-offs, you can make a graph like this to show the PAI metric under different numbers of areas, same as the above plots.
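Continuing the toy dataframe from the earlier sketch, the PAI at each cut-off is just the ratio of the two proportions already computed, so the curve version is one more column.

```python
# PAI = proportion of crime captured / proportion of area, at each cut-off.
df["PAI"] = df["PropHom"] / df["PropArea"]
print(df[["PropArea", "PropHom", "PAI"]])
# df.plot(x="PropArea", y="PAI") traces the PAI over different numbers of areas
```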

The saw-tooth ends up looking very much like a precision-recall curve, but I haven’t sat down and figured out the equivalence between the two as of yet. It is pretty noisy, but we might have two regimes based on this — target around 30 areas for a PAI of 3-5, or target 150 areas for a PAI of 3. PAI values that low are not something to brag to your grandma about though.

There are other stats like the predictive efficiency index (PAI vs the best possible PAI) and the recapture-rate index that you could do the same types of plots with. But I don’t want to put everyone to sleep.

CAN SEBP webcast on predictive policing

I was recently interviewed for a webcast by the Canadian Society of Evidence Based Policing on Predictive Policing.

I am not directly affiliated with any software vendor, so these are my opinions as an outsider, academic, and regular consultant for police departments on quantitative problems.

I do have some academic work on predictive policing applications that folks can peruse at the moment (listed below). The first is on evaluating the accuracy of a person-based prediction tool; the second is on addressing the problem of disproportionate minority contact in spatial predictive systems.

  • Wheeler, Andrew P., Robert E. Worden, and Jasmine R. Silver. (2018) The predictive accuracy of the Violent Offender Identification Directive (VOID) tool. Conditionally accepted at Criminal Justice and Behavior. Pre-print available here.
  • Wheeler, Andrew P. (2018) Allocating police resources while limiting racial inequality. Pre-print available here.

I have some more work on predictive policing applications in the pipeline, so just follow the blog or follow me on Twitter for updates about future work.

If police departments are interested in predictive policing applications and would like to ask me some questions, always feel free to get in contact. (My personal email is listed on my CV, my academic email is just Andrew.Wheeler at utdallas.edu.)

Most of my work consulting with police departments is ad-hoc (and much of it is pro bono), so if you think I can be of help always feel free to get in touch. Either for developing predictive applications or evaluating whether they are effective at achieving the outcomes you are interested in.