Estimating Criminal Lifetime Value

At work I am currently working on estimating/forecasting healthcare spending. Similar to work I have done on forecasting person level crime risks (Wheeler et al., 2019), I build the predictive model dataset like this:

CrimeYear2020 PriorCrimeA PriorCrimeB
     0              2          3
     1              5          0
     0              0          0

etc. So I flatten people to a single row, and as covariates include prior cumulative crime histories. Most people do this similarly in the healthcare setting, so it looks like:

SpendingYear2020 PriorComorbidA PriorComorbidB
     3000              1          2
      500              3          0
    10000              0          0

Or sometimes people do a longitudinal dataset, where it is a spending*year*person panel (see Lauffenburger et al., 2020 for an example). I find this approach annoying for a few reasons though. One, it requires arbitrary temporal binning, which is somewhat problematic in these transaction level databases. We are talking for chronic offenders a few crimes per year is quite high, and ditto in the healthcare setting a few procedures a year can be very costly. So there is not much data to estimate the underlying function over time.

A second aspect I think is bad is that it doesn’t take into account the recency of the patterns. So the variables on the right hand side can be very old or very new. And with transaction level databases it is somewhat difficult to define how to estimate the lookback – do you consider it normalized by time? The VOID paper I mentioned we evaluated the long term scores, but the PD that does that chronic offender system has two scores – one a cumulative history and another a 90 day history to attempt to deal with that issue (again ad-hoc).

One approach to this issue from marketing research I have seen from very similar types of transactions databases are models to estimate Customer Lifetime Value (Fader et al. 2005). These models in the end generate a dataset that looks like this:

Person    RecentMonths  TotalEvents AveragePurchase
  A            3             5            $50
  B            1             2           $100
  C            9             8            $25

TotalEvents should be straightforward, RecentMonths just is a measure of the time since the last purchase, and then you have the average value of the purchases. And using just this data, estimates the probability of any future purchases, as well as projects the total value of the future average purchases. So here I use an example of this approach, using the Wolfgang Philly cohort public data. I am not going into the model more specifically (read some of the Bruce Hardie notes to get a flavor).

I have created some python code to follow along and apply these same customer lifetime value estimates to chronic offender data. Most examples of weighting crime harm apply it to spatial areas (Mitchell, 2019; Wheeler & Reuter, 2021), but you can apply it the same to chronic offender lists (Liggins et al., 2019).

Example Criminal Lifetime Value in Python

First, install the lifetimes python library – Cam’s documentation is excellent and makes the data manipulation/modelling quite simple.

Here I load in the transaction level crime data, e.g. it just have person A, 1/5/1960, 1000, where the 1000 is a crime seriousness index created by Wolfgang. Then the lifetimes package has some simple functions to turn our data into the frequency/recency format.

Note that for these models, you drop the first event in the series. To build a model to do train/test, I also split the data into evens before 1962, and use 1962 as the holdout test period.

import lifetimes as lt
import pandas as pd

# Just the columns from dataset II
# ID, SeriousScore, Date
df = pd.read_csv('PhilData.csv')
df['Date'] = pd.to_datetime(df['Date'])

# Creating the cumulative data
# Having holdout for one year in future
sd = lt.utils.calibration_and_holdout_data(df,'ID','Date',
              calibration_period_end='12-31-1961',
              observation_period_end='12-31-1962',
              freq='M',
              monetary_value_col='SeriousScore')

# Only keeping people with 2+ events in prior period
sd = sd[sd['frequency_cal'] > 0].copy()
sd.head()

Recency_cal is how many months since a prior crime (starting in 1/1/1962), frequency is the total number of events (minus 1, so number of repeat events technically), and the monetary_value_cal here is the average of the crime seriousness across all the events. The way this function works, the variables with the subscript _cal are in the training period, and _holdout are events in the 1962 period. For subsequent models I subset out people with at least 2 events total in the modeling.

Now we can fit a model to estimate the predicted number of future crimes a person will commit – so this does not take into account the seriousness of those crimes. The final groupby statement shows the predicted number of crimes vs those actually committed, broken down by number of crimes in the training time period. You can see the model is quite well calibrated over the entire sample.

# fit BG model
bgf = lt.BetaGeoFitter(penalizer_coef=0)
bgf.fit(sd['frequency_cal'],sd['recency_cal'],sd['T_cal'])

# look at fit of BG model
t = 12
sd['pred_events'] = bgf.conditional_expected_number_of_purchases_up_to_time(t, sd['frequency_cal'], sd['recency_cal'],sd['T_cal'])
sd.groupby('frequency_cal',as_index=False)[['frequency_holdout','pred_events']].sum() # reasonable

Now we can fit a model to estimate the average crime severity score for an individual as well. Then you can project a future cumulative score for an offender (here over a horizon of 1 year), by multiple the predicted number of events times the estimate of the average severity of the events, what I label as pv here:

# See conditional seriousness
sd['pred_ser'] = ggf.conditional_expected_average_profit(
                              sd['frequency_cal'],
                              sd['monetary_value_cal'])

sd['pv'] = sd['pred_ser']*sd['pred_events']
sd['cal_tot_val'] = sd['monetary_value_holdout']*sd['frequency_holdout']
# Not great correlation, around 0.2
vc = ['frequency_holdout','monetary_value_holdout','cal_tot_val','pred_events','pv']
sd[vc].corr()

The correlation between pv and the holdout cumulative crime severity cal_tot_val, is not great at 0.26. But lets look at this relative to the more typical approach analysts will do, simply rank prior offenders based on either total number of events or the crime seriousness:

# Lets look at this method via just ranking prior
# seriousness or frequency
sd['rank_freq'] = sd['frequency_cal'].rank(method='first',ascending=True)
sd['rank_seri'] = (sd['monetary_value_cal']*sd['frequency_cal']).rank(method='first',ascending=True)
vc += ['rank_freq','rank_seri']
sd[vc].corr()[vc[-3:]]

So we can see that pv outperforms ranking based on total crimes (rank_freq), or ranking based on the cumulative serious score for offenders (rank_seri) in terms of the correlation for either the total number of future events or the cumulative crime harm.

If we look at capture rates, e.g. pretend we highlight the top 50 chronic offenders for intervention, we can see the criminal lifetime value pv estimate outperforms either simple ranking scheme by quite a bit:

# Look at capture rates by ranking
topn = 50
res_summ = []
for v in vc[-3:]:
    rank = sd[v].rank(method='first',ascending=False)
    locv = sd[rank <= topn].copy()
    tot_crimes = locv['frequency_holdout'].sum()
    tot_ser = locv['cal_tot_val'].sum()
    res_summ.append( [v,tot_crimes,tot_ser,topn] )

res_df = pd.DataFrame(res_summ,columns=['Var','TotCrimes','TotSer','TotN'])
res_df

In terms of the seriousness projection, it is reasonably well calibrated over the entire sample, but has a very tiny variance – it basically just predicts the average crime serious score over the sample and assigns that as the prediction going forward:

# Cumulative stats over sample reasonable
# variance much too small
sd[['cal_tot_val','pv']].describe()

So what this means is that if say Chicago READI wanted to do estimates to reasonably justify the max dollar cost for their program (over a large number of individuals) that would be reasonable. And this is how most marketing people use this info, average benefits of retaining a customer.

For individual projections though, e.g. I think OffenderB will generate between [low,high] crime harm in the next year, this is not quite up to par. I am hoping though to pursue these models further, maybe either in a machine learning/regression framework to estimate the parameters directly, or to use mixture models in an equivalent way that marketers use “segmentation” to identify different types of customers. Knowing the different way people have formulated models though is very helpful to be able to build a machine learning framework, which you can incorporate covariates.

References

The limit on the cost efficiency of gun violence interventions

Imagine a scenario where someone came out with technology that would 100% reduce traffic fatalities at a particular curve in a road. But, installation and maintenance of the tech would cost $36 million dollars per 100 feet per year. It is unlikely anyone would invest in such technology – perhaps if you had a very short stretch of road that resulted in a fatality on average once a month it would be worth it. In that case, the tech would result in $36/12 = $3 million dollars to ‘save a life’.

There are unlikely any stretches of roads that have this high of fatality rate though (and this does not consider potential opportunity costs of less effective but cheaper other interventions). So if we had a location that has a fatality once a year, we are then paying $36 million dollars to save one life. We ultimately have upper limits on what society will pay to save a life.

Working on gun violence prevention is very similar. While gun violence has potentially very large costs to society, see Everytown’s estimates of $50k to a nonfatal shooting and $270k for a fatality, preventing that gun violence is another matter.

The translation to gun violence interventions from the traffic scenario is ‘we don’t have people at super high risk of gun violence’ and ‘the interventions are not going to be 100% effective’.

My motivation to write this post is the READI intervention in Chicago, which has a price tag of around $60k per participant per 20 months. What makes this program then ‘worth it’ is the probability of entrants being involved with gun violence multiplied by the efficacy of the program.

Based on other work I have done on predicting gun violence (Wheeler et al., 2019b), I guesstimate that any gun violence predictive instrument spread over a large number of individuals will have at best positive predictive probabilities of 10% over a year. 10% risk of being involved in gun violence is incredibly high, a typical person will have something more on the order of 0.01% to 0.001% risk of being involved with gun violence. So what this means is if you have a group of 100 high risk people, I would expect ~10 of them to be involved in a shooting (either as a victim or offender).

This lines up almost perfectly with READI, which in the control group had 10% shot over 20 months. So I think READI actual did a very good job of referring high risk individuals to the program. I don’t think they could do any better of a job in referring even higher risk people.

This though implies that even with 100% efficacy (i.e. anyone who is in READI goes to 0% risk of involvement in gun violence), you need to treat ~10 people to prevent ~1 shooting victimization. 100% efficacy is not realistic, so lets go with 50% efficacy (which would still be really good for a crime prevention program, and is probably way optimistic given the null results). Subsequently this implies you need to treat ~20 people to prevent ~1 shooting. This results in a price tag of $1.2 million to prevent 1 shooting victimization. If we only count the price of proximal gun violence (as per the Everytown estimates earlier), READI is already cost-inefficient from the get go – a 100% efficacy you would still need around 10 people (so $600k) to reduce a single shooting.

The Chicago Crime Lab uses estimates from Cohen & Piquero (2009) to say that READI has a return on investment of 3:1, so per $60k saves around $180. These however count reductions over the life-course, including person lost productivity, not just state/victim costs, which I think are likely to be quite optimistic for ROI that people care about. (Productivity estimates always seem suspect to me, models I have put into production in my career have generated over 8 digits of revenue, but if I did not do that work someone else would have. I am replaceable.)

I think it is likely one can identify other, more cost effective programs to reduce gun violence compared to READI. READI has several components, part of which is a caseworker, cognitive behavior therapy (CBT), and a jobs program. I do not know cost breakdowns for each, but it may be some parts drive up the price without much benefit over the others.

I am not as much on the CBT bandwagon as others (I think it looks quite a bit like the other pysch research that has come into question more recently), but I think caseworkers are a good idea. The police department I worked with on the VOID paper had caseworkers as part of their intervention, as did focused deterrence programs I have been involved with (Wheeler et al., 2019a). Wes Skogan even discussed how caseworkers were part of Chicago CEASEFIRE/outreach workers on Jerry Ratcliffe’s podcast. For those not familiar, case workers are just social workers assigned to these high risk individuals, and they often help their charges with things like getting an ID/Drivers License and applying to jobs. So just an intervention of caseworkers assigned to high risk people I think is called for.

You may think many of these high risk individuals are not amenable to treatment, but my experience is a non-trivial number of them are willing to sit down and try to straighten their lives out, and they need help to do that it. Those are people case workers are a good potential solution.

Although I am a proponent of hot spots policing as well, if we are just talking about shootings, I don’t think hot spots will have a good return on investment either (Drake et al., 2022). Only if you widen the net to other crimes do a think hot spots makes sense (Wheeler & Reuter, 2021). And maybe here I am being too harsh, if you reduce other criminal behavior READIs cost-benefit ratio likely looks better. But just considering gun violence, I think dropping $60k per person is never going to be worth it in realistic high gun violence risk populations.

References

New preprint: The accuracy of the violent offender identification directive (VOID) tool to predict future gun violence

I have a new preprint out, The accuracy of the violent offender identification directive (VOID) tool to predict future gun violence. This is work with Rob Worden and Jasmine Silver from our time at the Finn Institute. Below is the abstract:

We evaluate the Violent Offender Identification Directive (VOID) tool, a risk assessment instrument implemented within a police department to prospectively identify offenders likely to be involved with future gun violence. The tool uses a variety of static measures of prior criminal history that are readily available in police records management systems. The VOID tool is assessed for predictive accuracy by taking a historical sample and calculating scores for over 200,000 individuals known to the police at the end of 2012, and predicting 103 individuals involved with gun violence (either as a shooter or a victim) during 2013. Despite weights for the instrument being determined in an ad-hoc manner by crime analysts, the VOID tool does very well in predicting involvement with gun violence compared to an optimized logistic regression and generalized boosted models. We discuss theoretical reasons why such ad-hoc instruments are likely to perform well in identifying chronic offenders for all police departments.

There were just slightly over 100 violent gun offenders we were trying to pick out of over 200,000. The VOID tool did really well! Here is a graph comparing how many of those offenders VOID captured compared to a generalized boosted model (GBM), and two different logistic regression equations.

I have some of my thoughts in this article as to why a simple tool does just as well as more complicated regression and machine learning techniques, which is a common finding in recidivism studies as well. My elevator pitch for why that is is because most offenders are generalists, and for example you can basically swap prior arrests for robbery with prior arrests for motor vehicle theft — they both provide essentially the same signal for future potential criminality. See also discussion of this on Dan Simpson’s post on the Stat Modeling, Causal Inference and Social Science blog, which in turn makes me think the idea behind simple models can be readily applied to many decision points in the criminal justice field.

The simple takeaway from this for crime analysts making chronic offender lists is that don’t let the perfect be the enemy of the good. Analysts can likely create an ad-hoc weighting to prioritize chronic offenders and it will do quite well compared to fancier models.

I will be presenting this work at the ACJS conference in New Orleans on Saturday 2/17/18. It is a great session, with YongJei Lee, Jerry Ratcliffe, Bryanna Fox, and Stacy Sechrist (see session 384 in the ACJS program), so stop on by. If you want to catch up with me in New Orleans just send me an email. And as always if you have feedback on the draft I am all ears.