Gun Buy Back Programs Probably Don’t Work

When I was still a criminology professor, I remember one day while out getting groceries receiving a cold call from a police department interested in collaborating. They asked if I could provide evidence to support their city’s plan to implement sex offender residence restrictions. While taking the call I was walking past a stand for the DARE program.

A bit of inside pool for my criminology friends, but for others these are programs that have clearly been shown to not be effective. Sex offender restrictions have no evidence they reduce crimes, and DARE has very good evidence it does not work (and some mild evidence it causes iatrogenic effects – i.e. causes increased drug use among teenagers exposed to the program).

This isn’t a critique of the PD who called me – academics just don’t do a great job of getting the word out. (And maybe we can’t effectively, maybe PDs need to have in-house people do something like the American Society of Evidence Based Policing course.)

One of the programs that is similarly popular (but sparse on evidence supporting it) is the gun buy back program. Despite little evidence that they are effective, cities still continue to support these programs. Both Durham and Raleigh recently implemented buy backs for example.


What is a gun buy back program? Police departments encourage people to turn in guns – no questions asked – and they get back money/giftcards for the firearms (often in the range of $50 to $200). The logic behind such programs is that turning in firearms prevents them from being used in subsequent crimes (or suicides). The no questions asked policy is there so that even individuals who have used the guns in a criminal manner are not deterred from turning in the weapons.

There are not any meta-analyses of these programs, but the closest thing to it, a multi-city study by Ferrazares et al. (2021), analyzing over 300 gun buy backs does not find macro, city level evidence of reduced gun crimes subsequent to buy back programs. While one can cherry pick individual studies that have some evidence of efficacy (Braga & Wintemute, 2013; Phillips et al., 2013), the way these programs are typically run in the US they are probably not effective at reducing gun crime.

Let’s go back to first principles – if we 100% knew a gun would be used in the commission of a crime, then “buying” that gun would likely be worth it. (You could say an inelastic criminal will find or maybe even purchase a new gun with the reward, Mullin (2001), so that purchase does not prevent any future crimes, but I am ignoring that here.)

We do not know for sure that any given gun will be used in the commission of a crime – but let’s try to put some guesstimates on the probability that it will be. There are actually more guns in the US than there are people. But let’s go with a low end total of 300 million guns (Braga & Wintemute, 2013). There are around half a million crimes committed with a firearm each year (Planty et al., 2013). So that gives us 500,000/300,000,000 ~ 1/600. So I would guess if you randomly confiscated 600 guns in the US, you would prevent 1 firearm crime.

This has things that may underestimate (one gun can be involved in multiple crimes, still the expected number of crimes prevented is the same), and others that overestimate (more guns, fewer violent crimes, and replacement as mentioned earlier). But I think that this estimate is ballpark reasonable – so let’s say 500-1000 guns to reduce 1 firearm crime. If we are giving out $200 gift cards per weapon returned, that means we need to drop $100k to $200k to prevent one firearm crime.
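
The back-of-the-envelope numbers above can be written out as a few lines of R. This is only a sketch of the arithmetic, using the figures quoted in the text (300 million guns, half a million firearm crimes per year, $200 per gun); the variable names are mine.

```r
# Figures taken from the text; variable names are illustrative
guns        <- 300e6   # low end estimate of guns in the US
gun_crimes  <- 500e3   # firearm crimes per year
reward      <- 200     # dollars paid per gun turned in

# Guns you would need to remove (at random) to prevent one firearm crime
guns_per_crime <- guns / gun_crimes
guns_per_crime

# Cost per crime prevented over the 500-1000 gun ballpark range
guns_range <- c(500, 1000)
cost_range <- guns_range * reward
cost_range
```

Lowering the reward to $25-$50 per gun scales the cost range down proportionally, which is why the smaller rewards look more palatable.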

Note I am saying one firearm crime (not homicide), if we were talking about preventing one homicide with $200k, that is probably worth it. That is not a real great return on investment though for the more general firearm crimes, which have costs to society typically in the lower 5 digit range.

Gun buy backs have a few things going against them though even in this calculation. First, the guns returned are not a random sample of guns. They tend to be older, long guns, and often not working (Kuhn et al., 2021). It is very likely the probability those specific guns would be used in the commission of a crime is smaller than 1/600. Second is just the pure scope of the programs, they are often just around a few hundred firearms turned in for any particular city. This is just too small a number to reasonably tell whether they are effective (and what makes the Australian case so different).

Gun buy backs are popular, and plausibly may be “worth it”. (If encouraging working hand guns (Braga & Wintemute, 2013) and the dollar rewards are more like $25-$50 the program is more palatable in my mind in terms of at least potentially being worth it from a cost/benefit perspective.) But with the way most of these studies are conducted, it is hopeless to identify any meaningful macro level crime reductions (at the city level, programs would need to be more like 20 times larger in scope to notice reductions relative to typical background variation). So I think more proven strategies, such as focused deterrence or focusing on chronic offenders, are likely better investments for cities/police departments to make instead of gun buy backs.

The limit on the cost efficiency of gun violence interventions

Imagine a scenario where someone came out with technology that would 100% reduce traffic fatalities at a particular curve in a road. But, installation and maintenance of the tech would cost $36 million dollars per 100 feet per year. It is unlikely anyone would invest in such technology – perhaps if you had a very short stretch of road that resulted in a fatality on average once a month it would be worth it. In that case, the tech would result in $36/12 = $3 million dollars to ‘save a life’.

There are unlikely to be any stretches of road that have this high of a fatality rate though (and this does not consider potential opportunity costs of less effective but cheaper other interventions). So if we had a location that has a fatality once a year, we are then paying $36 million dollars to save one life. We ultimately have upper limits on what society will pay to save a life.

Working on gun violence prevention is very similar. While gun violence has potentially very large costs to society, see Everytown’s estimates of $50k for a nonfatal shooting and $270k for a fatality, preventing that gun violence is another matter.

The translation to gun violence interventions from the traffic scenario is ‘we don’t have people at super high risk of gun violence’ and ‘the interventions are not going to be 100% effective’.

My motivation to write this post is the READI intervention in Chicago, which has a price tag of around $60k per participant per 20 months. What makes this program then ‘worth it’ is the probability of entrants being involved with gun violence multiplied by the efficacy of the program.

Based on other work I have done on predicting gun violence (Wheeler et al., 2019b), I guesstimate that any gun violence predictive instrument spread over a large number of individuals will have at best positive predictive probabilities of 10% over a year. 10% risk of being involved in gun violence is incredibly high, a typical person will have something more on the order of 0.01% to 0.001% risk of being involved with gun violence. So what this means is if you have a group of 100 high risk people, I would expect ~10 of them to be involved in a shooting (either as a victim or offender).

This lines up almost perfectly with READI, which in the control group had 10% shot over 20 months. So I think READI actually did a very good job of referring high risk individuals to the program. I don’t think they could do any better of a job in referring even higher risk people.

This though implies that even with 100% efficacy (i.e. anyone who is in READI goes to 0% risk of involvement in gun violence), you need to treat ~10 people to prevent ~1 shooting victimization. 100% efficacy is not realistic, so let’s go with 50% efficacy (which would still be really good for a crime prevention program, and is probably way optimistic given the null results). Subsequently this implies you need to treat ~20 people to prevent ~1 shooting. This results in a price tag of $1.2 million to prevent 1 shooting victimization. If we only count the price of proximal gun violence (as per the Everytown estimates earlier), READI is already cost-inefficient from the get go – at 100% efficacy you would still need to treat around 10 people (so $600k) to prevent a single shooting.
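
The same arithmetic in R, as a sketch: the number of people you need to treat to prevent one shooting is 1/(baseline risk × efficacy), and the cost per shooting prevented is that number times the per-participant cost. The inputs are the ones quoted in the text; the variable names are mine.

```r
# Inputs quoted in the text; names are illustrative
cost_per_person <- 60000        # READI cost per participant (per 20 months)
base_risk       <- 0.10         # share of high risk group shot over the period
efficacy        <- c(1.0, 0.5)  # 100% (unrealistic) and 50% (optimistic)

# Number needed to treat to prevent one shooting victimization
nnt <- 1 / (base_risk * efficacy)
nnt

# Program cost per shooting prevented
cost_per_shooting <- cost_per_person * nnt
cost_per_shooting

# Everytown cost estimates quoted above, for comparison
everytown <- c(nonfatal = 50000, fatal = 270000)
cost_per_shooting[1] > everytown   # even at 100% efficacy the cost exceeds both
```

Swapping in a different baseline risk or efficacy shows how quickly the cost per shooting grows once either number drops.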

The Chicago Crime Lab uses estimates from Cohen & Piquero (2009) to say that READI has a return on investment of 3:1, so each $60k spent saves around $180k. These however count reductions over the life-course, including a person’s lost productivity, not just state/victim costs, which I think are likely to be quite optimistic for the ROI that people care about. (Productivity estimates always seem suspect to me, models I have put into production in my career have generated over 8 digits of revenue, but if I did not do that work someone else would have. I am replaceable.)

I think it is likely one can identify other, more cost effective programs to reduce gun violence compared to READI. READI has several components, part of which is a caseworker, cognitive behavior therapy (CBT), and a jobs program. I do not know cost breakdowns for each, but it may be some parts drive up the price without much benefit over the others.

I am not as much on the CBT bandwagon as others (I think it looks quite a bit like the other psych research that has come into question more recently), but I think caseworkers are a good idea. The police department I worked with on the VOID paper had caseworkers as part of their intervention, as did focused deterrence programs I have been involved with (Wheeler et al., 2019a). Wes Skogan even discussed how caseworkers were part of Chicago CEASEFIRE/outreach workers on Jerry Ratcliffe’s podcast. For those not familiar, case workers are just social workers assigned to these high risk individuals, and they often help their charges with things like getting an ID/Drivers License and applying to jobs. So just an intervention of caseworkers assigned to high risk people I think is called for.

You may think many of these high risk individuals are not amenable to treatment, but my experience is a non-trivial number of them are willing to sit down and try to straighten their lives out, and they need help to do that. Those are the people for whom case workers are a good potential solution.

Although I am a proponent of hot spots policing as well, if we are just talking about shootings, I don’t think hot spots will have a good return on investment either (Drake et al., 2022). Only if you widen the net to other crimes do I think hot spots makes sense (Wheeler & Reuter, 2021). And maybe here I am being too harsh, if you reduce other criminal behavior READI’s cost-benefit ratio likely looks better. But just considering gun violence, I think dropping $60k per person is never going to be worth it in realistic high gun violence risk populations.

Home buying and collective efficacy

With the recent large appreciation in home values, around 20% in the prior year, there has been an increase in private investors purchasing homes to rent out. Recent stories on this by Tyler Dukes and colleagues have collated open parcel data to identify the scope of these companies across all of North Carolina.

For a bit of background, I tried to purchase a home in Plano, TX early 2018. Homes in our price range at that time were going in a single day and typically a few thousand over asking price.

Fast forward to early 2021, I am a fully remote data scientist instead of a professor, and kiddo is in online school. Even with the pay bump, housing competition was even worse in Plano at this point, so we knew we were likely going to have to move school districts to be able to purchase a home. So we decided to strike out, and ended up looking around Raleigh. We quite quickly decided to purchase a new build home in the suburb of Clayton (totally recommend our realtor, Ellen Pitts, her crew did quite a bit of work for us remotely).

I was lucky to get in then it appears – many of the new developments in the area are being heavily scooped up by these equity firms (and rent would be ~$600 more for my home than the mortgage). So I downloaded the public data Dukes put together, and loaded it into Excel to make a quick map of the properties.

For a NC state view, we have big clusters in Charlotte, Greensboro and Raleigh:

We can zoom in, and here is an overview of the Triangle area:

So you can see that inside the loop in Raleigh is pretty sparse, but many of the newer developments on the east side have many more of the private firm purchased houses. Charlotte is much more infilled with these private firms purchasing properties.

Zooming in even further to my town of Clayton, there is quite a bit of variance in the proportion of private vs residential purchases across various developments. My development is less than 50% of these purchases, several developments though appear to be almost 100% privately purchased. (This is not my home/neighborhood FYI.)


So what does this have to do with collective efficacy? Traditionally areas with higher home ownership have been associated with lower rates of crime. For non-criminologists reading my blog, one of the most prominent criminological theories is that state actions only move the needle slightly on increasing/decreasing crime, people enforcing social norms is a bigger factor that explains high crime vs low crime areas. Places with people churning out more frequently – which occurs in areas with more renters – tend to have fewer people effectively keeping the peace. Because social scientists love to make up words, we call this concept collective efficacy.

Downloading and looking at this data, while I was mostly just interested in zooming into my neighborhood and seeing the infill of renters, sparked a criminological hypothesis: I expect neighborhoods with higher rates of private equity purchased housing in the long run to have higher rates of criminal behavior.

This hypothesis will be difficult to test in the wild. It is partially confounded with capital – those who buy their homes accumulate more wealth over time (again mortgage is quite a bit cheaper than rent, so even ignoring home value appreciation this is true). But the variance in the number of homes purchased by private equity firms in different areas makes me wonder if there is enough variation to do a reasonable research design to test my hypothesis, especially in the Charlotte area in say two or three years post a development being finished.

State dependence and trajectory models

I am currently reviewing a paper that uses group based trajectory models (GBTM) – and to start this isn’t a critique of the paper. GBTM I think is a very useful descriptive tool (how this paper I am reading mostly uses it), and can be helpful in some predictive contexts as well.

It is much more difficult though to attribute a causal framework to those trajectories. First, my favorite paper on this topic is Distinguishing facts and artifacts in group-based modeling (Skardhamar, 2010). Torbjørn in that paper simulates random data (not dissimilar to what I do here, but a few more complicated factors), and shows that purely random data will still result in GBTM identifying trajectories. You can go the other way as well, I have a blog post where I simulate actual latent trajectories and GBTM recovers them, and another example where fit stats clearly show a random effects continuous model is better for a different simulation. In real data though we don’t know the true model like these simulations, so we can only be reasonably skeptical that the trajectories we uncover really represent latent classes.

In particular, the paper I was reading is looking at a binary outcome, so you just observe a bunch of 0s and 1s over the time period. So given the limited domain, it is difficult to uncover really wild looking curves. They ended up finding a set of curves that, although they meet all the good fit stats, pretty much cover the domain of possibilities – one starting high and linearly sloping down, one starting low and sloping up, one flat high, one flat low, and a single curved up slope.

So often in criminology we interpret these latent trajectories as population heterogeneity – people on different curves are fundamentally different (e.g. Moffitt’s taxonomy for offending trajectories). But there are other underlying data generating processes that can result in similar trajectories – especially over a limited domain of 0/1 data.

Here I figured the underlying data in the paper I am reviewing are subject to very strong state dependence – your value at t-1 is very strongly correlated with your value at t. So here I simulate data in R, and use the flexmix package to fit the latent trajectories.

First, I simulate 1500 people over 15 time points. I assign them an original probability estimate uniformly, then I generate 15 0/1 observations, updating that probability slightly over time with an auto-correlation of 0.9. (Simulations are based on the logit scale, but then backed out into 0/1s.)

# R Code simulating state dependence 0/1
# data
library("flexmix")
set.seed(10)

# logit and inverse function
logistic <- function(x){1/(1+exp(-x))}
logit <- function(x){log(x/(1-x))}

# generate uniform probabilities
n <- 1500
orig_prob <- runif(n)

# translate to logits
ol <- logit(orig_prob)
df <- data.frame(id=1:n,op=orig_prob,ol)

# generate auto-correlated data for tp = 15 time points
auto_corr <- 0.90
tp <- 15
vl <- paste0('v',1:tp)
vc <- var(ol) #baseline variance, keep equal
# residual sd so the logit variance stays at vc each period
rsd <- sqrt(vc - vc*(auto_corr^2))

for (v in vl){
   # updated logit, AR(1) with coefficient auto_corr
   ol <- ol*auto_corr + rnorm(n,0,rsd)
   # observed outcome
   df[,v] <- rbinom(n,1,logistic(ol))
}
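
As a quick sanity check on the data generating process (a sketch, not part of the original analysis), you can keep the latent logits from each time point and confirm two things: adjacent time points are correlated at about 0.9, and the variance of the logits stays roughly at its baseline. The matrix `lat` and the summary objects are names I made up for this check; the block re-simulates only the latent logits so it runs on its own.

```r
# Re-simulate only the latent logits, storing each period (illustrative check)
set.seed(10)
logit <- function(x){log(x/(1-x))}

n <- 1500
auto_corr <- 0.90
tp <- 15
ol <- logit(runif(n))
vc <- var(ol)                          # baseline variance
rsd <- sqrt(vc - vc*(auto_corr^2))     # residual sd that keeps variance stable

lat <- matrix(NA_real_, nrow=n, ncol=tp)
for (j in 1:tp){
  ol <- ol*auto_corr + rnorm(n,0,rsd)
  lat[,j] <- ol
}

# lag-1 correlations between adjacent time points
lag1 <- sapply(2:tp, function(j) cor(lat[,j-1], lat[,j]))
round(range(lag1),2)

# variance at each time point relative to baseline
var_ratio <- apply(lat,2,var)/vc
round(range(var_ratio),2)
```

If the residual standard deviation were not scaled this way, the logits would either shrink towards 0 or blow up over the 15 periods, which would change the shape of the curves the trajectory model picks up.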

This generates the data in wide format, so I reshape to long format needed to fit the models using flexmix, and I by default choose 5 trajectories (same as chosen in the paper I am reviewing).

# reshape wide to long
ld <- reshape(df, idvar="id", direction="long",
        varying = list(vl))

# fit traj model for binary outcomes
mod <- flexmix(v1 ~ time + I(time^2) | id,
               model = FLXMRmultinom(),
               data=ld, k=5)

rm <- refit(mod)
summary(rm)

Now I create smooth curves over the period to plot. I am lazy here, the X axis should actually be 1-15 (I simulated 15 discrete time points).

tc <- summary(rm)@components[[1]]
pd <- data.frame(c=1,t=seq(1,tp,length.out=100))
pd$tsq <- pd$t^2

co <- matrix(-999,nrow=3,ncol=5)

for (i in 1:5){
  # first column holds the coefficient estimates for component i
  co[,i] <- tc[[i]][,1]
}

pred <- as.matrix(pd) %*% co

# plot on probability scale
matplot(logistic(pred))

These are quite similar to the curves for the paper I am reviewing, a consistent low probability (5), and a consistent high (1), a downward mostly linear slope (3), and an upward linear slope (2), and then one parabola concave down (4) (in the paper they had one concave up).

I figured the initial probability I assigned would highly impact the curve the model assigned a person to in this simulation. It ends up being more spread out than I expected though.

# distribution of classes vs original probability
ld$clus <- clusters(mod)
r1 <- ld[ld$time == 1,]
clustjit <- r1$clus + runif(n,-0.2,0.2)
plot(clustjit,r1$op) # more spread out than I thought

So there is some tendency for each trajectory to be correlated based on the original probability, but it isn’t that strong.

If we look at the average max posterior probabilities, they are OK minus the parabola group 4.

# average posterior probability
pp <- data.frame(posterior(mod))
ld$pp <- pp[cbind(1:(n*tp),ld$clus)]
r1 <- ld[ld$time == 1,]
aggregate(pp ~ clus, data = r1, mean)
#   clus        pp
# 1    1 0.8923801
# 2    2 0.7903938
# 3    3 0.7535281
# 4    4 0.6380946
# 5    5 0.8419221

The paper I am reviewing has much higher APPs for each group, so maybe they are really representing pop heterogeneity instead of continuous state dependence, it is just really hard with such observational data to tell the difference.

An update on the WaPo Officer Involved Shooting Stats

Marisa Iati interviewed me for a few clips in a recent update of the WaPo data on officer involved fatal police shootings. I’ve written in the past the data are very consistent with a Poisson process, and this continues to be true.

So first thing Marisa said was that shootings in 2021 are at 1055 (up from 1021 in 2020). Is this a significant increase? I said no off the cuff – I knew the average over the time period WaPo has been collecting data is around 1000 fatal shootings per year, so given a Poisson distribution mean=variance, we know the standard deviation of the series is close to sqrt(1000), which is a bit over 30. So anything within two standard deviations, roughly 1000 plus/minus 60 (i.e. 940-1060), is within the typical range you would expect.
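
The off the cuff check can be reproduced in a couple of lines of R. This is a sketch that assumes a long run mean of exactly 1000 per year, which is my round number rather than an estimate from the data.

```r
# Assumed long run mean of fatal shootings per year (round number)
mu <- 1000

# For a Poisson variable the variance equals the mean
sd_pois <- sqrt(mu)
round(sd_pois,1)

# Typical range, mean plus/minus two standard deviations
bounds <- mu + c(-2,2)*sd_pois
round(bounds)

# Is the 2021 total inside that range?
total_2021 <- 1055
in_range <- total_2021 > bounds[1] & total_2021 < bounds[2]
in_range
```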

In every interview I do, I struggle to describe frequentist concepts to journalists (and this is no different). This is not a critique of Marisa, this paragraph is certainly not how I would write it down on paper, but likely was the jumble that came out of my mouth when I talked to her over the phone:

Despite setting a record, experts said the 2021 total was within expected bounds. Police have fatally shot roughly 1,000 people in each of the past seven years, ranging from 958 in 2016 to last year’s high. Mathematicians say this stability may be explained by Poisson’s random variable, a principle of probability theory that holds that the number of independent, uncommon events in a large population will remain fairly stagnant absent major societal changes.

So this sort of mixes up two concepts. One, the distribution of fatal officer shootings (a random variable) can be very well approximated via a Poisson process, which I will show below still holds true with the newest data. Second, what does this say about potential hypotheses we have about things that we think might influence police behavior? I will come back to this at the end of the post.

R Analysis at the Daily Level

So my current ptools R package can do a simple analysis to show that this data is very consistent with a Poisson process. First, install the most recent version of the package via devtools, then you can read in the WaPo data directly via the Github URL:

library(devtools)
install_github("apwheele/ptools")
library(ptools)

url <- 'https://raw.githubusercontent.com/washingtonpost/data-police-shootings/master/fatal-police-shootings-data.csv'
oid <- read.csv(url,stringsAsFactors = F)

Looking at the yearly statistics (clipping off events recorded so far in 2022), you can see that they are hypothetically very close to a Poisson distribution with a mean/variance of 1000, although perhaps have a slow upward trend over the years.

# Year Stats
oid$year <- as.integer(substr(oid$date,1,4))
year_stats <- table(oid$year)
print(year_stats)
mean(year_stats[1:7]) # average of 1000 per year
var(year_stats[1:7])  # variance just under 1000

We can also look at the distribution at shorter time intervals, here per day. First I aggregate the data to the daily level (including 0 days), second I use my check_pois function to get the comparison distributions:

#Now aggregating to count per day
oid$date_val <- as.Date(oid$date)
date_range <- paste0(seq(as.Date('2015-01-01'),max(oid$date_val),by='days'))
day_counts <- as.data.frame(table(factor(oid$date,levels=date_range)))
head(day_counts)

pfit <- check_pois(day_counts$Freq, 0, 10, mean(day_counts$Freq))
print(pfit)

The way to read this, for a mean of 2.7 fatal OIS per day (and given this many days), we would expect 169.7 0 fatality days in the sample (PoisF), but we actually observed 179 0 fatality days, so a residual of 9.3 in the total count. The trailing rows show the same in percentage terms, so we expect 6.5% of the days in the sample to have 0 fatalities according to the Poisson distribution, and in the actual data we have 6.9%.

You can read the same for the rest of the rows, but it is mostly the same. There are only very slight deviations from the baseline expected Poisson distribution. This data is the closest I have ever seen real life, social behavioral data come to following a Poisson process.
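
For readers who do not want to install the package, the comparison is simple to sketch in base R. The helper below, `pois_compare`, is a hypothetical stand-in I wrote for illustration; it is not the ptools implementation, and the column names only mirror the ones mentioned in the text. It is run here on simulated Poisson counts, not the WaPo data.

```r
# Hypothetical helper, NOT the ptools check_pois source
pois_compare <- function(counts, min_val, max_val, mu){
  ints <- min_val:max_val
  # observed number of days at each integer count
  obs <- as.vector(table(factor(counts, levels=ints)))
  # Poisson probability of each integer count
  pp <- dpois(ints, lambda=mu)
  res <- data.frame(Int=ints,
                    Freq=obs,
                    PoisF=length(counts)*pp,      # expected number of days
                    Prop=100*obs/length(counts),  # observed percent
                    PoisD=100*pp)                 # expected percent
  res$ResidF <- res$Freq - res$PoisF
  return(res)
}

# Simulated stand-in data, roughly 7 years of days at 2.7 per day
set.seed(10)
sim_days <- rpois(2600, lambda=2.7)
res <- pois_compare(sim_days, 0, 10, mean(sim_days))
round(res,1)
```

Because the simulated data really are Poisson, the residuals here are just sampling noise, which gives a feel for how small the deviations in the WaPo data are.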

For comparison, let’s look at the NYC shootings data I have saved in the ptools package.

# Lets check against NYC Shootings
data(nyc_shoot)
date_range <- paste0(seq(as.Date('2006-01-01'),max(nyc_shoot$OCCUR_DATE),by='days'))
shoot_counts <- as.data.frame(table(factor(nyc_shoot$OCCUR_DATE,levels=date_range)))

sfit <- check_pois(shoot_counts$Freq,0,max(shoot_counts$Freq),mean(shoot_counts$Freq))
round(sfit,1)

This is much more typical of crime data I have analyzed over my career (in that it deviates from a Poisson process by quite a bit). The mean is 4.4 shootings per day, but the variance is over 13. There are many more 0 days than expected (433 observed vs 73 expected). And there are many more high crime shooting days than expected (tail of the distribution even cut off). For example there are 27 days with 18 shootings, whereas a Poisson process would only expect 0.1 days in a sample of this size.

My experience though is that when the data is overdispersed, a negative binomial distribution will fit quite well. (Many people default to a zero-inflated model; like Paul Allison, I think that is a mistake unless you have a structural reason for the excess zeroes you want to model.)

So here is an example of fitting a negative binomial to the shooting data:

# Lets fit a negative binomial and check out
library(fitdistrplus)
fnb <- fitdist(shoot_counts$Freq,"nbinom")
print(fnb$estimate)

sfit$nb <- 100*mapply(dnbinom, x=sfit$Int, size=fnb$estimate[1], mu=fnb$estimate[2])
round(sfit[,c('Prop','nb')],1) # Much better overall fit

And this compares the percentages. So you can see observed 7.5% 0 shooting days, and expected 8.6% according to this negative binomial distribution. Much closer than before. And the tails are fit much closer as well, for example, days with 18 shootings are expected 0.2% of the time, and are observed 0.4% of the time.
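
To see why the negative binomial handles the excess zeroes, here is a rough method of moments sketch using only the summary numbers quoted above (mean of 4.4, variance of about 13). This is not the maximum likelihood fit that fitdist produces, so the numbers will differ a little from the fitted ones; the variable names are mine.

```r
# Summary numbers quoted in the text (variance is approximate, "over 13")
mu <- 4.4
v  <- 13

# Method of moments for the negative binomial: var = mu + mu^2/size
size <- mu^2/(v - mu)
size

# Probability of a zero shooting day under each distribution, in percent
p0_pois <- dpois(0, lambda=mu)
p0_nb   <- dnbinom(0, size=size, mu=mu)
round(100*c(poisson=p0_pois, negbin=p0_nb),1)
```

The Poisson expects zero shooting days only a bit over 1% of the time, while the moment matched negative binomial lands in the same neighborhood as the observed 7.5%.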

So What Inferences Can We Make?

In social sciences, we are rarely afforded the ability to falsify any particular hypothesis – or in more lay-terms we can’t really ever prove something to be false beyond a reasonable doubt. We can however show whether empirical data is consistent or inconsistent with any particular hypothesis. In terms of Fatal OIS, several ready hypotheses one may be interested in are Does increased police scrutiny result in fewer OIS?, or Did the recent increase in violence increase OIS?.

While these two processes are certainly plausible, the data collected by WaPo are not consistent with either hypothesis. It is possible both mechanisms are operating at the same time, and so cancel each other out, to result in a very consistent 1000 Fatal OIS per year. A simpler explanation though is that the baseline rate has not changed over time (Occam’s razor).

Again though we are limited in our ability to falsify these particular hypotheses. For example, say there was a very small upward trend, on the order of something like +10 Fatal OIS per year. Given the underlying variance of Poisson variables, even with 7+ years of data it would be very difficult to identify that small of an upward trend. Andrew Gelman likens it to measuring the weight of a feather carried by a Kangaroo jumping on the scale.
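
A small simulation makes the point concrete. This is a sketch under assumptions I chose for illustration: 7 years of data, a true upward trend of +10 per year centered on a mean of 1000, and a Poisson regression on year as the test. It estimates how often that regression would flag the trend at the 0.05 level.

```r
# Assumed scenario: true trend of +10 per year, averaging 1000 over 7 years
set.seed(10)
yrs <- 1:7
true_mean <- 960 + 10*yrs

n_sim <- 1000
pvals <- replicate(n_sim, {
  y <- rpois(length(yrs), true_mean)
  fit <- glm(y ~ yrs, family=poisson)
  # p-value for the year trend coefficient
  summary(fit)$coefficients["yrs","Pr(>|z|)"]
})

# Share of simulations where the trend is statistically significant
power_est <- mean(pvals < 0.05)
power_est
```

In this setup the trend is missed more often than it is detected, even though it is really there in every simulated series.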

So really we could only detect big changes that swing OIS by around 100 events per year I would say offhand. Anything smaller than that is likely very difficult to detect in this data. And so I think it is unlikely any of the recent widespread impacts on policing (BLM, Ferguson, Covid, increased violence rates, whatever) ultimately impacted fatal OIS in any substantive way on that order of magnitude (although they may have had tiny impacts at the margins).

Given that police departments are independent, this suggests the data on fatal OIS are likely independent as well (e.g. one fatal OIS does not cause more fatal OIS, nor the opposite, one fatal OIS does not deter more fatal OIS). Because of the independence of police departments, I am not sure there is a real great way to have federal intervention to reduce the number of fatal OIS. I think individual police departments can increase oversight, and maybe state attorney general offices can be in a better place to use data driven approaches to oversee individual departments (like ProPublica did in New Jersey). I wouldn’t bet money on large deviations from that 1000 fatal OIS per year anytime soon though.

Power and bias in logistic regression

Michael Sierra-Arévalo, Justin Nix and Bradley O’Guinn have a recent article examining officer fatalities following gunshot assaults (Sierra-Arévalo, Nix, & O’Guinn). They do not find that distance to Level 1/2 trauma ERs makes a difference in the survival probabilities, which conflicts with prior work of mine with Gio Circo (Circo & Wheeler, 2021). Justin writes this as a potential explanation for the results:

The results of our multivariable analysis indicated that proximity to trauma care was not significantly associated with the odds of officers surviving a gunshot wound (see Table 2 on p. 9 of the post-print). On the one hand, this was somewhat surprising given that proximity to trauma care predicts survival of gunshot wounds among the general public. On the other hand, police have specialized equipment, such as ballistic vests and tourniquets, that reduce the severity of gunshot wounds or allow them to be treated immediately.

I think it is pretty common when results do not pan out, people turn to theoretical (or sociological) reasons why their hypothesis may be invalid. While these alternatives are often plausible, often equally plausible are simpler data based reasons. Here I was concerned about two factors, 1) power and 2) omitted severity of gun shot wound factors. I did a quick simulation in R to show power seems to be OK, but the omitted severity confounders may be more problematic in this design, although they would only bias the effect towards 0 (they would not cause the negative effect estimate MJB find).

Power In Logistic Regression

First, MJB’s sample size is just under 1,800 cases. You would think offhand this is plenty of power for whatever analysis right? Well, power just depends on the relevant effect size, a small effect and you need a bigger sample. My work with Gio found a linear effect in the logistic equation of 0.02 (per minute driving increases the logit). We had 5,500 observations, and our effect had a p-value just below 0.05, hence why a first thought was power. Also logistic regression is asymptotic, it is common to have small sample biases in situations even up to 1000 observations (Bergtold et al., 2018). So let’s see in a simple example ignoring the other covariates:

# Some upfront work
logistic <- function(x){1/(1+exp(-x))}
set.seed(10)

# Scenario 1, no covariates omitted
n <- 2000; 
de <- 0.02
dist <- runif(n,5,200)
p <- logistic(-2.5 + de*dist)
y <- rbinom(n,1,p)

# Variance is small enough, seems reasonably powered
summary(glm(y ~ dist, family = "binomial"))

Here with 2000 cases, taking the intercept from MJB’s estimates and the 0.02 from my paper, we see 2000 observations is well enough powered to detect the same 0.02 effect found in my paper with Gio. Note when doing post-hoc power analysis, you don’t take the observed effect (the -0.001 in Justin’s paper), but a hypothetical effect size you think is reasonable (Gelman, 2019), which I just take from my paper with Gio. Essentially saying “Is Justin’s analysis well powered to detect an effect of the same size I found in the Philly data”.

One thing that helps MJB’s design here is more variance in the distance parameter, looking intra city the drive time distances are smaller, which will increase the standard error of the estimate. If we pretend to limit the distances to 30 minutes, this study is more on the fence as to being well enough powered (but meets the threshold in this single simulation):

# Limited distance makes the effect have a higher variance
n <- 2000; 
de <- 0.02
dist <- runif(n,1,30)
p <- logistic(-2.5 + de*dist)
y <- rbinom(n,1,p)

# Not as much variation in distance, less power
summary(glm(y ~ dist, family = "binomial"))

For a more serious analysis you would want to run these simulations many times and look at the typical result (since they are stochastic), but this is good enough for me to say power is not an issue in this design. If people are planning replications though, intra-city with only 1000 observations is really pushing it.
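To make the "run it many times" idea concrete, here is a minimal sketch of a repeated power simulation. It is written in Python using only the standard library (not the R `glm` call above), so the logistic regression is fit with a small hand-rolled Newton-Raphson routine. The function names (`fit_slope`, `power`), the 50 replications, and the 1.96 cutoff are my own assumptions for illustration, not anything taken from MJB’s paper.

```python
import math
import random

def logistic(x):
    # Numerically stable inverse logit
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def fit_slope(x, y, iters=25):
    """Fit logit(P(y=1)) = b0 + b1*x by Newton-Raphson; return (b1, se_b1)."""
    n = len(x)
    xbar = sum(x) / n
    xc = [xi - xbar for xi in x]  # centering leaves the slope and its SE unchanged
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(xc, y):
            p = logistic(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-8:
            break
    # Slope variance is the [1,1] element of the inverse information matrix
    # (evaluated at the last iterate before the final, negligible, update)
    return b1, math.sqrt(h00 / det)

def power(n, lo, hi, de=0.02, intercept=-2.5, reps=50, seed=10):
    """Share of simulated datasets where the distance effect has |z| > 1.96."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        dist = [rng.uniform(lo, hi) for _ in range(n)]
        y = [1 if rng.random() < logistic(intercept + de * d) else 0 for d in dist]
        b1, se = fit_slope(dist, y)
        if abs(b1 / se) > 1.96:
            hits += 1
    return hits / reps

power_wide = power(2000, 5, 200)    # wide range of drive times
power_narrow = power(2000, 1, 30)   # intra-city style range
power_small = power(1000, 1, 30)    # intra-city with half the sample
print(power_wide, power_narrow, power_small)
```

With only 50 replications the estimates are themselves noisy, so for anything beyond a sanity check you would increase `reps`.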

Omitted Confounders

One thing that is special about logistic regression, unlike linear regression, is that even if an omitted variable is uncorrelated with the effect of interest, leaving it out can still bias the estimates (Mood, 2010). So even if you do a randomized experiment, your effects could be biased if some large effect is omitted from the regression equation. Several people interpret this as logistic regression is fucked, but like that linked Westfall article I think that is a bit of an over-reaction. Odds ratios are very tricky, but logistic regression as a method to estimate conditional means is not so bad.

In my paper with Gio, the largest effect on whether someone would survive was based on the location of the bullet wound. Drive time distances then only marginally pushed that probability up or down. Here are conditional mean estimates from our paper:

So you can see that being shot in the head, drive time can make an appreciable difference over these ranges, from ~45% to 55% probability of death. Even if the location of the wound is independent of drive time (which seems quite plausible, people don’t shoot at your legs because you are far away from a hospital), it can still be an issue with this research design. I take Justin’s comment about ballistic vests as reducing death as essentially taking the people in the middle of my graph (torso and multiple injuries) and pushing them into the purple line at the bottom (extremities). But people shot in the head are not impacted by the vests.

So let’s see what happens to our effect estimates when we generate the data with the extremities and head effects (here I pulled the estimates all from my article; the baseline reference is shot in the head, and the negative effect is the reduction relative to that baseline when shot in an extremity):

# Scenario 3, wound covariate omitted
dist <- runif(n,5,200)
ext_wound <- rbinom(n,1,0.8)
ef <- -4.8
pm <- logistic(0.2 + de*dist + ef*ext_wound)
ym <- rbinom(n,1,pm)

# Biased downward (but not negative)
summary(glm(ym ~ dist, family = "binomial"))

You can see here the effect estimate is biased downward by a decent margin (less than half the size of the true effect). If we estimate the correct equation, we are on the money in this simulation run:

What happens if we up the sample size? Does this bias go away? Unfortunately it does not, here is an example with 10,000 observations:

# Scenario 3, wound covariate omitted, larger sample
n2 <- 10000
dist <- runif(n2,5,200)
ext_wound <- rbinom(n2,1,0.8)
ef <- -4.8
pm <- logistic(0.2 + de*dist + ef*ext_wound)
ym <- rbinom(n2,1,pm)

# Still a problem
summary(glm(ym ~ dist, family = "binomial"))

So this omission is potentially a bigger deal – but not in the way Justin states in his conclusion. The quote earlier suggests the true effect is 0 due to vests; I am saying here the effect in MJB’s sample is biased towards 0 due to this large omitted confounder on the severity of the wound. Both are plausible, and there is no way based just on MJB’s data to determine if one interpretation is right and the other is wrong.
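One way to see why a bigger sample does not help is to skip the simulation and compute the marginal curve directly. The sketch below (Python, standard library only) averages the two wound-specific probabilities using the same values as the simulation (0.2 intercept, 0.02 distance effect, -4.8 extremity effect, 80% extremity wounds) and then takes the average slope of the marginal logit between 5 and 200 minutes. That endpoint slope is my own rough summary and will not exactly equal what `glm` reports, but it shows the attenuation is a property of the marginal model, not of sampling noise.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

b0 = 0.2       # baseline (head wound) intercept
de = 0.02      # true per-minute drive time effect, conditional on wound type
ef = -4.8      # extremity wound effect
p_ext = 0.8    # share of victims with extremity wounds

def marginal_p(dist):
    """Probability of death at a given drive time, averaging over wound type."""
    head = logistic(b0 + de * dist)
    ext = logistic(b0 + de * dist + ef)
    return (1.0 - p_ext) * head + p_ext * ext

lo, hi = 5, 200
# Average slope of the marginal logit over the observed distance range
marginal_slope = (logit(marginal_p(hi)) - logit(marginal_p(lo))) / (hi - lo)
# Same calculation within a single wound type recovers the true effect
conditional_slope = (logit(logistic(b0 + de * hi)) - logit(logistic(b0 + de * lo))) / (hi - lo)

print("marginal:", round(marginal_slope, 4), "conditional:", round(conditional_slope, 4))
```

Because nothing here is random, no amount of extra data changes the answer: the marginal slope sits well below 0.02 even though drive time and wound location are independent by construction.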

This would not explain the negative effect estimate MJB find in their paper though; it would only bias towards 0. To be fair, Jessica Beard critiqued mine and Gio’s paper in a similar vein (saying the police wound location data had errors). This would make our drive time estimates biased towards 0 as well, so if that is the case the true effect may be even larger than Gio and I estimated.

Potential robustness checks here are to simply do a linear regression instead of logistic with the same data (my graph above shows a linear regression would be fine for the data if I included interaction effects with wound location). And another would be to look at the unconditional marginal distribution of distance vs probability of death. If that is highly non-linear, it is likely due to omitted confounders in the data (I suspect it may plateau as well, eg the first 30 minutes make a big difference, but after that it flattens out, you’ve either stabilized someone or they are gone at that point).

Policy?

In the case of intra-city public violence, the policy implications of drive times on survival are relevant when people are determining whether to keep open or close trauma centers. I did not publish this in my paper with Gio (you can see the estimates in the replication code), but we actually estimated counter-factual increased deaths by taking away facilities. The marginal effect is around 10~20 homicides over the 4.5 years if you take away one of the facilities in Philadelphia. I don’t know if reducing 5 homicides per year is sufficient justification to keep a trauma facility open, but officer shootings are themselves much less frequent, and so the marginal effects are very unlikely to justify keeping a trauma facility open/closed by themselves.

You could technically figure out the optimal location to site a new trauma facility from mine and Gio’s paper, but probably a more reasonable response would be to site resources to get people to the ER faster. Philly already does scoop and run (Winter et al., 2021), where officers don’t wait for an ambulance. Another possibility is to proactively locate ambulances to get to scenes faster (Hosler et al., 2019). Again though, that just isn’t as relevant/feasible outside of major urban areas.

Often times social science authors do an analysis, and then in the policy section say things that are totally reasonable on their face, but are not supported by the empirical analysis. Here the suggestion by MJB that officers should increase their use of vests is totally reasonable, but nothing in their analysis supports that conclusion (ditto with the tourniquets statement). You would need to measure which incidents had those factors, and see their effect on officer survival, to make that inference. MJB could have made the opposite statement (since drive time doesn’t matter, maybe those things don’t make a difference in survival), and it would be equally supported by the analysis.

I suspect MJB’s interest in the analysis was simply to see if survival rates were potential causes of differential officer deaths across states (Sierra-Arévalo & Nix, 2020). Which is fine to look at by itself, even if it has no obviously direct policy implications. Talking back and forth with Justin before posting this, he did mention it was a bit of prodding from a reviewer to add in the policy implications. This goes for both reviewers and original writers: I don’t think we should pad papers with policy recommendations (or ditto for theoretical musings) that aren’t directly supported by the empirical analysis we conduct.

References

  • Bergtold, J. S., Yeager, E. A., & Featherstone, A. M. (2018). Inferences from logistic regression models in the presence of small samples, rare events, nonlinearity, and multicollinearity with observational data. Journal of Applied Statistics, 45(3), 528-546.
  • Circo, G. M., & Wheeler, A. P. (2021). Trauma Center Drive Time Distances and Fatal Outcomes among Gunshot Wound Victims. Applied Spatial Analysis and Policy, 14(2), 379-393.
  • Gelman, A. (2019). Don’t calculate post-hoc power using observed estimate of effect size. Annals of Surgery, 269(1), e9-e10.
  • Hosler, R., Liu, X., Carter, J., & Saper, M. (2019). RaspBary: Hawkes Point Process Wasserstein Barycenters as a Service.
  • Mood, C. (2010). Logistic regression: Why we cannot do what we think we can do, and what we can do about it. European Sociological Review, 26(1), 67-82.
  • Sierra-Arévalo, M., & Nix, J. (2020). Gun victimization in the line of duty: Fatal and nonfatal firearm assaults on police officers in the United States, 2014–2019. Criminology & Public Policy, 19(3), 1041-1066.
  • Sierra-Arévalo, M., Nix, J., & O’Guinn, B. (2022). A national analysis of trauma care proximity and firearm assault survival among U.S. police. Forthcoming in Police Practice and Research. Post-print available at
  • Winter, E., Hynes, A. M., Shultz, K., Holena, D. N., Malhotra, N. R., & Cannon, J. W. (2021). Association of police transport with survival among patients with penetrating trauma in Philadelphia, Pennsylvania. JAMA network open, 4(1), e2034868-e2034868.

Paper retraction and exemplary behavior in Crim

Criminology researchers had a bad look going for them in the Stewart/Pickett debacle. But a recent exchange shows behavior we would all be better off emulating: a critique of a meta analysis (by Kim Rossmo) and a voluntary retraction (by Wim Bernasco).

Exemplary behavior by both sides in this exchange. I am sure people find it irksome if you are on the receiving end, but Kim has over his career pursued response/critique pieces. And you can see in the retraction watch piece this is not easy work (basically as much work as writing an original meta analysis). This is important if science is to be self correcting, we need people to spend the time to make sure prior work was done correctly.

And from Wim’s side it shows much more humility than the average academic – it is totally OK to admit one’s faults/mistakes and move on. I have no doubt if Kim (or whomever) did a deep dive into my prior papers, he would find some mistakes and maybe it would be worth a retraction. It is ok, Wim will not be made to wear a dunce hat at the next ASC or anything like that. Criminology would be better off if we all were more like Kim and more like Wim.

One thing though is that I agree with Andrew Gelman, and that it is OK to do a blog post if you find errors before going to the author directly. Most academics don’t respond to critiques at all (or make superficial excuses). So if you find error in my work go ahead and blog it or write to the editor or whatever. I am guessing it worked out here because I imagine Kim and Wim have crossed paths before, and Wim actually answers his emails.

Note I think not responding is OK as well. For example Data Colada made a dig at an author for not responding to a critique recently (see the author feedback at the bottom). If you critique my work I don’t think I’m obligated to respond. I will respond if I think it is worth my time – papers are not a contract to defend until death.


A second part I wanted to blog about was reviewing papers. You can see in my comment on Gelman’s blog, Kaiser Fung asks “What happened during the peer review process? They didn’t find any problems?”. And you can see in the original retraction watch piece, I think Kim did his due diligence in the original review. It was only after it was published, and he more seriously pursued a replication analysis (which is beyond what is typically expected in peer review), that he found inconsistencies that clearly invalidated the meta analysis.

It is hard when reviewing papers to find really widespread problems with an empirical analysis. Personally I do small checks, think of these as audits, that are not exhaustive, but I often do find errors. For meta-analyses, one thing I have done is pull out 1/2/3 studies, and see if I can replicate the point effects the authors report. One thing I realized in doing this, for example, is that the Braga meta analysis of hot spots uses the largest point effect for some tables, which I think is probably a mistake and they should just pool all of the effects reported (although the variants I have reviewed have calculated them correctly).

Besides this, for meta-analysis I do not have much advice. I have at times noted papers missing, but that was because I was just familiar with them, not because I replicated the authors’ search strategy. And I have advocated sharing data and code in reviews (which should clearly be done in meta-analysis), but pretty much no one does this.

For papers that are not meta-analyses, one thing I do is if people have inline statistics (often things like F-tests or Chi-Square tests), I try to replicate these. Looking at regression coefficients it may be simpler to see a misprint, but I don’t have Chi-square committed to memory. I can’t remember a time I was actually able to replicate one of these; I reviewed a paper one time with almost 100 inline stats like this and I couldn’t figure out a single one! It is actually somewhat common in crim articles for regression tables to only print the point effects and p-values, which is more difficult to check for inconsistencies without the standard errors. (You should IMO always publish standard errors, to allow readers to do their own tests by eye.)
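As a rough illustration of these audit checks, here is a small Python sketch (standard library only) that recomputes a two-sided p-value from a reported coefficient and standard error, and a Pearson chi-square for a 2x2 table. The function names and the numbers in the example are hypothetical, made up for illustration and not taken from any paper I reviewed. The coefficient check assumes a normal (Wald) reference distribution, so it will be slightly off for small-sample t-tests.

```python
import math

def two_sided_p_from_z(z):
    # For a standard normal, P(|Z| > |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

def check_coefficient(beta, se):
    """Recompute the Wald z statistic and two-sided p-value from a table entry."""
    z = beta / se
    return z, two_sided_p_from_z(z)

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the statistic is a squared standard normal
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Hypothetical reported values, purely for illustration
z, p_coef = check_coefficient(beta=0.02, se=0.01)
stat, p_chi = chi_square_2x2(10, 20, 20, 10)
print(round(z, 2), round(p_coef, 4))
print(round(stat, 2), round(p_chi, 4))
```

If the recomputed p-value is far from the one printed in the paper, that is a flag to ask the authors about, not proof of an error, since they may have used a different test or correction.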

Even if one did provide code/data, I don’t think I would spend the time to replicate the tables as a reviewer – it is just too much work. I think journals should hire data/fact checkers to do this (an actual argument for paid for journals to add real value). I only spend around 3-8 hours per review I think – this is not enough time for me to dig into code, putz with it to run on my local machine, and cross reference the results. That would be more like 2~4 days work in many cases I think. (And that is just using the original data, verifying the original data collection in a meta-analysis would be even more work.)

Precision in measures and policy relevance

Too busy to post much recently – will hopefully slow down a bit soon and publish some more technical posts, but just a quick opinion post for this Sunday. Reading a blog post by Callie Burt the other day – I won’t comment on the substantive critique of the Harden book she is discussing (since I have not read it), but this quote struck me:

precise point estimates are generally not of major interest to social scientists. Nearly all of our measures, including our outcome measures, are noisy, (contain error), even biased. In general, what we want to know is whether more of something (education, parental support) is associated with more (or less) of something else (income, education) that we care about, ideally with some theoretical orientation. Frequently the scale used to measure social influences is somewhat arbitrary anyway, such that the precise point estimate (e.g., weeks of schooling) associated with 1 point increase in the ‘social support scale’ is inherently vague.

I think Callie is right, precise point estimates often aren’t of much interest in general criminology. I think this perspective is quite bad though for our field as a whole in terms of scientific advancement. Most criminology work is imprecise (for various reasons), and because of this it has no hope to be policy relevant.

Let’s go with Callie’s point that education is associated with income. Imagine we have a policy proposal that increases high school completion rates via allocating more money to public schools (the increased education), and we want to see its improvement on later life outcomes (like income). Whether a social program “is worth it” depends not only on whether it is effective in increasing high school completion rates, but by how much, and how much return on investment there is in those later life outcomes we care about. Programs ultimately have costs, both in terms of direct costs as well as opportunity costs to fund some other intervention.

Here is another more crim example – I imagine most folks by now know that bootcamps are an ineffective alternative to incarceration for the usual recidivism outcomes (MacKenzie et al., 1995). But what folks may not realize is that bootcamps are often cheaper than prison (Kurlychek et al., 2011). So even if they do not reduce recidivism, they may still be worth it in a cost-benefit analysis. And I think that should be evaluated when you do meta-analyses of CJ programs.

Part of why I think economics is eating all of the social sciences’ lunch is not just because of the credibility revolution, but also because economists do a better job of valuing costs and benefits for a wide variety of social programs. These cost estimates are often quite fuzzy, same as the more general theoretical constructs Callie is talking about. But we often can place reasonable bounds to know if something is effective enough to be worth more investment.

There are a smattering of crim papers that break this mold though (and to be clear you can often make these same too-fuzzy-to-be-worthwhile critiques for many of my papers). For several examples in the policing realm, Laura Huey and her Canadian crew have papers doing a deep dive into investigation time spent on cases (Mark et al., 2019). Another is that Lisa Tompson and company have a detailed program evaluation of a stalking intervention (Tompson et al., 2021). And a few papers that I think are very important are Priscilla Hunt’s work on general CJ costs for police and courts given a particular UCR crime (Hunt et al., 2017; 2019).

Those four papers are definitely not the norm in our field, but I personally think they are much more policy relevant than the vast majority of criminological research – properly estimating the costs is ultimately needed to justify any positive intervention.

References

  • Hunt, P., Anderson, J., & Saunders, J. (2017). The price of justice: New national and state-level estimates of the judicial and legal costs of crime to taxpayers. American Journal of Criminal Justice, 42(2), 231-254.
  • Hunt, P. E., Saunders, J., & Kilmer, B. (2019). Estimates of law enforcement costs by crime type for benefit-cost analyses. Journal of Benefit-Cost Analysis, 10(1), 95-123.
  • Kurlychek, M. C., Wheeler, A. P., Tinik, L. A., & Kempinen, C. A. (2011). How long after? A natural experiment assessing the impact of the length of aftercare service delivery on recidivism. Crime & Delinquency, 57(5), 778-800.
  • MacKenzie, D. L., Brame, R., McDowall, D., & Souryal, C. (1995). Boot camp prisons and recidivism in eight states. Criminology, 33(3), 327-358.
  • Mark, A., Whitford, A., & Huey, L. (2019). What does robbery really cost? An exploratory study into calculating costs and ‘hidden costs’ of policing opioid-related robbery offences. International Journal of Police Science & Management, 21(2), 116-129.
  • Tompson, L., Belur, J., & Jerath, K. (2021). A victim-centred cost–benefit analysis of a stalking prevention programme. Crime Science, 10(1), 1-11.

Musings on Project Organization, Books and Courses

Is there a type of procrastination via which people write lists of things? I have that condition.

I have been recently thinking about project organization. At work we have been using the Cookie Cutter Data Science project set up – and I really hate it. I have been thinking about this more recently, as I have taken over several other data scientists’ models at work. The Cookie Cutter Template is waaaay too complicated, and mixes logic of building python packages (e.g. setup.py, a LICENSE file) with data science in production code (who makes their functions pip installable for a production pipeline?). Here is the Cookie Cutter directory structure (even slightly cut off):

Cookie cutter has way too many folders (a data folder in source, and a data folder itself) and multiple nested folders (what is the difference between external data, interim, and raw data? what is the difference between features and data in the src folder?). I can see cases where individual parts of these are needed sometimes (e.g. an external data file defining lookups for ICD codes), but why start with 100 extra folders that you don’t need? I find this makes it very difficult to take over other people’s projects, in that I don’t know which folders have things in them and which do not (most of these folders are empty).

So I’ve reorganized some of my projects at work, and they now look like this:

├── README.md           <- High level overview of project + any special notes
├── requirements.txt    <- Default python libraries we often use (eg sklearn, sqlalchemy)
│                          + special instructions for conda environments in our VMs
├── .gitignore          <- ignore `models/*.pkl`, `*.csv`, etc.
├── /models             <- place to store trained and serialized models
├── /notebooks          <- I don't even use notebooks very often, more like a scratch/EDA folder
├── /reports            <- Powerpoint reports to business (using HMS template)
└── /src                <- Place to store functions

And then depending on the project, we either use secret environment variables, or have a YAML file that has database connection strings etc. (And that YAML is specified in .gitignore.)
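For the environment variable route, a minimal Python sketch might look like the following. The variable name `DB_CONN_STRING` and the helper `get_connection_string` are hypothetical names I am using for illustration, not something from our actual projects.

```python
import os

def get_connection_string(var_name="DB_CONN_STRING"):
    """Read a database connection string from a (hypothetical) environment variable.

    Failing loudly when the variable is missing avoids silently falling back
    to a hard-coded credential that could end up committed to the repo.
    """
    conn = os.environ.get(var_name)
    if conn is None:
        raise RuntimeError(f"Set the {var_name} environment variable before running")
    return conn
```

The pipeline scripts then call `get_connection_string()` instead of embedding credentials, so nothing secret needs to live in version control.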

And then over time the root folder will typically have shell scripts that call whatever production pipeline or API we are building. Keeping all of the functions in a single file in src is fine, although it can grow to more modules if you really want it to.

And this got me thinking about how to teach this program management stuff to new data scientists we are hiring, and if I was still a professor how I would structure a course to teach this type of stuff in a social science program.

Courses

So in my procrastination I made a generic syllabus for what this software development course would look like, Software & Project Development For Social Scientists. It would have a class/week on using the command prompt, then a week on github, then a few weeks building a python library, then ditto for an R package. And along the way sprinkle in literate programming (notebooks and markdown and Latex), unit testing, and docker.

And here we could discuss how projects are organized. And social science students get exposed to way more stuff that is relevant in a typical data science role. I have over the years also dreamt up other data science related courses as well.

Stats Programming for CJ. This goes through the basics of data manipulation using statistical programming. I would likely have tutorials for R, python, SPSS, and Stata for this. My experience with students is that even if they have had multiple stats classes in grad school, if you ask them “take this incident dataset with dates, and prepare a weekly level file with counts of crimes per week” they don’t know how to do even that simple task (an aggregation). So students need an entry level data manipulation course.
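As an example of the kind of task I mean, here is a minimal Python sketch (standard library only) of the weekly aggregation. The incident dates are hypothetical, and defining weeks as starting on Monday is my own assumption; any consistent week definition works.

```python
from collections import Counter
from datetime import date, timedelta

def week_start(d):
    # Monday of the week containing d (weekday() is 0 for Monday)
    return d - timedelta(days=d.weekday())

def weekly_counts(incident_dates):
    """Return [(week_start, count), ...], including weeks with zero incidents."""
    counts = Counter(week_start(d) for d in incident_dates)
    if not counts:
        return []
    wk, last = min(counts), max(counts)
    rows = []
    while wk <= last:
        # Counter returns 0 for missing weeks, so gaps are filled in
        rows.append((wk, counts[wk]))
        wk += timedelta(weeks=1)
    return rows

# Hypothetical incident dates, purely for illustration
incidents = [date(2022, 1, 3), date(2022, 1, 5), date(2022, 1, 9),
             date(2022, 1, 10), date(2022, 1, 26)]
weekly = weekly_counts(incidents)
for wk, n in weekly:
    print(wk.isoformat(), n)
```

The step students most often miss is the zero-filling: a plain group-by drops weeks with no crimes, which then silently biases any weekly average or time series model.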

Optimization for Criminal Justice (or alt title Operations Research and Machine Learning for CJ). This one is not as developed as some of my other courses, but I think I could make it work for a semester. I think learning linear programming is a really great skill not taught at all in any CJ program I am aware of. I have some small notes on machine learning in my Research Design class for PhD students, but that could be expanded out (week for decision trees/forests, week for boosting, week for neural networks, etc.).

And last, I have made syllabi for the one credit entry level course for undergrad students, and the equivalent course for the new PhD students, College Prep. I don’t think the versions of these classes I took did a very good job. My intro one at Bloomsburg for undergrad had a textbook lol! The only thing I remember about my PhD one was fear mongering over publications (which at that point I had no idea what was going on), and spending the last class with Julie Horney and David McDowall at whatever the place next to the Washington Tavern in Albany was called (?Gingerbread?).

These are of course just in my head at the moment. I have posted my course materials over the years that I have delivered.

I have pitched to a few programs to hire me as a semi teaching professor (and still keep my private sector gig). This set up is not that uncommon in comp sci departments, but no CJ ones I think are interested. Even though I like musing about courses, adjunct pay is way too low to justify this investment, and one should be paid to both develop the material as well as deliver the class.

Books

I have similarly made outlines for books over the years as well. One is Data Science for Crime Analysis with Python. I think there is an opening in the crime analysis market to advance to more professional coding, and so a python book would be good. But the market is overall tiny, my high end guesstimates are only around 800, so hard to justify the effort. (It would be mainly just a collection of my blog posts, but all in a nicer format for everyone to walk through/replicate.)

Another is a reader book, Handbook of Advanced Crime Analysis. That may not be needed though, as Cory Haberman and Liz Groff did a recent book that has quite a bit of overlap (can’t find it at the moment, maybe it is not out yet). Many current advanced techniques are scattered and sometimes difficult to replicate, I figured a reader that also includes code walkthroughs would help quite a few PhD students.

And again if I was still in the publishing game I would like to turn my Poisson course notes into a little Sage green book.

If I was still a professor, this would go hand in hand with developing courses. I know Uni’s do sometimes have grants to develop open source teaching materials, and these would probably best fit those molds. These aren’t going to generate revenue directly from sales.

So complaints and snippets on blog posts are all you are going to get for now from me.

Incoherence in policy preferences for gun violence reduction

One of the most well vetted criminal justice interventions we have at this point is hot spots policing. We have over 50 randomized control trials, showing modest overall crime reductions on average (Braga & Weisburd, 2020). This of course is not perfect; I think Emily Owens sums it up the best in a recent poll of various academics on the issue of gun violence:

So when people argue that hot spots policing doesn’t show long term benefits, all I can do is agree. In a world where we are choosing between doing hot spots vs doing nothing, I think it is wrong to choose the ultra risk averse position of do nothing because you don’t think on average short term crime reductions of 10% in hot spots are worth it. But I cannot say it is a guaranteed outcome and it probably won’t magically reduce crime forever in that hot spot. Mea culpa.

The issue is most people making these risk averse arguments against hot spots, whether academics or pundits or whoever, are not actually risk averse or ultra conservative in accepting scientific evidence of the efficacy of criminal justice policies. This is shown when individuals pile on critiques of hot spots policing – which as I noted are often legitimate in and of themselves – but then take the position that ‘policy X is better than hotspots’. As I said, hot spots is basically the most well vetted CJ intervention we have – you are in a tough pickle to explain why you think any other policy is likely to be a better investment. The case can be made no doubt, but I haven’t seen a real principled cost benefit analysis to prefer another strategy over it to prevent crime.

One recent example of this is on the GritsForBreakfast blog, where Grits advocates for allocating more resources for detectives to prevent violence. This is an example of an incoherent internal position. I am aware of potential ways in which clearing more cases may reduce crimes, even published some myself on that subject (Wheeler et al., 2021). The evidence behind that link is much more shaky however overall (see Mohler et al. 2021 for a conflicting finding), and even Grits himself is very skeptical of general deterrence. So sure you can pile on critiques of hot spots, but putting the blinders on for your preferred policy just means you are an advocate, not following actual evidence.

To be clear, I am not saying more detective resources is a bad thing, nor do I think we should go out and hire a bunch more police to do hot spots (I am mostly advocating for doing more with the same resources). I will sum up my positions at the end of the post, but I am mostly sympathetic to folks advocating for more oversight for police budgets, and I agree that alternatives to policing interventions should get their due as well. But in a not unrealistic zero sum scenario of ‘I can either allocate this position for a patrol officer vs a detective’ I am very skeptical Grits is actually objectively viewing the evidence to come to a principled conclusion for his recommendation, as opposed to ex ante justifying his pre-held opinion.

Unfortunately similarly incoherent positions are not all that uncommon, even among academics.

The CJ Expert Panel Opinions on Gun Violence

As I linked above, there was a recent survey of various academics on potential gun violence reduction strategies. I think these surveys are no doubt good things, albeit not perfect. They are similar to CrimeSolutions.gov, but gather many more opinions on the overall evidence base while being more superficial.

This survey asked about three general strategies, and asked panelists to give Likert responses (strongly agree, agree, neutral, disagree, strongly disagree) on whether those strategies, if implemented, would reduce gun violence, as well as a 1-10 rating for how confident they were. The three strategies were:

  • investing in police-led targeted enforcement directed at places and persons at high risk for gun crime (e.g.,“hot spot” policing; gang enforcement)
  • investing in police-led focused deterrence programs (clearly communicating “carrots and sticks” to local residents identified as high risk, followed by targeted surveillance and enforcement with some community-based support for those who desist from crime)
  • investing in purely community-led violence-interruption programs (community-based outreach workers try to mediate and prevent conflict, without police involvement)

The question explicitly stated you should take into account implementation in real life as well. Again people can as individuals have very pessimistic outlooks on any of these programs. It is however very difficult for me to understand a position where you ‘disagree’ with focused deterrence (FD) in the above answer and also ‘agree’ with violence interrupters (VI).

FD has a meta analysis of 20-some studies at this point (Braga et al., 2018); all are quasi-experimental (e.g. difference-in-differences comparing gang shootings vs non gang shootings, as well as some matched comparisons). So if you want to say – I think it is bunk because there are no good randomized control trials, I cannot argue with this. However there are far fewer studies for VI; Butts et al. (2015) have 5 (I imagine there are some more since then), and they are all quasi-experimental as well. So in this poll of 39 academics, how many agree with VI and disagree with FD?

We end up having 3. I also show in that screen shot the crosstabulation with the hot spots (HS) question. It ends up being the same three people who disagreed on HS/FD and agreed on VI:

I will come back to Makowski and Apel’s justification for their opinion in a bit. There is a free text field (although not everyone filled it in, we have no responses from Harris here), and while I think this is pretty good evidence of shifting evidentiary standards in their justifications, the questions are quite fuzzy and people can of course weight their preferences differently. The venture capitalist approach would say we don’t have much evidence for VI, so maybe it is really good!

So again as a first blush, I checked to see how many people had opinions that I consider here coherent. You can say they all are bad, or you can agree with all the statements, but generally the opinions should be hs >= fd >= vi if one is going by the accumulated evidence in an unbiased manner. I checked how many coherent opinions there are in this survey according to this measure and it is the majority, 29/39 (those at the top of the list are more hawkish, saying strongly agree and agree more often):
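The ordering rule itself is simple to write down. Here is a small Python sketch of it; the numeric Likert coding and the panelist responses are hypothetical placeholders (not the actual survey answers), used only to show how the hs >= fd >= vi check works.

```python
# Assumed numeric coding of the Likert responses (higher = more agreement)
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def is_coherent(hs, fd, vi):
    """True when agreement is ordered hot spots >= focused deterrence >= violence interrupters."""
    h, f, v = LIKERT[hs], LIKERT[fd], LIKERT[vi]
    return h >= f >= v

# Hypothetical responses as (HS, FD, VI), purely for illustration
responses = {
    "Panelist A": ("agree", "agree", "neutral"),
    "Panelist B": ("disagree", "disagree", "agree"),
    "Panelist C": ("neutral", "agree", "neutral"),
}

coherent = {name: is_coherent(*resp) for name, resp in responses.items()}
n_coherent = sum(coherent.values())
print(coherent, n_coherent)
```

Note the rule allows ties, so someone who is uniformly pessimistic (or uniformly optimistic) about all three strategies still counts as coherent.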

Here are those I considered incoherent according to this measure:
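As a minimal sketch of that measure, the hs >= fd >= vi check could be written as below. The numeric Likert coding and the example respondents are assumptions for illustration, not the survey data.

```python
# Hypothetical numeric coding of the Likert responses (an assumption;
# any coding that preserves the ordering would give the same result).
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def is_coherent(hs: str, fd: str, vi: str) -> bool:
    """Return True when support is ordered hs >= fd >= vi."""
    h, f, v = (LIKERT[r.strip().lower()] for r in (hs, fd, vi))
    return h >= f >= v


# Made-up respondents, not rows from the survey.
examples = {
    "agrees with everything": ("agree", "agree", "agree"),
    "disagrees with everything": ("disagree", "disagree", "disagree"),
    "skeptical of policing only": ("disagree", "disagree", "agree"),
}
for label, responses in examples.items():
    print(label, is_coherent(*responses))
```

Uniformly pessimistic or uniformly optimistic answers pass this check. Only answers that rank a strategy with less accumulated evidence above one with more are flagged.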

Looking at the free text field for why people justified particular positions in this table, with the exception of Makowski and Apel, I actually don’t think their opinions are all that unprincipled (although I don’t think how they mapped their responses to agree/disagree is internally consistent). For example, Paolo Pinotti disagrees with lumping in hot spots with people based strategies:

Fair enough and I agree! People based strategies are much more tenuous. Chalfin et al. (2021) have a recent example of gang interdiction, but as far as I’m aware much of the lit on that (say coordinated RICO) is a pretty mixed bag. Pinotti then gives agree to FD and neutral to VI (with no text for either). Another person in this list is Priscilla Hunt, who mentions the heterogeneity of hot spots interventions:

I think this is pretty pessimistic, since the Braga meta analyses often break down by different intervention types and they mostly coalesce around the same effect estimates (about a 10% reduction in hot spots compared to control, albeit with a wide variance). But the question did ask about implementation. Fair enough, hot spots is more fuzzy a category than FD or VI.

Jennifer Doleac is an example where I don’t think they are mapping opinions consistently to what they say, although what they say is reasonable. Here is Doleac being skeptical for FD:

I think Doleac actually means this RCT by Hamilton et al. (2018) – arrests are not the right outcome though (more arrests probably mean the FD strategy is not working actually), so personally I take this study as non-informative as to whether FD reduces gun violence (although there is no issue to see if it has other spillovers on arrests). But Doleac’s opinion is still reasonable in that we have no RCT evidence. Here is Doleac also being skeptical of VI, but giving a neutral Likert response:

She mentions negative externalities for both (which is of course something people should be wary of when implementing these strategies). So for me to say this is incoherent is really sweating the small stuff – I think incorporating the text statement with these opinions is fine, although I believe a more internally consistent response would be neutral for both or disagree for both.

Jillian Carr gives examples of the variance of hot spots:

This is similar to Priscilla’s point, but I think that is partially an error. When you collect more rigorous studies over time, the effect sizes will often shrink (due to selection effects in the scholarly literature process, where early successes are likely to have larger errors, Gelman et al. 2020). And you will have more variance as well and some studies with null effects. This is a good thing – no social science intervention is so foolproof as to always be a 100% success (the lower bound is below 0 for any of these interventions). Offhand the variance of the FD meta analysis is smaller overall than hot spots, so Carr’s opinion of agree on FD can still be coherent, but for VI it is not:

If we are simply tallying when things do not work, we can find examples of that for VI (and FD) as well. So it is unclear why it is OK for FD/VI but not for HS to show some studies that don’t work.
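The earlier point about effect sizes shrinking as more rigorous studies accumulate can be illustrated with a toy simulation. This is a sketch under made-up assumptions (a true 10% reduction, normally distributed estimates, and only estimates more than 1.96 standard errors above zero counting as 'successes'), not a reanalysis of any of the meta analyses cited here.

```python
import random
import statistics


def exaggeration_ratio(true_effect, std_error, n_studies=10_000, seed=1):
    """Mean of the 'successful' estimates divided by the true effect."""
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_studies)]
    # Keep only estimates more than 1.96 standard errors above zero,
    # a stand-in for the early successes that get noticed.
    successes = [e for e in estimates if e / std_error > 1.96]
    return statistics.mean(successes) / true_effect


# Hypothetical numbers: same true effect, noisy early studies
# versus more precise later ones.
early = exaggeration_ratio(true_effect=0.10, std_error=0.10)
later = exaggeration_ratio(true_effect=0.10, std_error=0.03)
print(f"noisy early studies overstate the effect by a factor of {early:.2f}")
print(f"precise later studies overstate the effect by a factor of {later:.2f}")
```

With noisy studies, every estimate that clears the threshold is by construction well above the true effect, so the early literature looks better than reality. As precision improves, the selected estimates settle closer to the truth, which shows up as a shrinking effect size even though the intervention has not changed.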

There is an actual strategy I mentioned earlier where you might actually play the variance to suggest particular policies – we know hot spots (and now FD) have modest crime reducing effects on average. So you may say ‘I think we should do VI, because it may have a higher upside, we don’t know’. But that strikes me as a very generous interpretation of Carr’s comments here (which to be fair are limited to only a few sentences). I think if you say ‘the variance of hot spots is high’ as a critique, you can’t hang your hat on VI and still be internally coherent. You are just swapping out a known variance for an unknown one.

Makowski and Apel’s Incoherence?

I have saved for last Michael Makowski and Robert Apel’s responses. I will start out by saying I don’t know all of the people in this sample, but the ones I do know are very intelligent people. You should generally listen to what they say, although I think they show some bias here in these responses. We all have biases, and I am sure you can trawl up examples of my opinions over time that are incoherent as well.

I do not know Michael Makowski, so I don’t mean to pick on him in particular here. I am sure you should listen to him over me for many opinions on many different topics. For example, I agree with his proposal to sever seized assets from police budgets. But just focusing on what he does say here (and good for him for actually saying why he chose his opinions, he did not have to), for his opinion on hot spots:

So Makowski thinks policing is understaffed, but hot spots is a no go. OK, I am not sure what he expects those additional officers to do – answer calls for service and drive around randomly? I’d note hot spots can simultaneously be coordinated with the community directly – I know of no better examples of community policing than foot patrols (e.g. Haberman & Stiver, 2019 for an example). But the question was not that specific about that particular hot spot strategy, so that is not a critique of Makowski’s position.

We have so many meta analyses of hot spots now, that we also have meta analyses of displacement (Bowers et al., 2011), and the Braga meta analyses of direct effects have all included supplemental analyses of displacement as well. Good news! We actually often find evidence of diffusion of benefits in quite a few studies. Banking on secondary effects that are larger/nullify direct effects is a strange position to take, but I have seen others take it as well. The Grits blog I linked to earlier mentions that these studies only measure displacement in the immediate area. Tis true, these studies do not measure displacement in surrounding suburbs, nor displacement to the North Pole. Guess we will never know if hot spots reduce crime worldwide. Note however this applies to literally any intervention!

For Makowski’s similarly pessimistic take on FD:

So at least Makowski is laying his cards on the table – the question did ask about implementation, and here he is saying he doesn’t think police have the capability to implement FD. If you go in assuming police are incompetent, then yeah, no matter what intervention the police might do you would disagree they can reduce violence. This is true for any social policy. But Makowski thinks other orgs (not the police) are good to go – OK.

Again, we have a meta analysis showing that quite a few agencies can implement FD competently and subsequently reduce gun violence, although these are no doubt a self selected set of agencies that are more competent than the average police department. I can’t disagree if you interpret the question as: draw a random police department out of a hat, can they competently implement FD? (Most of these will be agencies with only a handful of officers in rural places who don’t have large gun violence problems.) The confidence score is low from Makowski here though (4/10), so while I think those two opinions are wrong, they are at least for the most part internally consistent with each other.

I’d also note that although the question explicitly states FD is surveillance, I think that is a bit of a broad brush. FD is explicitly against this in some respects – Kennedy talks about telling group members in the meetings that the police don’t give a shit about minor infractions – they only care if a body drops. It is less surveillancy than things like CCTV or targeted gang takedowns for example (or maybe even HS). But it is right in the question, so a bit unfair to criticize someone for focusing on that.

Like I said if someone wants to be uber critical across the board you can’t really argue with that. My problem comes with Makowski’s opinion of VI:

VI is quite explicitly divorced from policing – it is a core part of the model. So when interrupters talk with current gang members, they can be assured the interrupters will not narc on them to police. The interrupters don’t work with the police at all. So all the stuff about complementary policing and procedural justice is just a total non sequitur (and it seems strange to say hot spots no, but boots on the ground are good).

So while Makowski is skeptical of HS/FD, he thinks some mechanism he just made up in his own mind (VI improving procedural justice for police) with no empirical evidence will reduce gun violence. This is the incoherent part. For those wondering, while I can think procedural justice is a good thing, thinking it will reduce crime has no empirical support (Nagin & Telep, 2020).

I’d note that while Makowski thinks police can’t competently implement FD, he has no such qualms about other agencies implementing VI. I hate to be the bearer of bad news for folks, but VI programs quite often have issues as well. Baltimore’s program has over the years had well known cases of people selling drugs and still being quite active in violence themselves. But I guess people are solely concerned about negative externalities from policing and just turn a blind eye to other non policing interventions.

Alright, so now onto Bob Apel. A bit off topic – one of the books that got me interested in research/grad school was Levitt and Dubner’s Freakonomics. I had Robert Apel for research design class at SUNY Albany, and Bob’s class really formalized the counterfactual logic that I encountered in that book for me. It was really what I would consider a transformative experience from student to researcher for me. That said, it is really hard for me to see a reasonable defense of Bob’s opinions here. We have a similar story to the one we have seen before in the respondents for hot spots, there is high variance:

The ‘specific to gun violence’ point is potentially a red herring. The Braga meta analyses do breakdowns of effects on property vs violent crime, with violent typically having smaller but quite similar overall effect sizes (that includes more than just gun violence though). We do have studies specific to gun violence: Sherman et al. (1995) is actually one of the studies with the highest effect sizes in those meta analyses, but is of course one study. I disagree that the studies need to be specific to gun violence to be applicable, hot spots are likely to have effects on multiple crimes. But I think if you only count reduced shootings (and not violent crime as a whole), hot spots are tough, as even in places with high numbers of shootings they are typically too small of an N to justify a hot spot at a particular location. So again all by itself, I can see a reasonably skeptical person having this position, and Bob did give a low confidence score of 3.

And here we go for Bob’s opinion of FD:

Again, reasonably skeptical. I can buy that. Saying we need more evidence seems to me to be conflicting advice (maybe Bob saying it is worth trying to see if it works, just he disagrees it will work). The question does ask if violence will be reduced, not if it is worth trying. I think a neutral response would have been more consistent with what Bob said in the text field. But again if people want to be uber pessimistic I cannot argue so much against that in particular, and Bob also had a low confidence.

Again though we get to the opinion of VI:

And we see Bob does think VI will reduce violence, but not due to direct effects, but indirect effects of positive spillovers. Similar to Makowski these are mechanisms not empirically validated in any way – just made up. So we get critiques of sample selection for HS, and SUTVA for FD, but Bob agrees VI will reduce violence via agencies collecting rents from administering the program. Okey Dokey!

For the part about the interrupters being employed as a potential positive externality – again you can point to examples where the interrupters are still engaged in criminal activity. So a reasonably skeptical person may think VI could actually be worse in terms of such spillovers. Presumably a well run program would hire people who are basically no risk to engage in violence themselves, so banking on employing a dozen interrupters to reduce gun violence is silly, but OK. (It is a different program to give cash transfers to high risk people themselves.)

I’d note in a few of the cities I have worked/am familiar with, the Catholic orgs that have administered VI are not locality specific. So rents they extract from administering the program are not per se even funneled back into the specific community. But sure, maybe they do some other program that reduces gun violence in some other place. Kind of a nightmare for someone who is actually concerned about SUTVA. This also seems to me to be logic stemming from Patrick Sharkey’s work on non-profits (Sharkey et al., 2017). If Bob was being equally critical of that work as of HS/FD, it is non-experimental and just one study. But I guess it is OK to ignore study weaknesses for non police interventions.

For both Bob and Makowski here I could concoct some sort of cost benefit analysis to justify these positions. If you think harms from policing are infinite, then sure VI makes sense and the others don’t. A more charitable way to put it would be Makowski and Bob have shown lexicographic preferences for non policing solutions over policing ones, no matter what the empirical evidence for those strategies. So be it – it isn’t opinions based on scientific evidence though, they are just word souping to justify their pre held positions on the topic.

What do I think?

God bless you if you are still reading this rant 4k words in. But I cannot end by just bagging on other people’s opinions without giving my own, can I? If I were to answer this survey as is, I guess I would do HS/agree (confidence 6), FD/agree (confidence 5), VI/agree (confidence 3). Now if you changed the question to ‘you get even odds, how much money would you put on reduced violence if a random city with recent gun violence increases implemented this strategy’, I would put down $0.00 (the variance people talked about is real!). So maybe a more internally consistent position would be neutral across the board for these questions with a confidence of 0. I don’t know.

This isn’t the same as saying whether a city should invest in some of these policies. If you properly value all the issues with gun violence, I think each of these strategies is worth the attempt – none of them are guaranteed to work though (any big social problem is hard to fix)! In terms of hot spots and FD, I actually think these have a strong enough evidence base at this point to justify perpetual internal positions at PDs devoted to these functions. The same as police have special investigation units focused on drugs, they could have officers devoted to implementing FD. Ditto for community police officers, who could be specifically devoted to COP/POP at hot spots of crime.

I also agree with the editorial on VI linked above – even given the problems with Safe Streets in Baltimore, it is still worth it to make the program better, not just toss it out.

Subsequently, if the question were changed to ‘I am a mayor and have 500k burning a hole in my pocket, which one of these programs do I fund?’ Again I would highly encourage PDs to work with what they have already to implement HS, e.g. many predictive policing/hot spots interventions are nudge style, just spend some extra time in this spot (e.g. Carter et al., 2021), and I already gave the example of how PDs invest in different roles that would likely be better shifted to empirically vetted strategies. And FD is mostly labor costs as well (Burgdorf & Kilmer, 2015). So unlike what Makowski implies, these are not rocket science and necessitate no large capital investments – it is within the capabilities of police to competently execute these programs. So I think a totally reasonable response from that mayor is to tell the police to go suck on a lemon (you should do these things already), and fund VI. I think the question of right sizing police budgets and how police internally dole out responsibilities can be reasoned about separately.

Gosh, some of my academic colleagues must wonder how I sleep at night, suggesting some policing can be effective while simultaneously thinking it is worth funding non police programs.

I have no particular opinion about who should run VI. VI is also quite cheap – I suspect admin/fringe costs are higher than the salaries for the interrupters. It is a dangerous thing we are asking these interrupters to do for not much money. Apel above presumes it should be a non-profit community org overseeing the interrupters – I see no issue if someone wanted to leverage current govt agencies to administer this (say the county dept of social services or public health). I actually think they should be proactive – Buffalo PD had a program where they did house visits to folks at high risk after a shooting. VI could do the same and be proactive and target those with the highest potential spillovers.

One of the things that frustrates me about folks who are hyper critical of HS and FD is how they lean on the potential for negative externalities. The NAS report on proactive policing lays out quite a few potential mechanisms via which negative externalities can occur (National Academies of Sciences, Engineering, and Medicine, 2018). It is evidence light however, and many studies which explicitly look for these negative externalities in conjunction with HS do not find them (Brantingham et al., 2018; Carter et al., 2021; Ratcliffe et al., 2015). I have published about how to weigh HS with relative contact with the CJ system (Wheeler, 2020). The folks in that big city now call it precision policing, and this is likely to greatly reduce absolute contact with the CJ system as well (Manski & Nagin, 2017).

People saying no hot spots because maybe bad things will happen are intentionally conflating different types of policing interventions. Former widespread stop, question and frisk policies do not forever vilify any type of proactive policing strategy. To reasonably justify any program you need to make assumptions that the program will be faithfully implemented. Hot spots won’t work if a PD just draws blobs on the map and does no coordinated strategy with that information. The same as VI won’t work if there is no oversight of interrupters.

For sure if you want to make the worst assumptions about police and the best assumptions about everyone else, you can say disagree with HS and agree with VI. Probably some of the opinions on that survey do the same in reverse – as I mention here I think the evidence for VI is plenty good enough to continue to invest and implement such programs. And all of these programs should monitor outcomes – both good and bad – at the onset. That is within the capability of crime analysis units and local govt to do this (Morgan et al., 2017).

I debated closing the comments for this post. I will leave them open, but if any of the folks I critique here wish to respond, I would prefer a longer form response, and I will publish it on my blog and/or link to your response. I don’t think the shorter comments are very productive, as you can see from my back and forth with Grits earlier, which produced no resolution.

References