Preprint: Analysis of LED street light conversions on firearm crimes in Dallas, Texas

I have a new pre-print out, Analysis of LED street light conversions on firearm crimes in Dallas, Texas. This work was conducted in collaboration with the Child Poverty Action Lab (CPAL), in reference to the Dallas Taskforce report. Instead of installing new lights at the hotspots CPAL suggested, though, Dallas stepped up conversion of existing street lamps to LED. Here is the number of conversions over time:

And here is an aggregated quadrat map at quarter square mile grid cells (of the total number of LED conversions):

I use a difference-in-differences (DiD) design (comparing firearm crimes in the daytime vs the nighttime) to test whether the cumulative LED conversions led to reduced firearm crimes at night. Overall I don't find any compelling evidence that firearm crimes were reduced post LED installs (either for a single overall effect or looking at spatial heterogeneity). This graph shows that, in the aggregate, the DiD parallel trends assumption holds citywide (on the log scale), but the identification strategy really relies on the DiD assumption holding within each grid cell (if anyone has good advice for graphically showing that with noisy, low count data for many units, I am all ears!).
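
To give a concrete sense of the setup, here is a minimal sketch of a day/night difference-in-differences Poisson regression. This is not the exact specification in the pre-print, and the data frame and variable names below are hypothetical:

# hypothetical long-format data frame `dat`, one row per grid cell x quarter x period
#   crimes    - firearm crime count
#   night     - 1 for the nighttime period, 0 for daytime
#   cum_led   - cumulative LED conversions in the grid cell up to that quarter
#   cell, qtr - grid cell and quarter identifiers
did_mod <- glm(crimes ~ night*cum_led + factor(cell) + factor(qtr),
               family = poisson, data = dat)
summary(did_mod)$coefficients["night:cum_led", ] # the DiD estimate of interest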

For now I just wanted to share the pre-print. To publish in peer review I would need to do a bunch more work to get the lit review where most CJ reviewers would want it. I also want to work on spatial covariance adjustments (similar to here, but for GLM models). I have some R code started for that, but it needs much more work/testing before it is ready for primetime. (Although, as I say in the pre-print, those adjustments should just make the standard errors larger; they won't impact the point estimates.)

So no guarantees that will be done anytime in the near future. But there is no reason not to share the pre-print in the meantime.

NIJ grants funding gun violence research

Before I get into the nitty gritty of this post, a few notes. First, my next post in the Criminal Justician series on ASEBP is up, Violent Crime Interventions That are Worth it. I discuss more of the costs of implementing hot spots policing and focused deterrence from the police department's perspective, and why they are clearly worthwhile investments for many departments facing violence problems.

Second, I want to point folks to Jacob Kaplan's blog, whose most recent post is The Covid Kings of Salami. I disagree with some of Jacob's thoughts (I think smaller papers are OK, and that policing what counts as big enough is a waste of time). But if you like my posts on CJ topics, you should check out Jacob's as well.

Now onto the title – this is a work in progress at the moment, but I am working with Scott Jacques on the openness of funded US criminology research. This is a short post in response to the oft-mistaken idea that gun violence research is banned in the US. That idea is confused logic stemming from the Dickey Amendment, which bans the CDC from using its federal funds to advocate for gun control – not from funding gun violence research altogether.

There are other agencies that fund gun violence research; in particular, here I have scraped data from the National Institute of Justice (which I think is likely the largest funder in this area). Here is some python code showing some analyses of those awards.

So first, here you can download and see the size of the scraped dataset of NIJ awards:

import pandas as pd

# award data scraped, stay tuned for code for that!
award_url = 'https://dl.dropbox.com/s/eon4iokv0qpllgl/NIJ_Awards.csv?dl=0'
award_df = pd.read_csv(award_url)
print(award_df.shape)
print(award_df['award_text'][0])

So as a first-blush check for awards related to gun violence, we can just search the text of the award narrative for relevant terms; here I search for GUN VIOLENCE and FIREARM. A more thorough investigation would either hand-code the 7k awards or the original solicitations, but I think this will likely be largely accurate (probably with slightly more false positives than false negatives).

award_df['award_textU'] = award_df['award_text'].str.upper()

# Lets try to find any of these (other text?)
word_list = ['GUN VIOLENCE','FIREARM']

for w in word_list:
    award_df[w] = 1*(award_df['award_textU'].str.find(w) > -1)

award_df['AnyGun'] = 1*(award_df[word_list].sum(axis=1) > 0)
print(award_df['AnyGun'].sum())

So we can see that we have 1,082 awards related to gun violence (out of 7,215 listed by the NIJ). Let's check out the total funding for these awards:

# Lets figure out the total allocated
award_df['AwardVal'] = award_df['field-award-amount'].str.strip()
award_df['AwardVal'] = award_df['AwardVal'].replace(r'[\$,]', '', regex=True)
award_df['AwardVal'] = pd.to_numeric(award_df['AwardVal'])
award_df['Tot'] = 1

cf = ['Tot','AwardVal']
award_df.groupby('AnyGun',as_index=False)[cf].sum()

So among the listed awards (which go back to 1998, but appear more consistently filled in starting in 2002), we have over $300 million in grant awards related to gun violence/firearm research. Here we can see the breakdown over time.

# See awards over time
gun_awards = award_df[award_df['AnyGun'] == 1].copy()
gun_awards.groupby('field-fiscal-year',as_index=False)[cf].sum()

So the awards funded by NIJ no doubt have a different flavor/orientation than if the same money had come from the CDC. (There are other orgs, though, like NSF, that I am sure have funded research projects relevant to gun violence over time as well.) Sometimes people distinguish between “public health” vs “criminal justice” approaches, but this is a pretty superficial dichotomy (plenty of people in public health have gotten NIJ awards).

So you certainly could argue the Dickey Amendment changed the nature of the gun violence research being conducted. And since the CDC budget is so massive, I suppose you could argue that it reduced the overall amount of gun violence research being funded (although it is likely zero sum; more for firearm research would have slashed some other area). But you could use the same argument to say NIJ is underfunded, instead of advocating for the CDC to write the checks.

But the stronger statement I often see stated, that firearm research is entirely banned in the US, is not even close to being correct.

New paper: An Open Source Replication of a Winning Recidivism Prediction Model

Our paper on the NIJ forecasting competition (Gio Circo is the first author) is now out online first in the International Journal of Offender Therapy and Comparative Criminology (Circo & Wheeler, 2022). (Eventually it will be in a special issue on replications and open science organized by Chad Posick, Michael Rocque, and Eric Connolly.)

We ended up doing the same type of biasing as Mohler and Porter (2022) to meet the fairness constraints. Essentially we biased results to say no one was high risk, and this resulted in “fair” predictions. With fairness constraints or penalties you sometimes have to be careful what you wish for. And because not enough students signed up, Gio and I had more of the winnings distributed to us in the fairness competition (although we did quite well in the round 2 competition even with the biasing).
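
To illustrate the type of biasing at play, here is a toy sketch (not our competition code; the threshold and variable names here are made up). If the fairness penalty is driven by disparities in false positives among people flagged as high risk, capping every predicted probability below the high risk threshold trivially zeroes out the penalty:

# toy example of "fair by never flagging anyone", not the actual submission
set.seed(10)
p_hat  <- runif(100)                 # hypothetical predicted probabilities
cutoff <- 0.5                        # hypothetical threshold for flagging high risk
p_fair <- pmin(p_hat, cutoff - 0.01) # bias predictions so no one crosses the threshold
sum(p_hat  >= cutoff)                # some people flagged under the original predictions
sum(p_fair >= cutoff)                # zero flagged, so zero false positives in every group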

So while that paper is locked down, we have the NIJ tech paper on CrimRXiv, and our ugly code on github. But you can always email for a copy of the actual published paper as well.

Of course, since I am not an academic anymore, I am not uber focused on potential future work. I would like to learn more about survival-type machine learning forecasts and apply them to recidivism data (instead of doing discrete 1, 2, 3 year predictions). But my experience is that machine learning models need very large datasets; even the 20k rows here are on the fringe where regressions are close to equivalent to non-linear and tree based models.

Another potential application is simple models. Cynthia Rudin has quite a bit of recent work on interpretable trees for this (e.g. Liu et al., 2022), and my linked post has examples for simple regression weights. I suspect the simple regression weights will work reasonably well for this data. Likely not well enough to place on the scoreboard of the competition, but well enough in practice that they would be totally reasonable to swap in, given the simpler results (Wheeler et al., 2019).

But for this paper, the main takeaway Gio and I want to convey is that creating a (good) model using open source data is totally within the capabilities of PhD criminal justice researchers and data scientists working for these state agencies. These are quantitative skills I wish more students within our field would pursue, as it makes it easier for me to hire you as a data scientist!


A new series: The Criminal Justician

In partnership with the American Society of Evidence Based Policing (ASEBP), I have started a new blog series on their website, The Criminal Justician. The first post is up, Denver’s STAR Program and Disorder Crime Reductions, which you can read if you have a membership.

ASEBP is an organization that brings together police officers in the field, as well as researchers, policy makers, and community leaders, to promote scientific progress in the policing profession. For officers, analysts, and police researchers wanting to make a difference, it is definitely an organization worth joining and participating in the trainings/conferences.

The blog series will be me discussing recent scientific research of relevance to policing. I break down complicated empirical results to be more accessible to a wider audience – either to understand the implications for the field or to critique the potential findings. If you want a sense of the series before you pony up the few dollars to join ASEBP, here are some examples of past articles of similar scope on my personal blog:

I will still blog here about more technical things, like optimizing functions/statistical coding. But my more opinion-oriented pieces on current policing research will probably head over to the ASEBP blog series. In the hopper are topics like police scorecards, racial bias in predictive policing, and early intervention systems (with plans to post an article around once a month).

Outputs vs Outcomes and Agile

For my criminal justice followers, there is a project planning strategy, Agile, that dominates software engineering. The idea behind Agile is to formulate plans in short sprints (we do two-week sprints at my work). So we have very broad objectives (Epics) that can span a significant amount of time. Then we have shorter goals (Stories) that are intended to fit within a sprint. Within each Story, we further break down our work into specific Tasks for which we estimate how long they will take. So something at my work may look like:

  • Build Model to Predict Readmission for Heart Attacks (Epic)
    • Create data pipeline for training data (Story)
      • SQL functions to prepare data (Task, 2 days)
      • python code to parameterize SQL (Task, 3 days)
      • Unit tests for python code (Task, 1 day)
    • Build ML Model (Story)
      • evaluate different prediction models (Task, 2 days)
    • Deploy ML Model in production (Story)

Etc. People at this point often compare Agile vs Waterfall, where Waterfall is more long-term planning (often on, say, a quarterly schedule). And Agile, per its name, is supposed to be more flexible and to modify plans on short notice. Most of my problems with Agile could apply to Waterfall planning as well, though – short term project planning (almost by its nature) has to be almost solely focused on outputs and not outcomes.

Folks with a CJ background will know what I am talking about here. Police management systems often contrast focusing on easily quantifiable outputs, such as racking up traffic tickets and low level arrests, with achieving real outcomes, such as increased traffic safety or reduced violent crime. While telling police officers to never do those things does not make sense, you can give feedback/nudges to engage in higher quality short term outputs that should better promote the longer term outcomes you want.

Agile boards (where we post these Epics/Stories/Tasks, for people to keep tabs on what everyone is doing) are just littered with outputs that have little to no tangible connection to real life outcomes. Take my Heart Attack example. It may be that there is a current heart attack prediction system in place based on a simple scorecard – utility in that case would mean comparing how much better my system is than the simpler scorecard method. If we are evaluating via dollars and cents, it may only make sense to evaluate how effective my system is at promoting better health outcomes (e.g. evaluating how well my predictive system reduces follow-up heart attacks or some other measure of health outcomes).

The former example is not a unit of time (and so counts for nothing in the Agile framework). Although in reality it should be the first thing you do (and drop the project if you cannot sufficiently beat a simple baseline). You don’t get brownie points for failing fast in this framework though. In fact you look bad, as you did not deliver on a particular product.

The latter example unfortunately cannot be done in a short time period – we are often talking about timescales of years at that point instead of weeks. People can look uber productive on their Agile board, and can easily accomplish nothing of value over broad periods of time.

I am writing this post as we are going through our yearly crisis of “we don’t do Agile right” at my workplace. There are other, more day-to-day struggles with Agile – who defines what counts as meeting an objective? Are we being sufficiently specific in our task documentation? Are people over/under worked on different parts of the team? Are we accurately estimating the time it takes to do certain tasks? Does our estimate include only actual work, or does it fold in uncertainty due to things other teams are responsible for?

These short term crises of “we aren’t doing Agile right” totally miss the boat for me though. I formulate my work strategy by defining end goals, and then work backwards to plan the incremental outputs necessary to achieve those end goals. The incremental outputs are a means to that end goal, not the ends themselves. I don't really care if you don't fill out your short term tasks or mis-estimate something to take a week instead of a day – I (and the business) care about the value added by the software/models you are building. It isn't clear to me that looking good on your Agile board helps accomplish that.

Gun Buy Back Programs Probably Don’t Work

When I was still a criminology professor, I remember one day, while out getting groceries, receiving a cold call from a police department interested in collaborating. They asked if I could provide evidence to support their city's plan to implement sex offender residence restrictions. While taking the call I was walking past a stand for the DARE program.

A bit of inside pool for my criminology friends, but for others: these are programs that have clearly been shown not to be effective. Sex offender residence restrictions have no evidence they reduce crimes, and DARE has very good evidence it does not work (and some mild evidence of iatrogenic effects – i.e. increased drug use among teenagers exposed to the program).

This isn't a critique of the PD who called me – academics just don't do a great job of getting the word out. (And maybe we can't do that effectively; maybe PDs need to have in-house people take something like the American Society of Evidence Based Policing course.)

Gun buy back programs are similar in terms of being popular but sparse on supporting evidence. Despite little evidence that they are effective, cities still continue to support these programs. Both Durham and Raleigh recently implemented buy backs, for example.


What is a gun buy back program? Police departments encourage people to turn in guns – no questions asked – and in return people get money/gift cards for the firearms (often in the range of $50 to $200). The logic behind such programs is that turning in firearms prevents them from being used in subsequent crimes (or suicides). The no-questions-asked part is meant to ensure that even individuals who have used a gun in a criminal manner are not deterred from turning it in.

There are not any meta-analyses of these programs, but the closest thing to one, a multi-city study by Ferrazares et al. (2021) analyzing over 300 gun buy backs, does not find macro, city-level evidence of reduced gun crimes subsequent to buy back programs. While one can cherry pick individual studies that have some evidence of efficacy (Braga & Wintemute, 2013; Phillips et al., 2013), the way these programs are typically run in the US means they are probably not effective at reducing gun crime.

Let's go back to first principles – if we knew with 100% certainty that a gun would be used in the commission of a crime, then “buying” that gun would likely be worth it. (You could say an inelastic criminal will find, or maybe even purchase with the reward, a new gun (Mullin, 2001), so the buy back does not prevent any future crimes, but I am ignoring that here.)

We do not know for sure that any particular gun will be used in the commission of a crime – but let's try to put some guesstimates on the probability that it will be. There are actually more guns in the US than there are people, but let's go with a low-end total of 300 million guns (Braga & Wintemute, 2013). There are around half a million crimes committed with a firearm each year (Planty et al., 2013). So that gives us 500,000/300,000,000 ~ 1/600. So I would guess if you randomly confiscated 600 guns in the US, you would prevent 1 firearm crime.

There are things that may make this an underestimate (one gun can be involved in multiple crimes, although the expected number of crimes prevented is the same), and others that make it an overestimate (more guns, fewer violent crimes, and replacement as mentioned earlier). But I think this estimate is ballpark reasonable – so let's say 500 to 1,000 guns to prevent 1 firearm crime. If we are giving out $200 gift cards per weapon returned, that means we need to drop $100k to $200k to prevent one firearm crime.
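
Writing that back-of-the-envelope arithmetic out explicitly (same guesstimates as above):

guns_us    <- 300e6              # low end estimate of guns in the US
gun_crimes <- 500e3              # firearm crimes per year
gun_crimes/guns_us               # ~1/600 chance a random gun is used in a crime
guns_us/gun_crimes               # ~600 guns confiscated per firearm crime prevented

reward <- 200                    # gift card per returned gun
c(500,1000)*reward               # ~$100k to $200k per firearm crime prevented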

Note I am saying one firearm crime (not one homicide); if we were talking about preventing one homicide with $200k, that is probably worth it. It is not a real great return on investment, though, for the more general category of firearm crimes, which have costs to society typically in the lower 5-digit range.

Gun buy backs have a few things going against them, though, even in this calculation. First, the guns returned are not a random sample of guns. They tend to be older, long guns, and often not working (Kuhn et al., 2021). It is very likely the probability those specific guns would be used in the commission of a crime is smaller than 1/600. Second is just the pure scope of the programs: they often bring in only around a few hundred firearms for any particular city. This is just too small a number to reasonably tell whether they are effective (and is what makes the Australian case so different).

Gun buy backs are popular, and plausibly may be “worth it”. (If the program encourages working handguns (Braga & Wintemute, 2013) and the dollar rewards are more like $25-$50, it is more palatable in my mind in terms of at least potentially being worth it from a cost/benefit perspective.) But the way most of these programs are run, studies of them are hopeless for identifying any meaningful macro-level crime reductions (at the city level, programs would need to be more like 20 times larger in scope to notice reductions relative to typical background variation). So I think more proven strategies, such as focused deterrence or focusing on chronic offenders, are likely better investments for cities/police departments than gun buy backs.


The limit on the cost efficiency of gun violence interventions

Imagine a scenario where someone came out with technology that would reduce traffic fatalities at a particular curve in a road by 100%. But installation and maintenance of the tech would cost $36 million per 100 feet per year. It is unlikely anyone would invest in such technology – perhaps if you had a very short stretch of road that resulted in a fatality on average once a month it would be worth it. In that case, the tech would cost $36/12 = $3 million to ‘save a life’.

There are unlikely to be any stretches of road with this high a fatality rate, though (and this does not consider the potential opportunity costs of less effective but cheaper interventions). So if we had a location with a fatality once a year, we are then paying $36 million to save one life. We ultimately have upper limits on what society will pay to save a life.
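
Writing out that arithmetic:

tech_cost <- 36e6   # hypothetical cost per 100 feet per year
tech_cost/12        # $3 million per life saved with a fatality once a month
tech_cost/1         # $36 million per life saved with a fatality once a year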

Working on gun violence prevention is very similar. While gun violence has potentially very large costs to society – see Everytown's estimates of $50k for a nonfatal shooting and $270k for a fatality – preventing that gun violence is another matter.

The translation to gun violence interventions from the traffic scenario is ‘we don’t have people at super high risk of gun violence’ and ‘the interventions are not going to be 100% effective’.

My motivation to write this post is the READI intervention in Chicago, which has a price tag of around $60k per participant per 20 months. What makes this program then ‘worth it’ is the probability of entrants being involved with gun violence multiplied by the efficacy of the program.

Based on other work I have done on predicting gun violence (Wheeler et al., 2019b), I guesstimate that any gun violence predictive instrument spread over a large number of individuals will have, at best, positive predictive probabilities of 10% over a year. A 10% risk of being involved in gun violence is incredibly high; a typical person has something more on the order of a 0.01% to 0.001% risk. So what this means is that if you have a group of 100 high risk people, I would expect ~10 of them to be involved in a shooting (either as a victim or offender).

This lines up almost perfectly with READI, whose control group had 10% shot over 20 months. So I think READI actually did a very good job of referring high risk individuals to the program. I don't think they could do a better job of referring even higher risk people.

This, though, implies that even with 100% efficacy (i.e. anyone who is in READI goes to 0% risk of involvement in gun violence), you need to treat ~10 people to prevent ~1 shooting victimization. 100% efficacy is not realistic, so let's go with 50% efficacy (which would still be really good for a crime prevention program, and is probably way optimistic given the null results). This implies you need to treat ~20 people to prevent ~1 shooting, which results in a price tag of $1.2 million to prevent 1 shooting victimization. If we only count the costs of proximal gun violence (per the Everytown estimates earlier), READI is cost-inefficient from the get go – at 100% efficacy you would still need to treat around 10 people (so $600k) to prevent a single shooting.
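
Here is that cost-per-shooting-prevented arithmetic written out (same numbers as in the text):

cost_per_person <- 60e3          # READI price tag per participant (~20 months)
base_risk       <- 0.10          # ~10% of the control group shot over 20 months
efficacy        <- c(1.0, 0.5)   # perfect vs still-very-good efficacy

1/(base_risk*efficacy)                    # treat ~10 and ~20 people per shooting prevented
(1/(base_risk*efficacy))*cost_per_person  # ~$600k and ~$1.2 million per shooting prevented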

The Chicago Crime Lab uses estimates from Cohen & Piquero (2009) to say that READI has a return on investment of 3:1, so each $60k spent saves around $180k. Those estimates, however, count reductions over the life course, including the person's lost productivity, not just state/victim costs, so I think they are likely quite optimistic for the kind of ROI people care about. (Productivity estimates always seem suspect to me; models I have put into production in my career have generated over 8 digits of revenue, but if I did not do that work someone else would have. I am replaceable.)

I think it is likely one can identify other, more cost-effective programs to reduce gun violence compared to READI. READI has several components, including a caseworker, cognitive behavioral therapy (CBT), and a jobs program. I do not know the cost breakdown for each, but it may be that some parts drive up the price without much benefit over the others.

I am not as much on the CBT bandwagon as others (I think it looks quite a bit like the other psych research that has come into question more recently), but I think caseworkers are a good idea. The police department I worked with on the VOID paper had caseworkers as part of their intervention, as did focused deterrence programs I have been involved with (Wheeler et al., 2019a). Wes Skogan even discussed how caseworkers/outreach workers were part of Chicago CEASEFIRE on Jerry Ratcliffe's podcast. For those not familiar, caseworkers are just social workers assigned to these high risk individuals; they often help their charges with things like getting an ID/driver's license and applying for jobs. So I think just an intervention of caseworkers assigned to high risk people is called for.

You may think many of these high risk individuals are not amenable to treatment, but my experience is that a non-trivial number of them are willing to sit down and try to straighten their lives out, and they need help to do that. For those people, caseworkers are a good potential solution.

Although I am a proponent of hot spots policing as well, if we are just talking about shootings, I don't think hot spots will have a good return on investment either (Drake et al., 2022). Only if you widen the net to other crimes do I think hot spots make sense (Wheeler & Reuter, 2021). And maybe here I am being too harsh – if you count reductions in other criminal behavior, READI's cost-benefit ratio likely looks better. But just considering gun violence, I think dropping $60k per person is never going to be worth it in realistic high gun violence risk populations.


Home buying and collective efficacy

With the recent large appreciation in home values, around 20% in the prior year, there has been an increase in private investors purchasing homes to rent out. Recent stories on this by Tyler Dukes and colleagues collated open parcel data to identify the scope of these companies across all of North Carolina.

For a bit of background, I tried to purchase a home in Plano, TX in early 2018. Homes in our price range at that time were going in a single day, typically a few thousand over asking price.

Fast forward to early 2021: I am a fully remote data scientist instead of a professor, and the kiddo is in online school. Even with the pay bump, housing competition was even worse in Plano at that point, so we knew we were likely going to have to move school districts to be able to purchase a home. So we decided to strike out, and ended up looking around Raleigh. We quite quickly decided to purchase a new-build home in the suburb of Clayton (I totally recommend our realtor, Ellen Pitts; her crew did quite a bit of work for us remotely).

It appears I was lucky to get in when I did – many of the new developments in the area are being heavily scooped up by these equity firms (and rent for my home would be ~$600 more than the mortgage). So I downloaded the public data Dukes put together, and loaded it into Excel to make a quick map of the properties.

For a NC state view, we have big clusters in Charlotte, Greensboro and Raleigh:

We can zoom in, and here is an overview of the Triangle area:

So you can see that inside the loop in Raleigh is pretty sparse, but many of the newer developments on the east side have many more of the private-firm-purchased houses. Charlotte is much more infilled with these private firms purchasing properties.

Zooming in even further to my town of Clayton, there is quite a bit of variance in the proportion of private vs residential purchases across developments. My development is less than 50% private purchases, while several developments appear almost 100% private purchased. (This is not my home/neighborhood, FYI.)


So what does this have to do with collective efficacy? Traditionally, areas with higher home ownership have been associated with lower rates of crime. For non-criminologists reading my blog, one of the most prominent criminological theories is that state actions only move the needle slightly on increasing/decreasing crime; people enforcing social norms is a bigger factor that explains high crime vs low crime areas. Places where people churn in and out more frequently – which occurs in areas with more renters – tend to have fewer people effectively keeping the peace. Because social scientists love to make up words, we call this concept collective efficacy.

While I was mostly just interested in zooming into my neighborhood and seeing the infill of renters, downloading and looking at this data sparked a criminological hypothesis: I expect neighborhoods with higher rates of private-equity-purchased housing to have higher rates of criminal behavior in the long run.

This hypothesis will be difficult to test in the wild. It is partially confounded with capital – those who buy their homes accumulate more wealth over time (again, a mortgage is quite a bit cheaper than rent, so this is true even ignoring home value appreciation). But the variance in the number of homes purchased by private equity firms across different areas makes me wonder if there is enough variation for a reasonable research design to test my hypothesis, especially in the Charlotte area, say two or three years after a development is finished.

State dependence and trajectory models

I am currently reviewing a paper that uses group-based trajectory models (GBTM) – and to start, this isn't a critique of the paper. I think GBTM is a very useful descriptive tool (which is how the paper I am reading mostly uses it), and it can be helpful in some predictive contexts as well.

It is much more difficult, though, to attribute a causal framework to those trajectories. First, my favorite paper on this topic is Distinguishing facts and artifacts in group-based modeling (Skardhamar, 2010). In that paper Torbjørn simulates random data (not dissimilar to what I do here, but with a few more complicated factors) and shows that purely random data will still result in GBTM identifying trajectories. You can go the other way as well: I have a blog post where I simulate actual latent trajectories and GBTM recovers them, and another example where fit stats clearly show a random effects continuous model is better for a different simulation. In real data, though, we don't know the true model like in these simulations, so we can only be reasonably skeptical that the trajectories we uncover really represent latent classes.

In particular, the paper I am reading looks at a binary outcome, so you just observe a bunch of 0s and 1s over the time period. Given the limited domain, it is difficult to uncover really wild looking curves. They ended up finding a set of curves that, although they meet all the good fit stats, pretty much cover the domain of possibilities – one starting high and linearly sloping down, one starting low and sloping up, one flat high, one flat low, and a single curved upward slope.

So often in criminology we interpret these latent trajectories as population heterogeneity – people on different curves are fundamentally different (e.g. Moffitt’s taxonomy for offending trajectories). But there are other underlying data generating processes that can result in similar trajectories – especially over a limited domain of 0/1 data.

Here I figure the underlying data in the paper I am reviewing is subject to very strong state dependence – your value at t-1 is very strongly correlated with your value at t. So I simulate such data in R, and use the flexmix package to fit the latent trajectories.

First, I simulate 1500 people over 15 time points. I assign them an original probability estimate uniformly, then I generate 15 0/1 observations, updating that probability slightly over time with an auto-correlation of 0.9. (Simulations are based on the logit scale, but then backed out into 0/1s.)

# R code simulating state-dependent 0/1 data
library("flexmix")
set.seed(10)

# logit and inverse function
logistic <- function(x){1/(1+exp(-x))}
logit <- function(x){log(x/(1-x))}

# generate uniform probabilities
n <- 1500
orig_prob <- runif(n)

# translate to logits
ol <- logit(orig_prob)
df <- data.frame(id=1:n,op=orig_prob,ol)

# generate auto-correlated data over tp time points
auto_corr <- 0.90
tp <- 15
vl <- paste0('v',1:tp)
vc <- var(ol) #baseline variance, keep equal

for (v in vl){
   # updated logit
   rsd <- sqrt(vc - vc*(auto_corr^2))
   ol <- ol*auto_corr + rnorm(n,0,rsd)
   # observed outcome
   df[,v] <- rbinom(n,1,logistic(ol))
}

This generates the data in wide format, so I reshape it to the long format needed to fit the models using flexmix, and I choose 5 trajectories (the same number as chosen in the paper I am reviewing).

# reshape wide to long
ld <- reshape(df, idvar="id", direction="long",
        varying = list(vl))

# fit traj model for binary outcomes
mod <- flexmix(v1 ~ time + I(time^2) | id,
               model = FLXMRmultinom(),
               data=ld, k=5)

rm <- refit(mod)
summary(rm)

Now I create smooth curves over the period to plot. I am being lazy here; the X axis should actually be 1-15 (I simulated 15 discrete time points).

tc <- summary(rm)@components[[1]]
pd <- data.frame(c=1,t=seq(1,tp,length.out=100))
pd$tsq <- pd$t^2

co <- matrix(-999,nrow=3,ncol=5)

for (i in 1:5){
  vlab <- paste0('pred',i)
  co[,i] <- tc[[i]][,1]
}

pred <- as.matrix(pd) %*% co

# plot on probability scale
matplot(logistic(pred))

These are quite similar to the curves in the paper I am reviewing: a consistent low probability (5), a consistent high (1), a downward mostly linear slope (3), an upward linear slope (2), and one concave-down parabola (4) (in the paper they had one concave up).

I figured the initial probability I assigned would strongly determine which curve the model assigned a person to in this simulation. It ends up being more spread out than I expected, though.

# distribution of classes vs original probability
ld$clus <- clusters(mod)
r1 <- ld[ld$time == 1,]
clustjit <- r1$clus + runif(n,-0.2,0.2)
plot(clustjit,r1$op) # more spread out than I thought

So there is some tendency for trajectory membership to be correlated with the original probability, but it isn't that strong.

If we look at the average maximum posterior probabilities, they are OK, except for the parabola group 4.

# average posterior probability
pp <- data.frame(posterior(mod))
ld$pp <- pp[cbind(1:(n*tp),ld$clus)]
r1 <- ld[ld$time == 1,]
aggregate(pp ~ clus, data = r1, mean)
#   clus        pp
# 1    1 0.8923801
# 2    2 0.7903938
# 3    3 0.7535281
# 4    4 0.6380946
# 5    5 0.8419221

The paper I am reviewing has much higher average posterior probabilities for each group, so maybe their trajectories really do represent population heterogeneity instead of continuous state dependence; it is just really hard to tell the difference with observational data like this.

An update on the WaPo Officer Involved Shooting Stats

Marisa Iati interviewed me for a few clips in a recent update on the WaPo data on fatal officer involved police shootings. I've written in the past that the data are very consistent with a Poisson process, and this continues to be true.

So the first thing Marisa said was that shootings in 2021 are at 1,055 (up from 1,021 in 2020). Is this a significant increase? I said no off the cuff – I knew the average over the time period WaPo has been collecting data is around 1,000 fatal shootings per year, so given a Poisson distribution (mean = variance), the standard deviation of the series is close to sqrt(1000), which is about 32. So anything within roughly two standard deviations of 1,000 (plus/minus ~60, i.e. 940-1060) is within the typical range you would expect.
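
As a quick sanity check on that range:

mu <- 1000               # roughly 1,000 fatal OIS per year
sqrt(mu)                 # Poisson sd, about 31.6
mu + c(-2,2)*sqrt(mu)    # ~937 to ~1063, so 1,055 is within the typical range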

In every interview I do, I struggle to describe frequentist concepts to journalists (and this was no different). This is not a critique of Marisa – the paragraph below is certainly not how I would write it down on paper, but it is likely the jumble that came out of my mouth when I talked to her over the phone:

Despite setting a record, experts said the 2021 total was within expected bounds. Police have fatally shot roughly 1,000 people in each of the past seven years, ranging from 958 in 2016 to last year’s high. Mathematicians say this stability may be explained by Poisson’s random variable, a principle of probability theory that holds that the number of independent, uncommon events in a large population will remain fairly stagnant absent major societal changes.

So this sort of mixes up two concepts. One, the distribution of fatal officer shootings (a random variable) can be very well approximated by a Poisson process, which I will show below still holds true with the newest data. Two, what does this say about potential hypotheses about things we think might influence police behavior? I will come back to this at the end of the post.

R Analysis at the Daily Level

So my current ptools R package can do a simple analysis to show that this data is very consistent with a Poisson process. First, install the most recent version of the package via devtools, then you can read in the WaPo data directly via the Github URL:

library(devtools)
install_github("apwheele/ptools")
library(ptools)

url <- 'https://raw.githubusercontent.com/washingtonpost/data-police-shootings/master/fatal-police-shootings-data.csv'
oid <- read.csv(url,stringsAsFactors = F)

Looking at the yearly statistics (clipping off events recorded so far in 2022), you can see that they are very close to a Poisson distribution with a mean/variance of 1000, although they perhaps have a slow upward trend over the years.

# Year Stats
oid$year <- as.integer(substr(oid$date,1,4))
year_stats <- table(oid$year)
print(year_stats)
mean(year_stats[1:7]) # average of 1000 per year
var(year_stats[1:7])  # variance just under 1000

We can also look at the distribution at shorter time intervals, here per day. First I aggregate the data to the daily level (including days with 0 shootings), and second I use my check_pois function to get the comparison distributions:

#Now aggregating to count per day
oid$date_val <- as.Date(oid$date)
date_range <- paste0(seq(as.Date('2015-01-01'),max(oid$date_val),by='days'))
day_counts <- as.data.frame(table(factor(oid$date,levels=date_range)))
head(day_counts)

pfit <- check_pois(day_counts$Freq, 0, 10, mean(day_counts$Freq))
print(pfit)

The way to read this: for a mean of 2.7 fatal OIS per day (and given this many days), we would expect 169.7 zero-fatality days in the sample (PoisF), but we actually observed 179 zero-fatality days, a residual of 9.3 in the total count. The trailing columns show the same in percentage terms, so we expect 6.5% of the days in the sample to have 0 fatalities according to the Poisson distribution, and in the actual data we have 6.9%.

You can read the rest of the rows the same way, and the story is mostly the same – there are only very slight deviations from the expected Poisson distribution. This data is the closest I have ever seen real life, social behavioral data come to following a Poisson process.

For comparison, let's look at the NYC shootings data I have saved in the ptools package.

# Lets check against NYC Shootings
data(nyc_shoot)
date_range <- paste0(seq(as.Date('2006-01-01'),max(nyc_shoot$OCCUR_DATE),by='days'))
shoot_counts <- as.data.frame(table(factor(nyc_shoot$OCCUR_DATE,levels=date_range)))

sfit <- check_pois(shoot_counts$Freq,0,max(shoot_counts$Freq),mean(shoot_counts$Freq))
round(sfit,1)

This is much more typical of the crime data I have analyzed over my career (in that it deviates from a Poisson process by quite a bit). The mean is 4.4 shootings per day, but the variance is over 13. There are many more 0 days than expected (433 observed vs 73 expected). And there are many more high-shooting days than expected (the tail of the distribution is even cut off in the output). For example, there are 27 days with 18 shootings, whereas a Poisson process would expect only 0.1 such days in a sample of this size.

My experience, though, is that when the data is overdispersed, a negative binomial distribution will fit quite well. (Many people default to a zero-inflated model; like Paul Allison, I think that is a mistake unless you have a structural reason for the excess zeroes you want to model.)

So here is an example of fitting a negative binomial to the shooting data:

# Lets fit a negative binomial and check out
library(fitdistrplus)
fnb <- fitdist(shoot_counts$Freq,"nbinom")
print(fnb$estimate)

sfit$nb <- 100*mapply(dnbinom, x=sfit$Int, size=fnb$estimate[1], mu=fnb$estimate[2])
round(sfit[,c('Prop','nb')],1) # Much better overall fit

And this compares the percentages. So you can see we observed 7.5% zero-shooting days, and expect 8.6% according to this negative binomial distribution – much closer than before. The tails are fit much more closely as well; for example, days with 18 shootings are expected 0.2% of the time and observed 0.4% of the time.

So What Inferences Can We Make?

In the social sciences, we are rarely afforded the ability to falsify any particular hypothesis – or in more lay terms, we can't really ever prove something to be false beyond a reasonable doubt. We can, however, show whether empirical data is consistent or inconsistent with a particular hypothesis. In terms of fatal OIS, several ready hypotheses one may be interested in are Does increased police scrutiny result in fewer OIS? and Did the recent increase in violence increase OIS?

While these two processes are certainly plausible, the data collected by WaPo are not consistent with either hypothesis. It is possible both mechanisms are operating at the same time and cancel each other out, resulting in a very consistent 1,000 fatal OIS per year. A simpler explanation, though, is that the baseline rate has not changed over time (Occam's razor).

Again, though, we are limited in our ability to falsify these particular hypotheses. For example, say there was a very small upward trend, on the order of something like +10 fatal OIS per year. Given the underlying variance of Poisson variables, even with 7+ years of data it would be very difficult to identify that small of an upward trend. Andrew Gelman likens it to measuring the weight of a feather carried by a kangaroo jumping on a scale.
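
To give a rough sense of that difficulty, here is a quick simulation sketch (my own back-of-the-envelope check, not anything from the WaPo piece): simulate seven years of counts with a true +10 per year trend around a base rate of 1,000, fit a Poisson regression, and see how often the trend comes out statistically significant.

# rough power check for detecting a +10 per year trend over 7 years
set.seed(10)
years <- 0:6
sim_p <- replicate(2000, {
  y <- rpois(length(years), lambda = 1000 + 10*years)
  summary(glm(y ~ years, family=poisson))$coefficients["years",4]
})
mean(sim_p < 0.05)  # share of simulations detecting the trend, well short of reliable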

So really, offhand I would say we could only detect big changes that swing OIS by around 100 events per year. Anything smaller than that is likely very difficult to detect in this data. And so I think it is unlikely any of the recent widespread impacts on policing (BLM, Ferguson, Covid, increased violence rates, whatever) ultimately impacted fatal OIS in any substantive way on that order of magnitude (although they may have had tiny impacts at the margins).

Given that police departments are independent, this suggests the data on fatal OIS are likely independent as well (e.g. one fatal OIS does not cause more fatal OIS, nor does one fatal OIS deter further fatal OIS). Because of the independence of police departments, I am not sure there is a great way for federal intervention to reduce the number of fatal OIS. I think individual police departments can increase oversight, and maybe state attorney general offices are in a better place to use data driven approaches to oversee individual departments (like ProPublica did in New Jersey). I wouldn't bet money, though, on large deviations from that 1,000 fatal OIS per year anytime soon.