Hot spots of crime in Raleigh and home buying

So my realtor, Ellen Pitts (who is highly recommended, helped us a ton remotely moving into Raleigh), has a YouTube channel where she talks about real estate trends. In her most recent video she discussed a bit about crime in Raleigh relative to other cities because of the most recent shooting.

My criminologist hot take is that generally most cities in the US are relatively low crime. So Ellen shows Dallas has quite a few more per-capita shootings than Raleigh, but Dallas is quite safe “overall”. Probably somewhat contra to what most people think, the cities that in my opinion really have the most crime problems tend to be smaller rust belt cities. I love Troy, NY (where I was a crime analyst for a few years), but Troy is quite a bit rougher around the edges than Raleigh or Dallas.

So this post is more about, you have already chosen to move to Raleigh – if I am comparing house 1 and house 2 (or looking at general neighborhoods), do I need to worry about crime in this specific location?

So here are a few specific resources/strategies for the home hunter. Not just in Raleigh, but many cities now have an open data portal. You can often look at crime. Here is an example with the Raleigh open data:

So if you have a specific address in mind, you can go and see the recent crime around that location (cities often fuzz the address a bit, so the actual points are just nearby on that block of the street). Blue dots in that screenshot are recent crimes in 2022 against people (you can click on each dot and get a more specific breakdown). Be prepared when you do this – crime is everywhere. But that said the vast majority of minor crime incidents should not deter you from buying a house or renting at a particular location.

Note I recommend looking at actual crime data (points on a map) for this. Several vendors release crime stats aggregated to neighborhoods or zipcodes, but these are of very low quality. (Often they “make up” data when it doesn’t exist, and when data does exist they don’t have a real great way to rank areas of low or high crime.)

For the more high level, should I worry about this neighborhood, I made an interactive hotspot map.

For the methodology, I focused on crimes that I would personally be concerned with as a homeowner. If I pull larceny crimes, I am sure the Target in North Hills would be a hotspot (but I would totally buy a condo in North Hills). So this pulls the recent crime data from Raleigh open data starting in 2020, but scoops up aggravated assaults, interpersonal robberies, weapon violations, and residential burglaries. Folks may be concerned about drug incidents and breaking into cars as well, but in my experience those also do not tend to be in residential areas. The python code to replicate the map is here.

Then I created DBScan clusters that had at least 34 crimes – so these areas average at least one of these crimes per month over the time period I sampled. Zooming in, even though I tried to filter for more potentially residential related crimes, you can see the majority of these hot spots of crime are commercial areas in Raleigh. So for example you can zoom in and check out the string of hot spots on Capital Blvd (and if you click a hot spot you can get breakdowns of specific crime stats I looked at):
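To make the clustering step concrete, here is a minimal sketch of the idea using scikit-learn's DBSCAN. It is not the code behind the map: the points are simulated, and the search radius (eps=300) is an assumption picked for the toy data. The only number taken from the post is the minimum of 34 crimes per cluster.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(10)

# Simulated projected coordinates (think feet), NOT real Raleigh data
blob_a = rng.normal(loc=(1000, 1000), scale=50, size=(60, 2))
blob_b = rng.normal(loc=(5000, 5000), scale=50, size=(60, 2))
noise = rng.uniform(0, 10000, size=(100, 2))
xy = np.vstack([blob_a, blob_b, noise])

# min_samples=34 mirrors the "at least 34 crimes" rule in the text
# eps=300 is an assumed search radius for this toy example
db = DBSCAN(eps=300, min_samples=34).fit(xy)
labels = db.labels_  # -1 marks points that are not in any hot spot

# Count how many points landed in each hot spot
cluster_ids = sorted(set(labels) - {-1})
cluster_sizes = {int(c): int((labels == c).sum()) for c in cluster_ids}
print(cluster_sizes)
```

With real data the choice of eps matters a lot, since it sets how far apart incidents can be and still count as the same hot spot, so treat the value here as a placeholder.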

Very few of these hot spots are in residential neighborhoods – most are in more commercial areas. So when considering looking at homes in Raleigh, there are very few spots I would worry about crime at all in the city when making a housing choice. Whether you are moving into a neighborhood with a higher proportion of renters is, I think, potentially a more important long term signal than crime here in Raleigh.

A new series: The Criminal Justician

In partnership with the American Society of Evidence Based Policing (ASEBP), I have started a new blog series on their website, The Criminal Justician. The first post is up, Denver’s STAR Program and Disorder Crime Reductions, which you can read if you have a membership.

ASEBP is an organization that brings together in the field police officers, as well as researchers, policy makers, and community leaders to promote scientific progress in the policing profession. For officers, analysts, and police researchers wanting to make a difference, it is definitely an organization worth joining and participating in the trainings/conferences.

The blog series will be me discussing recent scientific research of relevance to policing. I break down complicated empirical results to be more accessible to a wider audience – either to understand the implications for the field or to critique the potential findings. If you want a preview before you pony up the few dollars for joining ASEBP, here are some examples of past articles on my personal blog of similar scope:

I will still blog here about more technical things, like optimizing functions/statistical coding. But my more opinion pieces on current policing research will probably head over to the ASEBP blog series. In the hopper are topics like police scorecards, racial bias in predictive policing, and early intervention systems (with plans to post an article around once a month).

Gun Buy Back Programs Probably Don’t Work

When I was still a criminology professor, I remember one day while out getting groceries receiving a cold call from a police department interested in collaborating. They asked if I could provide evidence to support their city’s plan to implement sex offender residence restrictions. While taking the call I was walking past a stand for the DARE program.

A bit of inside pool for my criminology friends, but for others these are programs that have clearly been shown to not be effective. Sex offender restrictions have no evidence they reduce crimes, and DARE has very good evidence it does not work (and some mild evidence it causes iatrogenic effects – i.e. causes increased drug use among teenagers exposed to the program).

This isn’t a critique of the PD who called me – academics just don’t do a great job of getting the word out. (And maybe we can’t effectively, maybe PDs need to have inhouse people do something like the American Society of Evidence Based Policing course.)

One of the programs that is similar in terms of being popular (but sparse on evidence supporting it) is the gun buy back program. Despite little evidence that they are effective, cities still continue to support these programs. Both Durham and Raleigh recently implemented buy backs for example.


What is a gun buy back program? Police departments encourage people to turn in guns – no questions asked – and they get back money/giftcards for the firearms (often in the range of $50 to $200). The logic behind such programs is that by turning in firearms it prevents them from being used in subsequent crimes (or suicides). No questions asked is to encourage individuals who have even used the guns in a criminal manner to not be deterred from turning in the weapons.

There are not any meta-analyses of these programs, but the closest thing to it, a multi-city study by Ferrazares et al. (2021), analyzing over 300 gun buy backs does not find macro, city level evidence of reduced gun crimes subsequent to buy back programs. While one can cherry pick individual studies that have some evidence of efficacy (Braga & Wintemute, 2013; Phillips et al., 2013), the way these programs are typically run in the US they are probably not effective at reducing gun crime.

Let’s go back to first principles – if we 100% knew a gun would be used in the commission of a crime, then “buying” that gun would likely be worth it. (You could say an inelastic criminal will find or maybe even purchase a new gun with the reward, Mullin (2001), so that purchase does not prevent any future crimes, but I am ignoring that here.)

We do not know for sure that any given gun will be used in the commission of a crime – but let’s try to put some guesstimates on the probability that it will be. There are actually more guns in the US than there are people. But let’s go with a low end total of 300 million guns (Braga & Wintemute, 2013). There are around half a million crimes committed with a firearm each year (Planty et al., 2013). So that gives us 500,000/300,000,000 ~ 1/600. So I would guess if you randomly confiscated 600 guns in the US, you would prevent 1 firearm crime.

This has things that may underestimate (one gun can be involved in multiple crimes, still the expected number of crimes prevented is the same), and others that overestimate (more guns, fewer violent crimes, and replacement as mentioned earlier). But I think that this estimate is ballpark reasonable – so let’s say 500-1000 guns to reduce 1 firearm crime. If we are giving out $200 gift cards per weapon returned, that means we need to drop $100k to $200k to prevent one firearm crime.
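The back of the envelope math above is easy to reproduce. Here is a small sketch that uses only the figures quoted in the text (300 million guns, 500,000 firearm crimes per year, $200 rewards); the function name is just for illustration.

```python
# Figures quoted in the text
guns_in_us = 300_000_000           # low end estimate of guns in the US
firearm_crimes_per_year = 500_000  # crimes committed with a firearm

# Chance a randomly chosen gun is tied to a firearm crime in a year
p_crime = firearm_crimes_per_year / guns_in_us
guns_per_crime = 1 / p_crime
print(f"about 1 in {guns_per_crime:.0f} guns")

def cost_per_crime_prevented(guns_needed, reward):
    """Dollars spent on rewards to prevent one firearm crime."""
    return guns_needed * reward

# Ballpark range of 500 to 1000 guns per crime prevented
for guns_needed in (500, 1000):
    cost = cost_per_crime_prevented(guns_needed, reward=200)
    print(f"{guns_needed} guns at $200 each: ${cost:,}")
```

Swapping in a different reward amount or a different guess for guns per crime prevented shows how sensitive the cost per crime is to those two inputs.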

Note I am saying one firearm crime (not homicide), if we were talking about preventing one homicide with $200k, that is probably worth it. That is not a real great return on investment though for the more general firearm crimes, which have costs to society typically in the lower 5 digit range.

Gun buy backs have a few things going against them though even in this calculation. First, the guns returned are not a random sample of guns. They tend to be older, long guns, and often not working (Kuhn et al., 2021). It is very likely the probability those specific guns would be used in the commission of a crime is smaller than 1/600. Second is just the pure scope of the programs, they are often just around a few hundred firearms turned in for any particular city. This is just too small a number to reasonably tell whether they are effective (and what makes the Australian case so different).

Gun buy backs are popular, and plausibly may be “worth it”. (If the program encourages turning in working hand guns (Braga & Wintemute, 2013) and the dollar rewards are more like $25-$50, it is more palatable in my mind in terms of at least potentially being worth it from a cost/benefit perspective.) But with the way most of these studies are conducted, they have no hope of identifying any meaningful macro level crime reductions (at the city level, the programs would need to be more like 20 times larger in scope to notice reductions relative to typical background variation). So I think more proven strategies, such as focused deterrence or focusing on chronic offenders, are likely better investments for cities/police departments to make instead of gun buy backs.

References

My journey submitting to CRAN

So my R package ptools is up on CRAN. CRAN obviously does an important service – I find the issues I had to deal with pedantic – but will detail my struggles here, mostly so others hopefully do not have to deal with the same issues in the future. Long story short I knew going in it can be tough and CRAN did not disappoint.

Initially I submitted the package in early June, and it passed the email verification, but I did not receive any email back after that. I falsely presumed it was in manual review. After around a month I sent an email to cran-sysadmin. The CRAN sysadmin promptly sent an email back with the reason it auto-failed – examples took too long – but I am not sure why I did not receive an auto-message back (so it never got to the manual review stage). When I got auto-fail messages at the equivalent stage in later submissions, it typically took under an hour to get that stage auto-fail message back.

So then I went to fixing the examples that took too long (which on my personal machine all run in under 5 seconds, I have a windows $400 low end “gaming” desktop, with an extra $100 in RAM, so I am not running some supercomputer here as background). Running devtools check() is not the same as running R CMD check --as-cran path\package.tar.gz, but maybe check_built() is, I am not sure. So first note to self just use the typical command line tools and don’t be lazy with devtools.

Initially I commented out sections of the examples that I knew took too long. Upon manual review though, I was told not to do that and to wrap examples that take too long in \donttest{}. Stochastic changes in run times even made me fail a few times at this – some examples passed the time check in some runs but failed in others. Some examples that run pretty much instantly on my machine failed in under 10 seconds for windows builds on CRAN’s checks. (My examples use plots on occasion, and it may be spplot was the offender, as well as some of my functions that are not fast and use loops internally.) I have no advice here other than to just always wrap plot functions in \donttest{}, as well as anything too complicated for an abacus. There is no reliable way (that I can figure) to know examples that are very fast on my machine will take 10+ seconds on CRAN’s checks.

But doing all of these runs resulted in additional Notes in the description about spelling errors. At first it was last names in citations (Wheeler and Ratcliffe). So I took those citations out to prevent the Note. Later in manual review I was asked to put them back in. Occasionally a DOI check would fail as well, although it is the correct DOI.

One of the things that is confusing to me – some of the Notes cause automatic failures (examples too long) and others do not (spelling errors, DOI check). The end result messages to me are the same though (or at least I don’t know how to parse a “this is important” Note vs a “whatever not a big deal” Note). The irony of this back and forth related to these spelling/DOI notes in the description is that the description went through changes only to get back to what it was originally.

So at this point (somewhere around 10+ submission attempts), 7/16, it finally gets past the auto/human checks to the point it is uploaded to CRAN. Finished right – false! I then get an automated email from Brian Ripley/CRAN later that night saying it is up, but will be removed on 8/8 because Namespace in Imports field not imported from: 'maptools'.

One function had requireNamespace("maptools") to use the conversion functions in maptools to go between sp/spatstat objects. This caused that “final” note about maptools not being loaded. To fix this, I ended up just removing the maptools dependency altogether, as using unexported functions, e.g. maptools:::func, causes a note when I run R CMD check locally (so I presume it will auto-fail). There is probably a smarter/more appropriate way to use imports – I default though to doing something I hope will pass the CRAN checks.

I am not sure why this namespace is deal breaker at this stage (after already on CRAN) and not earlier stages. Again this is another Note, not a warning/error. But sufficient to get CRAN to remove my package in a few weeks if I don’t fix. This email does not have the option “send email if a false positive”.

When resubmitting after doing my fixes, I then got a new error for the same package version (because it technically is on CRAN at this point), so I guess I needed to increment to 1.0.1 and not fix the 1.0.0 package at this point. Also now the DOI issue in the description causes a “warning”. So I am not sure if this update failed because of package version (which doesn’t say note or warning in the auto-fail email) or because of DOI failure (which again is now a warning, not a Note).

Why sometimes a DOI failure is a warning and other times it is a note I do not know. At some later stage I just take this offending DOI out (against the prior manual review), as it can cause auto-failures (all cites are in the examples/docs as well).

OK, so package version incremented and namespace error fixed. Now in manual review for the 1.0.1 version, I get a note back to fix my errors – one of my tests fails on noLD/M1Mac (what is noLD you may ask? It is “no long doubles”). These technically failed on prior as well, but I thought I just needed to pass 2+ OS’s to get on CRAN. I send an email to Uwe Ligges at this point (as he sent an email about errors in prior 1.0.0 versions as well) to get clarity about what exactly they care about (since the reason I started round 2 was because of the Namespace threat, not the test errors on Macs/noLD). Uwe responds very fast that they care about my test that fails, not the DOI/namespace junk.

So in some of my exact tests I have checks along the line ref <- c(0.25,0.58); act <- round(f,2) where f is the results scooped up from my prior function calls. The note rounds the results to the first digit, e.g. 0.2 0.5 in the failure (I suspect this is some behavior of testthat in terms of what is printed to the console for the error, but I don’t know how exactly to fix the function calls so noLD will work). I just admit defeat and comment out the part of this test function that I think is causing the failure; any solution I am not personally going to be able to test in my setup to see if it works. Caveat Emptor, be aware my exact test power calculation functions are not so good if you are on a machine that can’t have long doubles (or M1 Mac’s I guess, I don’t fricken know).

OK, so that one test fixed, upon resubmission (the following day) I get a new error in my tests (now on Windows) – Error in sp::CRS(...): PROJ4 argument-value pairs must begin with +. I have no clue why this is showing an error now, for the first time going on close to 20 submissions over the past month and a half.

The projection string I pass definitely has a “+” at the front – I don’t know why it fails, and subsequent submissions to CRAN, even after my attempts to fix (submitting projections with simpler epsg codes), continue to fail now. I give up and just remove that particular test.

Uwe sends an updated email in manual review, asking why I removed the tests and did not fix them (or fix my code). I go into great detail about the new SP error (that I don’t think is my issue), and that I don’t know the root cause of the noLD/Mac error (and I won’t be able to debug before 8/8), that the code has pretty good test coverage (those functions pass the other tests for noLD/Mac, just one), and ask for his grace to upload. He says OK patch is going to CRAN. It has been 24 hours since then, so cannot say for sure I will not get a ‘will be removed’ auto-email.

To be clear these issues back and forth are on me (I am sure the \donttest{} note was somewhere in online documentation that I should have known). About the only legit complaint I have in the process is that the “Note” failure carries with it some ambiguity – some notes are deal breakers and others aren’t. I suspect this is because many legacy packages fail checks this stringent though, so they need to not auto-fail and have some discretion.

The noLD errors make me question reality itself – does 0.25 = 0.2 according to M1 Mac’s? Have I been living a lie my whole life? Do I really know my code works? I will eventually need to spin up a Docker image and try to replicate the noLD environment to know what is going on with that one exact test power function.

For the projection errors, I haven’t travelled much recently – does Long Island still exist? Is the earth no longer an ellipsoid? At our core are we just binary bits flipping the neural networks of our brain – am I no better than the machine?

There is an irony here that people with better test code coverage are more likely to fail the auto-checks (although those packages are also more likely to be correct!). It is intended and reasonable behavior from CRAN, but it puts a very large burden on the developer (it is not easy to debug noLD behavior on your own, and M1 Mac’s are effectively impossible unless you wish to pony up the cash for one).


CRAN’s model is much different than python’s PyPI, in that I could submit something to PyPI that won’t install at all, or will install but cause instant errors when running import mypackage. CRAN’s approach is more thorough, but as I attest to above is quite a bit on the pedantic side (there are no “functional” changes to my code in the last month I went through the back and forth).

The main thing I really care about in a package repository is that it does not have malicious code that does suspicious os calls and/or sends suspicious things over the internet. It is on me to verify the integrity of the code in the end (even if the examples work it doesn’t mean the code is correct, I have come across a few packages on R that have functions that are obviously wrong/misleading). This isn’t an open vs closed source thing – you need to verify/sanity check some things work as expected on your own no matter what.

So I am on the fence whether CRAN’s excessive checking is “worth it” or not. Ultimately since you can do:

library(devtools)
install_github("apwheele/ptools")

Maybe it does not matter in the end. And you can peruse the github actions to see the current state of whether it runs on different operating systems and avoid CRAN altogether.

Job advice for entry crime analysts

I post occasionally on the Crime Analysis Reddit, and in a few recent posts I mentioned expanding the net to private sector gigs for those interested in crime analysis. I also got a question from a recent student, so figured a blog post on my advice is in order.

For students interested in crime analysis, it is standard advice to do an internship (while a student), and that gets you a good start on networking. But if that ship has sailed and you are now finished with school and need to get a job that does not help. Also standard to join the IACA (and if you have a local org, like TXLEAN for Texas, you can join that local org and get IACA membership at the same time). They have job boards for openings, and for local it is a good place to network as well for entry level folks. IACA has training material available as well.

Because there are not that many crime analysis jobs, I tell students to widen their net and apply to any job that lists “analyst” in the title. We hire many “business analysts” at Gainwell, and while having a background in healthcare is nice it is not necessary. They mostly do things in Excel, Powerpoint, and maybe some SQL. Probably more have a background in business than healthcare specifically. Feel free to take any background experience in the job description not as requirements but as “nice to have”.

These are pretty much the same data skills people use in crime analysis. So if you can do one you can do the other.

This advice is also true for individuals who are currently crime analysts and wish to pursue other jobs. Unfortunately because crime analysis is more niche in departments, there is not much upward mobility. Other larger organizations that have analysts will just by their nature have more senior positions to work towards over your career. Simultaneously you are likely to have a larger salary in the private sector than public sector for even the same entry level positions.

Don’t get the wrong impression on the technical skills needed for these jobs if you read my blog. Even in more advanced data science jobs I am mostly writing python + SQL. I am not writing bespoke optimization functions very often. So in terms of skills for analyst positions I just suggest focusing on Excel. I intentionally designed my crime analysis course materials to give you a broad background that is relevant for other analyst positions as well (some SQL/Powerpoint, but mostly Excel).

Sometimes people like to think doing crime analysis is a public service, so look down on going to private sector. Plenty of analysts in banks/healthcare do fraud/waste/abuse that have just as large an impact on the public as do crime analysts, so I think this opinion is misguided in general.

Many jobs at Gainwell get less than 10 applicants. Even if these jobs have listed healthcare background requirements, if they don’t have options among the pool those doing the hiring will lower their expectations. I imagine it is the same for many companies. Just keep applying to analyst jobs and you will land something eventually.

I wish undergrad programs did a better job preparing social science students with tech skills. It is really just minor modifications – courses teaching Excel/SQL (maybe some coding for real go-getters). Better job at making stats relevant to the real world business applications (calculating expected values/variance and trends in those is a common task, doing null hypothesis significance testing is very rare). But you can level up on Excel with various online resources, my course included.

Estimating Criminal Lifetime Value

At work I am currently working on estimating/forecasting healthcare spending. Similar to work I have done on forecasting person level crime risks (Wheeler et al., 2019), I build the predictive model dataset like this:

CrimeYear2020 PriorCrimeA PriorCrimeB
     0              2          3
     1              5          0
     0              0          0

etc. So I flatten people to a single row, and as covariates include prior cumulative crime histories. Most people do this similarly in the healthcare setting, so it looks like:

SpendingYear2020 PriorComorbidA PriorComorbidB
     3000              1          2
      500              3          0
    10000              0          0

Or sometimes people do a longitudinal dataset, where it is a spending*year*person panel (see Lauffenburger et al., 2020 for an example). I find this approach annoying for a few reasons though. One, it requires arbitrary temporal binning, which is somewhat problematic in these transaction level databases. We are talking for chronic offenders a few crimes per year is quite high, and ditto in the healthcare setting a few procedures a year can be very costly. So there is not much data to estimate the underlying function over time.

A second aspect I think is bad is that it doesn’t take into account the recency of the patterns. So the variables on the right hand side can be very old or very new. And with transaction level databases it is somewhat difficult to define how to estimate the lookback – do you consider it normalized by time? In the VOID paper I mentioned we evaluated the long term scores, but the PD that does that chronic offender system has two scores – one a cumulative history and another a 90 day history to attempt to deal with that issue (again ad-hoc).

One approach to this issue from marketing research I have seen from very similar types of transactions databases are models to estimate Customer Lifetime Value (Fader et al. 2005). These models in the end generate a dataset that looks like this:

Person    RecentMonths  TotalEvents AveragePurchase
  A            3             5            $50
  B            1             2           $100
  C            9             8            $25

TotalEvents should be straightforward, RecentMonths is just a measure of the time since the last purchase, and then you have the average value of the purchases. Using just this data, the model estimates the probability of any future purchases, as well as projects the total value of the future average purchases. So here I use an example of this approach, using the Wolfgang Philly cohort public data. I am not going into the model more specifically (read some of the Bruce Hardie notes to get a flavor).
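To show how a table like the one above comes out of raw transactions, here is a hand-rolled pandas sketch. The transaction log, the column names, and the end of the observation window are all made up for illustration, and this is not how any particular library computes its summary.

```python
import pandas as pd

# Made-up transaction log: one row per event (purchase, crime, claim)
tx = pd.DataFrame({
    'Person': ['A', 'A', 'A', 'B', 'B', 'C'],
    'Date': pd.to_datetime(['2021-01-15', '2021-06-02', '2021-09-20',
                            '2021-03-10', '2021-11-05', '2021-04-01']),
    'Value': [40.0, 60.0, 50.0, 100.0, 100.0, 25.0],
})

# Assumed end of the observation window
end = pd.Timestamp('2021-12-31')

# One row per person: last event date, event count, average value
summ = tx.groupby('Person').agg(
    LastDate=('Date', 'max'),
    TotalEvents=('Date', 'size'),
    AveragePurchase=('Value', 'mean'),
).reset_index()

# Whole calendar months between the last event and the window end
summ['RecentMonths'] = ((end.year - summ['LastDate'].dt.year) * 12
                        + (end.month - summ['LastDate'].dt.month))
summ = summ[['Person', 'RecentMonths', 'TotalEvents', 'AveragePurchase']]
print(summ)
```

Libraries built for these models may define the columns a bit differently (for example counting only repeat events, or measuring recency from the first event), so check the definitions before feeding a hand-built table into a fitter.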

I have created some python code to follow along and apply these same customer lifetime value estimates to chronic offender data. Most examples of weighting crime harm apply it to spatial areas (Mitchell, 2019; Wheeler & Reuter, 2021), but you can apply it the same to chronic offender lists (Liggins et al., 2019).

Example Criminal Lifetime Value in Python

First, install the lifetimes python library – Cam’s documentation is excellent and makes the data manipulation/modelling quite simple.

Here I load in the transaction level crime data, e.g. it just has person A, 1/5/1960, 1000, where the 1000 is a crime seriousness index created by Wolfgang. Then the lifetimes package has some simple functions to turn our data into the frequency/recency format.

Note that for these models, you drop the first event in the series. To build a model to do train/test, I also split the data into events before 1962, and use 1962 as the holdout test period.

import lifetimes as lt
import pandas as pd

# Just the columns from dataset II
# ID, SeriousScore, Date
df = pd.read_csv('PhilData.csv')
df['Date'] = pd.to_datetime(df['Date'])

# Creating the cumulative data
# Having holdout for one year in future
sd = lt.utils.calibration_and_holdout_data(df,'ID','Date',
              calibration_period_end='12-31-1961',
              observation_period_end='12-31-1962',
              freq='M',
              monetary_value_col='SeriousScore')

# Only keeping people with 2+ events in prior period
sd = sd[sd['frequency_cal'] > 0].copy()
sd.head()

recency_cal is how many months passed between a person’s first crime and their most recent crime in the training period (T_cal is the total months they are observed, up to the start of 1962), frequency_cal is the total number of events (minus 1, so number of repeat events technically), and the monetary_value_cal here is the average of the crime seriousness across all the events. The way this function works, the variables with the subscript _cal are in the training period, and _holdout are events in the 1962 period. For subsequent models I subset out people with at least 2 events total in the modeling.

Now we can fit a model to estimate the predicted number of future crimes a person will commit – so this does not take into account the seriousness of those crimes. The final groupby statement shows the predicted number of crimes vs those actually committed, broken down by number of crimes in the training time period. You can see the model is quite well calibrated over the entire sample.

# fit BG model
bgf = lt.BetaGeoFitter(penalizer_coef=0)
bgf.fit(sd['frequency_cal'],sd['recency_cal'],sd['T_cal'])

# look at fit of BG model
t = 12
sd['pred_events'] = bgf.conditional_expected_number_of_purchases_up_to_time(t, sd['frequency_cal'], sd['recency_cal'],sd['T_cal'])
sd.groupby('frequency_cal',as_index=False)[['frequency_holdout','pred_events']].sum() # reasonable

Now we can fit a model to estimate the average crime severity score for an individual as well. Then you can project a future cumulative score for an offender (here over a horizon of 1 year), by multiplying the predicted number of events times the estimate of the average severity of the events, what I label as pv here:

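# fit seriousness model (Gamma-Gamma) on the training period
ggf = lt.GammaGammaFitter(penalizer_coef=0)
ggf.fit(sd['frequency_cal'],sd['monetary_value_cal'])

# See conditional seriousness
sd['pred_ser'] = ggf.conditional_expected_average_profit(
                              sd['frequency_cal'],
                              sd['monetary_value_cal'])

# Projected total harm = predicted events * predicted average severity
sd['pv'] = sd['pred_ser']*sd['pred_events']
# Observed total harm in the holdout year
sd['cal_tot_val'] = sd['monetary_value_holdout']*sd['frequency_holdout']
# Not great correlation, around 0.2
vc = ['frequency_holdout','monetary_value_holdout','cal_tot_val','pred_events','pv']
sd[vc].corr()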

The correlation between pv and the holdout cumulative crime severity cal_tot_val, is not great at 0.26. But let’s look at this relative to the more typical approach analysts will do, simply rank prior offenders based on either total number of events or the crime seriousness:

# Lets look at this method via just ranking prior
# seriousness or frequency
sd['rank_freq'] = sd['frequency_cal'].rank(method='first',ascending=True)
sd['rank_seri'] = (sd['monetary_value_cal']*sd['frequency_cal']).rank(method='first',ascending=True)
vc += ['rank_freq','rank_seri']
sd[vc].corr()[vc[-3:]]

So we can see that pv outperforms ranking based on total crimes (rank_freq), or ranking based on the cumulative serious score for offenders (rank_seri) in terms of the correlation for either the total number of future events or the cumulative crime harm.

If we look at capture rates, e.g. pretend we highlight the top 50 chronic offenders for intervention, we can see the criminal lifetime value pv estimate outperforms either simple ranking scheme by quite a bit:

# Look at capture rates by ranking
topn = 50
res_summ = []
for v in vc[-3:]:
    rank = sd[v].rank(method='first',ascending=False)
    locv = sd[rank <= topn].copy()
    tot_crimes = locv['frequency_holdout'].sum()
    tot_ser = locv['cal_tot_val'].sum()
    res_summ.append( [v,tot_crimes,tot_ser,topn] )

res_df = pd.DataFrame(res_summ,columns=['Var','TotCrimes','TotSer','TotN'])
res_df

In terms of the seriousness projection, it is reasonably well calibrated over the entire sample, but has a very tiny variance – it basically just predicts the average crime seriousness score over the sample and assigns that as the prediction going forward:

# Cumulative stats over sample reasonable
# variance much too small
sd[['cal_tot_val','pv']].describe()

So what this means is that if say Chicago READI wanted to do estimates to reasonably justify the max dollar cost for their program (over a large number of individuals) that would be reasonable. And this is how most marketing people use this info, average benefits of retaining a customer.

For individual projections though, e.g. I think OffenderB will generate between [low,high] crime harm in the next year, this is not quite up to par. I am hoping though to pursue these models further, maybe either in a machine learning/regression framework to estimate the parameters directly, or to use mixture models in the same way that marketers use “segmentation” to identify different types of customers. Knowing the different ways people have formulated these models is very helpful for building a machine learning framework in which you can incorporate covariates.

References

Staggered Treatment Effect DiD count models

So I have been dealing with various staggered treatments for difference-in-difference (DiD) designs for crime data analysis on how interventions reduce crime. I’ve written in the past about my and Jerry’s WDD estimator (Wheeler & Ratcliffe, 2018), as well as David Wilson’s ORR estimator (Wilson, 2022).

There has been quite a bit of work in econometrics recently describing how the traditional way to apply this design to staggered treatments using two-way fixed effects can be misleading, see Baker et al. (2022) for a human readable overview.

The main idea is that in the scenario where you have treatment heterogeneity (TH from here on) (either over time or over units), the two-way fixed effects estimator is a weird average that can misbehave. Here are just some notes of mine though on fitting the fully saturated model, and using post-hoc contrasts (in R) to look at that TH as well as to estimate more reasonable average treatment effects.

So first, we can trick R to use glm to get my WDD estimator (or of course Wilson’s ORR estimator) for the DiD effect with count data. Here is a simple example from my prior blog post:

# R code for DiD model of count data
count <- c(50,30,60,55)
post <- c(0,1,0,1)
treat <- c(1,1,0,0)

df <- data.frame(count,post,treat)

# Wilson ORR estimate
m1 <- glm(count ~ post + treat + post*treat,data=df,family="poisson")
summary(m1)

And here is the WDD estimate using glm passing in family=poisson(link="identity"):

m2 <- glm(count ~ post + treat + post*treat,data=df,
          family=poisson(link="identity"))
summary(m2)

And we can see this is the same as my WDD in the ptools package:

library(ptools) # via https://github.com/apwheele/ptools
wdd(c(60,55),c(50,30))

Using glm will be more convenient than me scrubbing up all the correct weights, as I’ve done in the past examples (such as temporal weights and different area sizes). It is probably the case you can use different offsets in regression to accomplish similar things, but for this post I am just focusing on extending the WDD to varying treatment timing.
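
As a quick sanity check on what the identity-link model returns for the simple 2x2 case, here is a minimal Python sketch (not the ptools code, and wdd_simple is just a name made up for this illustration) that computes the difference in differences of the raw counts by hand. It assumes no displacement areas and equal pre/post time periods, and uses the Poisson assumption that the variance of each count equals its mean:

```python
import math

def wdd_simple(control, treated):
    """Hypothetical helper: control and treated are (pre, post) count pairs."""
    c_pre, c_post = control
    t_pre, t_post = treated
    # Change in the treated area minus change in the control area
    est = (t_post - t_pre) - (c_post - c_pre)
    # For independent Poisson counts, the variance of the estimate
    # is the sum of the four counts
    se = math.sqrt(c_pre + c_post + t_pre + t_post)
    return est, se, est / se

# Same counts as the R example: control 60 -> 55, treated 50 -> 30
est, se, z = wdd_simple((60, 55), (50, 30))
print(est, round(se, 2), round(z, 2))
```

The point estimate here is (30 - 50) - (55 - 60) = -15, which should line up with the post:treat interaction term in the identity-link glm.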

Varying Treatment Effects

So the above scenario is a simple pre/post with only one treated unit. But imagine we have two treated units and three time periods. This is very common in real life data where you roll out some intervention to more and more areas over time.

So imagine we have a set of crime data, G1 is rolled out first, so the treatment is turned on for periods One & Two, G2 is rolled out later, and so the treatment is only turned on for period Two.

Period    Control     G1     G2
Base          50      70     40
One           60      70     50
Two           70      80     50

I have intentionally created this example so the average treatment effect per period per unit is 10 crimes. So no TH. Here is the R code to show off the typical default two-way fixed effects model, where we just have a dummy variable for unit+timeperiods that are treated.

# Examples with Staggered Treatments
df <- read.table(header=TRUE,text = "
 Period    Control     G1     G2
 Base          50      70     40
 One           60      70     50
 Two           70      80     50
")

# reshape wide to long
nvars <- c("Control","G1","G2")
dfl <- reshape(df,direction="long",
               idvar="Period",
               varying=list(nvars),
               timevar="Unit")

dfl$Unit <- as.factor(dfl$Unit)
names(dfl)[3] <- 'Crimes'

# How to set up design matrix appropriately?
dfl$PostTreat <- c(0,0,0,0,1,1,0,0,1)

m1 <- glm(Crimes ~ PostTreat + Unit + Period,
          family=poisson(link="identity"),
          data=dfl)

summary(m1) # TWFE, correct point estimate

The PostTreat variable is the one we are interested in, and we can see that we have the correct -10 estimate as we expected.

OK, so let’s create some treatment heterogeneity. Here G1 now has no effect, and only the G2 treatment works.

dfl[dfl$Unit == 2,'Crimes'] <- c(70,80,90)

m2 <- glm(Crimes ~ PostTreat + Unit + Period,
          family=poisson(link="identity"),
          data=dfl)

summary(m2) # TWFE, estimate -5.29, what?

So you may naively think that this should be something like -5 (average effect of G1 + G2), or -3.33 (G1 gets a higher weight since it is turned on for the 2 periods, whereas G2 is only turned on for 1). But nope rope, we get -5.529.

We can estimate the effects of G1 and G2 separately though in the regression equation:

# Let's separate out the two units' effects
dfl$pt1 <- 1*(dfl$Unit == 2)*dfl$PostTreat
dfl$pt2 <- 1*(dfl$Unit == 3)*dfl$PostTreat

m3 <- glm(Crimes ~ pt1 + pt2 + Unit + Period,
          family=poisson(link="identity"),
          data=dfl)

summary(m3) # Now we get the correct estimates

And now we can see that as expected, the effect for G2 is the pt2 coefficient, which is -10. And the effect for G1, the pt1 coefficient, is only floating point error different than 0.

To then get a cumulative crime reduction effect for all of the areas, we can use the multcomp library and the glht function and construct the correct contrast matrix. Here the G1 effect gets turned on for 2 periods, and the G2 effect is only turned on for 1 period.

library(multcomp)
cont <- matrix(c(0,2,1,0,0,0,0),1)
cumtreat <- glht(m3,cont) # correct cumulative
summary(cumtreat)

And if we want an ‘average treatment effect per unit and per period’, we just change the weights in the contrast matrix:

atreat <- glht(m3,cont/3) # correct average over 3 periods
summary(atreat)

And this gets us our -3.33, which is a more reasonable average treatment effect. Although you would almost surely just focus on the fact that the G2 area intervention worked and the G1 area did not.

You can also fit this model a little more easily using R’s formula interface instead of rolling your own dummy variables, via the formula Crimes ~ PostTreat:Unit + Unit + Period.

But, glht does not like it when you have dropped levels in these interactions, so I don’t do this approach directly later on, but construct the model matrix and drop non-varying columns.

Next let’s redo the data again, and now have time varying treatments. Now only period 2 is effective, but it is effective across both the G1 and G2 locations. Here is how I construct the model matrix, and what the resulting set of dummy variables looks like:

# Time Varying Effects
# only period 2 has an effect

dfl[dfl$Unit == 2,'Crimes'] <- c(70,80,80)

# Some bookkeeping to make the correct model matrix
mm <- as.data.frame(model.matrix(~ -1 + PostTreat:Period + Unit + Period, dfl))
mm <- mm[,names(mm)[colSums(mm) > 0]] # dropping zero columns
names(mm) <- gsub(":","_",names(mm))  # replacing colon
mm$Crimes <- dfl$Crimes
print(mm)

Now we can go ahead and fit the model without the intercept.

# Now can fit the model
m6 <- glm(Crimes ~ . -1,
          family=poisson(link="identity"),
          data=mm)

summary(m6)

And you can see we estimate the correct effects here, PostTreat_PeriodOne has a zero estimate, and PostTreat_PeriodTwo has a -10 estimate. And now our cumulative crimes reduced estimate is -20:

cumtreat2 <- glht(m6,"1*PostTreat_PeriodOne + 2*PostTreat_PeriodTwo=0")
summary(cumtreat2)

And if we did the average, it would be -6.67.

Now for the finale – we can estimate the saturated model with time-and-unit varying treatment effects. Here is what the design matrix looks like, just a bunch of columns with a single 1 turned on:

# Now for the whole shebang, unit and period effects
mm2 <- as.data.frame(model.matrix(~ -1 + Unit:PostTreat:Period + Unit + Period, dfl))
mm2 <- mm2[,names(mm2)[colSums(mm2) > 0]] # dropping zero columns
names(mm2) <- gsub(":","_",names(mm2))  # replacing colon
mm2$Crimes <- dfl$Crimes
print(mm2)

And then we can fit the model the same way:

m7 <- glm(Crimes ~ . -1,
          family=poisson(link="identity"),
          data=mm2)

summary(m7) # Now we get the correct estimates

And you can see our -10 estimate for Unit2_PostTreat_PeriodTwo and Unit3_PostTreat_PeriodTwo as expected. You can probably figure out how to get the cumulative or the average treatment effects at this point:

tstr <- "Unit2_PostTreat_PeriodOne + Unit2_PostTreat_PeriodTwo + Unit3_PostTreat_PeriodTwo = 0"
cumtreat3 <- glht(m7,tstr)
summary(cumtreat3)
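
To see where those numbers come from, here is a small Python sketch (an illustration only, not a replacement for the glm) that computes each treated cell’s effect as a simple difference in differences against the Base period and the Control unit. This only matches the glm estimates because the constructed data fit the model exactly; with real data the regression uses all of the untreated cells. The function name did_cells and the dictionary layout are assumptions made for this example:

```python
# Counts from the time-varying example, ordered Base, One, Two
crimes = {
    "Control": [50, 60, 70],
    "G1": [70, 80, 80],
    "G2": [40, 50, 50],
}
period_names = ["Base", "One", "Two"]
# Index of the periods in which each unit is treated
treated_periods = {"G1": [1, 2], "G2": [2]}

def did_cells(crimes, treated_periods, period_names, control="Control", base=0):
    """Hypothetical helper: per-cell DiD relative to the base period."""
    effects = {}
    for unit, periods in treated_periods.items():
        for t in periods:
            change_treated = crimes[unit][t] - crimes[unit][base]
            change_control = crimes[control][t] - crimes[control][base]
            effects[(unit, period_names[t])] = change_treated - change_control
    return effects

effects = did_cells(crimes, treated_periods, period_names)
cumulative = sum(effects.values())   # total crimes reduced
average = cumulative / len(effects)  # per treated unit-period
print(effects, cumulative, round(average, 2))
```

Summing the three cells gives the cumulative effect, and dividing by three gives the average per treated unit-period.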

We can also use this same framework to get a unit and time varying estimate for Wilson’s ORR estimator, just using family=poisson with its default log link function:

m8 <- glm(Crimes ~ . -1,
          family=poisson,
          data=mm2)

summary(m8)

It probably does not make sense to do a cumulative treatment effect in this framework, but I think an average is OK:

avtreatorr <- glht(m8,
  "1/3*Unit2_PostTreat_PeriodOne + 1/3*Unit2_PostTreat_PeriodTwo + 1/3*Unit3_PostTreat_PeriodTwo = 0")
summary(avtreatorr)

So the average linear coefficient is -0.1386, and if we exponentiate that we have an IRR of 0.87, so on average when a treatment occurred in this data there was a 13% reduction. (But beware, I intentionally created this data so the parallel trends for the DiD analysis were linear, not logarithmic.)
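
The conversion from the log-scale coefficient to a percent reduction is just arithmetic. Here is a short Python sketch, taking the -0.1386 estimate reported above as given:

```python
import math

avg_coef = -0.1386                # average log-link coefficient reported above
irr = math.exp(avg_coef)          # incident rate ratio
pct_reduction = 100 * (1 - irr)   # percent reduction implied by the IRR
print(round(irr, 2), round(pct_reduction, 1))
```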

Note if you are wondering about robust estimators, Wilson suggests using quasipoisson, e.g. glm(Crimes ~ . -1,family="quasipoisson",data=mm2), which works just fine for this data. The quasipoisson or other robust estimators though return 0 standard errors for the saturated family=poisson(link="identity") or family=quasipoisson(link="identity").

E.g. doing

library(sandwich)
cumtreat_rob <- glht(m7,tstr,vcov=vcovHC,type="HC0")
summary(cumtreat_rob)

Or just looking at robust coefficients in general:

library(lmtest)
coeftest(m7,vcov=vcovHC,type="HC0")

Returns 0 standard errors. I am thinking with the saturated model and my WDD estimate, you get the issue with robust standard errors described in Mostly Harmless Econometrics (Angrist & Pischke, 2008), that they misbehave in small samples. So I am a bit hesitant to suggest them without more work to establish they behave the way they should in smaller samples.

References

  • Angrist, J.D., & Pischke, J.S. (2008). Mostly Harmless Econometrics. Princeton University Press.
  • Baker, A.C., Larcker, D.F., & Wang, C.C. (2022). How much should we trust staggered difference-in-differences estimates? Journal of Financial Economics, 144(2), 370-395.
  • Wheeler, A.P., & Ratcliffe, J.H. (2018). A simple weighted displacement difference test to evaluate place based crime interventions. Crime Science, 7(1), 1-9.
  • Wilson, D.B. (2022). The relative incident rate ratio effect size for count-based impact evaluations: When an odds ratio is not an odds ratio. Journal of Quantitative Criminology, 38(2), 323-341.

The limit on the cost efficiency of gun violence interventions

Imagine a scenario where someone came out with technology that would 100% reduce traffic fatalities at a particular curve in a road. But, installation and maintenance of the tech would cost $36 million per 100 feet per year. It is unlikely anyone would invest in such technology – perhaps if you had a very short stretch of road that resulted in a fatality on average once a month it would be worth it. In that case, the tech would cost $36 million / 12 = $3 million to ‘save a life’.

It is unlikely there are any stretches of road with a fatality rate this high though (and this does not consider potential opportunity costs of less effective but cheaper other interventions). So if we had a location that has a fatality once a year, we are then paying $36 million to save one life. We ultimately have upper limits on what society will pay to save a life.

Working on gun violence prevention is very similar. While gun violence has potentially very large costs to society, see Everytown’s estimates of $50k for a nonfatal shooting and $270k for a fatality, preventing that gun violence is another matter.

The translation to gun violence interventions from the traffic scenario is ‘we don’t have people at super high risk of gun violence’ and ‘the interventions are not going to be 100% effective’.

My motivation to write this post is the READI intervention in Chicago, which has a price tag of around $60k per participant per 20 months. What makes this program then ‘worth it’ is the probability of entrants being involved with gun violence multiplied by the efficacy of the program.

Based on other work I have done on predicting gun violence (Wheeler et al., 2019b), I guesstimate that any gun violence predictive instrument spread over a large number of individuals will have at best positive predictive probabilities of 10% over a year. A 10% risk of being involved in gun violence is incredibly high; a typical person will have something more on the order of 0.01% to 0.001% risk of being involved with gun violence. So what this means is if you have a group of 100 high risk people, I would expect ~10 of them to be involved in a shooting (either as a victim or offender).

This lines up almost perfectly with READI, which in the control group had 10% shot over 20 months. So I think READI actually did a very good job of referring high risk individuals to the program. I don’t think they could do any better of a job in referring even higher risk people.

This though implies that even with 100% efficacy (i.e. anyone who is in READI goes to 0% risk of involvement in gun violence), you need to treat ~10 people to prevent ~1 shooting victimization. 100% efficacy is not realistic, so let’s go with 50% efficacy (which would still be really good for a crime prevention program, and is probably way optimistic given the null results). This implies you need to treat ~20 people to prevent ~1 shooting, which results in a price tag of $1.2 million to prevent 1 shooting victimization. If we only count the price of proximal gun violence (as per the Everytown estimates earlier), READI is already cost-inefficient from the get go – at 100% efficacy you would still need to treat around 10 people (so $600k) to prevent a single shooting.
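
The arithmetic behind those price tags can be written down as a small Python sketch. The function name and arguments are made up for this illustration; the inputs are the $60k cost per participant, the 10% baseline risk, and the two efficacy scenarios discussed above:

```python
def cost_per_shooting_prevented(cost_per_person, baseline_risk, efficacy):
    """Hypothetical helper: number needed to treat and implied cost."""
    # Each treated person avoids baseline_risk * efficacy shootings on average,
    # so the number needed to treat is the reciprocal of that
    nnt = 1.0 / (baseline_risk * efficacy)
    return nnt, nnt * cost_per_person

for eff in (1.0, 0.5):
    nnt, cost = cost_per_shooting_prevented(60_000, 0.10, eff)
    print(f"efficacy={eff:.0%}: treat ~{nnt:.0f} people, ~${cost:,.0f} per shooting prevented")
```

Plugging in a lower baseline risk or a lower efficacy only makes the cost per prevented shooting go up, since both sit in the denominator.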

The Chicago Crime Lab uses estimates from Cohen & Piquero (2009) to say that READI has a return on investment of 3:1, so each $60k spent saves around $180k. These however count reductions over the life-course, including personal lost productivity, not just state/victim costs, which I think are likely to be quite optimistic for the ROI that people care about. (Productivity estimates always seem suspect to me, models I have put into production in my career have generated over 8 digits of revenue, but if I did not do that work someone else would have. I am replaceable.)

I think it is likely one can identify other, more cost effective programs to reduce gun violence compared to READI. READI has several components, part of which is a caseworker, cognitive behavior therapy (CBT), and a jobs program. I do not know cost breakdowns for each, but it may be some parts drive up the price without much benefit over the others.

I am not as much on the CBT bandwagon as others (I think it looks quite a bit like the other psych research that has come into question more recently), but I think caseworkers are a good idea. The police department I worked with on the VOID paper had caseworkers as part of their intervention, as did focused deterrence programs I have been involved with (Wheeler et al., 2019a). Wes Skogan even discussed how caseworkers were part of Chicago CEASEFIRE/outreach workers on Jerry Ratcliffe’s podcast. For those not familiar, case workers are just social workers assigned to these high risk individuals, and they often help their charges with things like getting an ID/Drivers License and applying to jobs. So just an intervention of caseworkers assigned to high risk people I think is called for.

You may think many of these high risk individuals are not amenable to treatment, but my experience is a non-trivial number of them are willing to sit down and try to straighten their lives out, and they need help to do that. Those are the people for whom case workers are a good potential solution.

Although I am a proponent of hot spots policing as well, if we are just talking about shootings, I don’t think hot spots will have a good return on investment either (Drake et al., 2022). Only if you widen the net to other crimes do I think hot spots make sense (Wheeler & Reuter, 2021). And maybe here I am being too harsh, if you reduce other criminal behavior READI’s cost-benefit ratio likely looks better. But just considering gun violence, I think dropping $60k per person is never going to be worth it in realistic high gun violence risk populations.

References

Home buying and collective efficacy

With the recent large appreciation in home values, around 20% in the prior year, there has been an increase in private investors purchasing homes to rent out. Recent stories on this by Tyler Dukes and colleagues have collated open parcel data to identify the scope of these companies across all of North Carolina.

For a bit of background, I tried to purchase a home in Plano, TX in early 2018. Homes in our price range at that time were going in a single day and typically a few thousand over asking price.

Fast forward to early 2021, I am a fully remote data scientist instead of a professor, and kiddo is in online school. Even with the pay bump, housing competition was even worse in Plano at this point, so we knew we were likely going to have to move school districts to be able to purchase a home. So we decided to strike out, and ended up looking around Raleigh. We quite quickly decided to purchase a new build home in the suburb of Clayton (totally recommend our realtor, Ellen Pitts, her crew did quite a bit of work for us remotely).

I was lucky to get in then it appears – many of the new developments in the area are being heavily scooped up by these equity firms (and rent would be ~$600 more for my home than the mortgage). So I downloaded the public data Dukes put together, and loaded it into Excel to make a quick map of the properties.

For a NC state view, we have big clusters in Charlotte, Greensboro and Raleigh:

We can zoom in, and here is an overview of the Triangle area:

So you can see that inside the loop in Raleigh is pretty sparse, but many of the newer developments on the east side have many more of the private firm purchased houses. Charlotte is much more infilled with these private firms purchasing properties.

Zooming in even further to my town of Clayton, there is quite a bit of variance in the proportion of private vs residential purchases across various developments. My development is less than 50% of these purchases, but several developments appear to be almost 100% privately purchased. (This is not my home/neighborhood FYI.)


So what does this have to do with collective efficacy? Traditionally areas with higher home ownership have been associated with lower rates of crime. For non-criminologists reading my blog, one of the most prominent criminological theories is that state actions only move the needle slightly on increasing/decreasing crime; people enforcing social norms is a bigger factor that explains high crime vs low crime areas. Places with people churning out more frequently – which occurs in areas with more renters – tend to have fewer people effectively keeping the peace. Because social scientists love to make up words, we call this concept collective efficacy.

Downloading and looking at this data, while I was mostly just interested in zooming into my neighborhood and seeing the infill of renters, sparked a criminological hypothesis: I expect neighborhoods with higher rates of private equity purchased housing in the long run to have higher rates of criminal behavior.

This hypothesis will be difficult to test in the wild. It is partially confounded with capital – those who buy their homes accumulate more wealth over time (again mortgage is quite a bit cheaper than rent, so even ignoring home value appreciation this is true). But the variance in the number of homes purchased by private equity firms in different areas makes me wonder if there is enough variation to do a reasonable research design to test my hypothesis, especially in the Charlotte area in say two or three years post a development being finished.

An update on the WaPo Officer Involved Shooting Stats

Marisa Iati interviewed me for a few clips in a recent update of the WaPo data on officer involved fatal police shootings. I’ve written in the past the data are very consistent with a Poisson process, and this continues to be true.

So first thing Marisa said was that shootings in 2021 are at 1055 (up from 1021 in 2020). Is this a significant increase? I said no off the cuff – I knew the average over the time period WaPo has been collecting data is around 1000 fatal shootings per year, so given that for a Poisson distribution mean=variance, we know the standard deviation of the series is close to sqrt(1000), which approximately equals 30. So anything within two standard deviations, 1000 plus/minus 60 (i.e. 940-1060), is within the typical range you would expect.
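
Here is that back of the envelope check as a short Python sketch. It assumes a Poisson process with a mean of 1000 per year (the rough long run average mentioned above) and uses a plus/minus two standard deviation band as the typical range:

```python
import math

mean_per_year = 1000
sd = math.sqrt(mean_per_year)  # Poisson: variance equals the mean
low, high = mean_per_year - 2 * sd, mean_per_year + 2 * sd
print(f"sd ~ {sd:.1f}, typical range ~ {low:.0f} to {high:.0f}")

# Yearly totals quoted above
for year, count in [(2020, 1021), (2021, 1055)]:
    z = (count - mean_per_year) / sd
    print(year, count, f"z = {z:.2f}", "inside band" if low <= count <= high else "outside band")
```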

In every interview I do, I struggle to describe frequentist concepts to journalists (and this is no different). This is not a critique of Marisa, this paragraph is certainly not how I would write it down on paper, but likely was the jumble that came out of my mouth when I talked to her over the phone:

Despite setting a record, experts said the 2021 total was within expected bounds. Police have fatally shot roughly 1,000 people in each of the past seven years, ranging from 958 in 2016 to last year’s high. Mathematicians say this stability may be explained by Poisson’s random variable, a principle of probability theory that holds that the number of independent, uncommon events in a large population will remain fairly stagnant absent major societal changes.

So this sort of mixes up two concepts. One, the distribution of fatal officer shootings (a random variable) can be very well approximated via a Poisson process, which I will show below still holds true with the newest data. Second, what does this say about potential hypotheses we have about things that we think might influence police behavior? I will come back to this at the end of the post.

R Analysis at the Daily Level

So my current ptools R package can do a simple analysis to show that this data is very consistent with a Poisson process. First, install the most recent version of the package via devtools, then you can read in the WaPo data directly via the Github URL:

library(devtools)
install_github("apwheele/ptools")
library(ptools)

url <- 'https://raw.githubusercontent.com/washingtonpost/data-police-shootings/master/fatal-police-shootings-data.csv'
oid <- read.csv(url,stringsAsFactors = F)

Looking at the yearly statistics (clipping off events recorded so far in 2022), you can see that they are hypothetically very close to a Poisson distribution with a mean/variance of 1000, although perhaps have a slow upward trend over the years.

# Year Stats
oid$year <- as.integer(substr(oid$date,1,4))
year_stats <- table(oid$year)
print(year_stats)
mean(year_stats[1:7]) # average of 1000 per year
var(year_stats[1:7])  # variance just under 1000

We can also look at the distribution at shorter time intervals, here per day. First I aggregate the data to the daily level (including 0 days), second I use my check_pois function to get the comparison distributions:

#Now aggregating to count per day
oid$date_val <- as.Date(oid$date)
date_range <- paste0(seq(as.Date('2015-01-01'),max(oid$date_val),by='days'))
day_counts <- as.data.frame(table(factor(oid$date,levels=date_range)))
head(day_counts)

pfit <- check_pois(day_counts$Freq, 0, 10, mean(day_counts$Freq))
print(pfit)

The way to read this, for a mean of 2.7 fatal OIS per day (and given this many days), we would expect 169.7 0 fatality days in the sample (PoisF), but we actually observed 179 0 fatality days, so a residual of 9.3 in the total count. The trailing rows show the same in percentage terms, so we expect 6.5% of the days in the sample to have 0 fatalities according to the Poisson distribution, and in the actual data we have 6.9%.

You can read the rest of the rows the same way, and the story is mostly the same: there are only very slight deviations from the baseline expected Poisson distribution. This is the closest I have ever seen real life, social behavioral data come to following a Poisson process.

For comparison, let’s look at the NYC shootings data I have saved in the ptools package.
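
For readers who want to see the mechanics of this kind of comparison, here is a minimal Python sketch of the idea (this is not the ptools check_pois implementation, and the toy daily counts are made up purely for illustration). It tabulates how many days have each count and compares that to what a Poisson distribution with the same mean would expect:

```python
import math
from collections import Counter

def poisson_pmf(k, mean):
    # Computed on the log scale to avoid overflow for larger k
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def compare_to_poisson(counts, max_k):
    """Hypothetical helper: observed vs Poisson-expected days per count."""
    n = len(counts)
    mean = sum(counts) / n
    observed = Counter(counts)
    rows = []
    for k in range(max_k + 1):
        expected = n * poisson_pmf(k, mean)
        rows.append((k, observed.get(k, 0), expected, observed.get(k, 0) - expected))
    return mean, rows

# Made-up daily counts, only to show the output format
toy = [0, 2, 3, 1, 4, 2, 3, 5, 2, 1, 0, 3, 2, 4, 3, 2, 6, 1, 3, 2]
mean, rows = compare_to_poisson(toy, max_k=6)
for k, obs, exp_days, resid in rows:
    print(k, obs, round(exp_days, 1), round(resid, 1))
```

Each row gives the count value, the observed number of days, the expected number of days under the Poisson distribution, and the residual between the two.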

# Lets check against NYC Shootings
data(nyc_shoot)
date_range <- paste0(seq(as.Date('2006-01-01'),max(nyc_shoot$OCCUR_DATE),by='days'))
shoot_counts <- as.data.frame(table(factor(nyc_shoot$OCCUR_DATE,levels=date_range)))

sfit <- check_pois(shoot_counts$Freq,0,max(shoot_counts$Freq),mean(shoot_counts$Freq))
round(sfit,1)

This is much more typical of crime data I have analyzed over my career (in that it deviates from a Poisson process by quite a bit). The mean is 4.4 shootings per day, but the variance is over 13. There are many more 0 days than expected (433 observed vs 73 expected). And there are many more high crime shooting days than expected (tail of the distribution even cut off). For example there are 27 days with 18 shootings, whereas a Poisson process would only expect 0.1 days in a sample of this size.

My experience though is that when the data is overdispersed, a negative binomial distribution will fit quite well. (Many people default to a zero-inflated model; like Paul Allison, I think that is a mistake unless you have a structural reason for the excess zeroes you want to model.)

So here is an example of fitting a negative binomial to the shooting data:

# Lets fit a negative binomial and check out
library(fitdistrplus)
fnb <- fitdist(shoot_counts$Freq,"nbinom")
print(fnb$estimate)

sfit$nb <- 100*mapply(dnbinom, x=sfit$Int, size=fnb$estimate[1], mu=fnb$estimate[2])
round(sfit[,c('Prop','nb')],1) # Much better overall fit

And this compares the percentages. So you can see observed 7.5% 0 shooting days, and expected 8.6% according to this negative binomial distribution. Much closer than before. And the tails are fit much closer as well, for example, days with 18 shootings are expected 0.2% of the time, and are observed 0.4% of the time.
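
To get some intuition for why the negative binomial handles the extra zeroes, here is a rough Python sketch. It uses the same (size, mu) parameterization as R’s dnbinom, but the size parameter is a method of moments guess from the rounded mean of 4.4 and an assumed variance of 13 (the text only says the variance is over 13), not the maximum likelihood fit from fitdist, so the numbers will only be in the same ballpark:

```python
import math

def nbinom_pmf(x, size, mu):
    # Negative binomial with mean mu and variance mu + mu^2/size
    log_p = (math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
             + size * math.log(size / (size + mu))
             + x * math.log(mu / (size + mu)))
    return math.exp(log_p)

mu, var = 4.4, 13.0          # rounded mean, assumed variance
size = mu**2 / (var - mu)    # method of moments estimate of size
p0_nb = nbinom_pmf(0, size, mu)
p0_pois = math.exp(-mu)
print(f"P(0 shootings): negative binomial {100*p0_nb:.1f}%, Poisson {100*p0_pois:.1f}%")
```

Even with this crude estimate, the negative binomial puts far more probability on zero shooting days than a Poisson with the same mean does.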

So What Inferences Can We Make?

In social sciences, we are rarely afforded the ability to falsify any particular hypothesis – or in more lay terms, we can’t really ever prove something to be false beyond a reasonable doubt. We can however show whether empirical data is consistent or inconsistent with any particular hypothesis. In terms of fatal OIS, several ready hypotheses one may be interested in are Does increased police scrutiny result in fewer OIS?, or Did the recent increase in violence increase OIS?.

While these two processes are certainly plausible, the data collected by WaPo are not consistent with either hypothesis. It is possible both mechanisms are operating at the same time, and so cancel each other out, to result in a very consistent 1000 Fatal OIS per year. A simpler explanation though is that the baseline rate has not changed over time (Occam’s razor).

Again though we are limited in our ability to falsify these particular hypotheses. For example, say there was a very small upward trend, on the order of something like +10 Fatal OIS per year. Given the underlying variance of Poisson variables, even with 7+ years of data it would be very difficult to identify that small of an upward trend. Andrew Gelman likens it to measuring the weight of a feather carried by a Kangaroo jumping on the scale.
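
A rough way to put numbers on this is sketched below in Python. It is an approximation under stated assumptions, not a formal power analysis: it treats each of 7 yearly counts as having a variance of about 1000 (the Poisson mean) and asks how precisely a simple linear trend line could be estimated:

```python
import math

years = list(range(7))   # 7 years of yearly totals
mean_count = 1000        # approximate Poisson mean, so variance ~ 1000
tbar = sum(years) / len(years)
sxx = sum((t - tbar) ** 2 for t in years)   # spread of the time variable

# Standard error of a least squares slope with known error variance
se_slope = math.sqrt(mean_count / sxx)
true_trend = 10          # hypothetical +10 fatal OIS per year
print(f"slope SE ~ {se_slope:.1f}, trend / SE ~ {true_trend / se_slope:.2f}")
```

Under these assumptions a true trend of +10 per year is less than two standard errors from zero, so it would often fail to show up as a detectable change.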

So really we could only detect big changes that swing OIS by around 100 events per year I would say offhand. Anything smaller than that is likely very difficult to detect in this data. And so I think it is unlikely any of the recent widespread impacts on policing (BLM, Ferguson, Covid, increased violence rates, whatever) ultimately impacted fatal OIS in any substantive way on that order of magnitude (although they may have had tiny impacts at the margins).

Given that police departments are independent, this suggests the data on fatal OIS are likely independent as well (e.g. one fatal OIS does not cause more fatal OIS, nor the opposite, one fatal OIS does not deter more fatal OIS). Because of the independence of police departments, I am not sure there is a really great way to have federal intervention reduce the number of fatal OIS. I think individual police departments can increase oversight, and maybe state attorney general offices can be in a better place to use data driven approaches to oversee individual departments (like ProPublica did in New Jersey). I wouldn’t bet money on large deviations from that 1000 fatal OIS per year anytime soon though.