How arrests reduce near repeats: Breaking the Chain paper published

My paper (with colleagues Jordan Riddell and Cory Haberman), Breaking the chain: How arrests reduce the probability of near repeat crimes, has been published in Criminal Justice Review. If you cannot access the peer reviewed version, always feel free to email and I can send an offprint PDF copy. (For those not familiar, it is totally OK/legal for me to do this!) Or if you don’t want to go to that trouble, I have a pre-print version posted here.

The main idea behind the paper is that crimes often have near-repeat patterns. That is, if you have a car break-in at 100 1st St on Monday, the probability you have another car break-in at 200 1st St later in the week is higher than typical. This is most often caused by the same person going and committing multiple offenses in a short time period. So, on its face, a way to prevent those follow-up offenses is to arrest the individual for the initial crime.

I estimate models showing the reduction in the probability of a near repeat crime if an arrest occurs, based on publicly available Dallas PD data (paper has links to replication code). Because near repeat in space & time is a fuzzy concept, I estimate models showing reductions in near repeats for several different space-time thresholds.

So here the model is Prob[Future Crime within time t & distance d] ~ f(Beta*Arrest + sum(B_x*Control_x)), where f is the logistic function, and I plot the Beta estimates given different time and space look-aheads. Points indicate statistical significance, and you can see they tend to be negative for many different crime types and different specifications (with a linear coefficient of around -0.3).
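For a rough sense of the model form, here is a minimal logistic regression sketch in R. This uses simulated data, not the Dallas replication data, and the variable names are made up for illustration; the outcome flags whether another crime occurred within the chosen space-time threshold.

# Minimal sketch of the model form (simulated data, hypothetical variable names)
set.seed(10)
n <- 5000
arrest  <- rbinom(n, 1, 0.3)                       # arrest for the initial crime
control <- rnorm(n)                                # stand-in for the control variables
# simulate a near repeat outcome where an arrest lowers the log odds by ~0.3
near_rep <- rbinom(n, 1, plogis(-1 - 0.3*arrest + 0.2*control))
mod <- glm(near_rep ~ arrest + control, family = binomial)
summary(mod)   # the arrest coefficient is the quantity plotted for each space-time threshold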

Part of the reason I pursued this is that the majority of criminal justice responses to near repeat patterns in the past were target hardening or traditional police patrol. Target hardening (e.g. when a break in occurs, go to the neighbors and tell them to lock their doors) does not appear to be effective, but traditional patrol does (see the work of Rachel/Robert Santos for example).

It seems to me that increasing arrest rates for these crimes is a natural strategy worth exploring for police departments. Easier said than done, but one way may be to prospectively identify incidents that are likely to spawn near repeats and give them higher priority when assigning detectives. In many urban departments, lower level property crimes are never assigned a detective at all.

Open Data and Reproducible Criminology Research

This is part of a special issue put together by Jonathan Grubb and Grant Drawve on spatial approaches to community violence. Jon and Grant specifically asked contributors to discuss a bit about open data standards and replication materials. I repost my thoughts on that here in full:

In reference to reproducibility of the results, we have provided replication materials. This includes the original data sources collated from open sources, as well as python, Stata, and SPSS scripts used to conduct the near-repeat analysis, prepare the data, generate regression models, and graph the results. The Dallas Police Department has provided one of the most comprehensive open sources of crime data among police agencies in the world (Ackerman & Rossmo, 2015; Wheeler et al., 2017), allowing us the ability to conduct this analysis. But it also identifies one particular weakness in the data as well – the inability to match the time stamp of the occurrence of an arrest to when the crime occurred. It is likely the case that open data sources provided by police departments will always need to undergo periodic revision to incorporate more information to better the analytic potential of the data.

For example, much analysis of the arrest and crime relationship relies on either aggregate UCR data (Chamlin et al., 1992), or micro level NIBRS data sources (Roberts, 2007). But both of these data sources lack specific micro level geographic identifiers (such as census tract or addresses of the events), which precludes replicating the near repeat analysis we conduct. If however NIBRS were to incorporate address level information, it would be possible to conduct a wide spread analysis of the micro level deterrence effects of arrests on near repeat crimes across many police jurisdictions. That would allow much broader generalizability of the results, and not be dependent on idiosyncratic open data sources or special relationships between academics and police departments. Although academic & police practitioner relationships are no doubt a good thing (for both police and academics), limiting the ability to conduct analysis of key policing processes to the privileged few is not.

That being said, currently both for academics and police departments there are little to no incentives to provide open data and reproducible code. Police departments have some slight incentives, such as assistance from governmental bodies (or negative conditions for funding conditional on reporting). As academics we have zero incentives to share our code for this manuscript. We do so simply because that is a necessary step to ensure the integrity of scientific research. Relying on the good will of researchers to share replication materials has the same obvious disadvantage that allowing police departments to pick and choose what data to disseminate does – it can be capricious. What a better system to incentivize openness may look like we are not sure, but both academics and police no doubt need to make strides in this area to be more professional and rigorous.

Podcast and Video Shout Outs

So y’all know I really enjoy blogs. So much so that I think they often have a higher value added than traditional peer reviewed papers. There are other mediums I would like to recognize, and those are podcasts and video tutorials. So while I like to do lab tutorials (pretty much like my blog posts, in which I step through some code), I know many students would prefer I do videos and lectures. And I admit I have seen some of these done quite well on Coursera, for example.

Another source I have been consuming quite a bit lately is podcasts. These often take the form of an interview, so they are not technical in nature, but more soft storytelling, such as talking about a particular topical area the interviewee is an expert in, or that person’s career path. So here is my list of the resources I have personally learned from and enjoyed.

For none of these have I listened to/watched 100% of the offerings, but I have gone through multiple episodes of each (and will continue to listen/watch more)! These are very criminal justice focused, so I would love to branch out to data science and health care resources if folks have suggestions!

Podcasts

Reducing Crime – Jerry Ratcliffe interviews a mix of academics and folks working in the criminal justice field. I found quite a few of these episodes personally very informative. John Eck, Kim Rossmo, and Phil Goff were perhaps my favorites of the academics. Danny Murphy and Thomas Abt were really good as well (my favorite non-academics offhand).

Niro Knowledge – Nicholas Roy is a current crime analyst, and interviews other crime analysts and academics. Favorite interviews so far are Cynthia Lum and Renee Mitchell. Similar to Reducing Crime, it is typically more focused on a particular topic of interest to the person being interviewed (e.g. Renee talked about her work on crime harm indices).

Analyst Talk – This is a podcast hosted by Jason Elder where he interviews crime analysts from all over about their careers. Annie Thompson’s and my former colleague Shelagh Dorn’s are my favorites so far, but I also need to listen in sometime on Sean Bair’s series of talks as well.

Abt Podcasts – I only came across this one a week ago, but I have listened to several episodes on data science, CJ, and social determinants of health. These are a bit different from the other podcasts here; they are shorter, and have two individuals from different fields discuss social science relevant to the chosen topic.

Videos

Canadian Society of Evidence Based Policing – Has many interviews of academics in crim/cj. I have an interview with them (would not recommend, I need to work on sitting still!). The Peter Neyroud interview though is my favorite.

UARK CASDAL – These are instructional videos uploaded by Grant Drawve, mostly around doing crime analysis in Excel, but there are also a few in ArcGIS.

StatQuest with Josh Starmer – This is one of the few non crim/cj examples I watch regularly. For interview questions at my workplace for entry-level data scientists, we often ask folks to explain machine learning models (such as random forests or XGBoost) in simple terms. These videos are excellent resources to get you to understand the basics of the mathematics behind the techniques.

Again, let me know in the comments of any podcasts/video series I am missing out on!

The spatial dispersion of NYC shootings in 2020

If you had asked me at the start of widespread Covid lockdown measures what the effect would be on crime, I am pretty sure I would have guessed it would make crime go down. Fewer people out and about causes fewer interactions that can lead to a crime. That isn’t how it has shaped up though; quite a few places have seen increases in serious violent crime. One of the most dramatic examples of this is that shootings in NYC doubled, from 900 in 2019 to over 1800 in 2020. I am going to show how to generate this chart later via some R code, but it is easier to show than to say. NYPD’s open data on shootings (historical, current) go back to 2006.

I know I am critical on this site of folks overinterpreting crime increases. For example, going from 20 to 35 is pretty weak evidence of an increase given the inherent variance for low count Poisson data (a Poisson e-test has a p-value of 0.04 in that case). But going from 900 to 1800 is a much clearer signal.
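If you want a quick way to eyeball a comparison like that in R, base R’s poisson.test is one option. It is an exact conditional test, not the same as the e-test, so the p-value will differ somewhat, but it gives a similar borderline answer for comparing two counts.

# Not the e-test, but a quick ballpark check for comparing two Poisson counts
poisson.test(c(35, 20))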

Jerry Ratcliffe recently posted an R library to do his crime dispersion analysis, so I figured this would be an excellent example use case. The idea behind this analysis is spatial – we know there is a crime increase, but did the increase happen everywhere, or did it just happen in a few locations. Here I am going to use the NYPD shooting data aggregated at the precinct level to test this.

As another note, while I often use micro-spatial units of analysis in my work, this method, along with others (such as the sppt test), is just not going to work out for very low count, very tiny spatial units of analysis. I would suggest offhand to only do this analysis if the spatial units of analysis under study have an average of at least 10 crimes per area in the pre time period. Which is right about on the mark for the precinct analysis in NYC.

Here is the data and R code to follow along, below I will give a walkthrough.

Crime increase dispersion analysis in R

So first as some front matter, I load in my libraries (Jerry’s crimedispersion you can install from GitHub via devtools; see his page for an example), and the function I define here I’ve gone over in a prior blog post of mine as well.

###############################
library(ggplot2)
library(crimedispersion)

# Increase contours, see https://andrewpwheeler.com/2020/02/21/some-additional-plots-to-go-with-crime-increase-dispersion/
make_cont <- function(pre_crime,post_crime,levels=c(-3,0,3),lr=10,hr=max(pre_crime)*1.05,steps=1000){
    #calculating the overall crime increase
    ov_inc <- sum(post_crime)/sum(pre_crime)
    #Making the sequence on the square root scale
    gr <- seq(sqrt(lr),sqrt(hr),length.out=steps)^2
    cont_data <- expand.grid(gr,levels)
    names(cont_data) <- c('x','levels')
    cont_data$inc <- cont_data$x*ov_inc
    cont_data$lines <- cont_data$inc + cont_data$levels*sqrt(cont_data$inc)
    return(as.data.frame(cont_data))
}

my_dir <- 'D:\\Dropbox\\Dropbox\\Documents\\BLOG\\NYPD_ShootingIncrease\\Analysis'
setwd(my_dir)
###############################

Now we are ready to import our data and stack them into a new data frame. (These are individual incident level shootings, not aggregated. If I ever get around to it I will do an analysis of fatality and distance to emergency rooms like I did with the Philly data.)

###############################
# Get the NYPD data and stack it
# From https://data.cityofnewyork.us/Public-Safety/NYPD-Shooting-Incident-Data-Year-To-Date-/5ucz-vwe8
# And https://data.cityofnewyork.us/Public-Safety/NYPD-Shooting-Incident-Data-Historic-/833y-fsy8
# On 2/1/2021
old <- read.csv('NYPD_Shooting_Incident_Data__Historic_.csv', stringsAsFactors=FALSE)
new <- read.csv('NYPD_Shooting_Incident_Data__Year_To_Date_.csv', stringsAsFactors=FALSE)

# Just one column off
print( cbind(names(old), names(new)) )
names(new) <- names(old)
shooting <- rbind(old,new)
###############################

Now we just want to do aggregate counts of these shootings per year and per precinct. So first I substring out the year, then use table to get aggregate counts in R, then make my nice time series graph using ggplot.

###############################
# Create the current year and aggregate
shooting$Year <- substr(shooting$OCCUR_DATE, 7, 10)
year_stats <- as.data.frame(table(shooting$Year))
year_stats$Year <- as.numeric(as.character(year_stats$Var1))
year_plot <- ggplot(data=year_stats, aes(x=Year,y=Freq)) + 
             geom_line(size=1) + geom_point(shape=21, colour='white', fill='black', size=4) +
             scale_y_continuous(breaks=seq(900,2100,by=100)) +
             scale_x_continuous(breaks=2006:2020) +
             theme(axis.title.x=element_blank(), axis.title.y=element_blank(),
                   panel.grid.minor = element_blank()) + 
             ggtitle("NYPD Shootings per Year")

year_plot
# Not quite the same as Petes, https://copinthehood.com/shooting-in-nyc-2020/
###############################

Part of the reason I do this is not because I don’t trust Pete’s analysis, but because I don’t want to embed pictures from someone else’s website! So I wanted to recreate the time series graph myself. Next up we need to do the same aggregating, but not for the whole city – by each precinct. You can use the same table method again, but simply pass in additional columns. That gets you the data in long format, so then I reshape it to wide for later analysis (so each row is a single precinct and each column is a yearly count of shootings). (Note there have been some splits in precincts over the years IIRC; I don’t worry about that here, it will just cause those precincts to show up as 0,0 in the 2019/2020 data I look at.)

###############################
#Now aggregating to year and precinct
counts <- as.data.frame(table(shooting$Year, shooting$PRECINCT))
names(counts) <- c('Year','PCT','Count')
# Reshape long to wide
count_wide <-  reshape(counts, idvar = "PCT", timevar = "Year", direction = "wide")
###############################
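(As a quick aside, this is also a convenient spot to check the rule of thumb mentioned earlier about the spatial units averaging at least 10 crimes in the pre time period.)

# Average pre-period (2019) shootings per precinct, per the rule of thumb above
mean(count_wide$Count.2019)   # right around the 10 crime rule of thumb for NYC precincts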

And now we can give Jerry’s package a test run, where you just pass it your variable names.

# Jerrys function for crime increase dispersion
output <- crimedispersion(count_wide, 'PCT', 'Count.2019', 'Count.2020')
output

The way to understand this is that in a hypothetical world in which we could reduce shootings one precinct at a time, we would need to reduce shootings in 57 of the 77 precincts to bring 2020 shootings back down to 2019 levels. So this suggests very widespread increases; it isn’t just concentrated among a few precincts.

Another graph I have suggested to explore this, while taking into account the typical variance with Poisson count data, is to plot the pre crime counts on the X axis, and the post crime counts on the Y axis.

###############################
# My example contour with labels
cont_lev <- make_cont(count_wide$Count.2019, count_wide$Count.2020, lr=5)

eq_plot <- ggplot() + 
           geom_line(data=cont_lev, color="darkgrey", linetype=2, 
                     aes(x=x,y=lines,group=levels)) +
           geom_point(data=count_wide, shape = 21, colour = "black", fill = "grey", size=2.5, 
                      alpha=0.8, aes(x=Count.2019,y=Count.2020)) +
           scale_y_continuous(breaks=seq(0,140,by=10)) +
           scale_x_continuous(breaks=seq(0,70,by=5)) +
           coord_cartesian(ylim = c(0, 140)) +
           xlab("2019 Shootings Per Precinct") + ylab("2020 Shootings")
eq_plot
###############################

The contour lines show the hypothesized overall increase (around 100% here). So if a point is near the middle line, it follows that doubling almost exactly. The upper/lower lines indicate the typical Poisson variance, and as you can see they are a very good fit to the data here. Very few points are outside the boundaries.

Both of these analyses point to the fact that shooting increases were widespread across NYC precincts. Pretty much every precinct doubled its number of shootings; it is just that some places had a larger baseline to double than others (and the data have some noise, so you can pick out some places that did not increase if you cherry-pick).
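If you want to quickly tally that up yourself, reusing the count_wide data frame from earlier, a few lines like the following count how many precincts went up and by roughly how much.

# How many precincts increased, and the distribution of the 2020/2019 ratios
ratio <- count_wide$Count.2020 / count_wide$Count.2019
sum(count_wide$Count.2020 > count_wide$Count.2019)   # precincts with an increase
summary(ratio[is.finite(ratio)])                     # ratios cluster around a doubling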

And as a final R note, if you want to save these graphs as a nice high resolution PNG, here is an example with Jerry’s dispersion object:

# Saving dispersion plot as a high res PNG
png(file = "ODI.png", bg = "transparent", height=5, width=9, units="in", res=1000, type="cairo")
output #this is the object from Jerrys crimedispersion() function earlier
dev.off()

Going forward I am wondering if there is a good way to do spatial monitoring for crime data like this, like some sort of control chart that takes into account both space and time. So it isn’t a retrospective recap a year later, but identifies spatial increases in near real time.
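To make that idea a bit more concrete, here is a very rough sketch (not a worked-out method) of per-precinct Poisson limits using the prior year as the baseline; in practice you would want much shorter time windows to make it anything like real time.

# Crude per-precinct Poisson control limits, prior year as baseline
base  <- count_wide$Count.2019
upper <- base + 3*sqrt(base)                          # rough 3-sigma upper limit
flag  <- count_wide$PCT[count_wide$Count.2020 > upper]
length(flag)                                          # precincts breaching the limit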

Other References of Interest

  • Justin Nix & company have a few blog posts looking at NYC data as well. In the first they talk about the variance across cities; many are up in violence but several are down as well. A later post though is updated with the clear increase in shootings in NYC.
  • There are too many papers at this point for me to do a bibliography of all the Covid and crime updates, but two open examples are a paper Matt Ashby did on several US cities, and an analysis of Chicago by Campedelli et al. Each shows variance again, so there is no universal up or down in trends, but various examples of increases or decreases both between cities and between different crime types within a city.

My online course lab materials and musings about online teaching

I often refer folks to the courses I have placed online. Just as an update for everyone, I have pages for each of my courses in the header at the top of my website. Several of these are just descriptions and syllabi, but for the few lab based courses I have done over the years I have put my materials entirely online. Those lab courses are described below.

And each of those pages links to a GitHub page where all the lab goodies are stored.

The seminar in research focuses on popular quasi-experimental designs in CJ, and has code in R/Stata/SPSS for the weekly lessons. (Will need to update with python, I may need to write my own python margins library though!)

Grad GIS is mostly old ArcGIS tutorials (I don’t think I will update it to ArcPro; I will see when Eric Piza’s new book comes out and probably just suggest that). Even though the screenshots are perhaps old at this point, the ideas/workflow are not. (It also has some tutorials on other open source tools; CrimeStat, Jerry’s Near Repeat Calculator, GeoDa, spatial regression analysis in R, and Malleson & Andresen’s SPPT tool are examples I remember offhand.)

Undergrad Crime Analysis is mostly focused on number crunching relevant to crime analysts in Excel, although it has a few things in Access (making SQL queries) and making a BOLO in Publisher.

So for folks self-learning of course use those resources however you want. My suggestion is to skim through the syllabus, see if you want to learn about any particular lesson, and then jump right to that one. No need to slog through the whole course if you are just interested in one specific thing.

They are also freely available to any instructors who want to adapt those materials for their own courses as well.


One of the things that has disappointed me about the teaching response to Covid is instead of institutions taking the opportunity to really invest in online teaching, people are just running around with their heads cut off and offering poor last minute hybrid courses. (This is both for the kiddos as well as higher education.)

If you have ever taken a Coursera course, they are a real production! And the ones I have tried have all been really well done; nice videos, interactive quizzes with immediate feedback, etc. A professor on their own though cannot accomplish that; we would need investment from the University in filming and in scripting the webpage. But once it is finished, it can be delivered to the masses.

So instead of running courses with a tiny number of students, I think it makes more sense for Universities to actually pony up resources to help professors make professional looking online courses. Not the nonsense with a bad recorded lecture and a discussion board. It is IMO better to give someone a semester sabbatical to develop a really nice online course than make people develop them at the last minute. Once the course is set up, you really only need to administer the course, which takes much less work.

Another interested party may be professional organizations. For example, the American Society of Criminology could make an ad-hoc committee to develop a model curriculum for an intro criminology course. You can see in my course pages I taught this at one point – there is no real reason why every criminology teacher needs to strike out on their own. This is both more work for the individual teacher, as well as introduces quite a bit of variation in the content that crim/cj students receive.

Even if ASC started smaller, say promoting individual lessons, that would be lovely. Part of the difficulty in teaching a broad course like Intro to Criminology is that I am not an expert on all of criminology. So for example if someone made a lesson plan/video for bio-social criminology, I would be more apt to use that. Think instead of a single textbook, leveraging multi-media.


It is a bit ironic, but one of the reasons I was hired at HMS was to internally deliver data science training. So even though I am in the private sector I am still teaching!

Like I said previously, you are on your own for developing teaching content at the University. There is very little oversight. I imagine many professors will cringe at my description, but one of the things I like at HMS is the collaboration in developing materials. So I initially sat down with my supervisor and project manager to develop the overall curricula. Then for individual lessons I submit my slides/lab portion to my supervisor to get feedback, and also do a dry run in front of one of my peers on our data science team to get feedback. Then in the end I do a recorded lecture – we limit to something like 30 people on WebEx so it is not lagging, but ultimately everyone in the org can access the video recording at a later date.

So again I think this is a better approach. It takes more time, and I only do one lecture at a time (so it takes a month or two to develop one lecture). But I think that in the end this will be a better long term investment than the typical way Uni’s deliver courses.

Checking a Poisson distribution fit: An example with officer involved shooting deaths WaPo data (R functions)

So besides code on my GitHub page, I have a list of various statistical functions I’ve scripted over the years on the blog’s code snippets page. One of those functions I will illustrate today is some R code to check the fit of the Poisson distribution. Many of my crime analysis examples rely on crime data being approximately Poisson distributed. Additionally it is relevant in regression model building, e.g. should I use a Poisson GLM or do I need to use some type of zero-inflated model?

Here is a brief example to show how my R code works. You can source it directly from my dropbox page. Then I generated 10k simulated rows of Poisson data with a mean of 0.2. I see many people in CJ make the mistake of thinking: OK, my data has 85% zeroes, so I need to use some sort of zero-inflated model. If you are working with very small spatial/temporal units of analysis and/or rare crimes, it may be that the mean of the distribution is quite low, and so the Poisson distribution is actually quite a close fit.

# My check Poisson function
source('https://dl.dropboxusercontent.com/s/yj7yc07s5fgkirz/CheckPoisson.R?dl=0')

# Example with simulated data
set.seed(10)
lambda <- 0.2
x <- rpois(10000,lambda)
CheckPoisson(x,0,max(x),mean(x))

Here you can see in the generated table from my CheckPoisson function, that with a mean of 0.2, we expect around 81.2% zeroes in the data. And since we simulated the data according to the Poisson distribution, that is what we get. The table shows that out of the 10k simulation rows, 8121 were 0’s, 1692 rows were 1’s etc.
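If you want to see where that expected percentage comes from, it is just the Poisson probability of observing a zero, exp(-lambda).

# Expected proportion of zeroes under the Poisson distribution
dpois(0, 0.2)       # ~0.82 with the true simulation mean of 0.2
dpois(0, mean(x))   # what CheckPoisson uses here, the sample mean of the simulated data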

In real life data never exactly conform to hypothetical distributions. But we often want to see how close they are to the hypothetical before building predictive models. A real life example as close to Poisson distributed data as I have ever seen is the Washington Post Fatal Use of Force data. In every year WaPo has been collating the data, the total number of fatal uses of police force in the US has been very close to 1,000 events per year. And even in all the turmoil this past year, that is still the case.

# Washington Post Officer Involved Shooting Deaths Data
oid <- read.csv('https://raw.githubusercontent.com/washingtonpost/data-police-shootings/master/fatal-police-shootings-data.csv',
                stringsAsFactors = F)

# Year Stats
oid$year <- as.integer(substr(oid$date,1,4))
year_stats <- table(oid$year)[1:6]
year_stats 
mean(year_stats)
var(year_stats)

One way to check the Poisson distribution is that the mean and the variance should be close, and here at the yearly level the data have some evidence of underdispersion relative to the Poisson distribution (most crime data is overdispersed – the variance is much greater than the mean). If the actual mean is around 990, you would expect typical variations of around plus/minus 60 per year (~ 2*sqrt(990)). But that only gives us a few observations to check (6 years). We can disaggregate the data to smaller intervals and check the Poisson assumption. Here I aggregate to days (note that this includes zero days in the table levels calculation). Then we again check the fit of the Poisson distribution.

#Now aggregating to count per day
oid$date_val <- as.Date(oid$date)
date_range <- paste0(seq(as.Date('2015-01-01'),max(oid$date_val),by='days'))
day_counts <- as.data.frame(table(factor(oid$date,levels=date_range)))
head(day_counts)
pfit <- CheckPoisson(day_counts$Freq, 0, 10, mean(day_counts$Freq))
pfit

According to the mean and the variance, it appears the distribution is a very close fit to the Poisson. We can see in this data we expected to have around 147 days with 0 fatal encounters, and in reality there were 160. I like seeing the overall counts, but another way is via the proportions in the final three columns of the table. You can see for all of the integers, we are less than 2 percentage points off for any particular integer count. E.g. we expect the distribution to have 3 fatal uses of force on about 22% of the days, but in the observed distribution days with 3 events only happened around 21% of the days (or 20.6378132 without rounding). So overall these fatal use of force data of course are not exactly Poisson distributed, but they are quite close.

So the Poisson distribution is motivated via a process in which the inter-arrival times of the events being counted are independent. Or in simpler terms, one event does not cause a future event to come faster or slower. So offhand, if you had a hypothesis that publicizing officer fatalities made future officers more hesitant to use deadly force, that is not supported in this data. Given that this is officer involved fatal encounters in the entire US, it is consistent with a data generating process in which a fatal encounter in one jurisdiction has little to do with fatal encounters in other jurisdictions.

(For crime data we are often interested in the opposite, self-exciting hypothesis, that one event causes another to happen in the near future. Self-excitation would cause an increase in the variance, while the opposite process would result in a reduced variance of the counts. E.g. if you have something that occurs at a regular monthly interval, the counts of that event will be underdispersed according to a Poisson process.)
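As a quick toy illustration of that last point (made-up regular events, nothing to do with the WaPo data), counting an event that happens like clockwork once a week into 30 day bins gives counts with a variance well below the mean, unlike a Poisson process with the same mean.

# Toy example: a perfectly regular weekly event, counted in 30 day bins
regular_days <- seq(1, 3650, by=7)                 # one event exactly every 7 days
bins <- ((regular_days - 1) %/% 30) + 1            # which 30 day bin each event falls in
reg_counts <- as.numeric(table(bins))
c(mean = mean(reg_counts), var = var(reg_counts))  # variance much smaller than the mean

# Compare to a Poisson process with the same mean
set.seed(10)
pois_counts <- rpois(length(reg_counts), mean(reg_counts))
c(mean = mean(pois_counts), var = var(pois_counts))  # variance roughly equal to the mean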

So the above examples just checked a univariate data source for whether the Poisson distribution was a decent fit. Oftentimes academics are interested in whether the conditional distribution is a good fit post some regression model. So even if the marginal distribution is not Poisson, it may be you can still use a Poisson GLM, generate good predictions, and the conditional model is a good fit for the Poisson distribution. (That being said, your model has to do more work the further away it is from the hypothetical distribution, so if the marginal is very clearly off from Poisson a Poisson GLM probably won’t fit very well.)

My CheckPoisson function allows you to check the fit of a Poisson GLM by piping in varying predicted values over the sample instead of just one. Here is an example where I use a Poisson GLM to generate estimates conditional on the day of the week (just for illustration, I don’t have any obvious reason fatal encounters would occur more or less often during particular days of the week).

#Do example for the day of the week
day_counts$wd <- weekdays(as.Date(day_counts$Var1))
mod <- glm(Freq ~ as.factor(wd) - 1, family="poisson", data=day_counts)
#summary(mod), Tue/Wed/Thu a bit higher
lin_pred <- exp(predict(mod))
pfit_wd <- CheckPoisson(day_counts$Freq, 0, 10, lin_pred)
pfit_wd

You can see that the fit is almost exactly the same as before with the univariate data, so the differences across days of the week do not explain much of the divergence from the hypothetical Poisson distribution. But again, this data is already quite close to a Poisson distribution.

So it is common for people to do tests for goodness-of-fit using these tables. I don’t really recommend it – just look at the table and see if it is close. Departures from the hypothetical can inform modeling decisions, e.g. if you do have more zeroes than expected then you may need a negative binomial model or a zero-inflated model. If the departures are not dramatic, variance estimates from the Poisson assumption are not likely to be dramatically off-the-mark.

But if you must, here is an example of generating a Chi-Square goodness-of-fit test with the example Poisson fit table.

# If you really want to do a test of fit
chi_stat <- sum((pfit$Freq - pfit$PoisF)^2/pfit$PoisF)
df <- length(pfit$Freq) - 2
pchisq(chi_stat, df, lower.tail=FALSE)   # upper tail p-value

That last line returns the upper tail probability from the chi-square distribution, which is the p-value for the goodness-of-fit test.

I really don’t recommend this though for two reasons. One is that with null hypothesis significance testing you are really put in a position that large data samples always reject the null, even if the departures are trivial in terms of the assumptions you are making for whatever subsequent model. The flipside of this is that with small samples the test is underpowered, so there are never many good scenarios where it is useful in practice. Two, you can generate superfluous categories (or collapse particular categories) in the Chi-Square test to increase the degrees of freedom and change the p-value.

One of the things though that this is useful for is checking the opposite, people fudging data. If you have data too close to the hypothetical distribution (so very high p-values here), it can be evidence that someone manipulated the data (because real data is never that close to hypothetical distributions). A famous example of this type of test is whether Mendel manipulated his data.

I intentionally chose the WaPo data as it is one of the few that out of the box really appears to be close to Poisson distributed in the wild. One of my next tasks though is to do some similar code for negative binomial fits. Like Paul Allison, for crime count data I rarely see much need for zero-inflated models. But while I was working on that I noticed that the parameter estimates in NB fits, even with samples of 1,000 to 10,000 observations, were not very good. So I will need to dig into that more as well.

Lit reviews are (almost) functionally worthless

The other day I got an email from ACJS about the most downloaded articles of the year for each of their journals. For The Journal of Criminal Justice Education it was a slightly older piece, How to write a literature review, from 2012 by Andrew Denney & Richard Tewksbury (DT from here on). As you can guess by the title of my blog post, it is not my favorite subject. I think it is actually an impossible task to give advice about how to write a literature review. The reason for this is that we have no objective standards by which to judge a literature review – whether one is good or bad is almost wholly subject to the discretion of the reader.

The DT article I don’t think per se gives bad advice. Use an outline? Golly I suggest students do that too! Be comprehensive in your lit review about covering all relevant work? Well who can argue with that!

I think an important distinction to make in the advice DT give is the distinction between functional actions and symbolic actions. Functional in this context means an action that makes the article better accomplish some specific function. So for example, if I say you should translate complicated regression models to more intuitive marginal effects to make your results more interpretable for readers, that has a clear function (improved readability).

Symbolic actions are those that are merely intended to act as a signal to the reader. So if the advice is along the lines of, you should do this to pass peer review, that is on its face symbolic. DT’s article is nearly 100% about taking symbolic actions to make peer reviewers happy. Most of the advice doesn’t actually improve the content of the manuscript (or in the most charitable interpretation, how it improves the manuscript is at best implicit). In DT’s section Why is it important, this focus on symbolic actions becomes pretty clear. Here is the first paragraph of that section:

Literature reviews are important for a number of reasons. Primarily, literature reviews force a writer to educate him/herself on as much information as possible pertaining to the topic chosen. This will both assist in the learning process, and it will also help make the writing as strong as possible by knowing what has/has not been both studied and established as knowledge in prior research. Second, literature reviews demonstrate to readers that the author has a firm understanding of the topic. This provides credibility to the author and integrity to the work’s overall argument. And, by reviewing and reporting on all prior literature, weaknesses and shortcomings of prior literature will become more apparent. This will not only assist in finding or arguing for the need for a particular research question to explore, but will also help in better forming the argument for why further research is needed. In this way, the literature review of a research report “foreshadows the researcher’s own study” (Berg, 2009, p. 388).

So the first argument, that a lit review forces a writer to educate themselves, may offhand seem like a functional objective. It doesn’t make sense though, as lit reviews are almost always written ex post, after the research is already done. The point of writing a paper is not to educate yourself, but to educate other people on your research findings. The symbolic motivation for this viewpoint becomes clear in DT’s second point, that you need to demonstrate credibility to your readers. In terms of integrity, if the advice in DT was ‘consider creating a pre-analysis plan’ or ‘release data and code files to replicate your results’ that would be functional advice. But no, it is important to wordsmith how smart you are so reviewers perceive your work as more credible.

Then the last point in the paragraph, articulating the need for a particular piece of research, is again a symbolic action in DT’s essay. You are arguing to peer reviewers about the need for a particular research question. I understand the spirit of this, but think back to what function does this serve? It is merely a signal to reviewers to say, given finite space in a journal, please publish my paper over some other paper, because my topic is more important.

You actually don’t need a literature review to demonstrate a topic is important and/or needed – you can typically articulate that in a sentence or two. For a paper I reviewed not too long ago on crime reductions resulting from CCTV installations in a European city, I was struck by another reviewer’s critique saying that the authors “never really motivate the study relative to the literature”. I don’t know about you, but the importance of that study seems pretty obvious to me. But yeah sure, go ahead and pad that citation list with a bunch of other studies looking at the same thing to make some peer reviewers happy. God forbid you simply cite a meta-analysis on prior CCTV studies and move onto better things.

What should a lit review accomplish?

So again I don’t think DT give bad advice – mostly vapid but not obviously bad. DT focus on symbolic actions in lit reviews because as lit reviews are currently performed in CJ/Crim journals, they are almost 100% symbolic. They serve almost no functional purpose other than as a signal to reviewers that you are part of the club. So DT give about the best advice possible navigating a series of arbitrary critiques with no clear standard.

As an example for this position that lit reviews accomplish practically nothing, conduct this personal experiment. For the next peer reviewed article you pick up, do not read the literature review section. Only read the abstract, and then the results and conclusion. Without having read the literature review, does this change the validity of the paper’s findings? For the most part it does not. People get their feelings hurt by not being cited (including myself), but even if someone fails to cite some of my related work, it pretty much never impacts the validity of that person’s findings.

So DT give advice about how peer review works now. No doubt those symbolic actions are important to getting your paper published, even if they do not improve the actual quality of the manuscript in any clear way. I would rather address the question of what I think a lit review should look like – not what you should do to placate three random people and the editor. So again I think the best way to think about this is via articulating specific functions a lit review accomplishes in terms of improving the manuscript.

Broadening the scope a bit to consider the necessity of citations: the majority of citations in articles are perfunctory, but I don’t think people should plagiarize. So when you pull a very specific piece of information from a source, I think it is important to cite that work. Say you are using a survey instrument developed by someone else; citing the work that establishes that instrument’s reliability and validity, as well as the original population those measures were established on, is certainly useful information to the reader. Sources of information/measures, or a recent piece establishing the properties of your statistical model, are I think other good examples of things to cite in your work. Unfortunately I cannot give a bright line here; I don’t cite Gauss every time I use the normal distribution. But if I am using a code library someone else developed, citing that is important, inasmuch as someone who wants to do a similar project could use the same library.

In terms of discussing relevant results in prior studies, again the issue is that the boundary of what is relevant is very difficult to articulate. If there is a relevant meta-analysis on a topic, it seems sufficient to me to simply state the results of the meta-analysis. Why do I think that is important though? It helps inform your priors about the current study. So if a meta-analysis effect size is X, and the current study has an effect size much larger, it may give you pause. It is also relevant if you are generalizing from the results of the study; your study is just another piece of evidence in addition to the meta-analysis, not an island all by itself.

I am not saying discussing specific prior results is never needed, but it does not need to be extensive. So if studies Z, Y, X are similar to yours but all had null results, and you think it was because the sample sizes were too small, that is relevant and useful information. (Again it changes your priors.) But it does not need to be belabored in detail. The current standard of articulating different theoretical aspects ad nauseam in Crim/CJ journals does not improve the quality of manuscripts. If you do a hot spots policing experiment, you do not need to review all the different minutiae of general deterrence theory. Simply saying the experiment is likely to only accomplish general deterrence, not specific deterrence, seems sufficient to me personally.

When you propose a book you need to say ‘here are some relevant examples’ – I think the same idea would be sufficient for a lit review. OK here is my study, and here are a few additional related studies I think the reader may be interested in. This accomplishes what contemporary lit reviews do in a much more efficient manner – citing more articles makes it much more difficult to pull out the really relevant related work. So I admit this does not improve the quality of the current manuscript in a specific way, but it helps the reader identify other sources of interest. (As a reader I typically go through the citation list and note a few articles I am interested in; this helps me accomplish that task much quicker.)

I’ve already sprinkled a few additional pieces of advice in this blog post (marginal effect estimates, pre-analysis plans, sharing data and code), although you may say they don’t belong in the lit review. Whatever, those are things that actually improve either the content of the manuscript or the actual integrity of the research, not some spray paint on your flowers.


Amending the WDD test to incorporate Harm Weights

So I received a question the other day about amending my and Jerry Ratcliffe’s Weighted Displacement Difference (WDD) test to incorporate crime harms (Wheeler & Ratcliffe, 2018). This is a great idea, but unfortunately it takes a small bit of extra work compared to the original (from the analyst’s perspective). I cannot make it as simple as just piping the pre-post crime weights into that previous spreadsheet I shared. The reason is that a reduction of 10 crimes with a weight of 10 has a different variance than a reduction of 25 crimes with a weight of 4, even though both have the same total crime harm reduction (10*10 = 4*25).
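To see the variance point concretely, here is a simplified illustration that treats each reduction as Poisson (so its variance equals the number of crimes involved) and applies the Var(x*weight) = weight^2*Var(x) rule described below.

# Same total harm reduction (100), very different variances (simplified illustration)
10^2 * 10   # 10 crimes reduced with a weight of 10 -> variance 1000
4^2 * 25    # 25 crimes reduced with a weight of  4 -> variance  400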

I will walk through some simple spreadsheet calculations though (in Excel) so you can roll this on your own. HERE IS THE SPREADSHEET TO DOWNLOAD TO FOLLOW ALONG. What you need to do is calculate the traditional WDD for each individual crime type in your sample, and then combine all those weighted WDD estimates in the end to figure out your crime harm weighted estimate (with confidence intervals around that estimated effect).

Here is an example I take from data from Worrall & Wheeler (2019) (I use this in my undergrad crime analysis class, Lab 6). This is just data from one of the PFA areas and a control TAAG area I chose by hand.

So first, go through the motions for your individual crimes in calculating the point estimate for the WDD, and then also see the standard error of that estimate. Here is an example of piping in the data for thefts of motor vehicles. The WDD is simple, just pre-post crime counts. Since I don’t have a displacement area in this example, I set those cells to 0. Note that the way I calculate this, a negative number is a good thing, it means crime went down relative to the control areas.

Then you want to place those point estimates and standard errors in a new table, and in those same rows assign your arbitrary weight. Here I use weights taken from Ratcliffe (2015), but these weights can be anything. See examples in Wheeler & Reuter (2020) for using police cost of crime estimates, and Wolfgang et al. (2006) for using surveys on public perceptions of severity. Many of the different indices though use sentencing data to derive the weights. (You could even use negative weights and the calculations here all work, say you had some positive data on community interactions.)

Now we have all we need to calculate the harm-weighted WDD test. The big thing to note here is that Var(x*harm_weight) = Var(x)*harm_weight^2. So that allows me to use all the same machinery as the original WDD paper to combine all the weights in the end. So now you just need to add a few additional columns to your spreadsheet. The point estimate for the harm reduction is simply the weight multiplied by the point estimate for the crime reduction. For the variance though, you need to square the standard error, square the weight, and then multiply those squared results together.

Once that is done, you can pool the harm weighted stats together, see the calculations below the table. Then you can use all the same normal distribution stuff from your intro stats class to calculate z-scores, p-values, and confidence intervals. Here are what the results look like for this particular example.
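If you would rather do this in R than in the spreadsheet, here is a rough sketch of the same steps. The counts and weights below are made up for illustration (not the Dallas data), there is no displacement area, and negative numbers mean crime went down relative to the control areas.

# Harm weighted WDD sketch: two crime types, made-up counts and weights
pre_t  <- c(30, 80)   # treated area, pre period
post_t <- c(20, 60)   # treated area, post period
pre_c  <- c(25, 70)   # control area, pre period
post_c <- c(28, 75)   # control area, post period
weight <- c(10, 4)    # arbitrary harm weights

# WDD point estimate and variance per crime type (no displacement area here)
wdd     <- (post_t - pre_t) - (post_c - pre_c)
wdd_var <- pre_t + post_t + pre_c + post_c     # Poisson variance is just the counts

# harm weighted estimates, using Var(w*x) = w^2 * Var(x)
harm_est <- weight * wdd
harm_var <- weight^2 * wdd_var

# pool across crime types and do the usual normal approximation stats
tot_est <- sum(harm_est)
tot_se  <- sqrt(sum(harm_var))
z <- tot_est / tot_se
c(Estimate = tot_est, SE = tot_se, Z = z,
  Lower95 = tot_est - 1.96*tot_se, Upper95 = tot_est + 1.96*tot_se)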

I think this is actually a really good idea to pool results together. Many place based police interventions are general, in that you might expect them to reduce multiple crime types. Various harm scores are a good way to pool the results, instead of doing many individual tests. A few caveats though: I have not done simulations like I did in the WDD peer reviewed paper, but I believe these normal approximations will do OK under the same circumstances in which we suggest it is reasonable to do the WDD test. You should not do the WDD test if you only have a handful of crimes in each area (under 5 in any cell in that original table is a good signal it is too few crimes).

These crime count recommendations I think are likely to work as well for weighted crime harm. So even if you give murder a really high weight, if you have fewer than 5 murders in any of those original cells, I do not think you should incorporate it into the analysis. The large harm weight and the small numbers do not cancel each other out! (They just make the normal approximation I use likely not very good.) In that case I would say only incorporate individual crimes that you are OK with doing the WDD analysis to begin with on their own, and then pool those results together.

Sometime I need to grab the results of the hot spots meta-analysis by Braga and company and redo the analysis using this WDD estimate. I think the recent paper by Braga and Weisburd (2020) is right that modeling the IRR directly makes more sense (I use the IRR to do cost-benefit analysis estimates, not Cohen’s D). But even that is one step removed. Say you have two incident-rate-ratios (IRRs), 0.8 and 0.5; the latter is the bigger effect, right? Well, if the 0.8 study had a baseline of 100 crimes, that means the reduction is 100 - 0.8*100 = 20, but if the 0.5 study had a baseline of 30 crimes, that would mean a reduction of 30 - 0.5*30 = 15, so in terms of total crimes it is a smaller effect. The WDD test intentionally focuses on crime counts, so it is an estimate of the actual number of crimes reduced. Then you can weight those actual crime decreases how you want to. I think worrying about the IRR could even be one step too far removed.


Open Source Criminology Related Network Datasets

So I am a big proponent of open source data analysis. There is a problem with using criminal justice data sources though – they often have private information that prevents us from sharing the data. For example, I have posted quite a few of my projects here (mostly spatial data analysis), but there are a few I cannot share. In particular, I worked on a paper with chronic offender predictions, and I cannot share that data (Wheeler et al., 2019). The outcome, being a victim or perpetrator of gun violence, is so rare that it by itself basically makes it impossible to publicly share the data without exposing the individuals under study.

One good resource all criminologists should be aware of is ICPSR, in particular NACJD. Many datasets on there these days though are restricted, in that you need to get IRB permission and ICPSR permission to download the dataset to use it. (Which typically takes like 2~3 months in my experience doing it a few times, including both your local Uni IRB and the ICPSR process.) For example, here is one I went through the motions to get in order to (in the end) validate different survival prediction methods.

ICPSR is a great resource for handling the sharing of potentially sensitive data. But this falls short in two areas. One is in teaching – you cannot go through the IRB ritual in a timely enough fashion to be able to use those datasets in a course environment. The other is in terms of methods; for example, if you want to say your model provides better predictions than some other model, they should be established on the same datasets. The current state of affairs in criminology in this regard is pretty bad to be curt – most everybody uses their own data they have access to. So for much of the research on different risk assessment instruments for bail/probation/parole, it is pretty much impossible to say one is better than another.

One example type of data source that is almost entirely missing from NACJD (that I am aware of) is social network datasets relevant for criminology/criminal justice. So I have started a spreadsheet to collate different open source network datasets relevant for criminologists. It has some from my work and a few other random examples I have come across on the internet.

SPREADSHEET OF NETWORK DATASETS

I have made that spreadsheet open, so anyone should be able to edit in more sources. (Feel free to include links to ICPSR as well, but if you do, edit in a note to say whether it is restricted access or not.) Here I would be especially interested in really large networks; for example, I would love to try to replicate Marie’s work on gang network transitions (Ouellet et al., 2019a).

And also while I am here, Jacob Young has created a very nice introductory course to social network analysis. I have a brief lecture in my advanced research design class, but Jacob’s is much more thorough (and he is more of an expert in this area than I am for sure).

I will add to that spreadsheet over time as well. I have made a separate sheet for survival analysis datasets. I would be particularly keen on criminal justice examples. So for network analysis we have examples looking at use-of-force networks (Ouellet et al., 2019b), and for survival analysis I would be interested in a time-to-solve example dataset. Unfortunately for solved cases, NIBRS is a good resource but it has a large confound in that it does not measure whether a case was ever assigned to a detective.

Feel free to add whatever in that spreadsheet, but what I was thinking was oriented towards different methods (again a main motivation is teaching). So for example if you knew of datasets for age-period-cohort modelling, or for estimating group-based-trajectory models, I think those would be good examples to start new sheets and collate different data sources.

References

  • Ouellet, M., Bouchard, M., & Charette, Y. (2019a). One gang dies, another gains? The network dynamics of criminal group persistence. Criminology, 57(1), 5-33.
  • Ouellet, M., Hashimi, S., Gravel, J., & Papachristos, A. V. (2019b). Network exposure and excessive use of force: Investigating the social transmission of police misconduct. Criminology & Public Policy, 18(3), 675-704.
  • Wheeler, A. P., Worden, R. E., & Silver, J. R. (2019). The accuracy of the violent offender identification directive tool to predict future gun violence. Criminal Justice and Behavior, 46(5), 770-788.

CrimRxiv, Alt-Journal Contributions, and Mike Maltz’s Retrospective

As I’m sure followers of mine know, I am a big proponent of posting pre-prints. Scott Jacques has spearheaded a specifically criminology focused pre-print server titled CrimRxiv. It is still in beta but anyone can contribute a paper if they want.

One of the things me and Scott have been jamming about is how to leverage CrimRxiv to make a journal that not only takes advantage of all the goodies on the internet, such as being able to embed interactive graphics or other rich media directly in journal articles, but also really widens the scope of what ‘counts’ in terms of scholarly contribution. Why can’t things like a cool app, or a really good video lecture you edited, or a blog post illustrating code be put on the same level as journal articles?

Part of the reason I am writing this blog post is that I saw Michael Maltz recently publish a retrospective on his career on Academia.edu. This isn’t a typical journal article, but despite that there is no reason why you shouldn’t share such pieces. So I was able to convince Mike to post A Retrospective Look at My Professional Life to CrimRxiv. When he first posted it on Academia.edu, here was my response on how Mike (despite our paths never having crossed) has influenced my career.


Hi Michael and thank you for sharing,

I’ve followed your work since a grad student at Albany. I initially got hooked on data viz based on Tufte’s book. When I looked for examples of criminologists discussing data viz you were the only one I found. That was sometime around 2010, so you had that chapter in the handbook of quantitative crim. You also had another article about drawing glyphs to illustrate life course transitions I was familiar with.

When I finished my classes at SUNY, I then worked at Troy as a crime analyst while finishing my dissertation. I doubt any of the coffee shops were the same from your time, but I did like walking over to Famous hotdogs for lunch every now and then.

Most of my work at the PD was making time series graphs and maps. No regression, so most of my stats training was not particularly useful. Even my mapping course I took focused on areal data analysis was not terribly relevant.

I tried to do similar projects to your glyph life-courses with interval censored crime data, but I was never really successful with that, they always ended up being too complicated with even moderately large crime datasets, see https://andrewpwheeler.com/2013/02/28/interval-graph-for-viz-temporal-overlap-in-crime-events/ and https://andrewpwheeler.com/2014/10/02/stacking-intervals/ for my attempts.

What was much more helpful was simply doing monitoring metrics over time, simple running means, and then I just inverted the PDF of the Poisson to give error bars, e.g. https://andrewpwheeler.com/2016/06/23/weekly-and-monthly-graphs-for-monitoring-crime-patterns-spss/. Then cases that were outside the error bands signified an anomalous pattern. In Troy there was an arrest of a single prolific person breaking into cars, and the trend went from a creeping 10 year high to a 10 year low instantly in those graphs.

So there again we have your work on the Poisson distribution and operations research in that JQC article. Also sometime in there I saw a comment you made on Andrew Gelman’s blog pointing to your work with error bands for BJS. Took that ‘fan chart’ idea later on and provided error bands for city level and USA level homicide trends, e.g. https://apwheele.github.io/MathPosts/FanChart_NewOrleans.html. Most of popular discussion of large scale crime trends is misguided over-interpreting short term noise in my opinion.

So all my degrees are in criminal justice, but I have been focusing more on linear programming over time borrowing from operations researchers as well, https://andrewpwheeler.com/2020/05/29/an-intro-to-linear-programming-for-criminologists/. I’ve found that taking outputs from a predictive model and then applying a decision analysis to specifically articulate strategies CJ agencies should take is much more fruitful than the typical way academic research is done.

Thank you again for sharing your story and best, Andy Wheeler

New paper out: Trauma Center Drive Time Distances and Fatal Outcomes among Gunshot Wound Victims

A recent paper with Gio Circo, Trauma Center Drive Time Distances and Fatal Outcomes among Gunshot Wound Victims, was published in Applied Spatial Analysis and Policy. In this work, me and Gio estimate the marginal effect that drive time distances to the nearest Level 1 trauma center have on the probability a victim dies of a gun shot wound, using open Philadelphia data.

If you do not have access to that published version, here is a pre-print version. (And you can always email me or Gio and ask for a copy.) Also because we use open data, we have posted the data and code used for the analysis. (Gio did most of the work!)

For a bit of the background on the project, Gio had another paper estimating a similar model using Detroit data. But Gio estimated those models with aggregate data. I was familiar with more detailed Philly shooting data, as I used it for an example hot spot cluster map in my GIS crime mapping class.

There are two benefits to leveraging micro data instead of the aggregated data. One is that you can incorporate micro level incident characteristics into the model. The other is that you can get the exact XY coordinates where the incident occurred. And using those exact coordinates we calculate drive time distances to the hospital, which offer a slight benefit in terms of leave-one-out cross-validated accuracy compared to Euclidean distances.

So in terms of incident level characteristics, the biggest factor in determining your probability of death is not the distance to the nearest hospital, but where you physically get shot on your body. Here is a marginal effect plot from our models, showing how the joint effect of injury location (as different colors) and the drive time distance impact the probability of death. So if you get shot in the head vs the torso, you have around a 30% jump in the probability of death from that gun shot wound. Or if you get shot in an extremity you have a very low probability of death as well.

But you can see from that plot that the margins for drive times are not negligible. So if you are nearby a hospital and shot in the torso, your probability of dying is around 20%, whereas if you are 30 minutes away your probability rises to around 30%. You can then use this to map out isochrone type survivability estimates over the city. This example map shows, for a torso shooting, the probability of death based on the drive time distance to the nearest Level 1 trauma location.

Fortunately many shootings do not occur in the northernmost parts of Philadelphia; here is a map of the number of shootings over the city for our sample.

You can subsequently use these models to run hypotheticals, either taking a trauma center away or adding one. So given the density of shootings and drive time distances, it might make sense for Philly to invest in a trauma center in the shooting hot spot in the Kensington area (northeast of Temple). (You could technically figure out an ‘optimal’ location given the distribution of shootings, but since you can’t just plop down a hospital wherever, it would make more sense to model hypothetical investments in current hospitals.)

For a simplified example, imagine you had 100 shootings in the torso that were an average of 20 minutes away. The average probability of death in that case is around 25% (so ~25 homicides). If you hypothetically have a location that is only 5 minutes away, the probability goes down to more like 20% (so ~20 homicides). So in that hypothetical, the reduced drive time would have prevented 5 deaths.
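That back-of-the-envelope arithmetic in code form, with the probabilities being rough reads off the marginal effect plot for torso wounds:

# Hypothetical: 100 torso shootings, far vs near trauma center
shootings <- 100
p_far  <- 0.25   # ~20 minute drive time
p_near <- 0.20   # ~5 minute drive time
shootings*p_far - shootings*p_near   # ~5 deaths prevented in this hypothetical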

One future piece of research I would be interested in examining is pre-post ShotSpotter. So in that article Jen Doleac is right that the empirical evidence for ShotSpotter reducing shootings is pretty flimsy, but preventing mortality by getting to the scene faster may be one mechanism by which ShotSpotter can justify its cost.