Overview of DataViz books

The other day on LinkedIn, Keith McCormick made a post/poll on his favorite data viz books. (I know Keith because I contributed a chapter on geospatial data analysis in SPSS to Keith and Jesus Salcedo’s book, SPSS Statistics for Data Analysis and Visualization, and Jon Peck contributed a chapter as well.)

One thing about this topical area is that there isn’t a standard Data Viz 101 curriculum. By contrast, if you pick up Statistics 101 books, they will cover pretty much all the same material (normal distribution, central limit theorem, t-tests, regression). It isn’t 100% overlap (some may spend more time on elementary probability, and others may cover ANOVA), but for someone learning the material there isn’t much point in reading multiple introductory stats books.

This is not so with the Data Viz books in Keith’s picture – they are very different in content. As I have read quite a few different books on the topic over the years I figured I would give my breakdown of the various books.

Alberto Cairo’s The Functional Art

While my list is not in rank order, I am putting Cairo’s book first for a reason. Although there is not a Data Viz 101 curriculum, this book is the closest thing to it. Cairo goes through in short order various cognitive aspects of how we view the world that are fundamental to building good data visualizations. This includes things like the fact that it is easier to compare lengths along a common axis, and that we can perceive rank order in color saturation, but not in a color’s hue.

It is also enjoyable to read because of all the great journalistic examples. I did not care so much for the interviews at the back, and I don’t like the cover. But if I did a data viz course for undergrads in social sciences (Cairo developed this for journalism students), I would likely assign this book. Despite being very accessible, it covers a broad spectrum of both simple graphs and complicated scientific diagrams.

Many of the authors in this review have other books. I haven’t read Cairo’s The Truthful Art, so I cannot comment on it.

Edward Tufte’s The Visual Display of Quantitative Information

Tufte’s book was the first data viz book I bought in grad school. I initially invested in it as he had a chapter on a critique of PowerPoint presentations, which is very straightforward and provides practical advice on what not to do. Most of the critiques of this book are that it is mostly just a collection of Tufte’s opinions about creating minimalist, dense, scientific graphs. So while Cairo dives into the science of perception, Tufte is just riffing on his opinions. His opinions are based on his experience though, and they are good!

I believe I have read all of Tufte’s other books as well, but this is the only one that made much of an impression on me (some of his others go beyond graphs, and talk about UI design). I gobbled it up in only two days when I first started reading it, and so in a stuck-on-an-island-with-one-book scenario I would choose this one over the others I list here (although again I think Cairo’s book is the best to start with for most folks). So for scientists I think it is a good investment and an enjoyable read overall.

Nathan Yau’s Visualize This

Of all the books I review, Yau’s is the only one on how to actually make graphs in software. Unfortunately, much of Yau’s programmatic advice was outdated already when it was published (e.g. Flash was already going by the wayside). So while he has many great examples of creating complicated and beautiful data visualizations, the process he outlines to make them is overly complicated IMO (such as using Python to edit parts of a pre-made SVG map). It is a good book for examples no doubt, and maybe you can pick up a few tricks in terms of post editing charts in a vector graphics program, such as Illustrator or Inkscape (most examples are making graphs in base R and then exporting to edit finishing touches).

Writing a how-to book is really hard. Yau I am sure has updates on his Flowing Data website on how to make charts (and maybe his newer book is better). But I don’t think I would recommend investing in this book for anything beyond looking at pretty examples of data viz.

Stephen Kosslyn’s Graph Design for the Eye and Mind

The prior books all contained complicated, dense, scientific graphs. Kosslyn’s book is specifically oriented to making corporate slide decks/PowerPoints, in which the audience is not academic. But his advice is mostly based on his understanding of the psychology (he relegates pointers to the scientific lit to extensive endnotes, to avoid cluttering up the basic book). He has a few gems of advice I admit, such as that it isn’t the number of lines in a graph that makes it complicated, but really the number of unique profiles. But then he has some pieces I find bizarre, such as saying pie charts are OK because they are so popular (so have survived a Darwinian survival process in terms of being presented to business people).

I would stick with Tufte’s powerpoint advice (and later will mention a few other books related to giving presentations), as opposed to recommending this book.

Alan MacEachren’s How Maps Work: Representation, Visualization, and Design

MacEachren’s book is encyclopedic in its coverage of the scientific literature on design aspects of cartography, as well as the psychological literature. So it is like reading an encyclopedia (not 100% sure if I ever finished it front to back to be honest). I would start here if you are interested in designing cognitive experiments to test certain graphs/maps. I think MacEachren drawing on both cartography and psychology ends up being a better place to start than say Colin Ware’s Information Visualization (but it is close). They are both very academically oriented though.

Leland Wilkinson’s The Grammar of Graphics

I had used SPSS for a long time when I read this book, so I was already quite familiar with the grammar of graphics in terms of creating graphs in SPSS. That pre-knowledge helped me digest Wilkinson’s material I believe. Nick Cox has a review of this book, in which he notes that the audience for the book is hard to pin down. I agree, in that you need to be pretty far along already in terms of making graphs to be able to really understand the material, and as such it is not clear what the benefit is. Even for power users of SPSS, many of the things Wilkinson talks about are not implemented in SPSS’s GGRAPH language, so they are mostly just on paper.

(Note Nick has a ton of great reviews on Amazon as well for various data viz books. He is a good place to start to decide if you want to purchase a book. For example the worst copy-edited book I have ever seen is Andy Kirk’s via Packt publishing, and Nick notes how poorly it is copy-edited in his review.)

Here is an analogy I think is apt for Wilkinson’s book – if we are talking about cars, you may have a book on the engineering of the car, and another on how to actually drive the car. Knowing how pistons work in a combustion engine does not help you drive a car, but helps you build one. Wilkinson’s book is more about the engineering of a graph from an algebraic perspective. At the fringes it helps in thinking about the components of graphs, but doesn’t really give any advice about what graph to make in-and-of itself, nor what is a good graph or a bad graph.

Note that the R library ggplot2 is actually quite a bit different than Leland’s vision. It is simpler, in that Wickham essentially drops the graph algebra part, so you specify the axes directly, whereas in Wilkinson’s you just say X*Y*Z, and depending on other aspects of the grammar this may produce a 3d scatterplot, a facet gridded scatterplot, a clustered bar chart, etc. I think Wickham was right to make that design choice, but in doing so it really isn’t an implementation of what Wilkinson was talking about in this book.

Jacques Bertin’s Semiology of Graphics: Diagrams, Networks, Maps

Bertin’s book is an attempt to make a dictionary of terms for different aspects of graphs. So it is a bit in the weeds. One unique aspect of Bertin is that he discusses titles and labels for graphs, although I wouldn’t go as far as saying that his discussion leads to straightforward advice. I find Wilkinson’s grammar of graphics a more useful way to overall think about the components of a graph, although Bertin is more encyclopedic in coverage of different types of graphs and maps in the wild.

Short notes on various other books

Most of these books (with the exception of Nathan Yau’s) are not how-tos on actually writing code to make graphs. For those that use R, there are two good options though. Hadley Wickham’s ggplot2: Elegant Graphics for Data Analysis (Use R!) was really good at the time (I am not sure if the newer version is more up to date though; like any software it changes over time, so the older one I know is out of date for many different code examples). And though I’ve only skimmed it, Kieran Healy’s Data Visualization: A practical introduction is free and online and looks good (and also for those interested in criminal justice examples Jacob Kaplan has examples in R as well, Crime by the Numbers). So those latter two I know are good in terms of being up to date.

For Python I just suggest using Google (Jake VanderPlas has a book that looks good, and his website is really good). For Excel I really like Jorge Camões’s work (his book is Data at Work, which I don’t think I’ve read, but I have followed his website for a long time).

In terms of scientific presentations (which covers both graphs and text), I’ve highly suggested in the past Jean-luc Doumont’s Trees, maps, and theorems. This is similar in spirit to Tufte’s minimalist style, but gives practical advice on slides, writing, and presentations. Jon Schwabish’s book, Better Presentations: A Guide for Scholars, Researchers, and Wonks, is very good as well in terms of direct advice. I think for folks in academia I would say go for Doumont’s book, and for those in a corporate environment go for Schwabish’s.

Stephen Few’s books deserve a mention here as well, such as Show me the numbers. Stephen is the only one to do a deep dive into the concept of dashboards. Stephen’s advice is very straightforward and more oriented towards a corporate type environment, not so much a scientific one (although it isn’t bad advice for scientists, ditto for Schwabish, just stating more so for an understanding of the intended audience).

I could go on forever, Tukey’s EDA, Calvin Schmid’s book on how to draw graphs with actual splines! How to lie with statistics and how to lie with maps. So many to choose from. But I think if you are starting out in a data oriented role in which you need to make graphs, I would suggest starting with Cairo’s book, then get Tufte to really get some artistic motivation and a good review of bad powerpoint practices. The rest are more advanced material for study though.

From a criminologist, we should restore voting rights

I have donated to the Southern Poverty Law Center in the past (recently my workplace, HMS, matched contributions). I no doubt do not 100% agree with their positions on every little detail (as is probably true for every organization in the criminal justice sphere), but I believe they do good work. In particular I’ve always thought that their identifying hate groups is a valuable public service, see the SPLC’s Hate Map.

They do more work than just the hate group map though. Recently they have been sending information on voter disenfranchisement. It is not uniform across states, but in many places if you have a felony conviction you have your rights to vote stripped entirely. It is even more severe in some places, in that you cannot vote if you simply owe fines or fees to the state.

I figured this would be a good blog post, as I have always had a more extreme view on this than most people. While most argue simply that individuals’ voting rights should be restored after an individual’s imprisonment has ended, I don’t believe they should ever be stripped to begin with. Or more specifically, I believe people who are even currently incarcerated should be allowed to vote.

The reasons I have this opinion are relatively simple. First, there is no evidence that voter disenfranchisement acts as a deterrent to prevent someone from committing a crime. No one thinks, hey, I shouldn’t commit this robbery because I need to cast my ballot this fall. Restoring voting rights, even to those imprisoned, poses no threat to public safety.

The second reason I support restoring voting rights is because an important part of offender reintegration into society is to participate in civil matters. We don’t lock people up and throw away the keys, so we should take steps to help those former offenders come back and have a positive contribution to our society. What simpler way than to allow those individuals to engage in the voting process? (The foremost authority on this subject is Vesla Weaver.)

You may ask how would voting in prison work? A vote cast in prison should count not where the jail is located, but at the last address of the offender before they were incarcerated. This brings up another issue, that certain state census counting procedures count individuals incarcerated at the location of the prison. This results in gerrymandering, where typically rural areas with prisons get more electoral representation, even though for the most part those individuals have no voting rights.

I believe we would be better off as a nation if not only was everyone allowed to vote, but everyone did vote.

Open Source Criminology Related Network Datasets

So I am a big proponent of open source data analysis. There is a problem with using criminal justice data sources though – they often have private information that prevents us from sharing the data. For example, I have posted quite a few of my projects here (mostly spatial data analysis), but there are a few I cannot share. One is a paper I worked on with chronic offender predictions, for which I cannot share the data (Wheeler et al., 2019). The outcome, being a victim or perpetrator of gun violence, is so rare that this by itself basically makes it impossible to publicly share the data without exposing the individuals under study.

One good resource all criminologists should be aware of is ICPSR, in particular NACJD. Many datasets on there are now restricted though, in that you need to get IRB permission and ICPSR permission to download the dataset. (This typically takes 2~3 months in my experience doing it a few times, which includes both your local Uni IRB and the ICPSR process.) For example here is one I went through the motions to get to (in the end) validate different survival prediction methods.

ICPSR is a great resource to be able to handle sharing potentially sensitive data. But this falls short in two areas. One is in teaching – you cannot go through the IRB ritual in a timely enough fashion to be able to use those datasets in a course environment. The other is in terms of methods, so for example if you wanted to say your model provides better predictions than some other model, that should be established on the same datasets. The current state of affairs in criminology in this regard is pretty bad to be curt – most everybody uses their own data they have access to. So for much of the research on different risk assessment instruments for bail/probation/parole, it is pretty much impossible to say one is better than another.

One example type of data source that is almost entirely missing from NACJD (that I am aware of) is social network datasets relevant for criminology/criminal justice. So I have started a spreadsheet to collate different open source network datasets relevant for criminologists. So I have some from my work and a few other random examples I have come across on the internet.

SPREADSHEET OF NETWORK DATASETS

I have made that spreadsheet open, so anyone should be able to edit in more sources. (Feel free to include links to ICPSR as well, but if you do, add a note to say whether it is restricted access or not.) Here I would be interested in really large networks, for example I would love to try to replicate Marie’s work on gang network transitions (Ouellet et al., 2019a).

And also while I am here, Jacob Young has created a very nice introductory course to social network analysis. I have a brief lecture in my advanced research design class, but Jacob’s is much more thorough (and he is more of an expert in this area than I am for sure).

I will add to that spreadsheet over time as well. I have made a separate sheet for survival analysis datasets. I would be particularly keen on criminal justice examples. So for network analysis we have examples of looking at use-of-force networks (Ouellet et al., 2019b), and for survival analysis I would be interested in a time to solve example dataset. Unfortunately for the solved cases, NIBRS is a good resource but has a large confound in that it doesn’t measure whether a case was ever assigned to a detective.

Feel free to add whatever in that spreadsheet, but what I was thinking was oriented towards different methods (again as a main motivation is for teaching). So for example if you knew of datasets for age-period-cohort modelling, or for estimating group-based-trajectory models, I think those would be good examples to start new sheets and collate different data sources.

References

  • Ouellet, M., Bouchard, M., & Charette, Y. (2019a). One gang dies, another gains? The network dynamics of criminal group persistence. Criminology, 57(1), 5-33.
  • Ouellet, M., Hashimi, S., Gravel, J., & Papachristos, A. V. (2019b). Network exposure and excessive use of force: Investigating the social transmission of police misconduct. Criminology & Public Policy, 18(3), 675-704.
  • Wheeler, A. P., Worden, R. E., & Silver, J. R. (2019). The accuracy of the violent offender identification directive tool to predict future gun violence. Criminal Justice and Behavior, 46(5), 770-788.

CrimRxiv, Alt-Journal Contributions, and Mike Maltz’s Retrospective

As I’m sure followers of mine know, I am a big proponent of posting pre-prints. Scott Jacques has spearheaded a criminology-focused pre-print server titled CrimRxiv. It is still in beta, but anyone can contribute a paper if they want.

One of the things Scott and I have been jamming about is how to leverage CrimRxiv to make a journal that not only takes advantage of all the goodies on the internet, such as being able to embed interactive graphics or other rich media directly in journal articles, but also really widens the scope of what ‘counts’ in terms of scholarly contribution. Why can’t things like a cool app, or a really good video lecture you edited, or a blog post illustrating code be put on the same level with journal articles?

Part of the reason I am writing this blog post is that I saw Michael Maltz recently publish a retrospective on his career on Academia.edu. This isn’t a typical journal article, but despite that there is no reason why you shouldn’t share such pieces. So I was able to convince Mike to post A Retrospective Look at My Professional Life to CrimRxiv. When he first posted it on Academia.edu, here was my response on how Mike (despite our never having crossed paths) has influenced my career.


Hi Michael and thank you for sharing,

I’ve followed your work since a grad student at Albany. I initially got hooked on data viz based on Tufte’s book. When I looked for examples of criminologists discussing data viz you were the only one I found. That was sometime around 2010, so you had that chapter in the handbook of quantitative crim. You also had another article about drawing glyphs to illustrate life course transitions I was familiar with.

When I finished my classes at SUNY, I then worked at Troy as a crime analyst while finishing my dissertation. I doubt any of the coffee shops were the same from your time, but I did like walking over to Famous hotdogs for lunch every now and then.

Most of my work at the PD was making time series graphs and maps. No regression, so most of my stats training was not particularly useful. Even my mapping course I took focused on areal data analysis was not terribly relevant.

I tried to do similar projects to your glyph life-courses with interval censored crime data, but I was never really successful with that, they always ended up being too complicated with even moderately large crime datasets, see https://andrewpwheeler.com/2013/02/28/interval-graph-for-viz-temporal-overlap-in-crime-events/ and https://andrewpwheeler.com/2014/10/02/stacking-intervals/ for my attempts.

What was much more helpful was simply doing monitoring metrics over time, simple running means, and then I just inverted the PDF of the Poisson to give error bars, e.g. https://andrewpwheeler.com/2016/06/23/weekly-and-monthly-graphs-for-monitoring-crime-patterns-spss/. Then cases that were outside the error bands signified an anomalous pattern. In Troy there was an arrest of a single prolific person breaking into cars, and the trend went from a creeping 10 year high to a 10 year low instantly in those graphs.

So there again we have your work on the Poisson distribution and operations research in that JQC article. Also sometime in there I saw a comment you made on Andrew Gelman’s blog pointing to your work with error bands for BJS. Took that ‘fan chart’ idea later on and provided error bands for city level and USA level homicide trends, e.g. https://apwheele.github.io/MathPosts/FanChart_NewOrleans.html. Most of popular discussion of large scale crime trends is misguided over-interpreting short term noise in my opinion.

So all my degrees are in criminal justice, but I have been focusing more on linear programming over time borrowing from operations researchers as well, https://andrewpwheeler.com/2020/05/29/an-intro-to-linear-programming-for-criminologists/. I’ve found that taking outputs from a predictive model and then applying a decision analysis to specifically articulate strategies CJ agencies should take is much more fruitful than the typical way academic research is done.

Thank you again for sharing your story and best, Andy Wheeler
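As an aside on the monitoring graphs mentioned in the letter, here is a minimal sketch in Python of the running mean plus Poisson error band idea. It is not the SPSS code from the linked post. The function names, the 8 period window, the 0.005/0.995 tail probabilities, and the weekly counts are all assumptions made up for illustration. Strictly speaking it is the Poisson cumulative distribution that gets inverted here (the quantile function).

```python
import math


def poisson_quantile(p, mean):
    """Smallest count k with P(X <= k) >= p for X ~ Poisson(mean)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0.0 < mean <= 700.0:
        # exp(-mean) underflows for very large means in this simple version
        raise ValueError("mean must be in (0, 700]")
    k = 0
    pmf = math.exp(-mean)  # P(X = 0)
    cdf = pmf
    while cdf < p and pmf > 0.0:
        k += 1
        pmf *= mean / k    # P(X = k) computed from P(X = k - 1)
        cdf += pmf
    return k


def flag_anomalies(counts, window=8, lower_p=0.005, upper_p=0.995):
    """Return (index, count, low, high) for counts outside the Poisson band.

    The band for each period uses the mean of the prior `window` periods
    (hypothetical defaults, not taken from the original analysis).
    """
    flagged = []
    for i in range(window, len(counts)):
        prior_mean = sum(counts[i - window:i]) / window
        if prior_mean <= 0:
            continue  # no band can be formed from an all-zero history
        low = poisson_quantile(lower_p, prior_mean)
        high = poisson_quantile(upper_p, prior_mean)
        if counts[i] < low or counts[i] > high:
            flagged.append((i, counts[i], low, high))
    return flagged


weekly_counts = [10] * 8 + [10, 25, 1]  # made-up counts for illustration
for idx, count, low, high in flag_anomalies(weekly_counts):
    print(f"period {idx}: count {count} outside [{low}, {high}]")
```

Periods flagged this way are just candidates for a closer look. With real data you would likely want a longer history and to think about seasonality before trusting the band.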

New paper out: Trauma Center Drive Time Distances and Fatal Outcomes among Gunshot Wound Victims

A recent paper with Gio Circo, Trauma Center Drive Time Distances and Fatal Outcomes among Gunshot Wound Victims, was published in Applied Spatial Analysis and Policy. In this work, Gio and I estimate the marginal effect that drive time distances to the nearest Level 1 trauma center have on the probability a victim dies of a gunshot wound, using open Philadelphia data.

If you do not have access to that published version, here is a pre-print version. (And you can always email me or Gio and ask for a copy.) Also because we use open data, we have posted the data and code used for the analysis. (Gio did most of the work!)

For a bit of the background on the project, Gio had another paper estimating a similar model using Detroit data. But Gio estimated those models with aggregate data. I was familiar with more detailed Philly shooting data, as I used it for an example hot spot cluster map in my GIS crime mapping class.

There are two benefits to leveraging micro data instead of the aggregated data. One is that you can incorporate micro level incident characteristics into the model. The other is that you can get the exact XY coordinates where the incident occurred. And using those exact coordinates we calculate drive time distances to the hospital, which offer a slight benefit in terms of leave-one-out cross-validated accuracy compared to Euclidean distances.

So in terms of incident level characteristics, the biggest factor in determining your probability of death is not the distance to the nearest hospital, but where on your body you get shot. Here is a marginal effect plot from our models, showing how injury location (as different colors) and drive time distance jointly impact the probability of death. So if you get shot in the head vs the torso, you have around a 30% jump in the probability of death from that gunshot wound. And if you get shot in an extremity you have a very low probability of death.

But you can see from that plot that the margins for drive times are not negligible. So if you are near a hospital and shot in the torso your probability of dying is around 20%, whereas if you are 30 minutes away your probability rises to around 30%. You can then use this to map out isochrone type survivability estimates over the city. This example map shows the probability of death if you get shot in the torso, based on the drive time distance to the nearest Level 1 trauma location.

Fortunately not many shootings occur in the northernmost parts of Philadelphia; here is a map of the number of shootings over the city for our sample.

You can subsequently use these models to run hypotheticals in which you either take a trauma center away or add one. So given the density of shootings and drive time distances, it might make sense for Philly to invest in a trauma center in the shooting hot spot in the Kensington area (northeast of Temple). (You could technically figure out an ‘optimal’ location given the distribution of shootings, but since you can’t just plop down a hospital wherever, it would make more sense to do hypothetical investments in current hospitals.)

For a simplified example, imagine you had 100 shootings in the torso that were an average 20 minutes away. The average probability of death in that case is around 25% (so ~25 homicides). If you hypothetically have a location that is only 5 minutes away, the probability goes down to more like 20% (so ~20 homicides). So in that hypothetical, the distance margin would have prevented 5 deaths.
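That back of the envelope arithmetic can be written out as a short Python sketch. The probabilities are the rounded figures from the paragraph above, not outputs from the fitted models, and the function and variable names are made up for illustration.

```python
def expected_deaths(n_shootings, p_death):
    """Expected number of deaths for a count of shootings and a death probability."""
    return n_shootings * p_death


n_torso_shootings = 100
p_death_20_min = 0.25  # rough probability at ~20 minutes, as stated in the text
p_death_5_min = 0.20   # rough probability at ~5 minutes, as stated in the text

deaths_current = expected_deaths(n_torso_shootings, p_death_20_min)
deaths_hypothetical = expected_deaths(n_torso_shootings, p_death_5_min)
deaths_prevented = deaths_current - deaths_hypothetical

print(f"current: {deaths_current:.0f}, "
      f"hypothetical: {deaths_hypothetical:.0f}, "
      f"prevented: {deaths_prevented:.0f}")
```

With the actual models you would swap the two hard-coded probabilities for predicted probabilities at each shooting’s drive time and sum over incidents, rather than using a single average.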

One future piece of research I would be interested in examining is pre-post ShotSpotter. So in that article Jen Doleac is right in that the empirical evidence for ShotSpotter reducing shootings is pretty flimsy, but preventing mortality by getting to the scene faster may be one mechanism by which ShotSpotter can justify its cost.

Some more peer review musings

It is academics’ favorite pastime to complain about peer review. There is no getting around it – peer review is degrading, both in the adjective sense of demoralizing, as well as in the verb ‘wear down a rock’ sense.

There is nothing quite like it in other professional spheres that I can tell. For example if I receive a code review at work, it is not adversarial in nature like peer review is. I think most peer reviewers treat the process like a prosecutor, poking at all the minutiae that they can uncover, as opposed to being judges of the truth.

At work I also have a pretty unobjectionable criterion for success – if my machine learning model is making money then it is good. Not so for academic papers. Despite everyone learning about the ‘scientific method’, academics don’t really have a code to follow like that. And that lack of objective criteria causes quite a bit of friction in the peer review process.

But all the things I think are wrong with peer review I have a hard time articulating succinctly. So we have a bias problem, in that reviewers have preferences for particular methods or styles. We have a problem that individuals get judged based on highly arbitrary standards of what is interesting. Many critiques are focused on highly pedantic things that make little to no material difference, such as the use of personal pronouns. Style advice can be quite bad to be frank, in my career I’m up to something like four different peer reviews saying providing results in the introduction is inappropriate. Edit: And these complaints are not exhaustive as well, we have reviewers pile on endless complaints in multiple rounds, and people phone it in with nonsense short descriptions as well. (I’m sure I can continue to add to the list, but I’ve personally experienced all of these things, in addition to being called a racist in peer review.)

I’ve previously provided advice about how I think peer reviews should be done. To sum up my advice:

  • Differentiate between big problems and minor stuff
  • Be very specific what things you want changed (and how to change them)

I think I should add two to this as well – don’t be a jerk, and don’t pile on (or be respectful of people’s time). For the jerk part I don’t even meet that standard if I am being honest with myself. In a later round of one of the peer reviews I am going to share, I got pretty snippy (with the Urban folks on their CCTV paper in Milwaukee), and that was over the top in retrospect. (I don’t think reviewers should get a say on revisions, editors should just arbitrate whether the responses by the original authors were sufficient. I am pretty much never happy if I suggest something and an author says no thanks.)

For the pile on part, I recently posted my responses to my cost of crime hot spots paper. Although all three reviewers were positive, it still took me ~40 hours to respond to all of the critiques. Even though all of the reviews were in good faith, it honestly was not worth my time to make those revisions. (I think two were legitimate, changing the front end to say my hot spots are crime cost, not crime harm, and the ask for more details on the Hunt estimates. The rest were just fluff though.)

If you do consulting think about your rate – and whether addressing all those minor critiques meets the threshold of ‘is the paper improved by the amount to justify my consulting fee’. In my experience it does not come close, and I am quite a cheap consultant.

So I have shared in the past a few examples of my response to reviewers (besides above, see here for the responses to my how to select participants for focussed deterrence paper, and here for my tables and graphs for crime analysis paper). But maybe instead of bagging on others, I should just try to lead by example.

So below are several of my recent reviews. I’ve only pulled out the recent ones where I know the paper has been published. This is then subject to a selection bias: papers I have more negative things to say about are less likely to be published in the end. So keep that bias in mind when judging these.

The major way I believe I differ from the modal reviewer is that I subdivide between major and minor concerns. Many reviewers take the running commentary through the paper approach, which not only does not distinguish between major/minor, but often raises critiques that are explicitly addressed (just at a different point in the manuscript from where it popped into the reviewer’s head).

The way I do my reviews I actually do the running commentary in the side of the paper, then I sleep on it, then I organize into the major sections. In this process of organizing I drop quite a bit of minor fluff, or things that are cleared up at later points in the paper. (I probably end up deleting ~50% of my original notes.)

Second, for papers I give a thumbs up for I take actual time to articulate why they are good. I don’t just give a compliment sandwich and pile on a series of minor critiques. Positive comments are fleeting in peer review, but necessary to judge a piece.

So here are some of my examples from peer review, take them for what they are worth. I no doubt do not always follow the advice I lay out above, but I try my best to.


Title: Going Local: Do Consent Decrees and Other Forms of Federal Intervention in Municipal Police Departments Reduce Police Killings?

The article is well written and a timely topic. The authors have a quote that I think sums it up quite nicely “This paper is therefore the first to evaluate the effects of civil rights investigations and the consent decree process on an important – arguably the most important – measure of use of force: death.” Well put.

The analysis is also well executed. It is observational, but city fixed effects and the event history study were the main things I was looking for.

I have three major revision requests. But they are just editing the manuscript type stuff, so can be easily accomplished by the authors.

  1. Drop the analysis of Crime Rates

While I think the analysis of officer deaths is well done, the analysis of crime rates I have more doubts about. Front end is not focused on this at all, and would need a whole different section about it discussing recent work in Chicago/Baltimore/NYC. Also I don’t think the same analysis (fixed effects), is sufficient for crime trends – we are basically talking about the crime drop period. Also very important to assess heterogeneity in that analysis – a big part of the discussion is that Chicago/Baltimore are different than NYC.

The analysis of officer deaths is sufficient to stand on its own. I agree the crime rates question is important – save it for another paper where it can be given enough attention to do the topic justice.

  2. Conclusion

Like I said, I was most concerned about the event study, and the authors show that there was some evidence of pre-treatment selection trends. You don’t talk about the event study results in the conclusion though. It is a limitation that the event study analysis had less clear evidence of crime reductions than the simpler pre/post analysis.

This is likely due to larger error bars with the rare outcome, which is itself a limitation of using shootings as the outcome. I think it deserves to be mentioned that even if overall the effects of consent decrees on reducing officer involved deaths are not 100% clear, locally monitoring more frequent outcomes is worthwhile. See the recent work by MacDonald/Braga in NYC. New Orleans is another example, https://journals.sagepub.com/doi/full/10.1177/1098611117709785.

I note the results are important to publish without respect to the results of the analysis. It would be important to publish this work even if it did not show reductions in officer involved deaths.

  3. Front end lit review

For 2.2 racial disparities, this section misses much of the recent work on the topic of officer involved shootings except for Greg Ridgeway’s article. There is a wide array of work that uses other counterfactual approaches to examine implicit bias, see Roland Fryer’s work (https://www.journals.uchicago.edu/doi/abs/10.1086/701423), or the Wheeler/Worrall papers on Dallas shoot/don’t shoot data (https://journals.sagepub.com/doi/full/10.1177/1525107118759900 & https://journals.sagepub.com/doi/full/10.1177/0011128718756038). These are the observational dual to the experimental lab work by Lois James. Also there are separate papers by Justin Nix (https://www.tandfonline.com/doi/full/10.1080/0735648X.2018.1547269), and Joseph Cesario (https://journals.sagepub.com/doi/10.1177/1948550618775108) that show when using different benchmarks estimates of disparity can vary by quite a bit.

For the use of force review (section 2.3), people typically talk about situational factors that influence use of force (in addition to individual/extra-legal and organizational). So you may say “consent decrees don’t/can’t change situational behavior of offenders, so what is the point of writing about it”. ’Tis true, but it still should be articulated. To the extent that situational factors are a large determinant of shootings, it may be that consent decrees are not the right way to reduce officer deaths. But consent decrees may indirectly affect police/citizen interactions (such as via de-escalation or procedural justice training), which could be a mechanism through which fewer officer deaths occur.

Long story short, 2.2 should be updated with more recent work on officer involved shootings and the benchmark problem, and 2.3 should be updated to include discussion of the importance of situational factors in the use of force.

Additional minor stuff:

  • pg 12, killings are not a proxy for use of force (they count as force!)
  • regression equations need some editing. Definitely need a log or exponential function on one of the sides of the equation, and generalized linear models do not have an error term like linear models do. I personally write them as:

log(E[crime_it]) = intercept + B1*X + …..

where E[crime_it] is the expected value of crime at place i and time t (equivalent to lambda in your current formulation).

  • pg 19 monitor misspelled (equation type)
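To make the regression-equation note above concrete, here is a minimal sketch of what the log link implies. The coefficient values and the 0/1 treatment coding are hypothetical, chosen only for illustration and not taken from the paper under review.

```python
import math

def expected_count(intercept: float, b1: float, x: float) -> float:
    """Expected count under a log link: log(E[y]) = intercept + b1*x."""
    return math.exp(intercept + b1 * x)

# Hypothetical coefficients, chosen only for illustration
intercept, b1 = 1.0, -0.2
before = expected_count(intercept, b1, x=0)  # e.g. no intervention
after = expected_count(intercept, b1, x=1)   # e.g. intervention in place

# With a log link, exp(b1) is the multiplicative change in the expected count
rate_ratio = after / before
print(round(rate_ratio, 3), round(math.exp(b1), 3))
```

There is no additive error term anywhere in this formulation, the randomness comes from the count distribution around the expected value.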

Title: Immigration Enforcement, Crime and Demography: Evidence from the Legal Arizona Workers Act

Well done paper, uses current status quo empirical techniques to estimate the effect employment oversight for illegal immigrant workers had on subsequent crime reductions.

Every critique I thought of was systematically addressed in the paper already. Discussed issues with potential demographic spillovers (biasing estimates because controls may have crime increases). Eliminating states from the pool of placebos with stronger E-Verify support, and later on robustness checks for neighboring states. And using some simple analyses to give decompositions that would be expected due to the decrease in the share of young males.

Minor stuff

  • Light and Miller quote on pg 21 not sure if it is space/dash missing or just kerning problems in LaTeX
  • Pg 26, you do the estimates for 08 and 09 separately, but I think you should pool 08-09 together in that table (the graphs do the visual tests for 08 & 09 independently). For example, Violent crimes are negative for both 08 & 09, which in the graphs are not outside the typical bounds for each individual year, but cumulatively that may be quite low (most will have a variable low and then high). This should get you more power I think given the few potential placebo tests. So it should be something like (-0.067 + -0.108) with error bars for violent crimes.
  • I had to stare at your change equation (pg 31) for quite a bit. I get the denominator of equation 2, I’m confused about the numerator (although it is correct). Here is how I did it (using your m1, m2, a, and X).

Pre-Crime = m1*aX + (1 - m1)X = X * (m1*a + 1 - m1)

Post-Crime = m2*aX + (1 - m2)X = X * (m2*a + 1 - m2) #so you can drop the X

% Change = (Post - Pre) / Pre = Post/Pre - 1

At least that is easier for me to wrap my head around. Also should m1 and m2 be the overall share of young adults? Not just limited to immigrants? (Since the estimated crime reduction is for everybody, not just crimes committed by immigrants?)
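The derivation above can be checked numerically with a few lines of code. This is only a sketch: the values for m1, m2, and a below are made up for illustration and are not the paper's estimates.

```python
def pct_change(m1: float, m2: float, a: float) -> float:
    """Relative change in crime when the young-adult share moves from m1 to m2.

    a is the crime rate of young adults relative to everyone else; the
    baseline rate X cancels out of Post/Pre, so it is not needed.
    """
    pre = m1 * a + (1 - m1)
    post = m2 * a + (1 - m2)
    return post / pre - 1

# Hypothetical inputs, for illustration only (not the paper's values)
change = pct_change(m1=0.10, m2=0.08, a=3.0)
print(f"{change:.1%}")
```

If m1 equals m2 the function returns zero, which is a quick sanity check on the algebra.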


Title: How do close-circuit television cameras impact crimes and clearances? An evaluation of the Milwaukee Police Department’s public surveillance system

Well written paper. Uses appropriate quasi-experimental methods (matching and diff-in-diff) to evaluate reductions in crimes and potential increases in case clearances.

I have some minor requests on analysis/descriptive stats, but things that can be easily addressed by the authors.

First, I would request a simpler pre-post DiD table. The panel models not only introduce more complications of interpretation, but they are based on asymptotics that I’m not sure are met here.

So if you do something like:

             Pre Crime   Post Crime   Difference   DiD
Treated            100           80          -20   -30
Control            100          110           10

It is much more straightforward to interpret, and I provide a stat test and power analysis advice in https://crimesciencejournal.biomedcentral.com/articles/10.1186/s40163-018-0085-5. Your violent crime counts are very low, so I think you would need unrealistic effects (of say 50% reductions) to detect an effect with your group sizes.
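As a rough sketch of how the table above turns into a test statistic: if you assume the four counts are independent Poisson (my simplifying assumption for this example, see the linked paper for the full treatment), the variance of the difference-in-differences is approximately the sum of the counts.

```python
import math

def did_counts(t_pre, t_post, c_pre, c_post):
    """Difference-in-differences of counts with a normal-approximation z-score.

    Assumes the four counts are independent Poisson, so the variance of the
    DiD estimate is approximated by the sum of the counts.
    """
    did = (t_post - t_pre) - (c_post - c_pre)
    se = math.sqrt(t_pre + t_post + c_pre + c_post)
    return did, se, did / se

# Counts from the example table
did, se, z = did_counts(t_pre=100, t_post=80, c_pre=100, c_post=110)
print(did, round(se, 2), round(z, 2))
```

With counts of this size the standard error is close to 20, so even a DiD of -30 falls short of the usual 1.96 cutoff, which illustrates the power problem with low counts.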

You can do the same table for % clearances, and do whatever binomial test of proportions (which will have much more power than the regression equation). And again is simpler to interpret.

Technically, a Poisson regression of clearance counts with total crimes as the exposure is not the same as modelling the proportion of crimes cleared. The predicted clearance rate can technically go above 1 (so you can have %’s above 100%). It would be more appropriate to use binomial regression, so something like below in Stata:

glm arrests i.Treat i.Post Treat#Post i.Unit, family(binomial crimes) link(logit)

(No xtbinomial unfortunately, maybe can coerce xtlogit with weights to do it, or use meglm since you are doing random effects. I think you should probably do fixed effects here anyway.) I don’t know if it will make a difference in the results here though.
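A small sketch of why the link function matters for clearances. The linear predictor values here are hypothetical; the point is only that the log link used with a Poisson exposure model has no upper bound, while the logit link used in binomial regression stays between 0 and 1.

```python
import math

def poisson_rate(eta: float) -> float:
    """Log link: implied clearances per crime, exp(eta), unbounded above."""
    return math.exp(eta)

def binomial_prob(eta: float) -> float:
    """Logit link: implied clearance probability, always between 0 and 1."""
    return 1 / (1 + math.exp(-eta))

# Hypothetical linear predictors, for illustration only
for eta in (-1.0, 0.0, 0.5):
    print(eta, round(poisson_rate(eta), 3), round(binomial_prob(eta), 3))
```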

Andy Wheeler


Title: The formation of suspicion: A vignette study

Well done experimental study examining suspiciousness using a vignette approach. There is no power analysis done, but it is a pretty large sample, and only examines first order effects. (With a note about examining a well powered interaction effect described in the analysis section.)

My only big ask is that the analysis should include a dummy variable for the two different sampling frames (e.g. a dummy variable for 1=New York). Everything else is minor/easy stuff.

Minor stuff:

  • How were the vignettes randomized? (They obviously were, balance is really good!)
  • For the discussion, it is important to understand these characteristics that start an interaction because of another KT heuristic bias – anchoring effects. Paul Taylor has some recent work of interest on dispatch priming that is relevant. (Also Dan Mears had a recent overview paper on biases/heuristics in Journal of Criminal Justice I also think should probably be cited.)
  • For Table 1 I would label the “Dependent Variable” with a more descriptive label (suspiciousness)
  • Also people typically code the variables 0/1 instead of 1/2, it actually won’t impact the analysis here, since it is just a linear shift of +1 (just change the intercept term). The variables of Agency Size, Education, & Experience are coded as ordinal variables though, and they should maybe be included as dummy variables for each category. (I don’t think this will make a big difference for the main randomized variables of interest though.)

Title: The criminogenic effect of marijuana dispensaries in Denver, Colorado: A microsynthetic controls quasi-experiment and cost-benefit analysis

This work is excellent. It is a timely topic, as many states are still considering whether to legalize and what that will exactly look like.

The work is a replication/extension of a prior study also in Denver, but this is a stronger research design. Using the micro units allows for matching on pre-trends and drawing synthetic controls, which is a stronger design than prior pre/post work at neighborhood level (both in Denver and in LA). Micro units are also more relevant to test direct criminogenic effects of the stores. The authors may be interested in https://www.tandfonline.com/doi/full/10.1080/0735648X.2019.1582351, which is also a stronger research design using matched comparison groups, but is only for medical dispensaries in DC and is a much smaller sample.

Even if one considers that to be a minor contribution (crime increase findings are similar magnitude to Hughes JQ paper), the cost benefit analysis is a really important contribution. It hits on all of the important considerations – that even if the book of costs/benefits is balanced, they are relevant to really different segments of society. So even if tax revenue offsets the books, places may still not want to take on that extra crime burden.

Only two (very minor) suggestions. One, some of the permutation lines are outside of the figure boundaries. Two, I would like a brief aside in the discussion mentioning the trade-off in economic investment making places busier (which fits right into the current discussion of costs and benefits). Likely if you added 100 ice-cream shops crime might go up due to increased commercial activity – weed has the same potential negative externality – but is not necessarily worse than say opening a bunch of bars or convenience stores. (Same thing is relevant for any BID, https://journals.sagepub.com/doi/full/10.1177/0011128719834559.)


Title: Understanding the Predictors of Street Robbery Hot Spots: A Matched Pairs Analysis and Systematic Social Observation

Note: Reviewed for Justice Quarterly and rejected. Final published version is at Crime & Delinquency (which I was not a reviewer for)

The article uses the case-control method to match high-crime to low-crime street segments, and then examine local land use factors (bars, convenience stores, etc.) as well as the more novel source of physical disorder coded from Google Street View images. The latter is the main motivation for the case-control design, as manual coding prevents one from doing the entire city.

With case-control designs, by their nature you cannot manipulate the number of cases you have at your disposal. Thus the majority of such designs typically focus on ONE exposure of interest, and match on any other characteristic that is known to affect the outcome, but is not of direct interest to the study. E.g. if you examined lung cancer given exposure to vehicle emissions, you would need to match controls as to whether they smoked or not. This allows you to assess the exposure of interest with the maximum power given the design limitations, although you can’t say anything about smoking vs not smoking on the outcome.

The authors here match within neighborhoods and on several other local characteristics, but then go on to examine different crime generators (7 factors), as well as the two disorder coded variables. Given that this is a fairly small sample size of 129 matched cases, this is likely a pretty underpowered research design. Logistic regression relies on asymptotic properties, and even with fewer variables it is questionable whether 260 cases is sufficient, see https://www.tandfonline.com/doi/abs/10.1080/02664763.2017.1282441. Thus you get in abstract terms fairly large odds ratios that are still not significant (e.g. physical disorder has an odds ratio of 1.7, but is insignificant). So you have low power to test all of those coefficients.

I believe a stronger research design would focus on the novel measures (the Google Street View disorder), and match on the crime generator variables. The crime generator factors have been well established in prior research, so that work is a small contribution. The front end focuses on the typical crime generator typology, and lacks any broader “so what” about the disorder measures. (Which can be made, focusing on broken windows theory and the controversy it has caused.)

It would be feasible for authors to match on the crime generators, but it would result in different control cases, so they would need to code additional locations. (If you do match on crime generators, I think it is OK to include 0 crime areas in the pool. Main reason it is sometimes not fair to include 0 crime areas is because they are things like a bridge or a park.)

Minor notes:

  • You cannot interpret the coefficients in causal terms on which you matched in the conclusion. (Top page 23.) It only says the extent to which your matching was successful, not anything about causality. Later on you also attempt to weasel out of causal interpretations (page 26). Saying this is not causality, but otherwise interpreting regression coefficients as if they have any meaning is an act of cognitive dissonance.
  • Given that there is perfect separation between convenience stores and hot spots, the model should have infinite standard errors for that factor. (You have a few coefficients that appear to have explosive standard errors.)
  • I wouldn’t describe it as matching on the dependent variable [bottom page 8]. I realize it is confusing mixing propensity score terminology with case-control (although the method is fine). You match cases-to-controls on the independent variables you choose at the onset.
  • page 5, Dan O’Brien in his work has shown that you have super-callers for 311. Which fits right into your point of how coding of images may be better, as a single person can bias the measure at micro places. (This is sort of the opposite of not calling problem you mention.)
  • You may be interested, some folks have tried to automate the scoring part using computer vision, see https://ieeexplore.ieee.org/document/6910072?tp=&arnumber=6910072&url=http:%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6910072 or https://mikebader.net/projects/canvas-usa/ . George Mohler had a talk at ASC where he used Google’s automated image labeling to identify disorder (like graffiti) in pictures https://cloud.google.com/vision/docs/labels

 

Resources of interest for criminologists and crime analysts

I tend to get about one email per week asking for help. Majority of folks are either students asking for general research advice, or individuals who came across my webpage asking for advice about code.

This is great, and everyone should always feel open to send me an email. The utility of me answering these questions (for everyone) is likely greater than spending time working on a paper, so I do not mind at all. I can currently keep up with the questions given the volume (but not by much, and it is dependent on how busy I am with other work/family things). Worst case I will send an email response that says sorry I cannot respond to this anytime soon.

Many times there are other forums though for people to post questions that are ultimately better. One, I participate in many of these, so it is not like sending an email just to me, it is like sending an email to me + 40 other people who can answer your question. Also from my perspective it is better to answer a question once in one of these forums, than repeat the same answer a dozen different times. (Many times I write a blog post if I get the same question multiple times.)

While the two groups overlap a bit, I separate out resources aimed at criminologists (who are typically more interested in research, often current master/PhD students) from those aimed at crime analysts (who are embedded in a criminal justice agency).

For Criminologists

For resources on where to ask questions, Jacob Kaplan recently created a slack channel, crimhelp.slack.com. It has been joined by a variety of criminologists, folks in think tanks/research institutes, current graduate students, and some working crime analysts. It is new, but you can go and peruse the topics so far, they are pretty wide in scope.

So that forum you can really ask about anything crim related, the remaining resources are more devoted towards programming/statistical analysis.

If you are interested in statistical or programming questions, I used to participate in StackOverflow, Cross Validated (the stats site), and the GIS site. They are good places to check out prior answers, and are worth a shot asking a question on occasion. For tricky python or R coding questions that are small in scope, StackOverflow is excellent. Anything more complicated it is more hit or miss.

Many programming languages have their own question boards. Stata and SPSS are ones I am familiar with and tend to receive good responses (I still actively participate in the SPSS board). If I’m interested in learning some new command/library in Stata, I often just search the forum for posts related to it to check it out in the wild.

For programming questions, it is often useful to create a minimal reproducible example to describe the problem: show what the input data looks like and what you want the output data to look like. (In fact on the forums I link to you will almost always be asked explicitly to do that.)
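As a hedged illustration of what a minimal reproducible example can look like, here is a sketch in python. The data, field names, and the question itself are made up. The point is that the input is tiny and pasted inline, the attempted code runs as is, and the desired output is written out explicitly.

```python
from collections import Counter

# Tiny made-up input that anyone can paste and run
incidents = [
    {"id": 1, "offense": "Burglary", "month": "2020-01"},
    {"id": 2, "offense": "Robbery", "month": "2020-01"},
    {"id": 3, "offense": "Burglary", "month": "2020-02"},
]

# What I tried: count of incidents per month
counts = Counter(row["month"] for row in incidents)
print(dict(counts))

# What I want instead: counts per month AND offense, e.g.
wanted = {
    ("2020-01", "Burglary"): 1,
    ("2020-01", "Robbery"): 1,
    ("2020-02", "Burglary"): 1,
}
```

Someone answering can run this directly and check their solution against `wanted`, without needing access to your real data.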

For Crime Analysts

In similar spirit to the crim slack channel, Police Rewired has a Discord group for crime analysts (not 100% sure who started it, Andreas Varotsis is one of the people involved though). So that was founded by some UK analysts, but there are US analysts participating as well (and the problems folks deal with are very similar, so no real point in making a distinction between US/UK).

For crime analysts in the US, you should likely join either the IACA or a local crime analyst network. Many of the local ones come bundled, so if you join the Texas analyst network TXLEAN you also automatically get an IACA membership. To join is cheap (especially for current students). IACA has also started a user question forum as well.

For folks looking to get an entry level gig, the IACA has a job board that is really good. So it is worth the $10 just for that. They have various other intro resources though as well. For current BA/Masters students who are looking to get a job, I also suggest applying to private sector analyst jobs as well. They are mostly exchangeable with a crime analyst role. (Think more excel jockey than writing detailed statistical programming.)

How I learn to code

What prompted this blog post is that maybe 5 different people in the past month or so have asked me for resources to learn about statistical programming. And honestly I do not have a good answer, I’ve never really sat down with a book and learned a statistical software (tried on a few occasions and failed). I’m always just project focused.

So I wanted to do an example conjunctive analysis, or deep learning with pytorch, or using conformal prediction intervals to generate synthetic control estimates, etc. So I just sat down and figured out how to do those specific projects using various resources around the internet. One of my next personal projects is going to estimate prediction intervals for logistic multilevel models using Julia (based on this very nice set of intros to the language). I also need to get a working familiarity with Tableau. (Both are related to projects I am doing at work.) So expect to see a Tableau dashboard on the blog sometime in the near future!

Also many statistical programming languages are pretty much exchangeable for the vast majority of tasks people do. You can see that I have example blog posts for Excel, Access/SQL, R, SPSS, Stata, python, and ArcGIS. Just pick one and figure it out for a particular project.

For criminologists, I have posted my PhD research course materials, and for Crime Analysts I have posted my GIS Class and my Crime Analysis course materials (although the GIS course is already out of date, it uses Arc Desktop instead of ArcPro). I don’t suggest you sit down and go through the courses though page-by-page. What I more suggest is look at the table of contents, see if anything strikes your fancy, read that particular lecture/code, and if you want to apply it to your own projects try to work it out. (At least that is how I go about learning coding.)

If you want more traditional learning materials for learning how to do code (e.g. textbooks or online courses), I suggest you ask folks on those forums I mentioned. They will likely be able to provide much better advice than I would.

To end, it is totally normal to want to ask questions, get advice, or get feedback. In my experience both academia and crime analysis can be very lonely (I was in a small department, so was the only analyst). Folks on these forums are happy to help and connect.

300 blog posts and public good criminology

This isn’t technically my 300th blog post, but the 300th page I’ve constructed on my blog (so e.g. it includes when I’ve made a page for a class). I’ve posted a spreadsheet of the titles and dates of the posts over time (and updating it I noticed I was at 300).

I typically get around 200~300 views per day. Most of these are probably bots, but unless say over 90% are bots this website gets way more views than the cumulative views of all my academic papers combined. Here is a screen shot of the stats wordpress gives to me. My downtick in 2019 I thought was going to spiral into very few views, but it is still holding on.

I kind of have three different types of blog posts. One are example code snippets/data analysis. Often these are things I have done multiple times, so I want to create a record for me to more easily search up later. For example making a hexbin map in ggplot, or a margins plot in Stata. I wrote a recent post because I was talking with a friend about crime weights, and I wanted an example of using regression in python and an error bar plot for my library. (Quite a few birds with that stone.)

Two are questions I repeatedly encounter from students. For example, I made a list of demographic variables I use in the census, and where to find or scrape crime generator variables. Consistently my most popular post is testing the equality of two regression coefficients.

The third are just more generic opinion pieces. For example my notes on (the now late) David Bayley’s writing on the police potential to reduce crime, or Jane Jacobs’s take on neighborhoods, or that I don’t think latent trajectories are real things.

Some are multiple of these categories put together, particularly opinion pieces with example code snippets to illustrate the points I am making. Like a simulation of why I like to model individual delinquency items, or how to balance false positives in bail decisions.

On Public Good Criminology

None of these per se fit in the example framework of typical peer review output. So despite no peer review, I think things like deriving optimal treatment allocation with network spillovers, or that conformal prediction intervals for synthetic control estimates are much smaller than permutation tests are a substantive contribution to share!

So that brings me to the public good point. Most criminologists have a default of only valuing a closed peer review system. Despite my blog posts not being peer reviewed (ditto for the pre-prints I post at first), I hope folks can take the time to judge for themselves whether they are valuable or not. We would be much better off as a group if we did things like share code, share class preps, or failed projects by default.

Some of these posts I might write up if we had a short journal for our field akin to Economics Letters, but even that is a lot of work for very little value added to be frank. (If I had infinite time I also might turn my notes on Poisson/Negative Binomial regression into a little Sage green book.) Being a private sector data scientist now without the tenure boot on my neck, I don’t really have any need or desire to go through that process.

If all you value is getting the opinions of a handful of other academics, then by all means keep your work close to the chest and only publish in peer reviewed journals. If you want to provide a public good though, your work actually needs to be public.

Conjoint Analysis of Crime Rankings

So part of my recent research mapping crime harm spots uses cost of crime estimates relevant to police departments (Wheeler & Reuter, 2020). But a limitation of this is that cost of crime estimates are always somewhat arbitrary.

For a simple example, those cost estimates are based mostly on people time by the PD to respond to crimes and devote investigative resources. Many big city PDs entirely triage crimes like breaking into vehicles though. So based on PD response the cost of those crimes are basically $0 (especially if PDs have an online reporting system).

But I don’t think the public would agree with that sentiment! So in an act of cognitive dissonance with my prior post, I think asking the public is likely necessary for police to be able to ultimately serve the public’s interest when doing valuations. For some ethical trade-offs (like targeting hot spots vs increasing disproportionate minority contact, Wheeler, 2019) I am not sure there is any other reasonable approach than simply getting a bunch of people’s opinions.

But that being said, I suspected that these different metrics would provide pretty similar rankings for crime severity overall. So while it is criminology 101 that official crime and normative perceptions of deviance are not a perfect 1 to 1 mapping, most folks (across time and space) have largely similar agreement on the severity of different crimes, e.g. that assault is worse than theft.

So what I did was grab some survey ranking of crime data from the original source of crime ranking that I know of, Marvin Wolfgang’s supplement to the national crime victimization survey (Wolfgang et al., 2006). I have placed all the code in this github folder to replicate. And in particular check out this Jupyter notebook with the main analysis.

Conjoint Analysis of Crime Ranks

This analysis is often referred to as conjoint analysis. There are a bunch of different ways to conduct conjoint analysis – some ask folks to create a ranked list of items, others ask folks to choose between a list of a few items, and others ask folks to rank problems on a Likert item 1-5 scale. I would maybe guess Likert items are the most common in our field, see for example Spelman (2004) using surveys asking people about disorder problems (and that data is available too, Taylor, 2008).

The Wolfgang survey I use here is crazy complicated, see the codebook, but in a nutshell they had an anchoring question where they assigned stealing a bike to a value of 10, and then asked folks to give a numeric score relative to that theft for a series of 24 other crime questions. Here I only analyze one version of the questionnaire, and after eliminating missing data there are still over 4,000 responses (in 1977!).

So you could analyze those metric scores directly, but I am taking the lazy route and just doing a rank ordering (where ties are the average rank) within person. Then conjoint analysis is simply a regression predicting the rank. This just produces the same analysis as looking at the means of the ranks, see the notebook for a more detailed walkthrough.
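Here is a minimal sketch of the within-person ranking step, using toy scores I made up (not the Wolfgang data) and a hand-rolled rank function rather than the code in the notebook. A regression of rank on a full set of item dummies recovers the item means, so the mean ranks below are what that regression would report.

```python
from statistics import mean

def average_ranks(scores):
    """Rank scores from 1..n (1 = lowest score), giving ties their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # extend the block while the next score ties with the current one
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # average of positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks

# Toy data: each row is one respondent's severity scores for three crimes
items = ["bike theft", "burglary", "assault"]
respondents = [
    [10, 50, 200],
    [10, 10, 80],
    [10, 30, 30],
]
ranked = [average_ranks(r) for r in respondents]
mean_rank = {item: mean(r[j] for r in ranked) for j, item in enumerate(items)}
print(mean_rank)
```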

About the only thing I do different here than typical conjoint analysis is that I rescale the frequency weights (just changes the degrees of freedom for standard error estimates) to account for the repeated nature of the observations (e.g. I treat it like a sample of 4000 some observations, not 4000*25 observations). (I don’t worry about the survey weights here.)

To test my assertion of whether these different ranking systems will be largely in agreement, I take Jerry’s crime harm paper (Ratcliffe, 2015), which is based on sentencing guidelines, and map them as best I could to the Wolfgang questions (you could argue with me some though on those assessments – and some questions don’t have any analog, like a company dumping waste). I rescaled the Wolfgang rankings to be in a range of 1-14, same as Jerry’s, instead of 1-25.
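The rescaling from a 1-25 range to a 1-14 range can be done in several ways. A simple min-max linear rescale is sketched below as one option; treat it as an assumption for illustration and check the notebook for what was actually done.

```python
def rescale(value, old_min=1.0, old_max=25.0, new_min=1.0, new_max=14.0):
    """Min-max rescale so old_min maps to new_min and old_max to new_max."""
    frac = (value - old_min) / (old_max - old_min)
    return new_min + frac * (new_max - new_min)

# The endpoints map to 1 and 14, and the midpoint rank lands halfway between
for r in (1, 13, 25):
    print(r, rescale(r))
```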

Doing a deeper dive into the Wolfgang questions, there are definitely different levels in the nature of the questions you can tease out. Folks clearly take into account both harm to the victim and total damages/theft amounts. But overall the two systems are fairly correlated. So if an analyst wants to make crime harm spots now, I think it is reasonable to use one of these ranking systems, and then worry about getting the public perspective later on down the line.

The Wolfgang survey is really incredible. In this regression framework you can either adjust for other characteristics (e.g. it asks about all the usual demographics) or look at interactions (do folks who were recently victimized up their scores). So this is really just scratching the surface. I imagine if someone redid it with current data many of the metrics would be similar as well, although if I needed to do this I don’t think I would devise something as complicated as this, and would ask people to rank a smaller set of items directly.

References

  • Ratcliffe, J.H. (2015). Towards an index for harm-focused policing. Policing: A Journal of Policy and Practice, 9(2), 164-182.
  • Spelman, W. (2004). Optimal targeting of incivility-reduction strategies. Journal of Quantitative Criminology, 20(1), 63-88.
  • Taylor, R.B. (2008). Impacts of Specific Incivilities on Responses to Crime and Local Commitment, 1979-1994: [Atlanta, Baltimore, Chicago, Minneapolis-St. Paul, and Seattle]. https://doi.org/10.3886/ICPSR02520.v1
  • Wheeler, A.P., & Reuter, S. (2020). Redrawing hot spots of crime in Dallas, Texas. https://doi.org/10.31235/osf.io/nmq8r
  • Wheeler, A.P. (2019). Allocating police resources while limiting racial inequality. Justice Quarterly, Online First.
  • Wolfgang, M.E., Figlio, R.M., Tracy, P.E., & Singer, S.I. (2006). National Crime Surveys: Index of Crime Severity, 1977. https://doi.org/10.3886/ICPSR08295.v1

Admin data should be used more often in policing research

I sometimes wonder if many researchers do not actually know what data police departments regularly collect. I commonly see articles on topics and think to myself “Hey, that is nice you did a survey on XYZ, why did you not confirm the responses with actual admin data on the same topic?”. Or I see topics that can be reasonably addressed using admin data not tackled at all by researchers.

So I decided to write this blog post.

I’ve mostly to date made a career out of analyzing administrative police data (only 2 out of my 30 some peer reviewed papers at this point are using non-regularly collected data as part of the analysis – and both of those link surveys to official crime records). To be honest I’m also motivated to write this as it is common for senior academics (in general in criminology, not just specific to policing researchers) to critique secondary data analysis (some of those folks are curmudgeons though, so maybe not worth stating). Of course you can do bad analysis with whatever data – primary or secondary makes no difference.

I think the default though should be to leverage admin data, so this sentiment I believe is in general misguided, and results in a lot of waste (time and money spent on primary data collection). I have never received research funding directly in my career (only as an RA for Rob Worden), so my work has essentially been for “free” on these projects (just my time). (I was basically subsidized by the university to do research!)

My opinion is based on two key points:

  1. Administrative data has already been collected by police agencies, so it has no additional costs for use by researchers.
  2. Administrative data defines core outcomes that police agencies strive to reduce.

For 2 in particular this is reducing reported crime and reducing use of force. (Use of force can be conceived of as an “output” instead of an “outcome”, but I tend to think of it as a negative externality that should be minimized to the extent possible.) I’m sure a few folks are thinking here “these don’t define the potential universe of outcomes police departments are interested in” and I agree – permit me to discuss this in more detail in a few paragraphs. The argument I am making is ultimately fuzzy – not that we shouldn’t collect other data, but it should meet a higher threshold than using zero-cost data already collected by PDs.

What is Admin Policing Data?

For folks not familiar, police departments keep electronic records of various things, mostly related to crime and interactions with the public. All police departments I have worked with have these types of records in various tables/databases:

  • calls to 911 (Computer Aided Dispatch)
  • reported crimes and incidents
  • charges & arrests
  • discretionary stops (traffic and pedestrian)
  • use of force

All of these tables you can link to individual officers and/or individual citizens, as well as have a date-time and location stamp of where it happened. So you can do things like see all the cases detective X has been assigned and his specific clearance rate, or all cases in which Y was listed as a victim, or see the stop/use-of-force patterns of officer Z over time, etc.
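As a rough sketch of that kind of linking, the Python below computes a clearance rate per detective and pulls every incident for a given victim. The tables, column names, and IDs are all hypothetical; real records management systems differ from agency to agency.

```python
from collections import defaultdict

# Hypothetical extracts; real table and column names vary by agency.
incidents = [
    {"incident_id": 1, "detective": "X", "cleared": True},
    {"incident_id": 2, "detective": "X", "cleared": False},
    {"incident_id": 3, "detective": "Y", "cleared": True},
    {"incident_id": 4, "detective": "X", "cleared": True},
]
victims = [
    {"incident_id": 1, "person_id": "P7"},
    {"incident_id": 3, "person_id": "P7"},
    {"incident_id": 4, "person_id": "P9"},
]

def clearance_rates(rows):
    """Share of assigned cases that were cleared, per detective."""
    assigned = defaultdict(int)
    cleared = defaultdict(int)
    for r in rows:
        assigned[r["detective"]] += 1
        cleared[r["detective"]] += int(r["cleared"])
    return {d: cleared[d] / assigned[d] for d in assigned}

def incidents_for_victim(person_id):
    """Link the victim table back to incidents via incident_id."""
    ids = {v["incident_id"] for v in victims if v["person_id"] == person_id}
    return [r for r in incidents if r["incident_id"] in ids]

rates = clearance_rates(incidents)
p7_cases = incidents_for_victim("P7")
```

In practice this would be a join in SQL or a statistical package, but the logic is the same: a shared incident or person identifier ties the tables together.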

Other types of admin data that are pretty regular are psych screenings (especially for newer officers), civilian complaints, plain text detective/case notes, gang related databases (people/tags/incidents), databases of reported/recovered stolen goods, etc. Police collect a lot of data! At this point PDs often have this data going back over a decade.

How often is Admin Policing Data Used in Policing Journal Articles?

To illustrate my point that admin data should be used more in policing research, I took the most recent issues of several policing journals and counted up the articles that used admin data. (There are probably more policing journals I missed, sorry, these are the ones I know of/have submitted articles to in the past.)

So this is a total of 14/50, or about 28%, in this sample. This is actually higher than I expected (I guessed 10%). Looking at the first issue of Police Quarterly for 2020, it is 0/5. The Policing: A Journal of Policy and Practice issue also contained a special sub-issue on recruit training, in which 0/6 articles likely used administrative data. The first 2020 issue of Policing: An International Journal was a special issue on cyber crime, in which it appears to me that 2/14 papers used admin data. So if I add those stats, it is 16/75, or about 21%.
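The pooled figure is just the summed counts, not an average of the separate percentages. A short Python check of the arithmetic, using only the counts quoted above (the labels are my own shorthand):

```python
# (articles using admin data, total articles), as counted in the text above
counts = {
    "initial sample of journal issues": (14, 50),
    "Police Quarterly, first 2020 issue": (0, 5),
    "recruit training special sub-issue": (0, 6),
    "cyber crime special issue": (2, 14),
}

used = sum(u for u, _ in counts.values())
total = sum(t for _, t in counts.values())
share = used / total

print(f"{used}/{total} = {share:.1%}")  # prints "16/75 = 21.3%"
```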

I may be undercounting admin data here; for example I assume a survey of recruits is not a regular data collection (it hasn’t been in any police agency I’ve been involved with), but I of course may be wrong.

I’ve included as admin data looking at detective case notes (it is sort of like secondary analysis of a qualitative dataset!). Also counted as admin data one article that used the NCVS – which is regularly collected data (but by the federal govt, not local PD).

So you may squabble with my definitions here, but in broad strokes I don’t think any reasonable definition is likely to push this above ~1/3 papers in policing research use regularly collected admin data (in this sample of policing journals).

For reference, I did a Twitter poll asking what proportion of policing research folks thought used admin data. Among the 86 responses there was a slight edge for the right category (under 1/3rd), but almost the same number guessed over 2/3rds.

https://twitter.com/CrimAndyW/status/1260195703017680898

So you can see a significant number of folks think that the distribution is opposite what it is in practice – the majority, not the minority, of policing research uses specially collected data and ignores admin data.

Restricting the subset to policing journals is likely to bias the estimate downward somewhat. I bet if I pulled policing articles from say Journal of Experimental Crim or Crime Science they are closer to 100% using admin policing data. But I think that also illustrates a pretty big discord in the current field of policing as well.

Some may think this cuts the research in terms of criminology/criminal justice – policing journals publish work on examining police behavior, whereas other journals tend to more frequently look at crime outcomes more associated with “criminological” research. This may be true, but admin data collected by police departments are pretty relevant for examining police behavior (e.g. proactive stops, use of force). These admin measures are almost always more relevant to police behavior than surveys of opinions! If you do surveys you should often tie them to these other admin measures to provide secondary evidence of different relevant measures.

What’s Wrong with Collecting New Data?

My argument is explicitly value-laden – I don’t know the correct percent of policing research that should use admin police data. But I do think the current swing in which the clear majority of research is oriented to collect primary data is wrong. Those primary data collections have both more costs (above data already collected by police agencies) and, for the most part, ignore core outcomes that PDs strive for.

For example, the National Institute of Justice has stated they want researchers to move away from admin data. One reason for this is that past researchers have been unsuccessful lowering crime, and so you should collect alternative measures to validate your intervention.

This I believe is an actively harmful perspective called “goal switching,” and in general it makes little sense. If crime is so rare that a study is ultimately poorly powered, there isn’t much potential benefit to reducing crime in that area even if the intervention does work in practice. Best case, you need to do longer interventions. If you want to reduce violent crime you can look at community sentiment as well; it doesn’t make sense, though, to entirely drop the ultimate goal of violence reduction in its place!

And this gets to the crux of core outcomes police should strive for. It is a normative question, but I believe reduced crime and reduced use of force are relatively well agreed upon general goals of police. I think it is OK to have secondary measures – such as say attitudes towards police or fear of crime or measures of police stress. But these measures have several things working against them.

One, they are not regularly collected as administrative datasets. I imagine you can trawl up a few examples of PDs who have started to do regular surveys of attitudes towards police (either general public or specific post-PD contact), but the vast majority have not. So say you have an intervention intended to improve attitudes towards police. Great! For a police department interested in implementing that program, they not only have to allocate resources to that project, but also put an item in the budget to do the surveys forever. (This isn’t always true though, I think for example Rylan Simpson’s work is strong enough to justify making those low cost appearance changes and you don’t need to forever do surveys to see if it is working.) But for most interventions you can’t just do it once and hope it has improved indefinitely! (Same as you can’t stop measuring crime just because something you did made crime go down one time.)

Two, they are pretty fuzzy as to whether they should be reasonably swapped out for goals of crime reduction and reduced use of force in and of themselves. For sake of argument say hot spots policing causes backfire effects that increase fear of crime. How exactly do you trade off fear of crime vs actual crime reduction? Personally I think actual crime reductions should take precedence in that scenario. If you want to justify actually measuring fear of crime, you need to make some value-based arguments to justify at minimum the cost of doing surveys. You should also probably justify altering police behavior in a particular way to improve that particular metric as well.

So any time you do a secondary data collection, you need to actually valuate the costs of the measures somehow (which I know is very difficult, hence it makes more sense to default to using admin data that is costless in terms of research!). Costless is probably a bit of a misnomer though – police departments have already sunk a lot of resources into collecting that admin data (patrol officers likely spend about equal time on dealing with people as they do with paperwork). But it is costless in terms of capital for me to query a database and say “use of force went down 10% after you instituted this policy”.
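To give a sense of how little work that kind of query is, here is a minimal sketch using Python’s built-in sqlite3 module. The table name, column names, policy date, and every row are made up; a real comparison would also need equal-length before and after periods and some adjustment for trends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE use_of_force (incident_id INTEGER, incident_date TEXT)")

# Made-up rows: 10 incidents before the policy date, 9 after
rows = [(i, "2019-06-%02d" % (i + 1)) for i in range(10)]
rows += [(100 + i, "2020-06-%02d" % (i + 1)) for i in range(9)]
conn.executemany("INSERT INTO use_of_force VALUES (?, ?)", rows)

POLICY_DATE = "2020-01-01"  # hypothetical policy start date

# ISO-formatted date strings compare correctly as text in SQLite
before, after = conn.execute(
    "SELECT SUM(incident_date < ?), SUM(incident_date >= ?) FROM use_of_force",
    (POLICY_DATE, POLICY_DATE),
).fetchone()

pct_change = 100.0 * (after - before) / before  # -10.0 with these made-up rows
```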

I think plenty of research collecting unique measures has potential to meet this threshold. One of the motivations to write this was Lois James’s articles on EIS – I think her general idea of doing a deeper dive to tease out more detailed interaction measures could be really important work (especially if it can be automated in a particular way, say through BWC footage). Lois’s work is just one example though. I also think measures of say police stressors could be very important in measuring churn of police officers over time. I already stated I think Rylan Simpson’s work on perceptions of police is well justified based on his simple experiments (since they are very low cost interventions, like wear purple gloves instead of black, or no cost e.g. take off your sunglasses when interviewing folks).

So these have potential to be worth the cost for police departments to open up their pocket books and collect those measures, but that is a bridge further than the majority of research currently being published in policing journals.

Some Caveats

So this is like I said a value-laden and fuzzy argument. No doubt some folks doing qualitative research or surveys will think this is loathsome, and think “I can’t answer my research question using administrative data”.

I intend the argument to go the other way though – we can be doing so much more quality research for much less cost. I also believe folks in general need to do a much better job of tying contemporary policing research to actual real-life outcomes such as crime and use of force. Like I said, I think the default should be basically the opposite proportion of what policing research looks like at the moment.

I’m not saying folks can’t do more basic data measures and collection – but as is, the vast majority of this research lacks any semblance of a cost-benefit analysis that would justify the cost to collect those measures. As is, even if folks’ hypotheses are validated in a one-time data collection, they lack the necessary valuation to justify police departments implementing those measures going forward in practice. (Many of these same valuation critiques apply to the use of technology in policing, although it is the obverse: not much academic work, but plenty of sinking $$ into tech with little return in terms of measurable outcomes.)

One thing I have not touched on is access. Folks may be thinking “I can’t get access to that info!”. You actually probably can though – I don’t know a PD that would let you do a survey or interviews that also wouldn’t share much of this admin data.

Another thing I have not touched on is bias in admin data. That deserves a whole additional blog post. It is a fair critique in part (bias no doubt exists; the question is quantifying how large it is and its impact on the analysis). The majority of the work in these policing journals, though, is not using alternative measures to get around bias in admin data; they are measuring totally different things (as I said, goal switching to totally different outcomes).