Precision in measures and policy relevance

Too busy to post much recently – will hopefully slow down a bit soon and publish some more technical posts, but just a quick opinion post for this Sunday. Reading a blog post by Callie Burt the other day – I won’t comment on the substantive critique of the Harden book she is discussing (since I have not read it), but this quote struck me:

precise point estimates are generally not of major interest to social scientists. Nearly all of our measures, including our outcome measures, are noisy, (contain error), even biased. In general, what we want to know is whether more of something (education, parental support) is associated with more (or less) of something else (income, education) that we care about, ideally with some theoretical orientation. Frequently the scale used to measure social influences is somewhat arbitrary anyway, such that the precise point estimate (e.g., weeks of schooling) associated with 1 point increase in the ‘social support scale’ is inherently vague.

I think Callie is right, precise point estimates often aren’t of much interest in general criminology. I think this perspective is quite bad though for our field as a whole in terms of scientific advancement. Most criminology work is imprecise (for various reasons), and because of this it has no hope to be policy relevant.

Lets go with Callie’s point about education is associated with income. Imagine we have a policy proposal that increases high school completion rates via allocating more money to public schools (the increased education), and we want to see its improvement on later life outcomes (like income). Whether a social program “is worth it” depends not only whether it is effective in increasing high school completion rates, but by how much and how much return on investment there is those later life outcomes we care about. Programs ultimately have costs; both in terms of direct costs as well as opportunity costs to fund some other intervention.

Here is another more crim example – I imagine most folks by now know that bootcamps are an ineffective alternative to incarceration for the usual recidivism outcomes (MacKenzie et al., 1995). But what folks may not realize is that bootcamps are often cheaper than prison (Kurlychek et al., 2011). So even if they do not reduce recidivism, they may still be worth it in a cost-benefit analysis. And I think that should be evaluated when you do meta-analyses of CJ programs.

Part of why I think economics is eating all of the social sciences lunch is not just because of the credibility revolution, but also because they do a better job of valuating costs and benefits for a wide variety of social programs. These cost estimates are often quite fuzzy, same as more general theoretical constructs Callie is talking about. But we often can place reasonable bounds to know if something is effective enough to be worth more investment.

There are a smattering of crim papers that break this mold though (and to be clear you can often make these same too fuzzy to be worthwhile critiques for many of my papers). For several examples in the policing realm Laura Huey and her Canadian crew have papers doing a deep dive into investigation time spent on cases (Mark et al., 2019). Another is Lisa Tompson and company have a detailed program evaluation of a stalking intervention (Tompson et al., 2021). And for a few papers that I think are very important are Priscilla Hunt’s work on general CJ costs for police and courts given a particular UCR crime (Hunt et al., 2017; 2019).

Those four papers are definitely not the norm in our field, but personally think are much more policy relevant than the vast majority of criminological research – properly estimating the costs is ultimately needed to justify any positive intervention.


  • Hunt, P., Anderson, J., & Saunders, J. (2017). The price of justice: New national and state-level estimates of the judicial and legal costs of crime to taxpayers. American Journal of Criminal Justice, 42(2), 231-254.
  • Hunt, P. E., Saunders, J., & Kilmer, B. (2019). Estimates of law enforcement costs by crime type for benefit-cost analyses. Journal of Benefit-Cost Analysis, 10(1), 95-123.
  • Kurlychek, M. C., Wheeler, A. P., Tinik, L. A., & Kempinen, C. A. (2011). How long after? A natural experiment assessing the impact of the length of aftercare service delivery on recidivism. Crime & Delinquency, 57(5), 778-800.
  • MacKenzie, D. L., Brame, R., McDowall, D., & Souryal, C. (1995). Boot camp prisons and recidivism in eight states. Criminology, 33(3), 327-358.
  • Tompson, L., Belur, J., & Jerath, K. (2021). A victim-centred cost–benefit analysis of a stalking prevention programme. Crime Science, 10(1), 1-11.
  • Mark, A., Whitford, A., & Huey, L. (2019). What does robbery really cost? An exploratory study into calculating costs and ‘hidden costs’ of policing opioid-related robbery offences. International Journal of Police Science & Management, 21(2), 116-129.

Musings on Project Organization, Books and Courses

Is there a type of procrastination via which people write lists of things? I have that condition.

I have been recently thinking about project organization. At work we have been using the Cookie Cutter Data Science project set up – and I really hate it. I have been thinking about this more recently, as I have taken over several other data scientists models at work. The Cookie Cutter Template is waaaay too complicated, and mixes logic of building python packages (e.g., a LICENSE folder) with data science in production code (who makes their functions pip installable for a production pipeline?). Here is the Cookie Cutter directory structure (even slightly cut off):

Cookie cutter has way too many folders (data folder in source, and data folder itself), multiple nested folders (what is the difference between external data, interim, and raw data?, what is the difference between features and data in the src folder?) I can see cases for individual parts of these needed sometimes (e.g. an external data file defining lookups for ICD codes), but why start with 100 extra folders that you don’t need. I find this very difficult taking over other peoples projects in that I don’t know where there are things and where there are not (most of these folders are empty).

So I’ve reorganized some of my projects at work, and they now look like this:

├──           <- High level overview of project + any special notes
├── requirements.txt    <- Default python libraries we often use (eg sklearn, sqlalchemy)
├                         + special instructions for conda environments in our VMs
├── .gitignore          <- ignore `models/*.pkl`, `*.csv`, etc.
├── /models             <- place to store trained and serialized models
├── /notebooks          <- I don't even use notebooks very often, more like a scratch/EDA folder
├── /reports            <- Powerpoint reports to business (using HMS template)
├── /src                <- Place to store functions

And then depending on the project, we either use secret environment variables, or have a YAML file that has database connection strings etc. (And that YAML is specified in .gitignore.)

And then over time in the root folder it will typically have shell scripts call whatever production pipeline or API we are building. All the function files in source is fine, although it can grow to more modules if you really want it to.

And this got me thinking about how to teach this program management stuff to new data scientists we are hiring, and if I was still a professor how I would structure a course to teach this type of stuff in a social science program.


So in my procrastination I made a generic syllabi for what this software developement course would look like, Software & Project Development For Social Scientists. It would have a class/week on using the command prompt, then a week on github, then a few weeks building a python library, then ditto for an R package. And along the way sprinkle in literate programming (notebooks and markdown and Latex), unit testing, and docker.

And here we could discuss how projects are organized. And social science students get exposed to way more stuff that is relevant in a typical data science role. I have over the years also dreamt up other data science related courses as well.

Stats Programming for CJ. This goes through the basics of data manipulation using statistical programming. I would likely have tutorials for R, python, SPSS, and Stata for this. My experience with students is that even if they have had multiple stats classes in grad school, if you ask them “take this incident dataset with dates, and prepare a weekly level file with counts of crimes per week” they don’t know how to do even that simple task (an aggregation). So students need an entry level data manipulation course.

Optimization for Criminal Justice (or alt title Operations Research and Machine Learning for CJ). This one is not as developed as some of my other courses, but I think I could make it work for a semester. I think learning linear programming is a really great skill not taught at all in any CJ program I am aware of. I have some small notes on machine learning in my Research Design class for PhD students, but that could be expanded out (week for decision trees/forests, week for boosting, week for neural networks, etc.).

And last, I have made syllabi for the one credit entry level course for undergrad students, and the equivalent course for the new PhD students, College Prep. These classes I had I don’t think did a very good job. My intro one at Bloomsburg for undergrad had a textbook lol! The only thing I remember about my PhD one was fear mongering over publications (which at that point I had no idea what was going on), and spending the last class with Julie Horney and David McDowell at whatever the place next to the Washington Tavern in Albany was called (?Gingerbread?).

These are of course just in my head at the moment. I have posted my course materials over the years that I have delivered.

I have pitched to a few programs to hire me as a semi teaching professor (and still keep my private sector gig). This set up is not that uncommon in comp sci departments, but no CJ ones I think are interested. Even though I like musing about courses, adjunct pay is way too low to justify this investment, and should be paid to both develop the material as well as deliver the class.


I have similarly made outlines for books over the years as well. One is Data Science for Crime Analysis with Python. I think there is an opening in the crime analysis market to advance to more professional coding, and so a python book would be good. But the market is overall tiny, my high end guesstimates are only around 800, so hard to justify the effort. (It would be mainly just a collection of my blog posts, but all in a nicer format for everyone to walk through/replicate.)

Another is a reader book, Handbook of Advanced Crime Analysis. That may not be needed though, as Cory Haberman and Liz Groff did a recent book that has quite a bit of overlap (can’t find it at the moment, maybe it is not out yet). Many current advanced techniques are scattered and sometimes difficult to replicate, I figured a reader that also includes code walkthroughs would help quite a few PhD students.

And again if I was still in the publishing game I would like to turn my Poisson course notes into a little Sage green book.

If I was still a professor, this would go hand in hand with developing courses. I know Uni’s do sometimes have grants to develop open source teaching materials, and these would probably best fit those molds. These aren’t going to generate revenue directly from sales.

So complaints and snippets on blog posts are all you are going to get for now from me.

Spatial consistency of shootings and NIJ recid working papers

I have two recent working papers out:

The NIJ forecasting paper is the required submission by NIJ. Gio and I will likely try to turn this into a real paper in the near future. I’d note George Mohler and Michael Porter did the same thing as us, clip the probabilities to under 0.5 to win the fairness competition.

NIJ was interested in “what variables are the most important” – I will need to slate a longer blog post about this in the future, but this generally is not the right way to frame predictive challenges. You do not need real in depth understanding of the underlying system, and many times different effects can be swapped out for one another (e.g. Dawes, 1979).

The paper on shootings in Buffalo is consistent with my blog posts on shootings in NYC (precincts, grid cells). Even though shootings have gone up by quite a bit in Buffalo overall, the spatial distribution is very consistent over time. Appears similar to a recent paper by Jeff Brantingham and company as well.

It is a good use case for the differences in SPPT results when adjusting for multiple comparisons, we get a S index of 0.88 without adjustments, (see below distribution of p-values). These are consistent with random data though, so when doing a false discovery rate correction we have 0 areas below 0.05.

If you look at the maps there are some fuzzy evidence of shifts, but it is quite weak overall. Also one thing I mention here is that even though we have hot spots of shootings, even the hottest grid cells only have 1 shooting a month. Not clear to me if that is sufficient density (if only considering shootings) to really justify a hot spots approach.


  • Brantingham, P. J., Carter, J., MacDonald, J., Melde, C., & Mohler, G. (2021). Is the recent surge in violence in American cities due to contagion?. Journal of Criminal Justice, 76, 101848.
  • Circo, G., & Wheeler, A. (2021). National Institute of Justice Recidivism Forecasting Challenge Team “MCHawks” Performance Analysis. CrimRxiv.
  • Dawes, R. M. (1979). The robust beauty of improper linear models in decision making. American Psychologist, 34(7), 571.
  • Drake, G., Wheeler, A., Kim, D.-Y., Phillips, S. W., & Mendolera, K. (2021). The Impact of COVID-19 on the Spatial Distribution of Shooting Violence in Buffalo, NY. CrimRxiv.
  • Mohler, G., & Porter, M. D. (2021). A note on the multiplicative fairness score in the NIJ recidivism forecasting challenge. Crime Science, 10(1), 1-5.

Incoherence in policy preferences for gun violence reduction

One of the most well vetted criminal justice interventions at this point we have is hot spots policing. We have over 50 randomized control trials at this point, showing modest overall crime reductions on average (Braga & Weisburd, 2020). This of course is not perfect, I think Emily Owen sums it up the best in a recent poll of various academics on the issue of gun violence:

So when people argue that hot spots policing doesn’t show long term benefits, all I can do is agree. If in a world where we are choosing between doing hot spots vs doing nothing, I think it is wrong to choose the ultra risk adverse position of do nothing because you don’t think on average short term crime reductions of 10% in hot spots are worth it. But I cannot say it is a guaranteed outcome and it probably won’t magically reduce crime forever in that hot spot. Mea culpa.

The issue is most people making these risk adverse arguments against hot spots, whether academics or pundits or whoever, are not actually risk adverse or ultra conservative in accepting scientific evidence of the efficacy of criminal justice policies. This is shown when individuals pile on critiques of hot spots policing – which as I noted the critiques are often legitimate in and of themselves – but then take the position that ‘policy X is better than hotspots’. As I said hot spots basically is the most well vetted CJ intervention we have – you are in a tough pickle to explain why you think any other policy is likely to be a better investment. It can be made no doubt, but I haven’t seen a real principled cost benefit analysis to prefer another strategy over it to prevent crime.

One recent example of this is on the GritsForBreakfast blog, where Grits advocates for allocating more resources for detectives to prevent violence. This is an example of an incoherent internal position. I am aware of potential ways in which clearing more cases may reduce crimes, even published some myself on that subject (Wheeler et al., 2021). The evidence behind that link is much more shaky however overall (see Mohler et al. 2021 for a conflicting finding), and even Grits himself is very skeptical of general deterrence. So sure you can pile on critiques of hot spots, but putting the blinders on for your preferred policy just means you are an advocate, not following actual evidence.

To be clear, I am not saying more detective resources is a bad thing, nor do I think we should go out and hire a bunch more police to do hot spots (I am mostly advocating for doing more with the same resources). I will sum up my positions at the end of the post, but I am mostly sympathetic in reference to folks advocating for more oversight for police budgets, as well as that alternative to policing interventions should get their due as well. But in a not unrealistic zero sum scenario of ‘I can either allocate this position for a patrol officer vs a detective’ I am very skeptical Grits is actually objectively viewing the evidence to come to a principled conclusion for his recommendation, as opposed to ex ante justifying his pre-held opinion.

Unfortunately similarly incoherent positions are not all that uncommon, even among academics.

The CJ Expert Panel Opinions on Gun Violence

As I linked above, there was a recent survey of various academics on potential gun violence reduction strategies. I think these are no doubt good things, albeit not perfect, similar to but are many more opinions on overall evidence bases but are more superficial.

This survey asked about three general strategies, and asked panelists to give Likert responses (strongly agree,agree,neutral,disagree,strongly disagree), as well as a 1-10 for how confident they were, whether those strategies if implemented would reduce gun violence. The three strategies were:

  • investing in police-led targeted enforcement directed at places and persons at high risk for gun crime (e.g.,“hot spot” policing; gang enforcement)
  • investing in police-led focused deterrence programs (clearly communicating “carrots and sticks” to local residents identified as high risk, followed by targeted surveillance and enforcement with some community-based support for those who desist from crime)
  • investing in purely community-led violence-interruption programs (community-based outreach workers try to mediate and prevent conflict, without police involvement)

The question explicitly stated you should take into account implementation in real life as well. Again people can as individuals have very pessimistic outlooks on any of these programs. It is however very difficult for me to understand a position where you ‘disagree’ with focused deterrence (FD) in the above answer and also ‘agree’ with violence interrupters (VI).

FD has a meta analysis of 20 some studies at this point (Braga et al., 2018), all are quasi-experimental (e.g. differences in differences comparing gang shootings vs non gang shootings, as well as some matched comparisons). So if you want to say – I think it is bunk because there are no good randomized control trials, I cannot argue with this. However there are much fewer studies for VI, Butts et al. (2015) have 5 (I imagine there are some more since then), and they are all quasi-experimental as well. So in this poll of 39 academics, how many agree with VI and disagree with FD?

We end up having 3. I show in that screen shot as well the crosstabulation with the hot spots (HS) question as well. It ends up being the same three people disagreed on HS/FD and agreed on VI:

I will come back to Makowski and Apel’s justification for their opinion in a bit. There is a free text field (although not everyone filled in, we have no responses from Harris here), and while I think this is pretty good evidence of having shifting evidentiary standards for their justification, the questions are quite fuzzy and people can of course weight their preferences differently. The venture capitalist approach would say we don’t have much evidence for VI, so maybe it is really good!

So again as a first blush, I checked to see how many people had opinions that I consider here coherent. You can say they all are bad, or you can agree with all the statements, but generally the opinions should be hs >= fd >= vi if one is going by the accumulated evidence in an unbiased manner. I checked how many coherent opinions there are in this survey according to this measure and it is the majority, 29/39 (those at the top of the list are more hawkish, saying strongly agree and agree more often):

Here are those I considered incoherent according to this measure:

Looking at the free text field for why people justified particular positions in this table, with the exception of Makowski and Apel, I actually don’t think they have all that unprincipled opinions (although how they mapped their responses to agree/disagree I don’t think is internally consistent). For example, Paolo Pinotti disagrees with lumping in hot spots with people based strategies:

Fair enough and I agree! People based strategies are much more tenuous. Chalfin et al. (2021) have a recent example of gang interdiction, but as far as I’m aware much of the lit on that (say coordinated RICO), is a pretty mixed bad. Pinotti then gives agree to FD and neutral to VI (with no text for either). Another person in this list is Priscilla Hunt, who mentions the heterogeneity of hot spots interventions:

I think this is pretty pessimistic, since the Braga meta analyses often break down by different intervention types and they mostly coalesce around the same effect estimates (about a 10% reduction in hot spots compared to control, albeit with a wide variance). But the question did ask about implementation. Fair enough, hot spots is more fuzzy a category than FD or VI.

Jennifer Doleac is an example where I don’t think they are mapping opinions consistently to what they say, although what they say is reasonable. Here is Doleac being skeptical for FD:

I think Doleac actually means this RCT by Hamilton et al. (2018) – arrests are not the right outcome though (more arrests probably mean the FD strategy is not working actually), so personally I take this study as non-informative as to whether FD reduces gun violence (although there is no issue to see if it has other spillovers on arrests). But Doleac’s opinion is still reasonable in that we have no RCT evidence. Here is Doleac also being skeptical of VI, but giving a neutral Likert response:

She mentions negative externalities for both (which is of course something people should be wary of when implementing these strategies). So for me to say this is incoherent is really sweating the small stuff – I think incorporating the text statement with these opinions are fine, although I believe a more internally consistent response would be neutral for both or disagree for both.

Jillian Carr gives examples of the variance of hot spots:

This is similar to Priscilla’s point, but I think that is partially an error. When you collect more rigorous studies over time, the effect sizes will often shrink (due to selection effects in the scholarly literature process that early successes are likely to have larger errors, Gelman et al. 2020). And you will have more variance as well and some studies with null effects. This is a good thing – no social science intervention is so full proof to always be 100% success (the lower bound is below 0 for any of these interventions). Offhand the variance of the FD meta analysis is smaller overall than hot spots, so Carr’s opinion of agree on FD can still be coherent, but for VI it is not:

If we are simply tallying when things do not work, we can find examples of that for VI (and FD) as well. So it is unclear why it is OK for FD/VI but not for HS to show some studies that don’t work.

There is an actual strategy I mentioned earlier where you might actually play the variance to suggest particular policies – we know hot spots (and now FD) have modest crime reducing effects on average. So you may say ‘I think we should do VI, because it may have a higher upside, we don’t know’. But that strikes me as a very generous interpretation of Carr’s comments here (which to be fair are only limited to only a few sentences). I think if you say ‘the variance of hot spots is high’ as a critique, you can’t hang your hat on VI and still be internally coherent. You are just swapping out a known variance for an unknown one.

Makowski and Apels Incoherence?

I have saved for last Michael Makowski and Robert Apel’s responses. I will start out by saying I don’t know all of the people in this sample, but the ones I do know are very intelligent people. You should generally listen to what they say, although I think they show some bias here in these responses. We all have biases, and I am sure you can trawl up examples of my opinions over time that are incoherent as well.

I do not know Michael Makowski, so I don’t mean to pick on him in particular here. I am sure you should listen to him over me for many opinions on many different topics. For example agree with his proposal to sever seized assets with police budgets. But just focusing on what he does say here (which good for him to actually say why he chose his opinions, he did not have to), for his opinion on hot spots:

So Makowski thinks policing is understaffed, but hot spots is a no go. OK, I am not sure what he expects those additional officers to do – answer calls for service and drive around randomly? I’d note hot spots can simultaneously be coordinated with the community directly – I know of no better examples of community policing than foot patrols (e.g. Haberman & Stiver, 2019 for an example). But the question was not that specific about that particular hot spot strategy, so that is not a critique of Makowski’s position.

We have so many meta analyses of hot spots now, that we also have meta analyses of displacement (Bowers et al., 2011), and the Braga meta analyses of direct effects have all included supplemental analyses of displacement as well. Good news! We actually often find evidence of diffusion of benefits in quite a few studies. Banking on secondary effects that are larger/nullify direct effects is a strange position to take, but I have seen others take it as well. The Grits blog I linked to earlier mentions that these studies only measure displacement in the immediate area. Tis true, these studies do not measure displacement in surrounding suburbs, nor displacement to the North Pole. Guess we will never know if hot spots reduce crime worldwide. Note however this applies to literally any intervention!

For Makowski’s similarly pessimistic take on FD:

So at least Makowski is laying his cards on the table – the question did ask about implementation, and here he is saying he doesn’t think police have the capability to implement FD. If you go in assuming police are incompetent than yeah no matter what intervention the police might do you would disagree they can reduce violence. This is true for any social policy. But Makowski thinks other orgs (not the police) are good to go – OK.

Again have a meta analysis showing that quite a few agencies can implement FD competently and subsequently reduce gun violence, which are no doubt a self selected set of agencies that are more competent compared to the average police department. I can’t disagree with if you interpret the question as you draw a random police department out of a hat, can they competently implement FD (most of these will be agencies with only a handful of officers in rural places who don’t have large gun violence problems). The confidence score is low from Makowski here though (4/10), so at least I think those two opinions are wrong but are for the most part are internally consistent with each other.

I’d note also as well, that although the question explicitly states FD is surveillance, I think that is a bit of a broad brush. FD is explicitly against this in some respects – Kennedy talks about in the meetings to tell group members the police don’t give a shit about minor infractions – they only care if a body drops. It is less surveillancy than things like CCTV or targeted gang takedowns for example (or maybe even HS). But it is right in the question, so a bit unfair to criticize someone for focusing on that.

Like I said if someone wants to be uber critical across the board you can’t really argue with that. My problem comes with Makowski’s opinion of VI:

VI is quite explicitly diverged from policing – it is a core part of the model. So when interrupters talk with current gang members, they can be assured the interrupters will not narc on them to police. The interrupters don’t work with the police at all. So all the stuff about complementary policing and procedural justice is just totally non-sequitur (and seems strange to say hot spots no, but boots on the ground are good).

So while Makowski is skeptical of HS/FD, he thinks some mechanism he just made up in his own mind (VI improving procedural justice for police) with no empirical evidence will reduce gun violence. This is the incoherent part. For those wondering, while I can think procedural justice is a good thing, thinking it will reduce crime has no empirical support (Nagin & Telep, 2020).

I’d note that while Makowski thinks police can’t competently implement FD, he makes no such qualms about other agencies implementing VI. I hate to be the bearer of bad news for folks, but VI programs quite often have issues as well. Baltimore’s program over the years have had well known cases of people selling drugs and still quite active in violence themselves. But I guess people are solely concerned about negative externalities from policing and just turn a blind eye to other non policing interventions.

Alright, so now onto Bob Apel. For a bit off topic – one of the books that got me interested in research/grad school was Levitt and Dubners Freakonomics. I had Robert Apel for research design class at SUNY Albany, and Bob’s class really formalized counterfactual logic that I encountered in that book for me. It was really what I would consider a transformative experience from student to researcher for me. That said, it is really hard for me to see a reasonable defense of Bob’s opinions here. We have a similar story we have seen before in the respondents for hot spots, there is high variance:

The specific to gun violence is potentially a red herring. The Braga meta analyses do breakdowns of effects on property vs violent crime, with violent typically having smaller but quite similar overall effect sizes (that includes more than just gun violence though). We do have studies specific to gun violence, Sherman et al. (1995) is actually one of the studies with the highest effects sizes in those meta analyses, but is of course one study. I disagree that the studies need to be specific to gun violence to be applicable, hot spots are likely to have effects on multiple crimes. But I think if you only count reduced shootings (and not violent crime as a whole), hot spots are tough, as even places with high numbers of shootings they are typically too small of N to justify a hot spot at a particular location. So again all by itself, I can see a reasonably skeptical person having this position, and Bob did give a low confidence score of 3.

And here we go for Bob’s opinion of FD:

Again, reasonably skeptical. I can buy that. Saying we need more evidence seems to me to be conflicting advice (maybe Bob saying it is worth trying to see if it works, just he disagrees it will work). The question does ask if violence will be reduced, not if it is worth trying. I think a neutral response would have been more consistent with what Bob said in the text field. But again if people want to be uber pessimistic I cannot argue so much against that in particular, and Bob also had a low confidence.

Again though we get to the opinion of VI:

And we see Bob does think VI will reduce violence, but not due to direct effects, but indirect effects of positive spillovers. Similar to Makowski these are mechanisms not empirically validated in any way – just made up. So we get critiques of sample selection for HS, and SUTVA for FD, but Bob agrees VI will reduce violence via agencies collecting rents from administering the program. Okey Dokey!

For the part about the interrupters being employed as a potential positive externality – again you can point to examples where the interrupters are still engaged in criminal activity. So a reasonably skeptical person may think VI could actually be worse in terms of such spillovers. Presumably a well run program would hire people who are basically no risk to engage in violence themselves, so banking on employing a dozen interrupters to reduce gun violence is silly, but OK. (It is a different program to give cash transfers to high risk people themselves.)

I’d note in a few of the cities I have worked/am familiar with, the Catholic orgs that have administered VI are not locality specific. So rents they extract from administering the program are not per se even funneled back into the specific community. But sure, maybe they do some other program that reduces gun violence in some other place. Kind of a nightmare for someone who is actually concerned about SUTVA. This also seems to me to be logic stemmed from Patrick Sharkey’s work on non-profits (Sharkey et al., 2017). If Bob was being equally of critical of that work as HS/FD, it is non-experimental and just one study. But I guess it is OK to ignore study weaknesses for non police interventions.

For both Bob and Makowski here I could concoct some sort of cost benefit analysis to justify these positions. If you think harms from policing are infinite, then sure VI makes sense and the others don’t. A more charitable way to put it would be Makowski and Bob have shown lexicographic preferences for non policing solutions over policing ones, no matter what the empirical evidence for those strategies. So be it – it isn’t opinions based on scientific evidence though, they are just word souping to justify their pre held positions on the topic.

What do I think?

God bless you if you are still reading this rant 4k words in. But I cannot end by just bagging on other peoples opinions without giving my own can I? If I were to answer this survey as is, I guess I would do HS/agree (confidence 6), FD/agree (confidence 5), VI/agree (confidence 3). Now if you changed the question to ‘you get even odds, how much money would you put on reduced violence if a random city with recent gun violence increases implemented this strategy’, I would put down $0.00 (the variance people talked about is real!) So maybe a more internally consistent position would be neutral across the board for these questions with a confidence of 0. I don’t know.

This isn’t the same as saying should a city invest in some of these policies. If you properly valuate all the issues with gun violence, I think each of these strategies are worth the attempt – none of them are guaranteed to work though (any big social problem is hard to fix)! In terms of hot spots and FD, I actually think these have a strong enough evidence base at this point to justify perpetual internal positions at PDs devoted to these functions. The same as police have special investigation units focused on drugs they could have officers devoted to implementing FD. Ditto for community police officers could be specifically devoted to COP/POP at hot spots of crime.

I also agree with the linked above editorial on VI – even given the problems with Safe Streets in Baltimore, it is still worth it to make the program better, not just toss it out.

Subsequently if the question were changed to, I am a mayor and have 500k burning a hole in my pocket, which one of these programs do I fund? Again I would highly encourage PDs to work with what they have already to implement HS, e.g. many predictive policing/hot spots interventions are nudge style just spend some extra time in this spot (e.g. Carter et al., 2021), and I already gave the example of how PDs invest already in different roles that would likely be better shifted to empirically vetted strategies. And FD is mostly labor costs as well (Burgdorf & Kilmer, 2015). So unlike what Makowski implies, these are not rocket science and necessitate no large capital investments – it is within the capabilities of police to competently execute these programs. So I think a totally reasonable response from that mayor is to tell the police to go suck on a lemon (you should do these things already), and fund VI. I think the question of right sizing police budgets and how police internally dole out responsibilities can be reasoned about separately.

Gosh some of my academic colleagues must wonder how I sleep at night, suggesting some policing can be effective and simultaneously think it is worth funding non police programs.

I have no particular opinion about who should run VI. VI is also quite cheap – I suspect admin/fringe costs are higher than the salaries for the interrupters. It is a dangerous thing we are asking these interrupters to do for not much money. Apel above presumes it should be a non-profit community org overseeing the interrupters – I see no issue if someone wanted to leverage current govt agencies to administer this (say the county dept of social services or public health). I actually think they should be proactive – Buffalo PD had a program where they did house visits to folks at high risk after a shooting. VI could do the same and be proactive and target those with the highest potential spillovers.

One of the things I am pretty frustrated with folks who are hyper critical of HS and FD is the potential for negative externalities. The NAS report on proactive policing lays out quite a few potential mechanisms via which negative externalities can occur (National Academies of Sciences, Engineering, and Medicine, 2018). It is evidence light however, and many studies which explicitly look for these negative externalities in conjunction with HS do not find them (Brantingham et al., 2018; Carter et al., 2021; Ratcliffe et al., 2015). I have published about how to weigh HS with relative contact with the CJ system (Wheeler, 2020). The folks in that big city now call it precision policing, and this is likely to greatly reduce absolute contact with the CJ system as well (Manski & Nagin, 2017).

People saying no hot spots because maybe bad things are intentionally conflating different types of policing interventions. Former widespread stop, question and frisk policies do not forever villify any type of proactive policing strategy. To reasonably justify any program you need to make assumptions that the program will be faithfully implemented. Hot spots won’t work if a PD just draws blobs on the map and does no coordinated strategy with that information. The same as VI won’t work if there is no oversight of interrupters.

For sure if you want to make the worst assumptions about police and the best assumptions about everyone else, you can say disagree with HS and agree with VI. Probably some of the opinions on that survey do the same in reverse – as I mention here I think the evidence for VI is plenty good enough to continue to invest and implement such programs. And all of these programs should monitor outcomes – both good and bad – at the onset. That is within the capability of crime analysis units and local govt to do this (Morgan et al., 2017).

I debated on closing the comments for this post. I will leave them open, but if any of the folks I critique here wish to respond I would prefer a more long formed response and I will publish it on my blog and/or link to your response. I don’t think the shorter comments are very productive, as you can see with my back and forth with Grits earlier produced no resolution.


Similarities between crime and health insurance data

One of the things I was mildly worried about when making the jump to the private sector was that the knowledge I had built up from my work in crime analysis over the years would not be transferable. I had basically 10+ year experience working with crime data (directly as a crime analyst at Troy, or when I was a research analyst at the Finn Institute, or when I was doing other collaborations with PDs).

PDs all basically have a similar records management set up. Typical tables are CAD, incident reports, arrests, charges, etc. PDs will have somewhat different fields – but the way they all related to each other are very similar.

Because the company I work for now aggregates health insurance claims from multiple insurance agencies it is a bit more complicated, but there are similarities between how people analyze health insurance claims that in broad strokes are similar to issues with crime data. Below are my musings on that front.

Classifying Events: UCR vs DRG

Historically the predominate way in which people classify what type of crime occurs in a particular incident is via the Uniform Crime Report (UCR) hierarchy. Imagine a crime incident in which someone breaks into a house (burglary), and then also assaults the individual within the home (aggravated assault). When we count these crimes for reporting purposes, we typically take ‘the top charge’, and analyze the event strictly as an assault.

Inpatient health insurance claims (when someone goes to a hospital) have a somewhat unifying classification, Diagnostic Related Groupings, DRG for short. Unlike UCR for general crime reporting though, these are used to bill insurance claims. The idea being that instead of itemizing your hospital bill, insurance companies broadly compensate according to the DRG. This purportedly discourages tacking on extra medical procedures, although brings with it some other problems instead (see the later section in this post on discretion).

Unlike the UCR, DRGs have quite a few more categories, check out the APR DRG weights for New York State for example. For the APR DRG, the DRG also includes a severity category. This I think would be a neat idea for crime incidents – it is somewhat codified in penal laws, but not so much in typical crime reporting. It is somewhat accomplished by folks creating harm weights for crimes (e.g. Ratcliffe, 2015). (There is also a second major DRG used by insurance agencies here in the states, the MS-DRG. That is not a good idea to take from medical records, having multiple common ways to group events!)

One major difference between crimes and health insurance claims are ICD codes. One insurance claim can have multiple ICD codes. For example a claim with an APR DRG of 161 could have ICD codes for:

  • I214: Heart Attack
  • E119: Diabetes
  • I2510: Heart Disease
  • E785: High Cholesterol

So there are a mix of chronic conditions (that for billing purposes can modify the severity of the claim), but are not directly related to the current claim/incident/hospital stay.

This could be a neat idea for crime records – say a domestic incident happens, and there is a field to record prior history of domestic incidents. I can see how that would be useful both in the immediate term for the officer handling the call, as well as for an analyst crunching numbers/trends. That being said, ICD codes are crazy in their specificity, so that is not a good thing.

You could also maybe do some other crunching to create your own crime categories based on the individual crime types, see for example Kuang et al. (2017). This is sort of like creating your own DRG for crimes.

Aggregate vs Individual

The point of creating high level groupings is to aggregate multiple events together. In policing, UCR statistics are commonly used to evaluate crime trends over time. Health insurance claims are typically not used for monitoring disease outcomes – since there isn’t any standardized location where they are all collated it would be pretty difficult to use them in that manner for the general pop.

But, overall aggregate statistics pooling claims from particular healthcare providers (e.g. hospitals) are sometimes used for different reimbursement policies. For examples, MIPS is intended as a metric for healthcare providers to promote value based care (Liao & Navathe, 2021), or the CaseMix system (Steinbusch et al., 2007). If you checked out the prior APR DRG list I linked to, you can see they had weights, and higher weights have higher standard billing. The idea behind CaseMix is that if a provider takes on many high weight cases, they get a modifier that ups the weights/billing by a certain percent.

You could maybe consider MIPS to be similar to agencies that give PDs scorecards, aggregating many different metrics together. I rather look at individual metrics though, such as this funnel chart example I give for monitoring use of force. I don’t see much point in aggregating different metrics all together into one final score.

Currently in policing many agencies are migrating from the UCR system, which is just an aggregate tally of events, to NIBRS, which is a database that reports individual events (Kaplan, 2021a, 2021b).


Police departments and health care providers (the ones creating the incidents/claims) both have discretion. For PDs, they often want to downgrade the severity of crime incidents, see Thomas and Wolff (2021) for example. Health providers have incentives going the other way though, they have incentives to upcode claims to increase insurance payouts (Farbmacher et al., 2020). Some claims are more fuzzy than others, for example CPT codes that determine a doctors time on a particular office visit are one good example – doctors can just claim they spent larger amounts of time on the office visit (Brunt, 2011).

Like I said previously, health insurance claims are not typically used to monitor overall health outcomes, so non-reporting is not something people really worry about (although researchers should be cognizant of non reporting if they are using insurance claims to look at say policy analysis). The dark figure of crime though is a perpetual threat to the validity of interpreting crime trends.

Health insurance claims have a somewhat opposite problem – submitting claims for when events actually did not happen. One example this occurs is ambulance ghost rides, ambulance billing for events that appear to not have occurred at all (Sanghavi et al., 2021).

Similar to crime events, these reporting/claim errors can either be the result of unintentional accidents, or they can be malicious. Often times, even in retrospect if you know something was in error, it can be difficult to impossible to tell the difference between the two scenarios.

The big difference is $$

The scale of healthcare insurance in the US is massive. Because of this, there is a market to audit these health insurance claims. For example, Georgia is likely to recover nearly half a billion in medical overpayments for the past year. Some of the work I am doing at HMS is related to using machine learning to identify these overpaid Medicare claims. My work is spread across multiple states, but I have easily identified over 8 digits of medical overpayments based on that work in the past year.

There is nothing equivalent to this for policing. There is no monetary incentive for individuals to audit how crime complaints are handled/recorded/resolved.

I wonder if there were a market how much criminal justice would look differently in the United States? For example, say if you had victimization insurance, and detectives worked for the insurance agencies instead of the public sector. This could maybe improve clearance rates, but of course would place more economic burdens on individuals to be insured. That is pure speculation though.


How to interpret one sided tests for coefficient differences?

In my ask me anything series, Rob Case writes in a question about interpreting one-sided tests for the difference in coefficients:

Mr. Wheeler,

Thank you for your page

I did your technique (at the end of the page) of re-running the model with X+Z and X-Z as independent variables (with coefficients B1 and B2, respectively).

I understand:

  1. (although you did not say so) that testing whether coefficient b1 (X’s coefficient in the original equation) is LESS THAN coefficient b2 (Z’s coefficient in the original regression) is a one-sided test; and testing whether one coefficient is DIFFERENT from another is a two-sided test
  2. that the 90%-confidence t-distribution-critical-values-with-infinite-degrees-of-freedom are 1.282 for one-sided tests and 1.645 for two-sided tests
  3. that if the resulting t-stat for the B2 coefficient is say 1.5, then—according to the tests—I should therefore be 90% confident that b1 is in fact less than b2; and I should NOT be 90% confident that b1 is different from b2.

But—according to MY understanding of logic and statistics—if I am 90% confident that b1 is LESS THAN b2, then I would be MORE THAN 90% confident that b1 DIFFERS from b2 (because “differs” includes the additional chance that b1 is greater than b2), i.e. the tests and my logic conflict. What am I doing wrong?


So I realize null hypothesis statistical testing (NHST) can be tricky to interpret – but the statement in 3 is not consistent with how we do NHST for several reasons.

So if we have a null hypothesis that Beta1 = Beta2, for reasons to do with the central limit theorem we actually rewrite this to be:

Null: Beta1 - Beta2 = 0 => Theta0

I’ve noted this new parameter we are testing – the difference in the two coefficients – as Theta0. For NHST we assume this parameter is 0, and then test to see how close our data is to this parameter. So we estimate with our data:

b1 - b2 = Diff
DiffZ = Diff/StandardError_Diff

Now, to calculate a p-value, we need to say how unlikely our data estimate, DiffZ, is given the assumed null distribution Theta0. So imagine we draw our standard normal distribution curve about Theta0. This then defines the space for NHST, for a typical two sided test we have (here assuming DiffZ is a negative value):

P(Z < DiffZ | Theta0 ) + P(Z > -DiffZ | Theta0 ) = Two tailed p-value

Where less than Z is our partitioning of the space of the null hypothesis, since an exact value for DiffZ here when the distribution of potential outcomes is continuous is zero. For a one sided test, you would just take the relevant portion of the above, and not add the two above portions together:

P(Z < DiffZ | Theta0 ) = One tail p-value for Beta1 < Beta2
P(Z > -DiffZ | Theta0 ) = One tail p-value for Beta1 > Beta2

Note here that the test is conditional on the null hypothesis. Statements such as ‘I should therefore be 90% confident that b1 is in fact less than b2’, which seem to estimate the complement of the p-value (e.g. 1 – p-value) and interpret it as a meaningful probability are incorrect.

P-values are basically numerical summaries of how close the data are to the presumed null distribution. Small p-values just indicate they are not close to the assumed null distribution. The complement of the p-value is not evidence for the alternative hypothesis. It is just the left over distribution for the null hypothesis that is inside the Z values.

Statisticians oftentimes at this point in the conversation suggest Bayesian analysis and instead interpret posteriori probabilities instead of p-values. I will stop here though, as I am not sure “90% confident” readily translates into a specific Bayesian statement. (It could be people are better off doing inferiority/equivalence testing for example, e.g. changing the null hypothesis.)

CCTV and clearance rates paper published

My paper with Yeondae Jung, The effect of public surveillance cameras on crime clearance rates, has recently been published in the Journal of Experimental Criminology. Here is a link to the journal version to download the PDF if you have access, and here is a link to an open read access version.

The paper examines the increase in case clearances (almost always arrests in this sample) for incidents that occurred nearby 329 public CCTV cameras installed and monitored by the Dallas PD from 2014-2017. Quite a bit of the criminological research on CCTV cameras has examined crime reductions after CCTV installations, which the outcome of that is a consistent small decrease in crimes. Cameras are often argued to help solve cases though, e.g. catch the guy in the act. So we examined that in the Dallas data.

We did find evidence that CCTV increases case clearances on average, here is the graph showing the estimated clearances before the cameras were installed (based on the distance between the crime location and the camera), and the line after. You can see the bump up for the post period, around 2% in this graph and tapering off to an estimate of no differences before 1000 feet.

When we break this down by different crimes though, we find that the increase in clearances is mostly limited to theft cases. Also we estimate counterfactual how many extra clearances the cameras were likely to cause. So based on our model, we can say something like, a case would have an estimated probability of clearance without a camera of 10%, but with a camera of 12%. We can then do that counterfactual for many of the events around cameras, e.g.:

Probability No Camera   Probability Camera   Difference
    0.10                      0.12             + 0.02
    0.05                      0.06             + 0.01
    0.04                      0.10             + 0.06

And in this example for the three events, we calculate the cameras increased the total expected number of clearances to be 0.02 + 0.01 + 0.06 = 0.09. This marginal benefit changes for crimes mostly depends on the distance to the camera, but can also change based on when the crime was reported and some other covariates.

We do this exercise for all thefts nearby cameras post installation (over 15,000 in the Dallas data), and then get this estimate of the cumulative number of extra theft clearances we attribute to CCTV:

So even with 329 cameras and over a year post data, we only estimate cameras resulted in fewer than 300 additional theft clearances. So there is unlikely any reasonable cost-benefit analysis that would suggest cameras are worthwhile for their benefit in clearing additional cases in Dallas.

For those without access to journals, we have the pre-print posted here. The analysis was not edited any from pre-print to published, just some front end and discussion sections were lightly edited over the drafts. Not sure why, but this pre-print is likely my most downloaded paper (over 4k downloads at this point) – even in the good journals when I publish a paper I typically do not get 1000 downloads.

To go on, complaint number 5631 about peer review – this took quite a while to publish because it was rejected on R&R from Justice Quarterly, and with me and Yeondae both having outside of academia jobs it took us a while to do revisions and resubmit. I am not sure the overall prevalence of rejects on R&R’s, I have quite a few of them though in my career (4 that I can remember). The dreaded send to new reviewers is pretty much guaranteed to result in a reject (pretty much asking to roll a Yahtzee to get it past so many people).

We then submitted to a lower journal, The American Journal of Criminal Justice, where we had reviewers who are not familiar with what counterfactuals are. (An irony of trying to go to a lower journal for an easier time, they tend to have much worse reviewers, so can sometimes be not easier at all.) I picked it up again a few months ago, and re-reading it thought it was too good to drop, and resubmitted to the Journal of Experimental Criminology, where the reviews were reasonable and quick, and Wesley Jennings made fast decisions as well.

Bias and Transparency

Erik Loomis over at the LGM blog writes:

It’s fascinating to be doing completely unfundable research in the modern university. It means you don’t matter to administration. At all. You are completely irrelevant. You add no value. This means almost all humanities people and a good number of social scientists, though by no means all. Because universities want those corporate dollars, you are encouraged to do whatever corporations want. Bring in that money. But why would we trust any research funded by corporate dollars? The profit motive makes the research inherently questionable. Like with the racism inherent in science and technology, all researchers bring their life experiences into their research. There is no “pure” research because there are no pure people. The questions we ask are influenced by our pasts and the world in which we grew up. The questions we ask are also influenced by the needs of the funder. And if the researcher goes ahead with findings that the funder doesn’t like, they are severely disciplined. That can be not winning the grants that keep you relevant at the university. Or if you actually work for the corporation, being fired.

And even when I was an unfunded researcher at university collaborating with police departments this mostly still applied. The part about the research being quashed was not an issue for me personally, but the types of questions asked are certainly influenced. A PD is unlikely to say ‘hey, lets examine some unintended consequences of my arrest policy’ – they are much more likely to say ‘hey, can you give me an argument to hire a few more guys?’. I do know of instances of others people work being limited from dissemination – the ones I am familiar with honestly it was stupid for the agencies to not let the researchers go ahead with the work, but I digress.

So we are all biased in some ways – we might as well admit it. What to do? One of my favorite passages in relation to our inherent bias is from Denis Wood’s introduction to his dissertation (see some more backstory via John Krygier). But here are some snippets from Wood’s introduction:

There is much rodomontade in the social sciences about being objective. Such talk is especially pretentious from the mouths of those whose minds have never been sullied by even the merest passing consideration of what it is that objectivity is supposed to be. There are those who believe it to consist in using the third person, in leaning heavily on the passive voice, in referring to people by numbers or letters, in reserving one’s opinion, in avoiding evaluative adjectives or adverbs, ad nauseum. These of course are so many red herrings.

So we cannot be objective, no point denying it. But a few paragraphs later from Wood:

Yet this is no opportunity for erecting the scientific tombstone. Not quite yet. There is a pragmatic, possible, human out: Bare yourself.

Admit your attitudes, beliefs, politics, morals, opinions, enthusiasms, loves, odiums, ethics, religion, class, nationality, parentage, income, address, friends, lovers, philosophies, language, education. Unburden yourself of your secrets. Admit your sins. Let the reader decide if he would buy a used car from you, much less believe your science. Of course, since you will never become completely self-aware, no more in the subjective case than in the objective, you cannot tell your reader all. He doesn’t need it all. He needs enough. He will know.

This dissertation makes no pretense at being objective, whatever that ever was. I tell you as much as I can. I tell you as many of my beliefs as you could want to know. This is my Introduction. I tell you about this project in value-loaded terms. You will not need to ferret these out. They will hit you over the head and sock you in the stomach. Such terms, such opinions run throughout the dissertation. Then I tell you the story of this project, sort of as if you were in my – and not somebody else’s – mind. This is Part II of the dissertation. You may believe me if you wish. You may doubt every word. But I’m not conning you. Aside from the value-loaded vocabulary – when I think I’ve done something wonderful, or stupid, I don’t mind giving myself a pat on the back, or a kick in the pants. Parts I and II are what sloppy users of the English language might call “objective.” I don’t know about that. They’re conscientious, honest, rigorous, fair, ethical, responsible – to the extent, of course, that I am these things, no farther.

I think I’m pretty terrific. I tell you so. But you’ll make up your mind about me anyway. But I’m not hiding from you in the the third person passive voice – as though my science materialized out of thin air and marvelous intentions. I did these things. You know me, I’m

Denis Wood

We will never be able to scrub ourselves clean to be entirely objective – a pure researcher as Loomis puts its. But we can be transparent about the work we do, and let readers decide for themselves whether the work we bring forth is sufficient to overcome those biases or not.

Academia and the culture of critiquing

Being out of academia for a bit now gives me some perspective on common behaviors I now know are not normal for other workplaces. Andrew Gelman and Jessica Hullman’s posts are what recently brought this topic to mind. Both what Jessica (and other behavior Andrew Gelman points out commonly on his blog) are near synonymous with my personal experience at multiple institutions. So even though we all span different areas in science it appears academic culture is quite similar across places and topical areas.

One in academia is senior academics shirking responsibility – deadwoods. This behavior I can readily attribute to rational behavior, so although I found it infuriating it was easily explainable. Hey, if you let me collect a paycheck into my 90’s I would likely be a deadwood at that point too! (Check out this Richard Larson post on why universities should encourage more professors to be semi-retired.)

Another behavior I had a harder time wrapping my head around was what I will refer to as the culture of critique. To the extent that we have a scientific method, a central component of that is to be critical of scientific results. If I read a news article that says X made crime go up/down, my immediate thought is ‘there needs to be more evidence to support that assertion’.

That level of skepticism is a necessary component of being an academic. We apply this skepticism not only to newspaper articles, but to each other as well. University professors don’t really have a supervisor like normal jobs, we each evaluate our peers research through various mechanisms (peer review journal articles, tenure review, reviewing grant proposals, critique public presentations, etc.).

This again is necessary for scientific advancement. We all make mistakes, and others should be able to rightly go and point out my mistakes and improve upon my work.

This bleeds out though in several ways that negatively impact academics ability to interact with one another. I don’t really have a well scoped out outline of these behaviors, but here are several examples I’ve noticed over time (in no particular order):

1) The person receiving critiques cannot distinguish between personal attacks and legitimate scientific ones. This has two parts, one is that even if you can distinguish between the two in your mind, they make you feel like shit either way. So it doesn’t really matter if someone gives a legitimate critique or someone makes ad hominem attacks – they each are draining on your self-esteem the same way.

The second part is people actually cannot tell the difference in some circumstances. In replication work on fish behavior pointing out potential data fabrication, some scientists response is that it is intentionally cruel to critique prior work. Original researchers often call people who do replications data thugs or shameless bullies, impugning the motives of those who do the critiques. For a criminology example check out Justin Pickett’s saga trying to get his own paper retracted.

To be fair to the receiver of critiques, in critiques it is not uncommon to have a mixture of legitimate and personal attacks, so it is reasonable to not know the difference sometimes. I detail on this blog on a series of back and forth on officer involved shooting research how several individuals from both sides again have their motivations impugned based on their research findings. So 2) the person sending critiques cannot distinguish between legitimate scientific critique and unsubstantiated personal attacks.

One of the things that is pretty clear to me – we can pretty much never have solid proof into the motives or minds of people. We can only point out either logical flaws in work, or in the more severe case of forensic numerical work can point out inconsistencies that are at best gross negligence (and at worse intentional malfeasance). It is also OK to point out potential conflicts of interest of course, but relying on that as a major point of scientific critique is often pretty weak sauce. So while I cannot define a bright line between legitimate and illegitimate critique, I don’t think in practice the line is all that fuzzy.

But because critiquing is a major component of many things we do, we have 3) piling on every little critique we can think of. I’ve written about how many reviewers have excessive complaints about minutia in peer reviews, in particular people commonly critique clearly arbitrary aspects of writing style. I think this is partly a function of even if people really don’t have substantive things to say, they go down the daisy chain and create critiques out of something. Nothing is perfect, so everything can be critiqued in some way, but clearly what citations you included are rarely a fundamental aspect of your work. But that part of your work is often the major component of how you are evaluated, at least in terms of peer reviewed journal articles.

This I will admit is a harder problem though – personal vs legitimate critiques I don’t think is that hard to tell the difference – but what counts as a deal breaker vs acceptable problem with some work is a harder distinction to make. This results in someone being able to always justify rejecting some work on some grounds, because we do not have clear criteria for what is ‘good enough’ to publish, ‘justified enough’ to get a grant, ‘excellent enough’ to get an award, etc.

4) The scarlet mark. Academics have a difficult time separating out critiques on one piece of research vs a persons work as a whole. This admittedly I have the weakest evidence of widespread examples across fields (only personal anecdotes really, the original Gelman/Hullman posts point out some similar churlish behavior though, such as asking others to disassociate themselves), but it was common in my circle of senior policing scholars to critique other younger policing scholars out of hand. It happened to me as well, senior academics saying directly to me based on the work I do I shouldn’t count as a policing scholar.

Another common example I came across was opinions of the Piquero’s and their work. It would be one thing to critique individual papers, often times people dismissed their work offhand because they are prolific publishers.

This is likely also related to network effects. If you are in the right network, individuals will support you and defend your work (perhaps without regards to the content). Whereas if you are in an outside network folks will critique you. Because it is fair game to critique everything, and there are regular norms in peer review to critique things that are utterly arbitrary, you can sink a paper for what appears to be objective reasons but is really you just piling on superficial critiques. So of course if you have already decided you do not like someone’s work, you can pile on whatever critiques you want with impunity.

The final behavior I will point out, 5) never back down or admit faults. For a criminal justice example, I will point out an original article in JQC and critique in JQC about interaction effects. So the critique by Alex Reinhart was utterly banal, it was that if you estimate a regression model:

y = B1*[ log(x1*x2*x3) ]

This does not test an interaction effect, quite the opposite, it forces the effects to be equal across the three variables:

y = B1*log(x1) + B1*log(x2) + B1*log(x3)

Considering a major hypothesis for the paper was testing interaction effects, it was kind of a big deal for interpretations in the paper. So the response by the original authors should have been ‘Thank you Alex for pointing out our error, here are the models when correcting for this mistake’, but instead we get several pages of of non sequiturs that attempt to justify the original approach (the authors confuse formative and reflective measurement models, and the distribution of your independent variables doesn’t matter in regression).

To be fair this never admit you are wrong behavior appears to be for everyone, not just academics. Andrew Gelman on his blog often points to journalists refusing to correct mistakes as well.

The irony of never back down is that since critique is a central part of academia, you would think it would also be normative to say ‘ok I made a mistake’ and/or ‘OK I will fix my mistake you pointed out’. Self correcting is surely a major goal of critiques and I mean we all make mistakes. But for some reason admitting fault is not normative. Maybe because we are so used to defending our work through a bunch of nonsense (#2) we also defend it even when it is not defensible. Or maybe because we evaluate people as a whole and not individual pieces of work (#4) we need to never back down, because you will carry around a scarlet mark of one bad piece forever. Or because we ourselves cannot distinguish between legitimate/illegitimate (#1), people never back down. I don’t know.

So I am sure a sociologist who does this sort of analysis for a living could make sense of why these behaviors exist than me. I am simply pointing out regular, repeated interactions I had that make life in academia very mentally difficult.

But again I think these are maybe intrinsic to the idea that skepticism and critiquing are central to academia itself. So I don’t really have any good thoughts on how to change these manifest negative behaviors.

Some ACS download helpers and Research Software Papers

The blog has been a bit sparse recently, as moving has been kicking my butt (hanging up curtains and recycling 100 boxes today!). So just a few quick notes.

Downloading ACS Data

First, I have posted some helper functions to work with American Community Survey data (ACS) in python. For a quick overview, if you import/define those functions, here is a quick example of downloading the 2019 Texas micro level files (for census tracts and block groups) from the census FTP site. Can pipe in another year (if available) and and whatever state into the function.

# Python code to download American Community Survey data
base = r'??????' #put your path here where you want to download data
temp = os.path.join(base,'2019_5yr_Summary_FileTemplates')
data = os.path.join(base,'tables')


Some locations have census tract data to download, I think the FTP site is the only place to download block group data though. And then based on those files you downloaded, you can then grab the variables you want, and here I show selecting out the block groups from those fields:

interest = ['B03001_001','B02001_005','B07001_017','B99072_001','B99072_007',
labs, comp_tabs = merge_tabs(interest,temp,data)
bg = comp_tabs['NAME'].str.find('Block Group') == 0

Then based on that data, I have an additional helper function to calculate proportions given two lists of the numerators and denominators that you want:

top = ['B17010_002',['B11003_016','B11003_013'],'B08141_002']
bot = ['B17010_001',        'B11002_001'       ,'B08141_001']
nam = ['PovertyFamily','SingleHeadwithKids','NoCarWorkers']
prep_sdh = prop_prep(bg, top, bot, nam)

So here to do Single Headed Households with kids, you need to add in two fields for the numerator ['B11003_016','B11003_013']. I actually initially did this example with census tract data, so not sure if all of these fields are available at the block group level.

I have been doing some work on demographics looking at the social determinants of health (see SVI data download, definitions), hence the work with census data. I have posted my prior example fields I use from the census, but criminologists may just use the social-vulnerability-index from the CDC – it is essentially the same as how people typically define social disorganization.

Peer Review for Criminology Software

Second, jumping the gun a bit on this, but in the works is an overlay journal for CrimRxiv. Part of the contributions we will accept are software contributions, e.g. if you write an R package to do some type of analysis function common in criminology.

It is still in the works, but we have some details up currently and a template for submission (I need to work on a markdown template, currently just a word doc). High level I wanted something like the Journal of Statistical Software or the Journal of Open Source Software (I do not think the level of detail of JSS is necessary, but wanted an example use case, which JoSS does not have).

Just get in touch if you have questions whether your work is on topic. Aim is to be more open to contributions at first. Really excited about this, as publicly sharing code is currently a thankless prospect. Having a peer reviewed venue for such code contributions for criminologists fills a very important role that traditional journals do not.

Future Posts?

Hopefully can steal some time to continue writing posts here and there, but will definitely be busy getting the house in order in the next month. Hoping to do some work on mapping grids and KDE in python/geopandas, and writing about the relationship between healthcare data and police incident report data are two topics I hope to get some time to work on in the near future for the blog.

If folks have requests for particular topics on the blog though feel free to let me know in the comments or via email!