The sausage making behind peer review

Even though I am not on Twitter, I still lurk every now and then. In particular, I can see web traffic referrals to the blog, so when I get new traffic I will go and use nitter to look up the source.

Recently my post about why I publish preprints was referenced in a thread. That blog post was from the perspective of why I think individual scholars should post preprints. The thread it was tagged in was not arguing from the perspective of an individual writer – it was saying the whole idea of preprints is “a BIG problem” (Twitter thread, Nitter Thread).

That is, Dan thinks it is a problem that other people post preprints before they have been peer reviewed.

Dan’s point is one held by multiple scholars in the field (I had similar interactions with Travis Pratt back when I was on Twitter). Dan does not explicitly say it in that thread, but I take this as a pretty strong indication Dan thinks posting preprints without peer review is unethical (Dan thinks postprints are OK). In the prior conversations I had with Pratt on Twitter, he explicitly said it was unethical.

The logic goes like this – you can make errors, so you should wait until colleagues have peer reviewed your work to make sure it is “OK” to publish. Otherwise, it is misleading to readers of the work. In particular people often mention the media uncritically reporting preprint articles.

There are several reasons I think this opinion is misguided.

One, the peer review system itself is quite fallible. Having received, delivered, and read hundreds of peer review reports, I can confidently say that the entire peer review system is horribly unreliable. It has both a false negative and a false positive problem – in that things that should be published get rejected, and things that should not be published get through. Both happen all the time.

Now, it may be the case that the average preprint is lower quality than the average peer reviewed journal article (given the selection of who posts preprints, I am actually not sure this is the case!). In the end though, you need to read the article and judge it for yourself – you cannot just assume an article is valid simply because it passed peer review. Nor can you assume the opposite – that something not peer reviewed is invalid.

Two, the current peer review system is vast. To dramatically oversimplify, there are “low quality” journals (pay-to-publish journals, some humanities journals, whatever journals publish the “a square of chocolate and a glass of red wine a day increases your life expectancy” garbage), and “high quality” journals. The people Dan wants to protect from preprints are exactly the people who are unlikely to know the difference.

I use scare quotes around low and high quality in that paragraph on purpose, because those superficial labels are not really fair. BMC probably publishes plenty of high quality articles; it just happened to also publish a paper that used a ridiculous methodology that dramatically overestimated vaccine adverse effects (where the peer reviewers just phoned in superficial reviews). At the same time, high quality journals publish junk all the time (see Crim, Psych, Econ, Medical examples).

Part of the issue is that the peer review system is a black box. From a journalist’s perspective, you don’t know which papers had reviewers phone it in (or had their buddies give a thumbs up) versus which had rigorous reviews. The only way to know is to judge the paper yourself (even having the reviews is not very informative relative to just reading the paper directly).

To me the answer is not “journalists should only report on peer reviewed papers” (or equivalently, that no academic should post preprints without peer review) – all consumers need to read the work for themselves to understand its quality. Suggesting that something peer reviewed is intrinsically higher quality is bad advice. Even if this is true on average (relative to non-peer reviewed work), any particular paper you pick up may be junk. From the consumer’s perspective, there is no difference between evaluating the quality of a preprint and a peer reviewed article.

The final point I want to make, three, is that people publish things that are not peer reviewed all the time. This blog is not peer reviewed. I would actually argue the content I post here is often higher quality than many journal articles in criminology (due to the transparent, reproducible code I often share). But you don’t need to take my word for it – you can read the posts and judge that for yourself. Ditto for many other popular blogs. I find it pretty absurd for someone to think me publishing a blog is unethical – ditto for preprints.

There is no point in arguing with people’s personal opinions about what is ethical and what is not. But thinking you are protecting the public by only allowing peer reviewed articles to be reported on is incredibly naive as well as paternalistic.

We would be better off, not worse, if more academics posted preprints, peer review be damned.

This one simple change will dramatically improve reproducibility in journals

So Eric Stewart is back in the news, and it appears a new investigation has prompted him to resign from Florida State. For background on the story, I suggest reading Justin Pickett’s EconWatch article. In short, Justin did an analysis of papers he co-authored with Stewart to show what is likely data fabrication. Various involved parties had superficial responses at first, but after some prodding many of Stewart’s papers were subsequently retracted.

So there is quite a bit of human messiness in the responses to accusations of error/fraud, but I just want to focus on one thing. In many of these instances, the flow goes something like:

  1. individual points out clear numeric flaws in a paper
  2. original author says “I need time to investigate”
  3. multiple months later, original author has still not responded
  4. parties move on (no resolution) OR conflict (people push for retraction)

My solution here is a step that mostly fixes the time lag in steps 2/3. Authors who submit quantitative results should be required to submit statistical software log files along with their article to the journal from the start.

So there is a push in the social sciences to submit fully reproducible results, where an outside party can replicate 100% of the analysis. This is difficult – I work full time as a software engineer – it requires coding skills most scientists don’t have, as well as outside parties devoting resources to the validation. (Offhand, if you hired me to do this, I am guessing I would probably charge something like $5k to $10k given the scope of most journal articles in the social sciences.)

An additional problem with this in criminology research is that we are often working with sensitive data that cannot easily be shared.

I agree a fully 100% reproducible analysis would be great – let’s not make the perfect the enemy of the good though. What I am suggesting is that authors should directly submit the log files they used to produce tables/regression results.

Many authors currently run code interactively in Stata/R/SPSS/whatever, and copy-paste the results into tables. So in response to 1) above (the finding of a data error), many parties assume it is a data transcription error, and allow the original authors leeway to go and “investigate”. If journals have the log files, it is trivial to see whether a data error is a transcription error, and one can then move into a more thorough forensic investigation stage if the logs don’t immediately resolve any discrepancies.


If you are saying “Andy, I don’t know how to save a log file from my statistical analysis”, here is how below. It is a very simple thing – a single action or line of code.

This is under the assumption people are doing interactive style analysis. (It is trivial to save a log file if you have created a script that is 100% reproducible; e.g. in R it would then just be something like Rscript Analysis.R > logfile.txt.) So my advice here is about saving a log file when doing interactive, partly code/partly GUI type work.

In Stata, at the beginning of your session use the command:

log using "logfile.txt", text replace

In R, at the beginning of your session:

sink("logfile.txt")
...your code here...
# then before you exit the R session
sink()

In SPSS, at the end of your session:

OUTPUT EXPORT /PDF DOCUMENTFILE="local_path\logfile.pdf".

Or you can go to the output file and use the GUI to export the results.

In python, if you are doing an interactive REPL session, you can do something like:

python > logfile.txt
...inside REPL here...

Or if you are using Jupyter notebooks, you can just save the notebook as an HTML file.
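
Or, if you use the IPython shell, I believe the %logstart magic accomplishes the same thing (the -o flag logs the output as well as the commands):

%logstart -o logfile.txt
...your interactive session...
%logstop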

If interested in learning how to code in more detail for regression analysis, I have PhD course notes on R/SPSS/Stata.


This solution is additional work from the authors’ perspective, but a very tiny amount. I am not asking for 100% reproducible code front to back, I just want a log file that shows the tables. These log files will not show sensitive data (just summaries), so they can be shared.

This solution is not perfect. These log files can be edited. Requiring these files will also not prevent someone from doctoring data outside of the program and then running real analysis on faked data.

It ups the level of effort for faking results by a large amount though, compared to the current status quo. Currently it just requires authors to doctor results in one location; this at a minimum requires two locations (and keeping the two sources consistent is additional work). Often the outputs themselves have additional statistical summaries, so it will be clearer if someone doctored the results than it would be from a simpler table in a peer reviewed article.

This does not 100% solve the reproducibility crisis in social sciences. It does however solve the problem of “I identified errors in your work” and “Well I need 15 months to go and check my work”. Initial checks for transcription vs more serious errors with the log files can be done by the journal or any reasonable outsider in at most a few hours of work.

Some peer review ideas

I recently did two more reviews for CrimeSolutions. I actually have two other reviews due, but I jumped CrimeSolutions up in my queue. This of course likely says nothing about anyone but myself and my priorities, but I think I can attribute this behavior to two things:

  1. CrimeSolutions pays me to do a review (not much, $250 – IMO I should get double this, but DSG said it was pre-negotiated with NIJ).
  2. CrimeSolutions has a pre-set template. I just have to fill in the blanks, and write a few sentences to point to the article to support my score for that item.

Number 2 in particular was a determinant in me doing the 2nd review CrimeSolutions forwarded to me in very short order. After doing the 1st, I had the template items fresh in my mind, and knew I could do the second with less mental overhead.

I think these can, on the margins, improve some of the current issues with peer reviews. #1 will encourage more people to do reviews, and #2 will improve the reliability of peer reviews (as well as make it easier on reviewers by limiting the scope). (CrimeSolutions has the reviewers hash it out if we disagree about something, but that has only happened to me once so far, because the template to fill in is laid out quite nicely.)

Another problem with peer reviews is not just getting people to agree to review, but also getting them to do the review in a timely manner. For this, I suggest a time graded pay scale – if you do the review faster, you get paid more. Here are some potential curves if you set the pay scale to either drop linearly with the number of days, or with a logarithmic drop off:

So here, using the linear scale with a base rate of $300, if you do the review in two weeks you would make around $170, but if you take the full 30 days you make $10. I imagine people may not like the clock running so fast, so I also devised a logarithmic pay scale that doesn’t ding you so much for taking a week or two, but after that penalizes you quite heavily. So at two weeks it is just under $250.
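
To illustrate, here is a quick python sketch of the two pay scales. The linear curve lands within rounding of the numbers above; the exact logarithmic form in my graph is not spelled out here, so the one below is just a curve that matches the two week figure:

import numpy as np

base, floor, deadline = 300.0, 10.0, 30  # $300 base rate, $10 at the 30 day deadline

def linear_pay(days):
    # straight line from the base rate down to the floor at the deadline
    return base - (base - floor) * (days / deadline)

def log_pay(days):
    # gentle penalty for the first week or two, heavy penalty near the deadline
    return floor + (base - floor) * np.log1p(deadline - days) / np.log1p(deadline)

for d in (7, 14, 30):
    print(d, round(linear_pay(d)), round(log_pay(d)))  # 14 days: ~165 linear, ~249 log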

I realize pay is unlikely to happen (although it is not crazy unreasonable – publishers extract quite a bit of rent from university libraries via subscriptions). But standardized forms are something journals could do right now.

Paper retraction and exemplary behavior in Crim

Criminology researchers had a bad look going for them in the Stewart/Pickett debacle. But a recent exchange shows behavior we would all be better off emulating: a critique of a meta-analysis (by Kim Rossmo) and a voluntary retraction (by Wim Bernasco).

Exemplary behavior by both sides in this exchange. I am sure it is irksome if you are on the receiving end, but Kim has pursued response/critique pieces over his career. And you can see in the Retraction Watch piece this is not easy work (basically as much work as writing an original meta-analysis). This is important – if science is to be self correcting, we need people to spend the time to make sure prior work was done correctly.

And from Wim’s side it shows much more humility than the average academic – it is totally OK to admit one’s faults/mistakes and move on. I have no doubt if Kim (or whomever) did a deep dive into my prior papers, he would find some mistakes, and maybe some would be worth a retraction. It is OK – Wim will not be made to wear a dunce hat at the next ASC or anything like that. Criminology would be better off if we were all more like Kim and more like Wim.

One thing though – I agree with Andrew Gelman that it is OK to do a blog post if you find errors, before going to the author directly. Most academics don’t respond to critiques at all (or make superficial excuses). So if you find an error in my work, go ahead and blog it or write to the editor or whatever. I am guessing it worked out here because I imagine Kim and Wim have crossed paths before, and Wim actually answers his emails.

Note I think not responding is OK even. For example, Data Colada recently made a dig at an author for not responding to a critique (see the author feedback at the bottom). If you critique my work I don’t think I am obligated to respond. I will respond if I think it is worth my time – papers are not a contract to defend until death.


A second part I wanted to blog about is reviewing papers. You can see in my comment on Gelman’s blog, Kaiser Fung asks “What happened during the peer review process? They didn’t find any problems?”. And you can see in the original Retraction Watch piece, I think Kim did his due diligence in the original review. It was only after it was published, and he more seriously pursued a replication analysis (which is beyond what is typically expected in peer review), that he found inconsistencies that clearly invalidated the meta-analysis.

It is hard, when reviewing papers, to find really widespread problems with an empirical analysis. Personally I do small checks – think of these as audits – that are not exhaustive, but I often do find errors. For meta-analyses, one thing I have done is pull out 1/2/3 studies and see if I can replicate the point effects the authors report. One thing I realized in doing this, for example, is that the Braga meta-analysis of hot spots uses the largest point effect for some tables, which I think is probably a mistake – they should just pool all of the effects reported (although the variants I have reviewed have calculated them correctly).
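
As a sketch of what one of those point-effect audits looks like in practice (the counts here are hypothetical, and I am assuming a log odds ratio effect size – swap in whatever metric the meta-analysis uses):

import math

# made-up 2x2 counts pulled from an original study's table
a, b = 40, 160   # treatment group: events, non-events
c, d = 25, 175   # control group: events, non-events

log_or = math.log((a * d) / (b * c))     # point effect
se = math.sqrt(1/a + 1/b + 1/c + 1/d)    # its standard error
print(f"log OR {log_or:.3f}, SE {se:.3f}")
# compare these to the effect and weight reported in the meta-analysis table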

Besides this, for meta-analyses I do not have much advice. I have at times noted missing papers, but that was because I was familiar with them, not because I replicated the authors’ search strategy. And I have advocated sharing data and code in reviews (which should clearly be done in meta-analyses), but pretty much no one does this.

For non meta-analyses, one thing I do is try to replicate inline statistics (often things like F-tests or Chi-square tests). Looking at regression coefficients it may be simpler to spot a misprint, but I don’t have the Chi-square distribution committed to memory. I can’t remember a time I was actually able to replicate one of these – I reviewed a paper one time with almost 100 inline stats like this and I couldn’t figure out a single one! It is actually somewhat common in crim articles for regression tables to only print the point effects and p-values, which is more difficult to check for inconsistencies without the standard errors. (You should IMO always publish standard errors, to allow readers to do their own tests by eye.)
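
For a concrete sketch of those checks (made-up numbers, and assuming a two-sided test based on the normal distribution):

from scipy.stats import norm, chi2_contingency

# given a reported coefficient and p-value, back out the implied standard error
b, p = 0.25, 0.03
z = norm.ppf(1 - p / 2)            # two-sided z-score implied by the p-value
print(f"implied SE {b / z:.3f}")   # compare to the reported SE, if given

# or recompute an inline chi-square test from a table of counts in the paper
table = [[30, 70], [45, 55]]
stat, pval, dof, expected = chi2_contingency(table)
print(f"chi2 {stat:.2f}, p {pval:.3f}, dof {dof}")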

Even if one did provide code/data, I don’t think I would spend the time to replicate the tables as a reviewer – it is just too much work. I think journals should hire data/fact checkers to do this (an actual argument for paid journals to add real value). I only spend around 3-8 hours per review I think – this is not enough time for me to dig into code, putz with it to run on my local machine, and cross reference the results. That would be more like 2~4 days of work in many cases I think. (And that is just using the original data; verifying the original data collection in a meta-analysis would be even more work.)

CCTV and clearance rates paper published

My paper with Yeondae Jung, The effect of public surveillance cameras on crime clearance rates, has recently been published in the Journal of Experimental Criminology. Here is a link to the journal version to download the PDF if you have access, and here is a link to a read-only open access version.

The paper examines the increase in case clearances (almost always arrests in this sample) for incidents that occurred near 329 public CCTV cameras installed and monitored by the Dallas PD from 2014-2017. Quite a bit of the criminological research on CCTV has examined crime reductions after camera installations, and the outcome of that work is a consistent small decrease in crime. Cameras are often argued to help solve cases though, e.g. catch the guy in the act. So we examined that in the Dallas data.

We did find evidence that CCTV increases case clearances on average. Here is the graph showing the estimated clearances before the cameras were installed (based on the distance between the crime location and the camera), and the line after. You can see the bump up for the post period, around 2% in this graph, tapering off to an estimate of no difference beyond 1,000 feet.

When we break this down by different crimes though, we find that the increase in clearances is mostly limited to theft cases. We also estimate, counterfactually, how many extra clearances the cameras were likely to cause. So based on our model we can say something like: a case would have an estimated probability of clearance of 10% without a camera, but 12% with a camera. We can then do that counterfactual for many of the events around cameras, e.g.:

Probability No Camera   Probability Camera   Difference
    0.10                      0.12             + 0.02
    0.05                      0.06             + 0.01
    0.04                      0.10             + 0.06

And in this example, for the three events, we calculate that the cameras increased the total expected number of clearances by 0.02 + 0.01 + 0.06 = 0.09. This marginal benefit mostly depends on the distance between the crime and the camera, but can also change based on when the crime was reported and some other covariates.
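
The bookkeeping for that cumulative estimate is nothing more than summing the per-incident differences (a minimal sketch, not our actual model code):

# predicted clearance probabilities for the three example events above
p_no_camera = [0.10, 0.05, 0.04]
p_camera    = [0.12, 0.06, 0.10]

extra = sum(c - n for c, n in zip(p_camera, p_no_camera))
print(f"expected extra clearances: {extra:.2f}")  # 0.09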

We do this exercise for all thefts nearby cameras post installation (over 15,000 in the Dallas data), and then get this estimate of the cumulative number of extra theft clearances we attribute to CCTV:

So even with 329 cameras and over a year of post-installation data, we only estimate cameras resulted in fewer than 300 additional theft clearances. So there is unlikely to be any reasonable cost-benefit analysis that would suggest the cameras are worthwhile for their benefit in clearing additional cases in Dallas.

For those without access to journals, we have the pre-print posted here. The analysis was not edited at all from pre-print to published version; just the front end and discussion sections were lightly edited over the drafts. Not sure why, but this pre-print is likely my most downloaded paper (over 4k downloads at this point) – even in the good journals, when I publish a paper I typically do not get 1,000 downloads.

To go on, complaint number 5631 about peer review – this took quite a while to publish because it was rejected on R&R from Justice Quarterly, and with me and Yeondae both having outside-of-academia jobs, it took us a while to revise and resubmit. I am not sure of the overall prevalence of rejects on R&Rs, but I have had quite a few of them in my career (4 that I can remember). The dreaded send-to-new-reviewers is pretty much guaranteed to result in a reject (pretty much asking to roll a Yahtzee to get it past so many people).

We then submitted to a lower journal, the American Journal of Criminal Justice, where we had reviewers who were not familiar with what counterfactuals are. (An irony of trying to go to a lower journal for an easier time – they tend to have much worse reviewers, so it can sometimes be no easier at all.) I picked it up again a few months ago, and on re-reading thought it was too good to drop, so resubmitted to the Journal of Experimental Criminology, where the reviews were reasonable and quick, and Wesley Jennings made fast decisions as well.

Reproducible research and code review for journals

Recently I came across two different groups broaching the subject of code reviews, and reproducible research more broadly, for criminal justice. There are certainly aspects of either that make them difficult in the context of peer review. But I am not one to let the perfect be the enemy of the good, so I will lay out the difficulties and give some comments on potential good-enough solutions that still make marked improvements on the current state of affairs in crim/cj research.

Reproducible Research

So what do I mean by reproducible research? Jeromy Anglim on CrossValidated has a good breakdown of the different ways we may apply the term. To some it may mean: if you did a hot spots policing experiment, can I replicate the same crime reduction results in another city?

These are important to publish (simply because social science experiments will inevitably have quite a bit of variance), but this is often not what we are talking about when we talk about replication. We are often talking about a much smaller in scope goal – if I give you the exact same data, can you reproduce the tables/figures in the manuscript you used to make your inferences?

One problem that is often the case with CJ research is that we are working with sensitive data. If I do analysis on a survey of a sensitive topic, I often cannot share the data. But I do not believe that should entirely kill the question of reproducible research. I have broken down different levels that are possible in making research more reproducible:

  A. Sharing data and code files to reproduce the paper results
  B. Sharing code files and simulated data that illustrate the results
  C. Sharing plain-text log files showing the code and results of tables/figures

So I have not seen C proposed anywhere, but it is a dead simple solution that almost everyone should be able to accommodate. It simply involves typing log using "output.txt", text at the top of your Stata file, or OUTPUT EXPORT /PDF DOCUMENTFILE="output.pdf" at the end of your SPSS analysis (or could be done via the GUI), etc. These are the log/output files used to generate the results you report in the paper, and they typically contain both the commands run and the resulting tables. These files can quite easily avoid containing privileged information (in fact they won’t by default most of the time, unless for example you printed out individual names in a table in intermediate results).

To accomplish C does take some modicum of wherewithal in terms of writing code, but it is a pretty low bar. So I see no reason why all quantitative analyses cannot require at least this step right now. I realize it is not foolproof – a bad actor could go and edit the results (same as they could edit the results without this information). But it ups the level of effort to manipulate results by quite a bit, and more importantly has the potential to catch more mundane transcription errors that occur quite frequently.

Sometimes I want more details on the code used, the nature of the data, etc. (Most quasi-experimental designs for example can be summed up as: shape your data in a special way and run a particular regression model.) For people like me who care about that, B helps, in that I can see the code front-to-back, actually inspect the shape and values of a particular rectangular dataset, and see how the code interacts with those objects. The only full-on example of this I am aware of is a recent paper in Nature Behavior that shares its code using simulated data.

B is also very similar to when people release statistical packages to reproduce their work. So if you release an R package that conducts your new fancy technique, even if you can’t share your data, it is really good for people to be able to view the underlying code by itself, both to understand the technique better and to build on your work. If you do a new technique, it is a crazy ton of work for someone else to replicate it on their own, so most people will not bother.

A is most of the way to the gold standard – sharing both the data and the code used to reproduce the analysis. Both A and B take a significant amount of knowledge of statistical programming to accomplish. Most people in our field do not have the skills to write an analysis front-to-back that can run as a series of scripts. To get to A/B, grad programs in crim/cj need to spend way more time teaching these skills, which is near zero now almost across the board.

One brief thing to mention about A is that the boundary is difficult to define. For example, I share code to reproduce the analysis in my 311 and crime at micro places in DC paper (paper link, code). But that starts from a dataset that has the street units in DC and all of the covariates already compiled. Where did that dataset come from? I created it by compiling many different sources, so the base dataset is itself very difficult to replicate. Again, not letting the perfect be the enemy of the good, I think starting from your compiled dataset and replicating the tables/graphs in the manuscript is better than letting the fuzzy boundary prevent you from sharing anything.

Code Reviews for Journal Submissions

The hardest part of A is that even after you share your data, some journals want to be able to run the code locally to entirely reproduce your results. So while I have shared data and code (A above) for many papers (see this spreadsheet), they have not been externally vetted by any of those journals. This vetting is the standard in some economics journals now I believe, and I would not be surprised if it is in some poli-sci journals as well. It is a very hard problem though, and requires significant resources from both the journal and the researcher.

The biggest hurdle is that even if you share your data/code, your particular system may be idiosyncratic. You may have different R libraries installed than me. You may have different versions of python packages. I may have used a program on Windows to do some analysis you cannot do on a Mac. You may rely on some paid API I cannot access.
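
One small mitigation for those version issues (not a full solution) is to record the environment details in the shared log file itself, so at least the versions are documented – the equivalent of sessionInfo() in R. A python sketch:

import sys, platform
from importlib.metadata import version, PackageNotFoundError

# print the environment at the top of the analysis log
print("python", sys.version)
print("os", platform.platform())
for pkg in ("numpy", "pandas", "statsmodels"):   # whatever your analysis uses
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")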

Such problems are often solvable, but they take quite a bit of time to work out. A comparable example from my work is when data scientists say ‘going to production’. This often involves taking some analysis I did on my local machine and making it run autonomously on my company’s servers. There are some things that make it more or less difficult than the typical academic situation, but I think it is broadly comparable. Going to production for a project will typically take me 3-6 months at 50% of my time, so maybe something like 300 hours for a lowish end estimate. And that is just the time from the researcher’s end; from the journal’s end it will also take a significant amount of time to compile everyone’s code and verify the results.

Because of this, I don’t think the fully reproducible, re-run-my-code-and-generate-the-exact-same-tables standard is feasible in the current way we do academic research and peer review. But again, that is why I list C above – we shouldn’t let the perfect be the enemy of the good.

Validating New Empirical Techniques

The code review above is not really code review in the sense that someone looks at your code and says it is correct; it is simply asking can I get the same results as you. You may want peer review to accomplish the task of not only saying is it reproducible, but is it valid/correct? There are a few things towards this end I would like to see more often in crim/cj. I realize we are not statistics, so cannot often ask for formal proofs. But there are simpler things we can do to verify results. These are the responsibility of the researcher to provide, not for the reviewer to script up on their own to validate someone else’s work.

One, illustrate the technique using a very simplified example. So for instance, in my p-median patrol areas paper, I show an example of constructing the linear program with only four areas. You should be able to calculate what the result should be by hand, and so can verify the correctness of your algorithm. This has the added benefit of being a very good pedagogical way to describe your method.
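
To give a flavor of what such a toy example looks like, here is a generic p-median setup with four areas in python using the pulp library (this is not the exact formulation from the paper, and the distances/weights are made up):

from pulp import LpProblem, LpMinimize, LpVariable, lpSum, value

areas = range(4)
dist = [[0, 2, 6, 9],   # made-up pairwise distances between the four areas
        [2, 0, 5, 7],
        [6, 5, 0, 3],
        [9, 7, 3, 0]]
demand = [10, 5, 8, 4]  # made-up calls for service in each area
p = 2                   # number of patrol centers to select

prob = LpProblem("p_median", LpMinimize)
x = LpVariable.dicts("assign", [(i, j) for i in areas for j in areas], cat="Binary")
y = LpVariable.dicts("center", areas, cat="Binary")

# minimize demand weighted distance of the assignments
prob += lpSum(demand[i] * dist[i][j] * x[(i, j)] for i in areas for j in areas)
for i in areas:
    prob += lpSum(x[(i, j)] for j in areas) == 1  # each area assigned exactly once
    for j in areas:
        prob += x[(i, j)] <= y[j]                 # only assign to selected centers
prob += lpSum(y[j] for j in areas) == p

prob.solve()
print([j for j in areas if value(y[j]) > 0.5])

With a problem this size you can enumerate the handful of possible pairs of centers by hand and confirm the solver’s answer.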

Two, illustrate the technique on a larger sample of simulated data for which you again know the correct result. For one example of this, I showed how to estimate group based trajectory models using deep learning libraries. Your model/method should be able to recover the correct result (which you know) given the simulated fake data.
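
A minimal version of that simulate-then-recover exercise (using a simple two-group mixture and scikit-learn as a stand-in, not the deep learning implementation from that post):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(10)

# simulate data where the two latent groups are known
n = 1000
true_means = np.array([0.0, 4.0])
groups = rng.integers(0, 2, size=n)
x = rng.normal(true_means[groups], 1.0).reshape(-1, 1)

# fit the model and check it recovers the known group means
gm = GaussianMixture(n_components=2, random_state=10).fit(x)
print(np.sort(gm.means_.ravel()))  # should be close to [0, 4]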

Three, validate the result using real data compared to the current standard. For crime mapping papers, this means comparing forecasts to RTM, or simpler regression models, or simply prior crime = future crime, on out of sample data. Amazingly, many machine learning papers in CJ do not do out of sample predictions. If it is an inferential procedure, comparing the results to some other status quo technique is similar, such as showing conformal prediction intervals have smaller widths (so more statistical power) than placebo results for synthetic control designs (at least for that example with state panel level crime data).
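
The skeleton of that comparison is very simple – hold out data, and check the fancy model beats the naive baseline (hypothetical counts below):

import numpy as np

# hypothetical crime counts per area: prior period, model forecast, actual future
prior    = np.array([12, 3, 25, 7, 18])
forecast = np.array([10, 4, 22, 8, 15])
future   = np.array([11, 2, 20, 9, 14])

# naive baseline: prior crime = future crime
mae_model    = np.abs(forecast - future).mean()
mae_baseline = np.abs(prior - future).mean()
print(mae_model, mae_baseline)  # the new technique should beat the baseline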

You may not have all three of these in any particular paper, but I think for very new techniques 1 or 2 is necessary; 3 is often a by-product of the analysis anyway. So I do not believe any of these asks are that onerous. If you have the skills to create some new technique, you should be able to accomplish 1 or 2.

I do not have any special advice from the reviewer’s perspective. When I do code reviews at work, we go line by line, and my co-workers give high level design advice. E.g. you should use a config file for this instead of defining it inline, you should turn this block into a function, you should make a class to open/close the database connections, etc. The code reviews do not validate technical correctness, so if I queried the wrong data they wouldn’t know it in the code review. The proof is in the pudding so to speak – if my results are performing really badly in the real world, I know I am doing something wrong. (And the obverse: if my results are on the mark and making money, I am pretty sure I did nothing terribly wrong.)

Because there are not these real world mechanisms to validate code in peer reviewed papers, my suggestions for 1/2/3 are the closest I think we can get in many circumstances. That and simply making your code available will dramatically improve the reproducibility and validity of your research compared to the current status quo in our field.

Publishing in Peer Review?

I am close, but not quite, entirely finished with my current crim/cj peer reviewed papers. Only one paper hangs on, the CCTV clearance paper (with Yeondae Jung). It has been rejected twice so far (once on R&R from Justice Quarterly), and has been under review in toto around a year and a half. It will land somewhere eventually, but who knows where at this point. (The other pre-prints I have on my CV but that are not in peer reviewed journals I am not actively seeking to publish anymore.)

Given the typical lags in the peer review process, if you look at my CV I will appear active in terms of publishing in 2020 (6 papers) and 2021 (4 papers and a book). But I have not worked on any peer review paper in earnest since I started working at HMS in December 2019, only copy-editing things I had already produced. (Which still takes a bit of work, for example my Cost of Crime hot spots paper took around 40 hours to respond to reviewers.)

At this point I am not sure if I will pursue any more peer reviewed publications directly in criminology/criminal justice. (Maybe as part of a team in giving support, but not as the lead.) Also we have discussed at my workplace pursuing publications, but that will be in healthcare related projects, not in Crim/CJ.

Part of the reason is that the time it takes to do a peer review publication is quite large relative to publishing a simple blog post. Take for instance my recent post on incorporating harm weights into the WDD test. I received the email question for this on Wednesday 11/18, thought about how to tackle the problem overnight, and wrote the blog post the following Thursday morning before my CrimCon presentation (I took off work to attend the panel with no distractions). So it took me around 3 hours in total. Many of my blog posts take somewhat longer, but I definitely do not spend more than 10-20 hours on an individual one (that includes the coding part; the writing part is mostly trivial).

I have attempted to guess the relative time it takes to do a peer reviewed publication based on my past work. I averaged around 5 publications per year, worked on average 50 hours a week while I was an academic, and spent something like 60% to 80% (or more) of my time on peer review publications. Say I work 51 weeks a year (I definitely did not take any long vacations!, and definitely still put in my regular 50 hours over the summertime), that is 51*50 = 2550 hours. So that means around (2550*0.6)/5 ~ 300 or (2550*0.8)/5 ~ 400 – an estimate of 300 to 400 hours devoted to an individual peer review publication over my career. This will be high (as it absorbs things like grants I did not get), but is in the ballpark of what I would guess (I would have guessed 200+).

So that is an average. If I had recorded the time, I may have had a paper only take around 100 hours (I don’t think I could squeeze one out in less than that). I have definitely had some take over 400 hours! (For my Mapping RTM using Machine Learning paper I easily spent over 200 hours just writing computer code – not to brag, it was mostly me being inefficient and chasing a few dead ends. But that is a normal part of the research process.)

So it is hard for me to say: OK, here is a good blog post that took me 3 hours; now I should go and spend another 300 to write a peer review publication. Some of the effort to publish in peer review journals is totally legitimate. For me to turn those blog posts into peer review articles I would need a more substantive real-life application (if not multiple real-life applications), and perhaps detailed simulations and comparisons to other techniques for the methods posts. But a bunch is just busy work – the front end lit review and answering petty questions from peer reviewers is a very big chunk of that 300 hours (and has very little value added).

My blog posts typically get many more views than my peer review papers do, so I have very little motivation to get the stamp of approval of peer review. My blog posts take far less time, are more widely read, and are likely more accessible than peer reviewed papers. Since I am not on the tenure track and do not get evaluated by peer reviewed publications anymore, there is not much motivation to continue them.

I do have additional ideas I would like to pursue. Fairness and efficiency in siting CCTV cameras is a big one on my mind. (I know how to do it, I just need to put in the work to do the analysis and write it up.) But again, it will likely take 300+ hours for me to finish that project. And I do not think anyone will even end up using it in the end – peer reviewed papers have very little impact on policy. So my time is probably better spent writing a few blog posts and playing video games with all the extra time.

If you are an editor reading this, I still do quite a few peer reviews (so feel free to send me those). I actually have more time to do those promptly since I am not hustling writing papers! I have debated whether it is worth it to start my own peer reviewed journal, or maybe contribute to editing an already existing journal (I just joined the JQC editorial board). Or maybe start writing my own crime analysis or methods textbooks. I think that would be a better use of my time at this point than pursuing individual publications.

Lit reviews are (almost) functionally worthless

The other day I got an email from ACJS about the most downloaded articles of the year for each of their journals. For the Journal of Criminal Justice Education it was a slightly older piece, How to write a literature review, from 2012 by Andrew Denney & Richard Tewksbury (DT from here on). As you can guess from the title of my blog post, it is not my favorite subject. I think it is actually an impossible task to give advice about how to write a literature review. The reason is that we have no objective standards by which to judge a literature review – whether one is good or bad is almost wholly subject to the discretion of the reader.

The DT article I don’t think per se gives bad advice. Use an outline? Golly I suggest students do that too! Be comprehensive in your lit review about covering all relevant work? Well who can argue with that!

An important distinction to make in the advice DT give is between functional actions and symbolic actions. Functional in this context means an action that makes the article better accomplish some specific function. So for example, if I say you should translate complicated regression models to more intuitive marginal effects to make your results more interpretable for readers, that has a clear function (improved readability).

Symbolic actions are those that are merely intended to act as a signal to the reader. So if the advice is along the lines of “you should do this to pass peer review”, that is on its face symbolic. DT’s article is nearly 100% about taking symbolic actions to make peer reviewers happy. Most of the advice doesn’t actually improve the content of the manuscript (or, in the most charitable interpretation, how it improves the manuscript is at best implicit). In DT’s section Why is it important, this focus on symbolic actions becomes pretty clear. Here is the first paragraph of that section:

Literature reviews are important for a number of reasons. Primarily, literature reviews force a writer to educate him/herself on as much information as possible pertaining to the topic chosen. This will both assist in the learning process, and it will also help make the writing as strong as possible by knowing what has/has not been both studied and established as knowledge in prior research. Second, literature reviews demonstrate to readers that the author has a firm understanding of the topic. This provides credibility to the author and integrity to the work’s overall argument. And, by reviewing and reporting on all prior literature, weaknesses and shortcomings of prior literature will become more apparent. This will not only assist in finding or arguing for the need for a particular research question to explore, but will also help in better forming the argument for why further research is needed. In this way, the literature review of a research report “foreshadows the researcher’s own study” (Berg, 2009, p. 388).

So the first argument, that a lit review forces a writer to educate themselves, may offhand seem like a functional objective. It doesn’t make sense though, as lit reviews are almost always written after the research project is done. The point of writing a paper is not to educate yourself, but to educate other people on your research findings. The symbolic motivation for this viewpoint becomes clear in DT’s second point – you need to demonstrate credibility to your readers. In terms of integrity, if the advice in DT was ‘consider creating a pre-analysis plan’ or ‘release data and code files to replicate your results’, that would be functional advice. But no, it is important to wordsmith how smart you are so reviewers perceive your work as more credible.

Then the last point in the paragraph, articulating the need for a particular piece of research, is again a symbolic action in DT’s essay. You are arguing to peer reviewers about the need for a particular research question. I understand the spirit of this, but think about what function it serves. It is merely a signal to reviewers saying: given finite space in a journal, please publish my paper over some other paper, because my topic is more important.

You actually don’t need a literature review to demonstrate a topic is important and/or needed – you can typically articulate that in a sentence or two. For a paper I reviewed not too long ago, on crime reductions resulting from CCTV installations in a European city, I was struck by another reviewer’s critique saying that the authors “never really motivate the study relative to the literature”. I don’t know about you, but the importance of that study seems pretty obvious to me. But yeah, sure, go ahead and pad that citation list with a bunch of other studies looking at the same thing to make some peer reviewers happy. God forbid you simply cite a meta-analysis of prior CCTV studies and move on to better things.

What should a lit review accomplish?

So again, I don’t think DT give bad advice – mostly vapid but not obviously bad. DT focus on symbolic actions in lit reviews because, as lit reviews are currently performed in CJ/Crim journals, they are almost 100% symbolic. They serve almost no functional purpose other than as a signal to reviewers that you are part of the club. So DT give about the best advice possible for navigating a series of arbitrary critiques with no clear standard.

As an experiment supporting this position that lit reviews accomplish practically nothing, try this. For the next peer reviewed article you pick up, do not read the literature review section. Only read the abstract, and then the results and conclusion. Without having read the literature review, does this change the validity of the paper’s findings? For the most part it does not. People get their feelings hurt by not being cited (including myself), but even if someone fails to cite some of my related work, it pretty much never impacts the validity of that person’s findings.

So DT give advice about how peer review works now. No doubt those symbolic actions are important to getting your paper published, even if they do not improve the actual quality of the manuscript in any clear way. I would rather address the question of what I think a lit review should look like – not what you should do to placate three random people and the editor. So again, I think the best way to approach this is by articulating specific functions a lit review accomplishes in terms of improving the manuscript.

Broadening the scope a bit to consider the necessity of citations: the majority of citations in articles are perfunctory, but I don’t think people should plagiarize. So when you pull a very specific piece of information from a source, I think it is important to cite that work. Say you are using a survey instrument developed by someone else – citing the work that establishes that instrument’s reliability and validity, as well as the original population those measures were established on, is certainly useful information to the reader. Sources of information/measures, or a recent piece establishing the properties of your statistical model, are I think other good examples of things to cite in your work. Unfortunately I cannot give a bright line here – I don’t cite Gauss every time I use the normal distribution. But if I am using a code library someone else developed, that is important to cite, inasmuch as someone who wants to do a similar project could use the same library.

In terms of discussing relevant results in prior studies, again the issue is that the boundary of what is relevant is very difficult to articulate. If there is a relevant meta-analysis on a topic, it seems sufficient to me to simply state the results of the meta-analysis. Why do I think that is important? It helps inform your priors about the current study. So if a meta-analysis effect size is X, and the current study has an effect size much larger, it may give you pause. It is also relevant if you are generalizing from the results of the study – it is just another piece of evidence in addition to the meta-analysis, not an island all by itself.

I am not saying discussing specific prior results is never needed, but it does not need to be extensive. So if studies Z, Y, X are similar to yours but all had null results, and you think it was because the sample sizes were too small, that is relevant and useful information. (Again, it changes your priors.) But it does not need to be belabored in detail. The current standard of articulating different theoretical aspects ad nauseam in Crim/CJ journals does not improve the quality of manuscripts. If you do a hot spots policing experiment, you do not need to review all the different minutiae of general deterrence theory. Simply saying this experiment is likely to only accomplish general deterrence, not specific deterrence, seems sufficient to me personally.

When you propose a book you need to say ‘here are some relevant examples’ – I think the same idea would be sufficient for a lit review. OK here is my study, and here are a few additional related studies I think the reader may be interested in. This accomplishes what contemporary lit reviews do in a much more efficient manner – citing more articles makes it much more difficult to pull out the really relevant related work. I admit this does not improve the quality of the current manuscript in a specific way, but it helps the reader identify other sources of interest. (As a reader I typically go through the citation list and note a few articles I am interested in; this helps me accomplish that task much quicker.)

I’ve already sprinkled a few additional pieces of advice into this blog post (marginal effect estimates, pre-analysis plans, sharing data and code), although you may say they don’t belong in the lit review. Whatever – those are things that actually improve either the content of the manuscript or the actual integrity of the research, not some spray paint on your flowers.


CrimRxiv, Alt-Journal Contributions, and Mike Maltz’s Retrospective

As I’m sure followers of mine know, I am a big proponent of posting pre-prints. Spearheaded by Scott Jacques, there is now a specifically criminology focused pre-print server titled CrimRxiv. It is still in beta, but anyone can contribute a paper if they want.

One of the things me and Scott have been jamming about is how to leverage CrimRxiv to make a journal that not only takes advantage of all the goodies on the internet, such as being able to embed interactive graphics or other rich media directly in journal articles, but also really widens the scope of what ‘counts’ as a scholarly contribution. Why can’t things like a cool app, or a really good video lecture you edited, or a blog post illustrating code be put on the same level as journal articles?

Part of the reason I am writing this blog post is that I saw Michael Maltz recently publish a retrospective on his career on Academia.edu. This isn’t a typical journal article, but there is no reason you shouldn’t share such pieces. So I was able to convince Mike to post A Retrospective Look at My Professional Life to CrimRxiv. When he first posted it on Academia.edu, here was my response on how Mike (despite our never having crossed paths) has influenced my career.


Hi Michael and thank you for sharing,

I’ve followed your work since I was a grad student at Albany. I initially got hooked on data viz based on Tufte’s book. When I looked for examples of criminologists discussing data viz, you were the only one I found. That was sometime around 2010, so you had that chapter in the handbook of quantitative crim. You also had another article about drawing glyphs to illustrate life course transitions I was familiar with.

When I finished my classes at SUNY, I then worked at Troy as a crime analyst while finishing my dissertation. I doubt any of the coffee shops were the same from your time, but I did like walking over to Famous hotdogs for lunch every now and then.

Most of my work at the PD was making time series graphs and maps. No regression, so most of my stats training was not particularly useful. Even the mapping course I took, focused on areal data analysis, was not terribly relevant.

I tried to do similar projects to your glyph life-courses with interval censored crime data, but I was never really successful with that, they always ended up being too complicated with even moderately large crime datasets, see https://andrewpwheeler.com/2013/02/28/interval-graph-for-viz-temporal-overlap-in-crime-events/ and https://andrewpwheeler.com/2014/10/02/stacking-intervals/ for my attempts.

What was much more helpful was simply doing monitoring metrics over time, simple running means, and then I just inverted the PDF of the Poisson to give error bars, e.g. https://andrewpwheeler.com/2016/06/23/weekly-and-monthly-graphs-for-monitoring-crime-patterns-spss/. Then cases that were outside the error bands signified an anomalous pattern. In Troy there was an arrest of a single prolific person breaking into cars, and the trend went from a creeping 10 year high to a 10 year low instantly in those graphs.

So there again we have your work on the Poisson distribution and operations research in that JQC article. Also sometime in there I saw a comment you made on Andrew Gelman’s blog pointing to your work with error bands for BJS. Took that ‘fan chart’ idea later on and provided error bands for city level and USA level homicide trends, e.g. https://apwheele.github.io/MathPosts/FanChart_NewOrleans.html. Most of popular discussion of large scale crime trends is misguided over-interpreting short term noise in my opinion.

So all my degrees are in criminal justice, but I have been focusing more on linear programming over time borrowing from operations researchers as well, https://andrewpwheeler.com/2020/05/29/an-intro-to-linear-programming-for-criminologists/. I’ve found that taking outputs from a predictive model and then applying a decision analysis to specifically articulate strategies CJ agencies should take is much more fruitful than the typical way academic research is done.

Thank you again for sharing your story and best, Andy Wheeler

Some more peer review musings

It is academics’ favorite pastime to complain about peer review. There is no getting around it – peer review is degrading, both in the adjective sense of demoralizing, and in the verb ‘wear down a rock’ sense.

There is nothing quite like it in other professional spheres that I can tell. For example, if I receive a code review at work, it is not adversarial in nature the way peer review is. I think most peer reviewers treat the process like a prosecutor – poking at all the minutia they can uncover – as opposed to acting as judges of the truth.

At work I also have a pretty unobjectionable criterion for success – if my machine learning model is making money, it is good. Not so for academic papers. Despite everyone learning about the ‘scientific method’, academics don’t really have a code to follow like that. And that lack of objective criteria causes quite a bit of friction in the peer review process.

But all the things I think are wrong with peer review are hard to articulate succinctly. We have a bias problem, in that reviewers have preferences for particular methods or styles. We have a problem that individuals get judged on highly arbitrary standards of what is interesting. Many critiques are focused on highly pedantic things that make little to no material difference, such as the use of personal pronouns. Style advice can be quite bad, to be frank – in my career I’m up to something like four different peer reviews saying providing results in the introduction is inappropriate. Edit: And these complaints are not exhaustive either – we have reviewers pile on endless complaints in multiple rounds, and people phone it in with nonsense short descriptions as well. (I’m sure I could continue to add to the list, but I’ve personally experienced all of these things, in addition to being called a racist in peer review.)

I’ve previously provided advice about how I think peer reviews should be done. To sum up my advice:

  • Differentiate between big problems and minor stuff
  • Be very specific what things you want changed (and how to change them)

I think I should add two more to this – don’t be a jerk, and don’t pile on (or, be respectful of people’s time). For the jerk part, I don’t even meet that standard if I am being honest with myself. In one of the peer reviews I am going to share, I got pretty snippy in a later round (with the Urban folks on their CCTV paper in Milwaukee); that was over the top in retrospect. (I don’t think reviewers should get a say on revisions; editors should just arbitrate whether the responses by the original authors were sufficient. I am pretty much never happy if I suggest something and an author says no thanks.)

For the pile-on part, I recently posted my responses to reviewers for my cost of crime hot spots paper. Although all three reviewers were positive, it still took me ~40 hours to respond to all of the critiques. Even though all of the reviews were in good faith, it honestly was not worth my time to make those revisions. (I think two asks were legitimate – changing the front end to say my hot spots are crime cost, not crime harm, and the ask for more details on the Hunt estimates. The rest were just fluff though.)

If you do consulting, think about your rate – and whether addressing all those minor critiques meets the threshold of ‘is the paper improved by an amount that justifies my consulting fee’. In my experience it does not come close, and I am quite a cheap consultant.

So I have shared a few examples of my responses to reviewers in the past (besides the above, see here for the responses to my how to select participants for focused deterrence paper, and here for my tables and graphs for crime analysis paper). But maybe instead of bagging on others, I should just try to lead by example.

So below are several of my recent reviews. I’ve only pulled out recent ones where I know the paper has been published. This is subject to a selection bias – papers I have more negative things to say about are less likely to be published in the end. So keep that bias in mind when judging these.

The major component of how I believe I differ from the modal reviewer is that I subdivide between major and minor concerns. Many reviewers take the running-commentary-through-the-paper approach, which not only does not distinguish between major/minor, but also produces many critiques that are often already explicitly addressed (just at a different point in the manuscript from where they popped into the reviewer’s head).

The way I do my reviews, I actually do the running commentary in the margin of the paper, then I sleep on it, then I organize it into the major sections. In this process of organizing I drop quite a bit of minor fluff, or things that are cleared up at later points in the paper. (I probably end up deleting ~50% of my original notes.)

Second, for papers I give a thumbs up to, I take actual time to articulate why they are good. I don’t just give a compliment sandwich and pile on a series of minor critiques. Positive comments are fleeting in peer review, but necessary to judge a piece.

So here are some of my examples from peer review, take them for what they are worth. I no doubt do not always follow the advice I lay out above, but I try my best to.


Title: Going Local: Do Consent Decrees and Other Forms of Federal Intervention in Municipal Police Departments Reduce Police Killings?

The article is well written and covers a timely topic. The authors have a quote that I think sums it up quite nicely: “This paper is therefore the first to evaluate the effects of civil rights investigations and the consent decree process on an important – arguably the most important – measure of use of force: death.” Well put.

The analysis is also well executed. It is observational, but city fixed effects and the event history study were the main things I was looking for.

I have three major revision requests. But they are just editing-the-manuscript type stuff, so they can be easily accomplished by the authors.

  1. Drop the analysis of Crime Rates

While I think the analysis of officer deaths is well done, I have more doubts about the analysis of crime rates. The front end is not focused on crime rates at all, and would need a whole different section discussing recent work in Chicago/Baltimore/NYC. Also, I don't think the same analysis (fixed effects) is sufficient for crime trends – we are basically talking about the crime drop period. It is also very important to assess heterogeneity in that analysis – a big part of the discussion is that Chicago/Baltimore are different than NYC.

The analysis of officer deaths is sufficient to stand on its own. I agree the crime rates question is important – save it for another paper where it can be given enough attention to do the topic justice.

  2. Conclusion

Like I said, I was most concerned about the event study, and the authors show that there was some evidence of pre-treatment selection trends. You don't talk about the event study results in the conclusion though. It is a limitation that the event study analysis had less clear evidence of reductions in officer-involved deaths than the simpler pre/post analysis.

This is likely due to larger error bars with the rare outcome, which is itself a limitation of using shootings as the outcome. I think it deserves to be mentioned. Even if the overall effects of consent decrees on reducing officer-involved deaths are not 100% clear, locally monitoring more frequent outcomes is worthwhile. See the recent work by MacDonald/Braga in NYC. New Orleans is another example, https://journals.sagepub.com/doi/full/10.1177/1098611117709785.

I note this work is important to publish irrespective of the direction of the findings. It would be important to publish even if it did not show reductions in officer-involved deaths.

  3. Front end lit review

For 2.2 (racial disparities), this section misses much of the recent work on officer-involved shootings, with the exception of Greg Ridgeway’s article. There is a wide array of work that uses other counterfactual approaches to examine implicit bias – see Roland Fryer’s work (https://www.journals.uchicago.edu/doi/abs/10.1086/701423), or the Wheeler/Worrall papers on the Dallas shoot/don’t-shoot data (https://journals.sagepub.com/doi/full/10.1177/1525107118759900 & https://journals.sagepub.com/doi/full/10.1177/0011128718756038). These are the observational dual to the experimental lab work by Lois James. There are also separate papers by Justin Nix (https://www.tandfonline.com/doi/full/10.1080/0735648X.2018.1547269) and Joseph Cesario (https://journals.sagepub.com/doi/10.1177/1948550618775108) showing that estimates of disparity can vary quite a bit when using different benchmarks.

For the use of force review (section 2.3), people typically talk about situational factors that influence use of force (in addition to individual/extra-legal and organizational factors). You may say “consent decrees don’t/can’t change the situational behavior of offenders, so what is the point of writing about it” – true, but it still should be articulated. To the extent that situational factors are a large determinant of shootings, consent decrees may not be the right way to reduce officer-involved deaths. But consent decrees may indirectly affect police/citizen interactions (such as via de-escalation or procedural justice training), and that could be a mechanism through which fewer officer deaths occur.

Long story short, 2.2 should be updated with more recent work on officer involved shootings and the benchmark problem, and 2.3 should be updated to include discussion of the importance of situational factors in the use of force.

Additional minor stuff:

  • pg 12, killings are not a proxy for use of force (they count as force!)
  • The regression equations need some editing. You definitely need a log or exponential function on one side of the equation, and generalized linear models do not have an additive error term like linear models do. I personally write them as below (see also the short sketch after this list):

log(E[crime_it]) = intercept + B1*X + ...

where E[crime_it] is the expected value of crime at place i and time t (equivalent to lambda in your current formulation).

  • pg 19, “monitor” is misspelled (typo in the equation)
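
As an aside for blog readers, here is a minimal sketch of what that notation corresponds to in code – a Poisson GLM with a log link fit via Python’s statsmodels, on made-up data (nothing here is from the paper under review):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# made-up data: a 0/1 treatment indicator and Poisson distributed crime counts
X = rng.integers(0, 2, size=500)
lam = np.exp(1.0 + 0.5 * X)          # E[crime] = exp(B0 + B1*X)
crime = rng.poisson(lam)

# the log link means log(E[crime]) = B0 + B1*X -- no additive error term
res = sm.GLM(crime, sm.add_constant(X), family=sm.families.Poisson()).fit()
print(res.params)   # recovers roughly (1.0, 0.5)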

Title: Immigration Enforcement, Crime and Demography: Evidence from the Legal Arizona Workers Act

Well done paper. It uses current status quo empirical techniques to estimate the effect that employment oversight for illegal immigrant workers had on subsequent crime reductions.

Every critique I thought of was already systematically addressed in the paper: the issues with potential demographic spillovers (which would bias estimates because control states may have crime increases), eliminating states with stronger E-Verify support from the pool of placebos (with later robustness checks for neighboring states), and some simple analyses decomposing the reduction that would be expected due to the decrease in the share of young males.

Minor stuff:

  • The Light and Miller quote on pg 21 – not sure if a space/dash is missing or if it is just kerning problems in LaTeX.
  • Pg 26, you do the estimates for 08 and 09 separately, but I think you should pool 08–09 together in that table (the graphs do the visual tests for 08 & 09 independently). For example, violent crimes are negative for both 08 & 09; neither is outside the typical bounds in the graphs for each individual year, but cumulatively the sum may be quite low (most placebos will have one low year and one high year). This should get you more power given the few potential placebo tests. So it should be something like (-0.067 + -0.108), with error bars based on the summed variances (assuming the two yearly estimates are roughly independent).
  • I had to stare at your change equation (pg 31) for quite a bit. I get the denominator of equation 2; I was confused by the numerator (although it is correct). Here is how I did it (using your m1, m2, a, and X):

Pre-Crime = m1*aX + (1 - m1)X = X * (m1*a + 1 - m1)

Post-Crime = m2*aX + (1 - m2)X = X * (m2*a + 1 - m2) #so you can drop the X

% Change = (Post - Pre) / Pre = Post/Pre - 1

At least that is easier for me to wrap my head around. Also, should m1 and m2 be the overall share of young adults, not just limited to immigrants? (Since the estimated crime reduction is for everybody, not just crimes committed by immigrants.)
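
For what it is worth, here is a quick numeric check of the algebra above in Python, with placeholder values for m1, m2, a, and X (none of these numbers are from the paper):

m1, m2, a, X = 0.20, 0.15, 2.5, 1000.0   # placeholder values

pre  = m1 * a * X + (1 - m1) * X    # X * (m1*a + 1 - m1)
post = m2 * a * X + (1 - m2) * X    # X * (m2*a + 1 - m2)

print(post / pre - 1)                          # % change; the X cancels
print((m2*a + 1 - m2) / (m1*a + 1 - m1) - 1)   # same result without X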


Title: How do closed-circuit television cameras impact crimes and clearances? An evaluation of the Milwaukee Police Department’s public surveillance system

Well written paper. Uses appropriate quasi-experimental methods (matching and diff-in-diff) to evaluate reductions in crimes and potential increases in case clearances.

I have some minor requests on the analysis/descriptive stats, but they are things that can be easily addressed by the authors.

First, I would request a simpler pre-post DiD table. The panel models not only introduce more complications of interpretation, but they are also based on asymptotics that I am not sure are met here.

So if you do something like:

            Pre Crime   Post Crime   Difference   DiD
Treated        100           80          -20      -30
Control        100          110           10

It is much more straightforward to interpret, and I provide a stat test and power analysis advice in https://crimesciencejournal.biomedcentral.com/articles/10.1186/s40163-018-0085-5. Your violent crime counts are very low, so I think you would need unrealistically large effects (say 50% reductions) to detect an effect with your group sizes.
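
As an aside for blog readers – my own sketch here, not necessarily the exact test in the paper linked above – one simple approach is to treat each cell count as an independent Poisson, in which case the variance of the DiD estimate is approximately the sum of the four counts:

import numpy as np

# counts from the illustrative 2x2 table above
treat_pre, treat_post = 100, 80
ctrl_pre, ctrl_post = 100, 110

did = (treat_post - treat_pre) - (ctrl_post - ctrl_pre)   # -20 - 10 = -30

# under independent Poisson counts, Var(DiD) ~ sum of the four counts
se = np.sqrt(treat_pre + treat_post + ctrl_pre + ctrl_post)
print(did, round(did / se, 2))   # -30, z of about -1.52 -- not significant

Even a difference this large does not reach significance with counts of this size, which is exactly the power point above.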

You can do the same table for % clearances, and do whatever binomial test of proportions you prefer (which will have much more power than the regression equation). And again, it is simpler to interpret.

Technically, doing a Poisson regression with total crimes as an exposure is not the same as modelling the clearance counts as a binomial outcome out of total crimes. The predicted PMF for the Poisson can technically go above the total (so you can have %’s above 100%). It would be more appropriate to use binomial regression, so something like the below in Stata:

glm arrests i.Treat i.Post Treat#Post i.Unit, family(binomial crimes) link(logit)

(There is no xtbinomial unfortunately – maybe you can coerce xtlogit with weights to do it, or use meglm since you are doing random effects. I think you should probably do fixed effects here anyway.) I don’t know if it will make a difference in the results here though.
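
For blog readers who do not use Stata, below is roughly the same binomial setup sketched in Python’s statsmodels (the data frame and counts are hypothetical, and I omit the unit fixed effects for brevity):

import numpy as np
import pandas as pd
import statsmodels.api as sm

# hypothetical unit-period data: cleared cases out of total crimes
df = pd.DataFrame({
    'cleared': [20, 18, 30, 22, 25, 19, 28, 24],
    'crimes':  [100, 90, 110, 95, 105, 92, 108, 99],
    'treat':   [1, 1, 1, 1, 0, 0, 0, 0],
    'post':    [0, 0, 1, 1, 0, 0, 1, 1],
})

# binomial endog given as (successes, failures), mirroring family(binomial crimes)
endog = np.column_stack([df['cleared'], df['crimes'] - df['cleared']])
exog = sm.add_constant(np.column_stack([df['treat'], df['post'],
                                        df['treat'] * df['post']]))
res = sm.GLM(endog, exog, family=sm.families.Binomial()).fit()
print(res.params)   # intercept, treat, post, and the DiD interaction (log odds scale)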

Andy Wheeler


Title: The formation of suspicion: A vignette study

Well done experimental study examining suspiciousness using a vignette approach. There is no power analysis, but it is a pretty large sample and only examines first-order effects. (With a note about examining a well-powered interaction effect described in the analysis section.)

My only big ask is that the analysis should include a dummy variable for the two different sampling frames (e.g. a dummy variable for 1=New York). Everything else is minor/easy stuff.

Minor stuff:

  • How were the vignettes randomized? (They obviously were – the balance is really good!)
  • For the discussion, it is important to understand the characteristics that start an interaction because of another KT (Kahneman/Tversky) heuristic bias – anchoring effects. Paul Taylor has some recent work of interest on dispatch priming that is relevant. (Also, Dan Mears had a recent overview paper on biases/heuristics in Journal of Criminal Justice that I think should probably be cited.)
  • For Table 1 I would label the “Dependent Variable” with a more descriptive label (suspiciousness)
  • Also, people typically code binary variables 0/1 instead of 1/2. It won’t actually impact the analysis here, since it is just a linear shift of +1 (only the intercept term changes – see the quick check after this list). The variables Agency Size, Education, & Experience are coded as ordinal though, and they should maybe be included as dummy variables for each category. (I don’t think this will make a big difference for the main randomized variables of interest.)
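
A quick check of that linear shift point for blog readers, with toy numbers that have nothing to do with the paper:

import numpy as np

rng = np.random.default_rng(1)
x12 = rng.integers(1, 3, size=200)              # a variable coded 1/2
y = 3.0 + 2.0 * x12 + rng.normal(size=200)

print(np.polyfit(x12, y, 1))        # slope, intercept under 1/2 coding
print(np.polyfit(x12 - 1, y, 1))    # recoded 0/1: same slope, intercept shifts by the slope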

Title: The criminogenic effect of marijuana dispensaries in Denver, Colorado: A microsynthetic controls quasi-experiment and cost-benefit analysis

This work is excellent. It is a timely topic, as many states are still considering whether to legalize and what exactly that will look like.

The work is a replication/extension of a prior study, also in Denver, but this one is a stronger research design. Using micro units allows matching on pre-trends and drawing synthetic controls, which is a stronger design than the prior pre/post work at the neighborhood level (both in Denver and in LA). Micro units are also more relevant for testing the direct criminogenic effects of the stores. The authors may be interested in https://www.tandfonline.com/doi/full/10.1080/0735648X.2019.1582351, which is also a stronger research design using matched comparison groups, but is only for medical dispensaries in DC and has a much smaller sample.

Even if one considers that a minor contribution (the crime increase findings are of a similar magnitude to the Hughes JQ paper), the cost-benefit analysis is a really important contribution. It hits on all of the important considerations – even if the book of costs/benefits is balanced, they are relevant to really different segments of society. So even if tax revenue offsets the books, places may still not want to take on that extra crime burden.

Only two (very minor) suggestions. One, some of the permutation lines are outside of the figure boundaries. Two, I would like a brief aside in the discussion mentioning the trade-off of economic investment making places busier (which fits right into the current cost-benefit discussion). If you added 100 ice-cream shops, crime might go up due to the increased commercial activity – weed has the same potential negative externality – but it is not necessarily worse than, say, opening a bunch of bars or convenience stores. (The same thing is relevant for any BID, https://journals.sagepub.com/doi/full/10.1177/0011128719834559.)


Title: Understanding the Predictors of Street Robbery Hot Spots: A Matched Pairs Analysis and Systematic Social Observation

Note: Reviewed for Justice Quarterly and rejected. The final published version is at Crime & Delinquency (which I was not a reviewer for).

The article uses the case-control method to match high-crime to low-crime street segments, and then examines local land use factors (bars, convenience stores, etc.) as well as the more novel measure of physical disorder coded from Google Street View images. The latter is the main motivation for the case-control design, as manual coding prevents one from doing the entire city.

By their nature, case-control designs do not let you manipulate the number of cases at your disposal. Thus the majority of such designs focus on ONE exposure of interest, and match on any other characteristic that is known to affect the outcome but is not of direct interest to the study. E.g., if you examined lung cancer given exposure to vehicle emissions, you would need to match controls on whether or not they smoked. This allows you to assess the exposure of interest with the maximum power given the design limitations, although you can’t say anything about the effect of smoking vs. not smoking on the outcome.

The authors here match within neighborhoods and on several other local characteristics, but then go on to examine different crime generators (7 factors), as well as the two coded disorder variables. Given the fairly small sample of 129 matched cases, this is likely a pretty underpowered research design. Logistic regression relies on asymptotic properties, and even with fewer variables it is questionable whether ~260 cases is sufficient, see https://www.tandfonline.com/doi/abs/10.1080/02664763.2017.1282441. Thus you get odds ratios that are fairly large in absolute terms but still not significant (e.g., physical disorder has an odds ratio of 1.7 but is insignificant). So you have low power to test all of those coefficients.
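
To give blog readers a sense of the power issue, below is a rough simulation sketch – my own illustration, assuming a simple unmatched logistic regression, a roughly 50/50 binary exposure, ~260 observations, and the odds ratio of 1.7 mentioned above:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, sims = 260, 500            # ~129 matched pairs worth of observations
beta = np.log(1.7)            # the insignificant odds ratio from the paper
hits = 0

for _ in range(sims):
    x = rng.integers(0, 2, size=n)        # binary exposure, roughly 50/50
    p = 1 / (1 + np.exp(-beta * x))       # logit(p) = B1*x, baseline p = 0.5
    y = rng.binomial(1, p)
    res = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
    hits += res.pvalues[1] < 0.05

print(hits / sims)   # approximate power, typically well under the usual 80% target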

I believe a stronger research design would focus on the novel measures (the Google Street View disorder) and match on the crime generator variables. The crime generator factors are well established in prior research, so that part is a small contribution. The front end focuses on the typical crime generator typology and lacks any broader “so what” for the disorder measures. (That case can be made, focusing on broken windows theory and the controversy it has caused.)

It would be feasible for the authors to match on the crime generators, but it would result in different control cases, so they would need to code additional locations. (If you do match on crime generators, I think it is OK to include zero-crime areas in the pool. The main reason it is sometimes not fair to include zero-crime areas is that they are things like a bridge or a park.)

Minor notes:

  • You cannot interpret, in causal terms, the coefficients for the factors on which you matched (top of page 23). They only speak to the extent to which your matching was successful, not to causality. Later on you also attempt to weasel out of causal interpretations (page 26). Saying this is not causality, but otherwise interpreting the regression coefficients as if they have meaning, is an act of cognitive dissonance.
  • Given that there is perfect separation between convenience stores and hot spots, the model should have infinite standard errors for that factor. (You have a few coefficients that appear to have explosive standard errors – see the quick illustration at the end of these notes.)
  • I wouldn’t describe it as matching on the dependent variable [bottom page 8]. I realize it is confusing to mix propensity score terminology with case-control (although the method is fine). You match cases to controls on the independent variables you choose at the onset.
  • Page 5 – Dan O’Brien has shown in his work that you have super-callers for 311. This fits right into your point about how coding images may be better, since a single person can bias the measure at micro places. (This is sort of the opposite of the not-calling problem you mention.)
  • You may be interested that some folks have tried to automate the scoring using computer vision, see https://ieeexplore.ieee.org/document/6910072 or https://mikebader.net/projects/canvas-usa/. George Mohler had a talk at ASC where he used Google’s automated image labeling to identify disorder (like graffiti) in pictures: https://cloud.google.com/vision/docs/labels
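
To illustrate the separation point in the notes above for blog readers, here is a toy 2x2 calculation (the counts are made up, not from the paper). With a zero cell, the maximum likelihood log odds ratio, and its usual standard error sqrt(1/a + 1/b + 1/c + 1/d), are both infinite:

import numpy as np

# toy 2x2 table: every segment with a convenience store is a hot spot
#                  hot spot   not hot spot
# store present:      12            0
# store absent:       50           60
a, b, c, d = np.float64(12), np.float64(0), np.float64(50), np.float64(60)

with np.errstate(divide='ignore'):
    log_or = np.log((a * d) / (b * c))     # log(720 / 0) -> inf
    se = np.sqrt(1/a + 1/b + 1/c + 1/d)    # the 1/b term -> inf

print(log_or, se)   # inf inf -- a logistic regression coefficient diverges the same way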