CrimRxiv, Alt-Journal Contributions, and Mike Maltz’s Retrospective

As I’m sure followers of mine know, I am a big proponent of posting pre-prints. Scott Jacques has spearheaded a specifically criminology-focused pre-print server called CrimRxiv. It is still in beta, but anyone can contribute a paper if they want.

One of the things Scott and I have been jamming about is how to leverage CrimRxiv to make a journal that not only takes advantage of all the goodies on the internet, such as being able to embed interactive graphics or other rich media directly in an article, but also really widens the scope of what ‘counts’ as a scholarly contribution. Why can’t things like a cool app, a really good video lecture you edited, or a blog post illustrating code be put on the same level as journal articles?

Part of the reason I am writing this blog post is that I saw Michael Maltz recently publish a retrospective on his career on Academia.edu. It isn’t a typical journal article, but there is no reason why you shouldn’t share such pieces. So I was able to convince Mike to post A Retrospective Look at My Professional Life to CrimRxiv. When he first posted it on Academia.edu, here was my response describing how Mike (though we have never crossed paths) has influenced my career.


Hi Michael and thank you for sharing,

I’ve followed your work since I was a grad student at Albany. I initially got hooked on data viz through Tufte’s book. When I looked for examples of criminologists discussing data viz, you were the only one I found. That was sometime around 2010, so you had that chapter in the handbook of quantitative criminology. I was also familiar with your article on drawing glyphs to illustrate life-course transitions.

When I finished my classes at SUNY, I then worked at Troy as a crime analyst while finishing my dissertation. I doubt any of the coffee shops were the same from your time, but I did like walking over to Famous hotdogs for lunch every now and then.

Most of my work at the PD was making time series graphs and maps. No regression, so most of my stats training was not particularly useful. Even the mapping course I took, which focused on areal data analysis, was not terribly relevant.

I tried to do projects similar to your glyph life-courses with interval censored crime data, but I was never really successful with that; they always ended up being too complicated with even moderately large crime datasets, see https://andrewpwheeler.com/2013/02/28/interval-graph-for-viz-temporal-overlap-in-crime-events/ and https://andrewpwheeler.com/2014/10/02/stacking-intervals/ for my attempts.

What was much more helpful was simply monitoring metrics over time with simple running means, and then I just inverted the Poisson CDF to give error bars, e.g. https://andrewpwheeler.com/2016/06/23/weekly-and-monthly-graphs-for-monitoring-crime-patterns-spss/. Cases outside the error bands then signified an anomalous pattern. In Troy there was an arrest of a single prolific person breaking into cars, and the trend went from a creeping 10 year high to a 10 year low instantly in those graphs.
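Here is a minimal sketch of that monitoring idea in Python (the weekly counts below are simulated, not Troy’s data, so swap in your own series):

import numpy as np
import pandas as pd
from scipy.stats import poisson

rng = np.random.default_rng(10)
weekly = pd.Series(rng.poisson(lam=12, size=156))  # simulated weekly crime counts, three years

baseline = weekly.rolling(window=52).mean()        # running mean over a trailing 52-week window
low = poisson.ppf(0.005, baseline)                 # lower band from the Poisson quantile function
high = poisson.ppf(0.995, baseline)                # upper band

outside = (weekly < low) | (weekly > high)         # weeks outside the bands flag anomalous patterns
print(weekly[outside])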

So there again we have your work on the Poisson distribution and operations research in that JQC article. Also sometime in there I saw a comment you made on Andrew Gelman’s blog pointing to your work with error bands for BJS. I took that ‘fan chart’ idea later on and provided error bands for city level and USA level homicide trends, e.g. https://apwheele.github.io/MathPosts/FanChart_NewOrleans.html. Most of the popular discussion of large scale crime trends is, in my opinion, misguided over-interpretation of short term noise.

So all my degrees are in criminal justice, but I have been focusing more on linear programming over time borrowing from operations researchers as well, https://andrewpwheeler.com/2020/05/29/an-intro-to-linear-programming-for-criminologists/. I’ve found that taking outputs from a predictive model and then applying a decision analysis to specifically articulate strategies CJ agencies should take is much more fruitful than the typical way academic research is done.

Thank you again for sharing your story and best, Andy Wheeler

Some more peer review musings

It is academics’ favorite pastime to complain about peer review. There is no getting around it – peer review is degrading, both in the adjective sense of demoralizing and in the verb ‘wear down a rock’ sense.

There is nothing quite like it in other professional spheres, as far as I can tell. For example, if I receive a code review at work, it is not adversarial in nature the way peer review is. I think most peer reviewers treat the process like a prosecutor, poking at all the minutiae they can uncover, as opposed to being judges of the truth.

At work I also have a pretty unobjectionable criterion for success – if my machine learning model is making money, then it is good. Not so for academic papers. Despite everyone learning about the ‘scientific method’, academics don’t really have a code to follow like that. And that lack of objective criteria causes quite a bit of friction in the peer review process.

But all the things I think are wrong with peer review are hard for me to articulate succinctly. We have a bias problem, in that reviewers have preferences for particular methods or styles. We have a problem that individuals get judged on highly arbitrary standards of what is interesting. Many critiques are focused on highly pedantic things that make little to no material difference, such as the use of personal pronouns. Style advice can be quite bad, to be frank; in my career I’m up to something like four different peer reviews saying that providing results in the introduction is inappropriate. Edit: And these complaints are not exhaustive either – we have reviewers pile on endless complaints over multiple rounds, and people phone it in with short, nonsense reviews as well. (I’m sure I can continue to add to the list, but I’ve personally experienced all of these things, in addition to being called a racist in peer review.)

I’ve previously provided advice about how I think peer reviews should be done. To sum up my advice:

  • Differentiate between big problems and minor stuff
  • Be very specific what things you want changed (and how to change them)

I think I should add two more – don’t be a jerk, and don’t pile on (or, be respectful of people’s time). For the jerk part, I don’t even meet that standard if I am being honest with myself. In one of the peer reviews I am going to share, I got pretty snippy in a later round (with the Urban folks on their CCTV paper in Milwaukee); that was over the top in retrospect. (I don’t think reviewers should get a say on revisions; editors should just arbitrate whether the responses by the original authors were sufficient. I am pretty much never happy if I suggest something and an author says no thanks.)

For the pile-on part, I recently posted my responses to the reviews of my cost of crime hot spots paper. Although all three reviewers were positive, it still took me ~40 hours to respond to all of the critiques. Even though all of the reviews were in good faith, it honestly was not worth my time to make those revisions. (I think two were legitimate: changing the front end to say my hot spots are crime cost, not crime harm, and the ask for more details on the Hunt estimates. The rest were just fluff.)

If you do consulting, think about your rate – and whether addressing all those minor critiques meets the threshold of ‘is the paper improved by enough to justify my consulting fee’. In my experience it does not come close, and I am quite a cheap consultant.

So I have shared a few examples of my responses to reviewers in the past (besides the above, see here for the responses to my how to select participants for focused deterrence paper, and here for my tables and graphs for crime analysis paper). But maybe instead of bagging on others, I should just try to lead by example.

So below are several of my recent reviews. I’ve only pulled out recent ones where I know the paper has been published. This is subject to a selection bias – papers I had more negative things to say about are less likely to be published in the end. So keep that bias in mind when judging these.

The major way I believe I differ from the modal reviewer is that I subdivide my comments into major and minor concerns. Many reviewers take the running-commentary-through-the-paper approach, which not only fails to distinguish between major and minor issues, but often raises critiques that are explicitly addressed in the manuscript (just at a different point from where they popped into the reviewer’s head).

The way I do my reviews, I actually do write the running commentary in the margins of the paper, then I sleep on it, then I organize it into the major sections. In that organizing process I drop quite a bit of minor fluff, or things that are cleared up at later points in the paper. (I probably end up deleting ~50% of my original notes.)

Second, for papers I give a thumbs up to, I take actual time to articulate why they are good. I don’t just give a compliment sandwich and pile on a series of minor critiques. Positive comments are fleeting in peer review, but they are necessary to judge a piece.

So here are some of my examples from peer review, take them for what they are worth. I no doubt do not always follow the advice I lay out above, but I try my best to.


Title: Going Local: Do Consent Decrees and Other Forms of Federal Intervention in Municipal Police Departments Reduce Police Killings?

The article is well written and covers a timely topic. The authors have a quote that I think sums it up quite nicely: “This paper is therefore the first to evaluate the effects of civil rights investigations and the consent decree process on an important – arguably the most important – measure of use of force: death.” Well put.

The analysis is also well executed. It is observational, but city fixed effects and the event history study were the main things I was looking for.

I have three major revision requests. But they are just editing-the-manuscript type stuff, so they can be easily accomplished by the authors.

  1. Drop the analysis of Crime Rates

While I think the analysis of officer deaths is well done, I have more doubts about the analysis of crime rates. The front end is not focused on this at all, and would need a whole different section discussing recent work in Chicago/Baltimore/NYC. Also I don’t think the same analysis (fixed effects) is sufficient for crime trends – we are basically talking about the crime drop period. It is also very important to assess heterogeneity in that analysis – a big part of the discussion is that Chicago/Baltimore are different than NYC.

The analysis of officer deaths is sufficient to stand on its own. I agree the crime rates question is important – save it for another paper where it can be given enough attention to do the topic justice.

  2. Conclusion

Like I said, I was most concerned about the event study, and the authors show that there was some evidence of pre-treatment selection trends. You don’t talk about the event study results in the conclusion though. It is a limitation that the event study analysis had less clear evidence of reductions than the simpler pre/post analysis.

This is likely due to larger error bars with the rare outcome, which is itself a limitation of using shootings as the outcome. I think it deserves to be mentioned that even if the overall effects of consent decrees on reducing officer involved deaths are not 100% clear, locally monitoring more frequent outcomes is worthwhile. See the recent work by MacDonald/Braga in NYC. New Orleans is another example, https://journals.sagepub.com/doi/full/10.1177/1098611117709785.

I note the work is important to publish without respect to the results of the analysis. It would be important to publish this work even if it did not show reductions in officer involved deaths.

  3. Front end lit review

For 2.2 racial disparities, this section misses much of the recent work on officer involved shootings, except for Greg Ridgeway’s article. There is a wide array of work that uses other counterfactual approaches to examine implicit bias, see Roland Fryer’s work (https://www.journals.uchicago.edu/doi/abs/10.1086/701423), or the Wheeler/Worrall papers on Dallas shoot/don’t shoot data (https://journals.sagepub.com/doi/full/10.1177/1525107118759900 & https://journals.sagepub.com/doi/full/10.1177/0011128718756038). These are the observational counterpart to the experimental lab work by Lois James. Also there are separate papers by Justin Nix (https://www.tandfonline.com/doi/full/10.1080/0735648X.2018.1547269) and Joseph Cesario (https://journals.sagepub.com/doi/10.1177/1948550618775108) that show estimates of disparity can vary by quite a bit when using different benchmarks.

For the use of force review (section 2.3), people typically talk about situational factors that influence use of force (in addition to individual/extra-legal and organizational ones). You may say “consent decrees don’t/can’t change situational behavior of offenders, so what is the point of writing about it” – true, but it still should be articulated. To the extent that situational factors are a large determinant of shootings, it may be that consent decrees are not the right way to reduce officer involved deaths. But consent decrees may indirectly affect police/citizen interactions (such as via de-escalation or procedural justice training), and that could be a mechanism through which fewer deaths occur.

Long story short, 2.2 should be updated with more recent work on officer involved shootings and the benchmark problem, and 2.3 should be updated to include discussion of the importance of situational factors in the use of force.

Additional minor stuff:

  • pg 12, killings are not a proxy for use of force (they count as force!)
  • regression equations need some editing. Definitely need a log or exponential function on one of the sides of the equation, and generalized linear models do not have an error term like linear models do. I personally write them as:

log(E[crime_it]) = intercept + B1*X + …..

where E[crime_it] is the expected value of crime at place i and time t (equivalent to lambda in your current formulation). A short code sketch of this specification appears after these minor notes.

  • pg 19 monitor misspelled (equation type)
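Here is a minimal sketch of that log-link specification, fitting a Poisson GLM in Python on made-up panel counts; the data and variable names are purely illustrative, not from the paper, and note that no additive error term appears anywhere in the model:

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "city": np.repeat(np.arange(20), 10),           # 20 cities, 10 periods each
    "treated": np.tile([0]*5 + [1]*5, 20),          # hypothetical intervention switching on mid-panel
})
df["killings"] = rng.poisson(np.exp(1.0 - 0.3*df["treated"]))  # simulated counts

# log(E[killings_it]) = intercept + B1*treated + city fixed effects
fit = smf.glm("killings ~ treated + C(city)", data=df, family=sm.families.Poisson()).fit()
print(fit.params["treated"])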

Title: Immigration Enforcement, Crime and Demography: Evidence from the Legal Arizona Workers Act

Well done paper; it uses the current status quo empirical techniques to estimate the effect that employment oversight for illegal immigrant workers had on subsequent crime reductions.

Every critique I thought of was already systematically addressed in the paper. The authors discuss issues with potential demographic spillovers (biasing estimates because controls may have crime increases), eliminate states with stronger E-Verify support from the pool of placebos (with later robustness checks for neighboring states), and use some simple analyses to decompose the crime reduction that would be expected due to the decrease in the share of young males.

Minor stuff

  • Light and Miller quote on pg 21 – not sure if it is a space/dash missing or just kerning problems in LaTeX
  • Pg 26, you do the estimates for 08 and 09 separately, but I think you should pool 08-09 together in that table (the graphs do the visual tests for 08 & 09 independently). For example, violent crimes are negative for both 08 & 09, neither of which is outside the typical bounds in the graphs for each individual year, but cumulatively that may be quite low (most placebos will have one low and one high year). This should get you more power I think, given the few potential placebo tests. So it should be something like (-0.067 + -0.108) with error bars for violent crimes.
  • I had to stare at your change equation (pg 31) for quite a bit. I get the denominator of equation 2; I was confused about the numerator (although it is correct). Here is how I did it (using your m1, m2, a, and X); a quick numeric check follows the algebra.

Pre-Crime = m1*a*X + (1 - m1)*X = X*(m1*a + 1 - m1)

Post-Crime = m2*a*X + (1 - m2)*X = X*(m2*a + 1 - m2)   (so the X drops out of the ratio)

% Change = (Post - Pre) / Pre = Post/Pre - 1

At least that is easier for me to wrap my head around. Also should m1 and m2 be the overall share of young adults? Not just limited to immigrants? (Since the estimated crime reduction is for everybody, not just crimes committed by immigrants?)
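And a quick numeric check of that algebra in Python (the shares and the relative-offending multiplier below are made-up illustration values, not numbers from the paper):

def pct_change(m1, m2, a):
    # X cancels in the ratio, so set X = 1
    pre = m1*a + (1 - m1)
    post = m2*a + (1 - m2)
    return post/pre - 1

# e.g. the share of young males falls from 20% to 17%, and they offend at 2.5x the base rate
print(pct_change(m1=0.20, m2=0.17, a=2.5))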


Title: How do close-circuit television cameras impact crimes and clearances? An evaluation of the Milwaukee Police Department’s public surveillance system

Well written paper. Uses appropriate quasi-experimental methods (matching and diff-in-diff) to evaluate reductions in crimes and potential increases in case clearances.

I have some minor requests on analysis/descriptive stats, but things that can be easily addressed by the authors.

First, I would request a simpler pre-post DiD table. The panel models not only introduce more complications of interpretation, but they are also based on asymptotics that I’m not sure are met here.

So if you do something like:

            Pre Crime   Post Crime   Difference    DiD
Treated        100          80          -20        -30
Control        100         110           10

It is much more straightforward to interpret, and I provide a stat test and power analysis advice in https://crimesciencejournal.biomedcentral.com/articles/10.1186/s40163-018-0085-5. Your violent crime counts are very low, so I think you would need unrealistically large effects (of say 50% reductions) to detect an effect with your group sizes.

You can do the same table for % clearances, and do whatever binomial test of proportions you like (which will have much more power than the regression equation). And again it is simpler to interpret.
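To illustrate how simple that test is, here is a minimal sketch of a two-sample test of proportions in Python; the clearance and crime counts are hypothetical, not the Milwaukee numbers:

from statsmodels.stats.proportion import proportions_ztest

cleared = [30, 18]    # cleared cases (hypothetical): [post period, pre period]
crimes  = [120, 100]  # total crimes in each period (hypothetical)

z, p = proportions_ztest(count=cleared, nobs=crimes)
print(f"z = {z:.2f}, p = {p:.3f}")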

Technically, doing a Poisson regression of the clearance counts with total crimes as an exposure is not the same as modelling clearances as a proportion of total crimes. The predicted clearance proportion can technically go above 1 (so you can have %’s above 100%). It would be more appropriate to use binomial regression, so something like below in Stata:

glm arrests i.Treat i.Post Treat#Post i.Unit, family(binomial crimes) link(logit)

(There is no xtbinomial unfortunately; maybe you can coerce xtlogit with weights to do it, or use meglm since you are doing random effects. I think you should probably do fixed effects here anyway.) I don’t know if it will make a difference in the results here though.

Andy Wheeler


Title: The formation of suspicion: A vignette study

Well done experimental study examining suspiciousness using a vignette approach. There is no power analysis done, but it is a pretty large sample, and only examines first order effects. (With a note about examining a well powered interaction effect described in the analysis section.)

My only big ask is that the analysis should include a dummy variable for the two different sampling frames (e.g. a dummy variable for 1=New York). Everything else is minor/easy stuff.

Minor stuff:

  • How were the vignettes randomized? (They obviously were, balance is really good!)
  • For the discussion, it is important to understand these characteristics that start an interaction because of another KT heuristic bias – anchoring effects. Paul Taylor has some recent work of interest on dispatch priming that is relevant. (Also Dan Mears had a recent overview paper on biases/heuristics in Journal of Criminal Justice that I think should probably be cited.)
  • For Table 1 I would label the “Dependent Variable” with a more descriptive label (suspiciousness)
  • Also, people typically code the variables 0/1 instead of 1/2. It won’t actually impact the analysis here, since it is just a linear shift of +1 (it just changes the intercept term). The variables Agency Size, Education, & Experience are coded as ordinal variables though, and they should maybe be included as dummy variables for each category. (I don’t think this will make a big difference for the main randomized variables of interest though.)

Title: The criminogenic effect of marijuana dispensaries in Denver, Colorado: A microsynthetic controls quasi-experiment and cost-benefit analysis

This work is excellent. It is a timely topic, as many states are still considering whether to legalize and what that will exactly look like.

The work is a replication/extension of a prior study also in Denver, but this is a stronger research design. Using the micro units allows for matching on pre-trends and drawing synthetic controls, which is a stronger design than prior pre/post work at neighborhood level (both in Denver and in LA). Micro units are also more relevant to test direct criminogenic effects of the stores. The authors may be interested in https://www.tandfonline.com/doi/full/10.1080/0735648X.2019.1582351, which is also a stronger research design using matched comparison groups, but is only for medical dispensaries in DC and is a much smaller sample.

Even if one considers that to be a minor contribution (the crime increase findings are similar in magnitude to the Hughes JQ paper), the cost benefit analysis is a really important contribution. It hits on all of the important considerations – that even if the books of costs/benefits are balanced, the costs and benefits are relevant to really different segments of society. So even if tax revenue offsets the books, places may still not want to take on that extra crime burden.

Only two (very minor) suggestions. One, some of the permutation lines are outside of the figure boundaries. Two, I would like a brief aside in the discussion mentioning the trade-off of economic investment making places busier (which fits right into the current cost-benefit discussion). Likely if you added 100 ice-cream shops crime might go up due to increased commercial activity – weed has the same potential negative externality – but it is not necessarily worse than, say, opening a bunch of bars or convenience stores. (The same thing is relevant for any BID, https://journals.sagepub.com/doi/full/10.1177/0011128719834559.)


Title: Understanding the Predictors of Street Robbery Hot Spots: A Matched Pairs Analysis and Systematic Social Observation

Note: Reviewed for Justice Quarterly and rejected. Final published version is at Crime & Delinquency (which I was not a reviewer for)

The article uses the case-control method to match high-crime to low-crime street segments, and then examine local land use factors (bars, convenience stores, etc.) as well as the more novel source of physical disorder coded from Google Street View images. The latter is the main motivation for the case-control design, as manual coding prevents one from doing the entire city.

By their nature, case-control designs do not let you manipulate the number of cases you have at your disposal. Thus the majority of such designs focus on ONE exposure of interest, and match on any other characteristic that is known to affect the outcome but is not of direct interest to the study. E.g. if you examined lung cancer given exposure to vehicle emissions, you would need to match controls as to whether they smoked or not. This allows you to assess the exposure of interest with the maximum power given the design limitations, although you can’t say anything about smoking vs not smoking on the outcome.

The authors here match within neighborhoods and on several other local characteristics, but then go on to examine different crime generators (7 factors), as well as the two disorder coded variables. Given the fairly small sample size of 129 matched cases, this is likely a pretty underpowered research design. Logistic regression relies on asymptotic properties, and even with fewer variables it is questionable whether 260 cases is sufficient, see https://www.tandfonline.com/doi/abs/10.1080/02664763.2017.1282441. Thus you get, in abstract terms, fairly large odds ratios that are still not significant (e.g. physical disorder has an odds ratio of 1.7, but is insignificant). So you have low power to test all of those coefficients.
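To give a rough sense of the power issue, here is a simulation sketch in Python; the number of predictors, the base rates, and the odds ratio of 1.7 are my own illustrative assumptions, not the paper’s data:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n, reps, hits = 260, 500, 0

for _ in range(reps):
    X = rng.binomial(1, 0.3, size=(n, 8))             # 8 binary crime-generator dummies
    d = rng.binomial(1, 0.4, size=n)                  # disorder measure of interest
    xb = -0.5 + np.log(1.7)*d + X @ np.full(8, 0.2)   # true odds ratio of 1.7 on disorder
    y = rng.binomial(1, 1/(1 + np.exp(-xb)))
    fit = sm.Logit(y, sm.add_constant(np.column_stack([d, X]))).fit(disp=0)
    hits += fit.pvalues[1] < 0.05                     # p-value on the disorder term

print(hits / reps)                                    # simulated power at n = 260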

I believe a stronger research design would focus on the novel measures (the Google Street View disorder), and match on the crime generator variables. The crime generator factors have been well established in prior research, so that part is a small contribution. The front end focuses on the typical crime generator typology, and lacks any broader “so what” about the disorder measures. (Such an argument can be made, focusing on broken windows theory and the controversy it has caused.)

It would be feasible for the authors to match on the crime generators, but it would result in different control cases, so they would need to code additional locations. (If you do match on crime generators, I think it is OK to include 0 crime areas in the pool. The main reason it is sometimes not fair to include 0 crime areas is that they are things like a bridge or a park.)

Minor notes:

  • You cannot interpret the coefficients for the variables you matched on in causal terms in the conclusion. (Top of page 23.) They only say the extent to which your matching was successful, not anything about causality. Later on you also attempt to weasel out of causal interpretations (page 26). Saying this is not causality, but otherwise interpreting the regression coefficients as if they have any meaning, is an act of cognitive dissonance.
  • Given that there is perfect separation between convenience stores and hot spots, the model should have infinite standard errors for that factor. (You have a few coefficients that appear to have explosive standard errors.)
  • I wouldn’t describe it as matching on the dependent variable [bottom page 8]. I realize it is confusing mixing propensity score terminology with case-control (although the method is fine). You match cases to controls on the independent variables you choose at the onset.
  • page 5, Dan O’Brien in his work has shown that 311 has super-callers. This fits right into your point of how coding of images may be better, as a single person can bias the measure at micro places. (This is sort of the opposite of the not-calling problem you mention.)
  • You may be interested, some folks have tried to automate the scoring part using computer vision, see https://ieeexplore.ieee.org/document/6910072?tp=&arnumber=6910072&url=http:%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6910072 or https://mikebader.net/projects/canvas-usa/ . George Mohler had a talk at ASC where he used Google’s automated image labeling to identify disorder (like graffiti) in pictures https://cloud.google.com/vision/docs/labels

 

The length it takes from submission to publication

The other day I received a positive comment about my housing demolition paper. It made me laugh a bit inside — it felt like I had finished that work so long ago that they were talking about history. That paper is not so ancient though: I submitted it 8/4/17, it went through one round of revision, and I got the email from Jean McGloin for conditional acceptance on 1/16/18. It then came online first a few months later (3/15/18), and is in the current print issue of JRCD, which came out in May 2018.

This ignores the time it takes from conception to finishing a project (we started the project sometime in 2015), but focusing just on the publishing process, this is close to the best case scenario for the life-cycle of a paper through peer reviewed journals in criminology & criminal justice. The realistic best case scenario typically is:

  • Submission
  • Wait 3 months for peer reviews
  • Get chance to revise-resubmit
  • Wait another 3 months for second round of reviews and editor final decision

So ignoring the time it takes for editors to make decisions and the time for you to turn around edits, you should not bank on a paper being accepted in under 6 months. There are exceptions to this – some journals/editors don’t bother with the second three month wait period for reviewers to look at your revisions (which I think is the correct way to do it), and sometimes you will get reviews back faster or slower than three months – but that realistic scenario is the norm for most journals in the CJ/Crim field. Things that make this process much slower (multiple rounds of revisions, editors taking time to make decisions, the time it takes to make extensive revisions) are much more common than things that make it go faster (I’ve only heard myths about a uniform accept on the first round without revisions).

Not having tenure, this is something that is on my mind. It is a bit of a rat race trying to publish all the papers expected of you, and due to the length of peer review times you essentially need to have your articles out and under review well before your tenure deadline is up. The six month lag is the best case scenario, in which your paper is accepted at the first journal you submit to. The top journals are uber competitive though, so you often have to go through that process multiple times due to rejections.

So to measure that time I took my papers, including those not published, to see what this life-cycle time is. If I only included those that were published it would bias the results to make the time look shorter. Here I measured the time it took from submission of the original article until when I received the email of the paper being accepted or conditionally accepted. So I don’t consider the lag time at the end with copy-editing and publishing online, nor do I consider up front time from conception of the project or writing the paper. Also I include three papers that I am not shopping around anymore, and censored them at the date of the last reject. For articles still under review I censored them at 5/9/18.

So first, for 25 of my papers that have received at least one editorial decision, here is a graph of the typical number of rejects I get for each paper. A 0 for a paper means it was published at the first journal I submitted to, a 1 means I had one reject and was accepted at the second journal I submitted the paper to, etc. (I use "I" but this includes papers I am a co-author on as well.) The Y axis shows the total percentage, and the label for each bar shows the total N.

So the proportion of my papers that are accepted on the first round is 28%, and I have a mean of 1.6 rejections per article. This does not take into account censoring (I am not sure how to for this estimate), which biases the estimate of rejects per paper downward, as it includes some articles under review now that will surely be rejected at some point after writing this blog post.

The papers with multiple rejects run the typical gamut of why academic papers are sometimes hard to publish. Null results, a hostile reviewer at multiple places, controversial findings. It also illustrates that peer review is not necessarily a beacon showing the absolute truth of an article. I’m pretty sure everything I’ve published, even papers accepted at the first venue, have had one reviewer with negative comments. You could find reasons to reject the findings of anything I write that has been peer reviewed — same as you can think many of my pre-print articles are correct or useful even though they do not currently have a peer review stamp of approval.

Most of those rejections add about three months to the life-cycle, but some can be faster (these include desk rejections), and some can be slower (rejections on later rounds of revisions). So using those begin times, end times, and taking into account censoring, I can estimate the typical survival time of my papers within the peer-review system, lumping all of those different factors together into the total time. Here is the 1 - survival chart, so it can be interpreted as the cumulative probability of acceptance by a given number of days. This includes 26 papers (one more that has not had a first decision), so this estimate does account for papers that are censored.

The Kaplan-Meier estimate of the median survival times for my papers is 290 days. So if you want a 50% chance of your article being published, you should expect 10 months based on my experience. The data is too sparse to estimate extreme quantiles, but say I want an over 80% probability of an article being published based on this data, how much time do I need? The estimate based on this data is at least 460 days.
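For anyone who wants to reproduce this kind of estimate for their own papers, here is a minimal sketch using the lifelines package in Python; the durations below are placeholders, and my actual data and SPSS code are linked below:

from lifelines import KaplanMeierFitter

days     = [150, 200, 290, 310, 400, 460, 520, 180]  # days from submission to final decision, per paper
accepted = [1,   1,   1,   0,   1,   1,   0,   1]    # 0 = censored (still under review or shelved)

kmf = KaplanMeierFitter()
kmf.fit(days, event_observed=accepted)

print(kmf.median_survival_time_)   # median days until acceptance
print(1 - kmf.survival_function_)  # cumulative probability of acceptance by each day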

Different strategies will produce different outcomes — so my paper survival times may not generalize to yours, but I think that estimate will be pretty reasonable for most folks in Crim/CJ. I try to match papers to journals that I think are the best fit (so I don’t submit everything to Criminology or Justice Quarterly on the first go), so I have a decent percent of papers that land on the first round. If I submitted first to more mediocre journals overall, my survival times would be shorter. But even many mid-tiered journals in our field have overall acceptance rates below 10%, and nothing I submit do I ever think is really a slam dunk sure thing, so I don’t think my overall strategy is the biggest factor. Some of that survival time is my fault and includes time editing the article in between rejects and revise-resubmits, but the vast majority of it is simply waiting on reviewers.

So the sobering truth for those of us without tenure is that, based on my estimates, you need to have your journal articles out the door well over a year before you go up for review to really ensure that your work is published. I have a non-trivial chunk of my work (near 20%) that has taken over one and a half years to publish. For folks currently getting their PhD it is the same pressure really, since to land a tenure track job you need to have publications as well. (That is actually, I think, one reasonable argument for taking a longer time writing your dissertation.) And that is just for the publishing part — it does not include actually writing the article or conducting the research. The nature of the system is very much delayed gratification in having your work finally published.

Here is a link to the data on survival times for my papers, as well as the SPSS code to reproduce the analysis.

Roadblocks in Buffalo update (plus more complaints about peer-review!)

I’ve updated the roadblocks in Buffalo manuscript due to a rejection and subsequent critiques. So be prepared for my complaints about the peer review!

I’ve posted the original manuscript, reviews and a line-by-line response here. This was reviewed at Policing: An International Journal of Police Strategies & Management. I should probably always do this, but I felt compelled to post this review by the comically negative reviewer 1 (worthy of an article on The Allium).

The comment from reviewer 1 that really prompted me to even bother writing a response was the critique of the maps. I spend a lot of time making my figures nice and understandable. I’m all ears if you think they can be improved, but best be prepared for my response if you critique something silly.

So here is the figure in question – spot anything wrong?

The reviewer stated it did not have a legend, so it does not meet "GIS standards". The lack of a legend is intentional. When you open Google Maps, does it have a legend? Nope! It is a positive thing to make a graphic simple enough that it does not need a legend. This particular map only has three elements: the outline of Buffalo, the streets, and the points where the roadblocks took place. There is no need to make a little box illustrating these three things – they are obvious. The title is sufficient to know what you are looking at.

Reviewer 2 was more even keeled. The only thing I would consider a large problem in their review was they did not think we matched comparable control areas. If true I agree it is a big deal, but I’m not quite sure why they thought this (sure balance wasn’t perfect, but it is pretty close across a dozen variables). I wouldn’t release the paper if I thought the control areas were not reasonable.

Besides arbitrary complaints about the literature review, this is probably the most frustrating thing about peer reviews. Often you will get a list of two dozen complaints, with most being minor and fixable in a sentence (if not entirely arbitrary), but the article will still be rejected. People have different internal thresholds for what is or is not publishable. I’m on the end that, even with warts, most of the work I review should still be published (or at least the authors given a chance to respond). Of the 10 papers I’ve reviewed, my record is 5 revise-and-resubmits, 4 conditional accepts, and 1 rejection. One of the revise-and-resubmits I gave a pretty large critique of (in that I didn’t think it was possible to improve the research design), but the other 4 would be easily changed to accepts after addressing my concerns. So worst case scenario I’ve given the green light to 8/10 of the manuscripts I’ve reviewed.

Many reviewers are at the other end though. Sometimes comically so, in that given the critiques nothing would ever meet their standards. I might call it the golden-cow peer review standard.

Even though both of my manuscripts have been rejected from PSM, I do like their use of a rubric. This experience makes me wonder: what if the reviewers did not give a final reject-accept decision, and the editors just took the actual comments and made their own decision? Editors do a version of this currently, but some are known to reject if any of the reviewers gives a rejection, no matter what the reviewers actually say. It would force the editor to use more discretion if the reviewers themselves did not make the final judgement. It would also force reviewers to be clearer in their critiques. If they are superficial the editor will ignore them, whereas the final accept-reject is easy to take into account even if the review does not state any substantive critiques.

I don’t know if I can easily articulate what I think is a big deal and what isn’t though. I am a quant guy, so the two instances where I rejected were for model identification in one and for sample selection biases in the other – things that essentially could not be changed. I haven’t read a manuscript that was so poor I considered it unsalvageable in terms of writing. (I will do a content analysis of reviews I’ve received sometime, but almost all complaints about the literature review are arbitrary and shouldn’t be used as reasons for rejection.)

Oftentimes I write a bunch of notes on the paper manuscript on my first read, and then when I go to write up the critique specifically I edit them out. This often catches silly initial comments of mine, as I come to better understand the manuscript. Examples of silly comments in the reviews of the roadblock paper are claiming I don’t conduct a pre-post analysis (reviewer 1), and asking for things already stated in the manuscript (reviewer 2 asking for how long the roadblocks were and whether they were "high visibility"). While it is always the case things could be explained more clearly, at some point the reviewer needs to be more careful in their reading of the manuscript. I think my motto of "be specific" helps with this. Being generic helps to conceal silly critiques.

Posting my peer reviews on Publons and a few notes about reviewing

Publons is a service that curates your peer review work. In a perfect world this would be done by publishers – they would just forward your reviews with some standard meta-data. This would be useful when one paper is reviewed multiple times, as well as for identifying good vs. poor reviewers. I forget where I saw the suggestion recently (maybe on the orgtheory or scatterplot blog), but someone mentioned it would be nice if, when you submit your paper to a different journal, you could forward your previous reviews at your discretion. I wouldn’t mind that at all, as often the best journals will reject for lesser reasons because they can be more selective. (Also I’ve gotten copy-paste identical reviews from different journals, even though I had updated the draft to address some of the comments. Forwarding would allow me to address those comments directly before the revise-resubmit decision.)

I’ve posted all of my reviews so far, but they are only public if the paper is eventually accepted. So here you can see my review for the recent JQC article Shooting on the Street: Measuring the Spatial Influence of Physical Features on Gun Violence in a Bounded Street Network by Jie Xu and Elizabeth Griffiths.

I’ve done my fair share of complaining about reviews before, but I don’t think the whole peer-review process is fatally flawed despite its low reliability. People take peer review a bit too seriously at times – but that is a problem for most academics in general. Even if you think your idea is not getting a fair shake, just publish it yourself on your website (or places like SSRN and ArXiv). This of course does not count towards things like tenure – but valuing quantity over quality is another separate problem currently in academia.


In the spirit of do unto others as you would have them do unto you, here are two main points I try to abide by when I review papers.

  • be as specific as possible in your critique

There is nothing more frustrating than getting a vague critique ("the paper has multiple misspellings and grammar issues"). A frequent one I have come across (both in reviews of my papers and in comments others have made on papers I’ve reviewed) is about the framing of the paper – a.k.a. the literature review. (Which makes sending the paper to multiple journals so frustrating – you will always get more arbitrary framing debates each time with new reviewers.)

So for a few examples:

  • (bad) The literature review is insufficient
  • (good) The literature review skips some important literature, see specifically (X, 2004; Y, 2006; Z, 2007). The description of (A, 2000) is awkward/wrong.
  • (bad) The paper is too long, it can be written in half the length
  • (better) The paper could be shortened, section A.1 can be eliminated in my opinion, and section A.2 can be reduced to one paragraph on X.

Being specific provides a clear path for the author to correct what you think, or at least respond if they disagree. The "you can cut the paper in half" I don’t even know how to respond to effectively, nor the generic complaint about spelling. One I’ve gotten before is "x is an inappropriate measure" with no other detail. This is tricky because I have to guess why you think it is inappropriate, so I have to make your argument for you (mind read) and then respond to why I disagree (which obviously I do, or I wouldn’t have included that measure to begin with). So to respond to a critique I first have to critique my own paper – maybe this reviewer is more brilliant than I thought.

Being specific I also think helps cut down on arbitrary complaints that are arguable.

  • provide clear signals to the editor, both about main critiques and the extent to which they can be addressed

Peer review has two potential motivations: one is gate-keeping, and one is improving the draft. Oftentimes arbitrary advice by reviewers intended for the latter is not clearly delineated in the review, so it is easily confused for evidence pertinent to the gate-keeper function. I’ve gotten reviews of 20 bullet points or 2,000 words that make the paper seem poor due to the sheer length of the comments, but the majority are minor points or arbitrary suggestions. Longer reviews actually suggest the paper is better – if there is something clearly wrong you can say it in a much shorter space.

Gabriel Rossman states these different parts of peer review a bit more succinctly than I do:

You need to adopt a mentality of “is it good how the author did it” rather than “how could this paper be made better”

I think this is a good quip to follow. I might add “don’t sweat the small stuff” to that as well. Some editors will read the paper and reviews and make judgement calls – but some editors just follow the reviewers blindly – so I worry with the 20 bullet point minor review that it unduly influenced a reject decision. I’m happy to respond to the bullets, and happy you took the time, but I’m not happy about you (the reviewer) not giving clear advice to the editor of the extent to which I can address those points.

I still give advice about improving the manuscript, but I try to provide clear signals to the editor about the main critiques, and I also explicitly state whether they can be addressed. The "can be addressed" is not for the person writing the paper – it is for the editor making the decision about whether to offer a revise-and-resubmit! The main critiques in my experience entail 2-3 main points (or none at all for some papers). I also typically say when things are minor and put them in a separate section, which editors can pretty much ignore.

Being a quantitative guy, the critiques that frustrate me the most are complaints about model specifications. Some are legitimately major complaints, but oftentimes they are things that are highly unlikely to greatly influence the reported results. Examples are adding/dropping/changing a particular control variable, or changing the caliper used for propensity score matching. Note I’m not saying you shouldn’t ask to see differences; I’m asking that you clearly articulate why your suggestion is preferable and make an appropriate judgement as to whether it is a big problem or a little problem. A similar complaint is about what information to include in tables, or in the main manuscript vs. the appendix. The author already has the information, so it is minor editing, not a major problem.


While I am here, I will end with three additional complaints that don’t fit in anywhere else in my post. One, multiple rounds of review are totally a waste. The life cycle of the paper-review process should be:

paper -> review -> editor decision -> reject/accept
                                      or
                                      revise-resubmit -> new paper + responses to reviews -> editor decision

The way the current system works, I have to submit another review after the new paper has been submitted. I would rather the editor take the time to see if the authors sufficiently addressed the original complaints, because as a reviewer I am not an unbiased judge of that. So if I say something is important and you retort that it is not, what else do you want me to say in my second review? It is the editor’s job at that point to arbitrate disagreements. This would stop the cycle of multiple rounds of review, which have very large diminishing returns in improving the draft.

This then leads into my second complaint, generally about keeping a civil tone for reviews. In general I don’t care if a reviewer is a bit gruff in the review – it is not personal. But since reviewers have a second go, when I respond I need to keep an uber deferential tone. I don’t think that is really necessary, and I’d rather original authors have similar latitude to be gruff in responses. Reviewers say stupid things all the time (myself included) and you should be allowed to retort that my critique is stupid! (Of course with reasoning as to why it is stupid…)

Finally, I had one reviewer start recently:

This paper is interesting and very well written…I will not focus on the paper’s merits, but instead restrict myself to areas where it can be improved.

The good is necessary to signal to the editor whether a paper should be published. I’ve started to try in my own reviews to include more of the good (which is definitely not the norm) and to argue why a paper should be published. You can see in my linked review of the Xu and Griffiths paper that by the third round I simply gave arguments why the paper should be published, despite a disagreement about the change-point model they reported in the paper.