Spatial consistency of shootings and NIJ recid working papers

I have two recent working papers out:

The NIJ forecasting paper is the required submission by NIJ. Gio and I will likely try to turn this into a real paper in the near future. I’d note George Mohler and Michael Porter did the same thing as us: clip the probabilities to under 0.5 to win the fairness competition.
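To make the clipping trick concrete, here is a minimal sketch in Python. It assumes the fairness penalty is driven by the difference in false positive rates between groups at a 0.5 threshold; the function names, the 0.4999 cap, and the toy data are all made up for illustration and are not the code from either paper.

```python
def clip_probs(probs, cap=0.4999):
    """Cap predicted probabilities so none reaches the 0.5 threshold (cap is an assumed value)."""
    return [min(p, cap) for p in probs]


def false_positive_rate(probs, outcomes, threshold=0.5):
    """Share of true negatives (outcome == 0) predicted at or above the threshold."""
    negatives = [p for p, y in zip(probs, outcomes) if y == 0]
    if not negatives:
        return 0.0
    return sum(p >= threshold for p in negatives) / len(negatives)


# Hypothetical predictions and outcomes for two groups
probs_a, outcomes_a = [0.2, 0.7, 0.55, 0.1], [0, 0, 1, 0]
probs_b, outcomes_b = [0.3, 0.45, 0.8, 0.6], [0, 0, 1, 1]

# Gap in false positive rates before clipping
raw_gap = abs(false_positive_rate(probs_a, outcomes_a)
              - false_positive_rate(probs_b, outcomes_b))

# After clipping nobody is predicted positive, so both rates are 0 and the gap vanishes
clipped_gap = abs(false_positive_rate(clip_probs(probs_a), outcomes_a)
                  - false_positive_rate(clip_probs(probs_b), outcomes_b))

print(raw_gap, clipped_gap)
```

The cost is that the capped predictions are slightly less accurate for the highest risk cases, so whether it pays off depends on how the accuracy and fairness pieces of the score are combined.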

NIJ was interested in “what variables are the most important”. I will need to slate a longer blog post about this in the future, but this generally is not the right way to frame predictive challenges. You do not need real in-depth understanding of the underlying system, and many times different effects can be swapped out for one another (e.g. Dawes, 1979).

The paper on shootings in Buffalo is consistent with my blog posts on shootings in NYC (precincts, grid cells). Even though shootings have gone up by quite a bit in Buffalo overall, the spatial distribution is very consistent over time. This appears similar to a recent paper by Jeff Brantingham and company as well.

It is a good use case for the differences in SPPT results when adjusting for multiple comparisons. We get an S index of 0.88 without adjustments (see below distribution of p-values). These p-values are consistent with random data though, so when doing a false discovery rate correction we have 0 areas below 0.05.
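To illustrate why the correction wipes out the unadjusted findings, here is a small sketch. It assumes a Benjamini-Hochberg style false discovery rate adjustment (one common choice, not necessarily the exact procedure in the paper) and uses made-up, evenly spread p-values to mimic what random data would produce.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so the adjusted
    # values never decrease as the raw p-values increase
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for 100 areas, spread evenly over (0, 1) like random data
pvals = [(i + 0.5) / 100 for i in range(100)]
adj = bh_adjust(pvals)

raw_hits = sum(p < 0.05 for p in pvals)   # areas flagged without any adjustment
adj_hits = sum(p < 0.05 for p in adj)     # areas flagged after the FDR adjustment
print(raw_hits, adj_hits)
```

With p-values that look uniform, a handful of areas always fall below 0.05 by chance alone, but none survive once the adjustment accounts for the number of tests.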

If you look at the maps there is some fuzzy evidence of shifts, but it is quite weak overall. Also one thing I mention here is that even though we have hot spots of shootings, even the hottest grid cells only have 1 shooting a month. Not clear to me if that is sufficient density (if only considering shootings) to really justify a hot spots approach.

References

  • Brantingham, P. J., Carter, J., MacDonald, J., Melde, C., & Mohler, G. (2021). Is the recent surge in violence in American cities due to contagion? Journal of Criminal Justice, 76, 101848.
  • Circo, G., & Wheeler, A. (2021). National Institute of Justice Recidivism Forecasting Challenge Team “MCHawks” Performance Analysis. CrimRxiv. https://doi.org/10.21428/cb6ab371.9aa2c75a
  • Dawes, R. M. (1979). The robust beauty of improper linear models in decision making. American Psychologist, 34(7), 571.
  • Drake, G., Wheeler, A., Kim, D.-Y., Phillips, S. W., & Mendolera, K. (2021). The Impact of COVID-19 on the Spatial Distribution of Shooting Violence in Buffalo, NY. CrimRxiv. https://doi.org/10.21428/cb6ab371.e187aede
  • Mohler, G., & Porter, M. D. (2021). A note on the multiplicative fairness score in the NIJ recidivism forecasting challenge. Crime Science, 10(1), 1-5.

CrimRxiv, Alt-Journal Contributions, and Mike Maltz’s Retrospective

As I’m sure followers of mine know, I am a big proponent of posting pre-prints. Scott Jacques has spearheaded a specifically criminology-focused pre-print server titled CrimRxiv. It is still in beta but anyone can contribute a paper if they want.

One of the things Scott and I have been jamming about is how to leverage CrimRxiv to make a journal that not only takes advantage of all the goodies on the internet, such as being able to embed interactive graphics or other rich media directly in a journal article, but also really widens the scope of what ‘counts’ in terms of scholarly contribution. Why can’t things like a cool app, or a really good video lecture you edited, or a blog post illustrating code be put on the same level with journal articles?

Part of the reason I am writing this blog post is that I saw Michael Maltz recently publish a retrospective on his career on Academia.edu. This isn’t a typical journal article, but despite that there is no reason why you shouldn’t share such pieces. So I was able to convince Mike to post A Retrospective Look at My Professional Life to CrimRxiv. When he first posted it on Academia.edu, here was my response on how Mike (despite us never having crossed paths) has influenced my career.


Hi Michael and thank you for sharing,

I’ve followed your work since a grad student at Albany. I initially got hooked on data viz based on Tufte’s book. When I looked for examples of criminologists discussing data viz you were the only one I found. That was sometime around 2010, so you had that chapter in the handbook of quantitative crim. You also had another article about drawing glyphs to illustrate life course transitions I was familiar with.

When I finished my classes at SUNY, I then worked at Troy as a crime analyst while finishing my dissertation. I doubt any of the coffee shops were the same from your time, but I did like walking over to Famous hotdogs for lunch every now and then.

Most of my work at the PD was making time series graphs and maps. No regression, so most of my stats training was not particularly useful. Even my mapping course I took focused on areal data analysis was not terribly relevant.

I tried to do similar projects to your glyph life-courses with interval censored crime data, but I was never really successful with that, they always ended up being too complicated with even moderately large crime datasets, see https://andrewpwheeler.com/2013/02/28/interval-graph-for-viz-temporal-overlap-in-crime-events/ and https://andrewpwheeler.com/2014/10/02/stacking-intervals/ for my attempts.

What was much more helpful was simply doing monitoring metrics over time, simple running means, and then I just inverted the CDF of the Poisson to give error bars, e.g. https://andrewpwheeler.com/2016/06/23/weekly-and-monthly-graphs-for-monitoring-crime-patterns-spss/. Then cases that were outside the error bands signified an anomalous pattern. In Troy there was an arrest of a single prolific person breaking into cars, and the trend went from a creeping 10 year high to a 10 year low instantly in those graphs.

So there again we have your work on the Poisson distribution and operations research in that JQC article. Also sometime in there I saw a comment you made on Andrew Gelman’s blog pointing to your work with error bands for BJS. Took that ‘fan chart’ idea later on and provided error bands for city level and USA level homicide trends, e.g. https://apwheele.github.io/MathPosts/FanChart_NewOrleans.html. Most of the popular discussion of large scale crime trends is misguided, over-interpreting short term noise, in my opinion.

So all my degrees are in criminal justice, but I have been focusing more on linear programming over time borrowing from operations researchers as well, https://andrewpwheeler.com/2020/05/29/an-intro-to-linear-programming-for-criminologists/. I’ve found that taking outputs from a predictive model and then applying a decision analysis to specifically articulate strategies CJ agencies should take is much more fruitful than the typical way academic research is done.

Thank you again for sharing your story and best, Andy Wheeler
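For readers curious about the Poisson error bars mentioned in the note above, here is a rough sketch of the idea in Python. It is not the SPSS code from the linked post: the window length, the 99% coverage, and the example counts are all assumptions for illustration. The bands come from finding the Poisson quantiles around the mean of the prior periods.

```python
import math


def poisson_quantile(q, mean):
    """Smallest count k with P(X <= k) >= q for X ~ Poisson(mean)."""
    k = 0
    pmf = math.exp(-mean)
    cdf = pmf
    while cdf < q:
        k += 1
        pmf *= mean / k   # recurrence P(X = k) = P(X = k - 1) * mean / k
        cdf += pmf
    return k


def poisson_band(mean, coverage=0.99):
    """Lower and upper counts of a central band with roughly the given coverage."""
    tail = (1 - coverage) / 2
    return poisson_quantile(tail, mean), poisson_quantile(1 - tail, mean)


def flag_anomalies(counts, window=8, coverage=0.99):
    """Compare each count to a band built from the mean of the prior `window` counts."""
    results = []
    for t in range(window, len(counts)):
        prior_mean = sum(counts[t - window:t]) / window
        low, high = poisson_band(prior_mean, coverage)
        outside = not (low <= counts[t] <= high)
        results.append((t, counts[t], low, high, outside))
    return results


# Hypothetical weekly counts with a spike in the final week
weekly = [10, 12, 9, 11, 8, 10, 13, 7, 30]
for row in flag_anomalies(weekly):
    print(row)
```

Counts that land outside the band are the ones worth a second look; everything inside is within what you would expect from chance variation around the recent average.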

Why I publish preprints

I encourage peers to publish preprint articles — journal articles before they go through the whole peer review process and are published. It isn’t normative in our field, and I’ve gotten some pushback from colleagues, so figured I would put on paper why I think it is a good idea. In short, the benefits (increased exposure) outweigh the minimal costs of doing so.

The good — getting your work out there

The main benefit of posting preprints is to get your work more exposure. This occurs in two ways: one is that traditional peer-review work is often behind paywalls. This prevents the majority of non-academics from accessing your work. This point about paywalls applies just the same to preventing other academics from reading your work in some cases. So while the prior blog post I linked by Laura Huey notes that you can get access to some journals through your local library, it takes several steps. By adding steps you basically lose out on some folks who don’t want to spend the time. Even through my university it is not uncommon for me to not be able to access a journal article. I can technically take the step of getting the article through inter-library loan, but that takes more time. Time I am not going to spend unless I really want to see the contents of the article.

This I consider a minor benefit. Ultimately if you want your academic work to be more influential in the field you need to write about your work in non-academic outlets (like magazines and newspapers) and present it directly to CJ practitioner audiences. But there are a few CJ folks who read journal articles you are missing, as well as a few academics who are missing your work because of that paywall.

A bigger benefit is actually that you get your work out much quicker. The academic publishing cycle makes it impossible to publish your work in a timely fashion. If you are lucky, once your paper is finished, it will be published in six months. More realistically it will be a year before it is published online in our field (my linked article only considers when it is accepted, tack on another month or two to go through copy-editing).

Honestly, I publish preprints because I get really frustrated with waiting on peer review. No offense to my peers, but I do good work that I want others to read. I do not need a stamp from three anonymous reviewers to validate my work. I would need to do an experiment to know for sure (having a preprint might displace some views/downloads from the published version), but I believe the earlier and open versions on average double the amount of exposure my papers would have had compared to just publishing in traditional journals. It is likely a much different audience than traditional academic crim people, but that is a good thing.

But even without that extra exposure I would still post preprints, because it makes me happy to self-publish my work when it is at the finish line, in what can be a miserably long and very much delayed gratification process otherwise.

The potential downsides

Besides the actual time cost of posting a preprint (I detail that more precisely in the next section; it isn’t much work), I will go through several common arguments for why posting preprints is a bad idea. I don’t believe they carry much weight, and have not personally experienced any of them.

What if I am wrong: Typically I only post papers either when I am doing a talk, or when it is ready to go out for peer review. So I don’t encourage posting really early versions of work. While even at this stage there is never any guarantee you did not make a big mistake (I make mistakes all the time!), the sky will not fall down if you post a preprint that is wrong. Just take it down if you feel it is a net negative to the scholarly literature (which is very hard to do; the results of hypothesis tests do not make the work a net positive or negative). If you think it is good enough to send out for peer review it is definitely at the stage where you can share the preprint.

What if the content changes after peer review: My experience with peer review is mostly pedantic stuff (lit. review/framing complaints, do some robustness checks for analysis, beef up the discussion). I have never had a substantive interpretation change after peer review. Even if you did, you can just update the preprint with the new results. While this could be bad (an early finding gets picked up that is later invalidated), this is again something very rare and a risk I am willing to take.

Note peer review is not infallible, so counting on peer review to catch your mistakes is mostly a false expectation. Peer review does not spin your work into gold; you have to do that yourself.

My ideas may get scooped: This I have never personally had happen to me. Posting a preprint can actually prevent this in terms of more direct plagiarism, as you have a time-stamped example of your work. In terms of someone taking your idea and rewriting it, this is a potential risk (the same risk as if you present at a conference), and really only applicable for folks working on secondary data analysis. With the preprint out, the other person should at least cite your work, but sorry, either presenting some work or posting a preprint does not give you sole ownership of an idea.

Journals will view preprints negatively: Or journals do not allow preprints. I haven’t come across a journal in our field that forbids preprints. I’ve had one reviewer (out of likely 100+ at this point) note as a negative that the preprint was posted (suggesting I was double publishing or plagiarizing my own work). An editor that actually reads reviews should know that is not a substantive critique. That was likely just a dinosaur reviewer that wasn’t familiar with the idea of preprints (and they gave an overall positive review in that one case, so did not get the paper axed). If you are concerned about this, just email the editor for feedback, but I’ve never had a problem from editors.

Peer reviewers will know who I am: This I admit is a known unknown. Peer review in our crim/cj journals is mostly double blind (most geography and statistics journals I have reviewed for are not, I know who the authors are). If you presented the work at a conference you have already given up anonymity, and the field is small enough that for a good chunk of work the reviewers can guess who the author is anyway. So your anonymity is often a moot point at the peer review stage.

So I don’t know how much reviewers are biased if they know who you are (it can work both ways, if you get a friend they may be more apt to give a nicer review). It likely can make a small difference at the margins, but again I personally don’t think the minor risk/cost outweighs the benefits.

These negatives are no doubt real, but again I personally find them minor enough risks to not outweigh the benefits of posting preprints.

The not hard work of actually posting preprints

All posting a preprint involves is uploading a PDF file of your work to either your website or a public hosting service. In my current workflow I keep the different components of a journal article in several Word documents (I don’t use LaTeX very often). (Word doesn’t work so well when it has one big file, especially with many pictures.) So then I export those components to PDF files, and stitch them together using a freeware tool, PDFtk. It has a GUI and command line, so I just have a bat file in my paper directory that lists something like:

pdftk.exe TitlePage.pdf MainPaper.pdf TablesGraphs.pdf Appendix.pdf cat output CombinedPaper.pdf

So just a double click to update the combined pdf when I edit the different components.
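If you would rather not use a bat file, the same stitching step can be sketched in Python. This is only an illustration, assuming pdftk is installed and on your PATH; the helper names are made up, and the file names are the example ones from the command above.

```python
import shutil
import subprocess


def build_pdftk_command(parts, output):
    """Build the pdftk call that concatenates `parts` (in order) into `output`."""
    return ["pdftk", *parts, "cat", "output", output]


def combine(parts, output):
    """Run pdftk if it is installed; return True when the command was run."""
    if shutil.which("pdftk") is None:
        print("pdftk not found on PATH, skipping")
        return False
    subprocess.run(build_pdftk_command(parts, output), check=True)
    return True


# Example component files, same order as in the bat file
parts = ["TitlePage.pdf", "MainPaper.pdf", "TablesGraphs.pdf", "Appendix.pdf"]
print(" ".join(build_pdftk_command(parts, "CombinedPaper.pdf")))

# To actually rebuild the combined file, call:
# combine(parts, "CombinedPaper.pdf")
```

Either way the point is the same: one command rebuilds the combined PDF whenever a component changes.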

Public hosting services to post preprints I have used in the past are Academia.edu, SSRN, and SocArXiv, although again you could just post the PDF on your webpage (and Google Scholar will eventually pick it up). I use SocArXiv now, as SSRN currently makes you sign up for an account to download PDFs (again a hurdle, the same as going through inter-library loan). Academia.edu also makes you sign up for an account, and has weird terms of service.

Here is an example paper of mine on SocArXiv. (Note the total downloads, most of my published journal articles have fewer than half that many downloads.) SocArXiv also does not bother my co-authors to create an account when I upload a paper. If we had a more criminal justice focused depository I would use that, but SocArXiv is fine.

There are other components of open science I should write about — such as replication materials/sharing data, and open peer reviewed journals, but I will leave those to another blog post. Posting preprints takes very little extra work compared to what academics are currently doing, so I hope more people in our field start doing it.