Roadblocks in Buffalo update (plus more complaints about peer-review!)

I’ve updated the roadblocks in Buffalo manuscript due to a rejection and subsequent critiques. So be prepared for my complaints about peer review!

I’ve posted the original manuscript, reviews and a line-by-line response here. This was reviewed at Policing: An International Journal of Police Strategies & Management. I should probably always do this, but I felt compelled to post this review by the comically negative reviewer 1 (worthy of an article on The Allium).

The comment of reviewer 1 that really prompted me to even bother writing a response was the critique of the maps. I spend a lot of time on making my figures nice and understandable. I’m all ears if you think they can be improved, but best be prepared for my response if you critique something silly.

So here is the figure in question – spot anything wrong?

The reviewer stated it did not have a legend, so it does not meet "GIS standards". The lack of a legend is intentional. When you open Google Maps, is there a legend? Nope! It is a positive thing to make a graphic simple enough that it does not need a legend. This particular map only has three elements: the outline of Buffalo, the streets, and the points where the roadblocks took place. There is no need to make a little box illustrating these three things, they are obvious. The title is sufficient to know what you are looking at.

Reviewer 2 was more even-keeled. The only thing I would consider a large problem in their review was that they did not think we matched comparable control areas. If true I agree it is a big deal, but I’m not quite sure why they thought this (sure balance wasn’t perfect, but it is pretty close across a dozen variables). I wouldn’t release the paper if I thought the control areas were not reasonable.

Besides arbitrary complaints about the literature review this is probably the most frustrating thing about peer reviews. Often you will get a list of two dozen complaints, with most being minor and fixable in a sentence (if not entirely arbitrary), but the article will still be rejected. People have different internal thresholds for what is or is not publishable. I’m on the end that even with warts most of the work I review should still be published (or at least the authors given a chance to respond). Of the 10 papers I’ve reviewed, my record is 5 revise-and-resubmits, 4 conditional accepts, and 1 rejection. One of the revise-and-resubmits I gave a pretty large critique of (in that I didn’t think it was possible to improve the research design), but the other 4 would be easily changed to accept after addressing my concerns. So worst case scenario I’ve given the green light to 8/10 of the manuscripts I’ve reviewed.

Many reviewers are at the other end though. Sometimes comically so, in that given the critiques nothing would ever meet their standards. I might call it the golden-cow peer review standard.

Even though both of my manuscripts have been rejected from PSM, I do like their use of a rubric. This experience makes me wonder: what if the reviewers did not give a final reject-accept decision, and the editors just took the actual comments and made their own decision? Editors do a version of this currently, but some are known to reject if any of the reviewers give a rejection no matter what the reviewers actually say. It would force the editor to use more discretion if the reviewers themselves did not make the final judgement. It would also force reviewers to be more clear in their critiques. If they are superficial the editor will ignore them, whereas the final accept-reject is easy to take into account even if the review does not state any substantive critiques.

I don’t know if I can easily articulate what I think is a big deal and what isn’t though. I am a quant guy, so the two instances I rejected were for model identification in one and for sample selection biases in the other. So things that could not be changed essentially. I haven’t read a manuscript that was so poor I considered it to be unsalvageable in terms of writing. (I will do a content analysis of reviews I’ve received sometime, but almost all complaints about the literature review are arbitrary and shouldn’t be used as reasons for rejection.)

Oftentimes I write a bunch of notes on the paper manuscript on my first read, and then when I go to write up the critique I edit them out. This often catches silly initial comments of mine, as I better understand the manuscript. Examples of silly comments in the reviews of the roadblock paper are claiming I don’t conduct a pre-post analysis (reviewer 1), and asking for things already stated in the manuscript (reviewer 2 asking for how long the roadblocks were and whether they were "high visibility"). While it is always the case things could be explained more clearly, at some point the reviewer(s) needs to be more careful in their reading of the manuscript. I think my motto of "be specific" helps with this. Being generic helps to conceal silly critiques.

Preprint – A Quasi-Experimental Evaluation Using Roadblocks and Automatic License Plate Readers to Reduce Crime in Buffalo, NY

I have a new preprint article posted on SSRN – A Quasi-Experimental Evaluation Using Roadblocks and Automatic License Plate Readers to Reduce Crime in Buffalo, NY. This is some work I have been conducting with Scott Phillips out at SUNY Buffalo (as well as Dae-Young Kim, although he is not on this paper).

Here is the abstract:

Purpose: To evaluate the effectiveness of a hot spots policing strategy: using automated license plate readers at roadblocks.

Design: Different roadblock locations were chosen by the Buffalo Police Department every day over a two month period. We use propensity score matching to identify a set of control locations based on prior counts of crime and demographic factors before the intervention took place. We then evaluate the reductions in Part 1 crimes, calls for service, and traffic accidents at roadblock locations compared to control locations.

Findings: We find modest reductions in Part 1 violent crimes (10 over all roadblock locations and over the two months) using t-tests of mean differences. We find a 20% reduction in traffic accidents using fixed effects negative binomial regression models. Both results are sensitive to the model used though, and the fixed effects models predict increases in crimes due to the intervention.

Research Limitations: The main limitations are the quasi-experimental nature of the intervention, the short length of the intervention, and that many micro places have low baseline counts of crime.

Originality/Value: This adds to literature on hot spots policing – in particular on the use of automated license plate readers and traffic enforcement at hot spots of crime. While the results are mixed, it provides some evidence that the intervention has potential to reduce crime.

And here is one figure from the paper, showing how street units are defined, and given the intersection the road block was stationed on how we determined the treated street units:

Feedback is always welcome!

Testing day-of-week crime randomness paper published

My paper, Testing Serial Crime Events for Randomness in Day of Week Patterns with Small Samples, was recently published in the Journal of Investigative Psychology and Offender Profiling. Here is the pre-print version on SSRN if you can’t get access to that journal.

The main idea behind the paper was: if you have a series of a few crime events that you know are linked to the same offender, can you tell if those patterns are random with respect to the day of the week? We know spatial patterns are often clustered, but police responses such as surveillance are conditioned not only on a spatial location, but take place during certain days and times. I wanted to know when I could go to command staff and say, yeah you should BOLO on Saturday. Or just as importantly say in response, no the observed patterns could easily happen if the offender were just randomly picking days.

In the paper I show that if you have only 3 events and they all occur on the same day, you would reject the null that crimes have an equal probability across all seven days of the week at a p-value of less than 0.05. I also show that the exact test I propose has pretty good power for as few as 8 events in the series. So if you have, say 10 events and you fail to reject the null that each day of the week has equal probability of being chosen, it is pretty good evidence that a police response should not have any preference for a particular day.
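The three-event claim can be checked with a line of arithmetic. This is a minimal sketch, assuming the most extreme outcome under the null is all events landing on one day, so the exact p-value is just the chance of that happening on any of the seven days. The variable names are mine, not from the paper’s code.

```r
n_days <- 7    # equal-probability bins under the null
n_events <- 3  # crimes in the series

# chance that all events land on the same day, for any of the 7 days
p_same_day <- n_days * (1 / n_days)^n_events
p_same_day  # 1/49, roughly 0.02

# same figure via the multinomial density for one specific day, times 7
p_check <- n_days * dmultinom(c(n_events, rep(0, n_days - 1)),
                              prob = rep(1 / n_days, n_days))
```

With only two events the same calculation gives 1/7, which is why three is the smallest series that can reject at the 0.05 level.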

To illustrate how one would use the test, I have a simple spreadsheet posted here (the zip file also has my SPSS code to reproduce the results in the paper) in which you can type in the days of the week that the crimes occurred on, and it calculates the hypothesis test.

The spreadsheet contains both the G-test and Kuiper’s V test. If you don’t read the paper and understand the difference, just use the G-test and ignore the Kuiper’s V results. For crime analysts, this is basically the minimum of what you need to know.

For analysts who are more into the nitty gritty, I also have some R code that is a bit more flexible, and calculates the exact test for varying numbers of bins and provides some code to conduct power analysis. So you can either download the code from GitHub and insert it to define the functions, or simply copy-paste it into the console. The only library dependency is the partitions library, so make sure that is installed before following along.

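So if you have downloaded the code, you can use something like the below to load the partitions library and point R at the folder holding the downloaded script, which you then source() to define the functions.

library(partitions) #the only dependency
mydir <- "C:\\Users\\andrew.wheeler\\Dropbox\\Documents\\BLOG\\ExactTest_Weekdays" #switch to wherever you downloaded the code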

Now, say you had a series of crimes that had 4 on Saturday, 3 on Tuesday, and 1 on Sunday. You can test this for randomness by simply using:

crime <- c(1,0,3,0,0,0,4) #counts for Sunday through Saturday
res <- SmallSampTest(d=crime)
print(res)

Which prints at the console:

Small Sample Test Object 
Test Type is G 
Statistic is 15.5455263389754 
p-value is:  0.0182662  
Data are:  1 0 3 0 0 0 4 
Null probabilities are:  0.14 0.14 0.14 0.14 0.14 0.14 0.14 
Total permutations are:  3003  
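To see where the printed numbers come from, here is a base R sketch that recomputes the G statistic and the count of permutations by hand, then brute-forces an exact p-value. It does not use SmallSampTest or the partitions library. The helper g_stat, the stars-and-bars enumeration, and the definition of the p-value (the null probability of every outcome whose statistic is at least as large as the observed one) are my assumptions about how such a test works, not the code from the paper.

```r
crime <- c(1, 0, 3, 0, 0, 0, 4)  # Sunday through Saturday
n <- sum(crime)
m <- length(crime)
p_null <- rep(1 / m, m)

# likelihood ratio G statistic, skipping empty bins (0 * log(0) treated as 0)
g_stat <- function(x, p = p_null) {
  e <- sum(x) * p
  k <- x > 0
  2 * sum(x[k] * log(x[k] / e[k]))
}
g_obs <- g_stat(crime)              # about 15.5455, as printed above

# number of ways to place n events into m bins
n_perm <- choose(n + m - 1, m - 1)  # 3003

# enumerate every outcome via stars and bars: pick m - 1 bar positions
bars <- combn(n + m - 1, m - 1)
counts <- apply(bars, 2, function(b) diff(c(0, b, n + m)) - 1)

# null probability and G statistic for each outcome
pr <- apply(counts, 2, dmultinom, prob = p_null)
g_all <- apply(counts, 2, g_stat)

# exact p-value: null mass of outcomes at least as extreme as observed
# (small tolerance so ties with the observed statistic are included)
p_exact <- sum(pr[g_all >= g_obs - 1e-10])
p_exact                             # about 0.0183, in line with the output above
```

Brute force like this is fine for 3003 outcomes, but the count grows quickly with more bins or events, which is presumably why the real code leans on the partitions library.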

This defaults to using the likelihood ratio G-test, but you can also use Kuiper’s V, the chi-square test, or the Kolmogorov-Smirnov test. You can also change the null hypothesis so the bins do not have equal probability. I default to the G-test in my paper because it is more powerful than the more typical chi-square after 8 crimes for 7 day-of-week bins, but equal in power to the chi-square for smaller sample sizes. So to do the chi-square test on the same data, use:

resChi <- SmallSampTest(d=crime, type="Chi")
chisq.test(crime) #for comparison to base R 
chisq.test(crime, simulate.p.value = TRUE, B = 10000)

You can see that the test statistic matches base R’s chisq.test, and the exact p-value is slightly higher than the asymptotic p-value (the exact test should always have a higher p-value than the asymptotic distribution, and here it is lower than the simulated p-value). In this situation the simulation approach would have been fine. I prefer the exact approach when feasible though, because it is exact, and you don’t need to worry about convergence for the simulation (for which most everyone simply picks a large number and hopes for the best).

I’ve also made some code that allows for easy evaluation of the power of the exact test. Coding-wise it was easiest to simply use the original object created with the test, so I know it invites post-hoc power analysis; forgive my sloth in coding practices. So say you wanted to do a priori power analysis with the KS test for 10 bins and 15 observations (so over 1.3 million permutations, i.e. n <- 15; m <- 10; choose(n+m-1,m-1)). You can simply make an original object (with any observed values across the bins).

test10_data <- c(15,rep(0,9))
test10_perm <- SmallSampTest(d=test10_data, type="KS")
#takes around a minute

The default null is equal probability across the bins, and to do a power analysis you have to specify an alternative. Let’s say for the alternative there is equal probability in 5 of the bins, and zero probability in the other 5. (Most of the work is done in making the original permutation object and the power analysis itself is quite fast, which is why I coded it to work this way.)

p_alt <- c(rep(1/5,5),rep(0,5))
Pow_test <- PowAlt(SST=test10_perm,p_alt=p_alt)
Pow_test

This prints out at the console:

Power for Small Sample Test 
Test statistic is: KS  
Power is: 0.1822815  
Null is: 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1  
Alt is: 0.2 0.2 0.2 0.2 0.2   0   0   0   0   0  
Alpha is: 0.05  
Number of Bins: 10  
Number of Observations: 15  

So for this alternative there is quite low power, only 0.18. But if we change it to only have mass in four of the bins, the power goes way up to over 0.99.

p_alt2 <- c(rep(1/4,4),rep(0,6))
Pow_test2 <- PowAlt(SST=test10_perm,p_alt=p_alt2)
Pow_test2
Power for Small Sample Test 
Test statistic is: KS  
Power is: 0.9902265  
Null is: 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1  
Alt is: 0.25 0.25 0.25 0.25   0   0   0   0   0   0  
Alpha is: 0.05  
Number of Bins: 10  
Number of Observations: 15 

So this shows how the exact test R code can be extended beyond just 7 day-of-week bins. I have not really done any exploration of the power of the KS test or differing numbers of bins though.
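For readers who want to see the logic of an exact power calculation without the downloaded functions, here is a small base R sketch. It is not PowAlt. The function exact_power is hypothetical, uses the G statistic rather than KS, and assumes power is the alternative probability of all outcomes whose exact p-value under the null is at or below alpha. I keep it to 7 bins and 8 events so brute force enumeration stays cheap.

```r
exact_power <- function(n, p_null, p_alt, alpha = 0.05) {
  m <- length(p_null)
  # all outcomes of n events in m bins (stars and bars)
  bars <- combn(n + m - 1, m - 1)
  counts <- apply(bars, 2, function(b) diff(c(0, b, n + m)) - 1)
  # G statistic against the null for each outcome
  g <- apply(counts, 2, function(x) {
    e <- n * p_null
    k <- x > 0
    2 * sum(x[k] * log(x[k] / e[k]))
  })
  pr_null <- apply(counts, 2, dmultinom, prob = p_null)
  pr_alt <- apply(counts, 2, dmultinom, prob = p_alt)
  # exact p-value of each outcome under the null
  p_val <- sapply(g, function(s) sum(pr_null[g >= s - 1e-10]))
  # power: alternative mass on the outcomes the test would reject
  sum(pr_alt[p_val <= alpha])
}

p_null <- rep(1 / 7, 7)
# rejection rate when the null is true, which cannot exceed alpha
size_check <- exact_power(8, p_null, p_null)
# an offender who always picks the same day is always detected
pow_one_day <- exact_power(8, p_null, c(1, rep(0, 6)))
```

The enumeration is the expensive part and is shared by every alternative, which matches the remark above about why the power function reuses the original test object.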

New working paper: What We Can Learn from Small Units of Analysis

I’ve posted a new working paper, What We Can Learn from Small Units of Analysis to SSRN. This is a derivative of my dissertation (by the same title). Below is the abstract:

This article provides motivation for examining small geographic units of analysis based on a causal logic framework. Local, spatial, and contextual effects are confounded when using larger units of analysis, as well as treatment effect heterogeneity. I relate these types of confounds to all types of aggregation problems, including temporal aggregation, and aggregation of dependent or explanatory variables. Unlike prior literature critiquing the use of aggregate level data, examples are provided where aggregation is unlikely to hinder the goals of the particular research design, and how heterogeneity of measures in smaller units of analysis is not a sufficient motivation to examine small geographic units. Examples of these confounds are presented using simulation with a dataset of crime at micro place street units (i.e. street segments and intersections) in Washington, D.C.

As always, if you have comments or critiques let me know.

Cartography and GIS special issue on Crime Mapping

My paper, Visualization techniques for journey to crime flow data, has been recently published in a special issue in CaGIS on crime mapping. Always feel free to email me for off-prints of published papers, but the pre-print of this one I posted on SSRN as well.

There is an annoying error that crept into the paper, in that the footnote linking to the files to replicate the maps and graphs says “REDACTED FOR ANONYMITY”, which is my fault for not pointing it out to the copy-editor. The files are available here. They are certainly not easy to walk through, so if you want help replicating any of the maps for your own data and can’t figure out my code feel free to send me an email. I would like to make an R package to make maps like below eventually, but that is just not going to happen in the foreseeable future.

New paper: Replicating Group-Based Trajectory Models of Crime at Micro-Places in Albany, NY

I posted a pre-print of a paper that Rob Worden, Sarah McLean, and I have finished, Replicating Group-Based Trajectory Models of Crime at Micro-Places in Albany, NY. This is part of the work of the Finn Institute in collaboration with the Albany Police Department, and the goal of the project was to identify micro places (street segments and intersections) that showed long term patterns of being high crime places.

The structured abstract is below:

Objectives: Replicate two previous studies of temporal crime trends at the street block level. We replicate the general approach of group-based trajectory modelling of crimes at micro-places originally taken by Weisburd, Bushway, Lum and Yan (2004) and replicated by Curman, Andresen, and Brantingham (2014). We examine patterns in a city of a different character (Albany, NY) than those previously examined (Seattle and Vancouver) and so contribute to the generalizability of previous findings.

Methods: Crimes between 2000 through 2013 were used to identify different trajectory groups at street segments and intersections. Zero-inflated Poisson regression models are used to identify the trajectories. Pin maps, Ripley’s K and neighbor transition matrices are used to show the spatial patterning of the trajectory groups.

Results: The trajectory solution with eight classes is selected based on several model selection criteria. The trajectory of each of those groups follows the overall citywide decline, and the groups are only separated by the mean level of crime. Spatial analysis shows that higher crime trajectory groups are more likely to be nearby one another, potentially suggesting a diffusion process.

Conclusions: Our work adds additional support to that of others who have found tight coupling of crime at micro-places. We find that the clustering of trajectories identified a set of street units that disproportionately contributed to the total level of crime citywide in Albany, consistent with previous research. However, the temporal trends over time in Albany differed from those exhibited in previous work in Seattle but were consistent with patterns in Vancouver.

And here is one of the figures, a drawing of the individual trajectory groupings over the 14 year period. As always, if you have any comments on the paper feel free to shoot me an email.

New paper: Tables and graphs for monitoring temporal crime patterns

I’ve uploaded a new pre-print, Tables and graphs for monitoring temporal crime patterns. The paper basically has three parts, which I will briefly recap here:

  • percent change is a bad metric
  • there are data viz. principles to constructing nicer tables
  • graphs >> tables for monitoring trends

Percent change encourages chasing the noise

It is tacitly understood that percent change when the baseline is small can fluctuate wildly – but how about when the baseline average is higher? If the average of crime was around 100 what would you guess would be a significant swing in terms of percent change? Using simulations I estimate for a 1 in 100 false positive rate you need an over 40% increase (yikes)! I’ve seen people make a big deal about much smaller changes with much smaller baseline averages.
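A quick way to get a feel for this is to simulate it. The sketch below is not the simulation from the paper. It assumes counts in both periods are Poisson with the same mean of 100 (so any change is pure noise), and the seed, number of draws, and one-sided 99th percentile cutoff are all my choices.

```r
set.seed(10)       # arbitrary seed so the run is repeatable
n_sim <- 100000    # number of simulated pre/post pairs
base_mean <- 100   # assumed average count per period

# no real change: both periods share the same Poisson mean
pre <- rpois(n_sim, base_mean)
post <- rpois(n_sim, base_mean)
pct_change <- 100 * (post - pre) / pre

# percent increase exceeded by chance only 1 time in 100
cutoff <- quantile(pct_change, 0.99)
cutoff
```

Lowering base_mean to something like 10 or 20 shows how much wider the noise band gets for the small baselines that analysts usually deal with.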

I propose an alternative metric based on the Poisson distribution,

2*( SQRT(Post) - SQRT(Pre) )

This approximately follows a normal distribution if the data is Poisson distributed. I show with actual crime data it behaves pretty well, and using a value of 3 to flag significant values has a pretty reasonable rate of flags when monitoring weekly time series for five different crimes.
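As a small worked example, the formula can be wrapped in a function and compared against percent change. The function name pois_z and the counts are made up for illustration. Only the formula and the threshold of 3 come from the text above.

```r
# Poisson z-score from the formula above
pois_z <- function(pre, post) {
  2 * (sqrt(post) - sqrt(pre))
}

# made-up counts for illustration
pre <- c(100, 100, 4)
post <- c(110, 140, 8)

z <- pois_z(pre, post)
pct <- 100 * (post - pre) / pre
flag <- abs(z) > 3   # threshold of 3 suggested in the text

data.frame(pre, post, pct, z = round(z, 2), flag)
```

Note that the jump from 4 to 8 is a 100% increase but has a z-score under 2, while the 40% increase from 100 to 140 is the only row flagged.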

Tables are visualizations too!

Instead of recapping all the points I make in this section, I will just show an example. The top table is from an award winning statistical report by the IACA. The latter is my remake.

Graphs >> Tables

I understand tables are necessary for reporting of statistics to accounting agencies, but they are not as effective as graphs to monitor changes in time series. Here is an example, a seasonal chart of burglaries per month. The light grey lines are years from 2004 through 2013. I highlight some outlier years in the chart as well. It is easy to see whether new data is an outlier compared to old data in these charts.

I have another example of monitoring weekly statistics in the paper, and with some smoothing in the chart you can easily see some interesting crime waves that you would never comprehend by looking at a single number in a table.

As always, if you have comments on the paper I am all ears.

Dissertation Draft

I figured I would post the current draft of my dissertation. It is being evaluated by the committee members now, and so why not have everyone evaluate it! Also, since I am on the job market there is proof I am close to finished.

Here is a pdf of the draft. This draft is not guaranteed to stay the same as I find errors, but at this point changes should (hopefully) be minimal. As always I appreciate any feedback. The title is What we can learn from small units of analysis, and below is the abstract.

The dissertation is aimed at advancing knowledge of the correlates of crime at small geographic units of analysis. I begin by detailing what motivates examining crime at small places, and focus on how aggregation creates confounds that limit causal inference. Local and spatial effects are confounded when using aggregate units, so to the extent the researcher wishes to distinguish between these two types of effects it should guide what unit of analysis is chosen. To illustrate these differences, I examine local, spatial and contextual effects for bars, broken windows and crime using publicly available data from Washington, D.C.



My Critique of Slope Graphs paper was recently rejected as a short article from The American Statistician. I’ve uploaded the new paper to SSRN with the suggested critiques and my responses to them (posted here).

I ended up bugging Nick Cox for some pre peer-review feedback and he actually agreed! (A positive externality of participating at the Cross Validated Q/A site.) The main outcome of Nick’s review was a considerably shorter paper. The reviews from TAS were pretty mild (and totally reasonable), but devoid of anything positive. The main damning aspect of the paper is that the reviewers (including Cox) just did not find the paper very interesting or well motivated.

My main motivation was the recent examples of slope graphs in the popular media, most of which are poor statistical graphics (and are much better suited as a scatterplot). The most obvious being Cairo’s book cover, which I thought in and of itself deserved a critique, but maybe I should not have been so surprised about a poor statistical graphic on the cover. I will not dispute that this is a rather weak motivation, but it is one I felt was warranted given the figures praising the use of slopegraphs in inappropriate situations.

In the future I may consider adding in more examples of slopegraphs besides the cover of Alberto Cairo’s book. For my collection of examples I may pull out a few more from the popular media and popular data viz books (besides Cairo’s there are blog post examples from Ben Fry and Andy Kirk; I haven’t read their books so I’m unsure if they are within them). For a preview, I consider pretty much all of the examples bad except for Tufte’s original ones. Part of the reason I did not do this is that I wrote the paper as a short article for TAS, and I figured adding these examples would make it too long.

I really had no plans to submit it anywhere besides TAS, so this may sit as just a pre-print for now. Let me know if you think it may be within the scope of another journal that I may consider.

A critique of slopegraphs

I’ve recently posted a pre-print of an article, A critique of slopegraphs, on SSRN. In the paper I provide a critique of the use of slopegraphs and present alternative graphics to use in their place, using the slopegraph displayed on the cover of Alberto Cairo’s The Functional Art as motivation. Below is my rendering of that slopegraph.

Initially I wanted to write a blog post about the topic, but I decided that to give all of the examples and the full discussion I wanted, it would be far too long. So I ended up writing a (not so short) paper. Below is the abstract, and I will try to summarize it in a few quick points (but obviously I encourage you to read the full paper!).

Slopegraphs are a popular form of graphic depicting change along two independent axes by means of a connecting line. The critique here lists several reasons why interpreting the slopes may be misleading and suggests alternative plots depending on the goals of the visualization. Guidelines as to appropriate situations to use slopegraphs are discussed.

So the three main points I want to make are:

  • The slope is not the main value of interest in a slopegraph. The slope is itself an arbitrary function of how far away the axes are placed from one another.
  • Slopegraphs are poor for judging correlation and seeing a functional relationship between the two values. Scatterplots or just graphing the change directly are often better choices.
  • Slopegraphs are difficult to judge when the variance between axes changes (which produce either diverging or converging slopes) and when the relationship is negative (which produces many crossings in the slopes).
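The first point is easy to show with a few lines of R. This is only an illustration with made-up values: the hypothetical slope_angle function computes the drawn angle of a connecting line, assuming the vertical scale and the horizontal gap between axes are measured in the same plotting units.

```r
# angle (in degrees) of the line joining a left-axis value to a right-axis value
slope_angle <- function(y_left, y_right, axis_gap) {
  atan2(y_right - y_left, axis_gap) * 180 / pi
}

# identical data, two different horizontal gaps between the axes
narrow <- slope_angle(y_left = 10, y_right = 20, axis_gap = 5)
wide <- slope_angle(y_left = 10, y_right = 20, axis_gap = 20)
c(narrow = narrow, wide = wide)  # the narrow layout draws a much steeper line
```

The change from 10 to 20 is the same in both cases, so the steepness a reader perceives is set by the layout rather than by the data.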

I’ve catalogued a collection of articles, examples and other critiques of slopegraphs at this location. Much of what I say is redundant with critiques of slopegraphs already posted in other blogs on the internet.

I’m pretty sure my criminal justice colleagues will not be interested in the content of the paper, so I may need to cold email someone to review it for me before I send it off. So if you have comments or a critique of the paper I would love to hear it!