Paper: The Effect of 311 Calls for Service on Crime in D.C. at Microplaces published

My paper, The Effect of 311 Calls for Service on Crime in D.C. at Microplaces, was published online first at Crime & Delinquency. Here is the link to the published paper. If you do not have access to a library where you can get the paper, always feel free to email and I will send an off-print. I also have the pre-print posted on SSRN. Often the only difference between my pre-prints and the finished version is that the published paper is shorter!

As a note, I’ve also posted all of the data and code to replicate my findings. The note is unfortunately buried at the end of the paper, instead of the beginning.

This was the first paper published from my dissertation. I have pre-prints out for two others, What we can learn from small units and Local and Spatial Effect of Bars. Hopefully you will see those two in print in the near future as well!

New working paper – Monitoring volatile homicide trends across U.S. cities

I have a new working paper out — Monitoring volatile homicide trends across U.S. cities, with one of my colleagues Tomislav Kovandzic. You can grab the pre-print on SSRN, and the paper has links to code to replicate the charts and models in the paper.

Here I look at homicide rates in U.S. cities and use funnel charts and fan charts to show the typical volatility in homicide rates between cities and within cities over time. As I’ve written previously, I think much of the media narrative around homicide increases is hyperbolic, and it often cherry-picks reasons why homicides are going up.

I’ve shown examples of funnel charts on this blog before, so I will use a different image as the tease. To generate the prediction intervals for fan charts I estimate binomial random effect models. Below is an example for New Orleans (homicide rate per 100,000 population):

As always, if you have feedback feel free to send me an email.

SPSS Statistics for Data Analysis and Visualization – book chapter on Geospatial Analytics

A book I made contributions to, SPSS Statistics for Data Analysis and Visualization, is currently out. Keith and Jesus are the main authors of the book, but I contributed one chapter and Jon Peck contributed a few.

The book is a guided tour through many of the advanced statistical procedures and data visualizations in SPSS. Jon also contributed a few chapters towards using syntax, python, and using extension commands. It is a very friendly walkthrough, and we have all contributed data files for you to be able to follow along through the chapters.

So there is a lot of content, but I wanted to give more specific details on my chapter, as I think it will be of greater interest to crime analysts and criminologists. I provide two case studies. The first uses geospatial association rules to identify areas of high crime plus high 311 disorder complaints in DC (using data from my dissertation). The second gives an example of spatio-temporal forecasting of ShotSpotter data at the weekly level in DC, using both prior shootings and other prior Part 1 crimes.

Geospatial Association Rules

Geospatial association rules are a technique for high dimensional contingency tables, finding particular combinations of categories that are more prevalent. I show examples, such as finding that thefts from motor vehicles tend to occur in places near graffiti incidents.

And that assaults tend to be around locations with more garbage complaints (and as you can see each has a very different spatial patterning).

I consider this to be a useful exploratory data analysis technique. It is very similar in application to conjunctive analysis, which has had very similar prior crime mapping applications in risk terrain modeling (see Caplan et al., 2017).
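As a rough illustration of what an association rule measures, here is a minimal Python sketch of the standard support, confidence, and lift statistics from association rule mining. It is not the SPSS procedure used in the chapter, and the street-unit records below are made up purely for illustration.

```python
def rule_metrics(records, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    n = len(records)
    n_a = sum(1 for r in records if r[antecedent])
    n_c = sum(1 for r in records if r[consequent])
    n_both = sum(1 for r in records if r[antecedent] and r[consequent])
    support = n_both / n            # share of units with both conditions
    confidence = n_both / n_a       # share of antecedent units that also have the consequent
    lift = confidence / (n_c / n)   # > 1 means the pair co-occurs more than chance would suggest
    return support, confidence, lift

# Hypothetical toy data: 10 street units, flags invented for this example.
units = (
    [{"graffiti": True, "theft": True}] * 3
    + [{"graffiti": True, "theft": False}] * 1
    + [{"graffiti": False, "theft": True}] * 1
    + [{"graffiti": False, "theft": False}] * 5
)

support, confidence, lift = rule_metrics(units, "graffiti", "theft")
print(support, confidence, lift)
```

A lift well above 1 is the sort of combination the technique flags for a closer look on the map.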

Spatio-Temporal Prediction

The second case study forecasts weekly shootings in fairly small areas (500 meter grid cells) using ShotSpotter data in DC. I also use the prior week’s reported Part 1 crime types (Assault, Burglary, Robbery, etc.), so it is similar to the leading indicators forecasting model advocated by Wilpen Gorr and colleagues. I show that prior shootings predict future shootings up to 5 lags prior (so over a month), and that the prior crimes do have an effect on future shootings (e.g. robberies in the prior week contribute to more shootings in the subsequent week).
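To make the idea of lagged predictors concrete, here is a small Python sketch that turns one grid cell's weekly counts into rows of lag 1 through lag 5 predictors paired with the outcome week. The counts are invented, and this only shows the data setup, not the model estimated in the chapter. Other crime types (such as prior week robberies) could be lagged the same way.

```python
def make_lags(counts, n_lags):
    """Pair each week's count with the counts from the n_lags weeks before it."""
    rows = []
    for t in range(n_lags, len(counts)):
        # lags[0] is last week, lags[1] is two weeks ago, and so on
        lags = [counts[t - k] for k in range(1, n_lags + 1)]
        rows.append((lags, counts[t]))
    return rows

# Hypothetical weekly shooting counts for a single 500 meter grid cell.
weekly_shootings = [0, 2, 1, 0, 3, 1, 0, 0, 2, 1]

rows = make_lags(weekly_shootings, n_lags=5)
for lags, outcome in rows:
    print(lags, "->", outcome)
```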

If you have questions about the analyses, or are a crime analyst and want to apply similar techniques to your data, always feel free to send me an email.

New working paper: Choosing Representatives to Deliver the Message in a Group Violence Intervention

I have a new preprint up on SSRN, Choosing Representatives to Deliver the Message in a Group Violence Intervention. This is what I will be presenting at ACJS next Friday the 24th. Here is the abstract:

Objectives: The group based violence intervention model is predicated on the assumption that individuals who are delivered the deterrence message spread the message to the remaining group members. We focus on the problem of who should be given the initial message to maximize the reach of the message within the group.

Methods: We use social network analysis to create an algorithm to prioritize individuals to deliver the message. Using a sample of twelve gangs in four different cities, we identify the number of members in the dominant set. The edges in the gang networks are defined by being arrested or stopped together in the prior three years. In eight of the gangs we calculate the reach of observed call-ins, and compare these with the sets defined by our algorithm. In four of the gangs we calculate the reach for a strategy that only calls-in members under supervision.

Results: The message only needs to be delivered to around 1/3 of the members to reach 100% of the group. Using simulations we show our algorithm identifies the minimal dominant set in the majority of networks. The observed call-ins were often inefficient, and those under supervision could be prioritized more effectively.

Conclusions: Group based strategies should monitor their potential reach based on who has been given the message. While only calling-in those under supervision can reach a large proportion of the gang, delivering the message to those not under supervision will likely be needed to reach 100% of the group.

And here is an image of the observed reach for one of the gang networks using both call-ins and custom notifications.

The paper has the gang networks available at this link, and uses Python to do the network analysis and SPSS to draw the graphs.
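For readers who want a feel for the problem, here is a minimal Python sketch of a generic greedy heuristic for picking a dominating (dominant) set, plus a function for the reach of any chosen set. This is an assumption-laden illustration and not necessarily the algorithm in the paper. The member labels and ties below are hypothetical.

```python
def reach(network, chosen):
    """Share of members who are chosen or directly tied to a chosen member."""
    covered = set(chosen)
    for member in chosen:
        covered |= network[member]
    return len(covered) / len(network)

def greedy_dominating_set(network):
    """Repeatedly pick the member who covers the most still-uncovered members."""
    uncovered = set(network)
    chosen = []
    while uncovered:
        # sorted() makes ties break alphabetically, so the result is deterministic
        best = max(sorted(network),
                   key=lambda m: len(({m} | network[m]) & uncovered))
        chosen.append(best)
        uncovered -= {best} | network[best]
    return chosen

# Hypothetical co-arrest/co-stop ties between seven members.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E"), ("E", "F"), ("E", "G")]
network = {}
for a, b in edges:
    network.setdefault(a, set()).add(b)
    network.setdefault(b, set()).add(a)

chosen = greedy_dominating_set(network)
print(chosen, reach(network, chosen))
print(reach(network, ["B", "F"]))  # an arbitrary alternative set, for comparison
```

The same `reach` function can be applied to an observed call-in list to see how much of the group it covers.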

If you are interested in applying this to your work let me know! Not only do I think this is a good idea for focused deterrence initiatives for criminal justice agencies, but I think the idea can be more widely applied to other fields in social sciences, such as public health (needle clean/dirty exchange programs) or organizational studies (finding good leaders in an organization to spread a message).

Paper on Roadblocks in Buffalo published

My paper with Scott Phillips, A quasi-experimental evaluation using roadblocks and automatic license plate readers to reduce crime in Buffalo, NY, has just been published online first in the Security Journal. Springer gifts me a special link in which you can read the paper. Previously when I have been given links like that from the publisher they have a time limit, but the email for this one said nothing. But even if that goes bad you can always read my pre-print of the article I posted on SSRN.


Title: A quasi-experimental evaluation using roadblocks and automatic license plate readers to reduce crime in Buffalo, NY

Abstract:

This article evaluates the effectiveness of a hot spots policing strategy: using automated license plate readers at roadblocks in Buffalo, NY. Different roadblock locations were chosen by the Buffalo Police Department every day over a two-month period. We use propensity score matching to identify a set of control locations based on prior counts of crime and demographic factors. We find modest reductions in Part 1 violent crimes (10 over all roadblock locations and over the two months) using t tests of mean differences. We find a 20% reduction in traffic accidents using fixed effects negative binomial regression models. Both results are sensitive to the model used though, and the fixed effects models predict increases in crimes due to the intervention. We suggest that the limited intervention at one time may be less effective than focusing on a single location multiple times over an extended period.

And here is Figure 2 from the paper, showing the units of analysis (street midpoints and intersections) and how the treatment locations were assigned.

Much ado about nothing: Overinterpreting volatility in homicide rates

I’m not much of a macro criminologist, but being asked questions by my dad (about Richard Rosenfeld and the Ferguson effect) and by the dentist yesterday (about some of Trump’s comments on rising crime trends) has prompted me to jump in and give my opinion. Long story short: I believe many sources are overinterpreting short term fluctuations as more meaningful than they are.

First I will tackle national crime rates. If you have happened to walk by a TV playing CNN the past few days, you may have heard Donald Trump being criticized for his statements on crime rates. Part of the disagreement comes from conflating overall levels of crime with changes in crime over time. Basically crime is currently low compared to historical patterns, but homicide rates have been rising in the past two years. This is easier to show in a chart than to explain in words. So here is the national estimated homicide rate per 100,000 individuals since 1960.[1]

The 2016 figure is not official and is still an estimate, but basically the pattern is this: crime has been falling generally across the country since the early 1990’s. Crime rates in just the past few years have finally dropped below levels in the 1960’s, but for the past two years homicides have been increasing. So some have pointed to the increase in the past two years and claimed the sky is falling. To support this they say the rate of change is the largest in the past 40 years. There are better charts to show rates of change (a semi-log chart), but the overall look is basically the same.

You have to really squint to see that the change from 2014 to 2015 is a larger jump than any of the other changes over the entire period, so arguments based on the size of recent changes in the homicide rate are hyperbole (on either a linear scale or a logarithmic scale). And even if you take the increases over the past two years as evidence of a more general rising trend, in the broader pattern homicide rates are still close to a low point for the past 50 years.

For a bit of general advice: for any source that gives you a percent change, you always want to see the base numbers and any longer term historical trends. Any media source that cites recent increases in homicides without providing this graph of long term historical crime trends is simply misleading. I’ve seen this done in many places, see this example from the New York Times or this recent note from the Economist. So this isn’t something specific to the President.

Now, macro criminologists don’t really have any better track record explaining these patterns than macro economists have in explaining economic trends. Basically we have a bunch of patchwork theories that make sense for parts of the trend, but not the entire time frame: changes in routine activities in the 1960’s, increases in incarceration, the decline of crack use, ease of calling 911 with cell-phones, lead use, and abortion (just to name a few). And academics come up with new theories all the time, the most recent being the Ferguson effect, which is simply another term for de-policing.

Now a bit on trends for specific cities. How this ties in with the national trend is that some articles have been pointing out that some cities have seen increases and some have not. That is fine to point out (albeit trivial), but then the articles frequently go on to generate stories about why crime is rising in those specific places. Those on the left cite civil unrest and police brutality as possible reasons (Milwaukee, St. Louis, Chicago, Baltimore), while those on the right cite the deleterious effects of police departments not being as proactive (stops in Chicago, arrests in Baltimore).

While any of these explanations may turn out to be reasonable in the end, I’m pretty sure most of these articles severely underappreciate the volatility in homicide rates. Take St. Louis as an example, with a city population of just over 300,000. A homicide rate of 50 per 100,000 means a total of 150 murders. A homicide rate of 40 per 100,000 means 120 murders. So we are only talking about a change of 30 murders overall. Fluctuations of around 10 in the murder rate would not be unexpected for a city with a population of 300,000 individuals. The 95% confidence interval for 150 murders among 300,000 individuals is 126 to 176 murders.[2]
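If you do not have R handy, the same exact Clopper-Pearson interval can be approximated with only the Python standard library. The sketch below sums the binomial probabilities on the log scale and finds each limit by bisection. The function names are my own for illustration, and it is written for moderate counts like this one, not tuned for speed.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summing the pmf via log-gamma terms."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return total

def solve_p(k, n, target, iterations=100):
    """Bisection for the p where binom_cdf(k, n, p) equals target (the CDF falls as p rises)."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > target:
            lo = mid   # p is too small, move the lower bracket up
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided interval for a binomial proportion."""
    lower = 0.0 if k == 0 else solve_p(k - 1, n, 1 - alpha / 2)
    upper = 1.0 if k == n else solve_p(k, n, alpha / 2)
    return lower, upper

low, high = clopper_pearson(150, 300_000)
# Convert the proportions back to murder counts for a city of 300,000.
print(round(low * 300_000, 1), round(high * 300_000, 1))
```

The printed limits should land close to the 126 to 176 range quoted above.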

Even that, though, understates the typical volatility in homicide rates, as it assumes the underlying proportion does not change over time. In reality crime statistics are more bursty, and show wilder fluctuations in different places.[3] To show this for many cities, I use the data from the Economist article mentioned earlier, and create a motion chart of the changes in homicide rates over time. The idea behind this chart is a funnel chart. Cities with lower populations will show higher variance, and so the dots on the left hand side of the chart will jump around a lot more. The population figures are current and not varying, so the dots just move up and down on the Y axis.
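The funnel shape follows directly from the arithmetic. As a minimal sketch, the Python below computes approximate 95% limits around a baseline rate for cities of different sizes, using a normal approximation to a Poisson count. The baseline of 10 per 100,000 and the populations are hypothetical, and since these limits assume a constant underlying rate they should be read as a lower bound on real volatility.

```python
import math

def funnel_limits(baseline_rate, population, z=1.96, per=100_000):
    """Approximate limits for a rate per `per` people around a fixed baseline rate."""
    expected = baseline_rate * population / per          # expected number of homicides
    half_width = z * math.sqrt(expected) * per / population
    return max(baseline_rate - half_width, 0.0), baseline_rate + half_width

# Hypothetical baseline of 10 homicides per 100,000 across city sizes.
for pop in (100_000, 300_000, 1_000_000, 3_000_000):
    low, high = funnel_limits(10, pop)
    print(f"{pop:>9,}: {low:5.2f} to {high:5.2f}")

small = funnel_limits(10, 100_000)
large = funnel_limits(10, 1_000_000)
```

The interval narrows in proportion to one over the square root of the population, which is why the small cities bounce around so much more.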

For best viewing, make the X axis on the log scale, and size the points according to the population of the city. If you are at a desktop computer, you can open up a bigger version of the chart here.

Selecting individual points and then letting the animation run, though, illustrates the typical variability of crime over time. Here is the trace of St. Louis over the 36 year period.

New Orleans is another good example, with fluctuations from under 30 to over 90 in the time period.

And here is Chicago, which shows less fluctuation than the smaller cities (as expected) but still has a range of homicide rates around 20 over the time period.

Howard Wainer has previously pointed this relationship out, and called it The Most Dangerous Equation. Basically, if you look you will be able to find some upward crime trends, especially in smaller cities. You need to look at it in the long term though and understand typical fluctuations to make a reasonable decision as to whether crime is increasing or if it is just typical year to year variation. The majority of news articles on the topic are just chock full of post hoc ergo propter hoc for particular cherry picked cities, and they often don’t make sense in explaining crime patterns over the past decade in those particular cities, let alone make sense for different cities experiencing similar conditions but not having rising homicide rates.



  1. For my notes about data sources, generally the data have come from the FBI UCR data tool (for the 1960 through 2014 data). 2015 data have come from the FBI web page for the 2015 UCR report. The 2016 projections come from this Economist article as well as the 50 cities data for the google motion chart.
  2. Calculated in R via (binom.test(150,300000)$conf.int[1:2])*300000. This is the exact Clopper-Pearson confidence interval.
  3. So even though this 538 article does a better job of acknowledging volatility, whatever test they use to determine statistically significant increases is likely to have too many false positives.

Blogging in Review – 2016

The site has continued to grow in 2016. Looking back over the prior years it has looked pretty linear the whole time.

I take a hit in December, but I almost averaged 200 site views per day in November. I topped 100,000 cumulative site views for the blog’s entire existence in November of this year.

Despite moving from Albany to Texas, I still managed to publish 40 new pages this year, which I am pretty happy with. I don’t set myself any hard expectations, but I like to publish something at least once every two to four weeks.

While some of my initial traffic is bursty, e.g. gets shared on a popular site and you get a couple hundred views in a day, most of my traffic is a slow trickle of referrals from google. Here is a plot of my pages by average views per day, broken down by some of my main categories. Posts colored in red have an SPSS tag, and so the Python and R columns can also be posts on SPSS. (So most of my python posts are calling python from SPSS.)

So even my most popular posts do not average more than a few views per day, and most do not get any appreciable traffic at all. Here are the labels in that dot plot to show what posts they are.

Don’t ask me why some end up being more popular than others (who knew Venn diagrams in R?). I wrote a few more blog posts on using various google maps APIs with python in response to the google places post being popular. The google street view post is doing pretty well, the others not so much though.

My motivation for posts, though, is more in line with an academic journal/notebook/diary – I essentially post on some project I am working on, I don’t go and research specific topics just for the blog. I am happy with the extra exposure though – and I’m sure there is more value added to a tutorial blog post than there is for a stuffy academic paper that is read by two dozen individuals (even if that is what counts towards my tenure)!

Review of Trees, maps, and theorems: Effective Communication for rational minds by Jean-luc Doumont

I was recently introduced to the work of Jean-luc Doumont via Robert Kosara. So I picked up his book, Trees, maps, and theorems: Effective Communication for rational minds, and it does not disappoint.

In a nutshell, if you have read Tufte’s Visual display of quantitative information and you like it, you will like Doumont’s book as well. He persists in the same minimalist ideal as Tufte, but has advice not just about statistical graphics, but about all aspects of scientific communication: writing, presentations, and even email.

Doumont’s chapter on effective graphical displays is mainly a brief overview of Tufte’s main points for statistical graphics (also he gives some advice on pictures and icons), but otherwise the book has quite a bit of new advice. Here is a quick sampling of some of the points that most resonated with me:

The rule of three: It is very difficult to maintain any more than three items in our short term memory. While some people use the magic number 7 rule, Doumont notes this is clearly the upper limit. Doumont’s suggestion of using three (such as for subheadings in a document, or bullet points in a powerpoint presentation) also coincides with Howard Wainer’s suggestion to limit the number of significant digits in tables to three as well.

For oral presentations with slides, he suggests printing out your slides 6 to a page on a standard letter size paper. If you have a hard time reading them, the font is too small. I’m not sure if this fits in line with my suggestions for font sizes, it will take some more investigation on my part. Another piece of advice for oral presentations is that you can’t read text on slides and listen to the presenter at the same time. Those two inputs compete in our brain, as opposed to images and talking at the same time. Doumont gives the same advice as Tufte (prepare a handout), but I don’t think this is a good idea. (The handout can be distracting.) If you need people to read text, just take a break and get a sip of water. Otherwise make the text as minimal as possible.

My only real point of contention is that Doumont makes the mistake in talking about graphics that one only needs two points labeled on axes. This is not true in general, you need three. Imagine I gave you an axis:

2--?--8

For a linear scale, the missing point would be 5, but for a logarithmic scale (in base 2) the missing point would be 4. I figured this is worth pointing out as I recently reviewed a paper where a legend for a raster image (pretty sure ArcGIS was the culprit) only had the end points labeled.
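The two candidate midpoints are just the arithmetic mean and the geometric mean of the end points, which a couple of lines of Python can confirm. Note the geometric mean does not depend on the base of the logarithm, so the answer of 4 holds for any log scale.

```python
import math

def linear_mid(a, b):
    """Halfway point on a linear axis: the arithmetic mean."""
    return (a + b) / 2

def log_mid(a, b):
    """Halfway point on a logarithmic axis: the geometric mean."""
    return math.sqrt(a * b)

print(linear_mid(2, 8), log_mid(2, 8))
```

With only the end points labeled, a reader has no way to tell which of these two values sits in the middle.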

Doumont also has a bunch of advice about writing that I will need to periodically reread. In general one point is that the first sentence of either a section (or paragraph) should be declarative as to the point of that section. Sometimes folks lead with fluff that is only revealed to be related to the material later on in the section.

My writing and work will definitely not live up to Doumont’s standard, but it is a goal I believe scientists should strive for.

Paper – Replicating Group Based Trajectory Models of Crime at Micro-Places in Albany, NY published

My article on estimating crime trajectories in Albany from 2000 through 2014 has been published in the latest issue of JQC.

That link is permanent, but Springer gifts me a temporary free pdf link for everyone for up to four weeks. So grab that if you are interested.

Also note though that I have the pre-print posted on SSRN. Since that is Albany PD’s data, I cannot provide code to replicate the analysis. But I have produced a series of blog posts showing how to replicate the trajectory and the point pattern analysis on your own data, if you are interested.

Here is the cross Ripley’s L plot testing for clustering between the different trajectory groupings.

Also always feel free to send me an email if you have questions about the findings and paper.

ASC 2016 – Quantifying the Local and Spatial Effects of Alcohol Outlets on Crime

This year at the American Society of Criminology I will be presenting some work from my dissertation, Quantifying the Local and Spatial Effects of Alcohol Outlets on Crime. I have the working paper posted on SSRN, and that also has a link to download data and code to reproduce the findings in the paper.

I will be presenting at the panel Alcohol and Crime on Wednesday at 9:30 (at the Cambridge room on the 2nd level).

Here is the abstract:

This paper estimates the relationship between alcohol outlets and crime at micro place street units in Washington, D.C. Three specific additions to this voluminous literature are articulated. First, the diffusion effect of alcohol outlets is larger than the local effect. This has important implications for crime prevention. The second is that in this sample the effects of on-premise and off-premise outlets are very similar in magnitude. I argue this is evidence in favor of routine activities theory, in opposition to theories which emphasize individual alcohol consumption. The final is that alcohol outlets have large effects on burglary, despite the fact that alcohol outlets cannot increase the number of vulnerable targets, as it can with interpersonal crimes. I discuss how this can either be interpreted as evidence that alcohol outlets self-select into already crime prone areas, or potentially that the presence of motivated offenders matters much more than increasing the number of potential victims.

The most interesting finding is that I estimate the diffusion effect of alcohol outlets to be larger than the local effect. I then show that this is the case for some other papers as well; it is just that interpreting the regression model is tricky. Here is a diagram showing what happens. The idea is that the regression coefficient for the spatial lag is one orange dot, and the local effect is the blue dot. Adding a bar, though, diffuses to multiple places, so when you add up all the smaller orange dots, they result in more crime than the one bigger blue dot.
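The adding-up logic can be shown with a few lines of Python. The coefficients and the number of neighboring street units below are made up for illustration (they are not estimates from the paper), and the sketch assumes the spatial lag is a sum over neighbors, so each neighboring unit receives the lag coefficient once.

```python
def total_effects(local_coef, lag_coef, n_neighbors):
    """Crime added at the bar's own street unit versus summed over its neighbors."""
    local_total = local_coef                  # the one blue dot
    diffusion_total = lag_coef * n_neighbors  # all the smaller orange dots added up
    return local_total, diffusion_total

# Hypothetical values: the local coefficient is over three times the lag coefficient.
local_total, diffusion_total = total_effects(local_coef=1.0, lag_coef=0.3, n_neighbors=6)
print(local_total, diffusion_total)
```

Even though each orange dot is smaller than the blue dot, six of them together add up to more crime, which is why comparing the two raw coefficients can mislead.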