Buffalo shootings paper published

My article examining spatial shifts in shootings in Buffalo pre/post Covid, in collaboration with several of my Buffalo colleagues, is now published in the Journal of Experimental Criminology (Drake et al., 2022).

If you do not have access to that journal, you can always just email, or check out the open access pre-print. About the only difference is a supplement we added in response to reviewers, including maps of different grid cell areas, here is a hex grid version of the changes:

The idea behind this paper was to see if given the dramatic increase in shootings in Buffalo after Covid started (Kim & Phillips, 2021), they about doubled (similar to NYC), did spatial hot spots change? The answer is basically no (and I did a similar analysis in NYC as well).

While other papers have pointed out that crime increases disproportionately impact minority communities (Schleimer et al., 2022), which is true, it stands to be very specific what the differences in my work and this are saying. Imagine we have two neighborhoods:

Neighborhood A, Disadvantaged/Minority, Pre 100 crimes, Post 200 crimes
Neighborhood B,    Advantaged/Majority, Pre   1 crimes, Post   2 crimes

The work that I have done has pointed to these increases due to Covid being that relative proportions/rates are about the same (shootings ~doubled in both Buffalo/NYC). And that doubling was spread out pretty much everywhere. It is certainly reasonable to interpret this as an increased burden in minority communities, even if proportional trends are the same everywhere.

This proportional change tends to occur when crime declines as well (e.g. Weisburd & Zastrow, 2022; Wheeler et al., 2016). And this just speaks to the stickiness of hot spots of crime. Even with large macro changes in temporal crime trends, crime hot spots are very durable over time. So I really think it makes the most sense for police departments to have long term strategies to deal with hot spots of crime, and they don’t need to change targeted areas very often.

References

Power and bias in logistic regression

Michael Sierra-Arévalo, Justin Nix and Bradley O’Guinn have a recent article about examining officer fatalities following gunshot assaults (Sierra-Arévalo, Nix, & O-Guinn). They do not find that distance to a Level 1/2 trauma ERs make a difference in the survival probabilities, which conflicts with prior work of mine with Gio Circo (Circo & Wheeler, 2021). Justin writes this as a potential explanation for the results:

The results of our multivariable analysis indicated that proximity to trauma care was not significantly associated with the odds of officers surviving a gunshot wound (see Table 2 on p. 9 of the post-print). On the one hand, this was somewhat surprising given that proximity to trauma care predicts survival of gunshot wounds among the general public.1 On the other hand, police have specialized equipment, such as ballistic vests and tourniquets, that reduce the severity of gunshot wounds or allow them to be treated immediately.

I think it is pretty common when results do not pan out, people turn to theoretical (or sociological) reasons why their hypothesis may be invalid. While these alternatives are often plausible, often equally plausible are simpler data based reasons. Here I was concerned about two factors, 1) power and 2) omitted severity of gun shot wound factors. I did a quick simulation in R to show power seems to be OK, but the omitted severity confounders may be more problematic in this design, although only bias the effect towards 0 (it would not cause the negative effect estimate MJB find).

Power In Logistic Regression

First, MJB’s sample size is just under 1,800 cases. You would think offhand this is plenty of power for whatever analysis right? Well, power just depends on the relevant effect size, a small effect and you need a bigger sample. My work with Gio found a linear effect in the logistic equation of 0.02 (per minute driving increases the logit). We had 5,500 observations, and our effect had a p-value just below 0.05, hence why a first thought was power. Also logistic regression is asymptotic, it is common to have small sample biases in situations even up to 1000 observations (Bergtold et al., 2018). So lets see in a simple example ignoring the other covariates:

# Some upfront work
logistic <- function(x){1/(1+exp(-x))}
set.seed(10)

# Scenario 1, no covariates omitted
n <- 2000; 
de <- 0.02
dist <- runif(n,5,200)
p <- logistic(-2.5 + de*dist)
y <- rbinom(n,1,p)

# Variance is small enough, seems reasonably powered
summary(glm(y ~ dist, family = "binomial"))

Here with 2000 cases, taking the intercept from MJB’s estimates and the 0.02 from my paper, we see 2000 observations is plenty enough well powered to detect that same 0.02 effect in mine and Gio’s paper. Note when doing post-hoc power analysis, you don’t take the observed effect (the -0.001 in Justin’s paper), but a hypothetical effect size you think is reasonable (Gelman, 2019), which I just take from mine and Gio’s paper. Essentially saying “Is Justin’s analysis well powered to detect an effect of the same size I found in the Philly data”.

One thing that helps MJB’s design here is more variance in the distance parameter, looking intra city the drive time distances are smaller, which will increase the standard error of the estimate. If we pretend to limit the distances to 30 minutes, this study is more on the fence as to being well enough powered (but meets the threshold in this single simulation):

# Limited distance makes the effect have a higher variance
n <- 2000; 
de <- 0.02
dist <- runif(n,1,30)
p <- logistic(-2.5 + de*dist)
y <- rbinom(n,1,p)

# Not as much variation in distance, less power
summary(glm(y ~ dist, family = "binomial"))

For a more serious set of analysis you would want to do these simulations multiple times and see the typical result (since they are stochastic), but this is good enough for me to say power is not an issue in this design. If people are planning on replications though, intra-city with only 1000 observations is really pushing it with this design though.

Omitted Confounders

One thing that is special about logistic regression, unlike linear regression, even if an omitted confounder is uncorrelated with the effect of interest, it can still bias the estimates (Mood, 2010). So even if you do a randomized experiment your effects could be biased if there is some large omitted effect from the regression equation. Several people interpret this as logistic regression is fucked, but like that linked Westfall article I think that is a bit of an over-reaction. Odds ratios are very tricky, but logistic regression as a method to estimate conditional means is not so bad.

In my paper with Gio, the largest effect on whether someone would survive was based on the location of the bullet wound. Drive time distances then only marginal pushed up/down that probability. Here are conditional mean estimates from our paper:

So you can see that being shot in the head, drive time can make an appreciable difference over these ranges, from ~45% to 55% probability of death. Even if the location of the wound is independent of drive time (which seems quite plausible, people don’t shoot at your legs because you are far away from a hospital), it can still be an issue with this research design. I take Justin’s comment about ballistic vests as reducing death as essentially taking the people in the middle of my graph (torso and multiple injuries) and pushing them into the purple line at the bottom (extremities). But people shot in the head are not impacted by the vests.

So lets see what happens to our effect estimates when we generate the data with the extremities and head effects (here I pulled the estimates all from my article, baseline reference is shot in head and negative effect is reduction in baseline probability when shot in extremity):

# Scenario 3, wound covariate omitted
dist <- runif(n,5,200)
ext_wound <- rbinom(n,1,0.8)
ef <- -4.8
pm <- logistic(0.2 + de*dist + ef*ext_wound)
ym <- rbinom(n,1,pm)

# Biased downward (but not negative)
summary(glm(ym ~ dist, family = "binomial"))

You can see here the effect estimate is biased downward by a decent margin (less than half the size of the true effect). If we estimate the correct equation, we are on the money in this simulation run:

What happens if we up the sample size? Does this bias go away? Unfortunately it does not, here is an example with 10,000 observations:

# Scenario 3, wound covariate ommitted larger sample
n2 <- 10000
dist <- runif(n2,5,200)
ext_wound <- rbinom(n2,1,0.8)
ef <- -4.8
pm <- logistic(0.2 + de*dist + ef*ext_wound)
ym <- rbinom(n2,1,pm)

# Still a problem
summary(glm(ym ~ dist, family = "binomial"))

So this omission is potentially a bigger deal – but not in the way Justin states in his conclusion. The quote earlier suggests the true effect is 0 due to vests, I am saying here the effect in MJB’s sample is biased towards 0 due to this large omitted confounder on the severity of the wound. These are both plausible, there is no way based just on MJB’s data to determine if one interpretation is right and the other is wrong.

This would not explain the negative effect estimate MJB finds though in their paper, it would only bias towards 0. To be fair, Jessica Beard critiqued mine and Gio’s paper in a similar vein (saying the police wound location data had errors), this would make our drive time estimates be biased towards 0 as well, so if that factor may be even larger than me and Gio even estimated.

Potential robustness checks here are to simply do a linear regression instead of logistic with the same data (my graph above shows a linear regression would be fine for the data if I included interaction effects with wound location). And another would be to look at the unconditional marginal distribution of distance vs probability of death. If that is highly non-linear, it is likely due to omitted confounders in the data (I suspect it may plateau as well, eg the first 30 minutes make a big difference, but after that it flattens out, you’ve either stabilized someone or they are gone at that point).

Policy?

In the case of intra-city public violence, the policy implication of drive times on survival are relevant when people are determining whether to keep open or close trauma centers. I did not publish this in my paper with Gio (you can see the estimates in the replication code), but we actually estimated counter-factual increased deaths by taking away facilities. Its marginal effect is around 10~20 homicides over the 4.5 years if you take away one of the facilities in Philadelphia. I don’t know if reducing 5 homicides per year is sufficient justification to keep a trauma facility open, but officer shootings are themselves much less frequent, and so the marginal effects are very unlikely to justify keeping a trauma facility open/closed by themselves.

You could technically figure out the optimal location to site a new trauma facility from mine and Gio’s paper, but probably a more reasonable response would be to site resources to get people to the ER faster. Philly already does scoop and run (Winter et al., 2021), where officers don’t wait for an ambulance. Another possibility though is to proactively locate ambulances to get to scenes faster (Hosler et al., 2019). Again though it just isn’t as relevant/feasible outside of major urban areas though to do that.

Often times social science authors do an analysis, and then in the policy section say things that are totally reasonable on their face, but are not supported by the empirical analysis. Here the suggestion that officers should increase their use of vests by MJB is totally reasonable, but nothing in their analysis supports that conclusion (ditto with the tourniquets statement). You would need to measure those incidents that had those factors, and see its effect on officer survival to make that inference. MJB could have made the opposite statement, since drive time doesn’t matter, maybe those things don’t make a difference in survival, and be equally supported by the analysis.

I suspect MJB’s interest in the analysis was simply to see if survival rates were potential causes of differential officer deaths across states (Sierra-Arévalo & Nix, 2020). Which is fine to look at by itself, even if it has no obviously direct policy implications. Talking back and forth with Justin before posting this, he did mention it was a bit of prodding from a reviewer to add in the policy implications. Which it goes for both (reviewers or original writers), I don’t think we should pad papers with policy recommendations (or ditto for theoretical musings) that aren’t directly supported by the empirical analysis we conduct.

References

  • Bergtold, J. S., Yeager, E. A., & Featherstone, A. M. (2018). Inferences from logistic regression models in the presence of small samples, rare events, nonlinearity, and multicollinearity with observational data. Journal of Applied Statistics, 45(3), 528-546.
  • Circo, G. M., & Wheeler, A. P. (2021). Trauma Center Drive Time Distances and Fatal Outcomes among Gunshot Wound Victims. Applied Spatial Analysis and Policy, 14(2), 379-393.
  • Gelman, A. (2019). Don’t calculate post-hoc power using observed estimate of effect size. Annals of Surgery, 269(1), e9-e10.
  • Hosler, R., Liu, X., Carter, J., & Saper, M. (2019). RaspBary: Hawkes Point Process Wasserstein Barycenters as a Service.
  • Mood, C. (2010). Logistic regression: Why we cannot do what we think we can do, and what we can do about it. European Sociological Review, 26(1), 67-82.
  • Sierra-Arévalo, M., & Nix, J. (2020). Gun victimization in the line of duty: Fatal and nonfatal firearm assaults on police officers in the United States, 2014–2019. Criminology & Public Policy, 19(3), 1041-1066.
  • Sierra-Arévalo, Michael, Justin Nix, & Bradley O’Guinn (2022). A National Analysis of Trauma Care Proximity and Firearm Assault Survival among U.S. Police. Forthcoming in Police Practice and Research. Post-print available at
  • Winter, E., Hynes, A. M., Shultz, K., Holena, D. N., Malhotra, N. R., & Cannon, J. W. (2021). Association of police transport with survival among patients with penetrating trauma in Philadelphia, Pennsylvania. JAMA network open, 4(1), e2034868-e2034868.

ptools feature engineering vignette update

For another update to my ptools R package in progress, I have added a vignette to go over the spatial feature engineering functions I have organized. These include creating vector spatial features (grid cells, hexagons, or Voronoi polygons), as well as RTM style features on the right hand side (e.g. distance to nearest, kernel density estimates at those polygon centroids, different weighted functions ala egohoods, etc.)

If you do install the package turning vignettes on you can see it:

install_github("apwheele/ptools", build_vignettes = TRUE)
vignette("spat-feateng")

Here is an example of hexgrids over NYC (I have datasets for NYC Shootings, NYC boroughs, NYC Outdoor Cafes, and NYC liquor licenses to illustrate the functions).

The individual functions I think are reasonably documented, but it is somewhat annoying to get an overview of them all. If you go to something like “?Documents/R/win-library/4.1/ptools/html/00Index.html” (or wherever your package installation folder is) you can see all of the functions currently in the package in one place (is there a nice way to pull this up using help()?). But between this vignette and the Readme on the front github page you get a pretty good overview of the current package functionality.

I am still flip flopping whether to bother to submit to CRAN. Installing from github is so easy not sure it is worth the hassle while I continually add in new things to the package. And I foresee tinkering with it for an extended period of time.

Always feel free to contribute, I want to not only add more functions, but should continue to do unit tests and add in more vignettes.

The Big 2020 Homicide Increase in Context

Jeff Asher recently wrote about the likely 2020 increase in Homicides, stating this is an unprecedented increase. (To be clear, this is 2020 data! Homicide reporting data in the US is just a few months shy of a full year behind.)

In the past folks have found me obnoxious, as I often point to how homicide rates (even for fairly large cities), are volatile (Wheeler & Kovandzic, 2018). Here is an example of how I thought the media coverage of the 2015/16 homicide increase was overblown.

I actually later quantified this more formally with then students Haneul Yim and Jordan Riddell (Yim et al., 2020). We found the 2015 increase was akin to when folks on the news say a 1 in 100 year flood. So I was wrong in terms of it was a fairly substantive increase relative to typical year to year changes. Using the same methods, I updated the charts to 2020 data, and 2020 is obviously a much larger increase than the ARIMA model we fit based on historical data would expect:

Looking at historical data, people often argue “it isn’t as high as the early 90’s” – this is not the point though I really intended to make (it is kind of silly to make a normative argument about the right or acceptable number of homicides) – but I can see how I conflated those arguments. Looking at the past is also about understanding the historical volatility (what is the typical year to year change). Here this is clearly a much larger swing (up or down) than we would expect based on the series where we have decent US coverage (going back to 1960).


For thinking about crime spikes, I often come from a place in my crime analyst days where I commonly was posed with ‘crime is on the rise’ fear in the news, and needed to debunk it (so I could get back to doing analysis on actual crime problems, not imaginary ones). One example was a convenience store was robbed twice in the span of 3 days, and of course the local paper runs a story crime is on the rise. So I go and show the historical crime trends to the Chief and the Mayor that no, commercial robberies are flat. And even for that scenario there were other gas stations that had more robberies in toto when looking at the data in the past few years. So when the community police officer went to talk to that convenience store owner to simply lock up his cash in more regular increments, I told that officer to also go to other stores and give similar target hardening advice.

Another aspect of crime trends is not only whether a spike is abnormal (or whether we actually have an upward trend), but what causes it. I am going to punt on that – in short it is basically impossible in normal times to know what caused short term spikes absent identifying specific criminal groups (which is not so relevant for nationwide spikes, but even in large cities one active criminal group can cause observable spikes). We have quite a bit of crazy going on at the moment – Covid, BLM riots, depolicing – I don’t know what caused the increase and I doubt we will ever have a real firm answer. We cannot run an experiment to see why these increases occurred – it is mostly political punditry pinning it on one theory versus another.

For the minor bit it is worth – the time series methods I use here signal that the homicide series is ARIMA(1,1,0) – which means both an integrated random walk component and a auto-regressive component. Random walks will occur in macro level data in which the micro level data are a bunch of AR components. So this suggests a potential causal attribution to increased homicides is homicides itself (crime begets more crime). And this can cause run away effects of long upwards/downwards trends. I don’t know of a clear way though to validate that theory, nor any obvious utility in terms of saying what we should do to prevent increases in homicides or stop any current trends. Even if we have national trends, any intervention I would give a thumbs up to is likely to be local to a particular municipality. (Thomas Abt’s Bleeding Out is about the best overview of potential interventions that I mostly agree with.)

References

  • Wheeler, A. P., & Kovandzic, T. V. (2018). Monitoring volatile homicide trends across US cities. Homicide Studies, 22(2), 119-144.
  • Yim, H. N., Riddell, J. R., & Wheeler, A. P. (2020). Is the recent increase in national homicide abnormal? Testing the application of fan charts in monitoring national homicide trends over time. Journal of Criminal Justice, 66, 101656.

Notes for the 2019/2020 updated homicide data. 2019 data is available from the FBI page, 2020 homicide data I have taken from estimates at USA Facts and total USA pop is taken from Google search results.

R code and data to replicate the chart can be downloaded here.

Reversion in the tech stack and why DS models fail

Hackernews recently shared a story about not using an IDE, and I feel mostly the same way. Hence the title in the post – so my current workflow for when I steal some time to work on my R package ptools my workflow looks like this, using Rterm from the shell:

I don’t have anything against RStudio, I just only have so much room in my brain. Sometimes conversations at work are like a foreign language, “How are we going to test the NiFi script from Hadoop to our Kubernetes environment” or “I can pull the docker image from JFrog, but I run out of room when extracting the image on our sandbox machine. But df says we have plenty of room on all the partitions?”.

If you notice at the top the (base) in front of the shell, that is because this is within the anaconda shell as well. So if you look at many of my past blog posts (see here for one example), I am just using the snipping tool in windows to take screenshots of the shell output in interactive mode.

I am typically just writing the code in Notepad++ (as well as this blog post) – and it is quite simple to switch between interactive copy this function/code and compiling entire scripts. Here is a screenshot of R unit tests for example.

So Notepad++ has some text highlight (for both R and python), and that is nice, but honestly not that necessary. Main thing I use is the selection of brackets to make sure they are balanced. I am sure I am missing out on some nice autocomplete features that would make me more productive, and function hints in Spyder are nice for pandas functions (I mostly use google still though for that when I need it).

I do use VS Code for development work on our headerless virtual machines at work. But that is more to replicate essentially the workflow on Windows with file explorer + Notepad++ + Shell (I am not a vim ninja – what is it esc + wq:, need to look that up everytime). I fucked up one of my git repos the other day using VS Code stuff tools, and I am just using git directly anymore. (Again this means I am the problem, not VS Code!)

Why data science projects fail?

Some more random musings, but the more I get involved and see what projects work and what don’t at work, pretty much all of the failures I have come across are due to what I will call “not modeling the right thing”. That potentially covers a bit, but quite a few are simply not understanding counterfactual reasoning and selection bias.

The modeling part (in terms of actually fitting models) is typically quite easy – you do some simple but slightly theoretically informed feature engineering and feed that info into a machine learning model that is very flexible. But maybe that is the problem, people can easily fool themselves into thinking a model looks good, but because they are modeling the wrong thing it does not result in better decision making.

Even in most of the failures I have seen, selection bias is surmountable (often just requires multiple models or models on different samples of data – reduced form for the win!). So learning how to train/test split the data, and feed your data into XGboost only takes a few classes to learn. How to know the right thing to model though takes a bit more thought.

A secondary part of the failure is not learning how to translate the model outputs into actionable decisions. But the not modeling the right thing is at the start, so makes any downstream decision not work out how you want.

ptools R package

It has been on my bucket list for a bit, but I wanted to take the time to learn how to construct an R package (same as for a python package). So I crafted a package with only a few functions in it so far, ptools, short for Poisson tools.

These are a handful of functions I have blogged about over the years, including functions for various WDD tests and the variants I have blogged about (weighted harm scores, different time periods, and different area sizes).

Small sample counts in bins (which can be used for Benford’s test), or my original application was checking if a chronic offender had a propensity to commit crimes on certain days of the week.

The Poisson e-test, and a function to check whether a distribution is Poisson distributed and two more Poisson related functions as well.

I think I will add quite a few more functions in the soup before I bother submitting to CRAN. (Installing via devtools via github is quite easy, so I do not feel too bad about that.) If you have functions you think I should add just let me know. (Or just make a pull request and add them yourself!) I also need to work on unit tests, and getting github actions set up. I will probably crunch on this for a bit, and then migrate personal projects back to creating some python libraries for my other work.

I do not use R-studio, but the open book R packages has been immensely helpful. On my windows box I had to bother to add R to my system path, so I can start my R session at the appropriate directory, but besides that very minor hassle it has been quite easy to follow.


I probably have not put in my 10k total hours as a guesstimate to mastery in computer programming. I think maybe closer to 5000, and that is spread out (maybe quite evenly at this point) between python, R, SPSS (and just a little Stata). And I still learn new stuff all the time. Being in an environment where I need to work with more people has really hammered down getting environments right, and making it shareable with other teammates. And part and parcel with that is documenting code in a much more thorough manner than most of the code snippets I leave littered on this blog.

So it probably is worth me posting less, but spending more time making nicer packages to share with everyone.

I do not know really how folks do R programming for making packages. I know a little at this point about creating separate conda environments for python to provide some isolation – is there something equivalent to conda environments for R? Do the R CMD checks make this level of isolation unnecessary? Should I just be working on an isolated docker image for all development work? I do not know. I do not have to worry about that at the moment though.


Part of this self learning journey is because I am trying to start a journal aimed at criminologists where you can submit software packages. Similar to the Journal of Open Source Software or the Journal of Statistical Software, etc. For submission to there I want people to have documentation for functions, and really that necessitates having a nice package (whether in R or python or whatever). So I can’t tell people you need to make a package if I don’t do that myself!

The software papers are not a thing yet (I would call it a soft launch at this point), but I have been bugging folks about submitting papers to get a dry run of the process. If you have something you would like to submit, feel free to get in touch and we can get you set up.

Spatial analysis of NYC Shootings using the SPPT

As a follow up to my prior post on spatial sample size recommendations for the SPPT test, I figured I would show an actual analysis of spatial changes in crime. I’ve previously written about how NYC shootings appear to be going up by a similar amount in each precinct. We can do a similar analysis, but at smaller geographic spatial units, to see if that holds true for everywhere.

The data and R code to follow along can be downloaded here. But I will copy-paste below to walk you through.

So first I load in the libraries I will be using and set my working directory:

###################################################
library(sppt)
library(sp)
library(raster)
library(rgdal)
library(rgeos)

my_dir <- 'C:\\Users\\andre\\OneDrive\\Desktop\\NYC_Shootings_SPPT'
setwd(my_dir)
###################################################

Now we just need to do alittle data prep for the NYC data. Concat the old and new files, convert the data fields for some of the info, and do some date manipulation. I choose the pre/post date here March 1st 2020, but also note we had the Floyd protests not to long after (so calling these Covid vs protest increases is pretty much confounded).

###################################################
# Read in the shooting data

old_shoot <- read.csv('NYPD_Shooting_Incident_Data__Historic_.csv', stringsAsFactors=FALSE)
new_shoot <- read.csv('NYPD_Shooting_Incident_Data__Year_To_Date_.csv', stringsAsFactors=FALSE)

# Just one column off
print( cbind(names(old_shoot), names(new_shoot)) )
names(new_shoot) <- names(old_shoot)
shooting <- rbind(old_shoot,new_shoot)

# I need to conver the coordinates to numeric fields
# and the dates to a date field

coord_fields <- c('X_COORD_CD','Y_COORD_CD')
for (c in coord_fields){
  shooting[,c] <- as.numeric(gsub(",","",shooting[,c])) #replacing commas in 2018 data
}

# How many per year to check no funny business
table(substring(shooting$OCCUR_DATE,7,10))

# Making a datetime variable in R
shooting$OCCUR_DATE <- as.Date(shooting$OCCUR_DATE, format = "%m/%d/%Y", tz = "America/New_York")

# Making a post date to split after Covid started
begin_date <- as.Date('03/01/2020', format="%m/%d/%Y")
shooting$Pre <- ifelse(shooting$OCCUR_DATE < begin_date,1,0)

#There is no missing data
summary(shooting)
###################################################

Next I read in a shapefile of the census tracts for NYC. (Pro-tip for NYC GIS data, I like to use Bytes of the Big Apple where available.) The interior has a few dongles (probably for here should have started with a borough outline file), so I do a tiny buffer to get rid of those interior dongles, and then smooth the polygon slightly. To check and make sure my crime data lines up, I superimpose with a tiny dot map — this is also a great/simple way to see the overall shooting density without the hassle of other types of hot spot maps.

###################################################
# Read in the census tract data

nyc_ct <- readOGR(dsn="nyct2010.shp", layer="nyct2010") 
summary(nyc_ct)
plot(nyc_ct)
nrow(nyc_ct) #2165 tracts

# Dissolve to a citywide file
nyc_ct$const <- 1
nyc_outline <- gUnaryUnion(nyc_ct, id = nyc_ct$const)
plot(nyc_outline)

# Area in square feet
total_area <- area(nyc_outline)
# 8423930027

# Turning crimes into spatial point data frame
coordinates(shooting) <- coord_fields
crs(shooting) <- crs(nyc_ct)

# This gets rid of a few dongles in the interior
nyc_buff <- gBuffer(nyc_outline,1,byid=FALSE)
nyc_simpler <- gSimplify(nyc_buff, 500, topologyPreserve=FALSE)

# Checking to make sure everything lines up
png('NYC_Shootings.png',units='in',res=1000,height=6,width=6,type='cairo')
plot(nyc_simpler)
points(coordinates(shooting),pch='.')
dev.off()
###################################################

The next part I created a function to generate a nice grid over an outline area of your choice to do the SPPT analysis. What this does is generates the regular grid, turns it from a raster to a vector polygon format, and then filters out polygons with 0 overlapping crimes (so in the subsequent SPPT test these areas will all be 0% vs 0%, so not much point in checking them for differences over time!).

You can see the logic from the prior blog post, if I want to use the area with power to detect big changes, I want N*0.85. Since I am comparing data over 10 years compared to 1+ years, they are big differences, so I treat N here as 1.5 times the newer dataset, which ends up being around a suggested 3,141 spatial units. Given the area for the overall NYC, this translates to grid cells that are about 1600 by 1600 feet. Once I select out all the 0 grid cells, there only ends up being a total of 1,655 grid cells for the final SPPT analysis.

###################################################
# Function to create sppt grid over areas with 
# Observed crimes

grid_crimes <- function(outline,crimes,size){
    # First creating a raster given the outline extent
    base_raster <- raster(ext = extent(outline), res=size)
    projection(base_raster) <- crs(outline)
    # Getting the coverage for a grid cell over the city area
    mask_raster <- rasterize(outline, base_raster, getCover=TRUE)
    # Turning into a polygon
    base_poly <- rasterToPolygons(base_raster,dissolve=FALSE)
    xy_df <- as.data.frame(base_raster,long=T,xy=T)
    base_poly$x <- xy_df$x
    base_poly$y <- xy_df$y
    base_poly$poly_id <- 1:nrow(base_poly)
    # May also want to select based on layer value
    # sel_poly <- base_poly[base_poly$layer > 0.05,]
    # means the grid cell has more than 5% in the outline area
    # Selecting only grid cells with an observed crime
    ov_crime <- over(crimes,base_poly)
    any_crime <- unique(ov_crime$poly_id)
    sub_poly <- base_poly[base_poly$poly_id %in% any_crime,]
    # Redo the id
    sub_poly$poly_id <- 1:nrow(sub_poly)
    return(sub_poly)
}


# Calculating suggested sample size
total_counts <- as.data.frame(table(shooting$Pre))
print(total_counts)

# Lets go with the pre-total times 1.5
total_n <- total_counts$Freq[1]*1.5

# Figure out the total number of grid cells 
# Given the total area
side <- sqrt( total_area/total_n ) 
print(side)
# 1637, lets just round down to 1600

poly_cells <- grid_crimes(nyc_simpler,shooting,1600)
print(nrow(poly_cells)) #1655

png('NYC_GridCells.png',units='in',res=1000,height=6,width=6,type='cairo')
plot(nyc_simpler)
plot(poly_cells,add=TRUE,border='blue')
dev.off()
###################################################

Next part is to split the data into pre/post, and do the SPPT analysis. Here I use all the defaults, the Chi-square test for proportional differences, along with a correction for multiple comparisons. Without the multiple comparison correction, we have a total of 174 grid cells that have a p-value < 0.05 for the differences in proportions for an S index of around 89%. With the multiple comparison correction though, the majority of those p-values are adjusted to be above 0.05, and only 25 remain afterwards (98% S-index). You can see in the screenshot that all of those significant differences are increases in proportions from the pre to post. While a few are 0 shootings to a handful of shootings (suggesting diffusion), the majority are areas that had multiple shootings in the historical data, they are just at a higher intensity now.

###################################################
# Now lets do the sppt analysis

split_shoot <- split(shooting,shooting$Pre)
pre <- split_shoot$`1`
post <- split_shoot$`0`

library(dplyr)
sppt_diff <- sppt_diff(pre, post, poly_cells)
summary(sppt_diff)

# Unadjusted vs adjusted p-values
sum(sppt_diff$p.value < 0.05) #174, around 89% similarity
sum(sppt_diff$p.adjusted  < 0.05) #25, 98% similarity

# Lets select out the increases/decreases
# And just map those

sig <- sppt_diff$p.adjusted < 0.05
sppt_sig <- sppt_diff[sig,]
head(sppt_sig,25) # to check out all increases
###################################################

The table is not all that helpful though for really digging into patterns, we need to map out the differences. The first here is a map showing the significant grid cells. They are somewhat tiny though, so you have to kind of look close to see where they are. The second map uses proportional circles to the percent difference (so bigger circles show larger increases). I am too lazy to do a legend/scale, but see my prior post on a hexbin map, or the sp website in the comments.

###################################################
# Making a map
png('NYC_SigCells.png',units='in',res=1000,height=6,width=6,type='cairo')
plot(nyc_simpler,lwd=1.5)
plot(sppt_sig,add=TRUE,col='red',border='white')
dev.off()

circ_sizes <- sqrt(-sppt_sig$diff_perc)*3

png('NYC_SigCircles.png',units='in',res=1000,height=6,width=6,type='cairo')
plot(nyc_simpler,lwd=1.5)
points(coordinates(sppt_sig),pch=21,cex=circ_sizes,bg='red')
dev.off() 

# check out https://edzer.github.io/sp/
# For nicer maps/legends/etc.
###################################################

So the increases appear pretty spread out. We have a few notable ones that made the news right in the thick of things in Manhattan, but there are examples of grid cells that increased scattered all over the boroughs. I am not going to the trouble here, but if I were a crime analyst working on this, I would export this to a format where I could zoom into the local areas and drill down into the specific incidents. You can do that either in ArcGIS, or more directly in R by creating a leaflet map.

So if folks have any better ideas for testing out crime increases I am all ears. At some point will give the R package sparr a try. (Here you could treat pre as the controls and post as the cases.) I am not a real big fan of over interpreting changes in kernel density estimates though (they can be quite noisy, and heavily influenced by the bandwidth), so I do like the SPPT analysis by default (but it swaps out a different problem with choosing a reasonable grid cell size).

Spatial sample size suggestions for SPPT analysis

I’ve reviewed several papers recently that use Martin Andresen’s Spatial Point Pattern Test (Andresen, 2016). I have been critical of these papers, as I think they are using too small of samples to be reasonable. So here in this blog post I will lay out spatial sample size recommendations. Or more specifically if you have N crimes, advice about how you can conduct the SPPT test in S spatial units of analysis.

Long story short, if you have N crimes, I think you should either use 0.85*N = S spatial units of analysis at the high end, but can only detect very large changes. To be well powered to detect smaller changes between the two distributions, use 0.45*N = S. That is, if you have two crime samples you want to compare, and the smaller sample has 1000 crimes, the largest spatial sample size I would recommend is 850 units, but I think 450 units is better.

For those not familiar with the SPPT technique, it compares the proportion of events falling inside a common area (e.g. police beats, census block groups, etc.) between two patterns. So for example in my work I compared the proportion of violent crime and the proportion of SQF in New York City (Wheeler et al., 2018). I think it makes sense as a gross monitoring metric for PDs this way (say for those doing DDACTS, swap out pedestrian stops with traffic stops), so you can say things like area A had a much lower proportion of crimes than stops, so we should emphasize people do fewer stops in A overall.

If you are a PD, you may already know the spatial units you want to use for monitoring purposes (say for each police sector or precinct). In that case, you want the power analysis to help guide you for how large a sample you need to effectively know how often you can update the estimates (e.g. you may only have enough traffic stops and violent crimes to do the estimates on a quarterly basis, not a monthly one) Many academic papers though are just generally theory testing, so don’t have an a priori spatial unit of analysis chosen. (But they do have two samples, e.g. a historical sample of 2000 shootings and a current sample of 1000 shootings.) See Martin’s site for a list of prior papers using the SPPT to see it in action.

I’ve reviewed several papers that examine these proportion changes using the SPPT at very tiny spatial units of analysis, such as street segments. They also happen to have very tiny numbers of overall crimes, and then break the crimes into subsequent subsets. For example reviewed a paper that had around 100 crimes in each subset of interest, and had around 20,000 street segments. I totally get wanting to examine micro place crime patterns – but the SPPT is not well suited for this I am afraid.

Ultimately if you chunk up the total number of crimes into smaller and smaller areas, you will have less statistical power to uncover differences. With very tiny total crime counts, you will be basically only identifying differences between areas that go from 0 to 1 or 0 to 2 etc. It also becomes much more important to control for multiple comparisons when using a large number of spatial units. In general this technique is not going to work out well for micro units of analysis, it will only really work out for larger spatial units IMO. But here I will give my best advice about how small you can reasonably go for the analysis.

Power analysis logic

There are quite a few different ways people have suggested to determine the spatial sample for areas when conducting quadrat analysis (e.g. when you make your own spatial areas). So one rule of thumb is to use 2*A/N, where A is the area of the study and N is the total number of events (Paez, 2021).

Using the SPPT test itself, Malleson et al. (2019) identify the area at which the spatial pattern exhibits the highest similarity index with itself using a resampling approach. Ramos et al. (2021) look at the smallest spatial unit at which the crime patterns within that unit show spatial randomness.

So those later two take an error metric based approach (the spatial unit of analysis likely to result in the miminal amount of error, with error defined different ways). I take a different approach here – power analysis. We want to compare to spatial point patterns for proportional differences, how can we construct the test to be reasonably powered to identify differences we want to detect?

I do not have a perfect way to do this power analysis, but here is my logic. Crime patterns are often slightly overdispersed, so here I assume if you split up say 1000 crimes into 600 areas, it will have an NB2 distribution with a mean of 1000/600 = 1.67 and an overdispersion parameter of 2. (I assume this parameter to be 2 for various reasons, based on prior analysis of crime patterns, and that 2 tends to be in the general ballpark for the amount of overdispersion.) So now we want to see what it would take to go from a hot spot of crime, say the 98th percentile of this distribution to the median 50th percentile.

So in R code, to translate the NB2 mean/dispersion to N & P notation results in N & P parameters of 1.0416667 and 0.3846154 respectively:

trans_np <- function(mu,disp){
    a <- disp
    x <- mu^2/(1 - mu + a*mu)
    p <- x/(x + mu)
    n <- (mu*p)/(1-p)
    return(c(n,p))
}

# Mean 1000/600 and dispersion of 2
nb_dis <- trans_np(1000/600,2)

Now we want to see what the counts are to go from the 98th to the 50th percentile of this distribution:

crime_counts <- qnbinom(c(0.98,0.5), size=nb_dis[1], prob=nb_dis[2])

And this gives us a result of [1] 8 1 in the crime_counts object. So a hot spot place in this scenario will have around 8 crimes, and the median will be around 1 crime in our hypothetical areas. So we can translate these to percentages, and then feed them into R’s power.prop.test function:

crime_prop <- crime_counts/1000
power.prop.test(n = 1000, p1 = crime_prop[1], p2 = crime_prop[2])

And this gives us a result of:

     Two-sample comparison of proportions power calculation 

              n = 1000
             p1 = 0.008
             p2 = 0.001
      sig.level = 0.05
          power = 0.6477139
    alternative = two.sided

NOTE: n is number in *each* group

Note that this is for one N estimate, and assumes that N will be the same for each proportion. In practice for the SPPT test this is not true, oftentimes we have two crime samples (or crime vs police actions like stops), which have very different total baseline N’s. (It is part of the reason the test is useful, it doesn’t make so much sense in that case to compare densities as it does proportions.) So subsequently when we do these estimates, we should either take the average of the total number of crimes we have in our two point patterns for SPPT (if they are close to the same size), or the minimum number of events if they are very disparate. So if you have in sample A 1000 crimes, and sample B 2000 crimes, I think you should treat the N in this scenario as 1500. If you have 5000 crimes vs 1000000 crimes, you should treat N here as 5000.

So that estimate above is for one set of crimes (1000), and one set of areas (600). But what if we vary the number of areas? At what number of areas do we have the maximum power?

So I provide functions below to generate the power estimate curve, given these assumptions about the underlying crime distribution (which will generally be in the ballpark for many crime patterns, but not perfect), for varying numbers of spatial units. Typically we know the total number of crimes, so we are saying given I have N crimes, how finely can a split them up to check for differences with the SPPT test.

Both the Malleson and Ramos article place their recommendations in terms of area instead of total number of units. But it would not surprise me if our different procedures end up resulting in similar recommendations based on the observed outputs of each of the papers. (The 2A/N quadrat analysis suggestion translates to N*0.5 total number of areas, pretty close to my 0.45*N suggestion for example.)

R Code

Below I have a nicer function to do the analysis I walked through above, but give a nice power curve and dataframe over various potential spatial sample sizes:

# SPPT Power analysis example
library(ggplot)

# See https://andrewpwheeler.com/2015/01/03/translating-between-the-dispersion-term-in-a-negative-binomial-regression-and-random-variables-in-spss/
trans_np <- function(mu,disp){
    a <- disp
    x <- mu^2/(1 - mu + a*mu)
    p <- x/(x + mu)
    n <- (mu*p)/(1-p)
    return(c(n,p))
}

diff_suggest <- function(total_crimes,areas=round(seq(2,total_crimes*10,length.out=500)),
                         change_quant=c(0.98,0.5), nb_disp=2, alpha = 0.05,
                         plot=TRUE){
         # Figure out mean
         areas <- unique(areas)
         mean_cr <- total_crimes/areas
         # Initialize some vectors to place the results
         n_areas <- length(areas)
         power <- vector("numeric",length=n_areas)
         hign <- power
         lown <- power
         higp <- power
         lowp <- power
         # loop over areas and calculate power
         for (i in 1:length(areas)){
             # Negative binomial parameters
             dp <- trans_np(mean_cr[i],nb_disp)
             hilo <- qnbinom(change_quant, size=dp[1], prob=dp[2])
             hilo_prop <- hilo/total_crimes
             # Power for test
             pow <- power.prop.test(n = total_crimes, p1 = hilo_prop[1], p2 = hilo_prop[2], 
                                    sig.level = alpha)
             # Stuffing results in vector
             power[i] <- pow$power
             hign[i] <- hilo[1]
             lown[i] <- hilo[2]
             higp[i] <- hilo_prop[1]
             lowp[i] <- hilo_prop[2]
         }
         Ncrimes <- rep(total_crimes,n_areas)
         res_df <- data.frame(Ncrimes,areas,mean_cr,power,hign,lown,higp,lowp)
         # replacing missing with 0
         res_df[is.na(res_df)] <- 0
         if (plot) {
            require(ggplot2)
            fmt_cr <- formatC(total_crimes, format="f", big.mark=",", digits=0)
            title_str <- paste0("Power per area for Total number of crimes: ",fmt_cr)
            cap_str <- paste0("NB Dispersion = ",nb_disp,", alpha = ",alpha,
                              ", change quantiles = ",change_quant[1]," to ",change_quant[2])
            p <- ggplot(data=res_df,aes(x=areas,y=power)) + geom_line(size=1.5) +
                 theme_bw() + theme(panel.grid.major = element_line(linetype="dashed")) +
                 labs(x='Number of Areas',y=NULL,title=title_str,caption=cap_str) +
                 scale_x_continuous(minor_breaks=NULL) + scale_y_continuous(minor_breaks=NULL) +
                 theme(text = element_text(size=16), axis.title.y=element_text(margin=margin(0,10,0,0)))
            print(p)
         }
         return(res_df)
}

Once that function is defined, you can make a simple call like below, and it gives you a nice graph of the power given different numbers of grid cells:

diff_suggest(100)

So you can see here we never have very high power over this set of parameters. It is also non-monotonic and volatile at very small numbers of spatial units of analysis (at which the overdispersion assumption likely does not hold, and probably is not of much interest). But once that volatility tamps down we have stepped curve, that ends up happening to step whenever the original NB distribution changes from particular integer values.

So what happens with the power curve if we up the number of crimes to 3000?

diff_suggest(3000)

So those two patterns are quite similar. It happens that when breaking down to the smaller units, the highest power scenario is when the crimes are subdivided into around 0.85 fewer spatial units than total crimes. So if you have 1000 crimes, in this scenario I would suggest to use 850 areas.

Also note the behavior when you break it down into a very large number of spatial units S, where S >> N, you get a progressive decline until around 0 power in this analysis. E.g. if you have 100 times more spatial units than observations, only a handful of locations have any crimes, and the rest are all 0’s. So you need to be able to tell the difference between 1/N and 0, which is tough (and any inferences you do make will just pretty much be indistinguishable from noise).

What about if we change the quantiles we are examining, and instead of looking at the very high crime place to the median, look at the 80th percentile to the 20th percentile:

diff_suggest(3000, change_quant=c(0.8,0.2))

We have a similar step pattern, but here the power is never as high as before, and only maxes out slightly above 0.5. It happens that this 0.5 power is around number of crimes*0.45. So this suggests that to uncover more middling transitions, one would need to have less than half the number of spatial units of crime observed. E.g. if you have 1000 crimes, I would not suggest any more than 450 spatial units of analysis.

So the first scenario, crimes*0.85 you could say something like this is the highest power scenario to detect changes from very high crime locations (aka hot spots), to the middle of the distribution. For the second scenario (my preferred offhand), is to say crimes*0.45 total spatial units results in the highest power scenario to detect more mild changes in the middle of the distribution (and detecting changes from hot spots to cold spots thus have even more power).

For now that is the best advice I can give for determining the spatial sample size for the SPPT test. Will have a follow up blog post on using R to make a grid to conduct the test.

Also I have been wondering about the best way to quantify changes in the overall ranking. I have not come upon a great solution I am happy with though, so will need to think about it some more.

References

CCTV and clearance rates paper published

My paper with Yeondae Jung, The effect of public surveillance cameras on crime clearance rates, has recently been published in the Journal of Experimental Criminology. Here is a link to the journal version to download the PDF if you have access, and here is a link to an open read access version.

The paper examines the increase in case clearances (almost always arrests in this sample) for incidents that occurred nearby 329 public CCTV cameras installed and monitored by the Dallas PD from 2014-2017. Quite a bit of the criminological research on CCTV cameras has examined crime reductions after CCTV installations, which the outcome of that is a consistent small decrease in crimes. Cameras are often argued to help solve cases though, e.g. catch the guy in the act. So we examined that in the Dallas data.

We did find evidence that CCTV increases case clearances on average, here is the graph showing the estimated clearances before the cameras were installed (based on the distance between the crime location and the camera), and the line after. You can see the bump up for the post period, around 2% in this graph and tapering off to an estimate of no differences before 1000 feet.

When we break this down by different crimes though, we find that the increase in clearances is mostly limited to theft cases. Also we estimate counterfactual how many extra clearances the cameras were likely to cause. So based on our model, we can say something like, a case would have an estimated probability of clearance without a camera of 10%, but with a camera of 12%. We can then do that counterfactual for many of the events around cameras, e.g.:

Probability No Camera   Probability Camera   Difference
    0.10                      0.12             + 0.02
    0.05                      0.06             + 0.01
    0.04                      0.10             + 0.06

And in this example for the three events, we calculate the cameras increased the total expected number of clearances to be 0.02 + 0.01 + 0.06 = 0.09. This marginal benefit changes for crimes mostly depends on the distance to the camera, but can also change based on when the crime was reported and some other covariates.

We do this exercise for all thefts nearby cameras post installation (over 15,000 in the Dallas data), and then get this estimate of the cumulative number of extra theft clearances we attribute to CCTV:

So even with 329 cameras and over a year post data, we only estimate cameras resulted in fewer than 300 additional theft clearances. So there is unlikely any reasonable cost-benefit analysis that would suggest cameras are worthwhile for their benefit in clearing additional cases in Dallas.

For those without access to journals, we have the pre-print posted here. The analysis was not edited any from pre-print to published, just some front end and discussion sections were lightly edited over the drafts. Not sure why, but this pre-print is likely my most downloaded paper (over 4k downloads at this point) – even in the good journals when I publish a paper I typically do not get 1000 downloads.

To go on, complaint number 5631 about peer review – this took quite a while to publish because it was rejected on R&R from Justice Quarterly, and with me and Yeondae both having outside of academia jobs it took us a while to do revisions and resubmit. I am not sure the overall prevalence of rejects on R&R’s, I have quite a few of them though in my career (4 that I can remember). The dreaded send to new reviewers is pretty much guaranteed to result in a reject (pretty much asking to roll a Yahtzee to get it past so many people).

We then submitted to a lower journal, The American Journal of Criminal Justice, where we had reviewers who are not familiar with what counterfactuals are. (An irony of trying to go to a lower journal for an easier time, they tend to have much worse reviewers, so can sometimes be not easier at all.) I picked it up again a few months ago, and re-reading it thought it was too good to drop, and resubmitted to the Journal of Experimental Criminology, where the reviews were reasonable and quick, and Wesley Jennings made fast decisions as well.

Some microsynth notes

Nate Connealy, a criminologist colleague of mine heading to Tampa asks:

My question is from our CPP project on business improvement districts (Piza, Wheeler, Connealy, Feng 2020). The article indicates that you ran three of the microsynth matching variables as an average over each instead of the cumulative sum (street length, percent new housing structures, percent occupied structures). How did you get R to read the variables as averages instead of the entire sum of the treatment period of interest? I have the microsynth code you used to generate our models, but cannot seem to determine how you got R to read the variables as averages.

So Nate is talking about this paper, Crime control effects of a police substation within a business improvement district: A quasi-experimental synthetic control evaluation (Piza et al., 2020), and here is the balance table in the paper:

To be clear to folks, I did not balance on the averages, but simply reported the table in terms of averages. So here is the original readout from R:

So I just divided those noted rows by 314 to make them easier to read. You could divide values by the total number of treated units though in the original data to have microsynth match on the averages instead if you wanted to. Example below (this is R code, see the microsynth library and paper by Robbins et al., 2017):

library(microsynth)
#library(ggplot2) #not loading here, some issue
set.seed(10)

data(seattledmi) #just using data in the package
cs <- seattledmi
# calculating proportions
cs$BlackPerc <- (cs$BLACK/cs$TotalPop)*100
cs$FHHPerc <- (cs$FEMALE_HOU/cs$HOUSEHOLDS)*100
# replacing 0 pop with 0
cs[is.na(cs)] <- 0

cov.var <- c("TotalPop","HISPANIC","Males_1521","FHHPerc","BlackPerc")
match.out <- c("i_felony", "i_misdemea")

sea_prop <- microsynth(cs, 
                       idvar="ID", timevar="time", intvar="Intervention", 
                       start.pre=1, end.pre=12, end.post=16, 
                       match.out.min=match.out,match.out=FALSE,
                       match.covar=FALSE,check.feas=FALSE,
                       match.covar.min=cov.var, 
                       result.var=match.out)

summary(sea_prop) # balance table

And here you can see that we are matching on the cumulative sums for each of the areas, but we can divide our covariates by the number of treated units, and we will match on the proportional values.

# Can divide by 39 and get the same results
cs[,cov.var] <- cs[,cov.var]/39

sea_div <- microsynth(cs, 
                      idvar="ID", timevar="time", intvar="Intervention", 
                      start.pre=1, end.pre=12, end.post=16, 
                      match.out.min=match.out,match.out=FALSE,
                      match.covar=FALSE,check.feas=FALSE,
                      match.covar.min=cov.var, 
                      result.var=match.out)

summary(sea_div) # balance table

Note that these do not result in the same weights. If you look at the results you will see the treatment effects are slightly different. Also if you do:

# Showing weights are not equal
all.equal(sea_div$w$Weights,sea_prop$w$Weights)

It does not return True. Honestly not familiar enough with the procedure that microsynth uses to do the matching (Raking survey weights) to know if this is due to stochastic stuff or due to how the weighting algorithm works (I would have thought a linear change does not make a difference, but I was wrong).

On the bucket list is to do a matching algorithm that returns geographically contiguous areas and gives the weights all values of 1 (so creates comparable neighborhoods), instead of estimating Raking weights. That may be 5 years though before I get around to that. Gio has a nice map to show the way the weights work now is they may be all over the place (Circo et al., 2021) – I am not sure that is a good thing though.

But I did want to share some functions I used for the paper I worked with Nate on. First, this is for if you use the permutation approach, the function prep_synth returns some of the data in a nicer format to make graphs and calculate your own stats:

# Function to scoop up the data nicely
prep_synth <- function(mod){
    #Grab the plot data
    plotStats <- mod[['Plot.Stats']]
    #For the left graph
    Treat <- as.data.frame(t(plotStats$Treatment))
    Treat$Type <- "Treat"
    #This works for my data at years, will not 
    #Be right for data with more granular time though
    Treat$Year <- as.integer(rownames(Treat))
    Cont <- as.data.frame(t(plotStats$Control))
    Cont$Type <- "Control"
    Cont$Year <- as.integer(rownames(Cont))
    AllRes <- rbind(Treat,Cont)
    #For the right graph
    Perm <- as.data.frame(t(as.data.frame(plotStats$Difference)))
    SplitStr <- t(as.data.frame(strsplit(rownames(Perm),"[.]")))
    colnames(SplitStr) <- c("Type","Year")
    rownames(SplitStr) <- 1:nrow(SplitStr)
    SplitStr <- as.data.frame(SplitStr)
    Perm$Type <- as.character(SplitStr$Type)
    Perm$Year <- as.integer(as.character(SplitStr$Year))
    Perm$Group <- ifelse(Perm$Type == 'Main','Treatment Effect','Permutations') 
    #Reordering factor levels for plots
    AllRes$Type <- factor(AllRes$Type,levels=c('Treat','Control'))
    levels(AllRes$Type) <- c('Treated','Synthetic Control')
    Perm$Group <- factor(Perm$Group,levels=c('Treatment Effect','Permutations'))
    #Exporting result
    Res <- vector("list",length=2)
    Res[[1]] <- AllRes
    Res[[2]] <- Perm
    names(Res) <- c("AggOutcomes","DiffPerms")
    return(Res)
}

It works for the prior tables, but I really made these functions to work with when you used permutations to get the errors. (In the micro synth example, it is easier to work with permutations than in the state level example for synth, in which I think conformal prediction intervals makes more sense, see De Biasi & Circo, 2021 for a recent real example with micro place based data though.)

# Takes like 1.5 minutes
sea_perm <- microsynth(seattledmi, 
                      idvar="ID", timevar="time", intvar="Intervention", 
                      start.pre=1, end.pre=12, end.post=16, 
                      match.out.min=match.out,match.out=FALSE,
                      match.covar=FALSE,check.feas=FALSE,
                      match.covar.min=cov.var, 
                      result.var=match.out, perm=99)

res_prop <- prep_synth(sea_perm)
print(res_prop)

So the dataframe in the first slot is the overall treatment effect, and the second dataframe is a nice stacked version for the permutations. First, I really do not like the percentage change (see Wheeler, 2016 for the most direct critique, but I have a bunch on this site). So I wrote code to translate the treatment effects into crime count reductions instead of the percent change stuff.

# Getting the observed treatment effect on count scale
# vs the permutations

agg_fun <- function(x){
    sdx <- sd(x)
    minval <- min(x)
    l_025 <- quantile(x, probs=0.025)
    u_975 <- quantile(x, probs=0.975)
    maxval <- max(x)
    totn <- length(x)
    res <- c(sdx,minval,l_025,u_975,maxval,totn)
    return(res)
}

treat_count <- function(rp){
    # Calculating the treatment effect based on permutations
    keep_vars <- !( names(rp[[2]]) %in% c("Year","Group") )
    out_names <- names(rp[[2]])[keep_vars][1:(sum(keep_vars)-1)]
    loc_dat <- rp[[2]][,keep_vars]
    agg_treat <- aggregate(. ~ Type, data = loc_dat, FUN=sum)
    n_cols <- 2:dim(agg_treat)[2]
    n_rows <- 2:nrow(agg_treat)
    dif <- agg_treat[rep(1,max(n_rows)-1),n_cols] - agg_treat[n_rows,n_cols]
    dif$Const <- 1
    stats <- aggregate(. ~ Const, data = dif, FUN=agg_fun)
    v_names <- c("se","min","low025","up975","max","totperm")
    long_stats <- reshape(stats,direction='long',idvar = "Const", 
                      varying=list(2:ncol(stats)),
                      v.names=v_names, times=out_names)
    # Add back in the original stats
    long_stats <- long_stats[,v_names]
    rownames(long_stats) <- 1:nrow(long_stats)
    long_stats$observed <- t(agg_treat[1,n_cols])[,1]
    long_stats$outcome <- out_names
    ord_vars <- c('outcome','observed',v_names)
    return(long_stats[,ord_vars])
}

treat_count(res_prop)

So that is the cumulative total effect of the intervention. This is more similar to the WDD test (Wheeler & Ratcliffe, 2018), but since the pre-time period is matched perfectly, just is the differences in the post time periods. And here it uses the permutations to estimate the error, not any Poisson approximation.

But I often see folks concerned about the effects further out in time for synthetic control studies. So here is a graph that just looks at the instant effects for each time period, showing the difference via the permutation lines:

# GGPLOT graphs, individual lines
library(ggplot2)
perm_data <- res_prop[[2]]
# Ordering factors to get the treated line on top
perm_data$Group <- factor(perm_data$Group, c("Permutations","Treatment Effect"))
perm_data$Type <- factor(perm_data$Type, rev(unique(perm_data$Type)))
pro_perm <- ggplot(data=perm_data,aes(x=Year,y=i_felony,group=Type,color=Group,size=Group)) + 
            geom_line() +
            scale_color_manual(values=c('grey','red')) + scale_size_manual(values=c(0.5,2)) +
            geom_vline(xintercept=12) + theme_bw() + 
            labs(x=NULL,y='Felony Difference from Control') + 
            scale_x_continuous(minor_breaks=NULL, breaks=1:16) + 
            scale_y_continuous(breaks=seq(-10,10,2), minor_breaks=NULL) +
            theme(panel.grid.major = element_line(linetype="dashed"), legend.title= element_blank(),
            legend.position = c(0.2,0.8), legend.background = element_rect(linetype="solid", color="black")) +
            theme(text = element_text(size=16), axis.title.y=element_text(margin=margin(0,10,0,0)))

And I also like looking at this for the cumulative effects as well, which you can see with the permutation lines widen over time.

# Cumulative vs Pointwise
perm_data$csum_felony <- ave(perm_data$i_felony, perm_data$Type, FUN=cumsum)
pro_cum  <- ggplot(data=perm_data,aes(x=Year,y=csum_felony,group=Type,color=Group,size=Group)) + 
              geom_line() +
              scale_color_manual(values=c('grey','red')) + scale_size_manual(values=c(0.5,2)) +
              geom_vline(xintercept=12) + theme_bw() + 
              labs(x=NULL,y='Felony Difference from Control Cumulative') + 
              scale_x_continuous(minor_breaks=NULL, breaks=1:16) + 
              scale_y_continuous(breaks=seq(-20,20,5), minor_breaks=NULL) +
              theme(panel.grid.major = element_line(linetype="dashed"), legend.title= element_blank(),
              legend.position = c(0.2,0.8), legend.background = element_rect(linetype="solid", color="black")) +
              theme(text = element_text(size=16), axis.title.y=element_text(margin=margin(0,10,0,0)))

If you do a ton of permutations (say 999 instead of 99), it would likely make more sense to do a fan chart type error bars and show areas of different percentiles instead of each individual line (Yim et al., 2020).

I will need to slate a totally different blog post to discuss instant vs cumulative effects for time series analysis. Been peer-reviewing quite a few time series analyses of Covid and crime changes – most everyone only focuses on instant changes, and does not calculate cumulative changes. See for example estimating excess deaths for the Texas winter storm power outage (Aldhous et al., 2021). Folks could do similar analyses for short term crime interventions. Jerry has a good example of using the Causal Impact package to estimate cumulative effects for a gang takedown intervention (Ratcliffe et al., 2017) for one criminal justice example I am familiar with.

Again for folks feel free to ask me anything. I may not always be able to do as deep a dive as this, but always feel free to reach out.

References