The value of a PhD

For my current work as a data scientist, I spend most of my time writing SQL queries, generating some sort of predictive model on that data using python, and automating those data pipelines using additional command line scripts. Pretty much nothing coding wise I do on a day to day basis I learned in my entire educational career.

The only specific coding classes I took in school were SAS in undergrad and SPSS in grad. All other coding was in Stata and a very tiny bit in R, both incidental to statistical classes. Even those should hardly count, as all it entails is load a dataset and run reg y x or something similar.

That focuses on the software engineering side – the other side of being a data scientist is essentially being an applied mathematician. That may sound fancy, but the work I do I like to think is more akin to accounting with probabilities (where I have to personally create models to estimate the probabilities). While I had extensive quantitative training in graduate school, again nothing I was taught even remotely resembles the mathematics I use on a regular basis at my job.

My social science education entirely focused on causal inference, estimating parameters on the right hand side of the regression equation. I did not cover prediction/forecasting/machine-learning one iota in my classes. I did not even have any classes on cost-benefit analysis, which is more akin to me calculating potential return on investment when I am creating new machine learning models for my company.

The only thing I do regularly at my job you could reasonably point to specific educational training/prep on was presenting results in PowerPoint presentations.

That being said, no way I would be in my current position if I did not have a PhD. For a potential counter-factual, I debated on dropping out of undergrad at one point and going to community college to install HVAC systems. I feel pretty comfortable assuming I would not have ended up as a data scientist if I took that career path. (Before you think to poo-poo on that career path choice, it is easily possible my personal net worth would be in the same ballpark at this point in my life in that counter-factual installing HVAC world. There are significant opportunity costs you are eating when you pursue a PhD.)

So what exactly was the value of my PhD? While you take some classes as a PhD student, I don’t see the main benefit of those as being vocational in nature. When pursuing a PhD it is a full time endeavor, and it is the entire environment that marks it as a major difference from undergraduate education. Pretty much every conversation you have as a PhD student is focused on science.

A second major difference is that you are not a passive consumer of scientific research – you have bridged to becoming a producer of that knowledge. A PhD dissertation by its nature is very sink or swim – you are expected to come up with a particular research topic/agenda, and conduct the appropriate analysis to investigate that particular topic, then share your results with the world. This is very different than working in a job where someone tells you what to do – you show up in the morning and you have 100% latitude to pursue whatever you want.

These two things together I believe are where the value lies in a PhD. The independence necessary to be a successful in a PhD is by its nature not something you can get via prior work experience (unless you count say starting your own business). This coupled with the scientific environment provides an atmosphere where constant learning is necessary to get to the finish line of the dissertation. Even if I still was an academic, it is always necessary for me to consume new material, teach myself new things, and apply that to the work I am pursuing.

So while I did not learn python programming or machine learning in grad school, I just go out, try to consume as much as I can on the material, and apply that knowledge to solve the current problems I am dealing with. There will always be something new I need to teach myself while I am still working, but that is OK. I have the means to teach myself those things from my PhD experience. I am not sure I would have really ever gotten to that point just by focusing on vocational aspects (e.g. taking classes on machine learning or programming) – I think I only got to that point by having to pursue my own independent research.


I’ve been musing this more as potential students ask me whether it is worth it to pursue a PhD. I have mixed feelings, but have settled on this simple dichotomy – if you are only pursuing a PhD because you want to teach, I have grave reservations against recommending a PhD. The supply for these professor positions greatly outpace the demand from universities. So even if you do well as a student, there is no guarantee you will get a tenure track position. In the current market where there are dozens of really good candidates for any position, network effects can dominate that decision.

But, if you are more open to other potential positions, such as public sector researcher positions, think tanks, or private sector data science, I feel more comfortable in saying going for the PhD is a reasonable career choice.

Unfortunately, current education in terms of preparing you to be competitive for private sector data science is somewhat lacking across the social sciences. As I stated at the beginning of this post, I did not personally learn any of the tools I use regularly at my job via traditional education, but more as ancillary to my particular research interests. To follow in my path, the research you pursue needs to somewhat match the skills the current market wants, and these include:

  • predictive modeling (e.g. tree based models, boosted models, deep learning)
  • legitimate coding skills in python/R, as well as tools like git/Docker
  • working with moderately large datasets (SQL, Hadoop, or online AWS)
  • data visualization to explain results/models

I am hoping my former colleagues in social sciences will do a better job of expanding the graduate curricula to better teach these skills. They have utility for the more traditional research as well. I am not holding my breath though for that. So in the meantime if you are pursuing a PhD in the social sciences, and you want to pursue a data science job (or simply hedge in case you cannot land a tenure track gig), these are skills you need to develop on your own while also doing your PhD.

Comparing the WDD vs the Wilson log IRR estimator

So this is maybe my final post on the WDD estimator for the time being (Wheeler & Ratcliffe, 2018). Recently David Wilson had an article in JQC that proposes a different estimator using the same basic information, just pre-post crime counts for treated and control areas (Wilson, 2021). So say we had the table:

         Pre   Post
Treated   50     30
Control   60     55

So in this scenario, my WDD estimate is -20 in the treated area, and -5 in the control area, so the overall estimate is -20 – -5 = -15.

30 - 50 - (55 - 60) = -15

So an estimated reduction of -15 crimes overall. David’s estimator is the logged incident rate ratio (IRR), and so is just like above, except logs all of the values:

log(30) - log(50) - ( log(55) - log(60) ) = -0.4238142

This is a logged incident rate adjustment, so most of the time people exponentiate this value, which is exp(-0.4238142) = 0.6545455. So this suggests crime is reduced by approximately 35% in the treated area relative to the control area in this hypothetical. Or another way to write it is (30/50)/(55/60) = 0.6545455.

So instead of a linear estimate of the total numbers of crimes reduced, this is an estimate of the overall rate reduction. So this begs the question when would you prefer my WDD vs the IRR? I will try to answer that below – in short I think David’s estimator makes sense for meta-analyses (as I have said before in reference to the work in Braga & Weisburd, 2020). But for an individual agency doing an experimental evaluation I much prefer my estimator. The skinny of this logic is that we only really care about the overall crime reduction estimate from a cost-benefit analysis perspective. Backing out this total crime reduction count estimate from David’s IRR estimate can result in some funny business for an individual study.

Identifying Assumptions

So there are really two different assumptions my WDD estimator and David’s IRR estimator make. To generate a standard error estimate around the point estimate for either estimator, both require the data are Poisson distributed. So that makes no difference between the two. The assumption that really distinguishes between the WDD and the IRR estimate is the parallel trends assumption. The WDD assumes parallel trends are on the linear scale, whereas the IRR assumes parallel trends are on the ratio scale.

What exactly does this mean? Imagine we have a treated and control area, but look at the crime trends per time period before the treatment occurred. This set of areas has a set of parallel trends on the linear scale:

Time Treated Control
 0     50      60
 1     40      50
 2     45      55
 3     50      60

When the treated area goes down by 10 crimes, the control area goes down by 10 crimes. That is a parallel on the linear scale. Whereas this scenario is parallel on the ratio scale:

Time Treated Control
 0     50      60
 1     40      48
 2     45      54
 3     50      60

When crime goes down by 20% in the treated area, it goes down by 20% in the control area.

So while this gives a potential way to say you should use the WDD (parallel on the linear scale), or the IRR (parallel on the ratio scale), in practice it is not so simple. For one thing, if you only has the pre/post counts of crime, you cannot distinguish between these two scenarios. You can only tell in the case you have historical data to examine.

For a second part of this, you typically can choose your own control area (see for example the synthetic control estimator). So in most scenarios you could choose a control area to obey the linear or the ratio parallel trends assumption if you wanted to. However it may be in many scenarios there is a natural/easy control area, and you may see what is a better fit in that case for linear/ratio.

A final wee bit of a perverse aspect about this I will mention – pretend we have a treated/control area have approximately the same baseline crime counts/rates:

Time Treated Control
 0      30     30
 1      25     25
 2      20     20
 3      25     25

You actually cannot tell in this scenario whether the parallel trends are on the linear scale for my WDD or the ratio scale for the IRR estimate. They are consistent with either! In practice I think in many cases it will be like this – with noisy data, if you choose a control area that has approximately the same baseline crime counts, it will be quite hard to tell whether the linear parallel trends makes more sense or the ratio parallel trends makes more sense.

There are situations where the linear changes do not make sense, but they tend to be scenarios such as the control area has very little crime (so cannot go below 0 to match larger ups/downs in the treated area). So in that case sure the IRR is plausible and the WDD is not, but those are cases where the control area itself is quite questionable. Also note the IRR is not defined for any cells with 0 crimes – but again it is not good for either of our estimators in that case (although mine won’t fail to spit out a number, the power is so low the number it spits out won’t be worth much).

Bias/Coverage

So I have adapted the same simulation code I used in prior studies/blog posts to evaluate the null distribution and the coverage of David’s IRR estimator. I partly did not pursue it initially back when me and Jerry were discussing this idea, because I thought it would be biased. Generalized linear models are based on maximum likelihood estimators, which are only asymptotically valid. In short it appears I was wrong here and David’s IRR estimator is fine even with just four observations, at least for the handful of scenarios I have tried it (have not looked at very tiny counts of crime, it is undefined if any cell has 0 crimes, as you cannot take the log of 0).

Python code pasted at the very end of the blog post, but for example if we generate a set of null no changes pre/post with a baseline of 50 crimes, the logged irr estimate (converted into a z-score here) is just fine and dandy and has a very close to standard normal distribution based on 10k simulations.

So lets look at the scenario where the control area doesn’t change, but the treated area goes from 50 to 30. We can see again the point estimate in this scenario is spot on the money.

And then we can see the coverage of the logged irr estimator is spot on as well:

So if you are interested in slightly different baseline scenarios, you can use that same simulation code to check out the behavior of David’s estimator and conduct simulated power analysis the same way I have shown for the WDD estimator in prior blog posts.

So if both are unbiased and have good coverage again, why would you prefer the WDD estimator over the IRR estimator (or vice-versa)? Well, lets take the 35% reduction I talked about at the beginning of the post, and the department needs to spend $250k on extra officers to conduct whatever hot spot policing intervention. A 35% reduction may be worth it if we start with a baseline of 200 crimes (so would expect to go down to 130, for a reduction of 70 crimes). If the baseline is 20 crimes, it goes down to 13 crimes (so only a reduction of 7 crimes). The actual benefit of the IRR estimate is entirely dependent on the baseline count of crimes it is applied to.

Even if the IRR estimate is itself unbiased and has proper coverage, for even an individual study backing out the estimated reduction in total crimes from the IRR is biased. So here in this same simulated data (50 to 30 in treated, and 50 to 50 in control areas). The true count reduction is -20, and here is the point estimate on the X axis and the length of the confidence interval for each simulation on the Y axis for my WDD test. You can see they are nicely centered on -20, and the length of the confidence intervals has a very tiny variance – they are mostly just a smidge over 50 in total length. So that is probably tough to wrap your head around, but the variance of the variance estimates for the WDD are small.

Now lets do the same graph for the IRR estimate, but translated back out to a count crime reduction based on the simulated values:

We either have a ton of bias in this estimate (if the estimate of the count reduction is too large, the confidence interval is too small). Or the opposite, the estimate of the count reduction is too small, and the confidence interval is crazy wide. In Andrew Gelman’s terminology, it can result in pretty large type M (magnitude) errors in this simulated example (Gelman & Carlin, 2014). So the variance of the variance estimates in this scenario are quite large.

To be clear – if you are interested in estimating a percent reduction, by all means use David’s IRR estimator. If you however want to translate that percent reduction into an estimate of the total crimes reduced though you should use my WDD estimator in that case. You should not back out a total crimes reduced estimate from the IRR.

Final Thoughts

So I have said a few times I think the IRR estimator makes more sense for meta-analyses. Why do I think that? Well, imagine we have an underlying causal process through which a hot spots policing experiment can randomly deter/prevent a particular proportion of crimes. That underlying causal process suggests an IRR effect. And also the problem I mention with translating back to crime counts I believe should get smaller with tighter estimates.

For a causal process that is more akin to my WDD estimator, imagine some crimes will always be deterred/prevented from a hot spots policing experiment, and some will never be. And we don’t know up-front which is which, so the observed reduction is based on whatever mixture of the two we have at that particular location.

The proportion reduction seems to make more sense to me for active patrol type interventions (which are ephemeral) vs permanent CPTED like interventions which should prevent certain criminal acts in perpetuity. But of course any situation in the real world could have both occurring at the same time.

When you go and look at the meta-analysis of hot spots policing, those interventions are all over the place (Hinkle et al., 2020). I think my WDD estimate would not make sense to mash up into a final meta-analytic estimate. The IRR may not make sense either in the end, but it is plausibly more relevant to compare the IRRs from a study with a baseline of 200 crimes vs one with 40 crimes at baseline. I am not sure it makes sense to compare WDDs in that scenario. But that being said, a few of my blog posts have discussed the WDD normalized per unit area or per unit time. Those normalized estimates are probably more apples to apples in the 200 vs 40 scenario.

A final note I have not discussed here is that David discusses a correction for overdispersion, so that is a potential feather in the cap for his estimator vs the WDD. I’d be a bit hesitant though with that – only four observations to estimate the dispersion term is slicing it a bit thin IMO. But I was wrong about the original estimator, so I may be wrong about that as well. It will take simulation evidence to determine that though – David’s paper just provides the correction term, he doesn’t provide evidence for its utility with small sample data.

And to be fair I have not done simulations to see how my estimator behaves in the presence of overdispersion either. I believe it will simply just cause the standard errors to be too small, so like in Wheeler (2016), I imagine it will just require upping the interval (e.g. use a z-score of 3 instead of 2) to get proper coverage for real crime data.

References

Other Posts of Interest

Python simulation code

Here is a copy-pasted chunk of the entire python simulation code.

'''
Comparing WDD to log(IRR) from Wilson's
recent paper, https://link.springer.com/article/10.1007/s10940-021-09494-w

Andy Wheeler
'''

import pandas as pd
import numpy as np
from scipy.stats import norm
from scipy.stats import poisson
from scipy.stats import uniform
import matplotlib
import matplotlib.pyplot as plt
import os
my_dir = r'D:\Dropbox\Dropbox\Documents\BLOG\wdd_vs_irr'
os.chdir(my_dir)

#########################################################
#Settings for matplotlib

andy_theme = {'axes.grid': True,
              'grid.linestyle': '--',
              'legend.framealpha': 1,
              'legend.facecolor': 'white',
              'legend.shadow': True,
              'legend.fontsize': 14,
              'legend.title_fontsize': 16,
              'xtick.labelsize': 14,
              'ytick.labelsize': 14,
              'axes.labelsize': 16,
              'axes.titlesize': 20,
              'figure.dpi': 100}

matplotlib.rcParams.update(andy_theme)
#########################################################


#This works for the scipy functions as well
np.random.seed(seed=10)

# A function to generate the WDD estimate for simulated data
def wdd_sim(treat0,treat1,cont0,cont1,pre,post):
    tr_cr_0 = poisson.rvs(mu = treat0, size=int(pre)).sum()
    co_cr_0 = poisson.rvs(mu = cont0, size=int(pre)).sum()
    tr_cr_1 = poisson.rvs(mu = treat1, size=int(post)).sum()
    co_cr_1 = poisson.rvs(mu = cont1, size=int(post)).sum()
    # WDD estimates
    est = ( tr_cr_1/post - tr_cr_0/pre ) - ( co_cr_1/post - co_cr_0/pre )
    post2 = (1/post)**2
    pre2 = (1/pre)**2
    var_est = tr_cr_0*pre2 + tr_cr_1*post2 + co_cr_0*pre2 + co_cr_1*post2
    true_val = ( treat1 - treat0 ) - ( cont1 - cont0 )
    z_score = est / np.sqrt(var_est)
    # Wilson log IRR estimates
    true_logirr = np.log( (treat1*cont0) / (cont1*treat0) )
    est_logirr = np.log( ((tr_cr_1/post)*(co_cr_0/pre)) / ( (co_cr_1/post)*(tr_cr_0/pre) ) )
    se_logirr = np.sqrt( 1/tr_cr_1 + 1/co_cr_0 + 1/co_cr_1 + 1/tr_cr_0 )
    z_logirr = est_logirr / se_logirr
    return (tr_cr_0, co_cr_0, tr_cr_1, co_cr_0, est, var_est, true_val, z_score, true_logirr, est_logirr, se_logirr, z_logirr)

def make_data(n, treat0, treat1, cont0, cont1, pre, post):
    base = pd.DataFrame( range(n), columns=['index'])
    base['treat0'] = treat0
    if treat1 is not None:
        base['treat1'] = treat1
    else:
        base['treat1'] = base['treat0']
    if cont0 is not None:
        base['cont0'] = cont0
    else:
        base['cont0'] = base['treat0']
    if cont1 is not None:
        base['cont1'] = cont1
    else:
        base['cont1'] = base['cont0']
    base.drop(columns='index',inplace=True)
    base['pre'] = pre
    base['post'] = post
    sim_vals = base.apply(lambda x: wdd_sim(**x), axis=1, result_type='expand')
    sim_vals.columns = ['sim_t0','sim_c0','sim_t1','sim_c1','est','var_est','true_val','z_score',
                        'true_logirr','est_logirr','se_logirr','z_logirr']
    return pd.concat([base,sim_vals], axis=1)

# Coverage of the log irr estimate
# Lets look at the coverage rate for a decline from 40 to 20
def cover_logirr(data, ci=0.95):
    mult = (1 - ci)/2
    nv = norm.ppf(1 - mult)
    dif = nv*data['se_logirr']
    low = data['est_logirr'] - dif
    high = data['est_logirr'] + dif
    cover = ( data['true_logirr'] > low) & ( data['true_logirr'] < high )
    return cover

# Length of ci for WDD
def len_ci(data, ci=0.95):
    mult = (1 - ci)/2
    nv = norm.ppf(1 - mult)
    dif = nv*np.sqrt( data['var_est'] )
    low = data['est'] - dif
    high = data['est'] + dif
    return low, high, high - low

# Length of ci for IRR estimate on count scale
# This depends on the baseline estimate to multiply
# The IRR by, using the baseline average of the 
# Treatment area

def len_irr(data, ci=0.95):
    mult = (1 - ci)/2
    nv = norm.ppf(1 - mult)
    dif = nv*data['se_logirr']
    low = data['est_logirr'] - dif
    high = data['est_logirr'] + dif
    baseline = data['sim_t0']/data['pre']
    # Even if you use hypothetical, the variance is quite high
    #baseline = data['treat0']
    est_count = baseline*np.exp(data['est_logirr']) - baseline
    c1 = baseline*np.exp(low) - baseline
    c2 = baseline*np.exp(high) - baseline
    return est_count, c1, c2, np.abs(c2 - c1)

##########################
# Example with no change, lets look at the null distribution
sim_n = 10000
no_diff = make_data(sim_n, 50, 50, 50, 50, 1, 1)
no_diff['z_logirr'].describe()
##########################

##########################
# Example with equal time periods, a reduction from 50 to 30 and 50 to 50 in control area
sim_dat = make_data(sim_n, 50, 30, 50, 50, 1, 1)
sim_dat[['true_logirr','est_logirr','se_logirr']].describe()

cl = cover_logirr(sim_dat)
cl.mean()

# Compare length of CI for IRR vs WDD

# WDD length
lowdd, highwdd, lwdd = len_ci(sim_dat)
lwdd.describe()

# IRR length on the count scale
est_cnt_irr, lo_irr, hi_irr, ln_irr = len_irr(sim_dat)
ln_irr.describe()

# Scatterplot of estimated count reduction vs
# Length of CI
fig, ax = plt.subplots(figsize=(8,6))
ax.scatter(est_cnt_irr, ln_irr, c='k', 
            alpha=0.1, s=4)
ax.set_axisbelow(True)
ax.set_xlabel('Estimated Count Reduction [IRR]')
ax.set_ylabel('Length of CI on count scale [IRR]')
plt.savefig('IRR_Len_Est.png', dpi=500, bbox_inches='tight')
plt.show()

# Lets compare to the WDD estimate
fig, ax = plt.subplots(figsize=(8,6))
ax.scatter(sim_dat['est'], lwdd, c='k', 
            alpha=0.1, s=4)
ax.set_axisbelow(True)
ax.set_xlabel('Estimated Count Reduction [WDD]')
ax.set_ylabel('Length of CI on count scale [WDD]')
plt.savefig('WDD_Len_Est.png', dpi=500, bbox_inches='tight')
plt.show()
##########################

How arrests reduce near repeats: Breaking the Chain paper published

My paper (with colleagues Jordan Riddell and Cory Haberman), Breaking the chain: How arrests reduce the probability of near repeat crimes, has been published in Criminal Justice Review. If you cannot access the peer reviewed version, always feel free to email and I can send an offprint PDF copy. (For those not familiar, it is totally OK/legal for me to do this!) Or if you don’t want to go to that trouble, I have a pre-print version posted here.

The main idea behind the paper is that crimes often have near-repeat patterns. That is, if you have a car break in on 100 1st St on Monday, the probability you have another car break in at 200 1st St later in the week is higher than typical. This is most often caused by the same person going and committing multiple offenses in a short time period. So a way to prevent that would on its face be to arrest the individual for the initial crime.

I estimate models showing the reduction in the probability of a near repeat crime if an arrest occurs, based on publicly available Dallas PD data (paper has links to replication code). Because near repeat in space & time is a fuzzy concept, I estimate models showing reductions in near repeats for several different space-time thresholds.

So here the model is Prob[Future Crime = I(time < t & distance < d)] ~ f[Beta*Arrest + sum(B_x*Control_x)] where the f function is a logistic function, and I plot the Beta estimates given different time and space look aheads. Points indicate statistical significance, so you can see they tend to be negative for many different crime and different specifications (with a linear coefficient of around -0.3).

Part of the reason I pursued this is that the majority of criminal justice responses to near repeat patterns in the past were target hardening or traditional police patrol. Target hardening (e.g. when a break in occurs, go to the neighbors and tell them to lock their doors) does not appear to be effective, but traditional patrol does (see the work of Rachel/Robert Santos for example).

It seems to me ways to increase arrest rates for crimes is a natural strategy that is worthwhile to explore for police departments. Easier said than done, but one way may be to prospectively identify incidents that are likely to spawn near repeats and give them higher priority in assigning detectives. In many urban departments, lower level property crimes are never assigned a detective at all.

Open Data and Reproducible Criminology Research

This is part of a special issue put together by Jonathan Grubb and Grant Drawve on spatial approaches to community violence. Jon and Grant specifically asked contributors to discuss a bit about open data standards and replication materials. I repost my thoughts on that here in full:

In reference to reproducibility of the results, we have provided replication materials. This includes the original data sources collated from open sources, as well as python, Stata, and SPSS scripts used to conduct the near-repeat analysis, prepare the data, generate regression models, and graph the results. The Dallas Police Department has provided one of the most comprehensive open sources of crime data among police agencies in the world (Ackerman & Rossmo, 2015; Wheeler et al., 2017), allowing us the ability to conduct this analysis. But it also identifies one particular weakness in the data as well – the inability to match the time stamp of the occurrence of an arrest to when the crime occurred. It is likely the case that open data sources provided by police departments will always need to undergo periodic revision to incorporate more information to better the analytic potential of the data.

For example, much analysis of the arrest and crime relationship relies on either aggregate UCR data (Chamlin et al., 1992), or micro level NIBRS data sources (Roberts, 2007). But both of these data sources lack specific micro level geographic identifiers (such as census tract or addresses of the events), which precludes replicating the near repeat analysis we conduct. If however NIBRS were to incorporate address level information, it would be possible to conduct a wide spread analysis of the micro level deterrence effects of arrests on near repeat crimes across many police jurisdictions. That would allow much broader generalizability of the results, and not be dependent on idiosyncratic open data sources or special relationships between academics and police departments. Although academic & police practitioner relationships are no doubt a good thing (for both police and academics), limiting the ability to conduct analysis of key policing processes to the privileged few is not.

That being said, currently both for academics and police departments there are little to no incentives to provide open data and reproducible code. Police departments have some slight incentives, such as assistance from governmental bodies (or negative conditions for funding conditional on reporting). As academics we have zero incentives to share our code for this manuscript. We do so simply because that is a necessary step to ensure the integrity of scientific research. Relying on the good will of researchers to share replication materials has the same obvious disadvantage that allowing police departments to pick and choose what data to disseminate does – it can be capricious. What a better system to incentivize openness may look like we are not sure, but both academics and police no doubt need to make strides in this area to be more professional and rigorous.

Podcast and Video Shout Outs

So y’all know I really enjoy blogs. So much so I think they often have a higher value added than traditional peer review papers. There are other mediums I would like to recognize, and those are Podcasts and video tutorials. So while I like to do lab tutorials (pretty much like my blog posts in which I step through some code), I know many students would prefer I do videos and lectures. And I admit some of these I have seen done quite well on Coursera for example.

Another source I have been consuming quite a bit lately are Podcasts. These often take the form of an interview. So are not technical in nature, but are more soft story telling, such as talking about a particular topical area the interviewee is expert in, or that persons career path. So here are my list of these resources I have personally learned from and enjoyed.

None of these I have listened/watched 100% of the offerings, but have listened/watch multiple episodes (and will continue to listen/watch more)! These are very criminal justice focused, so would love to branch out to data science and health care resources if folks have suggestions!

Podcasts

Reducing Crime – Jerry Ratcliffe interviews a mix of academics and folks working in the criminal justice field. I have quite a few of these episodes I found personally very informative. John Eck, Kim Rossmo, and Phil Goff were perhaps my favorites of academics. Danny Murphy and Thomas Abt were really good as well (for my favorite non-academics offhand).

Niro Knowledge – Nicholas Roy is a current crime analyst, and interviews other crime analysts and academics. Favorite interviews so far are Cynthia Lum and Renee Mitchell. Similar to reducing crime is typically more focused on a particular topic of interest to the person being interviewed (e.g. Renee talked about her work on crime harm indices).

Analyst Talk – This is a podcast hosted by Jason Elder where he interviews crime analysts from all over about their careers. Annie Thompson and my former colleague Shelagh Dorn’s are my favorite so far, but I also need to listen in sometime on Sean Bair’s series of talks as well.

Abt Podcasts – This I only came across a week ago, but have listened to several on data science, CJ, and social determinants of health. These are a bit different than the other podcasts here, they are shorter and have two individuals from different fields discuss social science relevant to the chosen topic.

Videos

Canadian Society of Evidence Based Policing – Has many interviews of academics in crim/cj. I have an interview with them (would not recommend, I need to work on sitting still!) I really enjoyed the Peter Neyroud interview though is my favorite.

UARK CASDAL – These are instructional videos uploaded by Grant Drawve, mostly around doing crime analysis in Excel, but also has a few in ArcGIS.

StatQuest with Josh Starmer – This is one of the few non crim/cj examples I watch regularly. As interview questions at my work place for entry data scientists we often ask folks to explain machine learning models (such as random forests or XGBoost) in some simple terms. These videos are excellent resources to get you to understand the basics of the mathematics behind the techniques.

Again let me know if of podcasts/video series I am missing out on in the comments!

Reproducible research and code review for journals

Recently came across two different groups broaching the subject of code reviews and reproducible research more broadly for criminal justice. There are certainly aspects of either that make it difficult in the context of peer review. But I am not one to let the perfect be the enemy of the good, so I will layout the difficulties and give some comments on potential good enough solutions that still make marked improvements on the current state of affairs in crim/cj research.

Reproducible Research

So what do I mean by reproducible research? Jeromy Anglim on crossvalidated has a good breakdown on different ways we may apply the term. So to some it may mean if you did a hot spots policing experiment, can I replicate the same crime reduction results in another city.

These are important to publish (simply because social science experiments will inevitably have quite a bit of variance), but this is often not what we are talking about when we talk about replication. We are often talking about a much smaller in scope goal – if I give you the exact same data, can you reproduce the tables/figures in the manuscript you used to make your inferences?

One problem that is often the case with CJ research is that we are working with sensitive data. If I do analysis on a survey of a sensitive topic, I often cannot share the data. But, I do not believe that should entirely put a spike in the question of reproducible data. I have broken down different levels that are possible in making research more reproducible:

  1. A Sharing data and code files to reproduce the paper results
  2. B Sharing code files and simulated data that illustrate the results
  3. C Sharing the plain-text log files showing the code and results of tables/figures

So I have not seen C proposed anywhere, but it is a dead simple solution that almost everyone should be able to accommodate. It simply involves typing log using "output.txt", text at the top of your Stata file, or OUTPUT EXPORT /PDF DOCUMENTFILE="output.pdf" at the end of your SPSS analysis (or could be done via the GUI), etc. These are the log/output files used to generate the results you report in the paper, and typically contain both the commands run, as well as the resulting tables. These files can quite easily not contain privileged information (in fact they won’t be default most of the time, unless you printed out individual names in a table for example in intermediate results).

To accomplish C does take some modicum of wherewithal in terms of writing code, but it is a pretty low bar. So I see no reason why all quantitative analyses cannot require at least this step right now. I realize it is not foolproof – a bad actor could go and edit the results (same as they could edit the results without this information). But it ups the level of effort to manipulate results by quite a bit, and more importantly has the potential to catch more mundane transcription errors that occur quite frequently.

Sometimes I want more details on the code used, the nature of the data etc. (Most quasi-experimental design for example can be summed up as shape your data in a special way and run a particular regression model.) For people like me who care about that, B helps with that, in that I can see the code front-to-back, can actually go and inspect the shape and values in a particular rectangular dataset, and see how the code interacts with those objects. The only full on example of this I am aware of is a recent example paper in Nature Behavior that shares the code using simulated data.

B is also very similar to people who release statistical packages to reproduce their code. So if you release an R package that conducts your new fancy technique, even if you can’t share your data it is really good for people to be able to view the underlying code even by itself to understand the technique better and in conjunction build on your work more. If you do a new technique, it is a crazy ton of work to replicate that on your own, so most people will not bother.

A is most of the way there to the gold standard – if you can share both the data and the code used to reproduce the analysis. Both A and B take a significant amount of knowledge of statistical programming to accomplish. Most people in our field do not have the skills to write an analysis front-to-back that can run in a series of scripts though. To get to A/B grad programs in crim/cj need to spend crazy more time on teaching these skills, which is near zero now almost across the board.

One brief thing to mention about A is that the boundary is difficult to define. So for example, I share code to reproduce analysis in my 311 and crime at micro places in DC paper (paper link, code). But this starts from a dataset that has the street units in DC and all of the covariates already compiled. But where did that dataset come from? I created it by compiling many different sources, so the base dataset is itself very difficult to replicate. Again not letting the perfect be the enemy of the good, I think just starting from your compiled dataset, and replicating the tables/graphs in the manuscript is better than letting the fuzzy boundary prevent you from sharing anything.

Code Reviews for Journal Submissions

The hardest part of A is that even after you share your data, some journals want to be able to run the code locally to entirely reproduce your results. So while I have shared data code (A above) for many papers, see this spreadsheet, they have not been externally vetted by any of those journals. This vetting is the standard in some economic journals now I believe, and would not be surprised in some poli-sci journals as well. This is a very hard problem though, and requires significant resources from both the journal and the researcher to be able to do that.

The biggest hurdle is that even if you share your data/code, your particular system may be idiosyncratic. You may have different R libraries installed than me. You may have different versions of python packages. I may have used a program on Windows to do some analysis you cannot do on a Mac. You may rely on some paid API I cannot access.

These are often solvable problems, but take quite a bit of time to work out. A comparable example to my work is when data scientists say ‘going to production’. This often involves taking some analysis I did on my local machine, and making it run autonomously on my companies servers. There are some things that make it more or less difficult than the typical academic situation, but I think it is broadly comparable. To go to production for a project will typically take me 3-6 months at 50% of my time, so maybe something like 300 hours for a lowish end estimate. And that is just the time it takes from the researchers end, from the journals end it will also take a significant amount of time to compile every ones code and verify the results.

Because of this, I don’t think the fully reproducible re-run my code and generate the exact same tables are feasible in the current way we do academic research and peer review. But again that is why I list C above – we shouldn’t let the perfect be the enemy of the good.

Validating New Empirical Techniques

The code review above is not really code review in the sense that someone looks at your code and says this is correct, it is simply just saying can I get the same results as you. You may want peer review to accomplish the task of not only saying is it reproducible, but is it valid/correct? There are a few things towards this end I would like to see more often in crim/cj. I realize we are not statistics, so cannot often ask for formal proofs. But there are simpler things we can do to verify the results. These are the responsibility of the researcher to provide, not the reviewer to script up on their own to validate someone elses work.

One, illustrate the technique using a very simplified example. So for instance, in my p-median patrol areas paper, I show an example of constructing the linear program with only four areas. You should be able to calculate what the result should be by hand, so can verify the correctness of your algorithm. This has the added benefit of being a very good pedagogical way to describe your method.

Two, illustrate the technique on a larger sample of simulated data in which you again know the correct result. For one example of this, I showed how to estimate group based trajectory models using deep learning libraries. Again your model/method should be able to recover the correct result (which you know) given the simulated fake data.

Three, validate the result using real data compared to the current standard. For crime mapping papers, this means comparing forecasts compared to RTM, or simpler regression models, or simply prior crime = future crime on out of sample data. Amazingly many machine learning papers in CJ do not do out of sample predictions. If it is an inferential procedure, comparing the results to some other status quo technique is similar, such as showing conformal prediction intervals have smaller widths (so more statistical power) than placebo results for synthetic control designs (at least for that example with state panel level crime data).

You may not have all three of these examples in any particular paper, but I think for very new techniques 1 or 2 is necessary. 3 is often a by-product on the analysis anyway. So I do not believe any of these asks are that onerous. If you have the skills to create some new technique, you should be able to accomplish 1 or 2.

I do not have any special advice in terms of the reviewers perspective. When I do code reviews at work, what we do is go line by line, and my co-workers give high level design advice. E.g. you should use a config file for this instead of defining it inline, you should turn this block into a function, you should make a class to open/close the database connections etc. The code reviews do not validate the technical correctness, so if I queried the wrong data they wouldn’t know in the code review. The proof is in the pudding so to speak, so if my results are performing really badly in the real world I know I am doing something wrong. (And the obverse, if my results are on the mark and making money I am pretty sure I did nothing terribly wrong.)

Because there are not these real world mechanisms to validate code in peer reviewed papers, my suggestions for 1/2/3 are the closest I think we can get in many circumstances. That and simply making your code available will dramatically improve the reproducibility and validity of your research compared to the current status quo in our field.

My online course lab materials and musings about online teaching

I often refer folks to the courses I have placed online. Just for an update for everyone, if you look at the top of my website, I have pages for each of my courses at the header of my page. Several of these are just descriptions and syllabi, but the few lab based courses I have done over the years I have put my materials entirely online. So those are:

And each of those pages links to a GitHub page where all the lab goodies are stored.

The seminar in research focuses on popular quasi-experimental designs in CJ, and has code in R/Stata/SPSS for the weekly lessons. (Will need to update with python, I may need to write my own python margins library though!)

Grad GIS is mostly old ArcGIS tutorials (I don’t think I will update ArcPro, will see when Eric Piza’s new book comes out and just suggest that probably). Even though the screenshots are perhaps old at this point though the ideas/workflow are not. (It also has some tutorials on other open source tools, such as CrimeStat, Jerry’s Near Repeat Calculator, GeoDa, spatial regression analysis in R, and Mallesons/Andresens SPPT tool are examples I remember offhand.)

Undergrad Crime Analysis is mostly focused on number crunching relevant to crime analysts in Excel, although has a few things in Access (making SQL queries), and making a BOLO in publisher.

So for folks self-learning of course use those resources however you want. My suggestion is to skim through the syllabus, see if you want to learn about any particular lesson, and then jump right to that one. No need to slog through the whole course if you are just interested in one specific thing.

They are also freely available to any instructors who want to adapt those materials for their own courses as well.


One of the things that has disappointed me about the teaching response to Covid is instead of institutions taking the opportunity to really invest in online teaching, people are just running around with their heads cut off and offering poor last minute hybrid courses. (This is both for the kiddos as well as higher education.)

If you have ever taken a Coursera course, they are a real production! And the ones I have tried have all been really well done; nice videos, interactive quizzes with immediate feedback, etc. A professor on their own though cannot accomplish that, we would need investment from the University in filming and in scripting the webpage. But once it is finished, it can be delivered to the masses.

So instead of running courses with a tiny number of students, I think it makes more sense for Universities to actually pony up resources to help professors make professional looking online courses. Not the nonsense with a bad recorded lecture and a discussion board. It is IMO better to give someone a semester sabbatical to develop a really nice online course than make people develop them at the last minute. Once the course is set up, you really only need to administer the course, which takes much less work.

Another interested party may be professional organizations. For example, the American Society of Criminology could make an ad-hoc committee to develop a model curriculum for an intro criminology course. You can see in my course pages I taught this at one point – there is no real reason why every criminology teacher needs to strike out on their own. This is both more work for the individual teacher, as well as introduces quite a bit of variation in the content that crim/cj students receive.

Even if ASC started smaller, say promoting individual lessons, that would be lovely. Part of the difficulty in teaching a broad course like Intro to Criminology is that I am not an expert on all of criminology. So for example if someone made a lesson plan/video for bio-social criminology, I would be more apt to use that. Think instead of a single textbook, leveraging multi-media.


It is a bit ironic, but one of the reasons I was hired at HMS was to internally deliver data science training. So even though I am in the private sector I am still teaching!

Like I said previously, you are on your own for developing teaching content at the University. There is very little oversight. I imagine many professors will cringe at my description, but one of the things I like at HMS is the collaboration in developing materials. So I initially sat down with my supervisor and project manager to develop the overall curricula. Then for individual lessons I submit my slides/lab portion to my supervisor to get feedback, and also do a dry run in front of one of my peers on our data science team to get feedback. Then in the end I do a recorded lecture – we limit to something like 30 people on WebEx so it is not lagging, but ultimately everyone in the org can access the video recording at a later date.

So again I think this is a better approach. It takes more time, and I only do one lecture at a time (so take a month or two to develop one lecture). But I think that in the end this will be a better long term investment than the typical way Uni’s deliver courses.

New book: Micro geographic analysis of Chicago homicides, 1965-2017

In joint work with Chris Herrmann and Dick Block, we now have a book out – Understanding Micro-Place Homicide Patterns in Chicago (1965 – 2017). It is a Springer Brief book, so I recommend anyone who has a journal article that is too long that this is a potential venue for the work. (Really this is like the length of three journal articles.)

A few things occurred to prompt me to look into this. First, Chicago increased a big spike of homicides in 2016 and 2017. Here is a graph breaking them down between domestic related homicides and all other homicides. You can see all of the volatility is related to non-domestic homicides.

So this (at least to me) begs the question of whether those spiked homicides show similar characteristics compared to historical homicides. Here we focus on long term spatial patterns and micro place grid cells in the city, 150 by 150 meter cells. Dick & Carolyn Block had collated data, including the address where the body was discovered, using detective case notes starting in 1965 (ending in 2000). The data from 2000 through 2017 is the public incident report data released by Chicago PD online. Although Dick and Carolyn’s public dataset is likely well known at this point, Dick has more detailed data than is released publicly on ICPSR and a few more years (through 2000). Here is a map showing those homicide patterns aggregated over the entire long time period.

So we really have two different broad exploratory analyses we employed in the work. One was to examine homicide clustering, and the other was to examine temporal patterns in homicides. For clustering, we go through a ton of different metrics common in the field, and I introduce even one more, Theil’s decomposition for within/between neighborhood clustering. This shows Theil’s clustering metric within neighborhoods in Chicago (based on the entire time period).

So areas around the loop showed more clustering in homicides, but here it appears it is somewhat confounded with neighborhood size – smaller neighborhoods appear to have more clustering. This is sort of par for the course for these clustering metrics (we go through several different Gini variants as well), in that they are pretty fickle. You do a different temporal slice of data or treat empty grid cells differently the clustering metrics can change quite a bit.

So I personally prefer to focus on long term temporal patterns. Here I estimated group based trajectory models using zero-inflated Poisson models. And here are the predicted outputs for those grid cells over the city. You can see unlike prior work David Weisburd (Seattle), myself (Albany), or Martin Andresen (Vancouver) has done, they are much more wavy patterns. This may be due to looking over a much longer horizon than any of those prior works though have.

The big wave, Group 9, ends up being clearly tied to former large public housing projects, which their demolitions corresponds to the downturn.

I have an interactive map to explore the other trajectory groups here. Unfortunately the others don’t show as clear of patterns as Group 9, so it is difficult to answer any hard questions about the uptick in 2016/2017, you could find evidence of homicides dispersing vs homicides being in the same places but at a higher intensity if you slice the data different ways.

Unfortunately the analysis is never ending. Chicago homicides have again spiked this year, so maybe we will need to redo some analysis to see if the more current trends still hold. I think I will migrate away from the clustering metrics though (Gini and Theil), they appear to be too volatile to say much of anything over short term patterns. I think there may be other point pattern analysis that are more diagnostic to really understand emerging/changing spatial patterns.

The coffee next to the cover image is Chris Herrmann’s beans, so go get yourself some as well at Fellowship Coffee!

Publishing in Peer Review?

I am close, but not quite, entirely finished with my current crim/cj peer reviewed papers. Only one paper hangs on, the CCTV clearance paper (with Yeondae Jung). Rejected twice so far (once on R&R from Justice Quarterly), and has been under review in toto around a year and a half so far. It will land somewhere eventually, but who knows where at this point. (The other pre-prints I have on my CV but are not in peer review journals I am not actively seeking to publish anymore.)

Given the typical lags in the peer review process, if you look at my CV I will appear active in terms of publishing in 2020 (6 papers) and 2021 (4 papers and a book). But I have not worked on any peer review paper in earnest since I started working at HMS in December 2019, only copy-editing things I had already produced. (Which still takes a bit of work, for example my Cost of Crime hot spots paper took around 40 hours to respond to reviewers.)

At this point I am not sure if I will pursue any more peer reviewed publications directly in criminology/criminal justice. (Maybe as part of a team in giving support, but not as the lead.) Also we have discussed at my workplace pursuing publications, but that will be in healthcare related projects, not in Crim/CJ.

Part of the reason is that the time it takes to do a peer review publication is quite a bit relative to publishing a simple blog post. Take for instance my recent post on incorporating harm weights into the WDD test. I received the email question for this on Wednesday 11/18, thought about how to tackle the problem overnight, and wrote the blog post that following Thursday morning before my CrimCon presentation, (I took off work to attend the panel with no distractions). So took me around 3 hours in total. Many of my blog posts take somewhat longer, but I definitely do not take any more than 10-20 hours on an individual one (that includes the coding part, the writing part is mostly trivial).

I have attempted to guess as to the relative time it takes to do a peer reviewed publication based on my past work. I averaged around 5 publications per year, worked on average 50 hours a week while I was an academic, and spent something like I am guessing 60% to 80% (or more) of my time on peer review publications. Say I work 51 weeks a year (I definitely did not take any long vacations!, and definitely still put in my regular 50 hours over the summertime), that is 51*50=2550 hours. So that means around (2550*0.6)/5 ~ 300 or (2550*0.8)/5 ~ 400 so an estimate of 300 to 400 hours devoted to an individual peer review publication over my career. This will be high (as it absorbs things like grants I did not get), but is in the ballpark of what I would guess (I would have guessed 200+).

So this is an average. If I had recorded the time, I may have had a paper only take around 100 hours (I don’t think I could squeeze any out in less than that). I have definitely had some take over 400 hours! (My Mapping RTM using Machine Learning I easily spent over 200 hours just writing computer code, not to brag, it was mostly me being inefficient and chasing a few dead ends. But that is a normal part of the research process.)

So it is hard for me to say, OK here is a good blog post that took me 3 hours. Now I should go and spend another 300 to write a peer review publication. Some of that effort to publish in peer review journals is totally legitimate. For me to turn those blog posts into a peer review article I would need a more substantive real-life application (if not multiple real-life applications), and perhaps detailed simulations and comparisons to other techniques for the methods blog posts. But a bunch is just busy work – the front end lit review and answering petty questions from peer reviewers is a very big chunk of that 300 hours (and has very little value added).

My blog posts typically get many more views than my peer review papers do, so I have very little motivation to get the stamp of approval for peer review. So my blog posts take far less time, are more wide read, and likely more accessible than peer reviewed papers. Since I am not on the tenure track and do not get evaluated by peer reviewed publications anymore, there is not much motivation to continue them.

I do have additional ideas I would like to pursue. Fairness and efficiency in siting CCTV cameras is a big one on my mind. (I know how to do it, I just need to put in the work to do the analysis and write it up.) But again, it will likely take 300+ hours for me to finish that project. And I do not think anyone will even end up using it in the end – peer reviewed papers have very little impact on policy. So my time is probably better spent writing a few blog posts and playing video games with all the extra time.

If you are an editor reading this, I still do quite a few peer reviews (so feel free to send me those). I actually have more time to do those promptly since I am not hustling writing papers! I have actually debated on whether it is worth it to start my own peer reviewed journal, or maybe contribute to editing an already existing journals (just joined the JQC editorial board). Or maybe start writing my own crime analysis or methods text books. I think that would be a better use of my time at this point than pursuing individual publications.

Lit reviews are (almost) functionally worthless

The other day I got an email from ACJS about the most downloaded articles of the year for each of their journals. For The Journal of Criminal Justice Education it was a slightly older piece, How to write a literature review in 2012 by Andrew Denney & Richard Tewksbury, DT from here on. As you can guess by the title of my blog post, it is not my most favorite subject. I think it is actually an impossible task to give advice about how to write a literature review. The reason for this is that we have no objective standards by which to judge a literature review – whether one is good or bad is almost wholly subject to the discretion of the reader.

The DT article I don’t think per se gives bad advice. Use an outline? Golly I suggest students do that too! Be comprehensive in your lit review about covering all relevant work? Well who can argue with that!

I think an important distinction to make in the advice DT give is the distinction between functional actions and symbolic actions. Functional in this context means an action that makes the article better accomplish some specific function. So for example, if I say you should translate complicated regression models to more intuitive marginal effects to make your results more interpretable for readers, that has a clear function (improved readability).

Symbolic actions are those that are merely intended to act as a signal to the reader. So if the advice is along the lines of, you should do this to pass peer review, that is on its face symbolic. DT’s article is nearly 100% about taking symbolic actions to make peer reviewers happy. Most of the advice doesn’t actually improve the content of the manuscript (or in the most charitable interpretation how it improves the manuscript is at best implicit). In DT’s section Why is it important this focus on symbolic actions becomes pretty clear. Here is the first paragraph of that section:

Literature reviews are important for a number of reasons. Primarily, literature reviews force a writer to educate him/herself on as much information as possible pertaining to the topic chosen. This will both assist in the learning process, and it will also help make the writing as strong as possible by knowing what has/has not been both studied and established as knowledge in prior research. Second, literature reviews demonstrate to readers that the author has a firm understanding of the topic. This provides credibility to the author and integrity to the work’s overall argument. And, by reviewing and reporting on all prior literature, weaknesses and shortcomings of prior literature will become more apparent. This will not only assist in finding or arguing for the need for a particular research question to explore, but will also help in better forming the argument for why further research is needed. In this way, the literature review of a research report “foreshadows the researcher’s own study” (Berg, 2009, p. 388).

So the first argument, a lit review forces a writer to educate themselves, may offhand seem like a functional objective. It doesn’t make sense though, as lit. reviews are almost always written ex post research project. The point of writing a paper is not to educate yourself, but educate other people on your research findings. The symbolic motivation for this viewpoint becomes clear in DT’s second point, you need to demonstrate credibility to your readers. In terms of integrity if the advice in DT was ‘consider creating a pre-analysis plan’ or ‘release data and code files to replicate your results’ that would be functional advice. But no, it is important to wordsmith how smart you are so reviewers perceive your work as more credible.

Then the last point in the paragraph, articulating the need for a particular piece of research, is again a symbolic action in DT’s essay. You are arguing to peer reviewers about the need for a particular research question. I understand the spirit of this, but think back to what function does this serve? It is merely a signal to reviewers to say, given finite space in a journal, please publish my paper over some other paper, because my topic is more important.

You actually don’t need a literature review to demonstrate a topic is important and/or needed – you can typically articulate that in a sentence or two. For a paper I reviewed not too long ago on crime reductions resulting from CCTV installations in a European city, I was struck by another reviewers critique saying that the authors “never really motivate the study relative to the literature”. I don’t know about you, but the importance of that study seems pretty obvious to me. But yeah sure, go ahead and pad that citation list with a bunch of other studies looking at the same thing to make some peer reviewers happy. God forbid you simply cite a meta-analysis on prior CCTV studies and move onto better things.

What should a lit review accomplish?

So again I don’t think DT give bad advice – mostly vapid but not obviously bad. DT focus on symbolic actions in lit reviews because as lit reviews are currently performed in CJ/Crim journals, they are almost 100% symbolic. They serve almost no functional purpose other than as a signal to reviewers that you are part of the club. So DT give about the best advice possible navigating a series of arbitrary critiques with no clear standard.

As an example for this position that lit reviews accomplish practically nothing, conduct this personal experiment. The next peer review article you pick up, do not read the literature review section. Only read the abstract, and then the results and conclusion. Without having read the literature review, does this change the validity of a papers findings? It for the most part does not. People get feelings hurt by not being cited (including myself), but even if someone fails to cite some of my work that is related it pretty much never impacts the validity of that persons findings.

So DT give advice about how peer review works now. No doubt those symbolic actions are important to getting your paper published, even if they do not improve the actual quality of the manuscript in any clear way. I rather address the question about what I think a lit review should look like – not what you should do to placate three random people and the editor. So again I think the best way to think about this is via articulating specific functions a lit review accomplishes in terms of improving the manuscript.

Broadening the scope abit to consider the necessity of citations, the majority of citations in articles are perfunctory, but I don’t think people should plagiarize. So when you pull a very specific piece of information from a source, I think it is important to cite that work. Say you are using a survey instrument developed by someone else, citing the work that establishes that instruments reliability and validity, as well as the original population those measures were established on, is certainly useful information to the reader. Sources of information/measures, a recent piece saying the properties of your statistical model are I think other good examples of things to cite in your work. Unfortunately I cannot give a bright line here, I don’t cite Gauss every time I use the normal distribution. But if I am using a code library someone else developed that is important, inasmuch as that if someone wants to do a similar project they could use the same library.

In terms of discussing relevant results in prior studies, again the issue is the boundary of what is relevant is very difficult to articulate. If there is a relevant meta-analysis on a topic, it seems sufficient to me to simply state the results of the meta-analysis. Why do I think that is important though? It helps inform your priors about the current study. So if you say a meta-analysis effect size is X, and the current study has an effect size much larger, it may give you pause. It is also relevant if you are generalizing from the results of the study, it is just another piece of evidence in addition to the meta-analysis, not an island all by itself.

I am not saying discussing prior specific results are not needed entirely, but they do not need to be extensive. So if studies Z, Y, X are similar to yours but all had null results, and you think it was because the sample sizes were too small, that is relevant and useful information. (Again it changes your priors.) But it does not need to be belabored on in detail. The current standard of articulating different theoretical aspects ad-nauseum in Crim/CJ journals does not improve the quality of manuscripts. If you do a hot spots policing experiment, you do not need to review all the different minutia of general deterrence theory. Simply saying this experiment is likely to only accomplish general deterrence, not specific deterrence, seems sufficient to me personally.

When you propose a book you need to say ‘here are some relevant examples’ – I think the same idea would be sufficient for a lit review. OK here is my study, here are a few additional studies I think the reader may be interested in that are related. This accomplishes what contemporary lit reviews do in a much more efficient manner – citing more articles makes it much more difficult to pull out the really relevant related work. So admit this does not improve the quality of the current manuscript in a specific way, but helps the reader identify other sources of interest. (I as a reader typically go through the citation list and note a few articles I am interested in, this helps me accomplish that task much quicker.)

I’ve already sprinkled a few additional pieces of advice in this blog post (marginal effect estimates, pre-analysis plans, sharing data code), although you may say they don’t belong in the lit review. Whatever, those are things that actually improve either the content of the manuscript or the actual integrity of the research, not some spray paint on your flowers.

Relevant Other Work

Mapping attitudes paper published

My paper (joint work with Jasmine Silver, Rob Worden, and Sarah McLean), Mapping attitudes towards the police at micro places, has been published in the most recent issue of the Journal of Quantitative Criminology. Here is the abstract:

Objectives: We examine satisfaction with the police at micro places using data from citizen surveys conducted in 2001, 2009 and 2014 in one city. We illustrate the utility of this approach by comparing micro- and meso-level aggregations of policing attitudes, as well as by predicting views about the police from crime data at micro places.

Methods: In each survey, respondents provided the nearest intersection to their address. Using that geocoded survey data, we use inverse distance weighting to map a smooth surface of satisfaction with police over the entire city and compare the micro-level pattern of policing attitudes to survey data aggregated to the census tract. We also use spatial and multi-level regression models to estimate the effect of local violent crimes on attitudes towards police, controlling for other individual and neighborhood level characteristics.

Results: We demonstrate that there are no systematic biases for respondents refusing to answer the nearest intersection question. We show that hot spots of dissatisfaction with police do not conform to census tract boundaries, but rather align closely with hot spots of crime. Models predicting satisfaction with police show that local counts of violent crime are a strong predictor of attitudes towards police, even above individual level predictors of race and age.

Conclusions: Asking survey respondents to provide the nearest intersection to where they live is a simple approach to mapping attitudes towards police at micro places. This approach provides advantages beyond those of using traditional neighborhood boundaries. Specifically, it provides more precise locations police may target interventions, as well as illuminates an important predictor (i.e., nearby violent crimes) of policing attitudes.

And this was one of my favorites to make maps. We show how to take surveys and create analogs of hot spot maps of negative sentiment towards police. We do this via asking individuals to list their closest intersection (to still give some anonymity), and then create inverse distance weighted maps of negative attitudes towards police.

We also find in this work that nearby crimes are the biggest factor in predicting negative sentiment towards police. This hints that past results aggregating attitudes to neighborhoods is inappropriate, and that police reducing crime is likely to have the best margin in terms of making people more happy with the police in general.

As always, feel free to reach out for a copy of the paper if you cannot access JQC. (Or you could go a view the pre-print.)