My online course lab materials and musings about online teaching

I often refer folks to the courses I have placed online. Just for an update for everyone, if you look at the top of my website, I have pages for each of my courses at the header of my page. Several of these are just descriptions and syllabi, but the few lab based courses I have done over the years I have put my materials entirely online. So those are:

And each of those pages links to a GitHub page where all the lab goodies are stored.

The seminar in research focuses on popular quasi-experimental designs in CJ, and has code in R/Stata/SPSS for the weekly lessons. (Will need to update with python, I may need to write my own python margins library though!)

Grad GIS is mostly old ArcGIS tutorials (I don’t think I will update ArcPro, will see when Eric Piza’s new book comes out and just suggest that probably). Even though the screenshots are perhaps old at this point though the ideas/workflow are not. (It also has some tutorials on other open source tools, such as CrimeStat, Jerry’s Near Repeat Calculator, GeoDa, spatial regression analysis in R, and Mallesons/Andresens SPPT tool are examples I remember offhand.)

Undergrad Crime Analysis is mostly focused on number crunching relevant to crime analysts in Excel, although has a few things in Access (making SQL queries), and making a BOLO in publisher.

So for folks self-learning of course use those resources however you want. My suggestion is to skim through the syllabus, see if you want to learn about any particular lesson, and then jump right to that one. No need to slog through the whole course if you are just interested in one specific thing.

They are also freely available to any instructors who want to adapt those materials for their own courses as well.


One of the things that has disappointed me about the teaching response to Covid is instead of institutions taking the opportunity to really invest in online teaching, people are just running around with their heads cut off and offering poor last minute hybrid courses. (This is both for the kiddos as well as higher education.)

If you have ever taken a Coursera course, they are a real production! And the ones I have tried have all been really well done; nice videos, interactive quizzes with immediate feedback, etc. A professor on their own though cannot accomplish that, we would need investment from the University in filming and in scripting the webpage. But once it is finished, it can be delivered to the masses.

So instead of running courses with a tiny number of students, I think it makes more sense for Universities to actually pony up resources to help professors make professional looking online courses. Not the nonsense with a bad recorded lecture and a discussion board. It is IMO better to give someone a semester sabbatical to develop a really nice online course than make people develop them at the last minute. Once the course is set up, you really only need to administer the course, which takes much less work.

Another interested party may be professional organizations. For example, the American Society of Criminology could make an ad-hoc committee to develop a model curriculum for an intro criminology course. You can see in my course pages I taught this at one point – there is no real reason why every criminology teacher needs to strike out on their own. This is both more work for the individual teacher, as well as introduces quite a bit of variation in the content that crim/cj students receive.

Even if ASC started smaller, say promoting individual lessons, that would be lovely. Part of the difficulty in teaching a broad course like Intro to Criminology is that I am not an expert on all of criminology. So for example if someone made a lesson plan/video for bio-social criminology, I would be more apt to use that. Think instead of a single textbook, leveraging multi-media.


It is a bit ironic, but one of the reasons I was hired at HMS was to internally deliver data science training. So even though I am in the private sector I am still teaching!

Like I said previously, you are on your own for developing teaching content at the University. There is very little oversight. I imagine many professors will cringe at my description, but one of the things I like at HMS is the collaboration in developing materials. So I initially sat down with my supervisor and project manager to develop the overall curricula. Then for individual lessons I submit my slides/lab portion to my supervisor to get feedback, and also do a dry run in front of one of my peers on our data science team to get feedback. Then in the end I do a recorded lecture – we limit to something like 30 people on WebEx so it is not lagging, but ultimately everyone in the org can access the video recording at a later date.

So again I think this is a better approach. It takes more time, and I only do one lecture at a time (so take a month or two to develop one lecture). But I think that in the end this will be a better long term investment than the typical way Uni’s deliver courses.

Checking a Poisson distribution fit: An example with officer involved shooting deaths WaPo data (R functions)

So besides code on my GitHub page, I have a list of various statistic functions I’ve scripted on the blog over the years on my code snippets page. One of those functions I will illustrate today is some R code to check the fit of the Poisson distribution. Many of my crime analysis examples rely on crime data being approximately Poisson distributed. Additionally it is relevant in regression model building, e.g. should I use a Poisson GLM or do I need to use some type of zero-inflated model?

Here is a brief example to show how my R code works. You can source it directly from my dropbox page. Then I generated 10k simulated rows of Poisson data with a mean of 0.2. So I see many people in CJ make the mistake that, OK my data has 85% zeroes, I need to use some sort of zero-inflated model. If you are working with very small spatial/temporal units of analysis and/or rare crimes, it may be the mean of the distribution is quite low, and so the Poisson distribution is actually quite close.

# My check Poisson function
source('https://dl.dropboxusercontent.com/s/yj7yc07s5fgkirz/CheckPoisson.R?dl=0')

# Example with simulated data
set.seed(10)
lambda <- 0.2
x <- rpois(10000,lambda)
CheckPoisson(x,0,max(x),mean(x))

Here you can see in the generated table from my CheckPoisson function, that with a mean of 0.2, we expect around 81.2% zeroes in the data. And since we simulated the data according to the Poisson distribution, that is what we get. The table shows that out of the 10k simulation rows, 8121 were 0’s, 1692 rows were 1’s etc.

In real life data never exactly conform to hypothetical distributions. But we often want to see how close they are to the hypothetical before building predictive models. A real life example as close to Poisson distributed data as I have ever seen is the Washington Post Fatal Use of Force data. Every year WaPo has been collating the data, the total number of Fatal uses of Police Force in the US have been very close to 1000 events per year. And even in all the turmoil this past year, that is still the case.

# Washington Post Officer Involved Shooting Deaths Data
oid <- read.csv('https://raw.githubusercontent.com/washingtonpost/data-police-shootings/master/fatal-police-shootings-data.csv',
                stringsAsFactors = F)

# Year Stats
oid$year <- as.integer(substr(oid$date,1,4))
year_stats <- table(oid$year)[1:6]
year_stats 
mean(year_stats)
var(year_stats)

One way to check the Poison distribution is that the mean and the variance should be close, and here at the yearly level the data have some evidence of underdispersion according to the Poisson distribution (most crime data is overdispersed – the variance is much greater than the mean). If the actual mean is around 990, you would expect typical variations of say around plus/minus 60 per year (~ 2*sqrt(990)). But that only gives us a few observations to check (6 years). We can dis-aggregate the data to smaller intervals and check the Poisson assumption. Here I aggregate to days (note that this includes zero days in the table levels calculation). Then we again check the fit of the Poisson distribution.

#Now aggregating to count per day
oid$date_val <- as.Date(oid$date)
date_range <- paste0(seq(as.Date('2015-01-01'),max(oid$date_val),by='days'))
day_counts <- as.data.frame(table(factor(oid$date,levels=date_range)))
head(day_counts)
pfit <- CheckPoisson(day_counts$Freq, 0, 10, mean(day_counts$Freq))
pfit

According to the mean and the variance, it appears the distribution is a very close fit to the Poisson. We can see in this data we expected to have around 147 days with 0 fatal encounters, and in reality there were 160. I like seeing the overall counts, but another way is via the proportions in the final three columns of the table. You can see for all of the integers, we are less than 2 percentage points off for any particular integer count. E.g. we expect the distribution to have 3 fatal uses of force on about 22% of the days, but in the observed distribution days with 3 events only happened around 21% of the days (or 20.6378132 without rounding). So overall these fatal use of force data of course are not exactly Poisson distributed, but they are quite close.

So the Poisson distribution is motivated via a process in which the inter-arrival dates of events being counted are independent. Or in more simple terms one event does not cause a future event to come faster or slower. So offhand if you had a hypothesis that publicizing officer fatalities made future officers more hesitant to use deadly force, this is not supported in this data. Given that this is officer involved fatal encounters in the entire US, it is consistent with the data generating process that a fatal encounter in one jurisdiction has little to do with fatal encounters in other jurisdictions.

(Crime data we are often interested in the opposite self-exciting hypothesis, that one event causes another to happen in the near future. Self-excitation would cause an increase in the variance, so the opposite process would result in a reduced variance of the counts. E.g. if you have something that occurs at a regular monthly interval, the counts of that event will be underdispersed according to a Poisson process.)

So the above examples just checked a univariate data source for whether the Poisson distribution was a decent fit. Oftentimes academics are interested in whether the conditional distribution is a good fit post some regression model. So even if the marginal distribution is not Poisson, it may be you can still use a Poisson GLM, generate good predictions, and the conditional model is a good fit for the Poisson distribution. (That being said, you model has to do more work the further away it is from the hypothetical distribution, so if the marginal is very clearly off from Poisson a Poisson GLM probably won’t fit very well.)

My CheckPoisson function allows you to check the fit of a Poisson GLM by piping in varying predicted values over the sample instead of just one. Here is an example where I use a Poisson GLM to generate estimates conditional on the day of the week (just for illustration, I don’t have any obvious reason fatal encounters would occur more or less often during particular days of the week).

#Do example for the day of the week
day_counts$wd <- weekdays(as.Date(day_counts$Var1))
mod <- glm(Freq ~ as.factor(wd) - 1, family="poisson", data=day_counts)
#summary(mod), Tue/Wed/Thu a bit higher
lin_pred <- exp(predict(mod))
pfit_wd <- CheckPoisson(day_counts$Freq, 0, 10, lin_pred)
pfit_wd

You can see that the fit is almost exactly the same as before with the univariate data, so the differences in days of the week does not explain most of the divergence from the hypothetical Poisson distribution, but again this data is already quite close to a Poisson distribution.

So it is common for people to do tests for goodness-of-fit using these tables. I don’t really recommend it – just look at the table and see if it is close. Departures from hypothetical can inform modeling decisions, e.g. if you do have more zeroes than expected than you may need a negative binomial model or a zero-inflated model. If the departures are not dramatic, variance estimates from the Poisson assumption are not likely to be dramatically off-the-mark.

But if you must, here is an example of generating a Chi-Square goodness-of-fit test with the example Poisson fit table.

# If you really want to do a test of fit
chi_stat <- sum((pfit$Freq - pfit$PoisF)^2/pfit$PoisF)
df <- length(pfit$Freq) - 2
dchisq(chi_stat, df)

So you can see in this example the p-value is just under 0.06.

I really don’t recommend this though for two reasons. One is that with null hypothesis significance testing you are really put in a position that large data samples always reject the null, even if the departures are trivial in terms of the assumptions you are making for whatever subsequent model. The flipside of this is that with small samples the test is underpowered, so there are never many good scenarios where it is useful in practice. Two, you can generate superfluous categories (or collapse particular categories) in the Chi-Square test to increase the degrees of freedom and change the p-value.

One of the things though that this is useful for is checking the opposite, people fudging data. If you have data too close to the hypothetical distribution (so very high p-values here), it can be evidence that someone manipulated the data (because real data is never that close to hypothetical distributions). A famous example of this type of test is whether Mendel manipulated his data.

I intentionally chose the WaPo data as it is one of the few that out of the box really appears to be close to Poisson distributed in the wild. One of my next tasks though is to do some similar code for negative binomial fits. Like Paul Allison, for crime count data I rarely see much need for zero-inflated models. But while I was working on that I noticed that the parameters in NB fits with even samples of 1,000 to 10,000 observations were not very good. So I will need to dig into that more as well.

Lit reviews are (almost) functionally worthless

The other day I got an email from ACJS about the most downloaded articles of the year for each of their journals. For The Journal of Criminal Justice Education it was a slightly older piece, How to write a literature review in 2012 by Andrew Denney & Richard Tewksbury, DT from here on. As you can guess by the title of my blog post, it is not my most favorite subject. I think it is actually an impossible task to give advice about how to write a literature review. The reason for this is that we have no objective standards by which to judge a literature review – whether one is good or bad is almost wholly subject to the discretion of the reader.

The DT article I don’t think per se gives bad advice. Use an outline? Golly I suggest students do that too! Be comprehensive in your lit review about covering all relevant work? Well who can argue with that!

I think an important distinction to make in the advice DT give is the distinction between functional actions and symbolic actions. Functional in this context means an action that makes the article better accomplish some specific function. So for example, if I say you should translate complicated regression models to more intuitive marginal effects to make your results more interpretable for readers, that has a clear function (improved readability).

Symbolic actions are those that are merely intended to act as a signal to the reader. So if the advice is along the lines of, you should do this to pass peer review, that is on its face symbolic. DT’s article is nearly 100% about taking symbolic actions to make peer reviewers happy. Most of the advice doesn’t actually improve the content of the manuscript (or in the most charitable interpretation how it improves the manuscript is at best implicit). In DT’s section Why is it important this focus on symbolic actions becomes pretty clear. Here is the first paragraph of that section:

Literature reviews are important for a number of reasons. Primarily, literature reviews force a writer to educate him/herself on as much information as possible pertaining to the topic chosen. This will both assist in the learning process, and it will also help make the writing as strong as possible by knowing what has/has not been both studied and established as knowledge in prior research. Second, literature reviews demonstrate to readers that the author has a firm understanding of the topic. This provides credibility to the author and integrity to the work’s overall argument. And, by reviewing and reporting on all prior literature, weaknesses and shortcomings of prior literature will become more apparent. This will not only assist in finding or arguing for the need for a particular research question to explore, but will also help in better forming the argument for why further research is needed. In this way, the literature review of a research report “foreshadows the researcher’s own study” (Berg, 2009, p. 388).

So the first argument, a lit review forces a writer to educate themselves, may offhand seem like a functional objective. It doesn’t make sense though, as lit. reviews are almost always written ex post research project. The point of writing a paper is not to educate yourself, but educate other people on your research findings. The symbolic motivation for this viewpoint becomes clear in DT’s second point, you need to demonstrate credibility to your readers. In terms of integrity if the advice in DT was ‘consider creating a pre-analysis plan’ or ‘release data and code files to replicate your results’ that would be functional advice. But no, it is important to wordsmith how smart you are so reviewers perceive your work as more credible.

Then the last point in the paragraph, articulating the need for a particular piece of research, is again a symbolic action in DT’s essay. You are arguing to peer reviewers about the need for a particular research question. I understand the spirit of this, but think back to what function does this serve? It is merely a signal to reviewers to say, given finite space in a journal, please publish my paper over some other paper, because my topic is more important.

You actually don’t need a literature review to demonstrate a topic is important and/or needed – you can typically articulate that in a sentence or two. For a paper I reviewed not too long ago on crime reductions resulting from CCTV installations in a European city, I was struck by another reviewers critique saying that the authors “never really motivate the study relative to the literature”. I don’t know about you, but the importance of that study seems pretty obvious to me. But yeah sure, go ahead and pad that citation list with a bunch of other studies looking at the same thing to make some peer reviewers happy. God forbid you simply cite a meta-analysis on prior CCTV studies and move onto better things.

What should a lit review accomplish?

So again I don’t think DT give bad advice – mostly vapid but not obviously bad. DT focus on symbolic actions in lit reviews because as lit reviews are currently performed in CJ/Crim journals, they are almost 100% symbolic. They serve almost no functional purpose other than as a signal to reviewers that you are part of the club. So DT give about the best advice possible navigating a series of arbitrary critiques with no clear standard.

As an example for this position that lit reviews accomplish practically nothing, conduct this personal experiment. The next peer review article you pick up, do not read the literature review section. Only read the abstract, and then the results and conclusion. Without having read the literature review, does this change the validity of a papers findings? It for the most part does not. People get feelings hurt by not being cited (including myself), but even if someone fails to cite some of my work that is related it pretty much never impacts the validity of that persons findings.

So DT give advice about how peer review works now. No doubt those symbolic actions are important to getting your paper published, even if they do not improve the actual quality of the manuscript in any clear way. I rather address the question about what I think a lit review should look like – not what you should do to placate three random people and the editor. So again I think the best way to think about this is via articulating specific functions a lit review accomplishes in terms of improving the manuscript.

Broadening the scope abit to consider the necessity of citations, the majority of citations in articles are perfunctory, but I don’t think people should plagiarize. So when you pull a very specific piece of information from a source, I think it is important to cite that work. Say you are using a survey instrument developed by someone else, citing the work that establishes that instruments reliability and validity, as well as the original population those measures were established on, is certainly useful information to the reader. Sources of information/measures, a recent piece saying the properties of your statistical model are I think other good examples of things to cite in your work. Unfortunately I cannot give a bright line here, I don’t cite Gauss every time I use the normal distribution. But if I am using a code library someone else developed that is important, inasmuch as that if someone wants to do a similar project they could use the same library.

In terms of discussing relevant results in prior studies, again the issue is the boundary of what is relevant is very difficult to articulate. If there is a relevant meta-analysis on a topic, it seems sufficient to me to simply state the results of the meta-analysis. Why do I think that is important though? It helps inform your priors about the current study. So if you say a meta-analysis effect size is X, and the current study has an effect size much larger, it may give you pause. It is also relevant if you are generalizing from the results of the study, it is just another piece of evidence in addition to the meta-analysis, not an island all by itself.

I am not saying discussing prior specific results are not needed entirely, but they do not need to be extensive. So if studies Z, Y, X are similar to yours but all had null results, and you think it was because the sample sizes were too small, that is relevant and useful information. (Again it changes your priors.) But it does not need to be belabored on in detail. The current standard of articulating different theoretical aspects ad-nauseum in Crim/CJ journals does not improve the quality of manuscripts. If you do a hot spots policing experiment, you do not need to review all the different minutia of general deterrence theory. Simply saying this experiment is likely to only accomplish general deterrence, not specific deterrence, seems sufficient to me personally.

When you propose a book you need to say ‘here are some relevant examples’ – I think the same idea would be sufficient for a lit review. OK here is my study, here are a few additional studies I think the reader may be interested in that are related. This accomplishes what contemporary lit reviews do in a much more efficient manner – citing more articles makes it much more difficult to pull out the really relevant related work. So admit this does not improve the quality of the current manuscript in a specific way, but helps the reader identify other sources of interest. (I as a reader typically go through the citation list and note a few articles I am interested in, this helps me accomplish that task much quicker.)

I’ve already sprinkled a few additional pieces of advice in this blog post (marginal effect estimates, pre-analysis plans, sharing data code), although you may say they don’t belong in the lit review. Whatever, those are things that actually improve either the content of the manuscript or the actual integrity of the research, not some spray paint on your flowers.

Relevant Other Work

Amending the WDD test to incorporate Harm Weights

So I received a question the other day about amending my and Jerry Ratcliffe’s Weighted Displacement Difference (WDD) test to incorporate crime harms (Wheeler & Ratcliffe, 2018). This is a great idea, but unfortunately it takes a small bit of extra work compared to the original (from the analysts perspective). I cannot make it as simple as just piping in the pre-post crime weights into that previous spreadsheet I shared. The reason is a reduction of 10 crimes with a weight of 10 has a different variance than a reduction of 25 crimes with a weight of 4, even though both have the same total crime harm reduction (10*10 = 4*25).

I will walk through some simple spreadsheet calculations though (in Excel) so you can roll this on your own. HERE IS THE SPREADSHEET TO DOWNLOAD TO FOLLOW ALONG. What you need to do is to calculate the traditional WDD for each individual crime type in your sample, and then combine all those weighted WDD’s estimates in the end to figure out your crime harm weighted estimate in the end (with confidence intervals around that estimated effect).

Here is an example I take from data from Worrall & Wheeler (2019) (I use this in my undergrad crime analysis class, Lab 6). This is just data from one of the PFA areas and a control TAAG area I chose by hand.

So first, go through the motions for your individual crimes in calculating the point estimate for the WDD, and then also see the standard error of that estimate. Here is an example of piping in the data for thefts of motor vehicles. The WDD is simple, just pre-post crime counts. Since I don’t have a displacement area in this example, I set those cells to 0. Note that the way I calculate this, a negative number is a good thing, it means crime went down relative to the control areas.

Then you want to place those point estimates and standard errors in a new table, and in those same rows assign your arbitrary weight. Here I use weights taken from Ratcliffe (2015), but these weights can be anything. See examples in Wheeler & Reuter (2020) for using police cost of crime estimates, and Wolfgang et al. (2006) for using surveys on public perceptions of severity. Many of the different indices though use sentencing data to derive the weights. (You could even use negative weights and the calculations here all work, say you had some positive data on community interactions.)

Now we have all we need to calculate the harm-weighted WDD test. The big thing here to note is that the variance of Var(x*harm_weight) = Var(x)*harm_weight^2. So that allows me to use all the same machinery as the original WDD paper to combine all the weights in the end. So now you just need to add a few additional columns to your spreadsheet. The point estimate for the harm reduction is simply the weight multiplied by the point estimate for the crime reduction. The variance though you need to square the standard error, and square the weight, and then multiply those squared results together.

Once that is done, you can pool the harm weighted stats together, see the calculations below the table. Then you can use all the same normal distribution stuff from your intro stats class to calculate z-scores, p-values, and confidence intervals. Here are what the results look like for this particular example.

I think this is actually a really good idea to pool results together. Many place based police interventions are general, in that you might expect them to reduce multiple crime types. Various harm scores are a good way to pool the results, instead of doing many individual tests. A few caveats though, I have not done simulations like I did in the WDD peer reviewed paper, I believe these normal approximations will do OK under the same circumstances though that we suggest it is reasonable to do the WDD test. You should not do the WDD test if you only have a handful of crimes in each area (under 5 in any cell in that original table is a good signal it is too few of crimes).

These crime count recommendations I think are likely to work as well for weighted crime harm. So even if you give murder a really high weight, if you have fewer than 5 murders in any of those original cells, I do not think you should incorporate it into the analysis. The large harm weight and the small numbers do not cancel each other out! (They just make the normal approximation I use likely not very good.) In that case I would say only incorporate individual crimes that you are OK with doing the WDD analysis to begin with on their own, and then pool those results together.

Sometime I need to grab the results of the hot spots meta-analysis by Braga and company and redo the analysis using this WDD estimate. I think the recent paper by Braga and Weisburd (2020) is right, that modeling the IRR directly makes more sense (I use the IRR to do cost-benefit analysis estimates, not Cohen’s D). But even that is one step removed, so say you have two incident-rate-ratios (IRRs), 0.8 and 0.5, the latter is bigger right? Well, if the 0.8 study had a baseline of 100 crimes, that means the reduction is 100 - 0.8*100 = 20, but if the 0.5 study had a baseline of 30 crimes, that would mean a reduction of 30 - 0.5*30 = 15, so in terms of total crimes is a smaller effect. The WDD test intentionally focuses on crime counts, so is an estimate of the actual number of crimes reduced. Then you can weight those actual crime decreases how you want to. I think worrying about the IRR could even be one step too far removed.

References

Open Source Criminology Related Network Datasets

So I am a big proponent of open source data analysis. There is a problem with using criminal justice data sources though – they often have private information that prevents us from sharing the data. For example, I have posted quite a few of my projects here (mostly spatial data analysis), but there are a few I cannot share. For example, I worked on a paper with chronic offender predictions, and I cannot share that data (Wheeler et al., 2019). The outcome, being a victim or perpetrator of gun violence, is so rare that by itself basically makes it impossible to publicly share the data without exposing the individuals under study.

One good resource all criminologists should be aware of is ICPSR, in particular NACJD. Many datasets on there though anymore are restricted, in that you need to get IRB permission and ICPSR permision to download the dataset to use. (Which typically takes like 2~3 months in my experience doing it a few times, which includes both your local Uni IRB and the ICPSR process.) For example here is one I went through the motions to get to (in the end) validate different survival prediction methods.

ICPSR is a great resource to be able to handle sharing potentially sensitive data. But this falls short in two areas. One is in teaching – you cannot go through the IRB ritual in a timely enough fashion to be able to use those datasets in a course environment. The other is in terms of methods, so for example if you wanted to say your model provides better predictions than some other model, they should be established on the same datasets. Current state of affairs in criminology in this regard is pretty bad to be curt – most everybody uses their own data they have access to. So much of the research on different risk assessment instruments for bail/probation/parole are pretty much impossible to say one is better than another.

One example type of data source that is almost entirely missing from NACJD (that I am aware of) is social network datasets relevant for criminology/criminal justice. So I have started a spreadsheet to collate different open source network datasets relevant for criminologists. So I have some from my work and a few other random examples I have come across on the internet.

SPREADSHEET OF NETWORK DATASETS

I have made that spreadsheet open, so anyone should be able to edit in more sources. (Feel free to include links to ICPSR as well, but if you do edit a note to say whether it is restricted access or not.) For here I would be interested in really large networks, for example would love to try to replicate Marie’s work on gang network transitions (Oullet et al., 2019a).

And also while I am here, Jacob Young has created a very nice introductory course to social network analysis. I have a brief lecture in my advanced research design class, but Jacob’s is much more thorough (and he is more of an expert in this area than I am for sure).

I will add to that spreadsheet over time as well. I have made a separate sheet for survival analysis datasets. I would be particularly keen for example criminal justice examples. So for network analysis we have examples of looking at use-of-force networks (Oullet et al., 2019b), and for survival analysis I would be interested in a time to solve example dataset. Unfortunately for the solved cases, NIBRS is a good resource but has a large confound in they don’t measure whether a case was ever assigned to a detective.

Feel free to add whatever in that spreadsheet, but what I was thinking was oriented towards different methods (again as a main motivation is for teaching). So for example if you knew of datasets for age-period-cohort modelling, or for estimating group-based-trajectory models, I think those would be good examples to start new sheets and collate different data sources.

References

  • Ouellet, M., Bouchard, M., & Charette, Y. (2019a). One gang dies, another gains? The network dynamics of criminal group persistence. Criminology, 57(1), 5-33.
  • Ouellet, M., Hashimi, S., Gravel, J., & Papachristos, A. V. (2019b). Network exposure and excessive use of force: Investigating the social transmission of police misconduct. Criminology & Public Policy, 18(3), 675-704.
  • Wheeler, A. P., Worden, R. E., & Silver, J. R. (2019). The accuracy of the violent offender identification directive tool to predict future gun violence. Criminal Justice and Behavior, 46(5), 770-788.

CrimRxiv, Alt-Journal Contributions, and Mike Maltz’s Retrospective

As I’m sure followers of mine know, I am a big proponent of posting pre-prints. Spearheaded by Scott Jacques, he has started a specifically criminology focused pre-print server title CrimRxiv. It is still in beta but anyone can contribute a paper if they want.

One of the things me and Scott have been jamming about is how to leverage crimrxiv to make a journal that not only takes advantage of all the goodies on the internet, such as being able to embed interactive graphics or other rich media directly in a journal articles. But to really widen the scope of what ‘counts’ in terms of scholarly contribution. Why can’t things like a cool app, or a really good video lecture you edited, or a blog post illustrating code be put on the same level with journal articles?

Part of the reason I am writing this blog post is that I saw Michael Maltz recently publish a retrospective on his career on Academia.edu. This isn’t a typical journal article, but despite that there is no reason why you shouldn’t share such pieces. So I was able to convince Mike to post A Retrospective Look at My Professional Life to crimrxiv. When he first posted it on Academia.edu here was my response on how Mike (despite never having crossed paths) has influenced my career.


Hi Michael and thank you for sharing,

I’ve followed your work since a grad student at Albany. I initially got hooked on data viz based on Tufte’s book. When I looked for examples of criminologists discussing data viz you were the only one I found. That was sometime around 2010, so you had that chapter in the handbook of quantitative crim. You also had another article about drawing glyphs to illustrate life course transitions I was familiar with.

When I finished my classes at SUNY, I then worked at Troy as a crime analyst while finishing my dissertation. I doubt any of the coffee shops were the same from your time, but I did like walking over to Famous hotdogs for lunch every now and then.

Most of my work at the PD was making time series graphs and maps. No regression, so most of my stats training was not particularly useful. Even my mapping course I took focused on areal data analysis was not terribly relevant.

I tried to do similar projects to your glyph life-courses with interval censored crime data, but I was never really successful with that, they always ended up being too complicated with even moderately large crime datasets, see https://andrewpwheeler.com/2013/02/28/interval-graph-for-viz-temporal-overlap-in-crime-events/ and https://andrewpwheeler.com/2014/10/02/stacking-intervals/ for my attempts.

What was much more helpful was simply doing monitoring metrics over time, simple running means, and then I just inverted the PDF of the Poisson to give error bars, e.g. https://andrewpwheeler.com/2016/06/23/weekly-and-monthly-graphs-for-monitoring-crime-patterns-spss/. Then cases that were outside the error bands signified an anomalous pattern. In Troy there was an arrest of a single prolific person breaking into cars, and the trend went from a creeping 10 year high to a 10 year low instantly in those graphs.

So there again we have your work on the Poisson distribution and operations research in that JQC article. Also sometime in there I saw a comment you made on Andrew Gelman’s blog pointing to your work with error bands for BJS. Took that ‘fan chart’ idea later on and provided error bands for city level and USA level homicide trends, e.g. https://apwheele.github.io/MathPosts/FanChart_NewOrleans.html. Most of popular discussion of large scale crime trends is misguided over-interpreting short term noise in my opinion.

So all my degrees are in criminal justice, but I have been focusing more on linear programming over time borrowing from operations researchers as well, https://andrewpwheeler.com/2020/05/29/an-intro-to-linear-programming-for-criminologists/. I’ve found that taking outputs from a predictive model and then applying a decision analysis to specifically articulate strategies CJ agencies should take is much more fruitful than the typical way academic research is done.

Thank you again for sharing your story and best, Andy Wheeler

New paper out: Trauma Center Drive Time Distances and Fatal Outcomes among Gunshot Wound Victims

A recent paper with Gio Circo, Trauma Center Drive Time Distances and Fatal Outcomes among Gunshot Wound Victims, was published in Applied Spatial Analysis and Policy. In this work, me and Gio estimate the marginal effect that drive time distances to the nearest Level 1 trauma center have on the probability a victim dies of a gun shot wound, using open Philadelphia data.

If you do not have access to that published version, here is a pre-print version. (And you can always email me or Gio and ask for a copy.) Also because we use open data, we have posted the data and code used for the analysis. (Gio did most of the work!)

For a bit of the background on the project, Gio had another paper estimating a similar model using Detroit data. But Gio estimated those models with aggregate data. I was familiar with more detailed Philly shooting data, as I used it for an example hot spot cluster map in my GIS crime mapping class.

There are two benefits to leveraging micro data instead of the aggregated data. One is that you can incorporate micro level incident characteristics into the model. The other is that you can get the exact XY coordinates where the incident occurred. And using those exact coordinates we calculate drive time distances to the hospital, which offer a slight benefit in terms of leave-one-out cross-validated accuracy compared to Euclidean distances.

So in terms of incident level characteristics, the biggest factor in determining your probability of death is not the distance to the nearest hospital, but where you physically get shot on your body. Here is a marginal effect plot from our models, showing how the joint effect of injury location (as different colors) and the drive time distance impact the probability of death. So if you get shot in the head vs the torso, you have around a 30% jump in the probability of death from that gun shot wound. Or if you get shot in an extremity you have a very low probability of death as well.

But you can see from that the margins for drive times are not negligible. So if you are nearby a hospital and shot in the torso your probability of dying is around 20%, whereas if you are 30 minutes away your probability rises to around 30%. You can then use this to map out isochrone type survivability estimates over the city. This example map is if you get shot in the torso, and the probability of death based on the drive time distance to the nearest Level 1 trauma location.

Fortunately many shootings do not occur in the northern most parts of Philadelphia, here is a map of the number of shootings over the city for our sample.

You can subsequently use these models to either do hypothetical take a trauma center away or add a trauma center. So given the density of shootings and drive time distances, it might make sense for Philly to invest in a trauma center in the shooting hot spot in the Kensington area (northeast of Temple). (You could technically figure out an ‘optimal’ location given the distribution of shootings, but since you can’t just plop down a hospital wherever it would make more sense to do hypothetical investments in current hospitals.)

For a simplified example, imagine you had 100 shootings in the torso that were an average 20 minutes away. The average probability of death in that case is around 25% (so ~25 homicides). If you hypothetically have a location that is only 5 minutes away, the probability goes down to more like 20% (so ~20 homicides). So in that hypothetical, the distance margin would have prevented 5 deaths.

One future piece of research I would be interested in examining is pre-post Shotspotter. So in that article Jen Doleac is right in that the emipirical evidence for Shotspotter reducing shootings is pretty flimsy, but preventing mortality by getting to the scene faster may be one mechanism that ShotSpotter can justify its cost.

Using Association Rules to Conduct Conjunctive Analysis

I’ve suggested to folks a few times in the past that a popular analysis in CJ, called conjunctive analysis (Drawve et al., 2019; Miethe et al., 2008; Hart & Miethe, 2015), could be automated in a fashion using a popular machine learning technique called association rules. So I figured a blog post illustrating it would be good.

I was motivated by some recent work by Nix et al. (2019) examining officer involved injuries in NIBRS data. So I will be doing a relevant analysis (although not as detailed as Justin’s) to illustrate the technique.

This ended up being quite a bit of work. NIBRS is complicated, and I had to do some rewrites of finding frequent itemsets to not run out of memory. I’ve posted the python code on GitHub here. So this blog post will be just a bit of a nicer walkthrough. I also have a book chapter illustrating geospatial association rules in SPSS (Wheeler, 2017).

A Brief Description of Conjunctive Analysis

Conjunctive analysis is more of an exploratory technique examining high cardinality categorical sets. Or in other words, you search though a database of cases that have many categories to find “interesting” patterns. It is probably easier to see an example than for me to describe it. Here is an example from Miethe et al. (2008):

You can see that here they are looking at characteristics of drug offenders, and then trying to identify particular sets of characteristics that influence the probability of a prison sentence. So this is easy to do in one dimension, it gets very difficult in multiple dimensions though.

Association rules were created for a very different type of problem – identifying common sets of items that shoppers buy together at the same time. But you can borrow that work to aid in conducting conjunctive analysis.

Data Prep for NIBRS

So here I am using 2012 NIBRS data to conduct analysis. Like I mentioned, I was motivated by the Nix and company paper examining officer injuries. They were interested in specifically examining officer involved injuries, and whether the perception that domestic violence cases were more dangerous for officers was justified.

For brevity I only ended up examining five different variable sets in NIBRS (Justin has quite a few more in his paper):

  • assault (or injury) type V4023
  • victim/off relationship V4032
  • ucr type V2006
  • drug use V2009 (also includes computer use!)
  • weapon V2017

All of these variables have three different item sets in the NIBRS codes, and many categories. You will have to dig into the python code, 00_AssocRules.py in the GitHub page to see how I recoded these variables.

Also maybe of interest I have some functions to do one-hot encoding of wide data. So a benefit of NIBRS is that you can have multiple crimes in one incident. So e.g. you can have one incident in which an assault and a burglary occurs. I do the analysis in a way that if you have common co-crimes they would pop out.

Don’t take this as very formal though. Justin’s paper which used 2016 NIBRS data only had 1 million observations, whereas here I have over 5 million (so somewhere along the way me and Justin are using different units of analysis). Also Justin’s incorporates dozens of other different variables into the analysis I don’t here.

It ends up being that with just these four variables (and the reduced sets of codes I created), there still end up being 34 different categories in the data.

Analysis of Frequent Item Sets

The first part of conjunctive analysis (or association rules) is to identify common item sets. So the work of Hart/Miethe is always pretty vague about how you do this. Association rules has the simple approach that you find any item sets, categories in which a particular itemset meets an arbitrary threshold.

So the way you represent the data is exactly how the prior Miethe et al. (2008) data showed, you create a series of dummy 0/1 variables. Then in association rules you look for sets in which for different cases all of the dummy variables take the value of 1.

The code 01_AssocRules.py on GitHub shows this going from the already created dummy variable data. I ended up writing my own function to do this, as I kept getting out of memory errors using the mlextend library. (I don’t know if this is due to my data is large N but smaller number of columns.) You can see my freq_sets function to do this.

Typically in association rules you identify item sets that meet a particular support threshold. Support here just means the proportion of cases that those items co-occur. E.g. if 20% of cases of assault also have a weapon of fists listed. Instead though I wrote the code to have a minimum N, which I choose here to be 1000 cases. (So out of 5 million cases, this is a support of 1/5000.)

I end up finding a total of 411 frequent item sets in the data that have at least 1000 cases (out of the over 5 million). Here are a few examples, with the frequencies to the left. So there are over 2000 cases in the 2012 NIBRS data that had a known relationship between victim/offender, resulted in assault, the weapon used was fists (or kicking), and involved computer use in some way. I only end up finding two itemsets that have 5 categories and that is it, there are no higher sets of categories that have at least 1000 cases in this dataset.

3509    {'rel_Known', 'ucr_Assault', 'weap_Fists', 'ucr_Drug'}
2660    {'rel_Known', 'ucr_Assault', 'weap_Firearm', 'ucr_WeaponViol'}
2321    {'rel_Known', 'ucr_Assault', 'weap_Fists', 'drug_ComputerUse'}
1132    {'rel_Known', 'ucr_Assault', 'weap_Fists', 'weap_Knife'}
1127    {'ucr_Assault', 'weap_Firearm', 'weap_Fists', 'ucr_WeaponViol'}
1332    {'rel_Known', 'ass_Argument', 'rel_Family', 'ucr_Assault', 'weap_Fists'}
1416    {'rel_Known', 'rel_Family', 'ucr_Assault', 'weap_Fists', 'ucr_Vandalism'}

Like I said I was interested in using NIBRS because of the Nix example. One way we can then examine what variables are potentially related to officer involved injuries during a commission of a crime would be to just pull out any itemsets which include the variable of interest, here ass_LEO_Assault.

4039    {'ass_LEO_Assault'}
1232    {'rel_Known', 'ass_LEO_Assault'}
4029    {'ucr_Assault', 'ass_LEO_Assault'}
1856    {'ass_LEO_Assault', 'weap_Fists'}
1231    {'rel_Known', 'ucr_Assault', 'ass_LEO_Assault'}
1856    {'ucr_Assault', 'ass_LEO_Assault', 'weap_Fists'}

So we see there are a total of just over 4000 officer assaults in the dataset. Unsurprisingly almost all of these also had an UCR offense of assault listed (4029 out of 4039).

Analysis of Association Rules

Sometimes just identifying the common item sets is what is of main interest in conjunctive analysis (see Hart & Miethe, 2015 for an example of examining the geographic characteristics of crime events).

But the apriori algorithm is one way to find particular rules that are of the form if A occurs then B occurs quite often, but swap out more complicated itemsets in the antecedent (A) and consequent (B) in the prior statement, and different ways of quantifying ‘quite often’.

I prefer conditional probability notation to the more typical association rule one, but for typical rules we have (here I use A for antecedent and B for consequent):

  • confidence: P(A & B) / P(B). So if the itemset of just B occurs 20% of the time, and the itemset of A and B together occurs 10% of the time, the confidence would be 50%. (Or more simply the probability of B conditional on A, P(B | A)).
  • lift: confidence(A,B) / P(B). This is a ratio of the baseline a category occurs for the denominator, and the numerator is the prior confidence category. So if you have a baseline B occurring 25% of the time, and the confidence of A & B is 50%, you would then have a lift of 2.

There are other rules as well that folks use, but those are the most common two I am interested in.

So for example in this data if I draw out rules that have a lift of over 2, I find rules like {'ucr_Vandalism', 'rel_Family'} -> {'ass_Argument'} produces a lift of over 6. (I can use the mlextend implementation here in this code, it was only the frequent itemsets code that was giving me problems.) So it ends up being arguments are listed in the injury codes around 1.6% of the time, but when you have a ucr crime of vandalism, and the relationship between victim/offender are family members, injury type of argument happens around 10.5% of the time (so 10.5/1.6 ~= 6).

The original use case for this is recommender systems/market analysis (so say if you see someone buy A, give them a coupon for B). So this ends up being not so interesting in this NIBRS example when you have you have more clear cause-effect type relationships criminologists would be interested in. But I describe in the next section some further potential machine learning models that may be more relevant, or how I might in the future amend the apriori algorithm for examining specific outcomes.

Further Notes

If you have a particular outcome you are interested in a specific outcome from the get go (so not so much totally exploratory analysis as here), there are a few different options that may make more sense than association rules.

One is the RuleFit algorithm, which basically just uses a regularized regression to find simple models and low order interactions. An example of this idea using police stop data is in Goel et al. (2016). These are very similar in the end to simple decision trees, you can also have continuous covariates in the analysis and it splits them into binary above/below rules. So you could say do RTM distance analysis, and still have it output a rule if < 1000 ft predict high risk. But they are fit in a way that tend to behave better out of sample than doing simple decision trees.

Another is fitting a more complicated model, say random forests, and then having reduced form summaries to describe those models. I have some examples of using shapely values for spatial crime prediction in Wheeler & Steenbeek (2020), but for a more if-then type sets of rules you could look at Scoped Rules.

I may need to dig into the association rules code some more though, and try to update the code to take the sample sizes and statistical significance into account for a particular outcome variable. So if you find higher lift in a four set predicting a particular outcome, you search the tree for more sets with a smaller support in the distribution. (I should probably also work on some cool network viz. to look at all the different rules.)

References

 

300 blog posts and public good criminology

This isn’t technically my 300th blog post, but the 300th page I’ve constructed on my blog (so e.g. it includes when I’ve made a page for a class). I’ve posted a spreadsheet of the titles and dates of the posts over time (and updating it I noticed I was at 300).

I typically get around 200~300 views per day. Most of these are probably bots, but unless say over 90% are bots this website gets way more views than the cumulative views of all my academic papers combined. Here is a screen shot of the stats wordpress gives to me. My downtick in 2019 I thought was going to spiral into very few views, but it is still holding on.

I kind of have three different types of blog posts. One are example code snippets/data analysis. Often these are things I have done multiple times, so I want to create a record for me to more easily search up later. For example making a hexbin map in ggplot, or a margins plot in Stata. I wrote a recent post because I was talking with a friend about crime weights, and I wanted an example of using regression in python and an error bar plot for my library. (Quite a few birds with that stone.)

Two are questions I repeatedly encounter by students. For example, I made a list of demographic variables I use in the census, and where to find or scrape crime generator variables. Consistently my most popular post is testing the equality of two regression coefficients.

The third are just more generic opinion pieces. For example my notes on (the now late) David Bayley’s writing on the police potential to reduce crime, or Jane Jacob’s take on neighborhoods, or that I don’t think latent trajectories are real things.

Some are multiple of these categories put together, particularly opinion pieces with example code snippets to illustrate the points I am making. Like a simulation of why I like to model individual delinquency items, or how to balance false positives in bail decisions.

On Public Good Criminology

None of these per se fit in the example framework of typical peer review output. So despite no peer review, I think things like deriving optimal treatment allocation with network spillovers, or that conformal predictions intervals for synthetic control estimates are much smaller than permutation tests are a substantive contribution to share!

So that brings me to the public good point. Most criminologists have a default of only valuing a closed peer review system. Despite my blog posts not being peer reviewed (ditto for the pre-prints I post at first), I hope folks can take the time to judge for themselves whether they are valuable or not. We would be much better off as a group if we did things like share code, share class preps, or failed projects by default.

Some of these posts I might write up if we had a short journal for our field akin to Economics Letters, but even that is a lot of work for very little value added to be frank. (If I had infinite time I also might turn my notes on Poisson/Negative Binomial regression into a little Sage green book.) Being a private sector data scientist now without the tenure boot on my neck, I don’t really have any need or desire to go through that process.

If all you value are getting the opinions of a handful of other academics than by all means keep your work close to the chest and only publish in peer reviewed journals. If you want to provide a public good though, your work actually needs to be public.

Conjoint Analysis of Crime Rankings

So part of my recent research mapping crime harm spots uses cost of crime estimates relevant to police departments (Wheeler & Reuter, 2020). But a limitation of this is that cost of crime estimates are always somewhat arbitrary.

For a simple example, those cost estimates are based mostly on people time by the PD to respond to crimes and devote investigative resources. Many big city PDs entirely triage crimes like breaking into vehicles though. So based on PD response the cost of those crimes are basically $0 (especially if PDs have an online reporting system).

But I don’t think the public would agree with that sentiment! So in an act of cognitive dissonance with my prior post, I think asking the public is likely necessary for police to be able to ultimately serve the publics interest when doing valuations. For some ethical trade-offs (like targeting hot spots vs increasing disproportionate minority contact, Wheeler, 2019) I am not sure there is any other reasonable approach than simply getting a bunch of peoples opinions.

But that being said, I suspected that these different metrics would provide pretty similar rankings for crime severity overall. So while it is criminology 101 that official crime and normative perceptions of deviance are not a perfect 1 to 1 mapping, most folks (across time and space) have largely similar agreement on the severity of different crimes, e.g. that assault is worse than theft.

So what I did was grab some survey ranking of crime data from the original source of crime ranking that I know of, Marvin Wolfgang’s supplement to the national crime victimization survey (Wolfgang et al., 2006). I have placed all the code in this github folder to replicate. And in particular check out this Jupyter notebook with the main analysis.

Conjoint Analysis of Crime Ranks

This analysis is often referred to as conjoint analysis. There are a bunch of different ways to conduct conjoint analysis – some ask folks to create a ranked list of items, others ask folks to choose between a list of a few items, and others ask folks to rank problems on a Likert item 1-5 scale. I would maybe guess Likert items are the most common in our field, see for example Spelman (2004) using surveys of asking people about disorder problems (and that data is available to, Taylor, 2008).

The Wolfgang survey I use here is crazy complicated, see the codebook, but in a nutshell they had an anchoring question where they assigned stealing a bike to a value of 10, and then asked folks to give a numeric score relative to that theft for a series of 24 other crime questions. Here I only analyze one version of the questionnaire, and after eliminating missing data there are still over 4,000 responses (in 1977!).

So you could do analyze those metric scores directly, but I am doing the lazy route and just doing a rank ordering (where ties are the average rank) within person. Then conjoint analysis is simply a regression predicting the rank. See the notebook for a more detailed walkthrough, so this just produces the same analysis as looking at the means of the ranks.

About the only thing I do different here than typical conjoint analysis is that I rescale the frequency weights (just changes the degrees of freedom for standard error estimates) to account for the repeated nature of the observations (e.g. I treat it like a sample of 4000 some observations, not 4000*25 observations). (I don’t worry about the survey weights here.)

To test my assertion of whether these different ranking systems will be largely in agreement, I take Jerry’s crime harm paper (Ratcliffe, 2015), which is based on sentencing guidelines, and map them as best I could to the Wolfgang questions (you could argue with me some though on those assements – and some questions don’t have any analog, like a company dumping waste). I rescaled the Wolfgang rankings to be in a range of 1-14, same as Jerry’s, instead of 1-25.

Doing a more deep dive into the Wolfgang questions, there are definately different levels in the nature of the questions you can tease out. Folks clearly take into account both harm to the victim and total damages/theft amounts. But overall the two systems are fairly correlated. So if an analyst wants to make crime harm spots now, I think it is reasonable to use one of these ranking systems, and then worry about getting the public perspective later on down the line.

The Wolfgang survey is really incredible. In this regression framework you can either adjust for other characteristics (e.g. it asks about all the usual demographics) or look at interactions (do folks who were recently victimized up their scores). So this is really just scratching the surface. I imagine if someone redid it with current data many of the metrics would be similar as well, although if I needed to do this I don’t think I would devise something as complicated as this, and would ask people to rank a smaller set of items directly.

References

  • Ratcliffe, J.H. (2015). Towards an index for harm-focused policing. Policing: A Journal of Policy and Practice, 9(2), 164-182.
  • Spelman, W. (2004). Optimal targeting of incivility-reduction strategies. Journal of Quantitative Criminology, 20(1), 63-88.
  • Taylor, R.B. (2008). Impacts of Specific Incivilities on Responses to Crime and Local Commitment, 1979-1994: [Atlanta, Baltimore, Chicago, Minneapolis-St. Paul, and Seattle]. https://doi.org/10.3886/ICPSR02520.v1
  • Wheeler, A.P., & Reuter, S. (2020). Redrawing hot spots of crime in Dallas, Texas. https://doi.org/10.31235/osf.io/nmq8r
  • Wheeler, A.P. (2019). Allocating police resources while limiting racial inequality. Justice Quarterly, Online First.
  • Wolfgang, M.E., Figlio, R.M., Tracy, P.E., and Singer, S.I. (2006). National Crime Surveys: Index of Crime Severity, 1977. https://doi.org/10.3886/ICPSR08295.v1