Admin data should be used more often in policing research

I sometimes wonder if many researchers do not know actually what data police departments regularly collect. I commonly see articles on topics and think to myself “Hey, that is nice you did a survey on XYZ, why did you not confirm the responses with actual admin data on the same topic?”. Or I see topics that can be reasonably addressed using admin data not tackled at all by researchers.

So I decided to write this blog post.

I’ve mostly to date made a career out of analyzing administrative police data (only 2 out of my 30 some peer reviewed papers at this point are using non-regularly collected data as part of the analysis – and both of those link surveys to official crime records). To be honest I’m also motivated to write this as it is common for senior academics (in general in criminology, not just specific to policing researchers) to critique secondary data analysis (some of those folks are curmudgeons though, so maybe not worth stating). Of course you can do bad analysis with whatever data – primary or secondary makes no difference.

I think the default though should be to leverage admin data, so this sentiment I believe is in general misguided, and results in a lot of waste (time and money spent on primary data collection). I have never received research funding directly in my career (only as an RA for Rob Worden), so my work has essentially been for “free” on these projects (just my time). (I was basically subsidized by the university to do research!)

My opinion is based on two key points:

  1. Administrative data has already been collected by police agencies, so it has no additional costs for use by researchers.
  2. Administrative data defines core outcomes to which police agencies strive to reduce.

For 2 in particular this is reducing reported crime and reducing use of force. (Use of force can be conceived of as an “output” instead of an “outcome”, but I tend to think of it as a negative externality that should be minimized to the extent possible.) I’m sure a few folks are thinking here “these don’t define the potential universe of outcomes police departments are interested in” and I agree – permit me to discuss this in more detail in a few paragraphs. The argument I am making is ultimately fuzzy – not that we shouldn’t collect other data, but it should meet a higher threshold than using zero-cost data already collected by PDs.

What is Admin Policing Data?

For folks not familiar, police departments keep electronic records of various things, mostly related to crime and interactions with the public. All police departments I have worked with have these types of records in various tables/databases:

  • calls to 911 (Computer Automated Dispatch)
  • reported crimes and incidents
  • charges & arrests
  • discretionary stops (traffic and pedestrian)
  • use of force

All of these tables you can link to individual officers and/or individual citizens, as well as have a date-time and location stamp of where it happened. So you can do things like see all the cases detective X has been assigned and his specific clearance rate, or all cases in which Y was listed as a victim, or see the stop/use-of-force patterns of officer Z over time, etc.

Other types of admin data that are pretty regular are pysch screenings (especially for newer officers), civilian complaints, plain text detective/case notes, gang related databases (people/tags/incidents), databases of reported/recovered stolen goods, etc. Police collect alot of data! At this point PDs often have this data going back over a decade.

How often is Admin Policing Data Used in Policing Journal Articles?

To illustrate my point about admin data should be used more in policing research, I took the most recent issues of several policing journals and counted up the articles that used admin data. (There are probably more policing journals I missed, sorry, these are the ones I know of/have submitted articles to in the past.)

So this is a total of 14/50 ~28% in this sample. This is actually higher than I expected (I guessed 10%). Looking at the first issue of Police Quarterly for 2020 it is 0/5. The Policing Policy and Practice issue also contained a special sub-issue on recruit training, among them 0/6 likely contained administrative data. The Policing an International journal first issue of 2020 had a special issue on cyber crime, which appears to me have 2/14 papers using admin data. So if I add those stats, it is 16/75 ~ 21%.

I may be undercounting admin data here; for example I assume a survey of recruits is not a regular data collection (it hasn’t been in any police agency I’ve been involved with), but I of course may be wrong.

I’ve included as admin data looking at detective case notes (it is sort of like secondary analysis of a qualitative dataset!). Also counted as admin data one article that used the NCVS – which is regularly collected data (but by the federal govt, not local PD).

So you may squabble with my definitions here, but in broad strokes I don’t think any reasonable definition is likely to push this above ~1/3 papers in policing research use regularly collected admin data (in this sample of policing journals).

For reference I did a Twitter poll asking what proportion of policing research folks thought used admin data, and the distribution of the 86 responses was a slight favor for the right category (under 1/3rd, but almost the same amount guessed over 2/3’s).

So you can see a significant number of folks think that the distribution is opposite what it is in practice – the majority, not the minority, of policing research uses specially collected data and ignores admin data.

Restricting the subset to policing journals is likely to bias the estimate downward somewhat. I bet if I pulled policing articles from say Journal of Experimental Crim or Crime Science they are closer to 100% using admin policing data. But I think that also illustrates a pretty big discord in the current field of policing as well.

Some may think this cuts the research in terms of criminology/criminal justice – policing journals publish work on examining police behavior, whereas other journals tend to more frequently look at crime outcomes more associated with “criminological” research. This may be true, but admin data collected by police departments are pretty relevant for examining police behavior (e.g. proactive stops, use of force). These admin measures are almost always more relevant to police behavior than surveys of opinions! If you do surveys you should often tie it to these other admin measures to provide secondary evidence of different relevant measures.

Whats Wrong with Collecting New Data?

My argument is explicitly value-laden – I don’t know the correct percent of policing research that should use admin police data. But I do think the current swing in which the clear majority of research is oriented to collect primary data is wrong. Those primary data collections have both more costs (above data already collected by police agencies) and, for the most part, ignore core outcomes to which PDs strive for.

For example, the National Institute of Justice has stated they want researchers to move away from admin data. One reason for this is that past researchers have been unsuccessful lowering crime, and so you should collect alternative measures to validate your intervention.

This I believe is an actively harmful perspective called “goal switching,” and in general makes little sense. If crime is so rare a study is ultimately poorly powered, there isn’t much potential benefit to reducing crime in that area even if the intervention does work in practice. Best case you need to do longer interventions. I mean if you want to reduce violent crime you can look at community sentiment if you want; it doesn’t make sense though to entirely drop the ultimate goal of violence reduction in its place though!

And this gets to the crux of core outcomes police should strive for. It is a normative question, but I believe reduced crime and reduced use of force are relatively well agreed upon general goals of police. I think it is OK to have secondary measures – such as say attitudes towards police or fear of crime or measures of police stress. But these measures have several things working against them.

One, they are not regularly collected as administrative datasets. I imagine you can troll up a few examples of PDs who have started to do regular surveys of attitudes towards police (either general public or specific post-PD contact), but vast majority have not. So say you have an intervention intended to improve attitudes towards police. Great! For a police department interested in implementing that program, they not only have to allocate resources to that project, but also put an item in the budget to do the surveys forever. (This isn’t always true though, I think for example Rylan Simpson’s work is strong enough to justify making those low cost appearance changes and you don’t need to forever do surveys to see if it is working.) But for most interventions you can’t just do it once and hope it has improved indefinitely! (Same as you can’t stop measuring crime just because something you did made crime go down one time.)

Two, they are pretty fuzzy as to whether they should be reasonably swapped out for goals of crime reduction and reduced use of force in-and-of themselves. For sake of argument say hot spots policing causes back fire effects that cause increased fear of crime. How exactly do you trade off fear of crime vs actual crime reduction? Personally I think actual crime reductions should take precedence in that scenario. If you want to justify actually measuring fear of crime, you need to make some value based arguments to justify at minimum the cost of doing surveys. You should also probably justify altering police behavior in a particular way to improve that particular metric as well.

So any time you do a secondary data collection, you need to actually valuate the costs of the measures somehow (which I know is very difficult, hence it makes more sense to default to using admin data that is costless in terms of research!) Costless is probably a bit of a misnomer though – police departments have already sunk a lot of resources into collecting that admin data (patrol officers likely spend about equal time on dealing with people as they do with paperwork). But it is costless in terms of capital for me to query a database and say “use of force went down 10% after you instituted this policy”.

I think plenty of research collecting unique measures has potential to meet this threshold. One of the motivations to write this was Lois James articles on EIS – I think her general idea of doing a more deep dive to tease out more detailed interaction measures could be really important work (especially if it can be automated in a particular way, say through BWC footage). Lois’s work is just one example though. I also think measures of say police stressors could be very important in measuring churn of police officers over time. I already stated I think Rylan Simpson’s work on perceptions of police is well justified based on his simple experiments (since they are very low cost interventions, like wear purple gloves instead of black, or no cost e.g. take off your sunglasses when interviewing folks).

So these have potential to be worth the cost for police departments to open up their pocket books and collect those measures, but that is a bridge further than the majority of research currently being publishing in policing journals.

Some Caveats

So this is like I said a value-laden and fuzzy argument. No doubt some folks doing qualitative research or surveys will think this is loathsome, and think “I can’t answer my research question using administrative data”.

I intend the argument to go the other way though – we can be doing so much more quality research for much less cost. It is also the case that folks I believe need in general to do a much better job tying contemporary policing research to actual real life outcomes such as crime and use of force. Like I said I think the default should be basically the opposite proportion of what policing research looks like at the moment.

I’m not saying folks can’t do more basic data measures and collection – but as is the vast majority of this research lacks any semblance of a cost-benefit analysis that would justify the cost to collect those measures. As is, even if folks hypotheses are validated in a one time data collection, they lack the necessary valuation to justify police departments implement those measures going forward in practice. (Many of these same valuation critiques apply to the use of technology in policing, although it is the obverse, not much academic work but plenty of sinking $$ into tech with little return in terms of measurable outcomes.)

One thing I have not touched on is access. Folks may be thinking “I can’t get access to that info!”. You actually probably can though – I don’t know a PD that would let you do a survey or interviews that also wouldn’t share much of this admin data.

Another thing I have not touched on is bias in admin data. That deserves a whole additional blog post. It is a fair critique in part (bias no doubt exists, it is quantifying how large and its impact on the analysis is the question). The majority of the work in these policing journals though is not using alternative measures to get around bias in admin data though, they are measuring totally different things (as I said goal switching to totally different outcomes).