Notes on using UCR data for class projects

Students in my classes often want to use UCR data for projects. Many don’t realize, though, that the UCR data reported to the FBI consists only of aggregate statistics, at regular intervals, for the entire jurisdiction. So, for example, one can’t look at hot spots using reported UCR data.

If you do have a hypothesis that can be reasonably examined using monthly or yearly data at the jurisdiction level, here are a few notes on using UCR data. First, you can get the most detailed downloads of the data from ICPSR. ICPSR has data series going back to 1960, and it runs about two years behind (e.g. it is close to the end of 2017, and only 2015 data is available).

The datasets on ICPSR have monthly data for Part 1 crime types, as well as some information on arrests and clearances. They also list each individual agency along with its ORI (Originating Agency Identifier) code. The ORI code allows you to link agencies over time.
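Linking on the ORI code can be sketched in a few lines of pandas. This is only an illustration: the data frames, the column names (`ori`, `burglary`), and the ORI values are made up, and the real ICPSR files use V-numbered variables that you would need to look up in the codebook first.

```python
import pandas as pd

# Hypothetical yearly extracts; column names and values are illustrative,
# not the actual ICPSR variable names.
y2014 = pd.DataFrame({
    "ori": ["NY0010100", "NY0010200", "NY0010300"],
    "burglary": [120, 45, 8],
})
y2015 = pd.DataFrame({
    "ori": ["NY0010100", "NY0010200", "NY0010400"],
    "burglary": [110, 50, 12],
})

# An outer merge on ORI keeps agencies that appear in only one year,
# and the indicator column records which side each row came from.
linked = y2014.merge(
    y2015, on="ori", how="outer", suffixes=("_2014", "_2015"), indicator=True
)

both_years = linked[linked["_merge"] == "both"]
one_year_only = linked[linked["_merge"] != "both"]
```

Checking `one_year_only` before analysis is worthwhile, since an agency that drops out of the file is different from an agency that reported no crime.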

While the FBI does have a page with more up-to-date UCR data (they just released the 2016 stats, so they are about a year behind), it is much more limited in the types of tables it disseminates. There is typically one table of Part 1 crime rates for individual large cities for each year, but otherwise the data is aggregated by city size. So most data analyses need to use the ICPSR data, because the data directly from the FBI is not detailed enough.

For those wishing to map the data, it ends up being a bit tricky. Most people in the US are probably under the jurisdiction of at least two police departments: the local PD and the state police. Many people are also under the jurisdiction of a local sheriff. So many of these police agencies have overlapping boundaries. The ICPSR data does contain the ZIP code of each police department’s headquarters. This won’t be accurate for state police, but it should be suitable for mapping local agencies and sheriffs (sheriffs are sometimes organized at the county level). If you want polygon data for jurisdictional boundaries, you will need to search for individual agencies and political boundaries, because there is no easy source to download them all at once. Many rural areas will have police departments that cover multiple towns, but if you stick to more urban areas you might be able to use city boundaries.

The ICPSR data has crime reports aggregated to the county level, so if that level of aggregation is not problematic you may use that data directly. You should be aware of many of the complaints about UCR data quality though. Mike Maltz has written a bit about it, but there are quite a few other folks who have noticed problems with reporting in the UCR data. The main problem to watch out for is missing data being accidentally reported as zero crimes occurring.
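One rough way to screen for zeros that are really missing reports is to flag a zero month for an agency that normally reports plenty of crime. The sketch below is my own simple heuristic, not Mike Maltz’s procedure, and the column names, the data, and the `MIN_TYPICAL` cutoff are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical long-format monthly counts; names and values are made up.
monthly = pd.DataFrame({
    "ori": ["A"] * 6 + ["B"] * 6,
    "month": list(range(1, 7)) * 2,
    "burglary": [30, 28, 0, 35, 31, 29,   # agency A: one zero amid ~30 a month
                 0, 1, 0, 0, 2, 0],       # agency B: zeros are plausible
})

# Typical monthly level for each agency, attached to every row.
typical = monthly.groupby("ori")["burglary"].transform("median")

# Assumed cutoff: a zero is suspicious when the agency usually reports
# at least this many incidents per month. Tune it to the crime type.
MIN_TYPICAL = 10
monthly["suspect_zero"] = (monthly["burglary"] == 0) & (typical >= MIN_TYPICAL)

suspects = monthly[monthly["suspect_zero"]]
```

Flagged rows are candidates to inspect or set to missing, not proof of a reporting error; small agencies can legitimately have zero-crime months.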

Stacking ICPSR datasets from different years is not too difficult if you are not going too far back in time. But if you go back to the older data, ICPSR changed the variable order. The variables are simply named V1 through V100-something, so, for example, V15 in 1979 is not the same variable as V15 in 2005. My notes say they used the same variable order from 1998-2015, but you will want to check that yourself (I downloaded the SPSS files, and it would not surprise me if the datasets differed for some of the years).
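To stack years safely, one option is to rename each year’s V-numbered variables to meaningful names before concatenating. Here is a minimal sketch, assuming you have built a lookup for each year from its codebook. The V-numbers, their meanings, and the toy data below are invented for illustration, and `harmonize` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical codebook lookups: these V-numbers and meanings are invented.
# Build the real ones from the ICPSR codebook for each year.
VAR_MAPS = {
    1979: {"V1": "ori", "V15": "robbery_jan"},
    2005: {"V3": "ori", "V15": "burglary_jan", "V22": "robbery_jan"},
}

# Stand-ins for the files you would load for each year.
raw = {
    1979: pd.DataFrame({"V1": ["NY0010100"], "V15": [12]}),
    2005: pd.DataFrame({"V3": ["NY0010100"], "V15": [40], "V22": [9]}),
}

def harmonize(df, year):
    """Keep only the mapped columns, rename them, and tag the year."""
    mapping = VAR_MAPS[year]
    out = df[list(mapping)].rename(columns=mapping)
    out["year"] = year
    return out

# Columns are aligned by name, so V15 from 1979 and V15 from 2005
# end up in different columns; unmapped variables become NaN.
stacked = pd.concat(
    [harmonize(df, year) for year, df in raw.items()],
    ignore_index=True,
    sort=False,
)
```

Concatenating on names rather than on position means a change in variable order shows up as a missing column instead of silently mixing two different crime types.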

Two additional resources can help students gather UCR data more quickly: the FBI UCR data tool, and Mike Maltz’s cleaned up dataset along with his notes on how he made it. If you are using data over time, you should probably just use Mike Maltz’s dataset.

If you are just interested in yearly homicides, I have provided a dataset of cleaned up homicides that goes back to 1960. See my paper that goes along with that dataset on graphing temporal homicide trends (mapping those trends could be an interesting project as well!).