Using Google Fusion Tables to make some maps!

In the past I’ve used BatchGeo and CartoDB to share interactive maps with others. BatchGeo makes it super easy to geocode a few incidents, and CartoDB has a few more stylistic options (including some very cool animations). Both of these services cap the number of points you can map on the free tier, though. The new Google Maps lets you make similar products to BatchGeo and CartoDB: you can upload a CSV or KML file, do some light editing of the points, and then embed the map as an iframe in a website if you want (I wish Google Maps had a time slider like Google Earth does). Here is an example from my PhD of a few locations where one of my original models did a very poor job of predicting the amount of crime at the street midpoint or intersection.

But for a few recent projects I wanted to place many more geographies on the map than these free versions allow. ArcGIS Online is pretty slick in my few tests, but I am settling on Google Fusion Tables for the ability to link the geographies and data tables (plus the ability to filter is very nice). Basically you upload your data table and KML as separate Fusion Tables and then merge them to create your own polygons with associated data. Here is another example from my dissertation, with the embedded map below.

Basically what I do is make a set of units of analysis based on street midpoints and intersections. I then divide the city up based on the Thiessen polygons of those sets of points, and use those polygons to allocate different areal measures. E.g., I can calculate the overlap of each Thiessen polygon with the area of sidewalks.
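As a rough sketch of that allocation step (hypothetical coordinates, and a raster nearest-point approximation of the Thiessen polygons rather than exact polygon geometry):

```python
import numpy as np

# hypothetical street midpoint and intersection coordinates
pts = np.array([[100.0, 100.0], [400.0, 150.0], [250.0, 400.0]])

# lay a fine grid of cells over the study area
xs = np.linspace(0.0, 500.0, 100)
ys = np.linspace(0.0, 500.0, 100)
xx, yy = np.meshgrid(xs, ys)
cells = np.column_stack([xx.ravel(), yy.ravel()])

# assign each cell to its nearest point -- a raster approximation
# of the Thiessen (Voronoi) polygons around the points
dist = np.linalg.norm(cells[:, None, :] - pts[None, :, :], axis=2)
alloc = dist.argmin(axis=1)

# areal share of the study space allocated to each point; an areal
# measure (e.g. sidewalk area) could be apportioned the same way
shares = np.bincount(alloc, minlength=len(pts)) / len(cells)
```

Swapping the uniform grid for cells flagged as sidewalk (or any other land use) would give the overlap measures described above.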

I’m using Google Fusion Tables for some other projects in which I want people within the PD to be able to interactively explore the data. My main interest in these slippy maps is that you can pan and zoom – with a static map it is hard to recreate all of the potential views a consumer of the map wants. I can typically make a nicer overview map of the forest, or of any general data patterns, in a static map, but if I think the user will want to zoom in to particular locations these interactive maps meet that challenge. Pop-ups allow for a brief digging into the data as well, but don’t allow for visualizing patterns. Fusion Tables are very limited in the styling of the geography, though. (All of these free versions are pretty limited, but Fusion Tables are especially restrictive for point symbology and for creating choropleth classes.)

Using these maps involves a trade-off when sharing with the PD, though. They are what I would call semi-public: if you want others to be able to view the map you can share a link, but anyone with the link can see the map. This prevents sharing of intimate information on the map that might possibly be leaked. (For access control to more sensitive information, e.g. requiring a user to sign on to a secure website, I know BAIR Analytics offers paid products like that – probably some of the prior web map apps I mentioned do as well.) I’ve made them in the past for Troy PD, but I really have no idea how often they were used – so, other analysts, let me know in the comments if you’ve had success disseminating info within the police department with maps like these.

I’m getting devilishly close to finishing my dissertation, and I will post an update and link when the draft is complete. My prospectus can be seen here, and the linked maps are part of some supplemental material I compiled. The supplemental info should provide a few more details on what the maps are showing.

Taking account of the baseline in kernel density maps using CrimeStat

When making kernel density maps, sometimes the phenomenon is heavily clustered in particular locations simply because the population at risk is uneven over the study space. xkcd puts this in a bit more layman’s terms than I do:

So how do we take into account the underlying population? It depends on the data, but if you actually have the population at risk as points, we can make kernel density maps that are the ratio of the cases to the underlying total population. I will show how to do this type of raster kernel density estimate in CrimeStat, using some data on reported assaults and arrests from the city of Chicago.

To make the necessary smooth estimate of the proportion of arrests in CrimeStat you will need two separate files: the primary file should be all arrests of interest, and the secondary file should be all of the incidents (so the arrests are a subset of all incidents). And of course both files need to have the geocoordinates already.

CrimeStat has a nice GUI to make our KDE maps. You will be greeted with the following screen after starting the CrimeStat program.

Now we can enter in our data. First click the Select Files button and then navigate to your data file. Here I saved the separate files as DBFs, and for the primary file I use all of the arrests associated with an assault incident in Chicago in 2013.

Now that the file is loaded into CrimeStat, we need to specify what fields contain the spatial coordinates in the variables section. Then we set the appropriate options in the bottom panels. Here I am using projected coordinates in feet. I don’t specify a time unit, so that option is superfluous.

Now we can enter in our secondary file, which will be all of the assault incidents in Chicago in 2013. The Secondary File tab is in the set of minor tabs under the larger Data Setup tab (CrimeStat has an incredible number of routines, hence the many options). The workflow is equivalent to that for the primary file: import the spreadsheet and define the fields.

Now we need to set up the reference grid to which the raster output will be written. Still on the same Data Setup main tab, navigate to the Reference File minor tab. Here we specify a set of coordinates (in the particular projection in use) as a rectangle corresponding to the lower left and upper right corners. You can then control how fine the grid is by specifying a larger number of columns. Here the cell sizes are 300 square feet. Note you can save the particular reference file for future use.

Now we can finally move on to estimating our kernel density map! Navigate to the main Spatial Modeling I tab and then to the Interpolation I minor tab. To make a ratio of our two rasters (which will be the smoothed proportion of arrests), we choose the Dual KDE estimation and specify the normal kernel. Typically the choice of kernel makes very little difference for KDE maps; choosing an appropriate bandwidth impacts the look of the map to a much greater extent. I typically default to around 300 meters, but here I choose a smaller 500 foot bandwidth (we will see this is seriously undersmoothed – but I would rather start undersmoothed than oversmoothed).
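Conceptually the dual KDE is just the ratio of two smoothed surfaces. A minimal sketch of the idea with a normal kernel (toy simulated points, not CrimeStat’s actual implementation):

```python
import numpy as np

def kde(pts, xx, yy, bw):
    """Absolute density surface from a sum of 2-D normal kernels."""
    dens = np.zeros_like(xx)
    for x, y in pts:
        dens += np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * bw ** 2))
    return dens / (2 * np.pi * bw ** 2)

rng = np.random.default_rng(0)
incidents = rng.uniform(0, 1000, size=(200, 2))   # all incidents
arrests = incidents[rng.random(200) < 0.25]       # subset ending in arrest

xs = np.linspace(0, 1000, 50)
xx, yy = np.meshgrid(xs, xs)

top = kde(arrests, xx, yy, bw=150)     # numerator: arrest density
bot = kde(incidents, xx, yy, bw=150)   # denominator: incident density

# since the arrests are a subset of the incidents, this ratio is the
# smoothed local proportion of incidents that end in arrest
ratio = top / bot
```

Because every arrest kernel also appears in the denominator, the ratio surface stays between 0 and 1, which is what makes it interpretable as a smoothed proportion.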

The Area units: points per field ends up being superfluous when specifying the ratio of the two densities. Clicking the Save Result to button, we can choose to save the output to various geographic data file formats (both vector and raster). Here I specify ArcGIS’s raster format.

Now we are ready to calculate the KDE raster. Simply click the Compute button at the bottom of CrimeStat, and be a little patient with a dataset of this size (some 4,000 arrests and over 17,500 total incidents). After that runs, we can import the rasters into your favorite GIS and make some maps.

You will notice when you first load the raster that there are several strange artifacts. This happens because places with very few incidents have a low baseline with which to calculate the smoothed proportion of arrests. Unfortunately it appears CrimeStat writes 0 where null data values should be (places with zero density in the denominator).

A quick fix to this problem, though, is to make a separate kernel density map of just the incidents and superimpose that on top of your smoothed arrests. Then you can make the zero density areas the same color as the background map so they are filtered out. Here I filter areas that have an incident density of less than 0.02 (these are absolute densities, so they sum to the total number of incidents used to calculate them).
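In raster terms the fix is just a mask: blank any cell whose baseline (incident) density falls below a small threshold. A toy illustration (the 0.02 cut-off mirrors the one used above):

```python
import numpy as np

# toy rasters: baseline incident density and the smoothed arrest proportion
incident_dens = np.array([[0.001, 0.50],
                          [0.030, 1.20]])
arrest_ratio = np.array([[0.90, 0.20],
                         [0.30, 0.25]])

# cells with too thin a baseline become NaN (no data), so they render
# as the background color instead of a spurious proportion
masked = np.where(incident_dens < 0.02, np.nan, arrest_ratio)
```

The top-left cell (baseline density 0.001) is blanked, while the cells with an adequate baseline keep their smoothed proportion.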

So below are the final KDE maps. As you can see from all of them, 500 feet is seriously undersmoothed, but the absolute densities of incidents and arrests (the two leftmost maps) appear to be highly correlated. If you look at the hit rate of arrests in the rightmost map, though, the percent of arrests appears to be spatially random.

Other possibilities for similar analysis are, say, accidents involving injury or pedestrians where the baseline is all accidents, field stops that result in contraband recovery, or a comparison of densities before and after an intervention (although here I may take the absolute difference as opposed to the ratio).

Of course this just scratches the surface of possible analysis. When the population at risk is not so conveniently labelled in the data set, such as coming from census geographies, one may consult the literature on dasymetric mapping (also see the head bang procedure in CrimeStat). Bivand et al. (2008) have an example of calculating the ratio raster along with some statistical tests, and the spatstat library has some more convenient functions to accomplish this and map the results (see the relrisk function). One can also estimate a logistic regression model with the spatial coordinates as non-linear predictors (e.g. using splines) and then plot the predicted probabilities for each grid cell.
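As a bare-bones sketch of that last regression idea, here low-order polynomial terms in the coordinates stand in for splines, and a simple gradient ascent fit stands in for any particular library (all data simulated):

```python
import numpy as np

rng = np.random.default_rng(1)
xy = rng.uniform(-1, 1, size=(500, 2))  # incident coordinates
# simulate arrests (1) more likely toward the east side of the study area
y = (rng.random(500) < 1 / (1 + np.exp(-2 * xy[:, 0]))).astype(float)

# design matrix: low-order polynomial in the coordinates
# (splines would be the more flexible choice in practice)
x1, x2 = xy[:, 0], xy[:, 1]
X = np.column_stack([np.ones(500), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

# fit logistic regression by gradient ascent on the log-likelihood
b = np.zeros(X.shape[1])
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ b))
    b += 0.1 * X.T @ (y - p) / len(y)

# predicted arrest probability for each cell of a coordinate grid
gx, gy = np.meshgrid(np.linspace(-1, 1, 20), np.linspace(-1, 1, 20))
G = np.column_stack([np.ones(gx.size), gx.ravel(), gy.ravel(),
                     (gx * gy).ravel(), (gx ** 2).ravel(), (gy ** 2).ravel()])
prob = (1 / (1 + np.exp(-G @ b))).reshape(gx.shape)
```

Mapping `prob` as a raster gives a model-based analog of the smoothed proportion surface, with the usual regression machinery available for adding covariates.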

I’m not sure of a quick global test of whether the proportion of arrests is spatially random, though. Off-hand I thought you could use a spatial scan test for the case-control data (e.g. using SaTScan or similar functions in the spatstat R library), although I’m not sure if that counts as quick.

Article: Viz. techniques for JTC flow data

My publication Visualization techniques for journey to crime flow data has just been posted in the online first section of the Cartography and Geographic Information Science journal. Here is the general doi link, but Taylor and Francis gave me a limited number of free offprints to share the full version, so the first 50 visitors can get the PDF at this link.

Also note that:

  • The pre-print is posted to SSRN. The pre-print has more maps that were cut for space, but the final article is surely cleaner (in terms of concise text and copy editing) and has slightly different discussion in various places based on reviewer feedback.
  • Materials I used for the article can be downloaded from here. The SPSS code to make the vector geometries for a bunch of the maps is not terribly friendly. So if you have questions feel free – or if you just want a tutorial just ask and I will work on a blog post for it.
  • If you ever want an off-print of an article just send me an email (you can find it on my CV). I plan on continuing to post pre-prints to SSRN, but I realize it is often preferable to cite the final in-print version (especially if you take a quote).

The article will be included in a special issue on crime mapping in the CaGIS due to be published in January 2015.

Online Crime Mapping for Troy PD

One of the big projects I have been working on since joining the Troy Police Department as a crime analyst last fall is producing timely geocoded data. I am happy to say that a fruit of this labor is the public crime map, via RAIDS Online, that has finally gone public (and can be viewed here). The credit for the online map mainly goes to BAIR Analytics and their free online mapping platform. I merely serve up the data for them to put on the map.

I’ve come to believe that more open data is the way of the future, and in particular that an online crime map is a way to engage and enlighten the public to the realities of crime statistics. This comes with some potential negative externalities for the police department, such as complaints about inaccuracy, decreasing home prices, and misleading symbology and offset geocoding. I firmly believe, though, that providing this information empowers the public to be more engaged in matters of crime and safety within their communities.

I thank the Troy Police Department for supporting the project in spite of these potential negative consequences, and Chief Tedesco for his continual support of the project. I also thank Capt. Cooney for arranging for all of the media releases. Below are the current online news stories (I will update with CW15 if they post a story).

Here I end with a list of reading materials I consider necessary for any other crime analyst pondering the decision whether to publish crime statistics online. And I end by again thanking Troy PD for allowing me to publish this data, and BAIR for providing the online service that makes it possible with a zero dollar budget.

Let me know if I should add any papers to the list! Privacy implications (such as this work by Michael Leitner and colleagues) might be worth a read as well for those interested. See my geomasking tag at CiteUlike for various other references.

Viz. JTC Flow lines – Paper for ASC this fall

Partly because I would go crazy if I worked only on my dissertation, I started a paper about visualizing JTC flow lines a while back, and I am going to present what I have so far at the American Society of Criminology (ASC) meeting in Atlanta this fall.

My paper is still quite rough around the edges (so not quite up for posting to SSRN), but here is the current version. This actually started out as an answer I gave to a question on the GIS StackExchange site, and after I wrote it up I figured it would be a worthwhile endeavor to turn into an article. Alasdair Rae currently has a couple of papers on visualizing flow data, but I thought I could extend those papers and write for a different audience: criminologists using journey to crime (JTC) data.

As always, I would appreciate any feedback. I’m hoping to send this out to a journal in the near future, and so far I have only goaded one of my friends into reviewing the paper.

Some more about black backgrounds for maps

I am at it again discussing black map backgrounds. I make a set of crime maps for several local community groups as part of my job as a crime analyst for Troy PD. I tend to make several maps for each group, separating out violent, property, and quality of life related crimes. Within each map I attempt to make a hierarchy between crime types, with more serious crimes as larger markers and less severe crimes as smaller markers.

Despite critiques, I believe the dark background can be useful, as it creates greater contrast for map elements. In particular, the small crime dots are much easier to see (and IMO in these examples the streets and street name labels are still easy to read). Below are examples of a white background, a light grey background, and a black background for the same map (the only changes are that the black point marker is changed to white in the black background map; streets and parks are drawn with a heavy amount of transparency to begin with, so they don’t need to be changed).

Surprisingly to me, ink be damned, even printing out the black background looks pretty good! (I need to disseminate paper copies at these meetings.) I think if I had to place the legend on the black map background I would be less thrilled, but currently I have half the page devoted to the map and the other half devoted to a table listing the events and the times they occurred, along with the legend (ditto for the scale bar and the North arrow not looking so nice).

I could probably manipulate the markers to provide more contrast in the white background map (e.g. make them bigger, draw the lighter/smaller symbols with dark outlines, etc.). But I was quite happy with the black background map (and the grey background may be a useful in-between as well). It took no changes besides swapping the background in my current template (and changing black circles to white ones) to produce the example maps. I also chose those marker sizes for a reason (so the map did not appear flooded with crime dots, and more severe and less severe crimes were easily distinguished), so I’m hesitant to think I can do much better than what I have so far with the white background maps (and I refuse to put those cheesy crime marker symbols, like a handgun or a body outline, on my maps).

In terms of differentiating between global and local information in the maps, I believe the high contrast dark background map is nice for identifying local points, but does not aid in identifying general patterns. I don’t think general patterns are a real concern for the local community groups, though (displaying so many points on the same map isn’t good for distinguishing general patterns anyway).

I’m a bit hesitant to roll out the black maps as of yet (maybe if I get some good feedback on this post I will be more daring). I’m still on the fence, but I may try out the grey background maps for the next round of monthly meetings. I will have to think about whether I can devise a reasonable experiment to differentiate between the maps and whether they meet the community groups’ goals and/or expectations. All together, though, the black background maps should certainly be given serious consideration for similar tasks. Again, as I said previously, the high contrast makes smaller elements more obvious (brings them more to the foreground) than the white background does, which as I show here can be useful in some circumstances.

JQC paper on the moving home effect finally published!

My first solo publication, The moving home effect: A quasi experiment assessing effect of home location on the offence location, after being in the online first queue nearly a year, has finally been published in the Journal of Quantitative Criminology 28(4):587-606. It was the oldest paper in the online first section (along with the paper by Light and Harris published on the same day)!

This paper was the fruit of what was basically the equivalent of my Masters thesis, and I would like to take this opportunity to thank all of the individuals who helped me with the project, as I accidentally omitted such thanks from the paper (entirely my own fault). I would like to thank my committee members, Rob Worden, Shawn Bushway, and Janet Stamatel. I would also like to thank Robert Apel and Greg Pogarsky for useful feedback I received on in-class papers based on the same topic, as well as the folks in the Worden meeting group (for not only feedback, but for giving me motivation to do work so I had something to say!).

Rob Worden was the chair of my committee, and he deserves extra thanks not only for reviewing my work, but also for giving me a job at the Finn Institute, without which I would never have had access to such data or the opportunity to conduct such a project. I would also like to give thanks to the Syracuse PD and Chief Fowler for letting me use the data and reveal the PD’s identity in the publication.

I would also like to thank Alex Piquero and Cathy Widom for letting me make multiple revisions and accepting the paper for publication. For the publication itself I received three very excellent and thoughtful peer reviews. The excellence of the reviews was well above the norm for feedback I have otherwise encountered, and demonstrated that the reviewers not only read the paper but read it carefully. I was really happy with the improvements, as well as how fair and thoughtful the reviews were. I am also very happy it was accepted for publication in JQC; it is the highest quality venue where I would expect the paper to be on topic, and if it wasn’t accepted there I was really not sure where I would send it otherwise.

In the future I will publish pre-prints online, so the publication before editing can still be publicly available to everyone. But if you cannot get a copy of this (or any of the other papers I have co-authored so far) don’t hesitate to shoot me an email for a copy of the off-print. Hopefully I will have some more work to share on the blog in the near future! I currently have two papers I am working on with related topics: one visualizing journey to crime flow data, and another with Emily Owens and Matthew Freedman of Cornell comparing journey to work data with journey to crime data.

For a teaser for this paper here is the structured abstract from the paper and a graph demonstrating my estimated moving home effect.

This study aims to test whether the home location has a causal effect on the crime location. To accomplish this, the study capitalizes on the natural experiment that occurs when offenders move, and uses a unique metric, the distance between sequential offenses, to determine whether the offense location changes when an offender moves.

Using a sample of over 40,000 custodial arrests from Syracuse, NY between 2003 and 2008, this quasi-experimental design uses t-tests of mean differences and fixed effects regression modeling to determine if moving has a significant effect on the distance between sequential offenses.

This study finds that when offenders move they tend to commit crimes in locations farther away from past offences than would be expected without moving. The effect is rather small though, both in absolute terms (an elasticity coefficient of 0.02), and in relation to the effect of other independent variables (such as the time in between offenses).

This finding suggests that the home has an impact on where an offender will choose to commit a crime, independent of offence, neighborhood, or offender characteristics. The effect is small though, suggesting other factors may play a larger role in influencing where offenders choose to commit crime.

Co-maps and Hot spot plots! Temporal stats and small multiple maps to visualize space-time interaction.

One of the problems with visualizing and interpreting spatial data is that there are characteristics of the geographical data that are hard to display on a static, two dimensional map. Friendly (2007) makes the pertinent distinction between map and non-map based graphics, and so the challenge is to effectively interweave them. One way to try to overcome this is to create graphics intended to supplement the map based data. Below I give two examples pertinent to analyzing point level crime patterns with attached temporal data, co-maps (Brunsdon et al., 2009) and the hot spot plot (Townsley, 2008).

co-maps

The concept of co-maps is an extension of co-plots, a visualization technique for small multiple scatterplots originally introduced by William Cleveland (1994). Co-plots are in essence a series of small multiple scatterplots in which the visualized scatterplot is conditioned on a third (or potentially fourth) variable. What is unique about co-plots, though, is that the conditioning variable’s categories are not mutually exclusive, so the conditions overlap.

The point of co-plots in general is to see if the relationship between two variables has an interaction with a third, continuous variable. When the conditioning variable is continuous, we wouldn’t expect the interaction to change dramatically at discrete cut-offs of that variable, so we want to examine the interaction effect at varying levels of the conditioning variable. The overlap is also useful in instances in which the data are sparse, and you don’t want to introduce artifactual relationships by making arbitrary cut-offs for the conditioning variable.
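The overlapping conditioning intervals can be built mechanically. A rough version, similar in spirit to R’s co.intervals helper (my own construction, not a port):

```python
import numpy as np

def shingles(x, number=4, overlap=0.5):
    """Split sorted values into roughly `number` intervals, where
    consecutive intervals share `overlap` proportion of observations."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    # window width so that `number` overlapping windows span the data
    width = int(np.ceil(n / (number * (1 - overlap) + overlap)))
    step = max(1, int(round(width * (1 - overlap))))
    out = []
    for start in range(0, n - width + 1, step):
        chunk = x[start:start + width]
        out.append((chunk[0], chunk[-1]))  # (low, high) of the interval
    return out

# e.g. 100 evenly spaced "temperature" values -> four 50%-overlapping bands
ivals = shingles(np.arange(100), number=4, overlap=0.5)
```

Each panel of the co-plot would then show only the points whose conditioning value falls in one of these overlapping bands.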

Besides the Cleveland paper cited (which is publicly available, link in citations at bottom of post), there are some good examples of coplot scatterplots from the R graphical manual.

Brunsdon et al. (2009) extend the concept to analyzing point patterns, with time as the conditioning variable. Also, because the geographic data are numerous, they apply kernel density estimation (KDE) to visualize the results (instead of a sea of overlapping points). When visualizing geographic data, too many points is a common problem, and the solutions are essentially the same as those people use for scatterplots (this thread at the stats site gives a few resources and examples concerning that). Below I’ve copied a picture from Brunsdon et al. (2009) to show it applied to crime data.

Although the example is conditional on temperature (instead of time), it should be easy to see how it could be extended to make the same plot conditional on time. Also note the bar graph at the top denotes the temperature range, with the lowest bar corresponding to the graphic that is in the panel on the bottom left.

Also of potential interest, the same authors applied the same visualization technique to reported fires in another publication (Corcoran et al., 2007).

the hot spot plot

Another similarly motivated graphical presentation of the interaction of time and space is the hot-spot plot proposed by Michael Townsley (2008). Below is an example.

So the motivation here is having coincident graphics simultaneously depicting long term temporal trends (in a sparkline-like graphic at the top of the plot), spatial hot spots depicted using KDE, and a lower bar graphic depicting hourly fluctuations. This allows one to identify spatial hot spots and then quickly assess their temporal nature. The example from the Townsley article I give is a secondary plot showing zoomed-in locations of several analyst-chosen hot spots, with the remaining cut-out events left as a baseline.

Some food for thought when examining space-time trends with point pattern crime data.


Crime Mapping article library at CiteULike

I use the online reference library, CiteULike, to organize my personal bibliography. I have created a group within CiteULike specifically focused on crime mapping relevant articles, and the group is named Crime Mapping. I typically post relevant articles that I place in my own library, as well as suggestions that are placed on the Geography and Crime google group forum. At the moment there is also one other CiteULike member who has posted articles to the group, and we have a total of 135 articles as of January 2012.

Although you need a CiteULike account to add to the library, even without a profile you can still browse the library. If you have any suggestions feel free to either make a comment here or shoot me an email if you can’t post yourself. Also I’m sure the library could use some better thought into the tags for each article, so feel free to update, add, or re-tag any articles currently in the library.

Crime Mapping Friendly Geography Journals

I’m a Criminal Justician/Criminologist by degree, but I think it is important for any academic to be aware of developments not only within their own field, but in outside disciplines as well. It is also good to know of novel outlets with which to publish your work.

Here I have listed a few geography oriented journals I believe any criminologist interested in crime mapping should be cognizant of. I chose these for mainly two reasons: 1) the frequency with which they publish directly pertinent criminological research, and 2) my perceived quality of the journals and the articles contained within (e.g. it is good just to read articles in the journal in general, given their quality). I also list some examples of (mostly) recent criminological work within each journal.

Applied Geography

The Professional Geographer

There was recently a special issue devoted to crime mapping topics. See the below article and the subsequent pertinent articles in that same issue.

Also for other examples see

Annals of the Association of American Geographers

Of course this is not an exhaustive list, but you can check out the articles I have posted in my CiteULike library for other hints at crime mapping relevant journals to watch out for (you can search for specific journal titles in the library; see this example of searching for Applied Geography in my library). Also check out the crime mapping CiteULike library I upload content to regularly (this will have a more specific crime mapping focus than my general library, although they largely overlap).

If you think I’ve done some injustice leaving a journal off the list, let me know in the comments!