Nearby Analysis Example (Excel)

The other day on Twitter I made a comment to Joel Caplan about how I would approach an analysis with multiple buffers without double counting the overlaps. A typical GIS workflow would go:

  • take your points of interest and create buffers
  • join the points to the buffer polygons, and get a count of the crimes of interest

I often do the analysis a different way though – I do a spatial join of the locations of interest to the point features, which gives you a field with the distance to the nearest feature, and then subsequently do analysis on that distance field. That workflow makes it much easier to change the size of the buffer for sensitivity analysis, or to conduct analysis on different subsets of the data.

To start I am going to be working with a set of robberies in Dallas (from the open data, not quite 16k), and DART stations (n = 74). (DART is Dallas Area Rapid Transit, the above ground train in Dallas.) You can access the Excel file I am doing the analysis with here. I often suggest Excel for undergrad/masters students doing projects who aren’t up to speed with programming – so this is a good illustration of that buffer analysis workflow.

Distance to Nearest

To start, I would typically use a GIS system (or R/Python/SQL) to calculate the distance to the nearest object. But I don’t have access to Arc anymore, so I am going to show you a way to do this right in Excel. This only works for projected data (not latitude/longitude), and only for point-to-point distances.

So first, to figure out the distance between two points in Euclidean space, we can just use the Pythagorean theorem that you learned in grade school, Distance = sqrt( (x1 - x2)^2 + (y1 - y2)^2 ). Because we are doing this in an Excel spreadsheet and want to find the nearest Dart station to the robbery, we will use a little array formula magic. I named my table of Dart locations Dart, and so the array formula to find the nearest distance in Excel is:

=MIN( SQRT( (B2 - Dart[X])^2  + (C2 - Dart[Y])^2))

When you enter this formula, hit Ctrl + Shift + Enter, else it just returns the distance to the first Dart station. If you did this right, you will see the formula have {} brackets around it in the formula bar.

Distance will be in whatever units the projection is in – here, feet. By taking the MIN over the array, the formula returns the distance to the nearest station. To get the ID of the associated station, we need a similar formula (and this only works with numeric ID fields). You can do an array IF formula, and the only station it is true for will be the MAX of that array. (Again hit Ctrl + Shift + Enter when finishing off this cell calculation instead of just Enter.)

=MAX(IF(F2=SQRT((B2 - Dart[X])^2  + (C2 - Dart[Y])^2), Dart[DartID],0))
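
If your station IDs were text instead of numbers, a similar array formula using INDEX/MATCH should work (a sketch I have not tested against this workbook – it recomputes the same distance array and pulls out the row that exactly matches the minimum distance already calculated in F2):

=INDEX(Dart[DartID], MATCH(F2, SQRT((B2 - Dart[X])^2 + (C2 - Dart[Y])^2), 0))

As before, enter it with Ctrl + Shift + Enter.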

User beware – this runs surprisingly fast on my machine, but it is quite a few computations under the hood. For much larger datasets, again use a GIS/database/stat program to do these calculations.

Using Pivot Tables to do Buffer Analysis

So now that we have those distance fields, it is easy to write a formula to flag the robberies within, say, 1000 feet. You can do another IF formula along the lines of:

=IF([@Distance] < 1000, 1, 0)

And then go ahead and make a pivot table: put DartID in the rows, and the within-distance field you just made in the values (summed in the pivot table).
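
As a quick sanity check, the grand total of that pivot table should match a simple conditional count over the distance column. Assuming the robbery table is named Rob (a hypothetical name) with the nearest distance in a Distance column, something like:

=COUNTIF(Rob[Distance], "<1000")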

Then bam, you have your buffer analysis. Here I sorted the pivot table so you can see the highest crime DART station is ID 12. (I haven’t looked up which station this is – you can use Excel to map them out though.)

So say you wanted to change the buffer size? It is as simple as swapping out the 1000 in the prior formula for a different value. One thing I like to do though is to make a lookup table to define different bins. You can see I named that table BuffTable (naming tables makes it easier to refer to them later in array formulas; I also shifted the pivot table down so I would not accidentally overwrite it later).
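
To make the MATCH/INDEX logic below concrete, a hypothetical BuffTable layout (not the exact bounds in the workbook) might look like this, with the Within Bounds column sorted ascending. Because of the +1 offset in the formula below, each label describes distances between the prior row’s bound and its own bound:

Within Bounds | Label
0             | (anchor row, label never returned)
1000          | Within 1000 ft
2000          | 1000 to 2000 ft
9999999       | Over 2000 ft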

And now I use a combination of MATCH, to find what row of this table the distance falls into, and INDEX, to return the row label I want. So first I have:

=MATCH([@Distance], BuffTable[Within Bounds], 1)

This is very similar to VLOOKUP – with a 1 as the final argument, it returns the position of the largest bound that is less than or equal to the distance (which is why the bounds need to be sorted ascending).

This just returns the row number of the match though – I want to pipe in those nicer labels I made. To do that, I nest the MATCH result within INDEX:

=INDEX(BuffTable, MATCH([@Distance], BuffTable[Within Bounds], 1) + 1, 2)

The +1 shifts down one row, to the bound the distance falls under, and the 2 grabs the second column of BuffTable, the labels. And voila, I get my binned data.

Now we can redo our pivot table with this new binned field as the columns (make sure to refresh the pivot table so it picks up the new field).

And now we can do our buffer analysis with varying buffers. Just update the table to however you want the buffers defined, hit refresh, and everything will be updated. (I should have made the labels so they sort a bit more nicely in the pivot table.)

I like this approach for students, as it is easy to pivot/filter on other characteristics as well. Want to get arrest rates around areas? Want to see changes in crimes nearby different DART stations over time? It is just a few formulas/filters and a pivot table away in this format.

Distance to Nearest Analysis for DART stations

Another analysis I think is useful is to look at the cumulative distance analysis. I got this idea from a paper of Jerry Ratcliffe’s.

So what you can do is round the distance data, e.g. a formula like this will round the distances to the nearest 300 feet:

=ROUND([@Distance]/300,0)*300
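
For example, a robbery 437 feet from its nearest station works out to ROUND(437/300, 0)*300 = ROUND(1.46, 0)*300 = 1*300, so it gets binned to 300.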

And then you can make a pivot table of the rounded counts. Here I also did additional stats to calculate the spatial density of the points, and show the distance decay curve.
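
One simple way to compute such a density (a sketch, not necessarily the exact formula in the workbook; Count and Distance are hypothetical column names for the binned counts and bin values) is to divide each bin’s count by the area that bin covers around the 74 stations – a ring running from 150 feet below to 150 feet above the bin value, ignoring any overlap between stations:

=[@Count] / (74 * PI() * (([@Distance] + 150)^2 - MAX([@Distance] - 150, 0)^2))

The MAX guards the first bin, where the ring collapses to a disc of radius 150.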

Jerry’s paper I linked to looks for change points – I think that idea is somewhat misleading though. It does look like a change point in Jerry’s graphs, but I think that is a function of the binning (see this Xu/Griffiths paper – same method, finer bins, and it shows a smoother decay).

So here I tied the ROUND function to a cell, and again I can just update the value to a different bin size and everything gets auto-updated in the spreadsheet. Here is a bin size of 100 feet, which introduces some volatility in the very nearby locations, but you can see it still pretty much follows that smooth distance decay effect.
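
To replicate that, the formula just swaps the hard-coded 300 for a cell reference (assuming the bin size lives in cell $M$1 – use whatever cell you like):

=ROUND([@Distance]/$M$1, 0) * $M$1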

Actually the Xu/Griffiths paper looks at street network distance, which I think makes more sense. (And again, you need a GIS to do that analysis.) Buffer areas can behave funny, and won’t have a direct relationship to street length exposure, so I think the typical Euclidean analysis can be misleading in some cases. I will need to save that for another blog post though!

Creating an animated heatmap in Excel

I’ve been getting emails recently about the online Carto service not continuing their free use model. I’ve previously used that service to create animated heatmaps over time, in particular a heatmap of reported meth labs over time. That map still currently works, but I’m not sure for how long. The functionality can be replicated in recent versions of Excel though, so I will do a quick walkthrough of how to make an animated map. You can download the csv to follow along with, as well as the final produced Excel file, from this link.

I split the tutorial into two parts. Part 1 is prepping the data so the Excel 3D Map will accept it; part 2 is making the map look nice.

Prepping the Data

The first steps, before we can make the map in Excel, are to:

  1. eliminate rows with missing dates
  2. turn the data into a table
  3. explicitly set the date column to a date format
  4. save as an Excel file

We need to do those four steps before we can worry about the mapping part. (It took me forever to figure out it did not like missing data in the time field!)

So first, after you have downloaded that data, double click to open the Geocoded_MethLabs.csv file in Excel. Once that sheet is open, select the G column, and then sort Oldest to Newest.

It will give you a pop-up to Expand the selection – keep that default checked and click the Sort button.

After that, scroll down to the current bottom of the spreadsheet. There are 30+ records in this dataset that have missing dates. Go ahead and select the row labels on the left, which highlights the whole rows. Once you have done that, right click and then select Delete. Again, you need to eliminate those missing records for the map to accept the time field.

After you have done that, select the bottom right most cell, L26260, then scroll back up to the top of the worksheet, hold shift, and select cell A1 (this should highlight all of the cells in the sheet that contain data). After that, select the Insert tab, and then select the Table button.

In the pop-up you can keep the default that the table has headers checked. If you lost the selection range in the prior step, you can simply enter it in as =$A$1:$L$26260.

After that is done you should have a nice blue formatted table. Select the G column, and then right click and select Format Cells.

Change that date column to a specific date format, here I just choose the MM/DD/YY format, but it does not matter. Excel just needs to know it represents a date field.

Finally, you need to save the file as an Excel file before we can make the maps. To do this, click File in the top left header menu, and then select Save As. Choose where you want to save the file, and then in the Save as Type dropdown at the bottom of the dialog select xlsx.

Now the data is all prepped to create the map.

Making an Animated Map

Now in this part we do a series of steps to make the map recognize the correct data, and then make the map look nice.

With the prior data all prepped, you should now be able to select the 3D Map option via the Insert menu (just to the right of where the Excel charts are).

Once you click that, you should get a map opened up that looks like mine below.

Here it actually geocoded the points based on the address (very fast as well), so if you only have address data you can still create some maps. Here I want to change the data though so it uses my Lat/Lon coordinates. In the little table on the far right side, under Layer 1, I deleted all of the fields except for Lat by clicking the large X to their right (circled in the screenshot below). Then I selected the + Add Field option, and then selected my Lng field.

After you select that, you can use the dropdown just to the right of the field to set it as Longitude. Next navigate down slightly to the Time option, and there select the DATE field.

Now here I want to make a chart similar to the Carto graph of the density, so at the top of the layer column I select the blob looking thing (see its drawn outline). Then you will get various options like in the below screenshot. Adjust these to your liking, but here I made the radius of influence a bit larger, and made the opacity not 100%, but slightly transparent at 80%.

Next up is setting the color of the heatmap. The default color scale uses the typical rainbow, which should be avoided for multiple reasons, one of which is color-blindness. So in the dropdown for colors select Custom, and then you will get the option to create your own color ramp. If you click on one of the color swatches you will then get options to specify the color in a myriad of ways.

Here I use the multi-hue pink-purple color scheme via ColorBrewer with just three steps. You can see in the above screenshot I set the lowest pink step via the RGB colors (which you can find on the color brewer site.) Below is what my color ramp looks like in the end.
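
(For reference, if you want the same ramp, I believe the three step PuRd scheme on ColorBrewer is, in RGB: 231,225,239 / 201,148,199 / 221,28,119 – double check against the site.)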

Next we want to set the style of the map. I like the monotone backgrounds, as they make the animated kernel density pop out much more (see also my blog post, When should we use a black background for a map). It is easy to experiment with all of these different settings though and see which ones you like more for your data.

Next I am going to change the format of the time notation in the top right of the map. Left click to select the box around the time part, and then right click and select Edit.

Here I change to the simpler Month/Year. Depending on how fast the animation runs, you may just want to change it to year. But you can leave it more detailed if you are manually dragging the time slider to look for trends.

Finally, the current default is to show all of the data permanently. There are examples where you may want to do that (see the famous example by Nathan Yau mapping the growth of Wal Mart), but here we do not want that. So navigate back to the Layer options on the right hand side, and in the little tiny clock above the Time field select the dropdown, and change it to Data shows for an instant.

Finally I select the little cog in the bottom of the map window to change the time options. Here I set the animation to run longer at 30 seconds. I also set the transition duration to slightly longer at 5 seconds. (Think of the KDE as a moving window in time.)

After that you are done! You can zoom in the map, set the slider to run (or manually run it forward/backward). Finally you can export the map to an animated file to share or use in presentations if you want. To do that click the Create Video option in the toolbar in the top left.

Here is my exported video


Now go make some cool maps!

IACA Conference 2017 workshop: Monitoring temporal crime trends for outliers (Excel)

This fall at the International Association of Crime Analysts conference I am doing a workshop, Monitoring temporal crime trends for outliers: A workshop using Excel. If you can’t wait (or are not going) I have all my materials already prepared, which you can download here. That includes a walkthrough of my talk/tutorial, as well as a finished Excel workbook. It is basically a workshop to go with my paper, Tables and graphs for monitoring temporal crime trends: Translating theory into practical crime analysis advice.

For some preview, I will show how to make a weekly smoothed chart with error bands:

As well as a monthly seasonal chart:
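
To give a flavor of the weekly chart (a rough sketch of the idea, not necessarily the exact recipe in the workshop materials): assuming weekly counts are in column B, a centered 9 week moving average for the week in row 6 would be

=AVERAGE(B2:B10)

and with that smoothed value in E6, simple Poisson-ish error bands are just

=E6 + 2*SQRT(E6)

for the upper band, with the mirror image (floored at zero via MAX) for the lower band.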

I use Excel not because I think it is the best tool, but mainly because I think it is the most popular among crime analysts. In the end I just care about getting the job done! (Although I’ve given reasons why I think Excel is more painful than any statistical program.) Even though it is harder to make small multiple charts in Excel, I show how to make these charts using pivot tables and filters – watching them auto-update when you update the filter is pretty cool.

For those with SPSS I have already illustrated how to make similar charts in SPSS here. You could of course replicate that in R or Stata or whatever if you wanted.

I am on the preliminary schedule currently for Tuesday, September 12th at 13:30 to 14:45. I will be in New Orleans on the 11th, 12th and 13th, so if you want to meet always feel free to send an email to set up a time.