My experience blogging in 2012

I figured I would write a brief post about my experience blogging. I created this blog and published my first post in December of 2011. Since then, in 2012, I published 30 blog posts and totaled 7,200 views. While I thought that number was quite high (albeit a bit disappointing compared to the numbers of Larry Wasserman), it is still many more people than would have listened to what I had to say if I didn't write a blog. When starting out I averaged under 10 views a day, but throughout the year traffic steadily grew, and now I average about 30 views per day. The post with the most traffic in one day was When should we use a black background for a map?, with 73 views, largely because of some twitter traffic (a result of Steven Romalewski tweeting it and then it being re-tweeted by Kenneth Field).

I started the blog because I really loved reading a lot of other blogs, and I hope to encourage others to blog as well. It is a nice venue for an academic to share work and opinions, as it is more flexible and can be less formal than articles. Also, much of what I write about I would just consider helpful tips or generic discussion that I wouldn't get to discuss otherwise (SPSS programming and graph tips will never make it into a publication). One of my main motivations was actually R-Bloggers and the SAS blog roll; I would like a similarly active community for SPSS, but I have found none outside of the NABBLE forum. Some exceptions are Andy Field, The Analysis Factor, Jon Peck, and these few posts by a Louis K that I only found through the labyrinth that is the IBM developerworks site (note I think you need to be signed in to even see that site), but they certainly aren't very active and/or don't write much about SPSS. I assume the best way to remedy that is to lead by example! Most of my more popular posts are ones about SPSS, and I frequently get web traffic via general google searches of SPSS plus something else I blogged about (hacking the template and comparing continuous distributions are my two top posts).

The blog is also just another place to highlight my academic work and bring more attention to it. WordPress tells me how often someone clicks a link on the blog, and someone has clicked the link to my CV close to 40 times since I made the blog. Hopefully I will have some pre-print journal articles to share on the blog in the near future (as well as my prospectus). My post on my presentation at ASC did not generate much traffic, but I would love to see a similar trend for other criminologists/criminal justicians in the future. My work isn't perfect for sure, but why not get it out there, at least for it to be judged and hopefully get feedback.

I would like to blog more, and I actively try to write something if I haven't in a few weeks, but I don't stress about it too much. I certainly have an infinite pool of posts to write about programming and generating graphs in SPSS. I have also thought about discussing historical graphics in criminology and criminal justice, or generally talking about some historical and contemporary crime mapping work. Other potential posts I'd like to write are a more formal treatment of why I loathe most difference-in-differences designs, and perhaps one on the silliness that can ensue when using null-hypothesis significance testing to determine racial bias. But both will take more careful elaboration, so they might not appear anytime soon.

So in short, SPSSers, crime mappers, criminologists/criminal justicians: I want you to start blogging, and I will eagerly consume your work (and in the meantime hopefully produce some more useful stuff on my end)!

Some more about black backgrounds for maps

I am at it again discussing black map backgrounds. I make a set of crime maps for several local community groups as part of my job as a crime analyst for Troy PD. I tend to make several maps for each group, separating out violent, property, and quality-of-life related crimes. Within each map I attempt to make a hierarchy between crime types, with more serious crimes as larger markers and less severe crimes as smaller markers.

Despite critiques, I believe the dark background can be useful, as it creates greater contrast for map elements. In particular, the small crime dots are much easier to see (and IMO in these examples the streets and street name labels are still easy to read). Below are examples of the white background, a light grey background, and a black background for the same map. The only changes are that the black point marker is changed to white in the black background map; streets and parks are drawn with a heavy amount of transparency to begin with, so they don't need to be changed.

Surprisingly to me, ink be damned, even printing out the black background looks pretty good! (I need to disseminate paper copies at these meetings.) I think if I had to place the legend on the black map background I would be less thrilled, but currently I have half the page devoted to the map and the other half devoted to a table listing the events and the time they occurred, along with the legend (ditto for the scale bar and the North arrow not looking so nice).

I could probably manipulate the markers to provide more contrast in the white background map (e.g. make them bigger, draw the lighter/smaller symbols with dark outlines, etc.). But I was quite happy with the black background map (and the grey background may be a useful in-between as well). It took no changes besides changing the background in my current template (and changing black circles to white ones) to produce the example maps. I also chose those marker sizes for a reason (so the map did not appear flooded with crime dots, and more severe and less severe crimes were easily distinguished), so I'm hesitant to think that I can do much better than what I have so far with the white background maps (and I refuse to put those cheesy crime marker symbols, like a hand gun or a body outline, on my maps).

In terms of differentiating between global and local information in the maps, I believe the high contrast dark background map is nice for identifying local points, but does not aid in identifying general patterns. I don't think general patterns are a real concern for the local community groups though (displaying so many points on the same map isn't good for distinguishing general patterns anyway).

I'm a bit hesitant to roll out the black maps as of yet (maybe if I get some good feedback on this post I will be more daring). I'm still on the fence, but I may try out the grey background maps for the next round of monthly meetings. I will have to think about whether I can devise a reasonable experiment to differentiate between the maps and whether they meet the community groups' goals and/or expectations. But, all together, the black background maps should certainly be given serious consideration for similar tasks. Again, as I said previously, the high contrast makes smaller elements more obvious (brings them more to the foreground) than the white background does, which as I show here can be useful in some circumstances.

The leaning tower optical illusion: Is it applicable to statistical graphics?

Save in the memory banks whether the slope of the lines in the left hand panel appears similar to, smaller than, or larger than the slope of the lines in the right hand panel.

I enjoy reading about optical illusions, both purely because I think they are neat and because of their applicability to how we present and perceive information in statistical graphics. A few examples I am familiar with are:

  • The Rubin Vase optical illusion, in which it is difficult to distinguish which object is the background and which is the foreground. This is applicable to making a clear background/foreground separation between grid lines and chart elements.
  • Change blindness, which makes it difficult to interpret animated graphics that do not have smooth, continuous transitions between chart states.
  • Mach bands (and simultaneous contrast effects more generally), where the color of an object is perceived differently given the context of the surrounding colors. I recently came across one of the most dramatic examples of this at the very cool mighty optical illusions site. The effect was so dramatic that I actually went and edited the image in that example to make sure there was no funny business! Image included below.

I was recently pointed to a new (to me) example of an optical illusion, the leaning tower illusion, in a paper by Kingdom, Yoonessi & Gheorghiu (2007) (referred to me via the Freakonometrics blog).


I suggest reading the article (it is very brief), but to sum it up: both pictures above are identical, although the tower on the right appears to be leaning more to the right. Although the pictures are separate (and have some visual distinction), our minds interpret them as lying in the same "plane", and hence objects that are further away in the distance should not appear parallel but should actually converge within the image.

Off the cuff this reminded me of the Ponzo illusion, where our minds know that the lines are running parallel, and our perception of other surrounding elements changes conditional on that dominant parallel-lines pattern. Here is another good example of this from the mighty optical illusions site (actually I did not know the name of this effect, and when I googled subway tile illusion this is the site that came up; I'm glad I found it!)

Is this applicable to statistical graphics though? One of the later images in the Perception article appears potentially more reminiscent of a small multiple line chart (and we all know I strongly advocate for the use of small multiple charts).


We do know that interpreting the distance between sloping lines is difficult (as elaborated on in some of Cleveland's work), but this is different in that potentially our perception of the parallelness of lines between panels in a small multiple is distorted based on the directions of lines within the panel. Offhand, though, we might expect that the context doesn't exactly carry over; there is no visual schema in 2d statistical graphics suggesting that lines are running away from our perspective. So to test this out I attempted to create some settings in small multiple line panels that might cause similar optical illusions.

So, going back to the picture at the beginning of the article, here are those same lines superimposed on the original picture. I have lost my personal objectivity to tell whether these result in visual distortions at this point, but at best I could only conjure up perhaps some slight distortion between panels (which is perhaps no worse than our ability to accurately perceive slopes anyway).

I think along these lines one could come up with more examples where between-panel comparisons for line graphs in small multiples produce such distortions, but I was unable to produce anything compelling in some brief tries (so let me know if you come across any examples where such distortions occur!). Simply food for thought at this point.

I do think, though, that the Ponzo effect can be illustrated with essentially the same graphic.


It isn't as dramatic as the subway tile example, but I do think the positive sloping line where the negative sloping lines converge at the top of the image appears larger than the line in the space at the bottom right of the image.

I suspect this could actually occur in real-life graphics in which we have error bars superimposed on a graph with several lines of point estimates. If the point estimates start at a wide interval and then converge, it may produce a similar illusion in which the error bars appear larger around the point estimates that are closer together. Again, though, I produced nothing really compelling in my short experimentation.

Using the edge element in SPSS inline GPL with arbitrary X,Y coordinates

This post is intended to be a brief illustration of how one can utilize the edge element in SPSS's GGRAPH statements to produce vector flow fields. I post this because the last time I consulted the GGRAPH manual (which admittedly is probably a few versions old) it only had examples of SPSS's ability to utilize random graph layouts for edge elements. Here I will show how to specify the algebra for edge elements if you already have X and Y coordinates for the beginning and end points of the edges. Also note that if you want arrow heads for line elements in your plots you need to utilize edge elements (the main original motivation for me to figure this out!)

So here is a simple example of utilizing coordinate begin and end points to specify arrows in a graph; below the code is the image it produces.

DATA LIST FREE/ X1 X2 Y1 Y2 ID.
BEGIN DATA
1 3 2 4 1
1 3 4 3 2
END DATA.

GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=X1 Y1 X2 Y2 ID MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: X1=col(source(s), name("X1"))
  DATA: Y1=col(source(s), name("Y1"))
  DATA: X2=col(source(s), name("X2"))
  DATA: Y2=col(source(s), name("Y2"))
  DATA: ID=col(source(s), name("ID"))
  GUIDE: axis(dim(1), label("X"))
  GUIDE: axis(dim(2), label("Y"))
  ELEMENT: edge(position((X1*Y1)+(X2*Y2)), shape(shape.arrow))
END GPL.

The magic happens in the call to the edge element, specifically the graph algebra position statement of (X1*Y1)+(X2*Y2). I haven't read Wilkinson's Grammar of Graphics, and I will admit that with SPSS's tutorial on graph algebra in the intro to the GPL manual it isn't clear to me why this works. The best answer I can give is that different elements have different styles for specifying coordinates in the graph. For instance, interval elements (e.g. bar charts) can take a location in one dimension and an interval in another dimension in the form X*(Y1+Y2) (see an example in this post of mine on Nabble – I also just found in my notes an example of specifying an edge element in a similar manner to the interval element). This just happens to be a valid form to specify coordinates in edge elements when you aren't using one of SPSS's automatic graph layout renderings. I guess it is generically of the form FromNode(X*Y)+ToNode(X*Y), but I haven't seen any documentation for this, and all of the examples I had seen in the reference guide utilize a different set of nodes and edges and then specify a specific type of graph layout.

Here is another example visualizing a vector flow field. Eventually I would like to be able to superimpose such elements on a map, but that appears to not yet be possible in SPSS.

set seed = 10.
input program.
loop #i = 0 to 10.
loop #j = 0 to 10.
compute X = #i.
compute Y = #j.
*compute or_deg = RV.UNIFORM(0,360).
end case.
end loop.
end loop.
end file.
end input program.
dataset name sim.
execute.

*making orientation vary directly with X & Y attributes.

compute or_deg = 18*X + 18*Y.

*now to make the edge I figure out the X & Y coordinates with a little distance added (here 0.7) based on the orientation.
COMPUTE #pi=4*ARTAN(1).
compute or_rad = (or_deg/180)*#pi.
compute distance = .7.
execute.

compute X2 = X + sin(or_rad)*distance.
compute Y2 = Y + cos(or_rad)*distance.
execute.
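As a cross-check of the trigonometry in the computes above (this is just an illustrative sketch, not part of the SPSS workflow; the function name is mine), the endpoint calculation can be written in Python:

```python
import math

def edge_endpoint(x, y, or_deg, distance=0.7):
    """Return the end coordinate of an edge starting at (x, y), given an
    orientation in degrees (0 = straight up, increasing clockwise).
    Mirrors the sin/cos computes in the SPSS code above."""
    or_rad = math.radians(or_deg)  # same as (or_deg/180)*pi
    return (x + math.sin(or_rad) * distance,
            y + math.cos(or_rad) * distance)
```

For example, an orientation of 0 degrees leaves X unchanged and adds the full distance to Y, while 90 degrees points the edge straight to the right.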

DATASET ACTIVATE sim.
* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=X Y X2 Y2 or_deg MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: X=col(source(s), name("X"))
  DATA: Y=col(source(s), name("Y"))
  DATA: X2=col(source(s), name("X2"))
  DATA: Y2=col(source(s), name("Y2"))
  DATA: or_deg=col(source(s), name("or_deg"))
  GUIDE: axis(dim(1), label("X"))
  GUIDE: axis(dim(2), label("Y"))
  ELEMENT: edge(position((X*Y)+(X2*Y2)), shape(shape.arrow))
END GPL.

You can then use other aesthetics in these charts same as usual (color, size, transparency, etc.)

Presentation at ASC – November, 2012

I will be presenting at the American Society of Criminology conference in Chicago in a few weeks (I can't link to the actual presentation it appears, but you can search the program for Wheeler and my session will come up). Don't take this as a final product, but I figured I would put out there the working paper/chapters of my dissertation that are the motivation for my presentation, along with my current set of slides.

Here is the original abstract I submitted a few months ago. The title of the talk is The Measurement of Small Place Correlates of Crime:

This presentation addresses several problems related to attempting to identify correlates of crime at small units of analysis, such as street segments. In particular, the presentation will focus on articulating what we can potentially learn from smaller units of analysis compared to larger aggregations, and on relating a variety of different measures of the built environment and demographic characteristics of places to theoretical constructs of interest to crime at places. Preliminary results examining the discriminant and convergent validity of theoretical constructs pertinent to theories of the causes of crime, using data from Washington, D.C., will be presented.

This was certainly an over-ambitious abstract (I was still in the process of writing my prospectus when I submitted it). The bulk of the talk will be focused on "What can we learn from small units of analysis?", and after that, as time allows, I will present some illustrations of the change of support problem. Sorry to disappoint, but nothing about convergent or divergent validity of spatial constructs will be presented (I have done no work of interest yet, and I don't think I would have time to present any findings in any more than a superficial manner anyway).

Don't be scared off by how dull the working paper is; the presentation will certainly be more visual and less mathematical (I will need to update my dissertation to incorporate some more graphical presentations).

Maps and graphics at the end of the talk demonstrating the change of support problem are still in the works (and I will continue to update the presentation on here). Here is a preview, though, of the first map I made, demonstrating how D.C. disseminates geo-data aggregated and snapped to street segments, making it problematic to mash up with census data.


The presentation time is on Friday at 9:30, and I'm excited to see the other presentations as well. It looks to me like Pizarro et al.'s related research was recently published in Justice Quarterly, so if you don't care for my presentation come to see the other presenters!

JQC paper on the moving home effect finally published!

My first solo publication, The moving home effect: A quasi experiment assessing effect of home location on the offence location, after being in the online first queue nearly a year, has finally been published in the Journal of Quantitative Criminology 28(4):587-606. It was the oldest paper in the online first section (along with the paper by Light and Harris published on the same day)!

This paper was the fruit of what was basically the equivalent of my Masters thesis, and I would like to take this opportunity to thank all of the individuals who helped me with the project, as I accidentally omitted such thanks from the paper (entirely my own fault). I would like to thank my committee members, Rob Worden, Shawn Bushway, and Janet Stamatel. I would also like to thank Robert Apel and Greg Pogarsky for useful feedback I received on in-class papers based on the same topic, as well as the folks in the Worden meeting group (for not only feedback but giving me motivation to do work so I had something to say!)

Rob Worden was the chair of my committee, and he deserves extra thanks not only for reviewing my work, but also for giving me a job at the Finn Institute, without which I would never have had access to such data and the opportunity to conduct such a project. I would also like to give thanks to the Syracuse PD and Chief Fowler for letting me use the data and reveal the PD's identity in the publication.

I would also like to thank Alex Piquero and Cathy Widom for letting me make multiple revisions and accepting the paper for publication. For the publication itself I received three very excellent and thoughtful peer reviews. The excellence of the reviews was well above the norm for feedback I have otherwise encountered, and demonstrated that the reviewers not only read the paper but read it carefully. I was really happy with the improvements as well as how fair and thoughtful the reviews were. I am also very happy it was accepted for publication in JQC; it is the highest quality venue at which I would expect the paper to be on topic, and if it wasn't accepted there I was really not sure where I would send it otherwise.

In the future I will publish pre-prints online, so the publication before editing can still be publicly available to everyone. But if you cannot get a copy of this (or any of the other papers I have co-authored so far), don't hesitate to shoot me an email for a copy of the off-print. Hopefully I will have some more work to share in the near future on the blog! I currently have two papers I am working on with related topics, one on visualizing journey to crime flow data, and another paper with Emily Owens and Matthew Freedman of Cornell comparing journey to work data with journey to crime data.

As a teaser for this paper, here is the structured abstract and a graph demonstrating my estimated moving home effect.

Objectives
This study aims to test whether the home location has a causal effect on the crime location. To accomplish this the study capitalizes on the natural experiment that occurs when offenders move, and uses a unique metric, the distance between sequential offenses, to determine whether the offense location changes when an offender moves.

Methods
Using a sample of over 40,000 custodial arrests from Syracuse, NY between 2003 and 2008, this quasi-experimental design uses t tests of mean differences and fixed effects regression modeling to determine if moving has a significant effect on the distance between sequential offenses.

Results
This study finds that when offenders move they tend to commit crimes in locations farther away from past offences than would be expected without moving. The effect is rather small though, both in absolute terms (an elasticity coefficient of 0.02), and in relation to the effect of other independent variables (such as the time in between offenses).

Conclusions
This finding suggests that the home has an impact on where an offender will choose to commit a crime, independent of offence, neighborhood, or offender characteristics. The effect is small though, suggesting other factors may play a larger role in influencing where offenders choose to commit crime.

Using Bezier curves to draw flow lines

As I talked about previously, great circle lines are an effective way to visualize flow lines, as the bending of the arcs creates displacement among over-plotted lines. A frequent question that comes up though (see an example on GIS.stackexchange and on the flowing data forums) is that great circle lines don't provide enough bend over short distances. For visualizing journey to crime data (one of the topics I am interested in), one has the problem that most known journeys are within one particular jurisdiction or otherwise short distances.

In the GIS question I linked to above I suggested utilizing half circles, although that seemed like overkill. I have currently settled on drawing an arcing line utilizing quadratic Bezier curves. For a thorough demonstration of Bezier curves, how to calculate them, and to see one of the coolest interactive websites I have ever come across, check out A primer on Bezier curves – by Mike "Pomax" Kamermans. These are flexible enough to produce any desired amount of bend (and are simple enough for me to be able to program!) Also I think they are more aesthetically pleasing than irregular flows. I've seen some programs use hook-like bends (see an example of this flow mapping software from the Spatial Data Mining and Visual Analytics Lab), but I'm not all that fond of that, for either aesthetic reasons or for aiding the visualization.

I won't go into too much detail here on how to calculate them (you can see the formulas for the quadratic equations on the Mike Kamermans site I referenced), but you basically: 1) define where the control point is located (the origin and destination are already defined), and 2) interpolate an arbitrary number of points along the line. My SPSS macro is set to 100, but this can be made either bigger or smaller (or conditional on other factors as well).

Below is an example diagram I produced to demonstrate quadratic Bezier curves. For my application, I suggest placing the control point perpendicular to the midpoint between the origin and destination. This creates a regular arc between the two locations, and one can control the direction of the arc conditional on the direction of the offset. In the SPSS function provided, the user then supplies a ratio of the height of the control point to the distance between the origin and destination locations (so points further away from each other will be given higher arcs). The diagram below uses Latex and the Tikz library (which has a handy function to calculate Bezier curves).
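Those two steps can be sketched in a few lines of Python (a minimal sketch mirroring the logic of my SPSS macro, not the macro itself; the function name and signature are my own):

```python
import math

def quad_bezier_arc(x0, y0, x1, y1, height_ratio=0.5, n=100):
    """Interpolate n points along a quadratic Bezier curve whose control
    point sits perpendicular to the midpoint of the origin-destination
    line, at a height of height_ratio times the O-D distance."""
    mx, my = (x0 + x1) / 2.0, (y0 + y1) / 2.0  # midpoint of O-D
    dx, dy = x1 - x0, y1 - y0
    dist = math.hypot(dx, dy)
    if dist == 0:
        return [(x0, y0)] * n
    # unit vector perpendicular to the O-D line (flip signs to bend the other way)
    px, py = -dy / dist, dx / dist
    cx = mx + px * height_ratio * dist
    cy = my + py * height_ratio * dist
    pts = []
    for i in range(n):
        t = i / (n - 1)
        # quadratic Bezier: B(t) = (1-t)^2*P0 + 2(1-t)t*C + t^2*P1
        bx = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        by = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        pts.append((bx, by))
    return pts
```

Note the curve passes through the origin and destination but not the control point itself; with height_ratio = 0.5, the arc's midpoint rises to half the control point's height, i.e. a quarter of the O-D distance.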

Here is a simpler demonstration of controlling the direction and adjusting the control point to produce either a flatter arc or an arc with more eccentricity.

Here is an example displaying 200 JTC lines from the simulated data that comes with the CrimeStat program. The first image shows the original straight lines, and the second shows the curved lines using a control point at a height of half the distance between the origin and destination coordinates.

Of course, both are most definitely still quite crowded, but what do people think? Is my curved lines suggestion beneficial in this example?

Here I have provided the SPSS function (and some example data) used to calculate the lines (I then use the ET GeoWizards add-on to turn the points into lines in ArcGIS). Perhaps in the future I will work on an R function to calculate Bezier curves (I'm sure they could be of some use), but hopefully for those interested this is simple enough to program your own function in whatever language of interest. I have the start of a working paper on visualizing flow lines, and I plan on this being basically my only unique contribution (everything else is just a review of what other people have done!)

One could be fancier as well and make the curves differ based on other factors. For instance, make the control point closer to either the origin or destination if the flow amount is asymmetrical, or make the control point further away (and subsequently make the arc larger) if the flow is more voluminous. Ideas for the future I suppose.

CJ blog watch! Any ones I’m missing?

I follow a lot of blogs. I don't personally write a lot about criminology or criminal justice related matters (maybe in the future when I have more time or inclination), but I figured I would share some of my favorites and query the crowd for more recommendations.

So a few with general discussion related to criminology and criminal justice matters are:

Both sites are by well-known criminologists/criminal justicians. I am also aware of a few blogs written by current/former police chiefs:

  • Tom Casady's The Director's Desk. Tom Casady is currently the director of public safety for Lincoln, Nebraska, and was previously the Police Chief of Lincoln's department for quite some time. Tom is also very active in a variety of criminology/criminal justice organizations (so if you go to a related conference there is a good chance he is around somewhere!)
  • Chief’s Blog by Chief Ramsay of the Duluth Police Dept in Minnesota.

There are also a few that are highly focused on crime mapping & analysis:

  • Location Based Policing by Drew Dasher. He is a crime analyst for the Lincoln Nebraska PD.
  • Saferview – crime, fear and mapping: A blog by a retired police officer who is a student at University College London.
  • Diego Valle-Jones: Although his blog has a wider variety of topics, he has a series of very detailed posts and analyses on violence in Mexico and Central American nations. I know crime stats are frequent fodder for generic statistical demonstrations, but this is really insightful analysis. My favorite is his investigation into the validity of homicide statistics.

Are there others I am missing out on or should know about? Let me know in the comments if you have other suggestions.

FYI – the title of the blog post was motivated by Hans Toch’s new book, Cop Watch.

Making value by alpha maps with ArcMap

I recently finished reading Cynthia Brewer's Designing Better Maps: A Guide for GIS Users. Within the book she has an example of manually making a bi-variate map legend in ArcMap, and the light bulb went off in my mind that I could use that same technique to make value by alpha maps in ArcMap.

For a brief intro into what value by alpha maps are, Andy Woodruff (one of the creators) has a comprehensive blog post on what they are and their motivation. Briefly though: we want to visualize some variable in a choropleth map, but that variable is measured with varying levels of reliability. Value by alpha maps de-emphasize areas of low reliability by increasing the transparency of those polygons. I give a few other examples related to mapping reliability in this answer on the GIS site as well, How is margin of error reported on a map?. Essentially those techniques either only display certain high reliability locations, make two maps, or use techniques to overlay multiple attributes (like hatchings). But IMO the value by alpha map looks much nicer than maps with multiple elements, and so I was interested in how to implement them in ArcMap.

What value by alpha maps effectively do is reduce the saturation and contrast of polygons with high alpha blending, making them fade into the background and become less noticeable. I presented an applied example of value by alpha maps in my question asking for examples of beautiful maps on the GIS site. You can click through to see further citations for the map and reasons why I think the map is beautiful. But I include an image below as well (taken from the same Andy Woodruff blog post mentioned earlier).

Here I will show how to make the same maps in ArcMap, and present some discussion about their implementation, in particular suitable choices for the original choropleth colors. Much was already discussed by the value by alpha originators, but I suppose I didn't really appreciate them until I got my hands a little dirty and tried to make them myself. Note that this question on the GIS site, How to implement value-by-alpha map in GIS?, gives other resources for implementing value-by-alpha maps. But as far as I am aware this contribution about how to do them in ArcMap is novel.

Below I present an example displaying the percentage of female heads of households with children (abbreviated PFHH from here on) for 2010 census blocks within Washington, D.C. Here we can consider the reliability of the PFHH to depend on the number of households within the block itself (i.e. we would expect blocks with smaller numbers of households to have higher variability in the PFHH). The map below depicts blocks that have at least one household, and the subsequent PFHH maps will only display those colored polygons (about a third, 2132 out of 6507, have no households).

I chose the example because female headed households are a typical covariate of interest to criminologists for ecological studies. I also chose blocks as they are the smallest unit available from the census, and hence I expected them to show the widest variability in estimates. Below I provide an example of how one might typically display PFHH while simultaneously incorporating information on the baseline number of households the percentages are based on.

The first example separately displays the denominator number of households on the left and the percentage of female-headed households with children on the right, both in a sequential choropleth scheme (darker colors indicate a higher PFHH and number of households).

One can also superimpose the same information on one map. Sun & Wong (2010) suggest using cross-hatching on top of the choropleth colors to depict reliability, but here I will demonstrate using choropleth colors for the baseline number of households and proportional point symbols for the PFHH. I supplement the map on the right with a scatterplot that has the number of households on the X axis and the PFHH on the Y axis.

These both do an alright job (if you made me pick one, I think I would pick the side-by-side set of maps), but let's see if we can do better with value-by-alpha maps! The following tutorial is broken up into two sections. The first talks about actually generating the map, and the second talks about how to make the legend. Neither is difficult, but making the legend is more of a pain in the butt.

How to make the value by alpha map

First one can start out by making the base layer with the desired choropleth classification and color scheme. Note that here I changed what I am visualizing from a sequential color scheme of PFHH to location quotients with only four categories. I will discuss why later on in the post.
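For reference, a location quotient is simply the local rate divided by the overall rate. A minimal sketch with hypothetical counts and class breaks (the actual breaks used in the maps below differ):

```python
def location_quotient(local_count, local_total, overall_rate):
    """LQ: the block's rate relative to the city-wide rate."""
    return (local_count / local_total) / overall_rate

def lq_class(lq, breaks=(0.5, 1.0, 2.0)):
    """Bin an LQ into one of four categories (breaks are illustrative)."""
    for i, cut in enumerate(breaks):
        if lq <= cut:
            return i
    return len(breaks)

# A block where 6 of 20 households are female-headed with children,
# against a hypothetical city-wide rate of 15%, sits at twice the norm.
lq = location_quotient(6, 20, 0.15)
print(lq)            # 2.0
print(lq_class(lq))  # class 2 of 0-3
```

Classing the quotients this way is what lets the map use a small bivariate palette instead of a continuous ramp.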

Then one can make several copies of that layer (right click -> copy -> paste within hierarchy), one for each reliability classification you want to display. Here I will use 4 different reliability classifications. Note that after you make the copies, it is easier to manage them in the TOC if you group them.

Then one uses selection criteria to filter out only those polygons that fall within the specified reliability range, and sets the transparency for that level to the desired value.

And voila, you have your value-by-alpha map. Note that if, after you make the layers, you decide you want a different classification and/or color scheme, you can make the changes to one layer and then apply the changes to all of the other layers.

How to make the legend

Now making the legend is the harder part. If one goes to the layout view, one will see that since this example essentially has four layers superimposed on the same map, there are four separate legend entries. Below is what it looks like with my defaults (plus a vertical rule I have in my map).

What we want in the end is a bivariate scheme, with the PFHH dimension running up and down, and the transparency dimension running from one side to the other (the same as in the example mortality rate map at the beginning of the post). To do this, one has to convert the legends to graphics.

Then ungroup the elements so each can be individually manipulated. Note, sometimes I have to do this operation multiple times.

Then re-arrange the panels and labels into the desired format.

More tedious than making the separate layers, but not unreasonable if you only have to do it for one (or a small number of) maps. If you need to do it for a larger number of maps a better workflow will be needed, like creating a separate "fake inset" map that replicates the legend, making the legend in a separate tool, or just making the map entirely in a program where alpha blending is more readily incorporated. For instance, in statistical packages transparency is typically a readily available encoding that can be added to a graphic (they also allow continuous color ramps and continuous levels of transparency).
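As a sketch of what that encoding looks like in a statistical package, here the class color carries the PFHH dimension and the alpha channel carries the household count directly, with no layer copies needed. The four fill colors and the cap of 200 households are hypothetical; the resulting RGBA tuples could be handed to, e.g., matplotlib's `PolyCollection` as facecolors:

```python
# Four hypothetical fill colors, light to dark, for the four LQ classes.
LQ_COLORS = [(0.85, 0.85, 0.95), (0.65, 0.65, 0.85),
             (0.45, 0.30, 0.65), (0.30, 0.10, 0.45)]

def value_by_alpha_rgba(lq_class, n_households, max_n=200.0):
    """Encode the classed value as a color and the denominator as alpha."""
    r, g, b = LQ_COLORS[lq_class]
    alpha = min(n_households / max_n, 1.0)  # more households -> more opaque
    return (r, g, b, alpha)

print(value_by_alpha_rgba(3, 50))   # dark purple at 25% opacity
print(value_by_alpha_rgba(3, 500))  # dark purple, fully opaque
```

Because alpha is continuous here, there is no need to choose discrete reliability cut points at all, which is one of the advantages of skipping the layered-copies workaround.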

And voila, here is the final map. To follow is some discussion about choosing color schemes and whether you should use a black background or not.

Some discussion about color schemes

The Roth et al. (2010) paper in The Cartographic Journal and Andy Woodruff's blog post I cited at the beginning of this post initially talked about color schemes and utilizing a black background, but I didn't really appreciate the complexity of this choice until I went and made a value-by-alpha map of my own. In the end I decided to use location quotients to display the data, as the bivariate color scheme provides further contrast. I feel weird using a bivariate color scheme for a continuous scale (hence the conversion to location quotients), but I feel like I should get over that. Everything has its time and place, and set rules like that aren't good for anyone but bureaucrats or the mindless.

I certainly picked a complex dataset to start with, and the benefits of the value-by-alpha map over the two side-by-side maps (if any) are slight. I suspect mine don't look quite as nice as the ones created by Roth, Woodruff and company partially due to the greater amount of complexity. The map with the SatScan reliabilities I noted as one of my favorite maps is quite striking, but that is partly because the reliability has a very spatially contiguous pattern (although the underlying cancer mortality rate map is quite spatially heterogeneous). Here the spatial regularity is much weaker, in either the pattern being mapped or the reliability thresholds I had chosen. It does produce a quite pretty map though, FWIW.

For reference, here is the same map utilizing a black background. The only thing different in this map is that the most transparent layer is now set to 80% transparency instead of 90% (it was practically invisible at 90% with black as the modifying background color). Also, it was necessary to use the fake inset map for a legend that I talked about earlier, since the legend generated by ArcGIS always has white as the modifying color. If you refer back to the map with white as the modifying color, you can tell the black background produces greater contrast among the purples (with white as the modifying color, the location quotients 2.1 – 4 fully opaque and 4.1 – 12.6 at 40% transparency appear very similar).

The Roth et al. Cartographic Journal article gives other bivariate and nominal color scheme suggestions; you should take their advice. Hopefully in the future it will be simpler to incorporate bivariate color schemes in ArcMap, as it would make the process much easier (and hence more useful for exploratory data analysis).

I would love it if people point me to other examples in which value-by-alpha maps are useful. I think in theory it is a good idea, but the complexity introduced in the map is a greater burden than I initially estimated before I had made a few. I initially thought this would be useful for presenting the results of geographically weighted regression or perhaps cancer atlas maps in general (where sometimes people just filter out results below some population threshold). But maybe not, given the greater complexity introduced.

When should we use a black background for a map?

Some of my favorite maps utilize black (or dark) backgrounds. Some examples:

 

 

Steven Romalewski recently offered a slight critique of them in his blog post, Mapping NYC stop and frisks: some cartographic observations:

I know that recently the terrific team at MapBox put together some maps using fluorescent colors on a black background that were highly praised on Twitter and in the blogs. To me, they look neat, but they’re less useful as maps. The WNYC fluorescent colors were jarring, and the hot pink plus dark blue on the black background made the map hard to read if you’re trying to find out where things are. It’s a powerful visual statement, but I don’t think it adds any explanatory value.

I don’t disagree with this, and about all I can articulate in their favor so far is essentially that well-lit places create a stunning contrast with the dark background, while white-background maps just create a contrast that is not quite as stunning!

I think the proof of a black background's usefulness can be seen in the example value-by-alpha maps and the flow maps of James Cheshire, where a greater amount of contrast is necessary. IMO in the value-by-alpha maps the greater contrast is needed for the greater complexity of the bivariate color scheme, and in Cheshire's flow maps it is needed because lines frequently don't have enough areal girth to be effectively distinguished from the background.

I couldn’t find any more general literature on the topic though. It doesn’t seem to be covered in any of the general cartography books that I have read. Since it is really only applicable to on-screen maps (you certainly wouldn’t want to print out a map with a black background), perhaps it just hasn’t been addressed. I may be looking in the wrong place though; many text editors have a high-contrast setting where text is white on a dark background (likely for the same reasons it looks nice in maps), so it can’t be so foreign a concept that there is no scholarly literature on the topic.

So in short, I guess my advice is to utilize a black background when you want to highly focus attention on the light areas, essentially at the cost of greatly diminishing the contrast with other faded elements in the map. This is perhaps a good thing for maps intended as complex statistical summaries, and the Mapnificent travel times map is probably another good example where high focus in one area is sufficient and other background elements are not needed. I'm not sure black backgrounds are really needed (or useful) for choropleth maps though, and more complicated thematic maps certainly would not fit this bill.

To a certain extent I wonder what lessons from black backgrounds can be applied to the backgrounds of statistical graphics more generally. Leave me some comments if you have any thoughts or other examples of black background maps!