Use circles instead of choropleth for MSAs

We are homeschooling the kiddo at the moment (the plunge was prompted by reading Bryan Caplan’s approach, and by seeing with online schooling just how poor middle school education was). Wife is going through AP biology at the moment, and we looked up various job info on biomedical careers. Subsequently I came across this gem of a map of MSA estimates from the Bureau of Labor Statistics (BLS) Occupational Employment and Wage Statistics series (OES).

I was actually mapping some metro stat areas (MSAs) at work the other day, and these are just terrifically bad geo areas to show via a choropleth map. All choropleth maps have the issue of varying size areas, but I never realized having somewhat regular borders (more straight lines) makes the state and county maps not so bad – these MSA areas though are tough to look at. (Wife says it scintillates for her if she looks too closely.)

There are various incredibly tiny MSAs next to giant ones that you will just never see in these maps (no matter what color scheme you use). Nevada confused me quite a bit, until I zoomed in to see that there are 4 areas; Reno is just a tiny squib.

Another example is Boulder above Denver. (Look closely at the BLS map I linked; you can just make out Boulder if you squint, but I cannot tell what color it corresponds to in the legend.) The outline-heavy OES maps, which are mostly missing data, are just hopeless to display effectively like this. Reno could be the hottest market for whatever job, and it will always be lost in this map if you show employment via the choropleth approach. So of course I spent the weekend hacking together some maps in python and folium.

The BLS has a public API, but I was not able to find the OES stats in it. But if you go through the motions of querying the data and muck around in the source code for those queries, you can see they have an undocumented API call that generates json to fill the tables. Then, using this tool to convert the json calls to python (thank you Hacker News), I was able to get those tables into python.

I have these functions saved on GitHub, so check out that source for the nitty gritty. But just quickly, here is a replicated choropleth map showing the total employees for bio jobs (you can go here to look up the codes, or run my function bls_maps.ocodes() to get a pandas dataframe of those fields).

# Creating example bls maps
import matplotlib.pyplot as plt  # needed for plt.show() below
from bls_geo import *

# can check out https://www.bls.gov/oes/current/oes_stru.htm
bio = '172031'
bio_stats = oes_geo(bio)
areas = get_areas() # this takes a few minutes
state = state_albers()
geo_bio = merge_occgeo(bio_stats,areas)

ax = geo_bio.plot(column='Employment',cmap='inferno',legend=True,zorder=2)
state.boundary.plot(ax=ax,color='grey',linewidth=0.5,zorder=1)
ax.set_ylim(0.1*1e6,3.3*1e6)
ax.set_xlim(-0.3*1e7,0.3*1e7)   # lower 48 focus (for Albers proj)
ax.set_axis_off()
plt.show()

And that is not much better than the BLS's version. For this data, if you are just interested in looking up or seeing the top metro areas, simply exporting a table (e.g. geo_bio.to_excel('biojobs.xlsx') with the data frame above) works just as well as a map.

So I was surprised to see Minneapolis pop up at the top of that list (and also surprised Raleigh doesn’t make the list at all, though Durham has a few jobs). But if you insist on seeing spatial trends, I prefer the approach of mapping proportional or graduated circles, placing the points at the centroid of the MSA:

att = ['areaName','Employment','Location Quotient','Employment per 1,000 jobs','Annual mean wage']
form = ['',',.0f','.2f','.2f',',.0f']

map_bio = fol_map(geo_bio,'Employment',['lat', 'lon'],att,form)
#map_bio.save('biomap.html')
map_bio #if in jupyter can render like this

I am too lazy to make a legend, but you can check out nbviewer to see the interactive Folium map, which has tooltips (similar to the hover for the BLS maps).

Forgive my CSS/HTML skills; I am not sure how to make nicer popups. You do lose the exact areas these MSAs cover with this approach, but I really only expect a general sense from these maps anyway.
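The circles sit at each MSA's centroid. I have not checked exactly how bls_geo computes the lat and lon columns, so treat this as an illustration under my own assumptions: the area-weighted centroid of a simple polygon via the shoelace formula (the function name is hypothetical). In practice geopandas' GeoSeries.centroid does this for you; compute it in a projected CRS (like the Albers projection used above) and then convert to lat/lon for folium.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon via the shoelace formula.

    ring: list of (x, y) vertices, open or closed. Use projected (planar)
    coordinates such as Albers, not raw lat/lon degrees.
    """
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    for (x0, y0), (x1, y1) in zip(ring[:-1], ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # centroid = sum / (6 * signed area) = sum / (3 * area2)
    return cx / (3 * area2), cy / (3 * area2)


# Hypothetical L-shaped area (a 2x1 block with a 1x1 block on top)
print(polygon_centroid([(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]))
```

For oddly shaped or multi-part MSAs the centroid can even fall outside the area itself, another reason these circle maps only give a general sense of location.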

These functions are general enough for whatever wage series you want (although these functions will likely break when the 2021 data comes out). So here is the OES table for data science jobs:

I feel going for the 90th percentile (mapping that to the 10x programmer) is a bit too over the top. But I can see myself reasonably justifying the 75th percentile. (Unfortunately these aggregate tables don’t have a way to adjust for years of experience; if you know of a BLS micro product I could do that with, let me know!) So you can see here the somewhat inflated salaries for the San Francisco Bay Area, but not as inflated as many might have you think (and to be clear, these are 2020 survey estimates).

If we look at a map of data science jobs, varying the circles by the 75th percentile annual wage, it looks quite uniform. What happens is we have some really low outliers (wages under 70k), resulting in tiny circles (such as Athens, GA). Most of the other metro regions though are well over 100k.
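How uniform the circles look depends on the scaling rule. Under the usual proportional-symbol convention, circle area (not radius) is proportional to the value, so a wage twice as large only gets about 1.41 times the radius. I have not checked which rule fol_map uses; this is a sketch of that convention, with a hypothetical max_radius parameter and made-up wages:

```python
import math

def circle_radius(value, max_value, max_radius=20.0):
    """Radius for a proportional circle whose *area* scales with value.

    max_radius (hypothetical, in pixels) is the radius given to max_value.
    """
    return max_radius * math.sqrt(value / max_value)


# Hypothetical 75th percentile wages
wages = [65_000, 110_000, 130_000, 150_000]
top = max(wages)
for w in wages:
    print(w, round(circle_radius(w, top), 1))
```

If you want the low outliers to stand out more, scaling the radius linearly (or subtracting a floor before scaling) exaggerates differences, at the cost of circle areas no longer being proportional to the values.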

In more somber news, those interactive maps are built using Leaflet as the backend, which was created by a Ukrainian citizen, Vladimir Agafonkin. We can do amazing things with open source code, but we should always remember it is on the backs of someone’s labor that we are able to do those things.

Some more value-by-alpha maps for D.C. Census Blocks

I’ve made some more value-by-alpha maps for my dissertation for percent non-white population in comparison to percentage of female-headed households for Census blocks in 2010 in D.C. See my first post for some background. The choropleth classes for the percents are chosen according to quintiles of the distributions and the alpha classes are arbitrary (note the alpha class uses households as the baseline in both maps, even though percent non-white uses the population counts).
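The class assignments described above can be sketched in a few lines of Python: quintile cut points for the percentages, and fixed breaks on household counts for the alpha classes. The break values in alpha_class are hypothetical, since the post only says the alpha classes are arbitrary, and both function names are mine. Blocks with zero households should be dropped first, since their percentages are undefined.

```python
from bisect import bisect_right
from statistics import quantiles

def quintile_class(values):
    """Class 1..5 for each value, using the quintiles of the distribution."""
    cuts = quantiles(values, n=5)  # four interior cut points
    return [bisect_right(cuts, v) + 1 for v in values]

def alpha_class(households, breaks=(5, 10, 20)):
    """Alpha class 1..4 from household counts (hypothetical breaks).

    Class 1 (fewest households) would get the most transparency.
    """
    return [bisect_right(breaks, h) + 1 for h in households]


pct = [0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.33, 0.40, 0.50, 0.60]
hh = [1, 3, 6, 8, 12, 15, 22, 30, 4, 50]
print(list(zip(quintile_class(pct), alpha_class(hh))))
```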

When making these maps I’ve found that the Color Brewer sequential schemes that span two hues work out much better than those that span one hue. What happens with the one-hue sequential themes is that the faded-out colors end up being confounded with the lighter colors in the fully opaque ranges. The two-hue sequential schemes (here showing Yellow to Red and Yellow to Blue) provide greater contrast between the classes.
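Here is a rough way to see the confounding, assuming transparency behaves like simple "over" alpha compositing onto the background: fade the darkest class to 30% opacity (a hypothetical level) and measure how close it lands to the second-lightest opaque class. The hex values are taken to be ColorBrewer's 5-class Blues and YlOrRd schemes; the helper names are mine.

```python
import math

WHITE = (255, 255, 255)

def hex_to_rgb(code):
    """'#rrggbb' -> (r, g, b) integers."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

def blend(fg, bg, alpha):
    """Simple 'over' compositing of an opaque color at opacity alpha onto bg."""
    return tuple(round(alpha * f + (1 - alpha) * b) for f, b in zip(fg, bg))

opacity = 0.3  # hypothetical: 70% transparency for the least reliable class

# One-hue Blues: darkest class faded vs. second-lightest class opaque
faded_blue = blend(hex_to_rgb("#08519c"), WHITE, opacity)
d_one = math.dist(faded_blue, hex_to_rgb("#bdd7e7"))

# Two-hue YlOrRd: the same comparison
faded_red = blend(hex_to_rgb("#bd0026"), WHITE, opacity)
d_two = math.dist(faded_red, hex_to_rgb("#fecc5c"))

print(faded_blue, round(d_one, 1))
print(faded_red, round(d_two, 1))
```

In this comparison the faded darkest blue ends up within about 16 RGB units of the second-lightest opaque blue, while the faded darkest red stays over 100 units from its counterpart in the yellow-to-red scheme. Swapping WHITE for (0, 0, 0) shows the same arithmetic for a black background.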


I did not try out the black background for these maps (I thought it would perhaps be a bit jarring to have a swath of black stand out in the document). The CUNY Center for Urban Research has some other example value-by-alpha maps for New York City elections in 2013. After some discussion with Steven Romalewski they decided they liked the white background better for their maps, and based on my quick attempts with these examples I think I agree.

Calendar Heatmap in SPSS

Here is just a quick example of making calendar heatmaps in SPSS. My motivation can be seen from similar examples of calendar heatmaps in R and SAS (I’m sure others exist as well). Below is an example taken from this Revo R blog post.

The code involves a macro that takes a date variable and calculates the row position the date needs to go in the calendar heatmap (rowM); it also returns variables for the month and the day of the week, which are used in the subsequent plot. It is brief enough that I can post it here in its entirety.


*************************************************************************************.
*Example heatmap.

DEFINE !heatmap (!POSITIONAL !TOKENS(1)).
compute month = XDATE.MONTH(!1).
value labels month
1 'Jan.'
2 'Feb.'
3 'Mar.'
4 'Apr.'
5 'May'
6 'Jun.'
7 'Jul.'
8 'Aug.'
9 'Sep.'
10 'Oct.'
11 'Nov.'
12 'Dec.'.
compute weekday = XDATE.WKDAY(!1).
value labels weekday
1 'Sunday'
2 'Monday'
3 'Tuesday'
4 'Wednesday'
5 'Thursday'
6 'Friday'
7 'Saturday'.
*Figure out beginning day of month.
compute #year = XDATE.YEAR(!1).
compute #rowC = XDATE.WKDAY(DATE.MDY(month,1,#year)).
compute #mDay = XDATE.MDAY(!1).
*Now ID which row for the calendar heatmap it belongs to.
compute rowM = TRUNC((#mDay + #rowC - 2)/7) + 1.
value labels rowM
1 'Row 1'
2 'Row 2'
3 'Row 3'
4 'Row 4'
5 'Row 5'
6 'Row 6'.
formats rowM weekday (F1.0).
formats month (F2.0).
*now you just need to make the GPL call!.
!ENDDEFINE.

set seed 15.
input program.
loop #i = 1 to 365.
    compute day = DATE.YRDAY(2013,#i).
    compute flag = RV.BERNOULLI(0.1).
    end case.
end loop.
end file.
end input program.
dataset name days.
format day (ADATE10).
exe.

!heatmap day.
exe.
temporary.
select if flag = 1.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=weekday rowM month
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
 SOURCE: s=userSource(id("graphdataset"))
 DATA: weekday=col(source(s), name("weekday"), unit.category())
 DATA: rowM=col(source(s), name("rowM"), unit.category())
 DATA: month=col(source(s), name("month"), unit.category())
 COORD: rect(dim(1,2),wrap())
 GUIDE: axis(dim(1))
 GUIDE: axis(dim(2), null())
 GUIDE: axis(dim(4), opposite())
 SCALE: cat(dim(1), include("1.00", "2.00", "3.00", "4.00", "5.00","6.00", "7.00"))
 SCALE: cat(dim(2), reverse(), include("1.00", "2.00", "3.00", "4.00", "5.00","6.00"))
 SCALE: cat(dim(4), include("1.00", "2.00", "3.00", "4.00", "5.00",
  "6.00", "7.00", "8.00", "9.00", "10.00", "11.00", "12.00"))
 ELEMENT: polygon(position(weekday*rowM*1*month), color.interior(color.red))
END GPL.
*************************************************************************************.

Which produces this image below. If you do not run the temporary and select if commands, you can see what the plot looks like with the entire year filled in.

This is nice to illustrate potential day of week patterns for specific events that only rarely occur, but you can map any aesthetic you please to the color of the polygon (or you can change the size of the polygons if you like). Below is an example where I recently used this to demonstrate what days a spree of crimes appeared on, and I categorically colored certain dates to indicate multiple crimes occurred on those dates. It is easy to see from the plot that there isn’t a really strong tendency for any particular day of week, but there is some evidence of spurts of higher activity.

In terms of GPL logic I won’t go into too much detail, but the plot works even with months or rows missing in the data because there is a finite number of potential months and rows in the plot (see the SCALE statements with the explicit categories included). If you need to plot multiple years, you either need separate plots or another facet. Most of the examples show numerical information over every day, which makes it difficult to really see patterns, but that approach shouldn’t be entirely discarded just because of that (I would have to simultaneously disregard every choropleth map ever made if I did that!)
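For anyone who wants to check the row arithmetic outside SPSS, here is a minimal Python sketch of the same layout logic the macro computes (month, day of week, and rowM). It is my translation, not part of the original macro, and the helper names are hypothetical; weekdays are numbered 1 = Sunday to match XDATE.WKDAY.

```python
from datetime import date, timedelta

def spss_weekday(d):
    """Day of week numbered like SPSS XDATE.WKDAY: 1 = Sunday ... 7 = Saturday."""
    return d.isoweekday() % 7 + 1  # isoweekday: Monday = 1 ... Sunday = 7

def calendar_cell(d):
    """(month, weekday, rowM) for a date, mirroring the macro's arithmetic."""
    first_weekday = spss_weekday(d.replace(day=1))  # the macro's #rowC
    # Floor division matches TRUNC here because the numerator is never negative
    row = (d.day + first_weekday - 2) // 7 + 1
    return d.month, spss_weekday(d), row


# Walk every day of 2013 and record the largest row used
d, max_row = date(2013, 1, 1), 0
while d.year == 2013:
    max_row = max(max_row, calendar_cell(d)[2])
    d += timedelta(days=1)
print(max_row)
```

Since the numerator is at most 31 + 7 - 2 = 36, rowM can never exceed 6, which is why the SCALE statement for dim(2) only needs six categories.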

Making value by alpha maps with ArcMap

I recently finished reading Cynthia Brewer’s Designing better maps: A guide for GIS users. Within the book she had an example of making a bi-variate map legend manually in ArcMap, and then the light-bulb went off in my mind that I could use that same technique to make value by alpha maps in ArcMap.

For a brief intro to value by alpha maps, Andy Woodruff (one of the creators) has a comprehensive blog post on what they are and their motivation. Briefly though, we want to visualize some variable in a choropleth map, but that variable is measured with varying levels of reliability. Value by alpha maps de-emphasize areas of low reliability in the choropleth values by increasing the transparency of that polygon. I give a few other examples of interest related to mapping reliability in this answer on the GIS site as well, How is margin of error reported on a map?. Essentially the techniques mentioned there either only display certain high reliability locations, make two maps, or use techniques to overlay multiple attributes (like hashings). But IMO the value by alpha maps look much nicer than the maps with multiple elements, and so I was interested in how to implement them in ArcMap.

What value by alpha maps effectively do is reduce the saturation and contrast of polygons with high transparency, making them fade into the background and be less noticeable. I presented an applied example of value by alpha maps in my question asking for examples of beautiful maps on the GIS site. You can click through to see further citations for the map and reasons why I think the map is beautiful. Below I include an image as well (taken from the same Andy Woodruff blog post mentioned earlier).

Here I will show how to make the same maps in ArcMap, and present some discussion about their implementation, in particular suitable choices for the original choropleth colors. Much was already discussed by the value by alpha originators, but I suppose I didn’t really appreciate it until I got my hands a little dirty and tried to make them myself. Note this question on the GIS site, How to implement value-by-alpha map in GIS?, gives other resources for implementing value-by-alpha maps. But as far as I am aware this contribution about how to do them in ArcMap is novel.

Below I present an example displaying the percentage of female heads of households with children (abbreviated PFHH from here on) for 2010 census blocks within Washington, D.C. Here we can consider the reliability of the PFHH to depend on the number of households within the block itself (i.e. we would expect blocks with a smaller number of households to have a higher amount of variability in the PFHH). The map below depicts blocks that have at least one household, and so the subsequent PFHH maps will only display those colored polygons (about a third, 2132 out of 6507, have no households).
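To put a rough number on that intuition, one can treat a block's households as independent draws from some underlying rate, so the binomial standard error of a proportion shrinks with the square root of the number of households. This is only a heuristic sketch under that assumption (census block counts are enumerations, not samples), and the maps below simply classify the household counts rather than computing anything like this.

```python
import math

def proportion_se(p, n):
    """Binomial standard error of a proportion p based on n households."""
    return math.sqrt(p * (1 - p) / n)


for n in (4, 25, 100, 400):
    print(n, round(proportion_se(0.2, n), 3))
```

With p = 0.2, going from 4 households to 400 shrinks the standard error from 0.2 to 0.02, a tenfold difference for a hundredfold increase in households.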

I chose the example because female headed households are a typical covariate of interest to criminologists for ecological studies. I also chose blocks as they are the smallest unit available from the census, and hence I expected them to show the widest variability in estimates. Below I provide an example of how one might typically display PFHH, while simultaneously incorporating information on the baseline number of households the percentages are calculated from.

The first example separately displays the denominator number of households on the left and the percent of female headed households with children on the right, both in a sequential choropleth scheme (darker colors are a higher PFHH and number of households).

One can also superimpose the same information on one map. Sun & Wong (2010) suggest using cross hatching above the choropleth colors to depict reliability, but here I will demonstrate using choropleth colors for the baseline number of households and proportional point symbols for the PFHH. I supplement the map on the right with a scatterplot that has the number of households on the X axis and the PFHH on the Y axis.

These both do an alright job (if you made me pick one, I think I would pick the side-by-side maps), but let’s see if we can do better with value-by-alpha maps! The following tutorial is broken up into two sections. The first section talks about actually generating the map, and the second section talks about how to make the legend. Neither is difficult, but making the legend is more of a pain in the butt.

How to make the value by alpha map

First one can start out by making the base layer with the desired choropleth classifications and color scheme. Note here I changed what I am visualizing from a sequential color scheme of PFHH to location quotients with only four categories. I will discuss why I did this later on in the post.

Then one can make several copies of that layer (right click -> copy -> paste within hierarchy), based on however many different reliability classifications you want to display. Here I will do 4 different reliability classifications. Note that after you make them, it is easier to manage the table of contents (TOC) if you group them.

Then one uses selection criteria to filter out only those polygons that fall within the specified reliability range for each layer, and then sets the transparency for that layer to the desired value.

And voila, you have your value by alpha map. Note if after you make the layers you decide you want a different classification and/or color scheme, you can make the changes to one layer and then apply the changes to all of the other layers.
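The layer-copy workflow above boils down to one selection query and one transparency level per reliability class. Here is a minimal Python sketch that builds those pairs so they can be typed into each copy's Definition Query and transparency settings. The field name HOUSEHOLDS, the breaks, and the transparency percentages are all hypothetical, and field delimiters may need adjusting depending on the data source.

```python
def alpha_layers(breaks, transparencies, field="HOUSEHOLDS"):
    """Build one (query, transparency %) pair per layer copy.

    breaks: increasing lower bounds of each reliability class.
    transparencies: transparency % per class, most transparent first.
    field: hypothetical attribute holding the household count.
    """
    if len(breaks) != len(transparencies):
        raise ValueError("need one transparency per class")
    layers = []
    for i, (lo, t) in enumerate(zip(breaks, transparencies)):
        if i + 1 < len(breaks):
            query = f"{field} >= {lo} AND {field} < {breaks[i + 1]}"
        else:
            query = f"{field} >= {lo}"  # top class is open-ended
        layers.append((query, t))
    return layers


for query, t in alpha_layers((1, 5, 10, 20), (90, 60, 40, 0)):
    print(f"{t:>3}%  {query}")
```

Writing the classes as half-open ranges (>= lower, < upper) guarantees each polygon lands in exactly one layer, so no block is drawn twice.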

How to make the legend

Now making the legend is the harder part. If one goes to the layout view, one will see that since in this example one has essentially four layers superimposed on the same map, one has four separate legend entries. Below is what it looks like with my defaults (plus a vertical rule I have in my map).

What we want in the end is a bivariate scheme, with the PFHH dimension running up and down, and the transparency dimension running from one side to the other (the same as in the example mortality rate map at the beginning of the post). To do this, one has to convert the legends to graphics.

Then ungroup the elements so each can be individually manipulated. Note, sometimes I have to do this operation multiple times.

Then re-arrange the panels and labels into the desired format.

This is more tedious than making the separate layers, but not crazy unreasonable if you only have to do it for one map (or a small number of maps). If you need to do it for a larger number of maps a better workflow will be needed, like creating a separate “fake inset” map that replicates the legend, making the legend in a separate tool, or just making the map entirely in a program where alpha blending is more readily incorporated. For instance, in statistical packages transparency is typically a readily available encoding that can be added to a graphic (they will also allow continuous color ramps and continuous levels of transparency).

And voila, here is the final map. To follow is some discussion about choosing color schemes and whether you should use a black background or not.

Some discussion about color schemes

The Roth et al. (2010) paper in The Cartographic Journal and Andy Woodruff’s blog post I cited at the beginning of this post initially talked about color schemes and utilizing a black background, but I didn’t really appreciate the complexity of this choice until I went and made a value-by-alpha map of my own. In the end I decided to use location quotients to display the data, as the bivariate color scheme provides further contrast. I feel weird using a bivariate color scheme for a continuous scale (hence the conversion to location quotients), but I feel like I should get over that. Everything has its time and place, and set rules like that aren’t good for anyone but bureaucrats or the mindless.
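For reference, a location quotient here compares each block's PFHH to the PFHH for D.C. as a whole, so a value of 1 means the block matches the city rate. A minimal sketch, assuming that standard definition (the post does not spell out the exact computation); the function name and input counts are hypothetical:

```python
def location_quotients(female_heads, households):
    """LQ_i = (f_i / h_i) / (sum(f) / sum(h)); None where h_i == 0."""
    overall = sum(female_heads) / sum(households)
    return [None if h == 0 else (f / h) / overall
            for f, h in zip(female_heads, households)]


# Hypothetical counts for four blocks
print(location_quotients([1, 2, 0, 3], [10, 5, 0, 15]))
```

Blocks with no households get None, matching their exclusion from the PFHH maps.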

I certainly picked a complex dataset to start with, and the benefits of the value by alpha map over the two side by side maps (if any) are slight. I suspect mine don’t look quite as nice as the ones created by Roth, Woodruff and company partly because of the greater amount of complexity. The map with the SaTScan reliabilities I noted as one of my favorite maps is quite striking, but that is partly due to the reliability having a very spatially contiguous pattern (although the underlying cancer mortality rate map is quite spatially heterogeneous). Here the spatial regularity is much weaker, in either the pattern being mapped or the reliability thresholds I had chosen. It does produce a quite pretty map though, FWIW.

For reference, here is the same map utilizing a black background. The only thing different in this map is that the most transparent layer is now set to 80% transparency instead of 90% (it was practically invisible at 90% with black as the modifying background color). Also it was necessary to do the fake inset map for a legend I talked about earlier with black as the background color. This is because the legend generated by ArcGIS always has white as the modifying color. If you refer back to the map with white as the modifying color, you can tell the black background produces greater contrast among the purples (with white as the modifying color, the location quotient classes 2.1 – 4 fully opaque and 4.1 – 12.6 at 40% transparency appear very similar).

The Roth Cartographic Journal article gives other bivariate and nominal color scheme suggestions, and you should take their advice. Hopefully in the future it will be easier to incorporate bivariate color schemes in ArcMap, as that would make the process much simpler (and hence more useful for exploratory data analysis).

I would love it if people pointed me to other examples in which value by alpha maps are useful. I think in theory it is a good idea, but the complexity introduced in the map is a greater burden than I initially estimated before I made a few. I initially thought this would be useful for presenting the results of geographically weighted regression, or perhaps cancer atlas maps in general (where sometimes people just filter out results below some population threshold). But maybe not, given the greater complexity introduced.

When should we use a black background for a map?

Some of my favorite maps utilize black (or dark) backgrounds. For some examples:

Steven Romalewski recently offered a slight critique of them in his blog post, Mapping NYC stop and frisks: some cartographic observations:

I know that recently the terrific team at MapBox put together some maps using fluorescent colors on a black background that were highly praised on Twitter and in the blogs. To me, they look neat, but they’re less useful as maps. The WNYC fluorescent colors were jarring, and the hot pink plus dark blue on the black background made the map hard to read if you’re trying to find out where things are. It’s a powerful visual statement, but I don’t think it adds any explanatory value.

I don’t disagree with this, and about all I can articulate in their favor so far is essentially “well lit places create a stunning contrast with the dark background,” while white background maps just create a contrast and are not quite as stunning!

I think the proof of a black background’s usefulness can be seen in the example value-by-alpha maps and the flow maps of James Cheshire, where a greater amount of contrast is necessary. IMO in the value by alpha maps the greater contrast is needed for the greater complexity of the bivariate color scheme, and in Cheshire’s flow maps it is needed because lines frequently don’t have enough areal girth to be effectively distinguished from the background.

I couldn’t find any more general literature on the topic though. It doesn’t seem to be covered in any of the general cartography books that I have read. Since it is really only applicable to on-screen maps (you certainly wouldn’t want to print out a map with a black background), perhaps it just hasn’t been addressed. I may be looking in the wrong place though; some text editors have a high contrast setting where text is white on a dark background (likely for the same reasons they look nice in maps), so it can’t be so foreign a concept that there is no scholarly literature on the topic.

So in short, I guess my advice is to utilize a black background when you want to highly focus attention on the light areas, essentially at the cost of greatly diminishing the contrast with other faded elements in the map. This is perhaps a good thing for maps intended as complex statistical summaries, and the Mapnificent travel times map is probably another good example where high focus in one area is sufficient and other background elements are not needed. I’m not sure though that black backgrounds are really needed (or useful) for choropleth maps, and any more complicated thematic maps certainly would not fit this bill.

To a certain extent I wonder what lessons from black backgrounds can be applied to the backgrounds of statistical graphics more generally. Leave me some comments if you have any thoughts or other examples of black background maps!

Beware of Mach Bands in Continuous Color Ramps

A recent post of mine on the Cross Validated statistics site addressed how to make kernel density maps more visually appealing. The answer there was basically to just adjust the bandwidth until you get a reasonably smoothed surface (where reasonable means not over-smoothed to one big hill or undersmoothed to a bunch of unconnected hills).
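To make the bandwidth trade-off concrete, here is a bare-bones isotropic Gaussian kernel density estimate in Python. It is a sketch of the general technique, not spatstat's implementation (which, among other things, applies an edge correction); the function name and the two-point example are made up.

```python
import math

def gaussian_kde_2d(points, bandwidth):
    """Return f(x, y): an isotropic Gaussian kernel density estimate (no edge correction)."""
    n = len(points)
    norm = 1.0 / (n * 2 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            d2 = ((x - px) ** 2 + (y - py) ** 2) / bandwidth ** 2
            total += math.exp(-0.5 * d2)
        return norm * total

    return density


pts = [(0.0, 0.0), (4.0, 0.0)]  # two hypothetical events 4 units apart
for h in (0.5, 4.0):
    f = gaussian_kde_2d(pts, h)
    print(h, f(0.0, 0.0), f(2.0, 0.0))  # density at an event vs. the midpoint
```

With the points 4 units apart, a bandwidth of 0.5 leaves a deep trough at the midpoint (two unconnected hills), while a bandwidth of 4 makes the midpoint denser than the events themselves (one big hill).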

Another problem that frequently comes along with utilizing the default types of raster gradients is that of Mach bands. Here is a replicated image I used in the Cross Validated post (made utilizing the spatstat R package).

Even though the color ramp is continuous, you see some artifacts around the gradient where the hue changes from what our eyes see as green to blue. To be more precise, approximately where the green hue touches the blue hue, the blue color appears to be lighter than the rest of the blue background. This is not the case though; it is just an optical illusion (you can even see the Mach bands in the legend if you look closely). Mark Monmonier in How to Lie with Maps gives an example of this, and also uses it as a reason to not use continuous color ramps (another reason he gives is that it is very difficult to map a color to an exact numerical location on the ramp). Note this isn’t just something that happens with this particular color ramp; it happens even when the hue is the same (the Wikipedia page gives an example with varying shades of grey).

So what, you say? Well, part of the reason it is a problem is that the artifact reinforces unnatural boundaries or groupings in the data, the exact opposite of what one wants with a continuous color ramp! Also the groupings are largely at the whim of the computer, and I would think the analyst wants to define the groupings themselves when disseminating the maps (although this brings up another problem with how to define the color breaks). A general principle with how people interpret such maps is that they tend to form homogeneous groupings anyway, so for both exploratory purposes and disseminating maps we should keep this in mind.

This isn’t a problem limited to isopleth maps either; the Color Brewer online app is explicitly made to demonstrate this phenomenon for choropleth maps visualizing irregular polygons. What happens is that one county that is spatially outlying compared to its neighbors appears more extreme on the color gradient than when it is surrounded by colors with the same hue and saturation. Below is a screen shot of what I am talking about, with some of the examples circled in red. It is easy to see that they are spatially outlying, but harder to map them to the actual color on the ramp (and it gets harder when you have more bins).

Even with these problems I think the default plots in the spatstat program are perfectly fine for exploratory analysis. I think to disseminate the plots though I would prefer discrete bins in many (perhaps most) situations. I’ll defer discussion on how to choose the bins to another time!
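Switching to discrete bins just means mapping each value to a class with breaks the analyst picks, rather than letting the ramp (and our eyes) invent the boundaries. A minimal sketch using Python's bisect module; the function names, the breaks, and the four grey hex colors are hypothetical placeholders, not a recommended scheme:

```python
from bisect import bisect_right

def classify(values, breaks):
    """Class index 0..len(breaks) for each value, given increasing breaks."""
    return [bisect_right(breaks, v) for v in values]

def to_colors(values, breaks, palette):
    """Look up one discrete color per class; palette needs len(breaks) + 1 entries."""
    if len(palette) != len(breaks) + 1:
        raise ValueError("palette must have one color per class")
    return [palette[c] for c in classify(values, breaks)]


breaks = (0.1, 0.3, 0.6)  # hypothetical, analyst-chosen
palette = ["#f0f0f0", "#bdbdbd", "#737373", "#252525"]  # placeholder greys
print(to_colors([0.01, 0.2, 0.45, 0.9], breaks, palette))
```

A value falling exactly on a break goes into the higher class.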

Example (good and bad) uses of 3d choropleth maps

A frequent critique of choropleth maps is that, in the process of choosing color bins, one can hide substantial variation within each of the bins. An example of this is in this critique of a map in the Bad maps thread on the GIS stackexchange site. In particular, Laurent argues that the classification scheme (in that example map) is misleading because China’s population (1.3 billion) and Indonesia’s population (0.2 billion) are within the same color bin although they have noteworthy differences in their population.

I think it is a reasonable note, and such a difference would be noteworthy in a number of contexts. One possible solution to this problem is to utilize 3d choropleth maps, where the height of the bar maps to a quantitative value. An example use of this can be found at Alasdair Rae’s blog, Daytime Population in the United States.

The use of 3d allows one to see the dramatic difference in daytime population estimates between the cities (mainly on the east coast), whereas a 2d map relying on a legend can’t really demonstrate the dramatic magnitude of differences between legend items like that.
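The China versus Indonesia example can be made concrete. Below, the bin breaks are hypothetical, and prism height is assumed to scale linearly with the value (real prism or spike maps may scale differently); the function names are mine. Both populations land in the same top bin, yet their heights differ by a factor of 6.5 (1.3 / 0.2).

```python
def bin_index(value, breaks):
    """Which choropleth bin a value lands in (breaks are increasing lower bounds)."""
    return sum(value >= b for b in breaks)

def prism_height(value, max_value, max_height=100.0):
    """Prism height linear in the mapped value (hypothetical max_height)."""
    return max_height * value / max_value


breaks = (1e7, 5e7, 1e8)  # hypothetical population bins
pops = {"China": 1.3e9, "Indonesia": 0.2e9}
top = max(pops.values())
for name, pop in pops.items():
    print(name, bin_index(pop, breaks), round(prism_height(pop, top), 1))
```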

I’m not saying a 3d map like this is always the best way to go. Frequent critiques are that the bars will hide/obstruct data. Also it is very difficult to really evaluate where the bars lie on the height dimension. For an example of what I am talking about, see the screen shot used for this demonstration, A Historical Snapshot of US Birth Trends, from ge.com (taken from the infosthetics blog).

If you took the colors away, would you be able to tell that Virginia is below average?

Still, I think, used sparingly and to demonstrate dramatic differences, they can be used effectively. I give a few more examples and/or reading for those interested below.

References

Ratti, Carlo, Stanislav Sobolevsky, Francesco Calabrese, Clio Andris, Jonathan Reades, Mauro Martino, Rob Claxton & Steven H. Strogatz. 2010. Redrawing the map of Great Britain from a Network of Human Interactions. PLoS ONE 5(12). Article is open access from link.

This paper is an example of using 3d arcs for visualization.

Stewart, James & Patrick J. Kennelly. 2010. Illuminated choropleth maps. Annals of the Association of American Geographers 100(3): 513-534.

Here is a public PDF by one of the same authors demonstrating the concept. This paper gives an example of using 3d choropleth maps, and in particular a useful way to utilize a 3d shadow effect that slightly enhances the distinction between two adjacent polygons. This technique doesn’t really map height to a continuous variable though; it just uses shading to distinguish between adjacent polygons.

Other links of interest

GIS Stackexchange question – When is a 3D Visualisation in GIS Useful?

A cool example of utilizing 3d in kml maps on the GIS site by dobrou, Best practices for visualizing speed.

Alasdair Rae’s blog has several examples of 3d maps besides the one I linked to here, and I believe he was somehow involved in making the maps associated with this Centre for Cities short clip (that includes 3d maps).

If you have any other examples where you thought the use of 3d maps (or other visualizations) was useful/compelling let me know in the comments.

Edit: Looking at some of my search traffic, I see that this blog post is already pretty high up for “3d choropleth” on a Google image search. I suspect that may mean I am using some not widely adopted terminology, although I don’t know what else to call these types of maps.

The thematic mapping blog calls them prism maps (and is another place for good examples). Also see the comment by Jon Peltier for that post, and the subsequent linked blog post by the guys at Axis maps (whose work I really respect), Virtual Globes are a seriously bad idea for thematic mapping.

Edit2: I came across another example, very similar to Alasdair Rae’s map, produced by the New York Times: Where America Lives. Below is a screen shot (at the link they have an interactive map). I was referred to it by the folks at OCSI, who call this type of map a “Spike Map”.