Remaking a clustered bar chart

Thomas Lumley on his blog had a recent example of remaking a clustered bar chart that I thought was a good idea. Here is a screenshot of the clustered bar chart (the original is here):

And here is Lumley’s remake:

In the original bar chart it is hard to know what is the current value (2017) and what are the past values. Also the bar chart goes to zero on the Y axis, which makes any changes seem quite small, since the values only range from 70% to 84%. Lumley’s remake clearly shows the change from 2016 to 2017, as well as the historical range from 2011 through 2016.

I like Lumley’s remake quite alot, so I made some code in SPSS syntax to show how to make a similar chart. The grammar of graphics I always thought is alittle confusing when using clustering, so this will be a good example demonstration. Instead of worrying about the legend I just added in text annotations to show what the particular elements were.

One additional remake is instead of offsetting the points and using a slope chart (this is an ok use, but see my general critique of slopegraphs here) is to use a simpler dotplot showing before and after.

One reason I do not like the slopes is that slope itself is dictated by the distance from 16 to 17 in the chart (which is arbitrary). If you squeeze them closer together the slope gets higher. The slope itself does not encode the data you want, you want to calculate the difference from beginning to end. But it is not a big difference here (my main complaints for slopegraphs are when you superimpose many different slopes that cross one another, in those cases I think a scatterplot is a better choice).

Jonathan Schwabish on his blog often has similar charts (see this one example).

Pretty much all clustered bar charts can be remade into either a dotplot or a line graph. I won’t go as far as saying you should always do this, but I think dot plots or line graphs would be a better choice than a clustered bar graph for most examples I have seen.

Here like Lumley said instead of showing the ranges likely a better chart would just be a line chart over time of the individual years, that would give a better since of both trends as well as typical year-to-year changes. But these alternatives to a clustered bar chart I do not think turned out too shabby.


SPSS Code to replicate the charts. I added in the labels for the elements manually.

**********************************************************************************************.
*data from https://www.stuff.co.nz/national/education/100638126/how-hard-was-that-ncea-level-1-maths-exam.
*Motivation from Thomas Lumley, see https://www.statschat.org.nz/2018/01/18/better-or-worse/.

DATA LIST FREE / Type (A10) Low Y2017 Y2016 High (4F3.1).
BEGIN DATA
Tables 78.1 71.2 80.5 84
Geo 71.5 73.5 72 75.6
Chance 74.7 78.4 80.2 80.2
Algebra 72.2 78.3 82 82
END DATA.
DATASET NAME Scores.
VALUE LABELS Type
  'Tables' 'Tables, equations, and graphs'
  'Geo' 'Geometric Reasoning'
  'Chance' 'Chance and data'
  'Algebra' 'Algebraic procedures'
.
FORMATS Low Y2017 Y2016 High (F3.0).
EXECUTE.

*In this format I can make a dot plot.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=Y2017 Y2016 Low High Type 
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: Y2017=col(source(s), name("Y2017"))
  DATA: Y2016=col(source(s), name("Y2016"))
  DATA: Low=col(source(s), name("Low"))
  DATA: High=col(source(s), name("High"))
  DATA: Type=col(source(s), name("Type"), unit.category())
  GUIDE: axis(dim(1), delta(1), start(70))
  GUIDE: axis(dim(1), label("Percent Students with a grade of 'Achieved' or better"), opposite(), delta(100), start(60))
  GUIDE: axis(dim(2))
  SCALE: cat(dim(2), include("Algebra", "Chance", "Geo", "Tables"))
  ELEMENT: edge(position((Low+High)*Type), size(size."30"), color.interior(color.grey), 
           transparency.interior(transparency."0.5"))
  ELEMENT: edge(position((Y2016+Y2017)*Type), shape(shape.arrow), color(color.black), size(size."2"))
  ELEMENT: point(position(Y2016*Type), color.interior(color.black), shape(shape.square), size(size."10"))
END GPL.

*Now trying a clustered bar graph.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=Y2017 Y2016 Low High Type 
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: Y2017=col(source(s), name("Y2017"))
  DATA: Y2016=col(source(s), name("Y2016"))
  DATA: Low=col(source(s), name("Low"))
  DATA: High=col(source(s), name("High"))
  DATA: Type=col(source(s), name("Type"), unit.category())
  TRANS: Y17 = eval("2017")
  TRANS: Y16 = eval("2016")
  COORD: rect(dim(1,2), cluster(3,0))
  GUIDE: axis(dim(3))
  GUIDE: axis(dim(2), label("% Achieved"), delta(1), start(70))
  ELEMENT: edge(position(Y16*(Low+High)*Type), size(size."30"), color.interior(color.grey), 
           transparency.interior(transparency."0.5"))
  ELEMENT: edge(position((Y16*Y2016*Type)+(Y17*Y2017*Type)), shape(shape.arrow), color(color.black), size(size."2"))
  ELEMENT: point(position(Y16*Y2016*Type), color.interior(color.black), shape(shape.square), size(size."10"))
END GPL.

*This can get tedious if you need to make a line for many different years.
*Reshape to make a clustered chart in a less tedious way (but cannot use arrows this way).
VARSTOCASES /MAKE Perc FROM Y2016 Y2017 /INDEX Year.
COMPUTE Year = Year + 2015.
DO IF Year = 2017.
  COMPUTE Low = $SYSMIS.
  COMPUTE High = $SYSMIS.
END IF.
EXECUTE.

GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=Type Perc Year Low High MISSING=VARIABLEWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: Type=col(source(s), name("Type"), unit.category())
  DATA: Perc=col(source(s), name("Perc"))
  DATA: Low=col(source(s), name("Low"))
  DATA: High=col(source(s), name("High"))
  DATA: Year=col(source(s), name("Year"), unit.category())
  COORD: rect(dim(1,2), cluster(3,0))
  GUIDE: axis(dim(3))
  GUIDE: axis(dim(2), label("% Achieved"), delta(1), start(70))
  SCALE: cat(dim(3), include("Algebra", "Chance", "Geo", "Tables"))
  ELEMENT: edge(position(Year*(Low+High)*Type), color.interior(color.grey), size(size."20"), transparency.interior(transparency."0.5"))
  ELEMENT: path(position(Year*Perc*Type), split(Type))  
  ELEMENT: point(position(Year*Perc*Type), size(size."8"), color.interior(color.black), color.exterior(color.white))
END GPL.
**********************************************************************************************.

 

Rejected!

My Critique of Slope Graphs paper was recently rejected as a short article from The American Statistician. I’ve uploaded the new paper to SSRN with the suggested critiques and my responses to them (posted here).

I ended up bugging Nick Cox for some pre peer-review feedback and he actually agreed! (A positive externality of participating at the Cross Validated Q/A site.) The main outcome of Nick’s review was a considerably shorter paper. The reviews from TAS were pretty mild (and totally reasonable), but devoid of anything positive. The main damning aspect of the paper is that the reviewers (including Cox) just did not find the paper very interesting or well motivated.

My main motivation was the recent examples of slope graphs in the popular media, most of which are poor statistical graphics (and are much better suited as a scatterplot). The most obvious being Cairo’s book cover, which I thought in and of itself deserved a critique – but maybe I should not have been so surprised about a poor statistical graphic on the cover. This I will not argue is a rather weak motivation, but one I felt was warranted given the figures praising the use of slopegraphs in inappropriate situations.

In the future I may consider adding in more examples of slopegraphs besides the cover of Albert Cairo’s book. In my collection of examples I may pull out a few more examples from the popular media and popular data viz books (besides Cairo’s there are blog post examples from Ben Fry and Andy Kirk – haven’t read their books so I’m unsure if they are within them.) For a preview, pretty much all of the examples I consider bad except for Tufte’s original ones. Part of the reason I did not do this is that I wrote the paper as a short article for TAS — and I figured adding these examples would make it too long.

I really had no plans to submit it anywhere besides TAS, so this may sit as just a pre-print for now. Let me know if you think it may be within the scope of another journal that I may consider.

A critique of slopegraphs

I’ve recently posted a pre-print of an article, A critique of slopegraphs, on SSRN. In the paper I provide a critique of the use of slopegraphs and present alternative graphics to use in their place, using the slopegraph displayed on the cover of Albert Cairo’s The Functional Art as motivation – below is my rendering of that slopegraph.

Initially I wanted to write a blog post about the topic – but I decided to give all of the examples and full discussion I wanted it would be far too long. So I ended up writing a (not so short) paper. Below is the abstract, and I will try to summarize it in a few quick points (but obviously I encourage you to read the full paper!)

Slopegraphs are a popular form of graphic depicting change along two independent axes by means of a connecting line. The critique here lists several reasons why interpreting the slopes may be misleading and suggests alternative plots depending on the goals of the visualization. Guidelines as to appropriate situations to use slopegraphs are discussed.

So the three main points I want to make are:

  • The slope is not the main value of interest in a slopegraph. The slope is itself an arbitrary function of how far away the axes are placed from one another.
  • Slopegraphs are poor for judging correlation and seeing a functional relationship between the two values. Scatterplots or just graphing the change directly are often better choices.
  • Slopegraphs are difficult to judge when the variance between axes changes (which produce either diverging or converging slopes) and when the relationship is negative (which produces many crossings in the slopes).

I’ve catalogued a collection of articles, examples and other critiques of slopegraphs at this location. Much of what I say is redundant with critiques of slopegraphs already posted in other blogs on the internet.

I’m pretty sure my criminal justice colleagues will not be interested in the content of the paper, so I may need to cold email someone to review it for me before I send it off. So if you have comments or a critique of the paper I would love to hear it!