Overview of DataViz books

Keith McCormick the other day on LinkedIn the other day made a post/poll on his favorite data viz books. (I know Keith because I contributed a chapter on geospatial data analysis in SPSS in Keith and Jesus Salcedo’s book, SPSS Statistics for Data Analysis and Visualization, and Jon Peck contributed a chapter as well.)

One thing about this topical area is that there isn’t a standard Data Viz 101 curriculum. So if you pick up Statistics 101 books, they will cover pretty much all the same material (normal distribution, central limit theorem, t-tests, regression). It isn’t 100% overlap (some may spend more time on elementary probability, and others may cover ANOVA), but for someone learning the material there isn’t much point in reading multiple introductory stats books.

This is not so with the Data Viz books in Keith’s picture – they are very different in content. As I have read quite a few different books on the topic over the years I figured I would give my breakdown of the various books.

Albert Cairo’s The Functional Art

While my list is not in rank order, I am putting Cairo’s book first for a reason. Although there is not a Data Viz 101 curriculum, this book is the closest thing to it. Cairo goes through in short order various cognitive aspects on how we view the world that are fundamental to building good data visualizations. This includes things like it is easier to compare lengths along a common axis, and that we can perceive rank order to color saturation, but not to a color’s hue.

It is also enjoyable to read because of all the great journalistic examples. I did not care so much for the interviews at the back, and I don’t like the cover. But if I did a data viz course for undergrads in social sciences (Cairo developed this for journalism students), I would likely assign this book. But despite being very accessible, he covers a broad spectrum of both simple graphs and complicated scientific diagrams.

For this review many of these authors have other books. So I haven’t read Cairo’s The Truthful Art, so I cannot comment on it.

Edward Tufte’s The Visual Display of Quantitative Information

Tufte’s book was the first data viz book I bought in grad school. I initially invested in it as he had a chapter on a critique of powerpoint presentations, which is very straightforward and provides practical advice on what not to do. Most of the critiques of this book are that it is mostly just a collection of Tufte’s opinions about creating minimalist, dense, scientific graphs. So while Cairo dives into the science of perception, Tufte is just riffing his opinions. His opinions are based on his experience though, and they are good!

I believe I have read all of Tufte’s other books as well, but this is the only one that made much of an impression on me (some of his others go beyond graphs, and talk about UI design). I gobbled it up in only two days when I first started reading it, and so if I were stuck on an island with one book scenario I would choose this one over the others I list here (although again think Cairo’s book is the best to start with for most folks). So for scientists I think it is a good investment and an enjoyable read overall.

Nathan Yau’s Visualize This

Of all the books I review, Yau’s is the only how-to actually make graphs in software. Unfortunately, much of Yau’s programmatic advice was outdated already when it was published (e.g. flash was already going by the wayside). So while he has many great examples of creating complicated and beautiful data visualizations, the process he outlines to make them are overly complicated IMO (such as using python to edit parts of a pre-made SVG map). It is a good book for examples no doubt, and maybe you can pick up a few tricks in terms of post editing charts in a vector graphics program, such as Illustrator or Inkscape (most examples are making graphs in base R and then exporting to edit finishing touches).

In terms of making a how-to book it is really hard. Yau I am sure has updates on his Flowing Data website to make charts (and maybe his newer book is better). But I don’t think I would recommend investing in this book for anything beyond looking at pretty examples of data viz.

Stephen Kosslyn’s Graph Design for the Eye and Mind

The prior books all contained complicated, dense, scientific graphs. Kosslyn’s book is specifically oriented to making corporate slide decks/powerpoints, in which the audience is not academic. But his advice is mostly backed on his understanding of the psychology (he relegates extensive endnotes to point to scientific lit, to avoid cluttering up the basic book). He has as few gems of advice I admit, such as it isn’t the number of lines in a graph that make it complicated, but really the number of unique profiles. But then he has some pieces I find bizarre, such as saying pie charts are OK because they are so popular (so have survived a Darwinian survival process in terms of being presented to business people).

I would stick with Tufte’s powerpoint advice (and later will mention a few other books related to giving presentations), as opposed to recommending this book.

Alan MacEachren How maps work: Representation, visualization, and design

MacEachren’s book is encyclopedic in terms of scientific literature on design aspects of both cartography, as well as the psychological literature. So it is like reading an encyclopedia (not 100% sure if I ever finished it front to back to be honest). I would start here if you are interested in designing cognitive experiments to test certain graphs/maps. I think MacEachren pooling from cartography and psychology ends up being a better place to start than say Colin Ware’s Information Visualization (but it is close). They are both very academically oriented though.

Leland Wilkinson’s The Grammar of Graphics

I used SPSS for along time when I read this book, so I was already quite familiar with the grammar of graphics in terms of creating graphs in SPSS. That pre-knowledge helped me digest Wilkinson’s material I believe. Nick Cox has a review of this book, and for this one he notes that the audience for this book is hard to pin down. I agree, in that you need to be pretty far along already in terms of making graphs to be able to really understand the material, and as such it is not clear what the benefit is. Even for power users of SPSS, much of the things Wilkinson talks about are not implemented in SPSS’s GGRAPH language, so they are mostly just on paper.

(Note Nick has a ton of great reviews on Amazon as well for various data viz books. He is a good place to start to decide if you want to purchase a book. For example the worst copy-edited book I have ever seen is Andy Kirk’s via Packt publishing, and Nick notes how poorly it is copy-edited in his review.)

Here is an analogy I think is apt for Wilkinson’s book – if we are talking about cars, you may have a book on the engineering of the car, and another on how to actually drive the car. Knowing how pistons work in a combustible engine does not help you drive a car, but helps you build one. Wilkinson’s book is more about the engineering of a graph from an algebraic perspective. At the fringes it helps in thinking about the components of graphs, but doesn’t really give any advice about what graph to make in-and-of itself, nor what is a good graph or a bad graph.

Note that the R library ggplot2, is actually quite a bit different than Leland’s vision. It is simpler, in that Wickham essentially drops the graph algebra part, so you specify the axes directly, whereas in Wilkinson’s you just say X*Y*Z, and depending on other aspects of the grammer this may produce a 3d scatterplot, a facet gridded scatterplot, a clustered bar chart, etc. I think Wickham was right to make that design choice, but in doing so it really isn’t an implementation of what Wilkinson was talking about in this book.

Jacques Bertin’s Semiology of Graphics: Diagrams, Networks, Maps

Bertin’s book is an attempt to make a dictionary of terms for different aspects of graphs. So it is a bit in the weeds. One unique aspect of Bertin is that he discusses titles and labels for graphs, although I wouldn’t go as far as saying that his discussion leads to straightforward advice. I find Wilkinson’s grammer of graphics a more useful way to overall think about the components of a graph, although Bertin is more encyclopedic in coverage of different types of graphs and maps in the wild.

Short notes on various other books

Most of these books (with the exception of Nathan Yau’s) are not how-to actually write code to make graphs. For those that use R, there are two good options though. Hadley Wickham’s ggplot2: Elegant Graphics for Data Analysis (Use R!) was really good at the time (I am not sure if the newer version is more up to date though, like any software it changes over time so the older one I know is out of date for many different code examples). And though I’ve only skimmed it, Kieran Healy’s Data Visualization: A practical introduction is free and online and looks good (and also for those interested in criminal justice examples Jacob Kaplan has examples in R as well, Crime by the Numbers). So those later two I know are good in terms of being up to date.

For python I just suggest using google (Jake VanderPlas has a book that looks good, and his website is really good). For excel I really like Jorge Camões work (his book is Data at Work, which I don’t think I’ve read, but have followed his website for along time).

In terms of scientific presentations (which covers both graphs and text), I’ve highly suggested in the past Trees, maps, and theorems. This is similar in spirit to Tufte’s minimalist style, but gives practical advice on slides, writing, and presentations. Jon Schwabish’s book, Better Presentations: A Guide for Scholars, Researchers, and Wonks, is very good as well in terms of direct advice. I think for folks in academia I would say go for Doumont’s book, and for those in corporate environment go for Schwabish’s.

Stephen Few’s books deserve a mention here as well, such as Show me the numbers. Stephen is the only one to do a deep dive into the concept of dashboards. Stephen’s advice is very straightforward and more oriented towards a corporate type environment, not so much a scientific one (although it isn’t bad advice for scientists, ditto for Schwabish, just stating more so for an understanding of the intended audience).

I could go on forever, Tukey’s EDA, Calvin Schmid’s book on how to draw graphs with actual splines! How to lie with statistics and how to lie with maps. So many to choose from. But I think if you are starting out in a data oriented role in which you need to make graphs, I would suggest starting with Cairo’s book, then get Tufte to really get some artistic motivation and a good review of bad powerpoint practices. The rest are more advanced material for study though.

Leave a comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: