Presentation at IACA 2014 – Making Field Stops Smart

Part of the work I am doing with the Finn Institute in collaboration with the Albany Police Department was accepted as a presentation at the upcoming IACA conference in Seattle next week. The NIJ used to have a separate Crime Mapping conference, but they folded it into the yearly IACA conference. So this is one of the NIJ Crime Mapping presentations.

The title of the presentation is Making Field Stops Smart, and below is the abstract:

Mapping hot spots of crime incidents for use in allocating patrol resources has become commonplace. This research is intended to extend the logic to mapping locations of field interviews. The project has two specific spatial analysis components; 1) are most of the stops being conducted a high crime locations, and 2) are locations with the most stops the locations with the most productive stops (in terms of arrests, contraband recovery, stopping chronic offenders). Making stops smart is being conducted as a research partnership between the Albany, NY police department and the Finn Institute of Public Safety.

The time of the presentation is at 15:30 on Thursday 9/11. Two other presenters, Eric Paull from Akron, Ohio and Christian Peterson from Portland, Oregon have presentations on the panel as well (see the IACA agenda for their talk abstracts).

I am uncomfortable publicly releasing the pre-print white papers given the collaboration (Rob Worden and Sarah McLean are co-authors) and because that APD’s name is directly attached to the work. But if you send me an email I can forward the white paper for this presentation and related work we are doing.

If you see me at IACA feel free to come up and say hi. I do not have any other plans while I am in town besides going to presentations.

 

Rejected!

My Critique of Slope Graphs paper was recently rejected as a short article from The American Statistician. I’ve uploaded the new paper to SSRN with the suggested critiques and my responses to them (posted here).

I ended up bugging Nick Cox for some pre peer-review feedback and he actually agreed! (A positive externality of participating at the Cross Validated Q/A site.) The main outcome of Nick’s review was a considerably shorter paper. The reviews from TAS were pretty mild (and totally reasonable), but devoid of anything positive. The main damning aspect of the paper is that the reviewers (including Cox) just did not find the paper very interesting or well motivated.

My main motivation was the recent examples of slope graphs in the popular media, most of which are poor statistical graphics (and are much better suited as a scatterplot). The most obvious being Cairo’s book cover, which I thought in and of itself deserved a critique – but maybe I should not have been so surprised about a poor statistical graphic on the cover. This I will not argue is a rather weak motivation, but one I felt was warranted given the figures praising the use of slopegraphs in inappropriate situations.

In the future I may consider adding in more examples of slopegraphs besides the cover of Albert Cairo’s book. In my collection of examples I may pull out a few more examples from the popular media and popular data viz books (besides Cairo’s there are blog post examples from Ben Fry and Andy Kirk – haven’t read their books so I’m unsure if they are within them.) For a preview, pretty much all of the examples I consider bad except for Tufte’s original ones. Part of the reason I did not do this is that I wrote the paper as a short article for TAS — and I figured adding these examples would make it too long.

I really had no plans to submit it anywhere besides TAS, so this may sit as just a pre-print for now. Let me know if you think it may be within the scope of another journal that I may consider.

New paper – Testing for Randomness in Day of Week Crime Sprees with Small Samples

I’ve uploaded a new paper, Testing for Randomness in Day of Week Crime Sprees with Small Samples, to SSRN. The abstract is below:

This paper discusses exact tests for evaluating whether a series of offences are randomly distributed across days of the week for small sample sizes. The given context is if a crime analyst has identified a series of events that are committed by the same offender(s), can the analyst determine if those events are randomly distributed with respect to the day of week given only a few offences? I detail how one can develop exact reference distributions since the number of potential permutations are small. I also discuss the power of the chi^2 and Kuiper’s V test for small sample sizes, and give an example of hypothesis testing when crimes have uncertain days of occurrence.

Here is an example graph of the power of the tests under different alternate hypotheses of crimes only having probability of being committed on a certain subset of days in the week. Comments on the paper would be wonderful (I will have to bug someone to give me some pre peer-review feedback) – so feel free.

Poisson regression and crazy predictions

Here is a problem I’ve encountered a few times in my own work (and others) with Poisson regression models and the exponential link function. It came up recently in some discussions on the scatterplot blog by Jeremy Freese (see 1 & 2) critiquing the PNAS paper on the effect of female named hurricanes on death tolls, so I figured I would expand up those thoughts a little here.

So the problem is when you estimate a Poisson regression model is that the exponential link function can become explosive for explanatory variables that have a large range. So to be clear, we have a Poisson regression model of the form (here E[Y] mean the expected value of Y):

log(E[Y]) = B1*(X)
    E[Y]  = e^(B1*X)

If X has a small range this may be fine, but if X has a large range it can become problematic. Consider if Y are hurricane deaths, X is monetary damage of the hurricane, and B1 = 0.01. Lets say the monetary damage ranges from 1 to 1000 (imagine these are in thousands of dollars, so range between $1,000 and $1 million). What happens with the predictions?

E[Deaths] = e^(0.01*   1) =     1.01
E[Deaths] = e^(0.01*   5) =     1.05
E[Deaths] = e^(0.01*  10) =     1.11
E[Deaths] = e^(0.01*  50) =     1.65
E[Deaths] = e^(0.01* 100) =     2.71
E[Deaths] = e^(0.01* 500) =   148
E[Deaths] = e^(0.01*1000) = 22026

These predictions are invariant to linear transformations – that is Z-scoring X in the original units doesn’t change the predictions (the same as expressing X in [dollars/1000] doesn’t make any difference than just by including [X] on the right hand side). The linear predictor of B1 will simply be scaled by the appropriate inverse transformation. Also I’d note that expressed in terms of incident rate ratios the effect would be e^0.01=1.01. This appears on its face a totally innocuous effect, and only in consideration of the variation in X does it appear to be absurd.

You can see that if the range of X were smaller, say between 1 and 100, the predictions might be fine. The predictions between 1 and 100 only vary by 1.7 deaths. The problem with these explosive predictions at larger values is that they are nonsense for most of social scientific research. A simple sanity check to see if this is occurring is to check the predicted value from your Poisson regression equation at the low end of X versus the high end (and just pretend all of the other explanatory variables are set to 0) exactly as I have done here. If the high end is crazy, you will need to consider some alternative model specification (or be very clear that the model can not be extrapolated to the larger values of X).

A useful alternate parametrization is simply to log X, and in this case when exponentiating the right hand side, it will make the predictor a power of the original metric.

So imagine we fit the model:

log(E[Y]) = B2*(log(X))
    E[Y]) = e^(B2*log(X)) 
          = x^B2

Lets say here that B2 = 0.5. What happens to our predictions again?

E[Deaths] =    1^0.5 =  1
E[Deaths] =    5^0.5 =  2.2
E[Deaths] =   10^0.5 =  3.2
E[Deaths] =   50^0.5 =  7.1
E[Deaths] =  100^0.5 = 10
E[Deaths] =  500^0.5 = 22
E[Deaths] = 1000^0.5 = 32

Those predictions look a little bit easier to swallow at the larger ranges. Notice also the differences in predictions in the smaller stages? There is more discrimination for the smaller values than on the original scale, but the larger values are suppressed. Lets consider the predictions side by side for easier comparison.

   X      B1   B2
   -      --   --
   1       1    1
   5       1    2
  10       1    3
  50       2    7
 100       3   10
 500     148   22
1000   22026   32

A frequent problem with logging the explanatory variables is that they contain zeroes. A simple alternative is to treat log(0) as 0 and then have a separate dummy variable equal to 1 when X = 0. This model may not make Occam happy, as it implies a discontinuity at 0, but it is in my opinion a small price to pay. Also if there are a lot of zeroes this doesn’t strike me as totally unrealistic to have a mixture of what happens at 0 and then what happens at the higher values. So the full model written out would be:

log(E[Y]) = B3*D + B4*(log(X))

But the model is essentially discontinuous. When X=0, we treat log(X)=0 and D=1, so the model reduces to;

log(E[Y]) = B3*D :When X = 0

When X>0, D=0 and the model reduces to:

log(E[Y]) = B4*(log(X)) :When X > 0

Now, it certainly would be weird if B3>>0, as this would imply a high spike at 0, and then at the X value of 1 Y goes back down to 1 and then increases with X. If we expect B4 to be positive, then a negative value of B3 (or very close to 0) would make the most sense. It is still a discontinuity in the function, but one that may make theoretical sense. So imagine we fit the equation log(E[Y]) = B3*D + B4*(log(X)), lets say B4 is 0.5 (the same as B2), and that B3 is equal to -0.1. This would then make the set of predictions go:

E[Deaths] =  e^-0.01 =  0.9
E[Deaths] =    1^0.5 =  1.0
E[Deaths] =    5^0.5 =  2.2
E[Deaths] =   10^0.5 =  3.2
E[Deaths] =   50^0.5 =  7.1
E[Deaths] =  100^0.5 = 10
E[Deaths] =  500^0.5 = 22
E[Deaths] = 1000^0.5 = 32

So in this made up example the discontinuity pretty much fits right in with the rest of the function. We may consider other non-linear transformations of X as well (splines or higher powers) but frequently an additional problem is that the bulk of the data lie in the lower end of the range. So for our dollars if it was highly right skewed, there may only be a few values at 100 or higher. These can be highly influential if you use powers of X (e.g. include X^2, X^3 etc. on the right hand side) – so splines are a better choice – but essentially no matter how you fit the function it will be hard to verify the fit at these values or extrapolate to those tails. So the fit in the original function may be fine – but it just implies unrealistic marginal effects in the tails.

So how do we verify one equation over the other? Visualizing count data in scatterplots tend to be harder than visualizing continuous data, especially if there is a stock pile of data at 0. The problem is simply exacerbated if the explanatory variable has a similar right skew, there will be a large mass near the origin of the plot and very sparse everywhere else.

My simple suggesting is to just bin the data at X values, which is very simple if X is integer valued, and then plot the mean and standard error of the Y value within those bins. As Poisson regression and its variants rely on asymptotic properties, if the error bars are too variable to deduce a pattern you should be concerned your sample size isn’t large enough to begin with.

If the bulk of the data only have a small range over X, then it will be hard in practice to differentiate between the two model parametrizations I suggest here (with the typical noisy data we have in the social sciences). So you may prefer logging the X variable simply to prevent the dramatic explosions in the tails of the data right from the start.

I do feel comfortable saying that if the ratio of the smallest to largest value for the independent variable is over 100, you should check the predictions of the exponential link function very closely (if the smallest value is 0 just estimate the ratio as if the smallest value is 1). If this ratio is 100 or larger, unless the linear predictor in the Poisson regression equation is very small (<<.01) the predictions may explode into very implausible ranges for the larger X values.

What is up with 3d graphics for book covers?

The other day in Google books I noticed Graphics for Statistics and Data Analysis with R by Kevin Keen in the related book section. What caught my eye was not the title (there have to be 100+ related R books at this point) but the really awful 3d pie chart.

Looking at the preview on google books this appears to be an unfortunate substitution. The actual cover has a much more reasonable set of surface plots and other online book stores (e.g. Amazon) appear to have the correct cover.

I suspect someone at CRC Press used some stock imagery for the cover, and unfortunately the weird 3d pie graph has been propagated to the google book preview without correction.

This reminded me of a few other book covers in cartography and data visualization though that I find less than appealing. Now, I’m not saying here to judge a book by its cover, and I have not read all of the books I will point to here. But I find the use of 3d graphics in book covers in the data visualization field to be strange and bordering cognitive dissonance with the advice most of the authors give.

First I’ll start with a book I have read, and would suggest to everyone, Thematic cartography and geographic visualization by Slocum et al. I have the 2005 version, and it is dawned by this 3d landscape. (Sorry this is the largest image I can find online – other editions I believe have different covers.)

The multivariate display of data is admirable – so for exploratory analysis you could make a reasonable argument for the use of proportional sized circles superimposed on the choropleth map. The use of 3d in this circumstance though is gratuitous, and the extreme perspective hides much of the data while highlighting the green hills in the background.

The second mapping book I have slight reservations about critiquing the cover (I am on the job market!). I have not read the book, so I can not say anything about its contents. But roaming the book displays at an ASC conference I remember seeing this cover, GIS and Spatial Analysis for the Social Sciences: Coding, Mapping, and Modeling by Nash and Asencio.

This probably should not count in the other 3d graphics I am showing. The bar columns do have shading for 3d perspective – but the map otherwise is 2d. But the spectral color scheme is an awful choice. The red in the map tends to stand out the most – which places with zero crimes I don’t think you want to make that impression. The choropleth colors appear to be displaying the same data as the point data. The point data are so clustered that the choropleth can only be described as misleading – which may be a good point in text for side by side maps – but on the cover? Bar locations seem to be unrelated (as we might expect for juvenile crime) but they are again aggregated to the (probably) census units – making me question if the aggregation obfuscates the relationship. Bars are not available from the census – so it is likely this aggregation was intentional. I have no idea about the content of the book and I will likely get it and do an overall review of all crime mapping books sometime. But the cover is unambiguously a bad map.

The last book cover with 3d graphics (related to data-visualization) that I immediately remembered was R For Dummies by Meys and de Vries.

Now this when you look close really is not bad. It is not a graph on the cover, but a set of winding, hexagon cylinder stairsteps. So the analogy of taking small steps is fine – but the visual similarity to other statistical 3d graphics is clear. Consider the SPSS For Dummies book by Griffith.

Now that is an intentional, 3d chart made up of tiny blocks, with a trend line if you look closely, shadowed by cigarette like red bars in the background. At least this is so strange (and not possible in statistical software) that this example would never be confused with an actual reasonable statistical graphic. The Dummies series has such brand recognition as well that the dominant part of the cover might be the iconic yellow and type, as opposed to the inset graphic.

Not wanting to leave other software out of the loop, I looked for examples for SAS and Stata. SAS has a reasonable 3d cover in SAS System for Statistical Graphics by Friendly.

Short sidetrack story: I first learned statistical programming using SAS back in undergrad days at Bloomsburg University. Default graphics for SAS at that point (04-08) I believe were still the ASCII art looking things (at least that is what I remember). During our last meeting for my last statistics class – one of the other students showed me you could turn on the ODS output to html – and tables and graphs were by default pretty nice. I since have not had a need to use SAS.

This 3d cover by Friendly is arguably a reasonable use of 3d. 3d graphs are hard to navigate, and the use the anchors connecting the observations to the non-linear surface more easily associate a point with below or above the surface. It is certainly difficult though to understand the fit of the function – so likely a series of bivariate graphs would be more intuitive – especially given the meager number of points. I suspect the 3d on the cover is for the same reason 3d graphics were used in the other covers – because it looks cooler to book marketers!

Stata managed to debunk the 3d graph trends – I could not find any example Stata books with 3d graphics. Nick Cox’s newer collection of his Speaking Stata series though has some interesting embellishments.

While in isolation all of the graphs are fine – I’m sure Cox would not endorse the gratuitous use of color gradients in the graphics (I don’t think svg like gradients like that are even possible in Stata graphics). The ternary diagrams show nothing but triangles as well – so I don’t think such gradients are a good idea in any case for simply the background of the plot. Such embellishments could actually decode data, but in the case of bar graphs do not likely hurt or help with understanding the plot. When such gradients are used as the background though they likely compete with the actual data in the plot. Stata apparently can do 3d graphs – so I might suggest I write a book on crime modelling (published by Stata press) and insert a 3d graph on the cover (as this is clearly a niche in the market not currently filled!) I might have to make room for Chernoff faces somewhere on the front or back cover as well.

So maybe I am just seeing things in the examples of 3d covers. If anyone has any insight into how these publishers choose the covers let me know – or if you have other examples of bad book cover examples of data vizualization! Since most of my maps and graphs are pretty dull in 2d I might just outsource the graphic design if I made a book.

Learning to write badly in the social sciences

I recently finished Michael Billig’s book, Learn to Write Badly: How to Succeed in the Social Sciences, and I was largely convinced of Billig’s thesis so I shall reiterate it briefly here. Billig argues that much of social scientific writing is difficult to understand because of the excessive use of nouns instead of verbs. If we (ironically) use the word nominalization to describe the process of turning verbs into nouns, it would sound pretty similar to old hat of avoiding the use of jargon in scientific writing.

Billig goes a bit further though then the usual avoid jargon advice (which is uncontroversial), but gives many examples of where this change from verbs to nouns has negative consequences on how writing is interpreted. Two of these are:

  • Verbs are often much more clear about what actor is performing what actions (and in turning the verb to a noun both the actor and the action can become ambigious)
  • Replacing verbs with nouns gives a false sense of authority that the noun actually exists

The first is important for social scientists because we are pretty much always describing the actions of humans. The second Billig likens to marketing strategies to promote ones work (similar to how advertisements promote products), which I imagine the analogy turns a few academics stomachs.

As an example from my own work, I will use the title to one of my papers, The Moving Home Effect: A Quasi Experiment Assessing Effect of Home Location on the Offence Location. The title is really awful, and in a bit of self-deprecation the few times I presented the work I would make fun of my title making skills at the opening of my talk. The first part of the title before the semi-colon, "The Moving Home Effect", is an example of an ambiguous use of nouns. First, describing my findings as the moving home effect is rather ambiguous, it could mean effecting anything and everything. My particular study is much more restricted, I examined the distance between crimes before and after offenders move. Second, the effect of the added distance between crimes when moving I found to be rather small, which is probably one of the more interesting points of the paper. So saying "The Moving Home Effect" places unwanted emphasis that it exists and is real.

The same exercise can be used for the second part of the title. The use of quasi-experiment is simply econometrics jargon and is unneeded. It is an appeal to associate with a particular camp of analysis, and really does nothing to describe the nature of the work. So, I propose possible rewrites of my title to be:

  • Moving one’s home slightly changes the average distance between offences
  • When an offender moves, it slightly changes the location of where they offend

These are much more descriptive (and shorter) than the original title. Reviewing my own work such examples are rampant, so I have a bit of work to do to live up to Billig’s ideal, but I am convinced it will lead to improved writing. Also it seems to me it is a good exercise to make ones writing more concise, which is always welcome.

A critique of slopegraphs

I’ve recently posted a pre-print of an article, A critique of slopegraphs, on SSRN. In the paper I provide a critique of the use of slopegraphs and present alternative graphics to use in their place, using the slopegraph displayed on the cover of Albert Cairo’s The Functional Art as motivation – below is my rendering of that slopegraph.

Initially I wanted to write a blog post about the topic – but I decided to give all of the examples and full discussion I wanted it would be far too long. So I ended up writing a (not so short) paper. Below is the abstract, and I will try to summarize it in a few quick points (but obviously I encourage you to read the full paper!)

Slopegraphs are a popular form of graphic depicting change along two independent axes by means of a connecting line. The critique here lists several reasons why interpreting the slopes may be misleading and suggests alternative plots depending on the goals of the visualization. Guidelines as to appropriate situations to use slopegraphs are discussed.

So the three main points I want to make are:

  • The slope is not the main value of interest in a slopegraph. The slope is itself an arbitrary function of how far away the axes are placed from one another.
  • Slopegraphs are poor for judging correlation and seeing a functional relationship between the two values. Scatterplots or just graphing the change directly are often better choices.
  • Slopegraphs are difficult to judge when the variance between axes changes (which produce either diverging or converging slopes) and when the relationship is negative (which produces many crossings in the slopes).

I’ve catalogued a collection of articles, examples and other critiques of slopegraphs at this location. Much of what I say is redundant with critiques of slopegraphs already posted in other blogs on the internet.

I’m pretty sure my criminal justice colleagues will not be interested in the content of the paper, so I may need to cold email someone to review it for me before I send it off. So if you have comments or a critique of the paper I would love to hear it!

My Blogging in Review in 2013

2013 was my second year in blogging. I published 40 posts in 2013 (for a total of 72), and my cumulative site views were just a few shy of a 21,000 for the year. I only recieved 7,200 site views in 2012, so the blog has seen a fair bit of growth. The below chart aggregates the site views per month since the beginning (in December 2011) until December 2013. December has been a bit of a dip with only around an average of 60 views per day, but I was up to an average of 78 and 75 views per day in October and November respectively.

The large uptick in March was due to the Junk Charts Challenge being mentioned by Kaiser Fung. I got over 500 site views that day, and have totalled 765 referrals from the JunkCharts domain. This is pretty similar to the bursty behavior I noted on the CV blog, and that one good tweet or mention by a prominent figure will boost visibility by a large margin.

Most of the regular traffic though comes from generic internet searches, mainly for SPSS related material. A few of my earlier posts of Comparing continuous distributions of unequal size groups in SPSS (2,468 total views), Hacking the default SPSS chart template (2,237), and Avoid Dynamite Plots! Visualizing dot plots with super-imposed confidence intervals in SPSS and R (1,542) are some of my most popular posts. The Junk Charts Challenge post has a total of 1,804 views, but it seems to me that it was more of a flood initially and then a trickle as oppossed to the steady views the other posts bring.

Last year I said I would blog about a few topics and failed to write a post about any of them, so I won’t do that again this year. I will however state that I am currently on the job market, as I recently defended my prospectus. If you are aware of a job opportunity you think I would be interested in, or would like to talk to me about a consulting project feel free to send me an email (you can see my CV for my qualifications and brief discussion of past and current consulting services I have provided).

Some sites give advice about maintaining a blog and attracting visitors (such as writing posts so often). My advice is to write quality material, and the rest is just icing on the cake. Hopefully I have more cake for you in the near future.

A Festschrift (blog post) for Lord, his paradox and Novick’s prediction

Lord’s paradox is a situation in which analyzing change scores between two time points results in different treatment effect estimates than analyzing the treatment effect of the second time point conditional on the first time point. In terms of regression equations we have the following as the change score model:

Y_2 - Y_1 = \beta_a \cdot T

And the following as the conditional model:

Y_2 = \beta_b \cdot T + \gamma \cdot Y_1

Lord’s paradox is the fact that \beta_a and \beta_b won’t always be the same. I won’t go into too many details on why that is the case, and I would suggest the reader to review Allison (1990) and Holland and Rubin (1983) for some treatments of the problem. The traditional motivation for the change score model (which is pretty similar to fixed effects in panel regressions) is to account for any time invariant omitted variables that may be correlated with a unit being exposed to the treatment.

So lets say that we have an equation predicting Y_2

Y_2 = \beta \cdot T + \delta \cdot X

Lets also say that we cannot observe X, we know that it is correlated with T, but that X does not vary in time. For an example lets say that the treatment is a diet regimen for freshman college students and the outcome of interest is body fat content, and if they sign up they get discounts on specific cafeteria meals. Students voluntarily sign up to take the treatment though, so one may think that certain student characteristics (like being in better shape or have more self control with eating) are correlated with selecting to sign up for the diet. So how can we account for those pre-treatment characteristics that are likely correlated with selection into the treatment?

If we happen to have pre-treatment measures of Y, we can see that:

Y_1 = \delta \cdot X

And so we can subtract the latter equation from the former to cancel out the omitted variable effect:

Y_2 - Y_1 = \beta \cdot T + \delta \cdot X - \delta \cdot X = \beta \cdot T

Now, a frequent critique of the change score model is that it assumes that the autoregressive effect of the baseline score on the post score is 1. See Frank Harrell’s comment on this answer on the Cross Validated site (also see my answer to that question as to why change scores that include the baseline on the right hand side don’t make sense). Holland and Rubin (1983) make the same assertion. To make it clear, these critiques say that change scores are only justified when in the below equation \rho is equal to 1.

Y_2 = \beta \cdot T + \delta \cdot X + \rho \cdot Y_1

This caused me some angst though. As you can see in my original formulation there is no \rho \cdot Y_1 term at all, so it would seem that if anything I assume it is 0. But it seems that my description of time constant ommitted variables is making the same presumption. To show this lets go back one further step in time:

Y_0 = \delta \cdot X

We can see that we could just replace \delta \cdot X with the lagged value. Substituting this into the equation predicting Y_1 we would then have.

Y_1 = \rho \cdot Y_0 = Y_0

Which is the same as saying \rho=1. So my angst is resolved and Frank Harrell, Don Rubin and Paul Holland are correct in their assertions and doubting such a group of individuals surely makes me crazy! This does bring other questions though as to when the change score model is appropriate. Obviously our models are never entirely correct, and the presumption of \rho = 1 is on its face ridiculous in most situations. It is akin to saying the outcome is a random walk that is only guided by various exogenous shocks.

As always, the model one chooses should be balanced against alternatives in an attempt to reduce bias in the effect estimates we are interested in. When the unobserved and omitted X is potentially very large and have a strong correlation with being given the treatment, it seems the change score model should be preferred. I presume someone smarter than me can give better quantitative estimates as to when the bias of assuming \rho=1 is a better choice than making the assumption of no other unobserved time invariant omitted variables.


I end this post on a tangent. I recently revisited the material as I wanted to read Holland and Rubin (1983) which is a chapter in the reader Principals of moderns Psychological Measurement: A Festschrift for Frederic M. Lord. I also saw in that same reader a chapter by Melvin Novick, The centrality of Lord’s paradox and exchangeability for all statistical inference. At the end he was pretty daring in making some predictions for the state of statistics as of November 12, 2012 – so I am a year late with my Festschrift but they are still interesting fodder none-the-less. I’ll leave the reader to judge the extent Novick was correct in his following predictions:

  1. be less dependent on constricting models such as the normal and will primarily use more general classes of distributions, for example, the exponential power distribution;
  2. be fully Bayesian with full emphasis on the psychometric assessment of proper prior distributions;
  3. be fully decision theoretic with emphasis on the pyschometric assessment of individual and institutional utilities;
  4. use robust classes of prior distributions and utility functions as well as robust model distributions;
  5. rely completely on full-rank Bayesian univariate and multivariate analyses of variance and covariance using fully exchangeable, informative prior distributions as appropriate;
  6. emphasize exchangeability through careful modeling, blocking, and covariation with randomization playing a residual role;
  7. emphasize the use of posterior predictive distributions using the lessons of Lord’s paradox, exchangeability, and appropriate conditional probabilities;
  8. place great emphasis on numerical solutions when exact Bayesian solutions prove intractable;
  9. still use some pseudo Bayesian methods when both theoretical and computational fully Bayesian solutions remain intractable. (This prevision is subject to modification if I can convince Rubin, Holland and their associates to devote their impressive skills to the quest for fully Bayesian solutions. Should this happen, there may be no need for any pseudo Bayesian methods.)

Citations

  • Allison, Paul. 1990. Change scores as dependent variables in regression analysis. Sociological methodology 20: 93-114.
  • Holland, Paul & Donald Rubin. 1983. On Lord’s Paradox. In Principles of modern psychological measurement: A festchrift for Frederic M. Lord edited by Wainer, Howard & Samuel Messick pgs:3-25. Lawrence Erlbaum Associates. Hillsdale, NJ.
  • Novick, Melvin. 1983. The centrality of Lord’s paradox and exchangeability for all statistical inference. In Principles of modern psychological measurement: A festchrift for Frederic M. Lord edited by Wainer, Howard & Samuel Messick pgs:3-25. Lawrence Erlbaum Associates. Hillsdale, NJ.
  • Wainer, Howard & Samuel Messick. 1983. Principles of modern psychological measurement: A festchrift for Frederic M. Lord. Lawrence Erlbaum Associates. Hillsdale, NJ.

A Comment on Data visualization in Sociology

Kieran Healy and James Moody recently posted a pre-print, Data visualization in sociology that is to appear in a forthcoming issue of the Annual Review of Sociology. I saw awhile ago on Kieran’s blog that he was planning on releasing a paper on data viz. in sociology and was looking forward to it. As I’m interested in data visualization as well, I’m glad to see the topic gain exposure in such a venue. After going through the paper I have several comments and critiques (all purely of my own opinion). Take it with a grain of salt, I’m neither a sociologist nor a tenured professor at Duke! (I will refer to the paper as HM from here on.)

Sociological Lags

The first section, Sociological Lags, is intended to set the stage for the contemporary use of graphics in sociology. I find both the historical review lacking in defining what is included, and the segway into contemporary use uncompelling. I break the comment in this section into three sections of my own making, the historical review, the current state of affairs & reasoning for the current state of affairs.

Historical Review of What Exactly?

It is difficult to define the scope when reviewing historical applications of visualization on sociology; both because sociology can be immensely broad (near anything to do with human behavior) and defining what counts as a visual graph is difficult. As such it ends up being a bit of a fools errand to go back to the first uses of graphics in sociology. Often Playfair is considered the first champion of graphing numeric summaries (although he has an obvious focus on macro-economics), but if one considers maps as graphics people have made those before we had words and numbers. Tufte often makes little distinction between any form of writing or diagram that communicates information, e.g. in-text sparklines, Hobbe’s visual table of contents to Leviathan on the frontpiece, or Galileo’s diagrams of the rings of Saturn. Paul Lewi in Speaking of Graphics describes the quipu as a visual tool used for accounting by the Incas. If one considers ISOTYPE counting picture diagrams as actual graphs, the Mayan hieroglphys for counting values would seemingly count (forgive my ingorance of anthropology, as I’m sure other examples that fit these seemingly simple criteria exist in other and older societies). Tufte gives an example of using mono-spaced type font, which in our base 10 counting system approximates a logarithmic chart if the numbers are right aligned in a column.

10000
 1000
  100
   10
    1

So here we come to a bit of a contradiction, in that one of the main premises of the article is to advocate the use of graphics over tables, but a table can be a statistical graphic under certain definitions. The word semi-graphic is sometimes used to describe these hybrid tables and graphics (Feinberg, 1979).

So this still leaves us the problem of defining the scope for a historical review. Ignoring the fuzzy definition of graphics laid out prior, we may consider the historical review to contain examples of visualizations from sociologists or applications of visualization to sociological inquiry in general. HM cite 6 examples at the top of page 4, which are a few examples of visualizations from sociologists (pictures please of the examples!) I would have liked to seen the work of Aldophe Quetelet mentioned as a popular historical figure in sociology, and his maps of reported crimes would be considered an example of earlier precedence for data visualization. (I distinctly remember a map at the end of Quetelet (1984) and it appears he used graphs in other works, so I suspect he has other examples I am not familiar with.)

Other popular historical figures that aren’t sociologists but analyzed sociological relevant data were geographers such as Booth’s poverty maps, Minard’s migration flows (Friendly, 2002a), and Guerry/Balbi & Fletcher’s maps of moral statistics (Cook & Wainer, 2012; Friendly, 2007). Other popular champions in public health are John Snow and Florence Nightingale (Brasseur, 2005). I would have liked HM to either had a broader review of historical visualization work pertinent to sociologists, or (IMO even better) more clearly defined the scope of the historical examples they talk about. Friendly (2008) is one of the more recent reviews that I know of talking about the historical examples, so I’d prefer just referring the reader there and then having a more pertinent review of applications within sociology. Although no bright line exists for who is a sociologist, this seems to me to be an easier way to limit the scope of a historical review. Perhaps restrict the discussion to its use in major journals or well known texts (especially methodology oriented ones). The lack of graphs in certain influential works is perhaps more telling than their inclusion. I don’t think Weber or Durkheim used any graphs or maps that I remember.

The Current State of Affairs

The historical discussion is pertinent as a build up to the current state of affairs, but the segway between examples in the early 1900’s and contemporary research is difficult. The fact that the work of Calvin Schmid isn’t mentioned anywhere in the paper is remarkable (or perhaps just indicative of his lack of influence – he still should be mentioned though!) Uses of mapping crime seems to be a strong counter-example of the lack of graphs in either olden or modern sociology; Shaw and McKay’s Juvenile Delinquency in Urban Areas has many exemplary maps and statistical graphics. Sampson’s Great American City follows suit with a great many scatterplots and maps. (Given my background I am prejudiced to be more familiar with applications in criminology.)

So currently in the article we go from a few examples of work in sociology around the turn of the 20th century to the lack of use of graphics in sociology compared to the hard sciences. This is well known, and Cleveland (1984) should have been cited in addition to the anecdotal examples of comparison to articles in Science. (And is noted later on that this in and of itself is a poor motivation for the use of more graphs.) What is potentially more interesting (and pertinent to sociologists) is the variation within sociology itself over time or between different journals in sociology. For instance; is more page space devoted to figures now that one does not need to draw your own graphs by hand compared to say in the 1980s? Do journals such as Sociological Methodology and Sociological Methods & Research have clearly more examples of using graphics compared to ASR or AJS? Has the use of graphics laggard behind the adoption of quantitative analysis in the field? Answering these questions IMO provides a better sense of the context of the contemporary use of graphs in sociology than simply repeating old hat comparing to other fields. It also provides a segway between the historical and contemporary use of graphics within sociology. Indeed it is "easy to find" examples of graphs in contemporary journal articles in sociology, and I would prefer to have some sort of quantitative estimate of the prevalence of graphics within sociological articles over time. This also allows investigation of the cultural impediment hypothesis in this section versus the technical impediment discussions later on. HM also present some hypotheses about the adoption of quantitative modelling in sociology in reference to other social science fields that would lend itself to empirical verification of this kind.

Reasoning For The Current State of Affairs

I find the sentences by HM peculiar:

But, somewhere along the line, sociology became a field where sophisticated statistical models were almost invariably represented by dense tables of variables along rows and model numbers along columns. Though they may signal scientific rigor, such tables can easily be substantively indecipherable to most readers, and perhaps even at times to authors. The reasons for this are beyond the scope of this review, although several possibly complementary hypotheses suggest themselves.

And can be interpreted in two different ways given the context. It can be interpreted as posing the question why are graphics preferable to tables, or why are graphics in relative disuse in sociology. For either I disagree it is outside the scope of the article!

If interpreted in the latter (why disuse of graphics in sociology) HM don’t follow their own advice, and give an excellent discussion of the warnings of graphics provided by Keynes. This section is my favorite in the paper, and I would have liked to see discussion on either training curricula in sociology or discussion of the role in graphs of popular sociological methodology text books. Again, I don’t believe Durkheim or Weber use graphs at all (although I provide other examples of prior scholars they have been exposed to did). Fisher has a chapter on graphs, so the concept isn’t foreign, and obviously the use of ANOVA and regression was adopted within sociology with open arms – why not graphical methods? Why is Schmid (1954) seemingly ignored? The discussion of Keynes is fascinating, but did Keynes really have that much influence on the behavior of sociologists? (I’m reminded of this great quote on the CV site on Wooldrige, 776 pages without a single graph!) This still doesn’t satisfactorily (to me) explain the current state of affairs. For instance a great counter example is Yule (1926); which was a pretty popular paper that used (22!) graphs to explain the behavior of time series. We are left to speculation about historical inertia of an economist as the reasoning for lack of discourse and use of graphics in contemporary sociological publications. I enjoyed the speculation, but I am unconvinced. Again having estimates of the proportion of page space devoted to graphs in sociology over time (and/or in comparison to the other social science fields mentioned) would lend credence to the hypotheses about cultural and technological impediments.

If you interpret the quoted sentence as posing the question why are graphics preferable to tables, then that seems to be crucial discussion to motivate the article to begin with, and I will argue is on topic in the authors next section, visualization in principle. HM miss a good opportunity to relate the quote of Keynes to when we want to use graphs (making relative comparisons) versus tables (which are best when we are just looking up one value, but we are not often interested in just looking up one value!)

Visualization in Principle

The latter sections of the paper, visualization in principle and practice, I find much more reasonable as reviews and well organized into sections (albeit the scope of the sections are still ill-defined). Most of my dissapointments from here on are ommissions that I feel are important to the discussion.

I was disappointed in this particular section of the article, as I believed it would have been a great opportunity to introduce concepts of graphical perception (e.g. Cleveland’s hierarchy), connect the piece to more contemporary work examining graphical perception, and even potentially provide more actionable advice about improving graphics for publication.

This section begins with a hod-podge list of popular viz. books. I was surprised to see Wilkinson’s Grammar of Graphics mentioned, and I was surprised to see Stephen Kosslyn’s books (which are more how to cook book like) and Calvin Schmid ommitted. IMO MacEachren’s How Maps Work should also be included on the shelf, but I’m not sure if that or any of those listed can be objectively defined as influential. Influential is a seemingly ill-defined criteria, and I would have liked to seen citation counts for any of these in sociology journals (I would bet they are all minimal). I presume this is possible as Neal Caren’s posts on scatterplot or Kieran’s philosophy citation network are examples using synonmous data. I find the list strange in that the books are very different in content (see Kosslyn (1985) for a review of several of them), and a reader may be misled in that they cover redundant material. The next part then discusses Tufte and his walkthrough of the famous March of Napoleon graphic by Minard and tries to give useful tidbits of advice along the way.

Use of Example Minard Graphic

While I don’t deny that the Minard example is a wonderful, classical example of a great graph, I would prefer to not perpetuate Tufte’s hyperbole as it beeing the best statistical graphic ever drawn. It is certainly an interesting application, but does time-stamped data of the the number of troops exemplify any type of data sociologists work with now? I doubt it. Part of why I’m excited by field specific interest in visualization is because in sociology (and criminology) we deal with data that isn’t satisfactorily discussed in many general treatments of visualization.

A much better example, in this context IMO, would have been Kieran Healy’s blog post on finding Paul Revere. It is as great an example as the Minard march, has direct relevance to sociologists, and doesn’t cover ground previously covered by other authors. It also is a great example of the use of graphical tools for direct analysis of sociological data. It isn’t a graph to present the results of the model, it is a graph to show us the structure of the network in ways that a model never could! Of course not everyone works with social network data either, but if the goal is to simply have a wow thats really cool example to promote the use of graphics, it would have sufficed and been more clearly on topic for sociology.

The use of an exemplar graphic can provide interest to the readers, but it isn’t clear from the onset of this section that this is the reasoning behind presenting the graphic. Napoleon’s March (nor any single graphic) can not be substantive enough fodder for description of guides to making better graphics. The discussion of topics like layering and small multiples are too superficial to constitute actionable advice (layering needs to be shown examples really to show what you are talking about).

If someone were asking me for the simple introduction to the topic, I would introduce Cleveland’s hierarchy JASA paper (Cleveland & McGill, 1984). For making prettier graphs, I would just suggest to The Visual Display of Quantitative Information and Stephen Few’s short papers. Cleveland’s hierarchy should really be mentioned somewhere in the paper.

This section ends with a mix of advice and more generic discussion on graphical methods, framing the discussion in terms of reproducible research.

Reproducible Research

While I totally endorse the encouragement of reproducible research, and I agree the technical programming skills for producing graphs are the same, I’m not sure I see as strong a connection to data visualization. In much of the paper the connections of direct pertinence to sociologists are not explicit, but HM miss a golden opportunity here to make one; data we often use is confidential in nature, providing problems of both sharing code and displaying graphics in publications. Were not graphing scatterplots of biological assays, but of peoples behavior and sensitive information.

IMO a list of criteria to improve the aesthetics of most graphs that are disseminated in social science articles are; print the graph as a high resolution PNG file or a vector file format (much software defaults to JPEG unfortunately), sensible color choices (how is ColorBrewer not mentioned in the article!) or sensible choices for line formats, and understanding how to make elements of interest come to the foreground in the plot (e.g. not too heavy of gridlines, use color selectively to highlight areas of interest – discussed under the guise of layering earlier in the Minard graphic section). This list of simple advice though for aesthetics can be accomplished in any modern statistical software (and has been possible for years now). The section ends up being a mix of advice about aesthetics of plots with advice about how to make complicated plots more understandable (e.g. talk about small multiples). Discussing these concepts is best kept seperated. Although good advice extends to both, a quick and dirty scatterplot doesn’t have the same expectations as a plot in a publication. (The model of clarity residual plots from R would not be so clear if they had 10,000 points although it still may be sufficient for most purposes.)

Visualization in Practice

This section is organized into exploratory data analysis and presenting the results of models. HM touch on pertinent examples to sociologists (mainly large datasets with high variance to signals that are too complicated for bivariate scatterplots to tell the whole story). I enjoyed this section, but will mention additional examples and discussion that IMO should have been included.

EDA

HM make the great point that EDA and confirmatory analysis are never seperate, and that we can use thoughts from EDA to evaluate regression models. This IMO isn’t anything different than what Tukey talks about when he takes out row and column effects, it is just the scale of the number of data points and ability to fit models is far beyond any examples in Tukey’s EDA.

For the EDA section on categorical variables notable omissions are mosaic plots (Friendly, 1994) – which are mentioned but not shown in the pairs plot example and later on page 23 – and techniques for assessing residuals in logistic regression models (Greenhill et al., 2011; Esarey & Pierce, 2012). When discussing parallel coordinate plots mention should also be made of extensions to categorical data (Bendix et al. 2005; Dang et al. 2010; Hofmann & Vendettuoli, 2013). Later on mention is made that mosaic plots one needs to develop gestalt to interpret them (which I guess we are born with the ability to interpret bar graphs and scatterplots?) The same is oft mentioned for interpreting parallel coordinate plots, and anything that is novel will take some training to be able to interpret.

For continous models partial residual plots should be mentioned (Fox, 1991), and ellipses and caterpillar plots should be mentioned in relation to multi-level modelling (Afshartous & Wolf, 2007; Friendly et al. 2013; Loy & Hofmann 2013) as well as corrgrams for simple data reduction in SPLOMS (Friendly, 2002b). From the generic discussion of Bourdieu it sounds like they are discussing bi-plots or some sort of similar data reduction technique.

I should note that this is just my list of things I think deserve mention for application to sociologists, and the task of what to include is essentially an impossible one. Of course what to include is arbitrary, but I base it mainly on tasks I believe are more common for quantitative sociologists. This is mainly regression analysis and evaluating causal relationships between variables. So although parallel coordinate plots are a great visualization tool for anyone to be familiar with, I doubt it will be as interesting as tools to evaluate the fit or failure of regression models. HM mention that PCP plots are good for identifying clusters and outliers (they aren’t so good for evaluating correlations). Another example (that HM don’t mention) would be treemaps. It is an interesting tool to visualize hierarchical data, but I agree that it shouldn’t be included in this paper as such hierarchical data in sociology is rare (or at least many applications of interest aren’t immediately obvious to me). The Buja et al. paper on using graphs for inference is really neat idea, (and I’m glad to see it mentioned) but I’m more on the fence as to whether it should have taken precedence to some of the other visualizations ideas I mention.

EDA for examining the original data and for examining model residuals are perhaps different enough that it should have its own specific section (although HM did have a nice discussion of this distinction). Examining model residuals is always preached but is seemingly rarely performed. More examples for the parade of horribles I would enjoy (Figure 1 presents a great one – maybe find one for a more complicated model would be good – I know of a few using UCR county data in criminology but none off-hand in sociology). The quote from Gelman (2004) is strange. Gelman’s suggestions for posterior simulations to check the model fit can be done with frequentist models, see King et al. (2000) for basically the same advice (although the citation is at an appropriate place). Also King promotes these as effective displays to understand models, so they are not just EDA then but reasonable tools to summarize models in publications (similar to effect plots later shown).

Presenting model results

The first two examples in this section (Figure 7 histogram and Figure 8 which I’m not sure what it is displaying) seem to me that they should be in the EDA section (or are not cleary defined as presenting results versus EDA graphics). Turning tables into graphs (Cook & Teo, 2011; Gelman et al. 2002; Feinberg & Wainer, 2011; Friendly & Kwan, 2011; Kastellec & Leoni, 2007) is a large omission in this section. Since HM think that tables are used sufficiently, this would have been a great opportunity to show how some tables in an article are better presented as graphs and make more explicit how graphs are better at displaying regression results in some circumstances (Soyer & Hogarth, 2012). It also would have presented an opportunity to make the connection that data visualization can even help guide how to make tables (see my blog post Some notes on making effective tables and Feinberg & Wainer (2011)).

Examples of effects graphs are great. I would have liked mentions of standardizing coefficients to be on similar or more interpretable scales (Gelman & Pardoe, 2007; Gelman, 2008) and effectively displaying uncertainty in the estimates (see my blog post and citations Viz. weighted regression in SPSS and some discussion).

Handcock & Morris (1999) author names are listed in the obverse order in the bibliography in case anyone is looking for it like I was!

In Summary and Moving Forward

As oppossed to ending with a critique of the discussion, I will simply use this as a platform to discuss things I feel are important for moving forward the field of data visualization within sociology (and more broadly the social sciences). First, things I would like to see in the social sciences moving forward are;

  • More emphasis on the technical programming skills necessary to make quality graphs.
  • Encourage journals to use appropriate graphical methods to convey results.
  • Application of the study of data viz. methods as a study worthy in sociology unto itself.

The first suggestion, emphasis on technical programming skills, is in line with the push towards reproducible research. I would hope programs are teaching the statistical computing skills necessary to be an applied quantitative sociologist, and teaching graphical methods should be part and parcel. The second suggestion, encourage journals to use appropriate graphical methods, I doubt is objectionable to most contemporary journals. But I doubt reviewers regularly request graphs instead of tables even where appropriate. It is both necessary for people to submit graphs in their articles and for reviewers to suggest graphs (and journals to implement and enforce guidelines) to increase usage in the field.

When use of graphs becomes more regular and widespread in journal articles, I presume the actual discussion and novel applications will become more regular within sociological journals as well. James Moody is a notable exception with some of his work on networks, and I hope more sociologists are motivated to develop tools unique to their situation and test the efficacy of particular displays. Sociologists have some unique circumstances (spatial and network data, mostly categorical dimensions, low signal/high variance) that call for not just transporting ideas from other fields, but attention and development within sociology itself.


Citations

  • Afshartous, D. and Wolf, M. (2007). Avoiding ‘data snooping’ in multilevel and mixed effects models. Journal of the Royal Statistical Society: Series A (Statistics in Society), 170(4):1035-1059.
  • Bendix, F., Kosara, R., and Hauser, H. (2005). Parallel sets: visual analysis of categorical data. In IEEE Symposium on Information Visualization, 2005. INFOVIS 2005., pages 133-140. IEEE.
  • Brasseur, L. (2005). Florence nightingale’s visual rhetoric in the rose diagrams. Technical Communication Quarterly, 14(2):161-182.
  • Cleveland, W. S. and McGill, R. (1984). Graphical perception: Theory, experimentation, and application to the development of graphical methods. Journal of the American Statistical Association, 79(387):531-554.
  • Cleveland, W. S. (1984). Graphs in scientific publications. The American Statistician, 38(4):261-269.
  • Cook, Alex R. & Shanice W. Teo. (2011) The communicability of graphical alternatives to tabular displays of statistical simulation studies. PLoS ONE 6(11): e27974.
  • Cook, R. and Wainer, H. (2012). A century and a half of moral statistics in the united kingdom: Variations on joseph fletcher’s thematic maps. Significance, 9(3):31-36.
  • Dang, T. N., Wilkinson, L., and Anand, A. (2010). Stacking graphic elements to avoid Over-Plotting. IEEE Transactions on Visualization and Computer Graphics, 16(6):1044-1052.
  • Esarey, Justin & Andrew Pierce. 2012. Assessing fit quality and testing for misspecification in binary-dependent variable models. Political Analysis 20(4): 480-500. Preprint PDF Here
  • Feinberg, R. A. and Wainer, H. (2011). Extracting sunbeams from cucumbers. Journal of Computational and Graphical Statistics, 20(4):793-810.
  • Fienberg, S. E. (1979). Graphical methods in statistics. The American Statistician, 33(4):165-178.
  • Fox, J. (1991). Regression diagnostics. Number no. 79 in Quantitative applications in the social sciences. Sage.
  • Friendly, M. (1994). Mosaic displays for Multi-Way contingency tables. Journal of the American Statistical Association, 89(425):190-200.
  • Friendly, M. (2002a). Visions and Re-Visions of charles joseph minard. Journal of Educational and Behavioral Statistics, 27(1):31-51.
  • Friendly, M. (2002b). Corrgrams: Exploratory displays for correlation matrices. The American Statistician, 56(4):316-324.
  • Friendly, M. (2007). A.-M. guerry’s moral statistics of france: Challenges for multivariable spatial analysis. Statistical Science, 22(3):368-399.
  • Friendly, M. (2008). The golden age of statistical graphics. Statistical Science, 23(4):502-535.
  • Friendly, M., Monette, G., and Fox, J. (2013). Elliptical insights: Understanding statistical methods through elliptical geometry. Statistical Science, 28(1):1-39.
  • Friendly, Michael & Ernest Kwan. 2011. Comment on Why tables are really much better than graphs. Journal of Computational and Graphical Statistics 20(1): 18-27.
  • Gelman, A. (2008). Scaling regression inputs by dividing by two standard deviations. Statistics in medicine, 27(15):2865-2873.
  • Gelman, A. and Pardoe, I. (2007). Average predictive comparisons for models with nonlinearity, interactions, and variance components. Sociological Methodology, 37(1):23-51.
  • Gelman, Andrew, Cristian Pasarica & Rahul Dodhia (2002). Let’s practice what we preach. The American Statistician 56(2):121-130.
  • Greenhill, Brian, Michael D. Ward & Audrey Sacks. 2011. The separation plot: A new visual method for evaluating the fit of binary models. American Journal of Political Science 55(4):991-1002.
  • Hofmann, H. and Vendettuoli, M. (2013). Common angle plots as Perception-True visualizations of categorical associations. Visualization and Computer Graphics, IEEE Transactions on, 19(12):2297-2305.
  • Kastellec, Jonathan P. & Eduardo Leoni. (2007). Using graphs instead of tables in political science. Perspectives on Politics 5(4):755-771.
  • King, G., Tomz, M., and Wittenberg, J. (2000). Making the most of statistical analyses: Improving interpretation and presentation. American Journal of Political Science, 44(2):347-361.
  • Kosslyn, S. M. (1985). Graphics and human information processing: A review of five books. Journal of the American Statistical Association, 80(391):499-512.
  • Kosslyn, S. M. (1994). Elements of graph design. WH Freeman New York.
  • Lewi, P. J. (2006). Speaking of graphics. http://www.datascope.be/sog.htm
  • Loy, A. and Hofmann, H. (2013). Diagnostic tools for hierarchical linear models. Wiley Interdisciplinary Reviews: Computational Statistics, 5(1):48-61.
  • MacEachren, A. M. (2004). How maps work: representation, visualization, and design. Guilford Press.
  • Sampson, R. J. (2012). Great American city: Chicago and the enduring neighborhood effect. University of Chicago Press.
  • Schmid, C. F. (1954). Handbook of graphic presentation. Ronald Press Company. http://archive.org/details/HandbookOfGraphicPresentation
  • Shaw, C. R. and McKay, H. D. (1972). Juvenile delinquency and urban areas: a study of rates of delinquency in relation to differential characteristics of local communities in American cities. A Phoenix Book. University of Chicago Press.
  • Soyer, E. and Hogarth, R. M. (2012). The illusion of predictability: How regression statistics mislead experts. International Journal of Forecasting, 28(3):695-711.
  • Tufte, E. R. (1983). The visual display of quantitative information. Graphics Press.
  • Quetelet, A. (1984). Adolphe Quetelet’s Research on the propensity for crime at different ages. Criminal justice studies. Anderson Pub. Co.
  • Yule, G. U. (1926). Why do we sometimes get Nonsense-Correlations between Time-Series?-a study in sampling and the nature of Time-Series. Journal of the Royal Statistical Society, 89(1):1-63.