My experience blogging in 2012

I figured I would write a brief post about my experience blogging. I created this blog and published my first post in December of 2011. Since then, in 2012, I published 30 blog posts and totaled 7,200 views. While I thought that number was quite high (albeit a bit disappointing compared to the numbers of Larry Wasserman), it is still many more people than would have listened to what I had to say if I didn't write a blog. When starting out I averaged under 10 views a day, but throughout the year traffic steadily grew, and now I average about 30 views per day. The post with the most traffic in one day was When should we use a black background for a map?, with 73 views, largely because of some twitter traffic (a result of Steven Romalewski tweeting it and then it being re-tweeted by Kenneth Field).

I started the blog because I really loved reading a lot of other blogs, and I hope to encourage others to blog as well. It is a nice venue for an academic to share work and opinions, as it is more flexible and can be less formal than articles. Also, much of what I write about I would just consider helpful tips or generic discussion that I wouldn't get to discuss otherwise (SPSS programming and graph tips will never make it into a publication). One of my main motivations was actually R-Bloggers and the SAS blog roll; I would like a similarly active community for SPSS, but I have found none outside of the NABBLE forum. Some exceptions are Andy Field, The Analysis Factor, Jon Peck, and these few posts by a Louis K that I only found through the labyrinth that is the IBM developerworks site (note I think you need to be signed in to even see that site), but they certainly aren't very active and/or don't write much about SPSS. I assume the best way to remedy that is to lead by example! Most of my more popular posts are ones about SPSS, and I frequently get web traffic via general google searches of SPSS plus something else I blogged about (hacking the template and comparing continuous distributions are my two top posts).

The blog is also just another place to highlight my academic work and bring more attention to it. WordPress tells me how often someone clicks a link on the blog, and someone has clicked the link to my CV close to 40 times since I made the blog. Hopefully I will have some pre-print journal articles to share on the blog in the near future (as well as my prospectus). My post on my presentation at ASC did not generate much traffic, but I would love to see a similar trend for other criminologists/criminal justicians in the future. My work isn't perfect for sure, but why not get it out there, at least for it to be judged and hopefully get feedback.

I would like to blog more, and I actively try to write something if I haven't in a few weeks, but I don't stress about it too much. I certainly have an infinite pool of posts to write about programming and generating graphs in SPSS. I have also thought about discussing historical graphics in criminology and criminal justice, or generally talking about some historical and contemporary crime mapping work. Other potential posts I'd like to write are a more formal treatment of why I loathe most difference-in-differences designs, and perhaps one on the silliness that can ensue when using null-hypothesis significance testing to determine racial bias. But both will take more careful elaboration, so they might not appear anytime soon.

So in short, SPSSers, crime mappers, criminologists/criminal justicians: I want you to start blogging, and I will eagerly consume your work (and in the meantime hopefully produce some more useful stuff on my end)!

Some more about black backgrounds for maps

I am at it again discussing black map backgrounds. I make a set of crime maps for several local community groups as part of my job as a crime analyst for Troy PD. I tend to make several maps for each group, separating out violent, property, and quality-of-life related crimes. Within each map I attempt to make a hierarchy between crime types, with more serious crimes as larger markers and less severe crimes as smaller markers.

Despite critiques, I believe the dark background can be useful, as it creates greater contrast for map elements. In particular, the small crime dots are much easier to see (and IMO in these examples the streets and street name labels are still easy to read). Below are examples of the white background, a light grey background, and a black background for the same map. The only changes are that the black point marker is changed to white in the black background map; streets and parks are drawn with a heavy amount of transparency to begin with, so they don't need to be changed.

Surprisingly to me, ink be damned, even printing out the black background looks pretty good! (I need to disseminate paper copies at these meetings.) I think if I had to place the legend on the black map background I would be less thrilled, but currently I have half the page devoted to the map and the other half devoted to a table listing the events and the time they occurred, along with the legend (ditto for the scale bar and the North arrow not looking so nice).

I could probably manipulate the markers to provide more contrast in the white background map (e.g. make them bigger, draw the lighter/smaller symbols with dark outlines, etc.). But I was quite happy with the black background map (and the grey background may be a useful in-between as well). It took no changes besides changing the background in my current template (and changing black circles to white ones) to produce the example maps. I also chose those marker sizes for a reason (so the map did not appear flooded with crime dots, and more severe and less severe crimes were easily distinguished), so I'm hesitant to think that I can do much better than what I have so far with the white background maps (and I refuse to put those cheesy crime marker symbols, like a hand gun or a body outline, on my maps).

In terms of differentiating between global and local information in the maps, I believe the high contrast dark background map is nice for identifying local points, but does not aid in identifying general patterns. I don't think general patterns are a real concern for the local community groups though (displaying so many points on the same map isn't good for distinguishing general patterns anyway).

I'm a bit hesitant to roll out the black maps as of yet (maybe if I get some good feedback on this post I will be more daring). I'm still on the fence, but I may try out the grey background maps for the next round of monthly meetings. I will have to think about whether I can devise a reasonable experiment to differentiate between the maps and whether they meet the community groups' goals and/or expectations. But, all together, the black background maps should certainly be given serious consideration for similar tasks. Again, as I said previously, the high contrast makes smaller elements more obvious (brings them more to the foreground) than the white background does, which as I show here can be useful in some circumstances.

The leaning tower optical illusion: Is it applicable to statistical graphics?

Save in the memory banks whether the slope of the lines in the left hand panel appears similar to, smaller than, or larger than the slope of the lines in the right hand panel.

I enjoy reading about optical illusions, both purely because I think they are neat and because of their applicability to how we present and perceive information in statistical graphics. A few examples I am familiar with are:

  • The Rubin Vase optical illusion, in which it is difficult to distinguish which object is the background and which is the foreground. This is applicable to making a clear background/foreground separation between grid lines and chart elements.
  • Change blindness, which makes it difficult to interpret animated graphics that do not have smooth, continuous transitions between chart states.
  • Mach bands (and simultaneous contrast effects more generally), where the color of an object is perceived differently given the context of the surrounding colors. I recently came across one of the most dramatic examples of this at the very cool mighty optical illusions site. The effect was so dramatic that I actually went and edited the image in that example to make sure there was no funny business! Image included below.

I was recently pointed to a new (to me) example of an optical illusion, the leaning tower illusion, in a paper by Kingdom, Yoonessi & Gheorghiu (2007) (referred to me via the Freakonometrics blog).


I suggest reading the article (it is very brief), but to sum it up: both pictures above are identical, although the tower on the right appears to be leaning more to the right. Although the pictures are separate (and have some visual distinction), our minds interpret them as lying in the same "plane", and hence objects that are further away in the distance should not appear parallel but should actually converge within the image.

Off the cuff this reminded me of the Ponzo illusion, where our minds know that the lines are running parallel, and our perception of other surrounding elements changes conditional on that dominant parallel-lines pattern. Here is another good example of this from the mighty optical illusions site (actually I did not know the name of this effect, and when I googled subway tile illusion this is the site that came up; I'm glad I found it!)

Is this applicable to statistical graphics though? One of the later images in the Perception article appears potentially more reminiscent of a small multiple line chart (and we all know I strongly advocate for the use of small multiple charts).


We do know that interpreting the distance between sloping lines is difficult (as elaborated on in some of Cleveland's work), but this is different in that potentially our perception of the parallelness of lines between panels in a small multiple is distorted based on the directions of lines within the panel. Offhand, though, we might expect that the context doesn't exactly carry over; there is no visual schema in 2d statistical graphics suggesting that lines are running away from our perspective. So to test this out I attempted to create some settings in small multiple line panels that might cause similar optical illusions.

So, going back to the picture at the beginning of the article, here are those same lines superimposed on the original picture. I have lost my personal objectivity to tell whether these result in visual distortions at this point, but at best I could only conjure up perhaps some slight distortion between panels (which is perhaps no worse than our ability to accurately perceive slopes anyway).

I think along these lines one could come up with more examples where between-panel comparisons for line graphs in small multiples produce such distortions, but I was unable to produce anything compelling in some brief tries (so let me know if you come across any examples where such distortions occur!). Simply food for thought at this point.

I do think, though, that the Ponzo effect can be illustrated with essentially the same graphic.


It isn't as dramatic as the subway tile example, but I do think the positive sloping line where the negative sloping lines converge at the top of the image appears larger than the line in the space at the bottom right of the image.

I suspect this could actually occur in real-life graphics in which we have error bars superimposed on a graph with several lines of point estimates. If the point estimates start at a wide interval and then converge, it may produce a similar illusion in which the error bars appear larger around the point estimates that are closer together. Again, though, I produced nothing really compelling in my short experimentation.

Using the edge element in SPSS inline GPL with arbitrary X,Y coordinates

This post is intended to be a brief illustration of how one can utilize the edge element in SPSS's GGRAPH statements to produce vector flow fields. I post this because the last time I consulted the GGRAPH manual (which admittedly is probably a few versions old) it only had examples of SPSS's ability to utilize random graph layouts for edge elements. Here I will show how to specify the algebra for edge elements if you already have X and Y coordinates for the beginning and end points of the edges. Also note that if you want arrow heads for line elements in your plots you need to utilize edge elements (the main original motivation for me to figure this out!)

So here is a simple example of utilizing coordinate begin and end points to specify arrows in a graph; below the code is the image it produces.

DATA LIST FREE/ X1 X2 Y1 Y2 ID.
BEGIN DATA
1 3 2 4 1
1 3 4 3 2
END DATA.

GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=X1 Y1 X2 Y2 ID MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: X1=col(source(s), name("X1"))
  DATA: Y1=col(source(s), name("Y1"))
  DATA: X2=col(source(s), name("X2"))
  DATA: Y2=col(source(s), name("Y2"))
  DATA: ID=col(source(s), name("ID"))
  GUIDE: axis(dim(1), label("X"))
  GUIDE: axis(dim(2), label("Y"))
  ELEMENT: edge(position((X1*Y1)+(X2*Y2)), shape(shape.arrow))
END GPL.

The magic happens in the call to the edge element, specifically the graph algebra position statement of (X1*Y1)+(X2*Y2). I haven't read Wilkinson's Grammar of Graphics, and I will admit that with SPSS's tutorial on graph algebra in the intro to the GPL manual it isn't clear to me why this works. The best answer I can give is that different elements have different styles for specifying coordinates in the graph. For instance, interval elements (e.g. bar charts) can take a location in one dimension and an interval in another dimension in the form X*(Y1+Y2) (see an example in this post of mine on Nabble – I also just found in my notes an example of specifying an edge element in a similar manner to the interval element). This just happens to be a valid form to specify coordinates in edge elements when you aren't using one of SPSS's automatic graph layout renderings. I guess it is generically of the form FromNode(X*Y)+ToNode(X*Y), but I haven't seen any documentation for this, and all of the examples I had seen in the reference guide utilize a different set of nodes and edges and then specify a specific type of graph layout.

Here is another example visualizing a vector flow field. Eventually I would like to be able to superimpose such elements on a map, but that appears to not yet be possible in SPSS.

set seed = 10.
input program.
loop #i = 0 to 10.
loop #j = 0 to 10.
compute X = #i.
compute Y = #j.
*compute or_deg = RV.UNIFORM(0,360).
end case.
end loop.
end loop.
end file.
end input program.
dataset name sim.
execute.

*making orientation vary directly with X & Y attributes.

compute or_deg = 18*X + 18*Y.

*now to make the edge I figure out the X & Y coordinates with a little distance added (here 0.7) based on the orientation.
COMPUTE #pi=4*ARTAN(1).
compute or_rad = (or_deg/180)*#pi.
compute distance = .7.
execute.

compute X2 = X + sin(or_rad)*distance.
compute Y2 = Y + cos(or_rad)*distance.
execute.
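As a cross-check of the trigonometry in the computes above (this is just an illustrative sketch, not part of the SPSS workflow; the function name is mine), the endpoint calculation can be written in Python:

```python
import math

def edge_endpoint(x, y, or_deg, distance=0.7):
    """Return the end coordinate of an edge starting at (x, y), given an
    orientation in degrees (0 = straight up, increasing clockwise).
    Mirrors the sin/cos computes in the SPSS code above."""
    or_rad = math.radians(or_deg)  # same as (or_deg/180)*pi
    return (x + math.sin(or_rad) * distance,
            y + math.cos(or_rad) * distance)
```

For example, an orientation of 0 degrees leaves X unchanged and adds the full distance to Y, while 90 degrees points the edge straight to the right.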

DATASET ACTIVATE sim.
* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=X Y X2 Y2 or_deg MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: X=col(source(s), name("X"))
  DATA: Y=col(source(s), name("Y"))
  DATA: X2=col(source(s), name("X2"))
  DATA: Y2=col(source(s), name("Y2"))
  DATA: or_deg=col(source(s), name("or_deg"))
  GUIDE: axis(dim(1), label("X"))
  GUIDE: axis(dim(2), label("Y"))
  ELEMENT: edge(position((X*Y)+(X2*Y2)), shape(shape.arrow))
END GPL.

You can then use other aesthetics in these charts same as usual (color, size, transparency, etc.)

Presentation at ASC – November, 2012

I will be presenting at the American Society of Criminology conference in Chicago in a few weeks (I can't link to the actual presentation it appears, but you can search the program for Wheeler and my session will come up). Don't take this as a final product, but I figured I would put out there the working paper/chapters of my dissertation that are the motivation for my presentation, along with my current set of slides.

Here is the original abstract I submitted a few months ago. The title of the talk is The Measurement of Small Place Correlates of Crime:

This presentation addresses several problems related to attempting to identify correlates of crime at small units of analysis, such as street segments. In particular, the presentation will focus on articulating what we can potentially learn from smaller units of analysis compared to larger aggregations, and on relating a variety of different measures of the built environment and demographic characteristics of places to theoretical constructs of interest to crime at places. Preliminary results examining the discriminant and convergent validity of theoretical constructs pertinent to theories of the causes of crime, using data from Washington, D.C., will be presented.

This was certainly an over-ambitious abstract (I was still in the process of writing my prospectus when I submitted it). The bulk of the talk will be focused on "What can we learn from small units of analysis?", and after that, as time allows, I will present some illustrations of the change of support problem. Sorry to disappoint, but nothing about convergent or divergent validity of spatial constructs will be presented (I have done no work of interest yet, and I don't think I would have time to present any findings in any more than a superficial manner anyway).

Don't be scared off by how dull the working paper is; the presentation will certainly be more visual and less mathematical (I will need to update my dissertation to incorporate some more graphical presentations).

Maps and graphics at the end of the talk demonstrating the change of support problem are still in the works (and I will continue to update the presentation on here). Here is a preview, though, of the first map I made, demonstrating how D.C. disseminates geo-data aggregated and snapped to street segments, making it problematic to mash up with census data.


The presentation time is on Friday at 9:30, and I'm excited to see the other presentations as well. It looks to me like Pizarro et al.'s related research was recently published in Justice Quarterly, so if you don't care for my presentation come to see the other presenters!

JQC paper on the moving home effect finally published!

My first solo publication, The moving home effect: A quasi experiment assessing effect of home location on the offence location, after being in the online first queue nearly a year, has finally been published in the Journal of Quantitative Criminology 28(4):587-606. It was the oldest paper in the online first section (along with the paper by Light and Harris published on the same day)!

This paper was the fruit of what was basically the equivalent of my Masters thesis, and I would like to take this opportunity to thank all of the individuals who helped me with the project, as I accidentally omitted such thanks from the paper (entirely my own fault). I would like to thank my committee members, Rob Worden, Shawn Bushway, and Janet Stamatel. I would also like to thank Robert Apel and Greg Pogarsky for useful feedback I received on in-class papers based on the same topic, as well as the folks in the Worden meeting group (for not only feedback but giving me motivation to do work so I had something to say!)

Rob Worden was the chair of my committee, and he deserves extra thanks not only for reviewing my work, but also for giving me a job at the Finn Institute, without which I would never have had access to such data and the opportunity to conduct such a project. I would also like to give thanks to the Syracuse PD and Chief Fowler for letting me use the data and reveal the PD's identity in the publication.

I would also like to thank Alex Piquero and Cathy Widom for letting me make multiple revisions and accepting the paper for publication. For the publication itself I received three very excellent and thoughtful peer reviews. The excellence of the reviews was well above the norm for feedback I have otherwise encountered, and demonstrated that the reviewers not only read the paper but read it carefully. I was really happy with the improvements as well as how fair and thoughtful the reviews were. I am also very happy it was accepted for publication in JQC; it is the highest quality venue at which I would expect the paper to be on topic, and if it wasn't accepted there I was really not sure where I would send it otherwise.

In the future I will publish pre-prints online, so the publication before editing can still be publicly available to everyone. But if you cannot get a copy of this (or any of the other papers I have co-authored so far), don't hesitate to shoot me an email for a copy of the off-print. Hopefully I will have some more work to share in the near future on the blog! I currently have two papers I am working on with related topics, one on visualizing journey to crime flow data, and another paper with Emily Owens and Matthew Freedman of Cornell comparing journey to work data with journey to crime data.

As a teaser for this paper, here is the structured abstract and a graph demonstrating my estimated moving home effect.

Objectives
This study aims to test whether the home location has a causal effect on the crime location. To accomplish this the study capitalizes on the natural experiment that occurs when offenders move, and uses a unique metric, the distance between sequential offenses, to determine whether the offense location changes when an offender moves.

Methods
Using a sample of over 40,000 custodial arrests from Syracuse, NY between 2003 and 2008, this quasi-experimental design uses t tests of mean differences and fixed effects regression modeling to determine if moving has a significant effect on the distance between sequential offenses.

Results
This study finds that when offenders move they tend to commit crimes in locations farther away from past offences than would be expected without moving. The effect is rather small though, both in absolute terms (an elasticity coefficient of 0.02), and in relation to the effect of other independent variables (such as the time in between offenses).

Conclusions
This finding suggests that the home has an impact on where an offender will choose to commit a crime, independent of offence, neighborhood, or offender characteristics. The effect is small though, suggesting other factors may play a larger role in influencing where offenders choose to commit crime.

Using Bezier curves to draw flow lines

As I talked about previously, great circle lines are an effective way to visualize flow lines, as the bending of the arcs creates displacement among over-plotted lines. A frequent question that comes up though (see an example on GIS.stackexchange and on the flowing data forums) is that great circle lines don't provide enough bend over short distances. For visualizing journey to crime data (one of the topics I am interested in), one has the problem that most known journeys are within one particular jurisdiction or otherwise short distances.

In the GIS question I linked to above I suggested utilizing half circles, although that seemed like overkill. I have currently settled on drawing an arcing line utilizing quadratic Bezier curves. For a thorough demonstration of Bezier curves, how to calculate them, and to see one of the coolest interactive websites I have ever come across, check out A primer on Bezier curves – by Mike "Pomax" Kamermans. These are flexible enough to produce any desired amount of bend (and are simple enough for me to be able to program!) Also I think they are more aesthetically pleasing than irregular flows. I've seen some programs use hook-like bends (see an example of this flow mapping software from the Spatial Data Mining and Visual Analytics Lab), but I'm not all that fond of that, for either aesthetic reasons or for aiding the visualization.

I won't go into too much detail here on how to calculate them (you can see the formulas for the quadratic equations on the Mike Kamermans site I referenced), but you basically: 1) define where the control point is located (the origin and destination are already defined), and 2) interpolate an arbitrary number of points along the line. My SPSS macro is set to 100, but this can be made either bigger or smaller (or conditional on other factors as well).

Below is an example diagram I produced to demonstrate quadratic Bezier curves. For my application, I suggest placing the control point perpendicular to the midpoint between the origin and destination. This creates a regular arc between the two locations, and one can control the direction of the arc conditional on the direction of the offset. In the SPSS function provided, the user then supplies a ratio of the height of the control point to the distance between the origin and destination locations (so points further away from each other will be given higher arcs). The diagram below uses Latex and the Tikz library (which has a handy function to calculate Bezier curves).
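Those two steps can be sketched in a few lines of Python (a minimal sketch mirroring the logic of my SPSS macro, not the macro itself; the function name and signature are my own):

```python
import math

def quad_bezier_arc(x0, y0, x1, y1, height_ratio=0.5, n=100):
    """Interpolate n points along a quadratic Bezier curve whose control
    point sits perpendicular to the midpoint of the origin-destination
    line, at a height of height_ratio times the O-D distance."""
    mx, my = (x0 + x1) / 2.0, (y0 + y1) / 2.0  # midpoint of O-D
    dx, dy = x1 - x0, y1 - y0
    dist = math.hypot(dx, dy)
    if dist == 0:
        return [(x0, y0)] * n
    # unit vector perpendicular to the O-D line (flip signs to bend the other way)
    px, py = -dy / dist, dx / dist
    cx = mx + px * height_ratio * dist
    cy = my + py * height_ratio * dist
    pts = []
    for i in range(n):
        t = i / (n - 1)
        # quadratic Bezier: B(t) = (1-t)^2*P0 + 2(1-t)t*C + t^2*P1
        bx = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        by = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        pts.append((bx, by))
    return pts
```

Note the curve passes through the origin and destination but not the control point itself; with height_ratio = 0.5, the arc's midpoint rises to half the control point's height, i.e. a quarter of the O-D distance.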

Here is a simpler demonstration of controlling the direction and adjusting the control point to produce either a flatter arc or an arc with more eccentricity.

Here is an example displaying 200 JTC lines from the simulated data that comes with the CrimeStat program. The first image shows the original straight lines, and the second shows the curved lines using a control point at a height of half the distance between the origin and destination coordinates.

Of course, both are most definitely still quite crowded, but what do people think? Is my curved lines suggestion beneficial in this example?

Here I have provided the SPSS function (and some example data) used to calculate the lines (I then use the ET GeoWizards add-on to turn the points into lines in ArcGIS). Perhaps in the future I will work on an R function to calculate Bezier curves (I'm sure they could be of some use), but hopefully for those interested this is simple enough to program your own function in whatever language of interest. I have the start of a working paper on visualizing flow lines, and I plan on this being basically my only unique contribution (everything else is just a review of what other people have done!)

One could be fancier as well and make the curves differ based on other factors. For instance, make the control point closer to either the origin or destination if the flow amount is asymmetrical, or make the control point further away (and subsequently make the arc larger) if the flow is more voluminous. Ideas for the future I suppose.

CJ blog watch! Any ones I’m missing?

I follow a lot of blogs. I don't personally write a lot about criminology or criminal justice related matters (maybe in the future when I have more time or inclination), but I figured I would share some of my favorites and query the crowd for more recommendations.

So a few with general discussion related to criminology and criminal justice matters are:

Both sites are by well-known criminologists/criminal justicians. I am also aware of a few blogs written by current/former police chiefs:

  • Tom Casady's The Director's Desk. Tom Casady is currently the director of public safety for Lincoln, Nebraska, and was previously the Police Chief of Lincoln's department for quite some time. Tom is also very active in a variety of criminology/criminal justice organizations (so if you go to a related conference there is a good chance he is around somewhere!)
  • Chief’s Blog by Chief Ramsay of the Duluth Police Dept in Minnesota.

There are also a few that are highly focused on crime mapping & analysis:

  • Location Based Policing by Drew Dasher. He is a crime analyst for the Lincoln Nebraska PD.
  • Saferview – crime, fear and mapping: A blog by a retired police officer who is a student at University College London.
  • Diego Valle-Jones: Although his blog has a wider variety of topics, he has a series of very detailed posts and analyses on violence in Mexico and Central American nations. I know crime stats are frequent fodder for generic statistical demonstrations, but this is really insightful analysis. My favorite is his investigation into the validity of homicide statistics.

Are there others I am missing out on or should know about? Let me know in the comments if you have other suggestions.

FYI – the title of the blog post was motivated by Hans Toch’s new book, Cop Watch.

Making value by alpha maps with ArcMap

I recently finished reading Cynthia Brewer's Designing Better Maps: A Guide for GIS Users. Within the book she has an example of manually making a bi-variate map legend in ArcMap, and the light bulb went off in my mind that I could use that same technique to make value by alpha maps in ArcMap.

For a brief intro into what value by alpha maps are, Andy Woodruff (one of the creators) has a comprehensive blog post on what they are and their motivation. Briefly though: we want to visualize some variable in a choropleth map, but that variable is measured with varying levels of reliability. Value by alpha maps de-emphasize areas of low reliability by increasing the transparency of those polygons. I give a few other examples related to mapping reliability in this answer on the GIS site as well, How is margin of error reported on a map?. Essentially those techniques either only display certain high reliability locations, make two maps, or use techniques to overlay multiple attributes (like hatchings). But IMO the value by alpha map looks much nicer than maps with multiple elements, and so I was interested in how to implement them in ArcMap.

What value by alpha maps effectively do is reduce the saturation and contrast of polygons with high alpha blending, making them fade into the background and become less noticeable. I presented an applied example of value by alpha maps in my question asking for examples of beautiful maps on the GIS site. You can click through to see further citations for the map and reasons why I think the map is beautiful. But I include an image below as well (taken from the same Andy Woodruff blog post mentioned earlier).

Here I will show how to make the same maps in ArcMap, and present some discussion about their implementation, in particular suitable choices for the original choropleth colors. Much was already discussed by the value by alpha originators, but I suppose I didn't really appreciate them until I got my hands a little dirty and tried to make them myself. Note that this question on the GIS site, How to implement value-by-alpha map in GIS?, gives other resources for implementing value-by-alpha maps. But as far as I am aware this contribution about how to do them in ArcMap is novel.

Below I present an example displaying the percentage of female heads of households with children (abbreviated PFHH from here on) for 2010 census blocks within Washington, D.C. Here we can consider the reliability of the PFHH to depend on the number of households within the block itself (i.e. we would expect blocks with smaller numbers of households to have higher variability in the PFHH). The map below depicts blocks that have at least one household, and the subsequent PFHH maps will only display those colored polygons (about a third, 2132 out of 6507, have no households).

I chose the example because female headed households are a typical covariate of interest to criminologists for ecological studies. I also chose blocks as they are the smallest unit available from the census, and hence I expected them to show the widest variability in estimates. Below I provide an example of how one might typically display PFHH while simultaneously incorporating information on the baseline number of households the percentages are based on.

The first example separately displays the denominator number of households on the left and the percentage of female-headed households with children on the right, both in a sequential choropleth scheme (darker colors indicate a higher PFHH and number of households).

One can also superimpose the same information on one map. Sun & Wong (2010) suggest using cross-hatching on top of the choropleth colors to depict reliability, but here I will demonstrate using choropleth colors for the baseline number of households and proportional point symbols for the PFHH. I supplement the map on the right with a scatterplot that has the number of households on the X axis and the PFHH on the Y axis.

These both do an alright job (if you made me pick one, I think I would pick the side-by-side set of maps), but let's see if we can do better with value-by-alpha maps! The following tutorial is broken up into two sections. The first talks about actually generating the map, and the second talks about how to make the legend. Neither is difficult, but making the legend is more of a pain in the butt.

How to make the value by alpha map

First one can start out by making the base layer with the desired choropleth classification and color scheme. Note that here I changed what I am visualizing from a sequential color scheme of PFHH to location quotients with only four categories. I will discuss why later on in the post.
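For reference, a location quotient is simply the local rate divided by the overall rate. A minimal sketch with hypothetical counts and class breaks (the actual breaks used in the maps below differ):

```python
def location_quotient(local_count, local_total, overall_rate):
    """LQ: the block's rate relative to the city-wide rate."""
    return (local_count / local_total) / overall_rate

def lq_class(lq, breaks=(0.5, 1.0, 2.0)):
    """Bin an LQ into one of four categories (breaks are illustrative)."""
    for i, cut in enumerate(breaks):
        if lq <= cut:
            return i
    return len(breaks)

# A block where 6 of 20 households are female-headed with children,
# against a hypothetical city-wide rate of 15%, sits at twice the norm.
lq = location_quotient(6, 20, 0.15)
print(lq)            # 2.0
print(lq_class(lq))  # class 2 of 0-3
```

Classing the quotients this way is what lets the map use a small bivariate palette instead of a continuous ramp.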

Then one can make several copies of that layer (right click -> copy -> paste within hierarchy), one for each reliability classification you want to display. Here I will use 4 different reliability classifications. Note that after you make the copies, it is easier to manage them in the TOC if you group them.

Then one uses selection criteria to filter out only those polygons that fall within the specified reliability range, and sets the transparency for that level to the desired value.

And voila, you have your value-by-alpha map. Note that if, after you make the layers, you decide you want a different classification and/or color scheme, you can make the changes to one layer and then apply the changes to all of the other layers.

How to make the legend

Now making the legend is the harder part. If one goes to the layout view, one will see that since this example essentially has four layers superimposed on the same map, there are four separate legend entries. Below is what it looks like with my defaults (plus a vertical rule I have in my map).

What we want in the end is a bivariate scheme, with the PFHH dimension running up and down, and the transparency dimension running from one side to the other (the same as in the example mortality rate map at the beginning of the post). To do this, one has to convert the legends to graphics.

Then ungroup the elements so each can be individually manipulated. Note, sometimes I have to do this operation multiple times.

Then re-arrange the panels and labels into the desired format.

More tedious than making the separate layers, but not unreasonable if you only have to do it for one (or a small number of) maps. If you need to do it for a larger number of maps a better workflow will be needed, like creating a separate "fake inset" map that replicates the legend, making the legend in a separate tool, or just making the map entirely in a program where alpha blending is more readily incorporated. For instance, in statistical packages transparency is typically a readily available encoding that can be added to a graphic (they also allow continuous color ramps and continuous levels of transparency).
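As a sketch of what that encoding looks like in a statistical package, here the class color carries the PFHH dimension and the alpha channel carries the household count directly, with no layer copies needed. The four fill colors and the cap of 200 households are hypothetical; the resulting RGBA tuples could be handed to, e.g., matplotlib's `PolyCollection` as facecolors:

```python
# Four hypothetical fill colors, light to dark, for the four LQ classes.
LQ_COLORS = [(0.85, 0.85, 0.95), (0.65, 0.65, 0.85),
             (0.45, 0.30, 0.65), (0.30, 0.10, 0.45)]

def value_by_alpha_rgba(lq_class, n_households, max_n=200.0):
    """Encode the classed value as a color and the denominator as alpha."""
    r, g, b = LQ_COLORS[lq_class]
    alpha = min(n_households / max_n, 1.0)  # more households -> more opaque
    return (r, g, b, alpha)

print(value_by_alpha_rgba(3, 50))   # dark purple at 25% opacity
print(value_by_alpha_rgba(3, 500))  # dark purple, fully opaque
```

Because alpha is continuous here, there is no need to choose discrete reliability cut points at all, which is one of the advantages of skipping the layered-copies workaround.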

And voila, here is the final map. To follow is some discussion about choosing color schemes and whether you should use a black background or not.

Some discussion about color schemes

The Roth et al. (2010) paper in The Cartographic Journal and Andy Woodruff's blog post I cited at the beginning of this post initially talked about color schemes and utilizing a black background, but I didn't really appreciate the complexity of this choice until I went and made a value-by-alpha map of my own. In the end I decided to use location quotients to display the data, as the bivariate color scheme provides further contrast. I feel weird using a bivariate color scheme for a continuous scale (hence the conversion to location quotients), but I feel like I should get over that. Everything has its time and place, and set rules like that aren't good for anyone but bureaucrats or the mindless.

I certainly picked a complex dataset to start with, and the benefits of the value-by-alpha map over the two side-by-side maps (if any) are slight. I suspect mine don't look quite as nice as the ones created by Roth, Woodruff and company partially due to the greater amount of complexity. The map with the SatScan reliabilities I noted as one of my favorite maps is quite striking, but that is partly because the reliability has a very spatially contiguous pattern (although the underlying cancer mortality rate map is quite spatially heterogeneous). Here the spatial regularity is much weaker, in either the pattern being mapped or the reliability thresholds I had chosen. It does produce a quite pretty map though, FWIW.

For reference, here is the same map utilizing a black background. The only thing different in this map is that the most transparent layer is now set to 80% transparency instead of 90% (it was practically invisible at 90% with black as the modifying background color). Also, it was necessary to use the fake inset map for a legend that I talked about earlier, since the legend generated by ArcGIS always has white as the modifying color. If you refer back to the map with white as the modifying color, you can tell the black background produces greater contrast among the purples (with white as the modifying color, the location quotients 2.1 – 4 fully opaque and 4.1 – 12.6 at 40% transparency appear very similar).

The Roth et al. Cartographic Journal article gives other bivariate and nominal color scheme suggestions; you should take their advice. Hopefully in the future it will be simpler to incorporate bivariate color schemes in ArcMap, as it would make the process much easier (and hence more useful for exploratory data analysis).

I would love it if people point me to other examples in which value-by-alpha maps are useful. I think in theory it is a good idea, but the complexity introduced in the map is a greater burden than I initially estimated before I had made a few. I initially thought this would be useful for presenting the results of geographically weighted regression or perhaps cancer atlas maps in general (where sometimes people just filter out results below some population threshold). But maybe not, given the greater complexity introduced.

When should we use a black background for a map?

Some of my favorite maps utilize black (or dark) backgrounds. Some examples:

 

 

Steven Romalewski recently offered a slight critique of them in his blog post, Mapping NYC stop and frisks: some cartographic observations:

I know that recently the terrific team at MapBox put together some maps using fluorescent colors on a black background that were highly praised on Twitter and in the blogs. To me, they look neat, but they’re less useful as maps. The WNYC fluorescent colors were jarring, and the hot pink plus dark blue on the black background made the map hard to read if you’re trying to find out where things are. It’s a powerful visual statement, but I don’t think it adds any explanatory value.

I don’t disagree with this, and about all I can articulate in their favor so far is essentially that well-lit places create a stunning contrast with the dark background, while white-background maps just create a contrast that is not quite as stunning!

I think the proof of a black background's usefulness can be seen in the example value-by-alpha maps and the flow maps of James Cheshire, where a greater amount of contrast is necessary. IMO in the value-by-alpha maps the greater contrast is needed for the greater complexity of the bivariate color scheme, and in Cheshire's flow maps it is needed because lines frequently don't have enough areal girth to be effectively distinguished from the background.

I couldn’t find any more general literature on the topic though. It doesn’t seem to be covered in any of the general cartography books that I have read. Since it is really only applicable to on-screen maps (you certainly wouldn’t want to print out a map with a black background), perhaps it just hasn’t been addressed. I may be looking in the wrong place though; many text editors have a high-contrast setting where text is white on a dark background (likely for the same reasons it looks nice in maps), so it can’t be so foreign a concept that there is no scholarly literature on the topic.

So in short, I guess my advice is to utilize a black background when you want to highly focus attention on the light areas, essentially at the cost of greatly diminishing the contrast with other faded elements in the map. This is perhaps a good thing for maps intended as complex statistical summaries, and the Mapnificent travel times map is probably another good example where high focus in one area is sufficient and other background elements are not needed. I'm not sure black backgrounds are really needed (or useful) for choropleth maps though, and more complicated thematic maps certainly would not fit this bill.

To a certain extent I wonder what lessons from black backgrounds can be applied to the backgrounds of statistical graphics more generally. Leave me some comments if you have any thoughts or other examples of black background maps!