My Blogging in Review in 2013

2013 was my second year of blogging. I published 40 posts in 2013 (for a total of 72), and my cumulative site views were just shy of 21,000 for the year. I only received 7,200 site views in 2012, so the blog has seen a fair bit of growth. The chart below aggregates the site views per month from the beginning (December 2011) through December 2013. December has been a bit of a dip, with an average of only around 60 views per day, but I was up to an average of 78 and 75 views per day in October and November respectively.

The large uptick in March was due to the Junk Charts Challenge being mentioned by Kaiser Fung. I got over 500 site views that day, and have totalled 765 referrals from the JunkCharts domain. This is pretty similar to the bursty behavior I noted on the CV blog: one good tweet or mention by a prominent figure will boost visibility by a large margin.

Most of the regular traffic though comes from generic internet searches, mainly for SPSS-related material. A few of my earlier posts are among my most popular: Comparing continuous distributions of unequal size groups in SPSS (2,468 total views), Hacking the default SPSS chart template (2,237), and Avoid Dynamite Plots! Visualizing dot plots with super-imposed confidence intervals in SPSS and R (1,542). The Junk Charts Challenge post has a total of 1,804 views, but it seems to me that it was more of a flood initially and then a trickle, as opposed to the steady views the other posts bring.

Last year I said I would blog about a few topics and failed to write a post about any of them, so I won’t do that again this year. I will however state that I am currently on the job market, as I recently defended my prospectus. If you are aware of a job opportunity you think I would be interested in, or would like to talk to me about a consulting project, feel free to send me an email (you can see my CV for my qualifications and a brief discussion of past and current consulting services I have provided).

Some sites give advice about maintaining a blog and attracting visitors (such as writing posts so often). My advice is to write quality material, and the rest is just icing on the cake. Hopefully I have more cake for you in the near future.

Why I feel SPSS (or any statistical package) is better than Excel for this particular job

I debated pulling an Andrew Gelman and adding a P.S. to my prior Junk Charts Challenge post, but it ended up being too verbose, so I just made an entirely new follow-up. To start, the discussion has so far evolved from this series of posts:

  • The original post on remaking a great line chart by Kaiser Fung, with the suggestion that the task (data manipulation and graphing) is easier in Excel.
  • My response on how to make the chart in SPSS.
  • Kaiser’s response to my post, in which I doubt I swayed his opinion on using Excel for this task!

It appears to me, based on the discussion so far, that the only real quarrel is whether the data manipulation is complicated enough, compared to the ease of pointing and clicking in Excel, to justify using Excel. Recreating Kaiser’s chart in SPSS does take some advanced knowledge of sorting and using lags to identify the pits and recoveries (the same logic could be extended to the data manipulations Kaiser says I skim over, as long as you can numerically or externally define what the start of a recession is).
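
To give a flavor of what identifying a pit and a recovery with lags involves, here is a minimal sketch in Python. It is not the SPSS syntax from the original post, and it rests on assumptions of its own: the first value of the series is taken as the pre-recession level, the pit is the first minimum, and the recovery is the first later month where the series crosses back to that starting level. The function names and the numbers are hypothetical.

```python
def lag(values, k=1):
    """Shift a series back by k positions, padding the front with None."""
    return [None] * k + list(values[:len(values) - k])


def pit_and_recovery(levels):
    """Return (pit_index, recovery_index) for one recession series.

    Assumption: levels[0] is the pre-recession level. The recovery index
    is None when the series never climbs back to that level.
    """
    start = levels[0]
    previous = lag(levels)
    # Pit: position of the first minimum value in the series.
    pit = min(range(len(levels)), key=levels.__getitem__)
    recovery = None
    for i in range(pit + 1, len(levels)):
        # Recovery: at or above the start now, but below it the month before.
        if levels[i] >= start and previous[i] < start:
            recovery = i
            break
    return pit, recovery


# Made-up index values, for illustration only.
example = [100, 98, 95, 93, 94, 97, 100, 102]
pit, recovery = pit_and_recovery(example)
```

With a different numeric definition of the recession start, only the `start` line would need to change.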

All things considered for the internet, discussion has been pretty cordial so far. Although it is certainly sprinkled in my post, I didn’t mean for my post on SPSS to say that the task of grabbing data from online, manipulating it, and creating the graph was in any objective way easier in SPSS than in Excel. I realize pointing-and-clicking in Excel is easier for most, and only a few really adept at SPSS (like myself) would consider it easier in SPSS. I write quite a few tutorials on how to do things in SPSS, and that was one of the motivations for the tutorial. I want people using SPSS (or really any graphing software) to make nice graphs – and so if I think I can add value this way to the blogosphere I will! I hope my most value added is through SPSS tutorials, but I try to discuss general graphing concepts in the posts as well, so even for those not using SPSS it hopefully has some other useful content.

My original post wasn’t meant to discuss why I feel SPSS is a better tool for this particular task, although it is certainly a reasonable question to ask (to be frank, I tried to avoid it to prevent flame wars, but it appears I’ve stepped in it at this point). As one of the comments on Kaiser’s follow-up notes (and I agree), some tools are better for some jobs and we shouldn’t prefer one tool because of some sort of dogmatic allegiance. To make it clear though, and it was part of my motivation to write my initial response to the challenge post, I strongly disagree that this particular task, which entails grabbing data from the internet, manipulating it, creating a graph, and updating said graph on a monthly basis, is better done in Excel. For a direct example of my non-allegiance to doing everything in SPSS for this job, I wouldn’t do the grabbing-the-data-from-the-internet part in SPSS (indeed, it isn’t even directly possible unless you use Python code). Assuming it could be fully automated, I would write a custom SPSS job that manipulates the data after a wget command grabs the data, and have it all wrapped up in one bat file that runs on a monthly timer.

To go off on a slight tangent, why do I think I’m qualified to make such a distinction? Well, I use both SPSS and Excel on a regular basis. I wouldn’t consider myself a wiz at Excel or VBA for Excel, but I have made custom Excel macros in the past to perform various jobs (make and format charts/tables etc.), and I have one task (a custom daily report of the crime incidents reported the previous day) I do on a daily basis at my job in Excel. So, FWIW, I feel reasonably qualified to make decisions on what tasks I should perform in which tools. So I’m giving my opinion, the same way Kaiser gave his initial opinion. I doubt my experience is as illustrious as Kaiser’s, but you can go to my CV page to see my current and prior work roles as an analyst. If I thought Excel, or Access, or R, or Python, or whatever was a better tool I would certainly personally use and suggest that. If you don’t have a little trust in my opinion on such matters, well, you shouldn’t read what I write!

So, again to be clear, I feel this is a job better suited to SPSS (both the data manipulation and creating the graphics), although I admit it is initially harder to write the code to accomplish the task than pointing, clicking and going through chart wizards in Excel. Here I will try to articulate my reasons.

  • Any task I do on a regular basis, I want to be as automated as possible. Having to point-click and copy-paste on a regular basis both invites human error and wastes time. I don’t doubt you could fully (or very nearly) automate the task in Excel (as the comment on my blog post mentions). But this will ultimately involve scripting in VBA, which undercuts any claim that the Excel solution is easier than the SPSS solution.
  • The breadth of data management capabilities, statistical analysis, and graphics is much larger in SPSS than in Excel. Consider the VBA code necessary to replicate my initial VARSTOCASES command in Excel, that is, reshaping wide data to stacked long form. Consider the VBA code necessary to compute summary statistics over different groups without knowing what the different groups are beforehand. These are just a sampling of data management tools that are routine in statistics packages. In terms of charting, the most obvious function lacking in Excel is that it currently does not have facilities to make small-multiple charts (you can see some exceptional hacks from Jon Peltier, but those are certainly more limited in functionality than SPSS). Not mentioned (but most obvious) are the statistical capabilities of statistical software!
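
For readers who have not met these two operations, here is a minimal sketch of them in plain Python. It is not VARSTOCASES itself and not the syntax from the original post, only an illustration of what reshaping wide data to long form and summarizing over groups not known in advance look like. The function names, field names, and numbers are all made up.

```python
from collections import defaultdict
from statistics import mean


def wide_to_long(rows, id_field, value_fields):
    """Stack wide rows into one record per (row, value field) pair."""
    return [
        {id_field: row[id_field], "variable": field, "value": row[field]}
        for row in rows
        for field in value_fields
    ]


def group_means(records, group_field, value_field):
    """Average value_field within each group, discovering groups on the fly."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_field]].append(record[value_field])
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical wide data: one column per recession, one row per month.
wide = [
    {"month": 0, "recession_a": 100.0, "recession_b": 100.0},
    {"month": 1, "recession_a": 99.0, "recession_b": 98.0},
    {"month": 2, "recession_a": 98.0, "recession_b": 96.0},
]

long_rows = wide_to_long(wide, "month", ["recession_a", "recession_b"])
means = group_means(long_rows, "variable", "value")
```

The group labels never appear in `group_means`; they are picked up from the data, which is the point of the second example in the list above.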

So certainly, this particular job could be done in Excel, as it does not require any functionality unique to a stats package. But why hamstring myself with these limitations from the outset? Frequently after I build a custom, routine analysis like this I continually go back and add more charts, so even if I have a good conceptualization of what I want to do at the outset, there is no guarantee I won’t want to add this functionality later. In terms of charting, not having flexible small-multiple charts is really a big deal; they can be used all the time.

Admittedly, this job is small enough in scope that if, say, the prior analyst was doing a regularly updated chart via copy-paste like Kaiser is suggesting, I would consider just keeping that same format (re-writing the code in SPSS certainly has an opportunity cost, and because the chart is only updated monthly, it would take quite some time to recover that cost even if the task were fully automated). I just personally have enough experience in SPSS that I know I could script a solution in SPSS quicker from the outset than in Excel (I certainly can’t extrapolate that to anyone else though).

Part of both my preference and experience in SPSS comes from the jobs I personally have to do. For example, I routinely pull a database of 500,000 incidents, do some data cleaning, merge this to a table of 300,000 charges and offenses, and then merge to a second table of geocoded incident locations. Using this data I then routinely subset it, create aggregate summaries and tables, estimate various statistics and models, make some rudimentary maps, or even export the necessary data to import into GIS software.
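
As a toy illustration of the merge-then-aggregate pattern described above, here is a minimal Python sketch. It is not the actual workflow or data: the tables have three rows instead of hundreds of thousands, the field names are hypothetical, and the merge assumes at most one matching row per key in the second table.

```python
from collections import Counter


def left_merge(left_rows, right_rows, key):
    """Attach fields from right_rows to each left row that shares `key`.

    Assumes `key` is unique in right_rows; unmatched left rows are kept as-is.
    """
    lookup = {row[key]: row for row in right_rows}
    return [{**lookup.get(row[key], {}), **row} for row in left_rows]


# Made-up incident and geocoded-location tables.
incidents = [
    {"incident_id": 1, "offense": "burglary"},
    {"incident_id": 2, "offense": "assault"},
    {"incident_id": 3, "offense": "burglary"},
]
locations = [
    {"incident_id": 1, "beat": "A"},
    {"incident_id": 2, "beat": "A"},
]

merged = left_merge(incidents, locations, "incident_id")

# Aggregate summary: number of incidents per offense type.
offense_counts = Counter(row["offense"] for row in merged)
```

Incident 3 has no geocoded location here, so it survives the merge without a `beat` field, which is the kind of case the data cleaning step has to account for.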

For argument’s sake, this could mostly be done in SQL (with the exception of some of the more complicated data cleaning), but certainly no reasonable person should consider doing these multiple table merges and data cleaning in Excel (the nice interactive facilities for working with a spreadsheet in Excel are greatly diminished with any tables that take more than a few scrolls to see). Statistical packages are really much more than tools to fit models; they are tools for working with and manipulating data. If you have to conduct routine tasks in which you manipulate data (something I assume most analysts have to do), I would highly recommend you consider learning statistical software, the same way I would recommend you get to know SQL.

To be more balanced, here are things (knowing SPSS really well and Excel not as thoroughly) I think Excel excels at compared to SPSS:

  • Ease of making nicely formatted tables
  • Ease of directly interacting and editing components of charts and tables (this includes adding in supplementary vector graphics and labels).
  • Sparklines
  • Interactive Dashboards/Pivot Tables

Routine data management is not one of them, and really only sparklines and interactive dashboards are functionality for which I would prefer to make an end product in Excel over SPSS (and that doesn’t mean the whole workflow needs to be in one piece of software). I clean up ad-hoc tables for distribution in Excel all the time, because (as I said above) editing them in Excel is easier than editing them in SPSS. Again, my opinion, FWIW.

My experience blogging in 2012

I figured I would write a brief post about my experience blogging. I created this blog and published my first post in December of 2011. Since then, in 2012, I published 30 blog posts and totaled 7,200 views. While I thought the number was quite high (albeit a bit disappointing compared to the numbers of Larry Wasserman), it is still many more people than would have listened to what I had to say if I didn’t write a blog. When starting out I averaged under 10 views a day, but throughout the year it steadily grew, and now I average about 30 views per day. The post that had the most traffic in one day was When should we use a black background for a map?, with 73 views, and that was largely because of some Twitter traffic (a result of Steven Romalewski tweeting it and then it being re-tweeted by Kenneth Field).

I started the blog because I really loved reading a lot of others’ blogs, and I hope to encourage others to start blogging as well. It is a nice venue for an academic to share work and opinions, as it is more flexible and can be less formal than articles. Also, much of what I write about I would just consider helpful tips or generic discussion that I wouldn’t get to discuss otherwise (SPSS programming and graph tips will never make it into a publication). One of my main motivations was actually R-Bloggers and the SAS blog roll; I would like a similarly active community for SPSS, but there is none really that I have found outside of the NABBLE forum (some exceptions are Andy Field, The Analysis Factor, Jon Peck, and these few posts by a Louis K that I only found through the labyrinth that is the IBM developerworks site (note I think you need to be signed in to even see that site), but they certainly aren’t very active and/or don’t write much about SPSS). I assume the best way to remedy that is to lead by example! Most of my more popular posts are ones about SPSS, and I frequently get web traffic via general Google searches of SPSS + something else I blogged about (hacking the template and comparing continuous distributions are my two top posts).

The blog is also just another place to highlight my academic work and bring more attention to it. WordPress tells me how often someone clicks a link on the blog, and someone has clicked the link to my CV close to 40 times since I made the blog. Hopefully I will have some pre-print journal articles to share on the blog in the near future (as well as my prospectus). My post on my presentation at ASC did not generate much traffic, but I would love to see a similar trend for other criminologists/criminal justicians in the future. My work isn’t perfect for sure, but why not get it out there, at least for it to be judged and hopefully get feedback?

I would like to blog more, and I actively try to write something if I haven’t in a few weeks, but I don’t stress about it too much. I certainly have an infinite pool of posts to write about programming and generating graphs in SPSS. I have also thought about talking about historical graphics in criminology and criminal justice, or generally talking about some historical and contemporary crime mapping work. Other potential posts I’d like to write are a more formal treatment of why I loathe most difference-in-differences designs, and perhaps one about the silliness that can ensue when using null-hypothesis significance testing to determine racial bias. But they will both take more careful elaboration, so they might not come anytime soon.

So in short, SPSSers, crime mappers, criminologists/criminal justicians, I want you to start blogging, and I will eagerly consume your work (and in the meantime hopefully produce some more useful stuff on my end)!