For my current work as a data scientist, I spend most of my time writing SQL queries, generating some sort of predictive model on that data using python, and automating those data pipelines using additional command line scripts. Pretty much nothing coding wise I do on a day to day basis I learned in my entire educational career.
The only specific coding classes I took in school were SAS in undergrad and SPSS in grad. All other coding was in Stata and a very tiny bit in R, both incidental to statistical classes. Even those should hardly count, as all it entails is load a dataset and run reg y x
or something similar.
That focuses on the software engineering side – the other side of being a data scientist is essentially being an applied mathematician. That may sound fancy, but the work I do I like to think is more akin to accounting with probabilities (where I have to personally create models to estimate the probabilities). While I had extensive quantitative training in graduate school, again nothing I was taught even remotely resembles the mathematics I use on a regular basis at my job.
My social science education entirely focused on causal inference, estimating parameters on the right hand side of the regression equation. I did not cover prediction/forecasting/machine-learning one iota in my classes. I did not even have any classes on cost-benefit analysis, which is more akin to me calculating potential return on investment when I am creating new machine learning models for my company.
The only thing I do regularly at my job you could reasonably point to specific educational training/prep on was presenting results in PowerPoint presentations.
That being said, no way I would be in my current position if I did not have a PhD. For a potential counter-factual, I debated on dropping out of undergrad at one point and going to community college to install HVAC systems. I feel pretty comfortable assuming I would not have ended up as a data scientist if I took that career path. (Before you think to poo-poo on that career path choice, it is easily possible my personal net worth would be in the same ballpark at this point in my life in that counter-factual installing HVAC world. There are significant opportunity costs you are eating when you pursue a PhD.)
So what exactly was the value of my PhD? While you take some classes as a PhD student, I don’t see the main benefit of those as being vocational in nature. When pursuing a PhD it is a full time endeavor, and it is the entire environment that marks it as a major difference from undergraduate education. Pretty much every conversation you have as a PhD student is focused on science.
A second major difference is that you are not a passive consumer of scientific research – you have bridged to becoming a producer of that knowledge. A PhD dissertation by its nature is very sink or swim – you are expected to come up with a particular research topic/agenda, and conduct the appropriate analysis to investigate that particular topic, then share your results with the world. This is very different than working in a job where someone tells you what to do – you show up in the morning and you have 100% latitude to pursue whatever you want.
These two things together I believe are where the value lies in a PhD. The independence necessary to be a successful in a PhD is by its nature not something you can get via prior work experience (unless you count say starting your own business). This coupled with the scientific environment provides an atmosphere where constant learning is necessary to get to the finish line of the dissertation. Even if I still was an academic, it is always necessary for me to consume new material, teach myself new things, and apply that to the work I am pursuing.
So while I did not learn python programming or machine learning in grad school, I just go out, try to consume as much as I can on the material, and apply that knowledge to solve the current problems I am dealing with. There will always be something new I need to teach myself while I am still working, but that is OK. I have the means to teach myself those things from my PhD experience. I am not sure I would have really ever gotten to that point just by focusing on vocational aspects (e.g. taking classes on machine learning or programming) – I think I only got to that point by having to pursue my own independent research.
I’ve been musing this more as potential students ask me whether it is worth it to pursue a PhD. I have mixed feelings, but have settled on this simple dichotomy – if you are only pursuing a PhD because you want to teach, I have grave reservations against recommending a PhD. The supply for these professor positions greatly outpace the demand from universities. So even if you do well as a student, there is no guarantee you will get a tenure track position. In the current market where there are dozens of really good candidates for any position, network effects can dominate that decision.
But, if you are more open to other potential positions, such as public sector researcher positions, think tanks, or private sector data science, I feel more comfortable in saying going for the PhD is a reasonable career choice.
Unfortunately, current education in terms of preparing you to be competitive for private sector data science is somewhat lacking across the social sciences. As I stated at the beginning of this post, I did not personally learn any of the tools I use regularly at my job via traditional education, but more as ancillary to my particular research interests. To follow in my path, the research you pursue needs to somewhat match the skills the current market wants, and these include:
- predictive modeling (e.g. tree based models, boosted models, deep learning)
- legitimate coding skills in python/R, as well as tools like git/Docker
- working with moderately large datasets (SQL, Hadoop, or online AWS)
- data visualization to explain results/models
I am hoping my former colleagues in social sciences will do a better job of expanding the graduate curricula to better teach these skills. They have utility for the more traditional research as well. I am not holding my breath though for that. So in the meantime if you are pursuing a PhD in the social sciences, and you want to pursue a data science job (or simply hedge in case you cannot land a tenure track gig), these are skills you need to develop on your own while also doing your PhD.
1 Comment