I have no clue how to interview for data scientists

At work we have been expanding quite a bit, so this comes with many interviews for data science candidates (as well as MLOps and a few business analysts). For a bit while we were between directors, I did the initial screening interviews (soft questions, get some background, just filter out those really out of depth). I filtered very few people in the end, and I have no clue if that was good/bad. One of the hard things about interviews is it is a very lossy feedback loop. You only really know if it worked out well 6+ months after it is over. And you only get on-policy feedback for those you hired – you don’t know if you filtered out someone who would have worked really well.

I have done more of the technical interviews though recently, which are intended to be more discriminatory. I don’t believe I do a good job at this either, or at least everyone seems OK (no clear uber bad and no clear uber good). It has been particularly hard to hire people who can come in and be seniors/independent from the get-go, as even people with many years of experience it isn’t clear to me they have the right experience to be really independent after on-boarding for a month.

For a while we did the homework thing – here is a dataset, do some data manipulation and fit a model. (You can see the homework I made on GitHub, longer version, shorter version.) We have stopped doing these though, partly because everyone’s homework looks the same – what I call copy-pasta code. So I feel asking people to spend 4-8 hours (or likely more) on homework is not good for the very little benefit it provides. The nature of simple homework assignments I believe is so superficial I don’t think you can do it in a way that is amenable to anything but copy-pasta.

So now during the technical interview we do a grilling of questions. We have some regulars, but they do not appear to me to be really discriminatory either.

So I typically start by asking people to pick a project they think had the most value, and we do a deep dive into that. Many people who are good programmers (and some even with math degrees) don’t have what I would consider real fundamental knowledge of the models they are building. How many parameters do you have in your deep learning model? Because you oversampled how did you recalibrate the predictions to correctly estimate false positives? How did you evaluate the return on investment to your model? I feel these are quite fair – I let you pick the best work you have done in your career! You should be quite familiar with it.

So now that I am unsure if you should be left alone to build models, we migrate to some specific technical questions. These can be explain the difference between random forests and xgboost models, explain an ROC curve to a business person, what is the difference between a list and a tuple in python, if you had to query a million claims once a week and apply a predictive model how would you do it, etc. (Tend to focus more on math/stats than programming.)

Those examples above most people answer just fine (so they are maybe worthless, they don’t discriminate anyone). But a few pretty much everyone we interview fails – one is this question about calculating expected utility for auditing claims with a different dollar value amount I shared on LinkedIn the other day:

Pretend you have a model that predicts the probability an insurance claim is fraudulent. You have two claims; one $2000 with a 50% probability, and one $10000 with a 20% probability. If you had to choose a single claim to audit, which would one would you pick and why?

Am I crazy to expect someone with a data science degree to be able to answer this question coherently? (Pretty close to every model we have focusing on health insurance claims at Gainwell looks like this in real life! It is not a weird, unrealistic scenario.) I have more generic variants as well, such as how do you take a predicted probability and a claim value and know it is worth it to audit. Or for a more generic one how do you know how to set the threshold to audit claims given a model prediction? These questions appear too discriminatory, in that they filter out even very experienced individuals.

This and the threshold question kills me a little inside everytime a senior person mangles the logic of it – it really signals to me you don’t understand how to translate mathematical models to make relevant business decisions. To be an independent data scientist this is a critical skill – you need it to know how to structure the model as well as how to feed that back into whatever human or automated decision making process uses that models predictions. It is what distinguishes data scientists from software engineers – I am an applied mathematician who knows how to code is how I view my role. (It is one of the reasons I think a PhD is valuable – being able to think mathematically like that in broader strokes is a typical step in the dissertation process for folks.)

So I am stuck between questions everyone can answer and questions no one can answer. I feel like I might as well flip a coin after the initial entry level interview and not waste everyone’s time.