I have no clue how to interview for data scientists

At work we have been expanding quite a bit, so this comes with many interviews for data science candidates (as well as MLOps engineers and a few business analysts). For a bit, while we were between directors, I did the initial screening interviews (soft questions, get some background, just filter out those really out of their depth). I filtered very few people in the end, and I have no clue whether that was good or bad. One of the hard things about interviews is that the feedback loop is very lossy. You only really know if it worked out well 6+ months after it is over. And you only get on-policy feedback for those you hired – you don’t know if you filtered out someone who would have worked out really well.

I have done more of the technical interviews recently though, which are intended to be more discriminating. I don’t believe I do a good job at this either – or at least everyone seems OK (no clear uber-bad and no clear uber-good). It has been particularly hard to hire people who can come in and be seniors/independent from the get-go; even for people with many years of experience, it isn’t clear to me that they have the right experience to be really independent after a month of onboarding.

For a while we did the homework thing – here is a dataset, do some data manipulation and fit a model. (You can see the homework I made on GitHub, longer version, shorter version.) We have stopped doing these though, partly because everyone’s homework looks the same – what I call copy-pasta code. So asking people to spend 4–8 hours (or likely more) on homework does not seem worth the very little benefit it provides. I believe simple homework assignments are by their nature so superficial that you can’t design one amenable to anything but copy-pasta.

So now during the technical interview we do a grilling of questions. We have some regulars, but they do not appear to me to be really discriminatory either.

So I typically start by asking people to pick a project they think had the most value, and we do a deep dive into that. Many people who are good programmers (and some even with math degrees) don’t have what I would consider real fundamental knowledge of the models they are building. How many parameters do you have in your deep learning model? Because you oversampled, how did you recalibrate the predictions to correctly estimate false positives? How did you evaluate the return on investment of your model? I feel these are quite fair – I let you pick the best work you have done in your career! You should be quite familiar with it.
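For the oversampling question, one standard answer is the prior-correction adjustment on the odds scale. Here is a minimal sketch of that idea – the 1% base rate and 50% training rate are made-up numbers for illustration:

```python
def recalibrate(p, train_rate, base_rate):
    """Map a predicted probability from an oversampled training distribution
    back to the population distribution, by rescaling the odds by the ratio
    of the true prior odds to the training prior odds."""
    odds = p / (1 - p)
    corrected = odds * (base_rate * (1 - train_rate)) / (train_rate * (1 - base_rate))
    return corrected / (1 + corrected)

# Fraud is 1% in the wild but was oversampled to 50% in training:
# a 50/50 prediction on the balanced data maps back to the 1% base rate.
print(round(recalibrate(0.5, train_rate=0.5, base_rate=0.01), 4))  # 0.01
```

If the training and population rates match, the function returns the probability unchanged, which is a quick sanity check on the formula.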

So now that I am unsure whether you should be left alone to build models, we migrate to some specific technical questions. These can be: explain the difference between random forest and xgboost models, explain an ROC curve to a business person, what is the difference between a list and a tuple in Python, if you had to query a million claims once a week and apply a predictive model how would you do it, etc. (I tend to focus more on math/stats than programming.)

Those examples above most people answer just fine (so they are maybe worthless – they don’t discriminate between anyone). But there are a few that pretty much everyone we interview fails. One is this question about calculating expected utility for auditing claims with different dollar values, which I shared on LinkedIn the other day:

Pretend you have a model that predicts the probability an insurance claim is fraudulent. You have two claims; one $2000 with a 50% probability, and one $10000 with a 20% probability. If you had to choose a single claim to audit, which one would you pick and why?

Am I crazy to expect someone with a data science degree to be able to answer this question coherently? (Pretty close to every model we have focusing on health insurance claims at Gainwell looks like this in real life! It is not a weird, unrealistic scenario.) I have more generic variants as well, such as how do you take a predicted probability and a claim value and know it is worth it to audit. Or for a more generic one how do you know how to set the threshold to audit claims given a model prediction? These questions appear too discriminatory, in that they filter out even very experienced individuals.
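For what it’s worth, the arithmetic the question is after fits in a few lines – treating the claim amount as the full recovery if fraud is found, which is a simplification:

```python
# Expected return of auditing each claim: probability of fraud times dollar value.
claims = [
    {"value": 2000, "p_fraud": 0.50},
    {"value": 10000, "p_fraud": 0.20},
]

for c in claims:
    c["expected_return"] = c["p_fraud"] * c["value"]

best = max(claims, key=lambda c: c["expected_return"])
# The $10,000 claim wins: 0.2 * 10000 = 2000 > 0.5 * 2000 = 1000.
print(best["value"], best["expected_return"])
```

The point is not the code, it is that the candidate can set up the expected-value comparison at all.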

This and the threshold question kill me a little inside every time a senior person mangles the logic – it signals to me that you don’t understand how to translate mathematical models into relevant business decisions. To be an independent data scientist this is a critical skill – you need it to know how to structure the model, as well as how to feed the results back into whatever human or automated decision-making process uses that model’s predictions. It is what distinguishes data scientists from software engineers – I view my role as an applied mathematician who knows how to code. (It is one of the reasons I think a PhD is valuable – being able to think mathematically in broader strokes like that is a typical step in the dissertation process.)
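The threshold version has the same flavor – a minimal sketch, assuming a fixed per-audit cost (the $500 figure and the two example claims are made up):

```python
def worth_auditing(p_fraud, claim_value, audit_cost):
    """Audit only when the expected recovery exceeds the cost of the audit."""
    return p_fraud * claim_value > audit_cost

# With a hypothetical $500 audit cost:
print(worth_auditing(0.20, 10000, 500))  # True  (expected recovery $2000)
print(worth_auditing(0.05, 3000, 500))   # False (expected recovery $150)
```

Note the decision threshold on the predicted probability is not a fixed number here – it moves with the claim value, which is exactly the business logic the question is probing for.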

So I am stuck between questions everyone can answer and questions no one can answer. I feel like I might as well flip a coin after the initial entry level interview and not waste everyone’s time.



  1. Always insightful and always a pleasure to read! I’ve had similar problems and after about 3–4 years I decided to not interview any more for pretty much the reasons you describe (and others that I explain in my blog http://blog.thegrandlocus.com/2017/01/against-recruitment-panel-interviews).

    My lab has been interview-free for 6 years or so. The funniest part is to figure out what to do with this constraint. The key is that you need information about how successful the person will be in your organization working on the concrete problems you work on. The best option (by far) is to try. I often propose a trial period of e.g., 6 months where we decide after 3 months if we give a long contract, keeping the final 3 months for the person to find something else if it’s a no-go. Most people refuse the deal, but I’ve always hired long-term those who accepted it (they are committed, they risk a lot just to be part of your team, so there is a self-fulfilling prophecy effect).

    In an interview-free organization, you can’t really have job ads, so you need to think about ways to find talent and let them know that you are recruiting. It’s not too hard with a small lab because there is a culture of spontaneous application, plus fellowships and grant agencies call the shots most of the time. Still, I make sure that I know people-who-know-people so that I can get good candidates in just a few phone calls when I need to. I also keep the contact of talented people I meet or know about from my professional activities.

    I got the main ideas from Jeff Atwood but I never tried the way he does it.

    So far I’ve never convinced anyone to go interview-free because “it’s nice but in my case it is really impossible”. I get that. Still, I have to say that I am quite happy with my recruitment in the last few years.

    • If you don’t advertise, doesn’t it then become ‘who you know’ rather than ‘what you know’?

      • It’s a very good point Jeremy. Thanks for bringing it up. Frankly, I had not thought about this… and I think you are right.

        But saying that it “becomes” ‘who you know’ would mean that you do not pay attention to ‘what you know’, don’t you agree? I think your comment raises two important questions: 1) how much talent / competences do you lose by paying more attention to the network of your team members and 2) what is the value of networking for talent / competences.

        I don’t have an answer for either question, but if you know something about it, I’d love to hear it.

    • The Meehl stuff is pretty damning, is it not? (I often cite that when arguing people should prefer models over ad-hoc opinions.) It really calls for standardized testing if we think about it that way. People heavily misrepresent themselves/their contributions on resumes, so I am not sure how we could standardize assessments on our end from the usual inputs (GitHub would be good, but very few candidates have contributed to open source).

  2. Some people want mathematical geniuses as data scientists, who can’t think about a business problem. They don’t mind the people who can’t answer the question about insurance claims. You don’t want those people.
    When I interview I say “I’m going to ask you some of the kind of questions that I get asked by people at work” (and I do). How do you interpret this result? How large should my sample be for an AB test? Are there any problems with this proposed design?

    • Oh, and I forgot: A minority of people pass the interviews (not just mine – I’m one of many).

    • I have thought my expectations are more on the unreasonable genius side at times (and this one was like the tricky Fermi problems we probably all think are silly). Current data science training is heavy on tools/standard practices for evaluation (e.g. people have F1 score memorized, but can’t answer the claims question).

      Many of the instances where things went badly at work were due to intermediate stats stuff, like not understanding selection bias, or not understanding how to translate the model into better business decisions. Coding and learning new models, I feel, are easier to pick up on the job, to a certain point.

      I’ve wondered if the simpler questions are better than the more open ones that let people talk about prior work. My grilling on prior work may be off because I don’t understand what they did (although that is part of the point, some people can’t coherently describe the prior work they did).

