The Skill That Differentiates Great Data Scientists from the Rest
We tend to focus on hard skills, when the thing that matters most is asking ourselves some tough questions
There’s an understandable tendency among new and aspiring data scientists to focus on acquiring hard quantitative and programming skills.
They rush to learn the latest trending language or technique, hoping it will make them more competitive in the job market or impress their bosses.
Being versatile is great. It saves time and can help you find a better way to model something. It also makes your mind more nimble and creative. All really good things.
But ultimately, versatility is not the most important skill a data scientist can have. That’s because, almost by definition, data scientists are pretty versatile to begin with. Once you’ve learned and worked with a couple of languages or modeling techniques, it’s typically fairly easy to pick up another.
In other words, versatility and hard data science skills are commodities that give you only a limited and temporary competitive edge in the data science marketplace.
Instead, there’s a quality that’s much rarer in data scientists, one that’s far less shiny or talked about, but one that can make or break you as a data scientist. It’s much harder to acquire than new languages and skills, because it goes against the optimizing mindset of most data scientists. It’s the thing that truly distinguishes great data scientists from the rest.
It can be captured by a question that great data scientists ask themselves consistently and honestly:
What in this dataset/algorithm doesn’t look right?
Sounds simple, right? Just basic common sense. After all, when you ask yourself this question, you’re that much more likely to identify problems and improve the quality of your work. It can also help you find hidden stories in your data that you can then integrate into the work product.
Most importantly, whereas using a suboptimal statistical technique might cost you a bit of predictive accuracy, the impact of including the wrong variable or unwittingly drawing on a massively biased dataset can be far greater. It can undermine your work product entirely, along with your credibility as a data scientist.
So, naturally, you’d think most data scientists would ask themselves what might be wrong with their data and work product before shipping it. What could be easier than thinking critically about the data, the variables you’re using, and your findings?
A lot, it turns out. In my 10 years in the data science space (first as a data scientist and later as a data strategist and lecturer), I’ve found it surprisingly hard to ask myself this question early in my career, or to teach others to do so consistently later on.
While every data science environment is different, there are a few common reasons that make it challenging for data scientists to be more self-critical:
- You have a lot to get done. When you’re bouncing around from one project to another, it’s understandably tempting to “check the box” once your work product passes minimal quality control thresholds. You wish you had more time to dive into the data, but there are real pressures from your boss or your client to get it done already.
- You believe the story your data is telling. Confirmation bias is a major problem in data science. If it looks right to you, then you’re not motivated to question it. But even when things look a little off, it is so easy — not to mention more convenient — to explain them away. Data scientists are experts at post-hoc rationalizations.
- You don’t want to admit that you don’t know what’s up. Sometimes when something looks off, you look into it but can’t pinpoint the problem. It might be random noise or something more meaningful — you don’t know, and the data isn’t budging. But now that you’ve identified the problem, you might feel the awkward ethical nudge to mention it to your supervisor or client. And that could reflect poorly on your work. Why invite problems by digging too deep? It’s much easier to accept things as they are.
All of which is why it’s so rare to find data scientists who ask themselves consistently and honestly what doesn’t look right.
There are ways to address each of the barriers above. I’ve found, for example, that far from hurting you, acknowledging some unexplained messiness in the data actually builds trust with supervisors and clients.
“Checking the box” doesn’t feel so great once someone finds a problem and asks you to explain it. And some of my best and most impactful work began with noticing that something in the data looked off.
So if you’re just starting out, my advice to you is to learn to question yourself and your data. Hard skills are nice. But when you find a data scientist who asks themselves the tough questions, you know you can trust them absolutely. And that is worth so much more.
Originally published on Medium.