I was recently asked why a Data Scientist in our team wasn’t data-sciencing but was instead working on data wrangling: data preparation, cleaning and data quality assurance. Where were the fancy statistics? Where was modelling? A graph perhaps? Or at least a slide that said Artificial Intelligence, Machine Learning, or Deep Learning.

I realized that Data Science doesn’t always live up to the hype for upper-level management. It might be that it doesn’t live up to expectations because, frankly, somebody has set the bar a little too high for us mere mortals (ahem, Google). Or perhaps it’s because companies just aren’t quite ready for Data Science.

We were asked why our Data Scientist wasn’t data-sciencing…

Although many companies have sprung up around the world dealing purely in AI-driven technology or ML-enabled tools, the average company has yet to find its way around the field.

Why is that? Why are some companies able to monetize AI, while other, typically more traditional, institutions like banks find it hard to adopt the new techniques? Why can’t they get their hands on some good AI or Data Science?

First, we should stop calling it AI…

We should start with the elephant in the room: we should really stop calling it AI. AI, in the sense of human-like intelligence, is a long way from what most companies actually need, which is some proper statistics or a supervised ML model. Companies aren’t really in need of an AI. On top of that, it is a bloated and overused term.

But if you insist on AI, order matters…

When you think of popular examples, like Google Photos, Snapchat filters, or even self-driving cars, you have to recognize that these are products (and entire companies) built around the AI. Image recognition is great and useful, but its applications aren’t infinite.

AI → Product. The reason companies such as Google, Snapchat, Facebook, and Tesla have such a successful track record is their use cases. The use cases are chosen very carefully, so that the products are built on top of a model that is perfectly fit for purpose (though, arguably, self-driving cars are having a hard time getting machine learning to deliver truly self-driving vehicles).

ML models shine in a few fields; fuzzy datasets with unclear relationships are one (not to be confused with low-quality datasets). We see the same use cases all around us: the aforementioned image recognition, speech-to-text and translation, and relationships between illness and medical data. In each case the relationship between input and outcome isn’t immediately clear, but a powerful computer can generate a plausible relationship between the two.

Product → AI. For many companies, the problem is the other way around. They have a problem and want to apply AI to it. This leads to all sorts of unexpected twists and turns along the way. What, for instance, happens when your problem isn’t suited to AI, or doesn’t even need it? What if what it really needs is some process optimization, better data quality, or simply easier access for users?

Not every problem lends itself to AI. AI → Product ≠ Product → AI.

And you aren’t even doing analytics right yet…

Often, too, companies limit the application of Machine Learning and Data Science to the boundaries of their own business knowledge and objectives. By this I mean that questions from the business determine what gets built or researched and what is worth the Data Science team’s time. Your boundary condition becomes the business, not the data scientist you so expensively hired.

Data Scientists can much more easily be trained to think in line with business objectives. They can understand the trade-off between time invested and business value gained. It’s much harder to have a business manager define meaningful work for a group of data scientists. This is one of the reasons it’s so hard to find good DS managers: they need to understand and communicate with the business while also having deep technical knowledge and leading the team technically.

When you don’t have a solid delivery platform in place for basic analytics, effectively delivering more complex products will be even harder.

You aren’t even data-driven…

Most companies tend to think that they are data-driven, but they aren’t. Data-driven does not mean finding data, building whichever story you like, and presenting that. Data-driven means a meticulous understanding of the data and what it contains; it requires deep vertical knowledge of how the data came about and what it excludes. It requires a testable hypothesis and a holistic approach. The results will not always be what you want them to be, and you should be willing to accept that.

Share your anecdotal truths, but don’t be surprised if the data doesn’t always echo your experience.

Being data-driven also means that you cannot just pick the answers you like and ignore the rest, nor ask questions that ‘lead the witness.’ I know this is harder than it sounds, but you aren’t allowed to cherry-pick.

You are allowed (and even highly encouraged!) to bring your knowledge and experience to the analytics team. It helps when we learn about past changes in workflows, experiments, and organizational or legal changes in the landscape. But don’t be surprised if the data doesn’t always echo your anecdotal truths.

You really can’t skip steps…

As with many things, so too with Data Science: you can’t take any shortcuts. Without data wrangling, high-quality upstream data, prepared datasets, an understanding of the data, and properly engineered features, the whole endeavour is at risk. The adage ‘shit in, shit out’ is simply true. You cannot trust your results if you do not have strong reasons to trust the underlying data.
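To make the ‘shit in, shit out’ point concrete, here is a minimal Python sketch of the kind of checks data wrangling involves before any modelling starts. It assumes records arrive as plain dicts; the field names (`customer_id`, `age`, `country`), the plausible age range, and the 1% tolerance in `is_trustworthy` are all hypothetical choices for illustration, not a standard.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    """Counts of basic data-quality problems found in a batch of records."""
    total_rows: int
    missing_values: dict[str, int]
    duplicate_ids: int
    out_of_range: dict[str, int]

    def is_trustworthy(self, max_issue_ratio: float = 0.01) -> bool:
        """Hypothetical rule: trust the batch only if every issue count stays below the ratio."""
        if self.total_rows == 0:
            return False
        issues = [self.duplicate_ids, *self.missing_values.values(), *self.out_of_range.values()]
        return all(count / self.total_rows <= max_issue_ratio for count in issues)


def check_quality(rows, id_field, required, ranges):
    """Run simple checks on a list of dict records before any modelling happens."""
    missing = {name: 0 for name in required}
    out_of_range = {name: 0 for name in ranges}
    seen_ids = set()
    duplicates = 0
    for row in rows:
        # Count empty or absent values in fields the analysis depends on.
        for name in required:
            if row.get(name) in (None, ""):
                missing[name] += 1
        # Count repeated identifiers, a common sign of double-loaded data.
        row_id = row.get(id_field)
        if row_id in seen_ids:
            duplicates += 1
        else:
            seen_ids.add(row_id)
        # Count numeric values that fall outside their plausible range.
        for name, (low, high) in ranges.items():
            value = row.get(name)
            if isinstance(value, (int, float)) and not low <= value <= high:
                out_of_range[name] += 1
    return QualityReport(len(rows), missing, duplicates, out_of_range)


# Hypothetical records with typical problems baked in.
customers = [
    {"customer_id": 1, "age": 34, "country": "NL"},
    {"customer_id": 2, "age": 212, "country": "NL"},  # implausible age
    {"customer_id": 2, "age": 41, "country": ""},     # duplicate id, missing country
]
report = check_quality(customers, "customer_id", ["age", "country"], {"age": (0, 120)})
print(report)
print("trustworthy:", report.is_trustworthy())
```

None of this is sophisticated, which is exactly the point: a model trained on the batch above would inherit its duplicate and its 212-year-old customer.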

There is no data-sciencing without hours spent staring at the computer screen, trying to understand the data. No deep insights spring from the page and no fully functional AI is made without understanding its inputs; the team has to put in the grunt work. Data doesn’t behave like oil; extracting value from it takes time and effort:

Data science isn’t sexy: it’s hard and at times boring work, and that’s actually what’s so fun about it. The reward after the struggle. The findings after your team has put in the effort.

You probably don’t need AI…

It’s also very unlikely that you need an AI, a term that is extremely bloated yet freely thrown around in corporations. It’s much more likely that you need some vanilla solutions: some descriptive statistics, some mellow regressions, perhaps a supervised ML model, or maybe even a graph database to find nearest neighbours. Probably not an AI.
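A ‘vanilla solution’ can be very small. The Python sketch below computes descriptive statistics with the standard library’s `statistics` module and fits a one-variable ordinary least squares line by hand. The ad-spend and sales figures are made up for illustration, and the helper names `describe` and `fit_line` are hypothetical.

```python
from statistics import mean, median, stdev


def describe(values):
    """Basic descriptive statistics: often the first and most useful deliverable."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def fit_line(x, y):
    """Ordinary least squares with one predictor: y ≈ intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # slope = covariance(x, y) / variance(x); the 1/(n-1) factors cancel out.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical weekly figures, made up for illustration only.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

print(describe(sales))
slope, intercept = fit_line(ad_spend, sales)
print(f"sales ≈ {intercept:.2f} + {slope:.2f} * ad_spend")
```

For a single predictor this closed-form fit is all the ‘regression’ needs; on Python 3.10+ the standard library also offers `statistics.linear_regression` for the same job.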

Ask yourself: what do you believe would add the most value for you and your team? New information and insights? Or automated processes that run without human intervention? If it’s the latter, then by all means, indulge. If not, ask whether you need an AI or simply some business analysis.

And even if you did, you aren’t ready to use it…

Even if you truly wanted an AI, you are probably not ready to have it. As above, when your systems and processes aren’t ready, neither are you. Are you able to relinquish control to the machines? Probably not.

When new solutions are introduced that automate previously human-made choices, there is a ton of pushback. Not always out of malice, but often simply out of unfamiliarity.

When marketing is told whom and what to prioritize, are your marketing managers ready to trust the results and implement them? Are they ready to let the system take over their job? And are you, as a company, able to find meaningful ways to use their skills? Keep these questions in mind when you decide you are truly ready to go AI.

Then there are questions about reliability, oversight, ownership, and compliance. So even when you go AI, never go full AI: keep a solid network of controls and tests around your systems.
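One way to ‘never go full AI’ is to put a gate between the model and the business. The Python sketch below is a hypothetical design, not a prescribed one: it auto-applies only high-confidence suggestions from an approved set of actions, sends everything else to a human, and logs every decision for later oversight. The `Decision` fields, the 0.9 threshold, and the action names are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """A single model suggestion (hypothetical shape)."""
    customer_id: int
    action: str
    confidence: float


@dataclass
class Guardrail:
    """Human-in-the-loop gate around an automated model (hypothetical design)."""
    min_confidence: float = 0.9
    allowed_actions: frozenset = frozenset({"email", "call", "no_action"})
    audit_log: list = field(default_factory=list)

    def route(self, decision: Decision) -> str:
        # Reject anything outside the set of actions the business signed off on.
        if decision.action not in self.allowed_actions:
            outcome = "rejected"
        # Low-confidence suggestions go to a person instead of being applied.
        elif decision.confidence < self.min_confidence:
            outcome = "human_review"
        else:
            outcome = "auto_apply"
        # Record every decision so ownership and compliance questions can be answered later.
        self.audit_log.append(
            (decision.customer_id, decision.action, decision.confidence, outcome)
        )
        return outcome


gate = Guardrail()
for suggestion in [
    Decision(1, "email", 0.97),
    Decision(2, "call", 0.62),
    Decision(3, "discount_50pct", 0.99),
]:
    print(suggestion.customer_id, gate.route(suggestion))
```

The threshold and the allow-list are where the business keeps control; the audit log is what lets someone answer ‘why did the system do that?’ after the fact.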

So there you go, you probably aren’t ready for AI…

There, that’s that. Your company is probably not ready yet, and that is okay. There is a ton of information that can be turned into meaningful insights. There is (literally) a million to be made from getting the basics right first.

So stop hunting for gold and recognize that a stack of dollar bills is also worth a hell of a lot. Allow the process to be the process; see it as part of a bigger whole. Start small and work your way up from there, taking all the steps in between.

So when our Data Scientist wasn’t data-sciencing, he was putting in the work required to get where he needed to be: a situation in which analysis, modelling, and visualization all make sense because the underlying data can be trusted. He was doing the work to prepare our company for trustworthy analyses and models. He was putting in the work to get ready for AI.