In 2013, McKinsey published an article examining how retailers can keep up with consumers. It stated that 35% of consumer purchases on Amazon and 75% of content watched on Netflix were driven by product recommendations. Not generic recommendations, but personalized ones,

“powered by increasingly sophisticated algorithms and predictive models that analyze transaction data and digital-media trends.”

Today, in the era of the filter bubble, personalization is so carefully ingrained in our internet experience that we don’t even know it’s there. To say that one way for retailers to keep up with consumers is personalization is like saying that brushing your teeth daily can prevent tooth decay.

We all know it, and if you don’t do it, that's on you!

In fact, the success of machine learning across a range of applications, not just recommendation systems, is well known. Deep learning models can perform as well as healthcare professionals in medical diagnostics, decision trees can be used to predict customer churn for telecoms companies, and facial recognition can assist with border control at airports to reduce queue time.

So there you have it: using machine learning is a sure way of improving process efficiency and performance.

In reality, a machine learning model isn’t a fix-all solution, because all it does is output a prediction. What it cannot do, and therefore should not be asked to do, is directly dictate policy.

In this context, policy is the set of processes that the output of the machine learning algorithm would change: for example, the strategy used to recommend personalized products, the active interventions used to combat customer churn, or the rule for deciding whether or not to let someone cross the border.

One of the greatest mistakes made by end-users of machine learning algorithms is to assume that there is a simple jump from predictive outcomes to policy changes.


To explore this in more depth, let’s take a closer look at recommendation systems and remind ourselves of the success of Amazon and Netflix:

Already, 35 percent of what consumers purchase on Amazon and 75 percent of what they watch on Netflix come from product recommendations based on such algorithms.

And then ask ourselves the following question:

By how much would consumer engagement have decreased without these personalized recommendations?

Predicting outcomes

This may seem like a peculiar question. If 35% of engagement is driven by personalized recommendations, then surely without them there would be 35% less engagement?

However, this line of reasoning does not hold true. It doesn’t account for products that consumers would have engaged with whether they had been recommended or not.
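The gap between the two figures can be made concrete with a little arithmetic. The sketch below uses made-up numbers and a hypothetical helper, `engagement_shares`. In particular, the count of purchases that would have happened anyway is an assumption here: in practice it cannot be read off the data and has to be estimated, for example with an experiment.

```python
def engagement_shares(total, via_recommendations, would_have_happened_anyway):
    """Return (attributed_share, incremental_share) of total engagement.

    attributed:  share of engagement that arrived through a recommendation
    incremental: share that would be lost if recommendations were removed
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= would_have_happened_anyway <= via_recommendations <= total:
        raise ValueError("expected 0 <= anyway <= via_recommendations <= total")
    attributed = via_recommendations / total
    incremental = (via_recommendations - would_have_happened_anyway) / total
    return attributed, incremental


# Illustrative numbers only: 100 purchases, 35 made via a recommendation,
# 20 of which we assume the customer would have made regardless.
attributed, incremental = engagement_shares(100, 35, 20)
print(f"attributed: {attributed:.0%}, incremental: {incremental:.0%}")
```

Under these assumed numbers, removing recommendations would cost 15% of engagement rather than 35%.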

To investigate this further, we need to take a brief look at a technique that underlies many recommendation systems: collaborative filtering.

Collaborative filtering looks at the taste of similar consumers for a given product to understand what a consumer’s taste for that product might be.

In collaborative filtering, predictions are made about a given user's tastes for different products based on the underlying assumption that if two people agree on one issue, they are also likely to agree on a different issue…

If consumers A and B have both watched a Star Wars movie, and B has also watched a Star Trek movie, it is likely that A would also watch a Star Trek movie.

There are several flaws in this line of reasoning, including the assumption that watching a film is a proxy for liking a film; but for now, we will assume that this method effectively predicts consumer taste.
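As a rough illustration of the idea, here is a minimal user-based sketch. It is not how Amazon or Netflix implement recommendations: the Jaccard similarity measure, the function names, and the third consumer C (with a made-up viewing history) are all assumptions chosen to keep the example small.

```python
def jaccard(a, b):
    """Overlap between two sets of watched titles, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_unseen(target, histories):
    """Score titles the target has not watched, weighted by how similar
    each other consumer's viewing history is to the target's."""
    seen = histories[target]
    scores = {}
    for user, items in histories.items():
        if user == target:
            continue
        similarity = jaccard(seen, items)
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + similarity
    return scores


# Hypothetical viewing histories; consumer C is added for contrast.
histories = {
    "A": {"Star Wars"},
    "B": {"Star Wars", "Star Trek"},
    "C": {"Notting Hill"},
}

scores = score_unseen("A", histories)
for title, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{title}: {score:.2f}")
```

Star Trek scores higher for A because B, who shares a title with A, watched it. The score is only a prediction of taste; it says nothing about whether showing the recommendation changes what A does.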

And that is exactly what the model does: it predicts consumers’ tastes. If we were to recommend the Star Trek film to consumer A, the model predicts that they will view it.

That’s what the model does. What it doesn’t do is tell you whether or not you should recommend the movie.

Dictating policy

It seems obvious that the products a consumer is most likely to enjoy, as predicted by a high-performing recommendation system, should be recommended to the consumer. However, as we said earlier, this doesn’t account for what the consumer would do whether or not they had seen the recommendations.

In our movie example, perhaps person A is a huge sci-fi fan and was intending to watch the Star Trek film anyway. In that case, the personalized recommendation has reduced the number of clicks needed to reach the movie they wanted to watch (granted, a positive user experience is not a bad thing), but it hasn’t improved engagement.

Here’s another example:

A and B both buy a lamp, and B then buys a light bulb. A is also recommended a light bulb and buys it.

Again, person A may have bought the light bulb without being recommended it. In fact, in this case, it’s almost a given that they’ll need to buy a light bulb from somewhere!


And this is what is often misunderstood. End-users of machine learning models often do not understand that the outcomes of a predictive model cannot be directly transferred to policy change.

Decision-making

For predictive machine learning models to be used effectively, there needs to be an extra step before policy change. During this step, decisions are made about how best to use the outcomes of the model to inform policy.

However, even with a model that performs well on paper, this may not be a simple step. There are many different techniques that can be used, in different circumstances, to try and maximize the impact gained from the output of predictive models.

We will look at a few techniques here, across several different applications of machine learning.

Contextual recommendations

Amazon is a champion of personalization. They completely cover the website with product recommendations, but they are all provided with some context, whether “You might like…” or “Customers who bought this item also bought”.

Amazon uses more complex algorithms than basic collaborative filtering, which means that it can recommend the right products to you at the right time, with the right context.

When you enter the site, it might be most effective to recommend a product you hadn’t seen before, but when you’re on the page for the lamp, it could be useful to know that other customers also bought a light bulb.

A/B testing

For Netflix, every product change goes through rigorous A/B testing before becoming part of the default user experience. In fact, the testing of images associated with titles can increase viewing by more than 20%!

A/B testing is well suited to understanding what would happen in different scenarios. For recommendations, it allows us to answer the question “By how much do they actually increase, rather than merely drive, customer engagement?” This can be measured by designing a test that compares groups that do and don’t receive personalized recommendations.
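A minimal sketch of how such a comparison might be analyzed is shown below. It assumes a simple setup in which each user either engages or does not, and uses a standard two-proportion z-test; the function name `ab_lift` and the counts are hypothetical, not figures from Netflix or Amazon.

```python
import math


def ab_lift(control_engaged, control_n, treated_engaged, treated_n):
    """Compare engagement rates between a control group (no personalized
    recommendations) and a treated group (with them).

    Returns (lift, z, p_value), where lift is the absolute difference in
    engagement rate and p_value is two-sided.
    """
    p_control = control_engaged / control_n
    p_treated = treated_engaged / treated_n
    lift = p_treated - p_control

    # Pooled rate under the null hypothesis of no difference.
    pooled = (control_engaged + treated_engaged) / (control_n + treated_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treated_n))
    if se == 0:
        raise ValueError("engagement rate is 0% or 100% in both groups")

    z = lift / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return lift, z, p_value


# Made-up counts: 1,000 users per group.
lift, z, p_value = ab_lift(200, 1000, 230, 1000)
print(f"lift: {lift:+.1%}, z = {z:.2f}, p = {p_value:.3f}")
```

The lift, not the share of engagement that passed through a recommendation, is the quantity that answers the question above. With these made-up counts the observed lift is three percentage points, which a test of this size cannot clearly distinguish from noise.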


Earlier, we mentioned predictive churn models. A/B testing can be used here too. When the model predicts that someone is likely to churn, what do you do? You can send them a special discount, offer them an upgrade or do nothing.

A test can be designed to compare the outcomes of different interventions used on customers who are likely to churn to identify which are most effective and economical.
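One way to weigh effectiveness against cost is to compare the expected net value of each intervention relative to doing nothing. The sketch below is illustrative only: the retention rates, costs, customer value, and the helper `net_value_per_customer` are all assumed for the example.

```python
def net_value_per_customer(retention, baseline_retention, customer_value, cost):
    """Expected net gain per targeted customer, relative to doing nothing."""
    return (retention - baseline_retention) * customer_value - cost


# Hypothetical test results: arm -> (observed retention rate, cost per customer)
arms = {
    "do_nothing": (0.60, 0.0),
    "discount": (0.70, 15.0),
    "upgrade": (0.66, 4.0),
}
CUSTOMER_VALUE = 100.0  # assumed value of keeping one customer

baseline = arms["do_nothing"][0]
net_values = {
    name: net_value_per_customer(rate, baseline, CUSTOMER_VALUE, cost)
    for name, (rate, cost) in arms.items()
}
best = max(net_values, key=net_values.get)

for name, value in net_values.items():
    print(f"{name}: {value:+.2f}")
print("best:", best)
```

Under these assumptions the discount retains the most customers but costs more than it recovers, so the cheaper upgrade offer comes out ahead. The churn model identifies who to target; only the test says which action is worth taking.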

Human assistance

In some cases, the repercussions of an incorrect decision can be too high to create autonomous policies based on the outcomes of machine learning models.

Automated Border Control (ABC) uses facial recognition to compare your face to your passport photo to decide whether or not to let you through the gate. The predicted output from the model is the certainty with which it calculates there is a match.

A policy is created to decide what level of certainty is needed to allow you to automatically enter the country. Should it be a 99% certainty? 80%? If the threshold is set too low, it could lead to people illegally entering the country! On the other hand, if it is set too high, it could lead to reduced efficiency where nobody is let through, and everyone is redirected to the main queue.


In practice, there is usually a human in another room to check any matches that fall below a predefined threshold. The model therefore works alongside a human to make decisions, rather than acting entirely autonomously.
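The split between model and human can be sketched as a simple routing rule. This is an assumed simplification, not a description of any real border control system: the function names, the example scores, and the thresholds are all hypothetical.

```python
def route(match_score, auto_accept=0.99):
    """Decide what happens to a traveller given the model's match certainty.

    The model only supplies match_score; the threshold is the policy.
    """
    if not 0.0 <= match_score <= 1.0:
        raise ValueError("match_score must be between 0 and 1")
    return "open_gate" if match_score >= auto_accept else "human_review"


def review_rate(scores, auto_accept):
    """Fraction of travellers sent to a human at a given threshold."""
    reviews = sum(route(s, auto_accept) == "human_review" for s in scores)
    return reviews / len(scores)


# Made-up certainty scores for five travellers.
scores = [0.999, 0.97, 0.85, 0.60, 0.995]

for threshold in (0.80, 0.99):
    print(f"threshold {threshold:.2f}: {review_rate(scores, threshold):.0%} sent to review")
```

The same scores produce very different workloads for the human checker depending on the threshold, which is why choosing it is a policy decision rather than something the model can settle.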

Final thoughts

Machine learning is a valuable tool that will only become more prominent in our daily lives. It is often pitched as the solution to organizational problems, from process optimization to consumer engagement.

However, machine learning is only a valuable asset if the predicted outcomes are interpreted correctly and acted upon sensibly.

Machine learning is only a part of the solution, and if considerable thought is not given to the rest of the solution, then any policy changes that result from it can, at best, be a waste of resources and, at worst, cause considerable damage.

The real shame is that, in many cases, the underlying models perform their task extremely well and, if utilized properly, could bring about real, positive change.