Over the last 10 years, searches for the terms “machine learning” and “data science” have increased 12x based on Google Trends data. With popularity can come a form of irrational excitement where companies may want to do machine learning for the sake of doing machine learning.

Building products powered by machine learning requires a delicate balance of delivering early incremental value to satisfy business partners while not watering down the impact by using too little data or features in the models. As product managers, science managers, or scientists, we need to manage internal expectations while still keeping the momentum of support for ML initiatives.

To make this happen, we should focus on the customer need instead of only concentrating on the science deliverable.

But who is the customer of the science deliverable? If we are developing a customer lapse propensity model, then the customer could be a retention marketing team.

If we are building a price elasticity model, then the customer could be a product price strategy team. In another instance, if we are launching a website chatbot, then the customer could be a digital support team. However, in all of these instances, the “customer” of these machine learning models could also be the end business customer: the recipient of the retention promotion, the person willingly paying the price for the product, or the user asking the chatbot a question.

The subtle distinction is the level of direct interaction a particular machine learning model, or data product, has with the end business customer. An early version of the data product may produce a recommendation and an internal user may implement the recommendation.

A later version could be fully automated to interact with the end business customer. Even if a data product is initially scoped to support internal users, future development could result in a data product that directly interacts with the end business customer. As such, we should always “work backwards” from the end customer.

A few articles provide ideas on how to think about data products, defined here as any product that uses data to deliver its core value proposition. By this definition, any product that uses machine learning can be considered a data product.

One article [1] classifies data products into three types: data as a service, data as insights, and data-enhanced products. Another [2] also classifies data products into three types: data for benchmarking, data for predictions, and data for recommendation systems. Yet another document [3] breaks data products into: raw data, derived data, algorithms, decision support, and automated decision making. In each of these cases, data products are classified based on the output of the product. Instead of focusing on the output, this article will introduce a new customer-based framework for classifying data products.

This framework allows us to more easily consider both the technical as well as the organizational constraints associated with data product development.

Working backwards from the customer helps us consider incremental development based on customer needs. It also allows us to set expectations of the value proposition.

Before we start, here is a summary of the framework.

Three Degrees away from the Customer…

“Third degree data products” are separated from the end customer by two or more layers of human interaction or other science-driven inputs. The outputs of customer clustering, offline fraud scoring, product review sentiment, or retention probability models are all examples of third degree data products. This type of data product exists in the form of data sets, science notebooks, or visualizations.

This is the heart of eliciting insights from data. Without these data products, which are consumed by scientists and analysts, there cannot be any realized future value. Third degree data products are often the most exciting to discuss as a science team (“we are going to predict x!”) but can be challenging to support from a business perspective.

This is because it is often difficult to measure the direct financial impact derived from this type of product. Yet, third degree data products provide real option value. This real option is the potential to expand into a first or second degree data product that will have measurable value.

Often the real option value is significant because a single third degree data product can support many areas. For example, a customer lifetime value model could support additional data products for marketing segmentation, promotion targeting, or customer support queue prioritization.

Two Degrees away from the Customer…

“Second degree data products” are separated from the end customer by one layer of human interaction. These products are often used by internal business teams and can include UIs incorporating both a forecast and an optimization. Second degree data products aren’t insights alone but rather recommendations. These recommendations could also include alternatives with sensitivity analysis.

Sometimes, these products can be developed as a prototype or first version that later leads to a first degree data product. In other situations, we may never be able to capture the data necessary to drive a holistic decision, so there will always be a need for a ‘human influence’ on a science-based recommendation.

In yet other cases, if the decision that the product supports is rarely made and sizable in impact, there may be too big of a risk in making the product more hands-off. The quantified value associated with second degree data products can be squishy.

This is because one needs to answer the question: what is the value of making a smarter decision? The value is a blend of the impact associated with a science recommendation and the human decision. The primary risk of second degree data products is the lack of adoption.

If the recommended decisions aren’t explainable, aren’t intuitive, or need significant manual influence, then the product will likely be deprecated.

One Degree away from the Customer…

We can classify products where the outputs of machine learning models directly engage with the end customer as “first degree data products”. This means that the ML output is only one degree away from the user. Examples of these data products include personalized user experiences, product recommendations, chatbots, and driver navigation. With these types of data products, we can assess the customer impact and measure business value through A/B testing.
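As a rough illustration of how A/B testing can quantify the value of a first degree data product, the sketch below compares conversion rates between a control group and a group exposed to the ML-driven experience using a two-proportion z-test. The function name, the choice of test, and the sample numbers are assumptions made for illustration; they are not taken from any specific experiment.

```python
import math


def ab_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-proportion z-test for a simple A/B experiment.

    conv_a / n_a: conversions and users in the control group (A).
    conv_b / n_b: conversions and users in the treatment group (B).
    Returns (absolute lift, z statistic, two-sided p-value).
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both groups need at least one user")

    p_a = conv_a / n_a
    p_b = conv_b / n_b

    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")

    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Hypothetical numbers: 10.0% conversion in control, 13.0% in treatment.
lift, z, p_value = ab_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"lift={lift:.3f}, z={z:.2f}, p={p_value:.3f}")
```

In practice, the measured lift would still need to be translated into business value (for example, incremental revenue per converted customer), which is a separate modeling choice.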

Having the ability to quantify value makes these types of data products easier to prioritize from a business perspective. However, because these products interact with customers, there will always be a risk of damaging customer trust.

For example, a first degree data product could provide irrelevant recommendations, incorrect driving directions, extreme price changes, or, in the case of IBM’s Watson, erroneous cancer treatment recommendations to doctors. Launching a first degree data product requires engineering support, and there are many dependencies that must be in place. Often, there needs to be an adaptable UX that can support experimentation along with automated data collection.

For these data products, we can bootstrap early experimentation using business rules, with the collected data later used to support model training and enhancement. Over time, as this product collects direct feedback from the customer, the machine learning model can provide more relevant output.

Distinguishing the Data Product from the Science Model…

Let’s walk through a quick example of how this framework could work in practice. In this example, we are working for a company that is early in its analytics journey and we are tasked with using machine learning to reduce customer churn. The “end customers” of the final, first degree data product are customers that are likely to churn.

This product could be a system that automatically identifies customers with a high propensity to churn and sends them a targeted promotional offer. Getting to this point requires not just science work but engineering work as well.

As a result, it will take time to realize the value of the product, which could increase the risk of losing executive support if there are any development delays. As we build the customer-level retention probability model that will power this first degree data product, we could also deliver a second degree data product. This second degree data product could be a ‘Churn Prevention Targeting Tool’ that could be used by marketing to identify customers that have high churn likelihood scores.

With this information, the marketing team could manually provide promotions to reduce churn risk. In breaking up delivery, the organization can realize value sooner as well as potentially experiment to determine how valid the churn predictions are. Lastly, a third degree data product is the output of the customer churn science model.

This output can not only support our goal of reducing customer churn but could also serve as an input to other science models. For example, a customer churn or retention probability model can be an input to a customer lifetime value model.
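To make the churn example more concrete, here is a minimal sketch of how the same model output could feed each degree of data product. The customer IDs, scores, the 0.7 threshold, and both function names are hypothetical; a real system would take its scores from the trained churn model and its threshold from experimentation.

```python
from typing import Dict, List, Optional, Tuple

# Third degree: the raw output of the churn model (hypothetical scores).
churn_scores: Dict[str, float] = {
    "cust_001": 0.91,
    "cust_002": 0.15,
    "cust_003": 0.74,
    "cust_004": 0.42,
}


def churn_targeting_list(
    scores: Dict[str, float],
    threshold: float = 0.7,
    top_n: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Second degree: a ranked list that a marketing team reviews.

    A human still decides who actually receives a promotion.
    """
    flagged = [(cid, s) for cid, s in scores.items() if s >= threshold]
    flagged.sort(key=lambda pair: pair[1], reverse=True)
    return flagged if top_n is None else flagged[:top_n]


def automated_offers(
    scores: Dict[str, float], threshold: float = 0.7
) -> Dict[str, str]:
    """First degree: offers are assigned with no human in the loop."""
    return {cid: "retention_offer" for cid, s in scores.items() if s >= threshold}


print(churn_targeting_list(churn_scores))
print(automated_offers(churn_scores))
```

The scoring logic is identical in both functions. What changes between degrees is who acts on the output, which is why the second degree tool can ship before the engineering work for automated delivery is finished.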

Example Data Product Classification

So, to recap:

· First degree data products provide a science-based output directly to the customer for them to act on.

· Second degree data products provide a science-based output to the business for employees to act on that can then be provided to the customer.

· Third degree data products provide a science-based output to a second degree data product. A third degree data product could also provide insights, without recommendations, to analysts or scientists.

It can sometimes be a long journey to realize value from data products: both for the customer and the business. Photo from Pexels by Flo Maderebner

A final note is that we can extend this framework further to include one more data product type: “zero degree data products”. Zero degree data products use their science-based output to act on behalf of the customer. This is different from a first degree data product, which still requires the customer to act for themselves.

Examples of zero degree data products could include self-driving cars, self-flying drones, or other autonomous machinery. There is a key technical difference between this type of data product and the other three mentioned. Zero degree data products are often constructed with deep learning. By comparison, we often build the other three categories of data products using more traditional machine learning algorithms.
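The full framework, including the zero degree extension, can be summarized as a small classification rule. The sketch below is one possible encoding: the `Degree` enum, the `classify` function, and its two parameters are illustrative names, and counting "layers of human interaction" as an integer is a simplifying assumption.

```python
from enum import Enum


class Degree(Enum):
    ZERO = 0    # acts on behalf of the customer
    FIRST = 1   # output goes directly to the customer, who then acts
    SECOND = 2  # one layer of human interaction (internal team acts)
    THIRD = 3   # two or more layers (insights for analysts or other models)


def classify(human_layers: int, acts_for_customer: bool = False) -> Degree:
    """Classify a data product by its distance from the end customer.

    human_layers: number of internal human (or science-driven) steps
        between the model output and the end customer.
    acts_for_customer: True if the product acts on the customer's behalf
        instead of leaving the action to the customer.
    """
    if human_layers < 0:
        raise ValueError("human_layers cannot be negative")
    if human_layers == 0:
        return Degree.ZERO if acts_for_customer else Degree.FIRST
    if human_layers == 1:
        return Degree.SECOND
    return Degree.THIRD


# Examples drawn from the article:
print(classify(0, acts_for_customer=True))  # self-driving car
print(classify(0))                          # product recommendations
print(classify(1))                          # churn prevention targeting tool
print(classify(2))                          # churn model output feeding other models
```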

In conclusion, when developing an AI strategy or building an ML project roadmap, it is easy to become hyper-aspirational. Organizations often envision first degree data products when discussing ML solutions. In reality, many data products need to start as third degree data products and incrementally become more sophisticated.

This customer-centric data framework isn’t a panacea that guarantees a quality science product, but it can help us in product definition. It also helps us understand how to think in terms of agile development. In doing so, we can minimize scope creep while first providing value to internal business partners and then, through incremental delivery, to external customers.