From powering self-driving cars at companies like Tesla to beating professionals at some of the world’s most challenging games, machine learning is making its way into software applications across every industry. Used properly, machine learning can provide compelling advantages over more traditional applications, but these advantages come at a cost.

In 2015, a group of researchers from Google published a paper titled "Hidden Technical Debt in Machine Learning Systems." In this article, I will highlight the key points from that paper and provide some easy-to-understand examples along the way. If you want to check out the original paper, and I suggest that you do, here is the link: https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf

If you are not familiar with technical debt, it is a metaphor used in software engineering to describe the future work that a particular design choice will generate.

For example, to meet a critical deadline, you might implement a quick and brittle solution knowing full well that in the future, you will need to replace that solution or refactor it into a more robust implementation. The paper lays out a variety of reasons that machine learning systems incur more technical debt than their traditional software system counterparts, and I am going to highlight the top few.

Complex Models Erode Boundaries

What does this mean? When writing software, it is best practice to isolate different portions of the codebase from each other. For example, let us imagine we are operating an e-commerce website.

We might have one module that manages the user accounts and another that handles the shopping cart and its functionality. If those modules are isolated, different people or teams can work on them simultaneously without getting in each other's way.

But because we are a leading technology company, we want to build a machine learning product recommender system to boost sales.

There are lots of input signals we could collect and feed into this model, including information about the user account, such as age, gender, and address, and information from the shopping cart, such as the history of the items that were added and removed.

Just like that, we have started to break down the strong isolation between the two modules from earlier. Also, future models are likely to depend on many of the same signals, leading to a web of dependencies in which changing one subsystem ripples out to impact many others.
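To make the coupling concrete, here is a minimal Python sketch. The class names, fields, and the `build_features` helper are all hypothetical and exist only to illustrate the point; they do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class UserAccount:
    """Owned by the (hypothetical) accounts module."""
    user_id: int
    age: int
    address: str


@dataclass
class ShoppingCart:
    """Owned by the (hypothetical) shopping cart module."""
    user_id: int
    added: list = field(default_factory=list)
    removed: list = field(default_factory=list)


def build_features(account, cart):
    """Feature builder for the recommender.

    It reads fields from both modules, so it now depends on the
    internal layout of each of them.
    """
    return {
        "age": account.age,
        "address": account.address,
        "n_added": len(cart.added),
        "n_removed": len(cart.removed),
    }


features = build_features(
    UserAccount(user_id=1, age=30, address="12 Main St"),
    ShoppingCart(user_id=1, added=["book", "lamp"], removed=["lamp"]),
)
```

In this sketch, renaming `age` in the accounts module or changing how the cart stores removed items would break the recommender, even though neither team touched the model itself.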

Data Dependencies Cost More than Code Dependencies

Let us examine a few of those data dependencies within the context of our hypothetical product recommendation system. The first input signal might be statistics about the entire user base of the website and their behavior in terms of which products they look at and which products they end up buying.

The second input signal might be the output of another machine learning model that predicts whether or not this particular user is a voracious reader. A third signal might be the product identifier or ID, which encodes things like the product category, such as whether the product is a book, a clothing item, or an electronic item.

However, because software changes over time, that product ID system may have changed recently. So we could even have a fourth input corresponding to the new product ID, whereas the third signal is the legacy product ID. Now, how do these signals represent potential sources of technical debt? Let’s imagine a few cases.

One, the upstream machine learning model might be updated, modifying the input signal. Two, the general user base of the site might drift over time as the marketing emphasis shifts from book sales to clothing or electronics. Three, the legacy product ID system might be shut off entirely, thus breaking our model completely.

These changes would, at a minimum, warrant retraining of our product recommendation engine to avoid decreasing accuracy or might even require rewriting the entire model to adapt to a change in an upstream dependency. In traditional software, there are static code analysis tools that are built specifically to examine dependency graphs, and those can help to identify and triage related issues. This type of tooling is much less widespread for data dependencies.
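One lightweight way to make data dependencies visible is to declare them explicitly and check them before training. The sketch below assumes a hypothetical registry of the recommender's input signals; the signal names, owners, and version strings are invented for illustration and are not an existing tool or anything the paper prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalSpec:
    name: str
    owner: str               # upstream team or system producing the signal
    version: str             # upstream version the model was trained against
    deprecated: bool = False


# Hypothetical declaration of everything the recommender consumes.
RECOMMENDER_INPUTS = [
    SignalSpec("user_base_stats", "analytics", "v3"),
    SignalSpec("voracious_reader_score", "reader-model", "v1"),
    SignalSpec("legacy_product_id", "catalog", "v1", deprecated=True),
    SignalSpec("product_id", "catalog", "v2"),
]


def check_dependencies(specs, available):
    """Return a list of human-readable problems.

    `available` maps each signal name to the version that the upstream
    system currently serves.
    """
    problems = []
    for spec in specs:
        if spec.name not in available:
            problems.append(f"{spec.name}: no longer served by {spec.owner}")
        elif available[spec.name] != spec.version:
            problems.append(
                f"{spec.name}: trained against {spec.version}, "
                f"upstream now serves {available[spec.name]}"
            )
        if spec.deprecated:
            problems.append(f"{spec.name}: marked deprecated, plan its removal")
    return problems


# The reader model was updated and the legacy ID system was shut off.
upstream = {
    "user_base_stats": "v3",
    "voracious_reader_score": "v2",
    "product_id": "v2",
}
for problem in check_dependencies(RECOMMENDER_INPUTS, upstream):
    print(problem)
```

A check like this does not detect drift in the user base, but it can surface an updated upstream model or a retired ID system before they silently degrade the recommender.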

Machine Learning System Anti-Patterns

In addition to the coupling and data-related challenges described so far, there are also some design anti-patterns that are common in machine learning systems and can lead to incurring a lot of technical debt. One such pattern is the use of excessive glue code. Oftentimes the core machine learning algorithms are implemented as general-purpose, self-contained packages.

While this may seem like a good idea because people can reuse those implementations, the amount of code that is required to wrangle the input and output data into the right formats can significantly outweigh the code of the algorithm itself. This mismatch can slow down development and make tweaking the algorithm difficult.

Also, because of the experimental nature of machine learning system development, it can be tempting to perform experiments using branching or conditional code paths. The cost of each of these individual experiments is low.

The problem is that if they accumulate over time, they can become a nightmare to maintain and slow the pace of progress of the entire system.

So at this point, should we give up on machine learning in favor of writing traditional software? Of course not! While machine learning systems can incur hidden technical debt, there are some ways that we can start to address the highlighted challenges.

Monitoring

Setting up automated monitors that fire alerts when things go wrong can be the first line of defense in identifying problems. For example, we can track the distribution of the predicted labels and compare it to the distribution of observed labels to identify potential prediction biases in our models and how those might change over time.
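As a rough sketch of the label-distribution check described above, the following Python compares the two distributions using total variation distance. Both the choice of distance and the 0.1 threshold are assumptions made for illustration; in practice the threshold would need tuning for the application.

```python
from collections import Counter


def label_distribution(labels):
    """Map each label to its share of the given list."""
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation_distance(p, q):
    """Half the sum of absolute differences between two distributions."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in labels)


def prediction_bias_alert(predicted, observed, threshold=0.1):
    """Return (should_alert, distance) for predicted vs. observed labels."""
    distance = total_variation_distance(
        label_distribution(predicted), label_distribution(observed)
    )
    return distance > threshold, distance


# The model mostly recommends books, but users buy books and clothing equally.
predicted = ["book"] * 8 + ["clothing"] * 2
observed = ["book"] * 5 + ["clothing"] * 5
should_alert, distance = prediction_bias_alert(predicted, observed)
```

Running this check on a schedule and recording `distance` over time gives a simple trend line for how prediction bias changes.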

Data and Metadata Versioning

Another technique is to start tracking specific versions of our data, as well as metadata associated with the model, such as the hyper-parameters used for training. Doing this can greatly improve the reproducibility of the model training process and make it faster to track down issues, should they occur.
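A minimal sketch of the idea, using only the Python standard library: fingerprint the training data with a hash and store it next to the hyperparameters. The record layout and the function names here are hypothetical; dedicated tools offer far more than this.

```python
import hashlib
import json


def fingerprint_rows(rows):
    """SHA-256 over a canonical JSON encoding of each training row."""
    digest = hashlib.sha256()
    for row in rows:
        # sort_keys makes the encoding independent of dict key order.
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def build_run_record(rows, hyperparameters, code_version):
    """Collect what is needed to reproduce a training run."""
    return {
        "data_sha256": fingerprint_rows(rows),
        "num_rows": len(rows),
        "hyperparameters": hyperparameters,
        "code_version": code_version,
    }


rows = [
    {"user_id": 1, "product_id": "B-100", "bought": True},
    {"user_id": 2, "product_id": "C-200", "bought": False},
]
record = build_run_record(
    rows,
    hyperparameters={"learning_rate": 0.05, "epochs": 10},
    code_version="example-commit-id",  # placeholder, not a real commit
)
# The record is plain JSON, so it can be stored alongside the model file.
serialized = json.dumps(record, indent=2, sort_keys=True)
```

If two runs produce different models, comparing their records quickly shows whether the data, the hyperparameters, or the code changed.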

The Use of Machine Learning Tools

Several tools are being developed that can help address certain pain points across the machine learning engineering life cycle. These include products like Weights & Biases for experiment management, MLflow for model management, and Kubeflow for entire life cycle support.

I hope this article has been of help. Are you using machine learning in your work? If so, how do you manage the related technical debt? Feel free to share the challenges that you have encountered and the solutions that you have come up with. Someone might learn something in the process.