Systems
In November 2016, Google announced that it had incorporated its multilingual neural machine
translation system into Google Translate, marking one of the first success stories of deep artificial
neural networks in production at scale.1 According to Google, with this update, the quality of
translation improved more in a single leap than they had seen in the previous 10 years combined.
This success of deep learning renewed the interest in machine learning (ML) at large. Since then,
more and more companies have turned toward ML for solutions to their most challenging problems.
In just five years, ML has found its way into almost every aspect of our lives: how we access
information, how we communicate, how we work, how we find love. The spread of ML has been
so rapid that it’s already hard to imagine life without it. Yet there are still many more use cases for
ML waiting to be explored in fields such as health care, transportation, farming, and even in
helping us understand the universe.2
Many people, when they hear “machine learning system,” think only of the ML algorithms being
used, such as logistic regression or different types of neural networks. However, the algorithm is
only a small part of an ML system in production. The system also includes the business
requirements that gave birth to the ML project in the first place, the interface where users and
developers interact with your system, the data stack, and the logic for developing, monitoring, and
updating your models, as well as the infrastructure that enables the delivery of that logic.
Figure 1-1 shows you the different components of an ML system and in which chapters of this book they
will be covered.
THE RELATIONSHIP BETWEEN
MLOPS AND ML SYSTEMS DESIGN
Ops in MLOps comes from DevOps, short for Development and Operations. To operationalize something
means to bring it into production, which includes deploying, monitoring, and maintaining it. MLOps is a
set of tools and best practices for bringing ML into production.
ML systems design takes a system approach to MLOps, which means that it considers an ML system
holistically to ensure that all the components and their stakeholders can work together to satisfy the
specified objectives and requirements.
Figure 1-1. Different components of an ML system. “ML algorithms” is usually what people think of when they say machine learning, but it’s only
a small part of the entire system.
There are many excellent books about various ML algorithms. This book doesn’t cover any
specific algorithms in detail but rather helps readers understand the entire ML system as a whole.
In other words, this book’s goal is to provide you with a framework to develop a solution that best
works for your problem, regardless of which algorithm you might end up using. Algorithms might
become outdated quickly as new algorithms are constantly being developed, but the framework
proposed in this book should still work with new algorithms.
The first chapter of the book aims to give you an overview of what it takes to bring an ML model
to production. Before discussing how to develop an ML system, it’s important to ask the fundamental
question of when, and when not, to use ML. We’ll cover some of the popular use cases of ML to
illustrate this point.
After the use cases, we’ll move on to the challenges of deploying ML systems, and we’ll do so by
comparing ML in production to ML in research as well as to traditional software. If you’ve been
in the trenches of developing applied ML systems, you might already be familiar with what’s
written in this chapter. However, if you have only had experience with ML in an academic setting,
this chapter will give an honest view of ML in the real world and set your first application up for
success.
When to Use Machine Learning
As its adoption in the industry quickly grows, ML has proven to be a powerful tool for a wide
range of problems. Despite an incredible amount of excitement and hype generated by people both
inside and outside the field, ML is not a magic tool that can solve all problems. Even for problems
that ML can solve, ML solutions might not be the optimal solutions. Before starting an ML project,
you might want to ask whether ML is necessary or cost-effective.3
To understand what ML can do, let’s examine what ML solutions generally do:
Machine learning is an approach to (1) learn (2) complex patterns from (3) existing data and use
these patterns to make (4) predictions on (5) unseen data.
We’ll look at each of the numbered key phrases in the framing above to understand its implications
for the problems ML can solve:
1. Learn: the system has the capacity to learn
A relational database isn’t an ML system because it doesn’t have the capacity to learn. You
can explicitly state the relationship between two columns in a relational database, but it’s
unlikely to have the capacity to figure out the relationship between these two columns by
itself.
For an ML system to learn, there must be something for it to learn from. In most cases, ML
systems learn from data. In supervised learning, based on example input and output pairs,
ML systems learn how to generate outputs for arbitrary inputs. For example, if you want
to build an ML system to learn to predict the rental price for Airbnb listings, you need to
provide a dataset where each input is a listing with relevant characteristics (square footage,
number of rooms, neighborhood, amenities, rating of that listing, etc.) and the associated
output is the rental price of that listing. Once learned, this ML system should be able to
predict the price of a new listing given its characteristics.
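The rental price example can be sketched in a few lines of code. The snippet below is only a toy illustration, not a realistic pricing model: it assumes a single input characteristic (square footage), uses made-up listings whose prices happen to lie exactly on a straight line, and fits that line with ordinary least squares. The helper names fit_line and predict are hypothetical and chosen just for this sketch.

```python
# Toy illustration of supervised learning: learn from (input, output) pairs,
# then predict the output for an input the system has not seen before.
# All listings and prices below are made up for illustration.


def fit_line(xs, ys):
    """Fit y = intercept + slope * x with ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply the learned line to a new input."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical training data: square footage -> nightly rental price (USD).
square_feet = [400, 600, 800, 1000, 1200]
prices = [80, 110, 140, 170, 200]

# "Learn": estimate the relationship from example input and output pairs.
model = fit_line(square_feet, prices)

# "Predict": estimate the price of a listing that is not in the training data.
new_listing_price = predict(model, 900)
print(f"Predicted price for a 900 sq ft listing: ${new_listing_price:.2f}")
```

A real system would learn from many more listings and characteristics (number of rooms, neighborhood, amenities, rating), and real prices would not fit any line this neatly, but the two steps stay the same: learn from example pairs, then predict for a new input.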
2. Complex patterns: there are patterns to learn, and they are complex
ML solutions are only useful when there are patterns to learn. Sane people don’t invest
money into building an ML system to predict the next outcome of a fair die because there’s
no pattern in how these outcomes are generated.4 However, there are patterns in how stocks
are priced, and therefore companies have invested billions of dollars in building ML
systems to learn those patterns.
Whether a pattern exists might not be obvious, or if patterns exist, your dataset or ML
algorithms might not be sufficient to capture them. For example, there might be a pattern
in how Elon Musk’s tweets affect cryptocurrency prices. However, you wouldn’t know
until you’ve rigorously trained and evaluated your ML models on his tweets. Even if all
your models fail to make reasonable predictions of cryptocurrency prices, it doesn’t mean
there’s no pattern.
Consider a website like Airbnb with a lot of house listings; each listing comes with a zip
code. If you want to sort listings into the states they are located in, you wouldn’t need an