Chapter 1: Collecting, Labeling and Validating Data (available)
Chapter 2: Feature Engineering and Selection (available)
Chapter 3: Data Journey and Data Storage (available)
Chapter 4: Advanced Labeling, Automation, and Data Preprocessing (available)
Chapter 5: Model Resource Management Techniques (available)
Chapter 6: High Performance Modeling (available)
Chapter 7: Model Analysis (available)
Chapter 8: Interoperability (available)
Chapter 9: Neural Architecture Search (available)
Chapter 10: Introduction to Model Serving (unavailable)
Chapter 11: Model Serving Patterns and Infrastructure (unavailable)
Chapter 12: Model Management and Delivery (unavailable)
Chapter 13: Model Monitoring and Logging (unavailable)
Chapter 14: Privacy and Legal Requirements (unavailable)
Chapter 15: Productionalizing Machine Learning Pipelines (unavailable)
Chapter 16: Classifying Unstructured Texts (unavailable)
Chapter 17: Image Classification (unavailable)
Chapter 1. Introduction to Machine Learning Production Systems
A NOTE FOR EARLY RELEASE READERS
With Early Release ebooks, you get books in their earliest form (the author’s raw and unedited
content as they write) so you can take advantage of these technologies long before the official
release of these titles.
This will be the first chapter of the final book. Please note that the GitHub repo will be made active
later on.
If you have comments about how we might improve the content and/or examples in this book, or
if you notice missing material within this chapter, please reach out to the author
at .
The field of machine learning engineering is so vast that it is easy to get lost in the different
steps that are necessary to get a model from an experiment into a production deployment. Over the
last few years, machine learning, novel concepts such as attention, and more recently large
language models (LLMs) have been in the news almost every day. However, very little discussion
has focused on production machine learning, which brings machine learning into products and
applications.
Production Machine Learning covers all areas of machine learning beyond simply training a
machine learning model. It can be viewed as a combination of machine learning development and
modern software development practices. Machine learning pipelines form the foundation of
Production Machine Learning, so implementing and executing them are key aspects of the
discipline.
In this chapter, we will introduce the concept of Production Machine Learning. We’ll also
introduce what machine learning pipelines are, look at their benefits, and walk through the steps
of a machine learning pipeline.
What Is Production Machine Learning?
In an academic or research setting, modeling is relatively straightforward. Typically you have a
dataset (often a standard dataset that is supplied to you, already cleaned and labeled), and you
use that dataset to train your model and evaluate the results.
The result that you’re trying to achieve is simply a model that makes good predictions. You’ll
probably go through a few iterations to fully optimize the model, but once you’re satisfied with
the results then typically you’re done.
Production ML requires a lot more than just a model. We’ve found that a model is typically only
about 5% of the code that is required to put an ML application into production. Over their
lifetimes, Production ML applications will be deployed, maintained, and improved, so that you can
deliver a consistent, high-quality experience to your users.
Let’s look at some of the differences between machine learning modeling in a non-production
environment (typically research or academic), and machine learning in a production environment.
In an academic or research environment you’re typically using a static dataset. Production
ML uses real-world data, which is dynamic and usually shifting.
The design priority for academic or research ML is usually the highest accuracy over the
entire training set. The design priority for production ML is a balance of fast inference,
fairness, good interpretability, acceptable accuracy, and minimal cost.
Model training for research ML is based on a single optimal result, and the tuning and
training necessary to achieve it. Production ML requires continuous monitoring,
assessment, and retraining.
Interpretability and fairness are very important for any ML modeling, but they are
absolutely crucial for production ML.
And finally, while the main challenge of academic and research ML is finding and tuning
a high-accuracy model, the main challenge for production ML is a high-accuracy model plus
everything else: the entire system.
In a Production ML environment, you’re not just producing a single result, you’re developing a
product or service that is often a mission-critical part of your offering.
For example, in Production ML, if you’re doing supervised learning, then you need to make sure
that your labels are accurate. You also need to make sure that your training dataset has examples
that cover the same feature space as the requests that your model will receive. In addition, you
want to reduce the dimensionality of your feature vector to optimize your system performance while
retaining or enhancing the predictive information in your data.
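To make the feature-space requirement concrete, here is a minimal sketch of one way to check whether a serving request falls inside the range of values seen during training. It is an illustration under simple assumptions, not a prescribed method: the function names `feature_ranges` and `out_of_range_features`, the feature names, and the sample values are all hypothetical, and a per-feature min/max check is only a rough proxy for feature-space coverage.

```python
from typing import Dict, List, Tuple

Example = Dict[str, float]  # one example: feature name -> numeric value


def feature_ranges(examples: List[Example]) -> Dict[str, Tuple[float, float]]:
    """Return the (min, max) observed for each numeric feature in the training set."""
    ranges: Dict[str, Tuple[float, float]] = {}
    for example in examples:
        for name, value in example.items():
            low, high = ranges.get(name, (value, value))
            ranges[name] = (min(low, value), max(high, value))
    return ranges


def out_of_range_features(
    request: Example, ranges: Dict[str, Tuple[float, float]]
) -> List[str]:
    """List features in a request that were unseen in training or lie outside its range."""
    flagged = []
    for name, value in request.items():
        if name not in ranges:
            flagged.append(name)  # feature never appeared in training
            continue
        low, high = ranges[name]
        if value < low or value > high:
            flagged.append(name)  # value outside the training range
    return sorted(flagged)


# Hypothetical training data and serving request, for illustration only.
training = [
    {"age": 25.0, "income": 40_000.0},
    {"age": 60.0, "income": 120_000.0},
]
ranges = feature_ranges(training)
flagged = out_of_range_features({"age": 72.0, "income": 50_000.0}, ranges)
print(flagged)
```

In this sketch the request's `age` lies outside the range covered by the training examples, so it is flagged. A real system would likely track richer statistics than a simple minimum and maximum.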
Throughout all of this you need to consider and measure the fairness of your data and model,
especially for rare conditions. In fields such as healthcare, for example, rare but important
conditions may be absolutely critical to success.
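One simple way to see why rare conditions need separate attention is to compute a metric per slice of the data instead of only overall. The following is a minimal sketch under assumed inputs: the `accuracy_by_slice` helper, the slice names, and the synthetic records are hypothetical and exist only to illustrate the idea.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each record is (slice_name, true_label, predicted_label).
Record = Tuple[str, int, int]


def accuracy_by_slice(records: List[Record]) -> Dict[str, float]:
    """Compute accuracy separately for each slice of the evaluation data."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for slice_name, label, prediction in records:
        total[slice_name] += 1
        if label == prediction:
            correct[slice_name] += 1
    return {name: correct[name] / total[name] for name in total}


# Synthetic data: 95 examples of a common condition, all predicted correctly,
# and 5 examples of a rare condition, of which only 1 is predicted correctly.
records: List[Record] = (
    [("common", 1, 1)] * 95 + [("rare", 1, 0)] * 4 + [("rare", 1, 1)] * 1
)

overall = sum(1 for _, label, pred in records if label == pred) / len(records)
per_slice = accuracy_by_slice(records)

print(f"overall accuracy: {overall:.2f}")
for name, accuracy in sorted(per_slice.items()):
    print(f"{name}: {accuracy:.2f}")
```

With this synthetic data the overall accuracy looks high even though the model is wrong on most of the rare slice, which is exactly the kind of problem an aggregate metric can hide.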
On top of all of that, you’re putting a piece of software into production. That requires a system
design that includes all of the things that are required for any production software deployment.
You need to consider:
Data preprocessing methods
Parallelized model training setups
Repeatable model analysis
Scalable model deployment
Your Production ML system needs to run automatically, so that you’re continuously monitoring
your model performance, ingesting new data, retraining as needed, and redeploying to maintain or
improve your performance.
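The monitor, retrain, and redeploy cycle described above can be sketched as a small decision function. This is a simplified illustration, not a reference design: the `MonitorConfig` threshold, the `needs_retraining` rule (mean accuracy over a recent window), and the `retrain` and `deploy` callables are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MonitorConfig:
    # Hypothetical threshold: retrain when recent mean accuracy drops below it.
    min_accuracy: float = 0.90


def needs_retraining(recent_accuracy: List[float], config: MonitorConfig) -> bool:
    """Return True when mean accuracy over the recent window is below the threshold."""
    if not recent_accuracy:
        return False  # no evidence yet, keep the current model
    mean_accuracy = sum(recent_accuracy) / len(recent_accuracy)
    return mean_accuracy < config.min_accuracy


def run_cycle(
    recent_accuracy: List[float],
    config: MonitorConfig,
    retrain: Callable[[], str],
    deploy: Callable[[str], None],
) -> str:
    """Run one monitoring cycle: keep the model, or retrain and redeploy it."""
    if not needs_retraining(recent_accuracy, config):
        return "kept"
    new_model = retrain()  # assumed to train on newly ingested data
    deploy(new_model)
    return "redeployed"


# Stand-ins for real training and deployment steps, for illustration only.
deployed_models: List[str] = []
config = MonitorConfig()

healthy = run_cycle([0.95, 0.93], config, lambda: "model-v2", deployed_models.append)
degraded = run_cycle([0.85, 0.88], config, lambda: "model-v2", deployed_models.append)
print(healthy, degraded, deployed_models)
```

In a real system this cycle would be triggered automatically, for example on a schedule or when new data arrives, and the retraining decision would usually rest on more than a single accuracy threshold.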
And of course, in building a Production ML system, like any production system, you need to try
to do all of this at the minimum cost while producing the maximum performance. It might seem
daunting, but the good news is that there are well-established tools and methodologies for doing
this.