Presentation

Reliable Machine Learning

Pages
310
Uploaded on
02-08-2024
Written in
2021/2022

"Whether you're part of a small startup or a multinational corporation, this practical book shows data scientists, software and site reliability engineers, product managers, and business owners how to run and establish ML reliably, effectively, and accountably within your organization. You'll gain insight into everything from how to do model monitoring in production to how to run a well-tuned model development team in a product organization. By applying an SRE mindset to machine learning, authors and engineering professionals Cathy Chen, Kranti Parisa, Niall Richard Murphy, D. Sculley, Todd Underwood, and featured guest authors show you how to run an efficient and reliable ML system. Whether you want to increase revenue, optimize decision making, solve problems, or understand and influence customer behavior, you'll learn how to perform day-to-day ML tasks while keeping the bigger picture in mind."


Chapter 1. Introduction
We begin with a model, or framework, for adding machine learning (ML) to a website. It is widely
applicable across a number of domains, not just this example. We call this model the ML loop.


The ML Lifecycle
ML applications are never really done. They also don’t start or stop in any one place, either
technically or organizationally. ML model developers often hope their lives will be simple, and
they’ll have to collect data and train a model only once, but it rarely happens that way.

A simple thought experiment can help us understand why. Suppose we have an ML model, and
we are investigating whether the model works well enough (according to a certain threshold) or
doesn’t. If it doesn’t work well enough, data scientists, business analysts, and ML engineers will
typically collaborate on how to understand the failures and improve upon them. This involves, as
you might expect, a lot of work: perhaps modifying the existing training pipeline to change some
features, adding or removing some data, and restructuring the model in order to iterate on what has
already been done.

Conversely, if the model is working well, what usually happens is that organizations get excited.
The natural thought is that if we can make so much progress with one naïve attempt, imagine how
much better we can do if we work harder on it and get more sophisticated. This typically
involves—you guessed it—modifying the existing training pipeline, changing features, adding or
removing data, and possibly even restructuring the model. Either way, more or less the same work
is done, and the first model we make is simply a starting point for what we do next.
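
The thought experiment above can be sketched in a few lines of Python. This is only an illustration: the function name `next_steps`, the metric values, and the threshold are hypothetical and do not come from the book. It shows that both outcomes of the "is the model good enough?" check lead to the same list of work items, differing only in motivation.

```python
# A minimal sketch of the thought experiment; all names and numbers are hypothetical.

ITERATION_WORK = [
    "modify the existing training pipeline",
    "change features",
    "add or remove data",
    "restructure the model",
]


def next_steps(metric, threshold):
    """Return (motivation, work items) for a model scored against a threshold."""
    if metric < threshold:
        motivation = "understand the failures and improve on them"
    else:
        motivation = "build on the early success"
    # The work is the same in both branches; only the motivation differs.
    return motivation, list(ITERATION_WORK)


failing_reason, failing_work = next_steps(metric=0.60, threshold=0.80)
passing_reason, passing_work = next_steps(metric=0.90, threshold=0.80)
```

Comparing `failing_work` and `passing_work` shows why the first model is a starting point rather than an end state.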

Let’s look at the ML lifecycle, or loop, in more detail (Figure 1-1).

Figure 1-1. ML lifecycle


ML systems start with data, so let’s start on the left side of the diagram and go through this loop
in more detail. We will specifically look at each stage and explain, in the context of our shopping
site, who in the organization is involved in each stage and the key activities they will carry out.

Data Collection and Analysis
First, the team takes stock of the data it has and starts to assess that data. The team members need
to decide whether they have all the data they require, and then prioritize the business or
organizational uses to which they can put the data. They must then collect and process the data.
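
As a rough illustration of "taking stock" of the data, the sketch below measures how many records carry a usable value for each field the team thinks it needs. The record layout, the field names, and the helper `field_coverage` are assumptions made up for this example, not part of the book.

```python
# A minimal sketch, assuming records arrive as dictionaries; all names are hypothetical.

def field_coverage(records, required_fields):
    """Return, per required field, the fraction of records with a non-empty value."""
    total = len(records)
    coverage = {}
    for field in required_fields:
        present = sum(1 for record in records if record.get(field) not in (None, ""))
        coverage[field] = present / total if total else 0.0
    return coverage


# Hypothetical order records with some gaps.
orders = [
    {"order_id": 1, "customer_id": "c1", "price": 12.5},
    {"order_id": 2, "customer_id": "", "price": 8.0},
    {"order_id": 3, "customer_id": "c3"},
]

coverage = field_coverage(orders, ["order_id", "customer_id", "price"])
```

A report like this is only a first pass: it says whether the data exists, not whether it is correct or suitable for a given business use.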

The work associated with data collection and analysis touches almost everyone in the company,
though how precisely it touches them often varies a lot among firms. For example, business
analysts could live in the finance, accounting, or product teams, and use platform-provided data
every day. Or data and platform engineers might build reusable tools for ingesting, cleaning, and
processing data, though they might not be involved in business decisions. (In a smaller company,
perhaps they’re all just software or product engineers.) Some places have formal data engineering
roles. Others have data scientists, product analysts, and user experience (UX) researchers all
consuming the output of work from this phase.

For YarnIt, our web shop operator, most of the organization is involved in this step. This includes
the business and product teams, which will know best the highest-impact areas of the business for
optimization. For example, they can determine whether a small increase in profit for every sale is
more important to the business, or whether instead it makes more sense to slightly increase order
frequency. They can point to problems or opportunities with low- and high-margin products, and
talk about segmentation of the customers into more and less profitable customers. Product and ML
engineers will also be involved, thinking about what to do with all of this data, and site reliability
engineers (SREs) will make recommendations and decisions about the overall pipeline in order to
make it more monitorable, manageable, and reliable.
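
The trade-off between profit per sale and order frequency comes down to simple arithmetic, which the following sketch makes concrete. Every number here is invented for illustration and says nothing about YarnIt's actual business; the function `monthly_profit` is likewise a hypothetical helper.

```python
# A minimal sketch with made-up numbers; nothing here comes from the book.

def monthly_profit(orders_per_month, profit_per_order):
    """Total profit is the number of orders times the profit on each order."""
    return orders_per_month * profit_per_order


# Hypothetical baseline for a small web shop.
baseline_orders = 1000
baseline_profit_per_order = 5.00

# Option A: a small increase (2%) in profit on every sale.
option_a = monthly_profit(baseline_orders, baseline_profit_per_order * 1.02)

# Option B: a slight increase (3%) in order frequency.
option_b = monthly_profit(baseline_orders * 1.03, baseline_profit_per_order)

better_option = "profit per sale" if option_a > option_b else "order frequency"
```

With these particular assumptions the frequency increase wins, but a different baseline or different achievable uplifts could reverse the result, which is why the business and product teams need to be part of this step.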

Managing data for ML is a sufficiently involved topic that we’ve devoted Chapter 2 to data
management principles and later discuss training data in Chapters 4 and 10. For now, it is useful
to assume that the proper design and management of a data collection and processing system is at
the core of any good ML system. Once we have the data in a suitable place and format, we
will begin to train a model.

ML Training Pipelines
ML training pipelines are specified, designed, built, and used by data engineers, data scientists,
ML engineers, and SREs. They are the special-purpose extract, transform, load (ETL) data
processing pipelines that read the unprocessed data and apply the ML algorithm and structure of
our model to the data. Their job is to consume training data and produce completed models, ready
for evaluation and use. These models are produced either all at once or incrementally in a
variety of ways. Some models are incomplete in that they cover only some of the available data,
and others are incomplete in scope, as they are designed to cover only part of the learning task
as a whole.
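
To make the ETL framing concrete, here is a deliberately tiny training pipeline written with only the Python standard library. It is a sketch under stated assumptions: the inline CSV data, the column names, and the choice of a one-variable least-squares fit are all invented for illustration, and a real pipeline would typically delegate the fitting step to a framework.

```python
# A minimal sketch of an extract-transform-train pipeline; data and names are hypothetical.
import csv
import io

# Hypothetical unprocessed data: skeins ordered and order price, with one bad row.
RAW_DATA = "skeins,price\n1,6.0\n2,11.0\n,9.0\n4,21.0\n"


def extract(raw_text):
    """Read the unprocessed data into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Drop rows with missing values and convert the rest to (x, y) floats."""
    examples = []
    for row in rows:
        if row["skeins"] and row["price"]:
            examples.append((float(row["skeins"]), float(row["price"])))
    return examples


def fit(examples):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in examples)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    slope = cov_xy / var_x
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def run_pipeline(raw_text):
    """Consume training data and produce a completed model, ready for evaluation."""
    return fit(transform(extract(raw_text)))


model = run_pipeline(RAW_DATA)
```

The resulting `model` dictionary stands in for the "completed model" that a later stage would evaluate and serve.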

Training pipelines are one of the few parts of our ML system that directly and explicitly use
ML-specific algorithms, although even here these are most commonly packaged up in relatively
mature platforms and frameworks such as TensorFlow and PyTorch.
