Presentation

Reliable Machine Learning

Pages

310

Uploaded on

02-08-2024

Written in

2021/2022

"Whether you're part of a small startup or a multinational corporation, this practical book shows data scientists, software and site reliability engineers, product managers, and business owners how to run and establish ML reliably, effectively, and accountably within your organization. You'll gain insight into everything from how to do model monitoring in production to how to run a well-tuned model development team in a product organization. By applying an SRE mindset to machine learning, authors and engineering professionals Cathy Chen, Kranti Parisa, Niall Richard Murphy, D. Sculley, Todd Underwood, and featured guest authors show you how to run an efficient and reliable ML system. Whether you want to increase revenue, optimize decision making, solve problems, or understand and influence customer behavior, you'll learn how to perform day-to-day ML tasks while keeping the bigger picture in mind."

Content preview

Chapter 1. Introduction
We begin with a model, or framework, for adding machine learning (ML) to a website. It is widely
applicable across a number of domains, not just this example. We call this model the ML loop.

The ML Lifecycle
ML applications are never really done. They also don’t start or stop in any one place, either
technically or organizationally. ML model developers often hope their lives will be simple, and
they’ll have to collect data and train a model only once, but it rarely happens that way.

A simple thought experiment can help us understand why. Suppose we have an ML model, and
we are investigating whether the model works well enough (according to a certain threshold) or
doesn’t. If it doesn’t work well enough, data scientists, business analysts, and ML engineers will
typically collaborate on how to understand the failures and improve upon them. This involves, as
you might expect, a lot of work: perhaps modifying the existing training pipeline to change some
features, adding or removing some data, and restructuring the model in order to iterate on what has
already been done.

Conversely, if the model is working well, what usually happens is that organizations get excited.
The natural thought is that if we can make so much progress with one naïve attempt, imagine how
much better we can do if we work harder on it and get more sophisticated. This typically
involves—you guessed it—modifying the existing training pipeline, changing features, adding or
removing data, and possibly even restructuring the model. Either way, more or less the same work
is done, and the first model we make is simply a starting point for what we do next.
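The thought experiment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `plan_next_iteration`, the `metric` and `threshold` parameters, and the wording of the work items are hypothetical, chosen to mirror the text rather than taken from any real system.

```python
# Hypothetical sketch: whichever way the evaluation goes, the follow-up
# work on the training pipeline is essentially the same.
FOLLOW_UP_WORK = [
    "modify the existing training pipeline",
    "change features",
    "add or remove data",
    "restructure the model",
]


def plan_next_iteration(metric: float, threshold: float) -> dict:
    """Describe what happens after one evaluation of the model."""
    works_well_enough = metric >= threshold
    motivation = (
        "build on early success"
        if works_well_enough
        else "understand and fix failures"
    )
    return {
        "works_well_enough": works_well_enough,
        "motivation": motivation,
        # Same list in both branches: the first model is only a starting point.
        "work": list(FOLLOW_UP_WORK),
    }
```

Calling `plan_next_iteration(0.9, 0.8)` and `plan_next_iteration(0.5, 0.8)` yields different motivations but the same list of work, which is the reason the lifecycle is drawn as a loop.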

Let’s look at the ML lifecycle, or loop, in more detail (Figure 1-1).

Figure 1-1. ML lifecycle

ML systems start with data, so let’s start on the left side of the diagram and go through this loop
in more detail. We will specifically look at each stage and explain, in the context of our shopping
site, who in the organization is involved in each stage and the key activities they will carry out.

Data Collection and Analysis
First, the team takes stock of the data it has and starts to assess that data. The team members need
to decide whether they have all the data they require, and then prioritize the business or
organizational uses to which they can put the data. They must then collect and process the data.
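As a rough illustration of what "taking stock" might look like in practice, the sketch below measures how completely a set of records covers the fields a team believes it needs. Everything here is an assumption made for the example: the field names, the `min_fraction` cutoff of 0.9, and the representation of records as plain dictionaries.

```python
def assess_coverage(records: list[dict], required: list[str]) -> dict[str, float]:
    """Return, per required field, the fraction of records with a usable value."""
    total = len(records)
    if total == 0:
        return {name: 0.0 for name in required}
    return {
        # A value counts as usable unless it is absent, None, or an empty string.
        name: sum(1 for r in records if r.get(name) not in (None, "")) / total
        for name in required
    }


def fields_needing_work(coverage: dict[str, float], min_fraction: float = 0.9) -> list[str]:
    """List fields whose coverage falls below the (hypothetical) cutoff."""
    return sorted(name for name, frac in coverage.items() if frac < min_fraction)
```

A report like this does not decide anything by itself; it gives the team a concrete starting point for the conversation about whether more data must be collected before any model is trained.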

The work associated with data collection and analysis touches almost everyone in the company,
though how precisely it touches them often varies a lot among firms. For example, business
analysts could live in the finance, accounting, or product teams, and use platform-provided data
every day. Or data and platform engineers might build reusable tools for ingesting, cleaning, and
processing data, though they might not be involved in business decisions. (In a smaller company,
perhaps they’re all just software or product engineers.) Some places have formal data engineering
roles. Others have data scientists, product analysts, and user experience (UX) researchers all
consuming the output of work from this phase.

For YarnIt, our web shop operator, most of the organization is involved in this step. This includes
the business and product teams, which will know best the highest-impact areas of the business for
optimization. For example, they can determine whether a small increase in profit for every sale is
more important to the business, or whether instead it makes more sense to slightly increase order
frequency. They can point to problems or opportunities with low- and high-margin products, and
talk about segmentation of the customers into more and less profitable customers. Product and ML
engineers will also be involved, thinking about what to do with all of this data, and site reliability
engineers (SREs) will make recommendations and decisions about the overall pipeline in order to
make it more monitorable, manageable, and reliable.
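The choice the business and product teams face between a little more profit on every sale and slightly more frequent orders comes down to simple arithmetic. The sketch below makes that arithmetic explicit. The function names and every number passed to them are hypothetical, and the model (profit per order times orders per year times number of customers) is a deliberate simplification, not a description of how YarnIt measures its business.

```python
def annual_profit(profit_per_order: float, orders_per_year: float, customers: int) -> float:
    """Simplified model: every customer places the same number of equal orders."""
    return profit_per_order * orders_per_year * customers


def compare_levers(
    profit_per_order: float,
    orders_per_year: float,
    customers: int,
    profit_delta: float,
    frequency_delta: float,
) -> dict[str, float]:
    """Return the extra annual profit from pulling each lever on its own."""
    baseline = annual_profit(profit_per_order, orders_per_year, customers)
    return {
        "more_profit_per_sale": annual_profit(
            profit_per_order + profit_delta, orders_per_year, customers
        ) - baseline,
        "higher_order_frequency": annual_profit(
            profit_per_order, orders_per_year + frequency_delta, customers
        ) - baseline,
    }
```

Which lever wins depends entirely on the inputs, which is why the people who know the business best need to supply them before anyone chooses an optimization target for a model.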

Managing data for ML is a sufficiently involved topic that we’ve devoted Chapter 2 to data
management principles and later discuss training data in Chapters 4 and 10. For now, it is useful
to assume that the proper design and management of a data collection and processing system is at
the core of any good ML system. Once we have the data in a suitable place and format, we
will begin to train a model.

ML Training Pipelines
ML training pipelines are specified, designed, built, and used by data engineers, data scientists,
ML engineers, and SREs. They are the special-purpose extract, transform, load (ETL) data
processing pipelines that read the unprocessed data and apply the ML algorithm and structure of
our model to the data. Their job is to consume training data and produce completed models, ready
for evaluation and use. These models are produced either complete at once or incrementally in a
variety of ways: some models are incomplete in that they cover only some of the available data,
and others are incomplete in scope, as they are designed to cover only part of the overall learning
task.
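To make the shape of such a pipeline concrete, here is a minimal sketch in plain Python, assuming raw input arrives as text lines of the form "x,y". It is not how any production system is built: the function names, the input format, and the choice of a one-feature least-squares fit as "the ML algorithm" are all assumptions made to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LinearModel:
    """The completed model that the pipeline hands on for evaluation."""
    slope: float
    intercept: float

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


def extract(raw_lines: Iterable[str]) -> list[tuple[str, ...]]:
    """Extract: split non-blank raw lines into string fields."""
    return [tuple(line.strip().split(",")) for line in raw_lines if line.strip()]


def transform(rows: list[tuple[str, ...]]) -> list[tuple[float, float]]:
    """Transform: parse fields into (feature, label) pairs, dropping bad rows."""
    examples = []
    for row in rows:
        try:
            x, y = row  # raises ValueError unless there are exactly two fields
            examples.append((float(x), float(y)))
        except ValueError:
            continue
    return examples


def fit(examples: list[tuple[float, float]]) -> LinearModel:
    """Apply the learning algorithm: ordinary least squares on one feature."""
    n = len(examples)
    if n < 2:
        raise ValueError("need at least two training examples")
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in examples)
    if var_x == 0:
        raise ValueError("feature has no variance")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    slope = cov_xy / var_x
    return LinearModel(slope=slope, intercept=mean_y - slope * mean_x)


def run_training_pipeline(raw_lines: Iterable[str]) -> LinearModel:
    """Consume unprocessed data and produce a completed model."""
    return fit(transform(extract(raw_lines)))
```

Only `fit` contains anything ML-specific; the rest is ordinary data processing, which is one reason reliability practices from other data pipelines carry over so readily.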

Training pipelines are one of the few parts of our ML system that directly and explicitly use
ML-specific algorithms, although even here these are most commonly packaged up in relatively
mature platforms and frameworks such as TensorFlow and PyTorch.


