Preface
Introduction
Chapter 1: Building Machine Learning Systems
Chapter 2: Machine Learning Pipelines
Chapter 3: Your Friendly Neighborhood Air Quality Forecasting Service (available)
Chapter 4: Feature Stores (available)
Chapter 5: Hopsworks Feature Store (unavailable)
Chapter 6: Model-Independent Transformations (unavailable)
Chapter 7: Model-Dependent Transformations (unavailable)
Chapter 8: Batch Feature Pipelines (unavailable)
Chapter 9: Streaming Feature Pipelines (unavailable)
Chapter 10: Training Pipelines (unavailable)
Chapter 11: Inference Pipelines (unavailable)
Chapter 12: MLOps (unavailable)
Chapter 13: Feature and Model Monitoring (unavailable)
Chapter 14: Vector Databases (unavailable)
Chapter 15: Case Study: Personalized Recommendations (unavailable)
Chapter 1. Building Machine Learning Systems
A NOTE FOR EARLY RELEASE READERS
With Early Release ebooks, you get books in their earliest form—the author’s raw and unedited content as they write—so you can take advantage of these technologies long before the official release of these titles.
This will be the 1st chapter of the final book. The GitHub repo can be found
at https://github.com/featurestorebook/mlfs-book.
If you have comments about how we might improve the content and/or examples in this book, or
if you notice missing material within this chapter, please reach out to the editor
at .
Imagine you have been tasked with producing a financial forecast for the upcoming financial year.
You decide to use machine learning as there is a lot of available data, but, not unexpectedly, the
data is spread across many different places—in spreadsheets and many different tables in the data
warehouse. You have been working for several years at the same organization, and this is not the
first time you have been given this task. Every year to date, the final output of your model has
been a PowerPoint presentation showing the financial projections. Each year, you trained a new model, it made a single prediction, and you were finished with it. Each year, you started effectively from scratch: you had to find the data sources (again), re-request access to the data to create the features for your model, and then dig out last year’s Jupyter notebook and update it with new data and improvements to your model.
This year, however, you realize that it may be worth investing the time in building the scaffolding
for this project so that you have less work to do next year. So, instead of delivering a PowerPoint presentation, you decide to build a dashboard. Instead of requesting one-off access to the data, you build feature pipelines that extract the historical data from its source(s) and compute the features (and labels) used in your model. You have an insight: the same feature pipelines can do two things, compute the historical features used to train your model and compute the features that will be used to make predictions with your trained model. Now, after training your model, you can connect it to the feature pipelines to make predictions that power your dashboard. You thank yourself one year later when you only have to tweak this ML system by adding, updating, or removing features and training a new model. The time you saved on the grunt work of data sourcing, cleaning, and feature engineering, you now use to investigate new ML frameworks and model architectures, resulting in a much improved financial model, much to the delight of your boss.
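The dual use of feature pipelines described above can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the book: the column names, the toy revenue data, and the pandas-based pipeline are all assumptions.

```python
import pandas as pd

def compute_features(raw: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical feature pipeline: the *same* function computes features
    # from historical data (for training) and from new data (for inference).
    out = raw.copy()
    out["revenue_growth"] = out["revenue"].pct_change()         # period-on-period growth
    out["revenue_rolling_mean"] = out["revenue"].rolling(2).mean()
    return out.dropna()  # drop rows where the windowed features are undefined

# Training time: compute features over historical data (labels come elsewhere).
historical = pd.DataFrame({"revenue": [100.0, 110.0, 121.0, 133.1]})
train_features = compute_features(historical)

# Inference time: reuse the exact same pipeline on this year's raw data.
new_data = pd.DataFrame({"revenue": [133.1, 150.0, 160.0]})
inference_features = compute_features(new_data)
```

Because one function produces both training features and inference features, the two can never silently drift apart, which is the point of the insight above.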
The above example shows the difference between training a model to make a one-off prediction
on a static dataset versus building a batch ML system - a system that automates reading from data
sources, transforming data into features, training models, performing inference on new data with
the model, and updating a dashboard with the model’s predictions. The dashboard is the value
delivered by the model to stakeholders.
If you want a model to generate repeated value, the model should make predictions more than once.
That means, you are not finished when you have evaluated the model’s performance on a test set
drawn from your static dataset. Instead, you will have to build ML pipelines: programs that transform raw data into features, feed those features to your model for retraining, and feed new features to your model so that it can make predictions, generating more value with every prediction it makes.
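The kinds of programs just described can be sketched as follows. This is a toy illustration under stated assumptions: the function names are invented, and a trivial growth-factor "model" stands in for a real ML framework.

```python
# A minimal sketch of a batch ML system's pipelines. All names and the
# toy growth-factor "model" are illustrative assumptions, not the book's code.

def feature_pipeline(raw_values):
    """Turn raw historical values into (feature, label) training examples,
    where the label is simply the next period's value."""
    return [(raw_values[i], raw_values[i + 1]) for i in range(len(raw_values) - 1)]

def training_pipeline(examples):
    """'Train' a trivial model: the average period-on-period growth factor."""
    factors = [label / feature for feature, label in examples]
    avg_factor = sum(factors) / len(factors)
    return lambda x: x * avg_factor  # the trained model

def inference_pipeline(model, latest_value):
    """Run the model on new data; the prediction would update the dashboard."""
    return model(latest_value)

history = [100.0, 110.0, 121.0]                  # raw data from the warehouse
model = training_pipeline(feature_pipeline(history))
forecast = inference_pipeline(model, 121.0)      # ≈ 133.1
```

Each function maps onto one of the pipelines named above; in a real system, each would run as its own scheduled program rather than as calls in one script.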
You have now embarked on the journey from training models on static datasets to building ML systems. The most important part of that journey is working with dynamic data, see Figure 1-1. This means moving from static data, such as the hand-curated datasets used in ML competitions found on Kaggle.com, to batch data, datasets that are updated at some interval (hourly, daily, weekly, yearly), to real-time data.
Figure 1-1. An ML system that only generates a one-off prediction on a static dataset generates less business value than an ML system that can make predictions on a schedule with batches of input data. ML systems that can make predictions with real-time data are more technically challenging, but can create even more business value.
An ML system is a software system that manages the two main life cycles of a model: training and inference (making predictions).
The Evolution of Machine Learning Systems
In the mid-2010s, revolutionary ML systems started appearing in consumer Internet applications, such as image tagging on Facebook and Google Translate. The first generation of ML systems were either batch ML systems that make predictions on a schedule, see Figure 1-2, or interactive online ML systems that make predictions in response to user actions, see Figure 1-3.