
Building Machine Learning Systems with a Feature Store

Pages: 100
Uploaded on: 02-08-2024
Written in: 2022/2023

Get up to speed on a new unified approach to building machine learning (ML) systems with batch data, real-time data, and large language models (LLMs) based on independent, modular ML pipelines and a shared data layer. With this practical book, data scientists and ML engineers will learn in detail how to develop, maintain, and operate modular ML systems.


Brief Table of Contents (Not Yet Final)
Preface

Introduction

Chapter 1: Building Machine Learning Systems

Chapter 2: Machine Learning Pipelines

Chapter 3: Your Friendly Neighborhood Air Quality Forecasting Service (available)

Chapter 4: Feature Stores (available)

Chapter 5: Hopsworks Feature Store (unavailable)

Chapter 6: Model-Independent Transformations (unavailable)

Chapter 7: Model-Dependent Transformations (unavailable)

Chapter 8: Batch Feature Pipelines (unavailable)

Chapter 9: Streaming Feature Pipelines (unavailable)

Chapter 10: Training Pipelines (unavailable)

Chapter 11: Inference Pipelines (unavailable)

Chapter 12: MLOps (unavailable)

Chapter 13: Feature and Model Monitoring (unavailable)

Chapter 14: Vector Databases (unavailable)

Chapter 15: Case Study: Personalized Recommendations (unavailable)


Chapter 1. Building Machine Learning Systems
A NOTE FOR EARLY RELEASE READERS

With Early Release ebooks, you get books in their earliest form: the author’s raw and unedited
content as they write—so you can take advantage of these technologies long before the official
release of these titles.

This will be the 1st chapter of the final book. The GitHub repo can be found
at https://github.com/featurestorebook/mlfs-book.

If you have comments about how we might improve the content and/or examples in this book, or
if you notice missing material within this chapter, please reach out to the editor
at .

Imagine you have been tasked with producing a financial forecast for the upcoming financial year.
You decide to use machine learning as there is a lot of available data, but, not unexpectedly, the
data is spread across many different places—in spreadsheets and many different tables in the data
warehouse. You have been working for several years at the same organization, and this is not the
first time you have been given this task. Every year to date, the final output of your model has
been a PowerPoint presentation showing the financial projections. Each year, you trained a new
model, it made a single prediction, and you were finished with it. Each year, you started
effectively from scratch. You had to find the data sources (again), re-request access to the data to
create the features for your model, and then dig out the Jupyter notebook from last year and update
it with new data and improvements to your model.

This year, however, you realize that it may be worth investing the time in building the scaffolding
for this project so that you have less work to do next year. So, instead of delivering a PowerPoint,
you decide to build a dashboard. Instead of requesting one-off access to the data, you build feature
pipelines that extract the historical data from its source(s) and compute the features (and labels)
used in your model. You have an insight that the same feature pipelines can do two things: compute
the historical features used to train your model, and compute the new features that your trained
model will use to make predictions. Now, after training your model, you can connect
it to the feature pipelines to make predictions that power your dashboard. You thank yourself one
year later when you only have to tweak this ML system by adding, updating, or removing features and
training a new model. You now spend the time you saved on grunt work (finding data sources,
cleaning data, and engineering features) investigating new ML frameworks and model architectures,
resulting in a much-improved financial model, much to the delight of your boss.
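To make that insight concrete, here is a minimal Python sketch, assuming monthly revenue figures as the raw data. The feature names, the three-month window, and the function names are hypothetical choices for illustration, not code from the book. A single function, `compute_features`, produces the features both for the historical training rows (each paired with a label) and for the row used to predict the coming month, so features are computed the same way in training and in inference.

```python
from statistics import mean


def compute_features(history: list[float]) -> dict[str, float]:
    """Turn the most recent three monthly revenue values into model features.

    Hypothetical feature definitions. The same function is used for both
    training and inference, so the features the model sees when predicting
    are computed exactly as they were when it was trained.
    """
    last3 = history[-3:]
    return {
        "revenue_last": last3[-1],
        "revenue_mean_3m": mean(last3),
        "revenue_trend_3m": last3[-1] - last3[0],
    }


def build_training_rows(revenue: list[float]) -> list[tuple[dict[str, float], float]]:
    """Historical features plus a label (the following month's revenue) for each window."""
    rows = []
    for t in range(3, len(revenue)):
        features = compute_features(revenue[:t])  # only data available before month t
        label = revenue[t]                        # what actually happened in month t
        rows.append((features, label))
    return rows


def build_inference_row(revenue: list[float]) -> dict[str, float]:
    """Features for the next, not-yet-observed month: no label exists yet."""
    return compute_features(revenue)


revenue = [100.0, 110.0, 120.0, 130.0, 140.0]
training_rows = build_training_rows(revenue)  # two (features, label) pairs
next_month = build_inference_row(revenue)     # features only, ready for the trained model
```

Putting the feature logic in one place is what lets you retrain next year by rerunning the pipeline instead of rediscovering how last year's notebook built its features.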

The above example shows the difference between training a model to make a one-off prediction
on a static dataset and building a batch ML system: a system that automates reading from data
sources, transforming data into features, training models, performing inference on new data with
the model, and updating a dashboard with the model’s predictions. The dashboard is how the model
delivers value to stakeholders.

If you want a model to generate repeated value, the model should make predictions more than once.
That means you are not finished when you have evaluated the model’s performance on a test set
drawn from your static dataset. Instead, you will have to build ML pipelines: programs that
transform raw data into features, feed those features to your model so that it can easily be
retrained, and feed new features to your model so that it can make predictions, generating more
value with every prediction it makes.
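The ML pipelines described above can be sketched as three independent Python functions that never call each other and share state only through a data layer. This is a toy, assuming in-memory dictionaries as hypothetical stand-ins for a feature store and a model registry, a single feature (last month's revenue), and a one-variable least-squares line as the model; all names here are hypothetical.

```python
from statistics import mean

# Hypothetical shared data layer: plain dicts stand in for a feature store
# and a model registry. The pipelines communicate only through these.
feature_store: dict[str, list[dict[str, float]]] = {"revenue_features": []}
model_registry: dict[str, dict[str, float]] = {}


def feature_pipeline(raw_revenue: list[float]) -> None:
    """Compute (feature, label) rows from raw data and write them to the feature store."""
    feature_store["revenue_features"] = [
        {"revenue_last": raw_revenue[t - 1], "label_next": raw_revenue[t]}
        for t in range(1, len(raw_revenue))
    ]


def training_pipeline() -> None:
    """Read features from the store, fit a least-squares line, and register the model."""
    rows = feature_store["revenue_features"]
    xs = [r["revenue_last"] for r in rows]
    ys = [r["label_next"] for r in rows]
    mx, my = mean(xs), mean(ys)
    # Assumes the feature is not constant; otherwise the denominator is zero.
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    model_registry["revenue_model"] = {"slope": slope, "intercept": my - slope * mx}


def inference_pipeline(new_features: list[dict[str, float]]) -> list[float]:
    """Load the registered model and make predictions for new feature rows."""
    m = model_registry["revenue_model"]
    return [m["slope"] * f["revenue_last"] + m["intercept"] for f in new_features]


feature_pipeline([100.0, 110.0, 120.0, 130.0])         # writes 3 rows to the store
training_pipeline()                                    # registers slope 1.0, intercept 10.0
print(inference_pipeline([{"revenue_last": 130.0}]))  # [140.0]
```

Because each pipeline reads from and writes to the shared data layer rather than calling the others, you could rerun the training pipeline on its own schedule, or swap in a different model, without touching the feature or inference code.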

You have embarked on the same journey from training models on static datasets to building ML
systems. The most important part of that journey is working with dynamic data (see Figure 1-1). This
means moving from static data, such as the hand-curated datasets used in ML competitions on
Kaggle.com, to batch data (datasets that are updated at some interval: hourly, daily, weekly, or
yearly), and then to real-time data.




Figure 1-1. An ML system that only generates a one-off prediction on a static dataset generates less business
value than an ML system that can make predictions on a schedule with batches of input data. ML systems
that can make predictions with real-time data are more technically challenging, but can create even more
business value.

An ML system is a software system that manages the two main life cycles of a model: training and
inference (making predictions).


The Evolution of Machine Learning Systems
In the mid-2010s, revolutionary ML systems started appearing in consumer Internet applications,
such as image tagging on Facebook and Google Translate. The first generation of ML systems were
either batch ML systems that made predictions on a schedule (see Figure 1-2) or interactive online
ML systems that made predictions in response to user actions (see Figure 1-3).
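The difference between these two kinds of system is mostly when predictions are computed and where they go. In the minimal sketch below, the model is any Python function from a feature list to a number, and the request fields are hypothetical. A real batch system would be triggered by a scheduler (not shown) and would write its predictions to a table or dashboard, while an online system would sit behind a network service.

```python
from typing import Callable

# Any function from a feature vector to a prediction stands in for a trained model.
Model = Callable[[list[float]], float]


def batch_predict(model: Model, features_by_id: dict[str, list[float]]) -> dict[str, float]:
    """Batch ML system: score every entity in one run (for example, once a night
    when a scheduler triggers it) and keep the results for later lookup."""
    return {entity_id: model(x) for entity_id, x in features_by_id.items()}


def online_predict(model: Model, request: dict[str, float]) -> float:
    """Interactive online ML system: build features from a single user request
    and return a prediction immediately."""
    # Hypothetical request fields, used only for illustration.
    x = [request["clicks"], request["seconds_on_page"]]
    return model(x)


def toy_model(x: list[float]) -> float:
    """Placeholder model: sums its inputs."""
    return sum(x)


nightly = batch_predict(toy_model, {"user_a": [1.0, 2.0], "user_b": [3.0, 4.0]})
print(nightly["user_b"])  # 7.0, precomputed before anyone asks for it
print(online_predict(toy_model, {"clicks": 2.0, "seconds_on_page": 5.0}))  # 7.0, computed on request
```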
