Architecting Data and Machine Learning Platforms

Type: Presentation
Pages: 301
Uploaded on: 02-08-2024
Written in: 2021/2022

"All cloud architects need to know how to build data platforms that enable businesses to make data-driven decisions and deliver enterprise-wide intelligence in a fast and efficient way. This handbook shows you how to design, build, and modernize cloud native data and machine learning platforms using AWS, Azure, Google Cloud, and multicloud tools like Snowflake and Databricks. Authors Marco Tranquillin, Valliappa Lakshmanan, and Firat Tekiner cover the entire data lifecycle from ingestion to activation in a cloud environment using real-world enterprise architectures. You'll learn how to transform, secure, and modernize familiar solutions like data warehouses and data lakes, and you'll be able to leverage recent AI/ML patterns to get accurate and quicker insights to drive competitive advantage."

Chapter 1. Modernizing Your Data Platform: An Introductory Overview
Data is a valuable asset that can help your company make better decisions, identify new
opportunities, and improve operations. Google in 2013 undertook a strategic project to increase
employee retention by improving manager quality. Even something as loosey-goosey as manager
skill could be studied in a data-driven manner. Google was able to improve management
favorability from 83% to 88% by analyzing 10K performance reviews, identifying common
behaviors of high-performing managers, and creating training programs. Another example of a
strategic data project was carried out at Amazon. The ecommerce giant implemented
a recommendation system based on customer behaviors that drove 35% of purchases in 2017. The
Warriors, a San Francisco basketball team, is yet another example; they enacted an analytics
program that helped catapult them to the top of their league. All these—employee retention,
product recommendations, improving win rates—are examples of business goals that were
achieved by modern data analytics.

To become a data-driven company, you need to build an ecosystem for data analytics, processing,
and insights. This is because there are many different types of applications (websites, dashboards,
mobile apps, ML models, distributed devices, etc.) that create and consume data. There are also
many different departments within your company (finance, sales, marketing, operations, logistics,
etc.) that need data-driven insights. Because the entire company is your customer base, building a
data platform is more than just an IT project.

This chapter introduces data platforms, their requirements, and why traditional data architectures
prove insufficient. It also discusses technology trends in data analytics and AI, and how to build
data platforms for the future using the public cloud. This chapter is a general overview of the core
topics covered in more detail in the rest of the book.

The Data Lifecycle
The purpose of a data platform is to support the steps that organizations need to carry out to move
from raw data to insightful information. It is helpful to understand the steps of the data lifecycle
(collect, store, process, analyze/visualize, activate) because they can be mapped almost as-is to a data
architecture to create a unified analytics platform.

The Journey to Wisdom
Data helps companies to develop smarter products, reach more customers, and increase their return
on investment (ROI). Data can also be leveraged to measure customer satisfaction, profitability,
and cost. But the data by itself is not enough. Data is raw material that needs to pass through a
series of stages before it can be used to generate insights and knowledge. This sequence of stages
is what we call a data lifecycle. There are many definitions available in the literature, but from a
general point of view, we can identify five main stages in modern data platform architecture:

1. Collect

Data has to be acquired and ingested into the target systems (e.g., manual data entry, batch
loading, streaming ingestion, etc.).

2. Store

Data needs to be persisted in a durable fashion with the ability to easily access it in the
future (e.g., file storage system, database).

3. Process/transform

Data has to be manipulated to make it useful for subsequent steps (e.g., cleansing,
wrangling, transforming).

4. Analyze/visualize

Data needs to be studied to derive business insights, either via manual analysis (e.g., queries,
slice and dice) or via automatic processing (e.g., enrichment using ML application
programming interfaces, or APIs).

5. Activate

Data insights need to be surfaced in a form and place where decisions can be made (e.g.,
notifications that act as a trigger for specific manual actions, automatic job executions
when specific conditions are met, ML models that send feedback to devices).

Each of these stages feeds into the next, similar to the flow of water through a set of pipes.
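To make the "each stage feeds into the next" idea concrete, here is a minimal, in-memory Python sketch that chains one function per stage. It is an illustration only, not the authors' architecture: the record fields (id, region, sales), the function names, and the alert threshold are hypothetical assumptions, and in a real platform each function would be replaced by an ingestion service, a storage system, a processing engine, and so on.

```python
"""Toy, in-memory sketch of the five data lifecycle stages.

Hypothetical throughout: record fields, stage function names, and the alert
threshold are illustrative assumptions, not part of any real platform.
"""
import json
from statistics import mean


# 1. Collect: parse raw records (JSON lines stand in for a batch file or stream).
def collect(raw_lines):
    return [json.loads(line) for line in raw_lines]


# 2. Store: persist records for later access (a dict keyed by id stands in
#    for a file storage system or database).
def store(records, storage):
    for rec in records:
        storage[rec["id"]] = rec
    return storage


# 3. Process/transform: cleanse (drop records with no sales value) and
#    normalize region names and numeric types.
def process(storage):
    return [
        {"id": r["id"], "region": r["region"].strip().lower(), "sales": float(r["sales"])}
        for r in storage.values()
        if r.get("sales") is not None
    ]


# 4. Analyze: aggregate cleaned rows into an insight (mean sales per region).
def analyze(rows):
    by_region = {}
    for r in rows:
        by_region.setdefault(r["region"], []).append(r["sales"])
    return {region: mean(values) for region, values in by_region.items()}


# 5. Activate: turn the insight into an action when a condition is met
#    (flag regions whose average is below a hypothetical threshold).
def activate(insights, threshold=100.0):
    return sorted(region for region, avg in insights.items() if avg < threshold)


raw = [
    '{"id": 1, "region": "North ", "sales": "120"}',
    '{"id": 2, "region": "south", "sales": "80"}',
    '{"id": 3, "region": "South", "sales": "90"}',
    '{"id": 4, "region": "north", "sales": null}',
]
storage = store(collect(raw), {})
insights = analyze(process(storage))
alerts = activate(insights)
print(insights)  # {'north': 120.0, 'south': 85.0}
print(alerts)    # ['south']
```

The record with a missing sales value is dropped during processing, and inconsistent region spellings are normalized before aggregation. This is why cleansing sits between storage and analysis rather than after it.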

Water Pipes Analogy
To understand the data lifecycle better, think of it as a simplified water pipe system. The water
starts at an aqueduct and is then transferred and transformed through a series of pipes until it
reaches a group of houses. The data lifecycle is similar, with data being collected, stored,
processed/transformed, and analyzed before it is used to make decisions (see Figure 1-1).

Figure 1-1. Water lifecycle, providing an analogy for the five steps in the data lifecycle

You can see some similarities between the plumbing world and the data world. Plumbing engineers
are like data engineers, who design and build the systems that make data usable. People who
analyze water samples are like data analysts and data scientists, who analyze data to find insights.
Of course, this is just a simplification. There are many other roles in a company that use data, like
executives, developers, business users, and security administrators. But this analogy can help you
remember the main concepts.

In the canonical data lifecycle, shown in Figure 1-2, data engineers collect and store data in an
analytics store. The stored data is then processed using a variety of tools. If the tools involve
programming, the processing is typically done by data engineers. If the tools are declarative, the
processing is typically done by data analysts. The processed data is then analyzed by business
users and data scientists. Business users use the insights to make decisions, such as launching
marketing campaigns or issuing refunds. Data scientists use the data to train ML models, which
can be used to automate tasks or make predictions.
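The difference between programmatic and declarative processing can be illustrated by computing the same aggregate both ways. The sketch below is a minimal example that is not tied to any particular cloud service. It assumes a hypothetical orders table with region and amount columns and uses Python's built-in sqlite3 module as a stand-in for an analytics store.

```python
"""Same transformation in two styles: declarative (SQL) and programmatic (Python).

The `orders` table and its columns are hypothetical; an in-memory SQLite
database stands in for an analytics store.
"""
import sqlite3
from collections import defaultdict

rows = [("north", 120.0), ("south", 80.0), ("south", 90.0)]

# Declarative: state WHAT result is wanted; the engine decides how to compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
declarative = dict(
    conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
)
conn.close()

# Programmatic: spell out HOW to compute it, one step at a time.
totals = defaultdict(float)
for region, amount in rows:
    totals[region] += amount
programmatic = dict(sorted(totals.items()))

print(declarative)                  # {'north': 120.0, 'south': 170.0}
print(declarative == programmatic)  # True
```

Both versions produce the same totals. The declarative query leaves the execution strategy to the engine, while the programmatic loop spells out every step and gives the author full control over it.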
