Presentation

Architecting Data and Machine Learning Platforms

Pages
301
Uploaded on
02-08-2024
Written in
2021/2022

"All cloud architects need to know how to build data platforms that enable businesses to make data-driven decisions and deliver enterprise-wide intelligence in a fast and efficient way. This handbook shows you how to design, build, and modernize cloud native data and machine learning platforms using AWS, Azure, Google Cloud, and multicloud tools like Snowflake and Databricks. Authors Marco Tranquillin, Valliappa Lakshmanan, and Firat Tekiner cover the entire data lifecycle from ingestion to activation in a cloud environment using real-world enterprise architectures. You'll learn how to transform, secure, and modernize familiar solutions like data warehouses and data lakes, and you'll be able to leverage recent AI/ML patterns to get accurate and quicker insights to drive competitive advantage."


Content preview

Chapter 1. Modernizing Your Data Platform: An Introductory Overview
Data is a valuable asset that can help your company make better decisions, identify new
opportunities, and improve operations. Google in 2013 undertook a strategic project to increase
employee retention by improving manager quality. Even something as loosey-goosey as manager
skill could be studied in a data-driven manner. Google was able to improve management
favorability from 83% to 88% by analyzing 10K performance reviews, identifying common
behaviors of high-performing managers, and creating training programs. Another example of a
strategic data project was carried out at Amazon. The ecommerce giant implemented
a recommendation system based on customer behaviors that drove 35% of purchases in 2017. The
Warriors, a San Francisco basketball team, is yet another example; they enacted an analytics
program that helped catapult them to the top of their league. All these—employee retention,
product recommendations, improving win rates—are examples of business goals that were
achieved by modern data analytics.

To become a data-driven company, you need to build an ecosystem for data analytics, processing,
and insights. This is because there are many different types of applications (websites, dashboards,
mobile apps, ML models, distributed devices, etc.) that create and consume data. There are also
many different departments within your company (finance, sales, marketing, operations, logistics,
etc.) that need data-driven insights. Because the entire company is your customer base, building a
data platform is more than just an IT project.

This chapter introduces data platforms, their requirements, and why traditional data architectures
prove insufficient. It also discusses technology trends in data analytics and AI, and how to build
data platforms for the future using the public cloud. This chapter is a general overview of the core
topics covered in more detail in the rest of the book.


The Data Lifecycle
The purpose of a data platform is to support the steps that organizations need to carry out to move
from raw data to insightful information. It is helpful to understand the steps of the data lifecycle
(collect, store, process, visualize, activate) because they can be mapped almost as-is to a data
architecture to create a unified analytics platform.

The Journey to Wisdom
Data helps companies to develop smarter products, reach more customers, and increase their return
on investment (ROI). Data can also be leveraged to measure customer satisfaction, profitability,
and cost. But the data by itself is not enough. Data is raw material that needs to pass through a
series of stages before it can be used to generate insights and knowledge. This sequence of stages
is what we call a data lifecycle. There are many definitions available in the literature, but from a
general point of view, we can identify five main stages in modern data platform architecture:

1. Collect

Data has to be acquired and ingested into the target systems (e.g., manual data entry, batch
loading, streaming ingestion, etc.).

2. Store

Data needs to be persisted in a durable fashion with the ability to easily access it in the
future (e.g., file storage system, database).

3. Process/transform

Data has to be manipulated to make it useful for subsequent steps (e.g., cleansing,
wrangling, transforming).

4. Analyze/visualize

Data needs to be studied to derive business insights via manual elaboration (e.g., queries,
slice and dice) or automatic processing (e.g., enrichment using ML application
programming interfaces—APIs).

5. Activate

Data insights need to be surfaced in a form and place where decisions can be made (e.g.,
notifications that act as a trigger for specific manual actions, automatic job executions
when specific conditions are met, ML models that send feedback to devices).

Each of these stages feeds into the next, similar to the flow of water through a set of pipes.
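
As a rough illustration of how the five stages chain together, here is a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the book: the sample events, the function names, the in-memory SQLite table standing in for a durable store, and the spending threshold that triggers a notification.

```python
import json
import sqlite3

# Hypothetical raw events; a real platform would receive these from
# batch loads or a streaming source.
RAW_EVENTS = [
    '{"customer": "a", "amount": "12.50"}',
    '{"customer": "b", "amount": "7.00"}',
    '{"customer": "a", "amount": "bad-value"}',
    '{"customer": "a", "amount": "30.00"}',
]


def collect(lines):
    """Collect: acquire raw records (here, by parsing JSON lines)."""
    return [json.loads(line) for line in lines]


def store(conn, records):
    """Store: persist the raw records (here, in an in-memory SQLite table)."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw_orders (customer TEXT, amount TEXT)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?)",
        [(r["customer"], r["amount"]) for r in records],
    )
    conn.commit()


def process(conn):
    """Process/transform: cleanse rows, dropping non-numeric amounts."""
    cleaned = []
    for customer, amount in conn.execute("SELECT customer, amount FROM raw_orders"):
        try:
            cleaned.append((customer, float(amount)))
        except ValueError:
            continue  # skip malformed values instead of failing the whole run
    return cleaned


def analyze(cleaned):
    """Analyze: compute total spend per customer."""
    totals = {}
    for customer, amount in cleaned:
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals


def activate(totals, threshold=40.0):
    """Activate: turn insights into actions (here, notification messages)."""
    return [
        f"notify: customer {customer} spent {total:.2f}"
        for customer, total in sorted(totals.items())
        if total >= threshold
    ]


def run_pipeline(lines):
    """Run the five stages in order, each feeding into the next."""
    conn = sqlite3.connect(":memory:")
    try:
        store(conn, collect(lines))
        return activate(analyze(process(conn)))
    finally:
        conn.close()


if __name__ == "__main__":
    for message in run_pipeline(RAW_EVENTS):
        print(message)
```

Each function depends only on the output of the stage before it, which is the property that lets the lifecycle map almost directly onto a platform architecture.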

Water Pipes Analogy
To understand the data lifecycle better, think of it as a simplified water pipe system. The water
starts at an aqueduct and is then transferred and transformed through a series of pipes until it
reaches a group of houses. The data lifecycle is similar, with data being collected, stored,
processed/transformed, and analyzed before it is used to make decisions (see Figure 1-1).

Figure 1-1. Water lifecycle, providing an analogy for the five steps in the data lifecycle


You can see some similarities between the plumbing world and the data world. Plumbing engineers
are like data engineers, who design and build the systems that make data usable. People who
analyze water samples are like data analysts and data scientists, who analyze data to find insights.
Of course, this is just a simplification. There are many other roles in a company that use data, like
executives, developers, business users, and security administrators. But this analogy can help you
remember the main concepts.

In the canonical data lifecycle, shown in Figure 1-2, data engineers collect and store data in an
analytics store. The stored data is then processed using a variety of tools. If the tools involve
programming, the processing is typically done by data engineers. If the tools are declarative, the
processing is typically done by data analysts. The processed data is then analyzed by business
users and data scientists. Business users use the insights to make decisions, such as launching
marketing campaigns or issuing refunds. Data scientists use the data to train ML models, which
can be used to automate tasks or make predictions.
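
To make the distinction between programmatic and declarative processing concrete, the sketch below computes the same per-region revenue in two ways: with an explicit Python loop, and with a SQL query run through Python's built-in sqlite3 module. The sample data, table name, and function names are hypothetical, and SQLite merely stands in for whatever analytics store a real platform would use.

```python
import sqlite3

# Hypothetical stored data: (region, order amount).
ORDERS = [("north", 120.0), ("south", 80.0), ("north", 45.5), ("east", 10.0)]


def revenue_programmatic(rows):
    """Programmatic style: spell out how to compute the result, step by step."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def revenue_declarative(rows):
    """Declarative style: state what result is wanted and let the engine plan it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        query = "SELECT region, SUM(amount) FROM orders GROUP BY region"
        return dict(conn.execute(query).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(revenue_programmatic(ORDERS))
    print(revenue_declarative(ORDERS))
```

Both functions return the same totals. The difference lies in who writes the processing logic: the loop requires programming skill, while the query only requires describing the desired result.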


