Presentation

Data-Centric Machine Learning with Python

Pages: 330
Uploaded on: August 2, 2024
Written in: 2020/2021

"In the rapidly advancing data-driven world where data quality is pivotal to the success of machine learning and artificial intelligence projects, this critically timed guide provides a rare, end-to-end overview of data-centric machine learning (DCML), along with hands-on applications of technical and non-technical approaches to generating deeper and more accurate datasets. This book will help you understand what data-centric ML/AI is and how it can help you to realize the potential of ‘small data’. Delving into the building blocks of data-centric ML/AI, you’ll explore the human aspects of data labeling, tackle ambiguity in labeling, and understand the role of synthetic data. From strategies to improve data collection to techniques for refining and augmenting datasets, you’ll learn everything you need to elevate your data-centric practices. Through applied examples and insights for overcoming challenges, you’ll get a roadmap for implementing data-centric ML/AI in diverse applications in Python."

Content preview

Table of Contents
Part 1: What Data-Centric Machine Learning Is and Why We Need It
1 Exploring Data-Centric Machine Learning
Understanding data-centric ML

The origins of data centricity

The components of ML systems

Data is the foundational ingredient

Data-centric versus model-centric ML

Data centricity is a team sport

The importance of quality data in ML

Identifying high-value legal cases with natural language processing

Predicting cardiac arrests in emergency calls

Summary

References

2 From Model-Centric to Data-Centric – ML’s Evolution
Exploring why ML development ended up being mostly model-centric

The 1940s to 1970s – the early days

The 1980s to 1990s – the rise of personal computing and the internet

The 2000s – the rise of tech giants

2010–now – big data drives AI innovation

Model-centricity was the logical evolutionary outcome

Unlocking the opportunity for small data ML

Why we need data-centric AI more than ever

The cascading effects of data quality

Avoiding data cascades and technical debt

Summary

References

Part 2: The Building Blocks of Data-Centric ML
3 Principles of Data-Centric ML
Sometimes, all you need is the right data

Principle 1 – data should be the center of ML development

A checklist for data-centricity

Principle 2 – leverage annotators and SMEs effectively

Direct labeling with human annotators

Verifying output quality with human annotators

Codifying labeling rules with programmatic labeling

Principle 3 – use ML to improve your data

Principle 4 – follow ethical, responsible, and well-governed ML practices

Summary

References

4 Data Labeling Is a Collaborative Process

Understanding the benefits of diverse human labeling

Understanding common challenges arising from human labelers

Designing a framework for high-quality labels

Designing clear instructions

Aligning motivations and using SMEs

Collaborating iteratively

Dealing with ambiguity and reflecting diversity

Understanding approaches for dealing with ambiguity in labeling

Measuring labeling consistency

Summary

References

Part 3: Technical Approaches to Better Data
5 Techniques for Data Cleaning
The six key dimensions of data quality

Installing the required packages

Introducing the dataset

Ensuring the data is consistent

Checking that the data is unique

Ensuring that the data is complete and not missing

Ensuring that the data is valid

Ensuring that the data is accurate

Ensuring that the data is fresh

Seller: RobertCuong
