Class notes

Machine Learning

Uploaded on

08-11-2025

Written in

2025/2026

Brief notes on machine learning concepts, covering supervised and unsupervised learning, model training, and real-world applications.

Unit 1

Data
• Raw information, facts, or numbers collected to be examined or analysed in order to make
decisions.
• Should be represented in a formalized manner suitable for communication, interpretation, and
processing.
Information
• Result of analysing data
Data versus Information
Data are the building blocks of information. Likewise, pieces of information are the
building blocks of records.
Information: Data that has been given value through analysis, interpretation, or
compilation in a meaningful form.
Types of Data:
1. Structured: Well-organized data with a defined schema (e.g., tables).
2. Unstructured: Unorganized data with no fixed format (e.g., emails).
3. Natural Language: A type of unstructured data that involves human language,
requiring linguistic techniques.
4. Machine-generated: Data created automatically by systems without human input.
5. Graph-based: Data focusing on object relationships and connections.
6. Audio, Video, Images: Multimedia data captured through sound and visuals.
7. Streaming: Real-time data generated continuously by events.
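
The difference between the first two types can be illustrated with a small sketch. The sample CSV text, the email text, and the field names are invented for illustration. Structured data can be read by column name, while unstructured text needs an extraction step (here a simple regular expression).

```python
import csv
import io
import re

# Structured data: a defined schema (column names) makes fields directly addressable.
structured_text = "customer_id,city,amount\n101,Chennai,250\n102,Madurai,400\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))
total_amount = sum(int(row["amount"]) for row in rows)

# Unstructured data: no fixed format, so fields must be extracted, here with a regex.
email_body = "Hi team, order 5531 was delayed. Please call me back about order 5532."
order_ids = re.findall(r"order (\d+)", email_body)

print(total_amount)  # 650
print(order_ids)     # ['5531', '5532']
```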
Data Acquisition
Introduction:
Data Acquisition is the process of collecting, filtering, and converting real-world data
from various relevant sources into a format that can be processed by computers for
further analysis. In today's data-driven world, it plays a crucial role in business
intelligence, machine learning, and decision-making systems.

Importance of Data Acquisition:
1. Strategic Planning: Helps businesses analyze customer behavior and market trends
to frame effective strategies.
2. Error Detection: Makes it easier to identify inconsistencies or gaps in data early on.
3. Minimizes Human Error: Automates data collection to reduce manual mistakes.
4. Improved Data Security: Secure handling and storage of data.
5. Cost Efficiency: Automates repetitive tasks, saving time and operational costs.
6. Enables Recommendation Systems: Collected data is crucial for building systems
like product or content recommendations.

Data Acquisition in Machine Learning:
In machine learning, data acquisition is the first and most important step. The model's
accuracy and performance depend heavily on the quality of data collected.
Key Steps:
1. Collection & Integration: Data is gathered from multiple sources and combined as
required.
2. Formatting: Raw data is organized and cleaned to fit the model's requirements.

3. Labeling: Data is tagged with proper labels or classifications for supervised
learning.
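
The three key steps can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed pipeline: the two record lists, the field names, and the labeling threshold are all assumptions made up for the example.

```python
# Hypothetical records from two sources, keyed by customer id.
crm_records = [
    {"id": 1, "name": " Asha ", "age": "34"},
    {"id": 2, "name": "Ravi", "age": "51"},
]
sales_records = [
    {"id": 1, "purchases": 7},
    {"id": 2, "purchases": 1},
]

def integrate(crm, sales):
    """Collection & integration: combine both sources on the shared id."""
    sales_by_id = {row["id"]: row for row in sales}
    return [{**row, **sales_by_id.get(row["id"], {})} for row in crm]

def format_record(record):
    """Formatting: trim text and convert numeric strings to numbers."""
    return {
        "id": record["id"],
        "name": record["name"].strip(),
        "age": int(record["age"]),
        "purchases": record.get("purchases", 0),
    }

def label(record, threshold=5):
    """Labeling: attach a target class for supervised learning.
    The threshold is an arbitrary assumption for illustration."""
    tag = "frequent" if record["purchases"] >= threshold else "occasional"
    return {**record, "label": tag}

dataset = [label(format_record(r)) for r in integrate(crm_records, sales_records)]
for row in dataset:
    print(row)
```

Keeping each step in its own function makes it easy to swap in a different source, cleaning rule, or labeling rule without touching the others.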

Data Acquisition Process:
1. Data Discovery: Searching and identifying new, useful datasets from internal or
external sources.
2. Data Augmentation: Enhancing existing datasets by adding external data to
improve richness and context.
3. Data Generation: Creating datasets either manually (e.g., surveys) or automatically
(e.g., sensors, web scraping).

Techniques and Tools for Data Acquisition:
1. Data Warehouses & ETL (Extract, Transform, Load):
• Data Warehouse: A centralized database where structured data from various
sources is stored.
• ETL Process:
o Extract: Retrieve data from multiple sources
o Transform: Convert it into a suitable format
o Load: Store it in a data warehouse
• ETL Tools:
o Code-based: SQL, PL/SQL, Base SAS
o GUI-based: Informatica, DataStage, SSIS
2. Data Lakes & ELT (Extract, Load, Transform):
• Store structured, semi-structured, and unstructured data (e.g., images, videos,
PDFs)
• Follow the ELT process, where raw data is stored first and transformed only when
needed
• More flexible and suitable for big data environments
3. Cloud Data Warehouses:
• Examples: Amazon Redshift, Google BigQuery, Snowflake
• Offer on-demand storage without the organization maintaining physical hardware
• Cost-effective and scalable solutions for modern enterprises
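
The contrast between ETL and ELT can be sketched with Python's built-in sqlite3 module, where an in-memory database stands in for the warehouse or lake. The table names, the sample rows, and the parse_amount helper are assumptions for illustration only; real tools such as those listed above work at a much larger scale.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a source system.
raw_rows = [("2025-01-03", " 120.50 "), ("2025-01-04", "80"), ("2025-01-05", "n/a")]

def parse_amount(text):
    """Transform step: convert a raw string to a float, or None if unparseable."""
    try:
        return float(text.strip())
    except ValueError:
        return None

conn = sqlite3.connect(":memory:")  # stands in for a warehouse or lake

# ETL: transform in application code first, then load only the clean rows.
conn.execute("CREATE TABLE sales_clean (day TEXT, amount REAL)")
clean = [(day, parse_amount(amount)) for day, amount in raw_rows]
conn.executemany(
    "INSERT INTO sales_clean VALUES (?, ?)",
    [row for row in clean if row[1] is not None],
)
etl_total = conn.execute("SELECT SUM(amount) FROM sales_clean").fetchone()[0]

# ELT: load the raw rows untouched, then transform later inside the store.
conn.execute("CREATE TABLE sales_raw (day TEXT, amount TEXT)")
conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", raw_rows)
elt_total = conn.execute(
    "SELECT SUM(CAST(TRIM(amount) AS REAL)) FROM sales_raw "
    "WHERE TRIM(amount) GLOB '[0-9]*'"
).fetchone()[0]

print(etl_total, elt_total)
```

Both paths arrive at the same total, but only the ELT path keeps the unparseable row available for later inspection.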

Data Collection Sources:
1. Primary Data Collection:
Original, firsthand data collected directly from the source.
• Surveys/Questionnaires: Structured forms to collect data online or offline
• Interviews: One-on-one interaction; can be structured or unstructured
• Observations: Monitoring subjects in a natural environment
• Experiments: Controlled environment to study cause-effect relationships
• Focus Groups: Group discussions for feedback or opinion gathering
2. Secondary Data Collection:
Data collected by others, reused for new analysis.
• Published Sources: Books, research papers, newspapers
• Online Databases: Statistical and academic data
• Government Records: Census data, economic reports
• Public Data: Social media posts, forums

• Previous Research Studies: Existing datasets used for comparative or extended
research

Internal and External Systems:
Internal Systems:
• Data generated within an organization
• Comes from internal operations like sales, production, customer support
• Stored in CRM, ERP systems, spreadsheets, etc.
• Highly structured and controlled
External Systems:
• Data gathered from outside the organization
• Sources: market trends, competitors, government databases, social media
• Includes data from APIs (Application Programming Interfaces)
Web APIs:
• Enable communication between different software systems
• REST API: A widely used API style; exposes data as resources accessed over HTTP
o Lightweight, easy to use, suitable for web applications
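
A rough sketch of how a client might consume a REST resource is shown below. The base URL, the "customers" resource, its query parameters, and the sample response body are all hypothetical. No network call is made: a canned JSON string stands in for the server's reply.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL and resource name; not a real service.
BASE_URL = "https://api.example.com/v1"

def build_resource_url(resource, **params):
    """Build the URL for a GET request on a resource collection."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}/{resource}{query}"

def parse_response(body):
    """Decode a JSON response body into Python objects."""
    return json.loads(body)

url = build_resource_url("customers", city="Chennai", limit=2)

# A canned response body standing in for what the server might return.
# With a live service, urllib.request.urlopen(url).read() would supply it.
sample_body = '[{"id": 101, "city": "Chennai"}, {"id": 107, "city": "Chennai"}]'
customers = parse_response(sample_body)
customer_ids = [c["id"] for c in customers]

print(url)
print(customer_ids)
```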

Conclusion:
Data Acquisition is the foundation of any data-centric technology or system. Without
accurate and well-collected data, the success of machine learning models, business
intelligence tools, and automated systems is compromised. Understanding the sources,
tools, and processes involved in data acquisition enables better decision-making and
technological advancement.
Data Preprocessing
Data preprocessing is an important step in data mining. In this process, raw data is
converted into an understandable format and made ready for further analysis.
Purpose of data preprocessing:
❖ Get data overview
❖ Identify missing data
❖ Identify outliers or anomalous data
❖ Remove Inconsistencies
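
The first three purposes can be sketched with the standard library alone. The column of ages is made up, and the 1.5 × IQR rule used for outliers is one common convention chosen here as an assumption; other rules (for example z-scores) are equally valid.

```python
import statistics

# Hypothetical column of ages; None marks a missing value.
ages = [23, 25, None, 27, 24, 26, 95, None, 25]

present = [a for a in ages if a is not None]

# Data overview and missing-data count.
overview = {
    "count": len(ages),
    "missing": ages.count(None),
    "mean": statistics.mean(present),
    "median": statistics.median(present),
}

# Outliers by the 1.5 * IQR rule (one common convention, not the only one).
q1, _, q3 = statistics.quantiles(present, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [a for a in present if a < low or a > high]

print(overview)
print(outliers)  # [95]
```

Note how the single extreme value pulls the mean well above the median, which is itself a quick hint that an outlier is present.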
Data preprocessing in Machine Learning: A practical approach
Data preprocessing is the process of preparing raw data and making it suitable for a
machine learning model. It is a crucial early step when creating a machine
learning model. Real-world data generally contains noise and missing values, and may be
in a format that cannot be used directly by machine learning models.
Data preprocessing covers the tasks required to clean the data and make it suitable for a
machine learning model, which also increases the accuracy and efficiency of the machine
learning model.
It involves the following steps:
1. Getting the dataset
2. Importing libraries
3. Importing datasets
4. Finding Missing Data
5. Encoding Categorical Data
6. Splitting dataset into training and test set
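
The steps above can be sketched end to end as follows. This is a minimal illustration using only the Python standard library so that it runs anywhere; in practice, libraries such as pandas and scikit-learn are commonly used for the same steps. The inline dataset, its column names, the mean-fill strategy, and the train_test_split helper written here are all assumptions for the example.

```python
import csv
import io
import random
import statistics

# Steps 1-3: a small inline CSV stands in for a downloaded dataset file.
CSV_TEXT = """country,age,salary,purchased
India,44,72000,no
Japan,27,48000,yes
Brazil,30,,no
Japan,38,61000,no
India,,58000,yes
Brazil,35,52000,yes
"""
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Step 4: find missing numeric values and fill them with the column mean.
for column in ("age", "salary"):
    known = [float(r[column]) for r in rows if r[column] != ""]
    fill = statistics.mean(known)
    for r in rows:
        r[column] = float(r[column]) if r[column] != "" else fill

# Step 5: one-hot encode the categorical column and map the label to 0/1.
countries = sorted({r["country"] for r in rows})
features = [
    [1.0 if r["country"] == c else 0.0 for c in countries] + [r["age"], r["salary"]]
    for r in rows
]
labels = [1 if r["purchased"] == "yes" else 0 for r in rows]

# Step 6: shuffle with a fixed seed, then hold out a fraction as the test set.
def train_test_split(X, y, test_fraction=0.33, seed=0):
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])

X_train, X_test, y_train, y_test = train_test_split(features, labels)
print(len(X_train), len(X_test))  # 4 2
```

Filling with the mean is only one option for missing values; dropping the rows or using the median are common alternatives, and the right choice depends on the dataset.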

Written for

Institution: Sathyabama Institute Of Science And Technology
Course: Data Science and Information

Document information

Uploaded on: November 8, 2025
Number of pages: 26
Written in: 2025/2026
Type: Class notes
Professor(s): Abirami
Contains: All classes

Subjects

data
machine learning
aids
ml

Seller: lsharan, Sathyabama Institute of Science and Technology
