Class notes

Machine Learning

Pages: 26
Uploaded on: 08-11-2025
Written in: 2025/2026

Brief notes on machine learning concepts, covering supervised and unsupervised learning, model training, and real-world applications.


Unit 1

Data
• Raw information, facts, or numbers collected to be examined or analysed in order to make
decisions.
• Should be represented in a formalized manner suitable for communication, interpretation and
processing.
Information
• Result of analysing data
Data versus Information
Data are the building blocks of information. Likewise, pieces of information are the
building blocks of records.
Information: Data that has been given value through analysis, interpretation, or
compilation in a meaningful form.
Types of Data:
1. Structured: Well-organized data with a defined schema (e.g., tables).
2. Unstructured: Unorganized data with no fixed format (e.g., emails).
3. Natural Language: A type of unstructured data that involves human language,
requiring linguistic techniques.
4. Machine-generated: Data created automatically by systems without human input.
5. Graph-based: Data focusing on object relationships and connections.
6. Audio, Video, Images: Multimedia data captured through sound and visuals.
7. Streaming: Real-time data generated continuously by events.
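To make a few of these types concrete, here is a small illustrative sketch in Python. All records, names and the log line are hypothetical samples. It shows how structured, unstructured, machine-generated and graph-based data might be represented in code, and how each type is handled differently before analysis.

```python
# Illustrative, hypothetical samples of a few data types from the list above.

# Structured: every record follows the same schema, like rows in a table.
structured = [
    {"id": 1, "name": "Asha", "age": 21},
    {"id": 2, "name": "Ravi", "age": 23},
]

# Unstructured: free text with no fixed format, such as the body of an email.
unstructured = "Hi team, the quarterly report is attached. Please review by Friday."

# Machine-generated: a log line written automatically by a server.
machine_generated = "2025-11-08T10:15:02Z INFO request_id=42 status=200"

# Graph-based: objects (nodes) and their relationships (edges) as an adjacency list.
graph = {
    "Asha": ["Ravi", "Meena"],
    "Ravi": ["Asha"],
    "Meena": ["Asha"],
}

# Structured data can be queried directly by field name.
average_age = sum(row["age"] for row in structured) / len(structured)

# Unstructured text has to be processed first (here, a naive tokenisation).
words = unstructured.lower().replace(",", "").replace(".", "").split()

# Machine-generated logs often contain key=value pairs that can be parsed.
log_fields = dict(part.split("=", 1) for part in machine_generated.split() if "=" in part)

# Graph data is explored by following connections.
neighbours_of_asha = graph["Asha"]

print(average_age, len(words), log_fields, neighbours_of_asha)
```

The point of the comparison: structured data can be queried immediately, while the other types need parsing or traversal first.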
DATA ACQUISITION
Introduction:
Data Acquisition is the process of collecting, filtering, and converting real-world data
from various relevant sources into a format that can be processed by computers for
further analysis. In today's data-driven world, it plays a crucial role in business
intelligence, machine learning, and decision-making systems.

Importance of Data Acquisition:
1. Strategic Planning: Helps businesses analyze customer behavior and market trends
to frame effective strategies.
2. Error Detection: Makes it easier to identify inconsistencies or gaps in data early on.
3. Minimizes Human Error: Automates data collection to reduce manual mistakes.
4. Improved Data Security: Supports secure handling and storage of the collected data.
5. Cost Efficiency: Automates repetitive tasks, saving time and operational costs.
6. Enables Recommendation Systems: Collected data is crucial for building systems
like product or content recommendations.

Data Acquisition in Machine Learning:
In machine learning, data acquisition is the first and most important step. The model's
accuracy and performance depend heavily on the quality of data collected.
Key Steps:
1. Collection & Integration: Data is gathered from multiple sources and combined as
required.
2. Formatting: Raw data is organized and cleaned to fit the model's requirements.
3. Labeling: Data is tagged with proper labels or classifications for supervised
learning.
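The three key steps can be sketched in plain Python as follows. This is a minimal illustration, assuming two hypothetical sources (a CRM export and a sales log) keyed by a shared `customer_id`. The labeling rule (a spending threshold of 1000) is also an assumption standing in for labels that, in practice, often come from human annotators or recorded outcomes.

```python
# Hypothetical records from two sources describing the same customers.
crm_records = [
    {"customer_id": "C1", "name": " asha ", "signup": "2024-01-15"},
    {"customer_id": "C2", "name": "RAVI", "signup": "2024-03-02"},
]
sales_records = [
    {"customer_id": "C1", "amount": "1200.50"},
    {"customer_id": "C1", "amount": "300"},
    {"customer_id": "C2", "amount": "80"},
]

def integrate(crm, sales):
    """Collection & integration: combine both sources on customer_id."""
    totals = {}
    for sale in sales:
        cid = sale["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + float(sale["amount"])
    return [{**row, "total_spent": totals.get(row["customer_id"], 0.0)} for row in crm]

def format_record(row):
    """Formatting: clean text fields and convert values to model-friendly types."""
    return {
        "customer_id": row["customer_id"],
        "name": row["name"].strip().title(),
        "signup_year": int(row["signup"][:4]),
        "total_spent": round(row["total_spent"], 2),
    }

def label(row, threshold=1000.0):
    """Labeling: attach a target class for supervised learning (hypothetical rule)."""
    return {**row, "label": "high_value" if row["total_spent"] >= threshold else "regular"}

dataset = [label(format_record(r)) for r in integrate(crm_records, sales_records)]
for row in dataset:
    print(row)
```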

Data Acquisition Process:
1. Data Discovery: Searching and identifying new, useful datasets from internal or
external sources.
2. Data Augmentation: Enhancing existing datasets by adding external data to
improve richness and context.
3. Data Generation: Creating datasets either manually (e.g., surveys) or automatically
(e.g., sensors, web scraping).
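A small sketch of automatic data generation followed by augmentation, assuming hypothetical temperature sensors (`S1` to `S3`) and a hypothetical external table of sensor locations. "Augmentation" here follows the notes' meaning (enriching records with external data), not the deep-learning sense of transforming training samples.

```python
import random

def generate_sensor_readings(n, seed=0):
    """Data generation (automatic): simulate n temperature readings from sensors."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sensors = ["S1", "S2", "S3"]  # hypothetical sensor IDs
    return [
        {"sensor_id": rng.choice(sensors), "temp_c": round(rng.uniform(20.0, 35.0), 1)}
        for _ in range(n)
    ]

# Hypothetical external dataset: where each sensor is installed.
sensor_locations = {"S1": "Chennai", "S2": "Bengaluru", "S3": "Mumbai"}

def augment(readings, locations):
    """Data augmentation: enrich each reading with context from an external table."""
    return [{**r, "city": locations.get(r["sensor_id"], "unknown")} for r in readings]

readings = augment(generate_sensor_readings(5), sensor_locations)
for r in readings:
    print(r)
```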

Techniques and Tools for Data Acquisition:
1. Data Warehouses & ETL (Extract, Transform, Load):
• Data Warehouse: A centralized database where structured data from various
sources is stored.
• ETL Process:
o Extract: Retrieve data from multiple sources
o Transform: Convert it into a suitable format
o Load: Store it in a data warehouse
• ETL Tools:
o Code-based: SQL, PL/SQL, Base SAS
o GUI-based: Informatica, DataStage, SSIS
2. Data Lakes & ELT (Extract, Load, Transform):
• Store structured, semi-structured, and unstructured data (e.g., images, videos,
PDFs)
• Follows ELT process where raw data is stored first and transformed only when
needed
• More flexible and suitable for big data environments
3. Cloud Data Warehouses:
• Examples: Amazon Redshift, Google BigQuery, Snowflake
• Offers on-demand storage without the need to buy or maintain physical hardware
• Cost-effective and scalable solutions for modern enterprises
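The difference between ETL and ELT can be sketched with Python's built-in `sqlite3` module, which here stands in for a data warehouse (real warehouses such as Redshift, BigQuery or Snowflake work differently in detail). The CSV content and table names are hypothetical. ETL cleans rows in the pipeline before loading; ELT loads the raw rows first and transforms them later with SQL inside the store.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system.
RAW_CSV = """order_id,region,amount
1,south,120.0
2,North,80.5
3,SOUTH,
4,east,45.0
"""

def extract(text):
    """Extract: read rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows without an amount, normalise region, convert types."""
    return [
        (int(r["order_id"]), r["region"].strip().lower(), float(r["amount"]))
        for r in rows
        if r["amount"]
    ]

def etl(conn, text):
    """ETL: transform in the pipeline, then load only clean rows."""
    conn.execute("CREATE TABLE sales (order_id INTEGER, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", transform(extract(text)))

def elt(conn, text):
    """ELT: load raw rows as-is, then transform later with SQL in the store."""
    conn.execute("CREATE TABLE raw_sales (order_id TEXT, region TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (:order_id, :region, :amount)", extract(text))
    conn.execute(
        """CREATE VIEW sales AS
           SELECT CAST(order_id AS INTEGER) AS order_id,
                  LOWER(TRIM(region)) AS region,
                  CAST(amount AS REAL) AS amount
           FROM raw_sales WHERE raw_sales.amount <> ''"""
    )

def region_totals(pipeline):
    """Run one pipeline against a fresh in-memory database and query the result."""
    conn = sqlite3.connect(":memory:")
    try:
        pipeline(conn, RAW_CSV)
        return conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
        ).fetchall()
    finally:
        conn.close()

print(region_totals(etl))
print(region_totals(elt))
```

Both pipelines give the same totals; the ELT version also keeps the untouched `raw_sales` table, so the data can be re-transformed later when requirements change.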


Data Collection Sources:
1. Primary Data Collection:
Original, firsthand data collected directly from the source.
• Surveys/Questionnaires: Structured forms to collect data online or offline
• Interviews: One-on-one interaction; can be structured or unstructured
• Observations: Monitoring subjects in a natural environment
• Experiments: Controlled environment to study cause-effect relationships
• Focus Groups: Group discussions for feedback or opinion gathering
2. Secondary Data Collection:
Data collected by others, reused for new analysis.
• Published Sources: Books, research papers, newspapers
• Online Databases: Statistical and academic data
• Government Records: Census data, economic reports
• Public Data: Social media posts, forums
• Previous Research Studies: Existing datasets used for comparative or extended
research

Internal and External Systems:
Internal Systems:
• Data generated within an organization
• Comes from internal operations like sales, production, customer support
• Stored in CRM, ERP systems, spreadsheets, etc.
• Highly structured and controlled
External Systems:
• Data gathered from outside the organization
• Sources: market trends, competitors, government databases, social media
• Includes data from APIs (Application Programming Interfaces)
Web APIs:
• Enable communication between different software systems
• REST API: Most common API type; exposes data as resources identified by URLs and accessed with HTTP methods such as GET and POST
o Lightweight, easy to use, suitable for web applications
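A minimal sketch of consuming a REST API with Python's standard library. The base URL `https://api.example.com/v1`, the `products` resource and the `{"items": [...]}` response shape are all hypothetical. The demonstration parses a sample response body offline instead of making a live request.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical REST endpoint; replace with a real API's base URL.
BASE_URL = "https://api.example.com/v1"

def resource_url(resource, **params):
    """Build the URL for a resource, e.g. /products?category=books."""
    url = f"{BASE_URL}/{resource}"
    return f"{url}?{urlencode(params)}" if params else url

def fetch_json(url, timeout=10):
    """Send an HTTP GET request and decode the JSON body (needs network access)."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

def to_rows(payload):
    """Flatten a hypothetical {"items": [...]} response into (id, name) rows."""
    return [(item["id"], item["name"]) for item in payload.get("items", [])]

# Offline demonstration with a sample response body instead of a live call.
sample_body = '{"items": [{"id": 1, "name": "Notebook"}, {"id": 2, "name": "Pen"}]}'
print(resource_url("products", category="stationery", page=1))
print(to_rows(json.loads(sample_body)))
```

Against a real service, `to_rows(fetch_json(resource_url("products")))` would combine the steps, provided the API returns the assumed JSON shape.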


Conclusion:
Data Acquisition is the foundation of any data-centric technology or system. Without
accurate and well-collected data, the success of machine learning models, business
intelligence tools, and automated systems is compromised. Understanding the sources,
tools, and processes involved in data acquisition enables better decision-making and
technological advancement.
Data Preprocessing
Data preprocessing is an important step in data mining. In this step, raw data is
converted into an understandable format and made ready for further analysis.
Purpose of data preprocessing:
❖ Get data overview
❖ Identify missing data
❖ Identify outliers or anomalous data
❖ Remove Inconsistencies
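The four purposes above can be illustrated with a stdlib-only sketch on hypothetical records (pandas offers `describe()` and `isna()` for the same kind of checks). The outlier test uses the common 1.5 × IQR rule of thumb; that rule, and the choice of `method="inclusive"` for the quartiles, are assumptions rather than something the notes prescribe.

```python
import statistics

# Hypothetical raw records: one missing age, one implausible age, inconsistent city labels.
raw = [
    {"age": 25, "city": "Chennai"},
    {"age": 31, "city": "chennai "},
    {"age": None, "city": "Delhi"},
    {"age": 28, "city": "DELHI"},
    {"age": 240, "city": "Mumbai"},  # likely a data-entry error
    {"age": 35, "city": "Mumbai"},
]
ages = [r["age"] for r in raw if r["age"] is not None]

# 1. Data overview: basic summary statistics.
overview = {"count": len(ages), "mean": statistics.mean(ages), "median": statistics.median(ages)}

# 2. Missing data: positions of records without an age.
missing = [i for i, r in enumerate(raw) if r["age"] is None]

# 3. Outliers: values outside 1.5 * IQR beyond the first and third quartiles.
q1, _, q3 = statistics.quantiles(ages, n=4, method="inclusive")
iqr = q3 - q1
outliers = [a for a in ages if a < q1 - 1.5 * iqr or a > q3 + 1.5 * iqr]

# 4. Inconsistencies: normalise spacing and case so each city is one category.
cities = sorted({r["city"].strip().title() for r in raw})

print(overview, missing, outliers, cities)
```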
Data preprocessing in Machine Learning: A practical approach
Data preprocessing is the process of preparing raw data and making it suitable for a
machine learning model. It is the first and a crucial step in creating a machine
learning model. Real-world data generally contains noise and missing values, and may be
in an unusable format that cannot be used directly by machine learning models.
Data preprocessing covers the tasks required to clean the data and make it suitable for a
machine learning model, which also increases the accuracy and efficiency of the machine
learning model.
It involves the following steps:
1. Getting the dataset
2. Importing libraries
3. Importing datasets
4. Finding Missing Data
5. Encoding Categorical Data
6. Splitting dataset into training and test set
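The steps above can be sketched end to end with only the Python standard library. The CSV content is hypothetical, and `split_dataset` is a hypothetical helper; in practice, scikit-learn's `SimpleImputer`, `OneHotEncoder` and `train_test_split` are commonly used for steps 4 to 6.

```python
import csv
import io
import random
import statistics

# Steps 1-3: get and import the dataset (a hypothetical CSV held in a string).
DATA = """country,age,salary,purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,67000,Yes
"""
rows = list(csv.DictReader(io.StringIO(DATA)))

# Step 4: find missing numeric values and replace them with the column mean.
for col in ("age", "salary"):
    present = [float(r[col]) for r in rows if r[col] != ""]
    col_mean = statistics.mean(present)
    for r in rows:
        r[col] = float(r[col]) if r[col] != "" else col_mean

# Step 5: encode categorical data.
countries = sorted({r["country"] for r in rows})  # one-hot encode the feature
X = [[1.0 if r["country"] == c else 0.0 for c in countries] + [r["age"], r["salary"]]
     for r in rows]
y = [1 if r["purchased"] == "Yes" else 0 for r in rows]  # binary label encoding

# Step 6: shuffle, then split into training (80%) and test (20%) sets.
def split_dataset(X, y, test_size=0.2, seed=42):
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(X) * test_size)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])

X_train, X_test, y_train, y_test = split_dataset(X, y)
print(len(X_train), len(X_test))
```

This follows the order of the steps in the notes. In a real project, the imputation mean is usually computed on the training set only, after splitting, so that no information from the test set leaks into training.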

Professor(s): Abirami
Contains: All classes
Seller: lsharan, Sathyabama Institute of Science and Technology