
DATA SCIENCE & BIG DATA ANALYTICS UNIT I -- Complete Exam Notes Structured for 4-, 5-, and 6-Mark Questions | Target: 30/30

Pages: 22
Grade: A+
Uploaded on: 03-03-2026
Written in: 2025/2026

SAVITRIBAI PHULE PUNE UNIVERSITY

B.E. (Computer Engineering / IT / Data Science)




DATA SCIENCE & BIG DATA ANALYTICS
UNIT I -- Complete Exam Notes
Structured for 4-, 5-, and 6-Mark Questions | Target: 30/30




Subject:       Data Science & Big Data Analytics
Unit:          Unit I -- Introduction & Data Preprocessing
University:    Savitribai Phule Pune University (SPPU)
Exam Pattern:  4-Mark | 5-Mark | 6-Mark Questions
Contents:      15 Topics + Model Answers + Viva Q&A
Target Score:  Full 30/30 Marks

SPPU | Data Science & Big Data Analytics | Unit I Exam Notes -- Score 30/30




PART 1: COMPLETE UNIT NOTES

1. Basics and Need of Data Science & Big Data

1.1 What is Data Science?

DEFINITION
Data Science is an interdisciplinary field that uses scientific methods, processes, algorithms, and
systems to extract knowledge and insights from structured and unstructured data. It combines statistics,
computer science, domain expertise, and machine learning for decision-making.



1.2 Why is Data Science Needed?
• Exponential data growth from IoT, social media, and digital transactions
• Traditional tools cannot process or analyze large, complex datasets
• Enables predictive analytics, fraud detection, and personalization
• Provides competitive business advantage through data-driven decisions
• Uncovers hidden patterns and correlations invisible to human analysis


1.3 What is Big Data?

DEFINITION
Big Data refers to extremely large datasets that cannot be stored, processed, or analyzed using
traditional database tools. It is characterized by the 5 V's: Volume, Velocity, Variety, Veracity, and Value.



1.4 Need for Big Data Technologies
• Large organizations can generate terabytes of data every single day
• Traditional single-server RDBMS struggle to scale to petabyte-sized datasets
• Real-time processing needed for stock markets, social media, IoT sensors
• Enables cost reduction through optimized business operations
• Supports pattern recognition and accurate future prediction


2. Applications of Data Science

Domain / Industry  | Data Science Applications
Healthcare         | Disease prediction, drug discovery, medical imaging, patient monitoring
Finance & Banking  | Fraud detection, credit scoring, risk analysis, algorithmic trading
E-Commerce         | Product recommendations, customer segmentation, demand forecasting
Social Media       | Sentiment analysis, content recommendation, targeted advertising
Transportation     | Route optimization, autonomous vehicles, traffic prediction
Education          | Personalized learning, dropout prediction, student performance analysis
Manufacturing      | Predictive maintenance, quality control, supply chain optimization
Government         | Crime prediction, policy planning, census analysis


REAL EXAMPLE - Healthcare
Patient records (age, symptoms, lab results) are fed to a Random Forest model that predicts the
probability of heart disease. A model that reaches, say, 92% accuracy lets doctors intervene early,
which can reduce mortality rates.



3. Data Explosion

Data Explosion refers to the unprecedented and exponential growth of data being generated, captured, and
stored globally due to digital transformation.


Causes of Data Explosion
• Social Media -- Billions of posts, images, videos daily (Facebook, Instagram, Twitter)
• IoT Devices -- Sensors, smart appliances generating continuous real-time data streams
• Mobile Devices -- GPS tracking, app usage, purchase records, call logs
• E-Commerce -- Clickstream, purchase history, product browsing, reviews
• Scientific Research -- Genome sequencing, satellite imagery, astronomical data
• Cloud Computing -- Cheaper storage encourages organizations to store everything

DATA EXPLOSION - GLOBAL DATA VOLUME GROWTH (approximate; estimates vary by source)

Year | Estimated global data volume
2010 | ~1 Zettabyte
2015 | ~8 Zettabytes
2020 | ~40 Zettabytes
2025 | ~120+ Zettabytes

1 Zettabyte = 1 billion Terabytes
Rule of thumb: data volume doubles approximately every 2 years
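
The "doubles every 2 years" rule can be checked against the chart figures with a little arithmetic. The sketch below assumes steady exponential growth between each pair of years and uses the approximate zettabyte values from the chart; the function and variable names are illustrative, not from the notes.

```python
import math

# Approximate global data volume in zettabytes, taken from the chart.
volume_zb = {2010: 1, 2015: 8, 2020: 40, 2025: 120}

def doubling_time_years(v_start, v_end, years):
    """Years needed to double, assuming steady exponential growth."""
    return years * math.log(2) / math.log(v_end / v_start)

def zettabytes_to_terabytes(zb):
    """1 ZB = 10**21 bytes and 1 TB = 10**12 bytes, so 1 ZB = 10**9 TB."""
    return zb * 10**9

years = sorted(volume_zb)
doubling_times = {}
for start, end in zip(years, years[1:]):
    t = doubling_time_years(volume_zb[start], volume_zb[end], end - start)
    doubling_times[(start, end)] = t
    print(f"{start}-{end}: doubles roughly every {t:.1f} years")

print(f"120 ZB = {zettabytes_to_terabytes(120):,} TB")
```

With these figures the implied doubling time ranges from about 1.7 years (2010-2015) to about 3.2 years (2020-2025), so "every 2 years" is best quoted in an answer as a rule of thumb rather than an exact rate.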



Challenges Created by Data Explosion
• Storage management, scalability, and infrastructure cost
• Data security, privacy, and regulatory compliance
• Need for specialized tools: Hadoop, Apache Spark, NoSQL databases
• Ensuring data quality and consistency at massive scale


4. Five V's of Big Data

The 5 V's are the key characteristics that define and distinguish Big Data:


V1 - VOLUME
Massive scale of data generated every second.
Example: Facebook: 4 PB/day | Twitter: 500M tweets/day | YouTube: 500 hrs video/min
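
To get a feel for these volumes, it can help to convert the daily figures into per-second rates. This is a minimal sketch that treats the quoted numbers as order-of-magnitude values and uses decimal units (1 PB = 1,000,000 GB); the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Figures quoted in the Volume example (order-of-magnitude values).
facebook_pb_per_day = 4
tweets_per_day = 500_000_000
youtube_hours_per_minute = 500

# Convert each figure to a rate that is easier to picture.
facebook_gb_per_second = facebook_pb_per_day * 1_000_000 / SECONDS_PER_DAY
tweets_per_second = tweets_per_day / SECONDS_PER_DAY
youtube_hours_per_day = youtube_hours_per_minute * 60 * 24

print(f"Facebook: about {facebook_gb_per_second:.0f} GB every second")
print(f"Twitter: about {tweets_per_second:.0f} tweets every second")
print(f"YouTube: {youtube_hours_per_day:,} hours of video every day")
```

At roughly 46 GB per second, sustained ingestion at this scale is beyond a single machine, which is why distributed storage and processing are needed.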


V2 - VELOCITY
Speed at which data is generated, collected, and processed (real-time or near-real-time).
Example: Stock ticks processed in microseconds | IoT sensors stream data continuously
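
Velocity means data is processed as it arrives instead of being stored first and analyzed later. The sketch below shows one simple way to do that: a rolling average over a stream of readings. The sensor values and function names are hypothetical, chosen only for illustration.

```python
from collections import deque

def sensor_stream():
    """Hypothetical IoT temperature readings arriving one at a time."""
    for reading in [21.0, 21.5, 22.0, 30.0, 22.5, 22.0]:
        yield reading

def rolling_average(stream, window=3):
    """Yield the mean of the last `window` readings as each one arrives."""
    recent = deque(maxlen=window)  # old readings drop out automatically
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)

averages = list(rolling_average(sensor_stream()))
for avg in averages:
    print(f"rolling average: {avg:.2f}")
```

Only the last three readings are held in memory at any time, so the same approach works on a stream that never ends.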


V3 - VARIETY
Diverse types: Structured (tables), Semi-structured (JSON, XML), Unstructured (text, images, video).
Example: Hospital stores: text reports + X-ray images + lab values + ECG signals
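
The three data types differ mainly in how much schema they carry. The following sketch uses made-up hospital data to show one record of each kind; the field names and values are assumptions for illustration only.

```python
import csv
import io
import json

# Structured: fixed columns, like a database table (hypothetical lab values).
structured = "patient_id,glucose_mg_dl\n101,95\n102,140\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing fields that may vary per record.
semi_structured = '{"patient_id": 101, "ecg": {"rate_bpm": 72}, "notes": []}'
record = json.loads(semi_structured)

# Unstructured: free text with no predefined schema.
unstructured = "Patient reports mild chest pain after exercise."
word_count = len(unstructured.split())

print(rows[1]["glucose_mg_dl"])    # value looked up by column name
print(record["ecg"]["rate_bpm"])   # value looked up by nested key
print(word_count)                  # text needs processing before analysis
```

Structured and semi-structured values can be looked up directly by name, while unstructured text must first be processed (here, a simple word count) before anything can be measured.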




Seller: gayatrikate, Savitribai Phule Pune University