Big Data Processing Frameworks
● Processing large data sets using distributed computing systems
● Examples: Hadoop, Spark, Flink
Data Visualization
● Representing data in a graphical format
● Examples: Matplotlib, Seaborn, Tableau
Probability and Statistical Inference
● Probability: a number between 0 and 1 that measures how likely an event is to occur
● Statistical inference: drawing conclusions about a population based on sample data
Point Estimation and Interval Estimation
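● Point estimation: using sample data to compute a single value for a population
parameter (e.g., the sample mean as an estimate of the population mean)
● Interval estimation: using sample data to compute a range of plausible values for a
population parameter (e.g., a confidence interval)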
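As a minimal sketch of the two ideas above, the snippet below computes a point estimate (the sample mean) and an approximate 95% interval estimate for a population mean. The values in `sample` are made up for illustration, and the interval uses the normal approximation, which is only one of several ways to build a confidence interval.

```python
import math
import statistics

# Hypothetical sample of measurements (illustrative values only).
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]

# Point estimate: a single value for the population mean.
point_estimate = statistics.mean(sample)

# Standard error of the mean, based on the sample standard deviation.
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Interval estimate: normal-approximation 95% confidence interval.
z = statistics.NormalDist().inv_cdf(0.975)  # roughly 1.96
lower = point_estimate - z * standard_error
upper = point_estimate + z * standard_error

print(f"point estimate: {point_estimate:.3f}")
print(f"95% interval: ({lower:.3f}, {upper:.3f})")
```

For a sample this small, an interval based on the t distribution would be somewhat wider; the normal version is used here only to keep the sketch within the standard library.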
Titanic Passenger Survival Analysis
● Analyzing data to predict whether a passenger survived the Titanic shipwreck
Hypothesis Testing
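● Using sample data to decide whether an observed difference between groups is
statistically significant or could plausibly be due to chance
● Examples: t-test (comparing two group means), ANOVA (comparing three or more group means)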
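To make the t-test concrete, here is a small sketch that computes Welch's t statistic (a common t-test variant that does not assume equal variances) for two groups. The function name `welch_t` and the two sample lists are hypothetical, chosen only for illustration.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Variance of each group mean: sample variance divided by group size.
    va = statistics.variance(sample_a) / len(sample_a)
    vb = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Hypothetical measurements for two groups (illustrative values only).
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]
group_b = [4.2, 4.8, 4.5, 4.1, 4.9, 4.4]

t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A large absolute t value suggests the group means differ by more than chance alone would explain. Turning t into a p-value requires the t distribution; if SciPy is available, `scipy.stats.ttest_ind(group_a, group_b, equal_var=False)` performs the same test and reports the p-value.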
Decision Trees & Model Importance
● Decision tree: a model that predicts outcomes by recursively partitioning the data
on feature values
● Model (feature) importance: measuring how much each feature contributes to the predictions
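One step of the recursive partitioning described above can be sketched as a search for the single split that leaves the purest groups. Gini impurity is assumed here as the purity measure (the notes do not name one), and the helper names and toy data are hypothetical. A full tree would repeat this search on each side of the split.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure group, larger for mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return (weighted impurity, feature index, threshold) of the best split."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best

# Hypothetical toy data: each row is (age, estimated salary in thousands).
rows = [(22, 20), (25, 90), (47, 25), (52, 110), (46, 80), (30, 30)]
labels = [0, 0, 1, 1, 1, 0]  # 1 = purchased, 0 = did not purchase

impurity, feature, threshold = best_split(rows, labels)
print(f"split on feature {feature} at <= {threshold}, impurity {impurity:.2f}")
```

In this toy data, age separates the two classes perfectly, so the search picks it. Tree libraries typically derive importance scores from the same quantity: features whose splits reduce impurity the most are ranked as most important.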
Vehicle Purchase Prediction for SUVs
● Predicting whether a customer will purchase an SUV
Weather Prediction: Rain/Snow
● Predicting weather conditions (rain or snow) based on data
Confusion Matrix for Model Evaluation
● A table used to evaluate the performance of a classification model
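For a binary classifier, the table has four cells: true positives, false positives, false negatives, and true negatives. The sketch below counts them and derives three common metrics; the function name and the label lists are made up for illustration.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count the four outcomes of a binary classifier."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts

# Hypothetical true labels and model predictions (1 = positive class).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)

accuracy = (cm["tp"] + cm["tn"]) / len(actual)   # share of correct predictions
precision = cm["tp"] / (cm["tp"] + cm["fp"])     # correct among predicted positives
recall = cm["tp"] / (cm["tp"] + cm["fn"])        # found among actual positives

print(cm)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```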
Mean, Median, Mode, Variance, & Standard Deviation Calculation
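● Mean: arithmetic average of a dataset (sum of the values divided by their count)
● Median: middle value of a dataset once it is sorted
● Mode: most frequently occurring value in a dataset
● Variance: measure of how spread out the data is (average squared deviation from the mean)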
● Standard deviation: square root of the variance
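These five measures can be computed with Python's built-in `statistics` module. The dataset below is a made-up example, and the population versions of variance and standard deviation are used (dividing by n).

```python
import statistics

# Hypothetical dataset (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)           # sum of values / number of values
median = statistics.median(data)       # middle of the sorted values
mode = statistics.mode(data)           # most frequent value
variance = statistics.pvariance(data)  # average squared deviation from the mean
std_dev = statistics.pstdev(data)      # square root of the variance

print(mean, median, mode, variance, std_dev)
```

When the data is a sample rather than the whole population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead of n.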
Machine Learning - Importance and Applications
● Improving automation and decision-making capabilities of systems
● Examples: image recognition, natural language processing, fraud detection
Animal Classification: Birds vs. Mammals
● Classifying animals as birds or mammals based on data
Types of Probability: Marginal, Joint, and Conditional
● Marginal probability: probability of an event without considering other events
● Joint probability: probability of multiple events occurring together
● Conditional probability: probability of an event given that another event has occurred
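The three types are linked by the rule P(A | B) = P(A and B) / P(B). The sketch below works this through on a small table of counts; the events (rain, being late) and the numbers are hypothetical.

```python
from fractions import Fraction

# Hypothetical counts of observed days: (weather, commute) -> days.
counts = {
    ("rain", "late"): 12,
    ("rain", "on_time"): 18,
    ("dry", "late"): 7,
    ("dry", "on_time"): 63,
}
total = sum(counts.values())

# Joint probability: rain AND late on the same day.
p_rain_and_late = Fraction(counts[("rain", "late")], total)

# Marginal probability: rain, regardless of the commute outcome.
p_rain = Fraction(
    sum(n for (weather, _), n in counts.items() if weather == "rain"), total
)

# Conditional probability: late GIVEN that it rained.
p_late_given_rain = p_rain_and_late / p_rain

print(p_rain_and_late, p_rain, p_late_given_rain)
```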
Probability Distributions: Density, Normal, and Central Limit Theorem
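● Probability distribution: function describing how likely each possible value of a
random variable is
● Density: for a continuous random variable, the probability density function (PDF);
probabilities correspond to areas under its curve
● Normal: continuous, symmetric, bell-shaped distribution defined by its mean and
standard deviation
● Central Limit Theorem: the sum (or mean) of many independent, identically
distributed random variables with finite variance tends to be normally distributed,
whatever the shape of the original distribution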
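The Central Limit Theorem can be checked with a short simulation. The sketch below adds up uniform random numbers, which are far from normal on their own, and compares the resulting sums with what the theory predicts. The number of terms, number of trials, and seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n_terms = 30     # independent variables added together in each sum
n_trials = 5000  # number of sums to simulate

sums = [sum(random.random() for _ in range(n_terms)) for _ in range(n_trials)]

mean_of_sums = statistics.mean(sums)
sd_of_sums = statistics.pstdev(sums)

# Theory for Uniform(0, 1): each term has mean 1/2 and variance 1/12.
expected_mean = n_terms * 0.5
expected_sd = (n_terms / 12) ** 0.5

# For a normal distribution, about 68% of values lie within one standard deviation.
within_one_sd = sum(abs(s - mean_of_sums) <= sd_of_sums for s in sums) / n_trials

print(f"mean {mean_of_sums:.2f} (theory {expected_mean:.2f})")
print(f"sd   {sd_of_sums:.2f} (theory {expected_sd:.2f})")
print(f"share within one sd: {within_one_sd:.2f}")
```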
Use Cases and Real-world Examples
● Fraud detection: identifying fraudulent transactions in financial data
● Predictive maintenance: predicting when machinery will break down
Machine Learning: A Subset of AI with Ability to Improve Automatically
● Machine learning: a subfield of artificial intelligence that allows systems to
automatically learn from data
Algorithm: Set of Rules for Problem Solving using Data
● Algorithm: a finite, well-defined set of rules (steps) for solving a problem using data
Class and Survival: Analyzing Survival Rates Among Different Passenger
Classes
● Analyzing the survival rate of Titanic passengers based on their passenger class
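A grouped survival rate can be sketched as follows. The records below are hypothetical and are NOT the real Titanic data; each record is a (passenger class, survived) pair, and the function name is chosen for illustration.

```python
from collections import defaultdict

# Hypothetical records, not the real dataset: (passenger class, survived flag).
passengers = [
    (1, 1), (1, 1), (1, 0),
    (2, 1), (2, 0), (2, 0),
    (3, 0), (3, 0), (3, 1), (3, 0),
]

def survival_rate_by_class(records):
    """Return {passenger class: share of passengers who survived}."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for pclass, survived in records:
        totals[pclass] += 1
        survivors[pclass] += survived
    return {pclass: survivors[pclass] / totals[pclass] for pclass in sorted(totals)}

rates = survival_rate_by_class(passengers)
for pclass, rate in rates.items():
    print(f"class {pclass}: {rate:.0%} survived")
```

With the real dataset loaded into a pandas DataFrame, the same grouping is usually written as a `groupby` on the class column followed by the mean of the survival column.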