In SVM, a kernel is a function that transforms input data into a higher-dimensional space so it can be separated more easily.
Linear Kernel → Used when data is linearly separable.
Polynomial Kernel → Works for non-linear data by adding polynomial features.
RBF (Radial Basis Function / Gaussian) → Most common, works well for complex boundaries.
Sigmoid Kernel → Behaves like a neural-network activation; rarely used today.
👉 Example: For image classification, RBF is commonly used.
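A minimal sketch of why kernel choice matters, using scikit-learn on a toy concentric-circles dataset (the data and settings here are illustrative, not from the original notes):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

# The RBF kernel implicitly maps points to a space where a separating
# hyperplane exists, so it scores far higher on this data.
print(f"linear: {linear_acc:.2f}, rbf: {rbf_acc:.2f}")
```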
2. Why was Machine Learning Introduced?
Traditional programming means we give rules + data → output. But for complex problems (like speech recognition, image
classification), writing rules manually is impossible.
So, Machine Learning was introduced to let computers learn patterns from data automatically instead of being explicitly
programmed.
3. Explain the Difference Between Classification and Regression?
Classification → Predicts categories (discrete outputs).
👉 Example: Spam or Not Spam.
Regression → Predicts continuous values.
👉 Example: Predicting house prices.
4. What is Bias in Machine Learning?
Bias is the error that comes from using a simplified model that cannot capture the actual patterns in data.
👉 Example: Fitting a straight line to data that follows a curve → high bias → underfitting.
5. What is Cross-Validation?
Cross-validation is a technique to evaluate model performance by splitting data into multiple parts.
In k-fold cross-validation, the dataset is split into k parts; the model is trained on k-1 parts, tested on the remaining part, and this is repeated k times.
The average score shows how well the model generalizes.
👉 Prevents overfitting and gives a more reliable accuracy measure.
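A small 5-fold cross-validation sketch with scikit-learn (iris data and logistic regression chosen here purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold CV: train on 4 folds, test on the held-out fold, repeat 5 times.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

# The mean of the 5 fold scores estimates generalization performance.
print(scores.mean())
```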
6. What are Support Vectors in SVM?
Support vectors are the data points closest to the decision boundary (hyperplane).
They are the most important points because they define where the boundary lies.
👉 Removing them would change the boundary.
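A tiny sketch of support vectors on made-up 2-D points (a large C approximates a hard margin so only the boundary-defining points are kept):

```python
import numpy as np
from sklearn.svm import SVC

# Two small hypothetical clusters.
X = np.array([[0, 0], [1, 1], [1, 0], [3, 3], [4, 4], [4, 3]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Only the points nearest the hyperplane are support vectors;
# the other points could move without changing the boundary.
print(clf.support_vectors_)
```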
7. Explain SVM Algorithm in Detail
SVM (Support Vector Machine) is a supervised ML algorithm used for classification and regression.
It tries to find the best hyperplane that separates classes with the maximum margin.
Margin = distance between boundary and the nearest data points (support vectors).
With kernels (RBF, polynomial), SVM can handle non-linear data.
👉 Example: In email spam detection, SVM finds the decision boundary that separates spam from not spam with the widest
possible gap.
8. What is PCA? When do you use it?
PCA (Principal Component Analysis) is a dimensionality reduction technique.
It transforms features into fewer “principal components” that capture the maximum variance in data.
Used when the dataset has too many features (high dimensionality).
👉 Example: Reducing 100 features in image recognition down to 20 while still keeping most information.
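A short PCA sketch on scikit-learn's digits images (64 pixel features reduced to 20, in the spirit of the example above):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # 64 pixel features per image
pca = PCA(n_components=20).fit(X)
X_reduced = pca.transform(X)

print(X_reduced.shape)                      # 64 columns reduced to 20
print(pca.explained_variance_ratio_.sum())  # fraction of variance kept
```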
9. What is ‘Naive’ in Naive Bayes?
Naive Bayes assumes that all features are independent (no correlation).
This is a “naive” assumption because in real life, features are often related.
Despite that, it works surprisingly well in practice.
👉 Example: In spam detection, the model assumes “money” and “win” occur independently, but together they often
indicate spam.
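A toy Naive Bayes spam sketch on four made-up emails (the texts and labels are hypothetical):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win money now", "meeting at noon", "win a free prize", "lunch tomorrow"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

vec = CountVectorizer().fit(emails)
# Each word's count is treated as independent of the others given the class.
clf = MultinomialNB().fit(vec.transform(emails), labels)

pred = clf.predict(vec.transform(["free money win"]))[0]  # classified as spam
```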
10. What is Unsupervised Learning?
Unsupervised learning is when we train models on unlabeled data (no outputs given).
The model tries to find patterns, groups, or structures.
👉 Examples:
K-Means clustering → grouping customers.
PCA → dimensionality reduction.
11. What is Supervised Learning?
Supervised learning is when we train models on labeled data (inputs + correct outputs).
The model learns the mapping from input to output.
👉 Examples:
Classification → spam detection.
Regression → predicting house prices.
12. What are Different Types of Machine Learning algorithms?
Supervised Learning → Classification (SVM, Decision Trees, Logistic Regression), Regression (Linear Regression).
Unsupervised Learning → Clustering (K-Means, DBSCAN), Dimensionality Reduction (PCA).
Reinforcement Learning → Q-learning, Deep Q-Networks, Policy Gradients.
Semi-Supervised Learning → Mix of labeled + unlabeled data (Label Propagation, FixMatch).
Self-Supervised Learning → Model generates its own labels (BERT, GPT, SimCLR).
13. What is F1 score? How would you use it?
F1 Score = Harmonic mean of Precision and Recall.
F1=2⋅Precision⋅RecallPrecision+RecallF1 = 2 \cdot \frac{Precision \cdot Recall}{Precision +
Recall}F1=2⋅Precision+RecallPrecision⋅Recall
Used when you need a balance between Precision and Recall, especially for imbalanced datasets.
👉 Example: In medical diagnosis, both catching most patients (recall) and avoiding false alarms (precision) matter.
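A quick check of the formula with scikit-learn on made-up labels:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]   # 3 TP, 1 FP, 1 FN

p = precision_score(y_true, y_pred)   # 3 / (3 + 1) = 0.75
r = recall_score(y_true, y_pred)      # 3 / (3 + 1) = 0.75
f1 = f1_score(y_true, y_pred)         # harmonic mean of p and r
```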
14. Define Precision and Recall?
Precision → Out of predicted positives, how many are actually positive?
Precision = TP / (TP + FP)
Recall (Sensitivity / True Positive Rate) → Out of all actual positives, how many were caught?
Recall = TP / (TP + FN)
👉 Example: In spam detection:
Precision = % of predicted spam that really is spam.
Recall = % of actual spam emails detected.
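The two formulas worked through on hypothetical spam-filter counts:

```python
# Made-up counts: 80 true positives, 20 false positives, 40 false negatives.
tp, fp, fn = 80, 20, 40

precision = tp / (tp + fp)  # 80/100 = 0.8 of flagged mail really is spam
recall = tp / (tp + fn)     # 80/120 ≈ 0.67 of actual spam was caught
```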
15. How to Tackle Overfitting and Underfitting?
Overfitting solutions: Regularization (L1, L2), Dropout, Cross-validation, Pruning, Early stopping, More data.
Underfitting solutions: Use more complex model, Add more features, Train longer.
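A minimal sketch of one overfitting remedy, L2 regularization, on synthetic data (all values here are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
y = X[:, 0] + 0.1 * rng.normal(size=30)   # only feature 0 truly matters

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)       # L2 penalty shrinks the weights

# The penalized model has smaller coefficients, reducing variance.
print(np.linalg.norm(ols.coef_), np.linalg.norm(ridge.coef_))
```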
16. What is a Neural Network?
A neural network is a set of layers (neurons) inspired by the human brain.
Each neuron applies weights to inputs, sums them, applies activation (like ReLU or Sigmoid), and passes output forward.
Stacked layers learn complex patterns.
👉 Example: CNN for image classification, RNN for text.
17. What are Loss Function and Cost Functions? Explain the Difference.
Loss function → Error for a single data point.
Cost function → Average loss across the whole dataset.
👉 Example: MSE for one house = Loss. Mean MSE over 1000 houses = Cost.
18. What is Ensemble Learning?
Ensemble = Combining multiple models to improve accuracy.
Types:
o Bagging (Bootstrap Aggregation, e.g., Random Forest)
o Boosting (XGBoost, AdaBoost, LightGBM)
o Stacking (combining models with a meta-model)
19. How do you decide which Machine Learning Algorithm to use?
Depends on:
o Type of problem → Classification, Regression, Clustering.
o Size of data → Large data = Neural Nets, small data = Decision Trees/Logistic Regression.
o Accuracy vs Interpretability → Linear models are simple; Neural Nets are complex but powerful.
20. How to Handle Outlier Values?
Methods:
o Remove them (if data error).
o Cap them (winsorization).
o Transform (log scaling).
o Use robust models (Tree-based models handle outliers well).
21. What is a Random Forest? How does it work?
Random Forest = Ensemble of many Decision Trees.
Each tree is trained on a random subset of data + features.
Final output = majority vote (classification) or average (regression).
👉 Advantage: Reduces overfitting, works well on most problems.
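A small comparison sketch of a single tree vs a forest on synthetic data (dataset parameters are arbitrary; on most such problems the ensemble's average beats any one tree):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

tree_acc = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
forest_acc = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0), X, y, cv=5
).mean()

# Voting across 100 trees trained on random subsets reduces variance.
print(tree_acc, forest_acc)
```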
22. What is Collaborative Filtering? And Content-Based Filtering?
Collaborative Filtering → Uses user-item interactions.
👉 Example: “People who liked movie A also liked movie B.”
Content-Based Filtering → Uses item features.
👉 Example: If you liked “Avengers,” system recommends similar action movies.
23. What is Clustering?
Clustering = Grouping similar data points without labels.
👉 Example: Customer segmentation in marketing.
Algorithms: K-Means, DBSCAN, Hierarchical Clustering.
24. How can you select K for K-means Clustering?
Methods:
o Elbow Method → Look at plot of inertia vs K, choose where curve bends.
o Silhouette Score → Measures how well clusters are separated.
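A sketch of the silhouette method on synthetic blobs (the three cluster centers are made up so the "right" K is known to be 3):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three well-separated synthetic blobs.
X, _ = make_blobs(n_samples=300, centers=[[-5, -5], [0, 5], [5, -5]],
                  cluster_std=1.0, random_state=0)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)   # higher = better separation

best_k = max(scores, key=scores.get)
print(best_k)
```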
25. What are Recommender Systems?
Systems that suggest items to users.
Types:
o Collaborative filtering
o Content-based filtering
o Hybrid systems
👉 Example: Netflix recommending movies, Amazon recommending products.
26. How do you check the Normality of a dataset?
Visual methods: Histogram, Q-Q plot.
Statistical tests: Shapiro-Wilk, Kolmogorov-Smirnov test.
Skewness & Kurtosis measures.
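A quick Shapiro-Wilk sketch with SciPy on synthetic samples (one normal, one deliberately skewed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_data = rng.normal(size=500)
skewed_data = rng.exponential(size=500)   # clearly non-normal

p_normal = stats.shapiro(normal_data).pvalue
p_skewed = stats.shapiro(skewed_data).pvalue

# A tiny p-value is strong evidence against normality.
print(p_normal, p_skewed)
```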
27. Can Logistic Regression be used for more than 2 classes?
Yes ✅ using Multinomial Logistic Regression (Softmax Regression) or One-vs-Rest (OvR) strategy.
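A short sketch on the 3-class iris dataset (scikit-learn's default solver handles the multinomial case with softmax):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)   # 3 flower classes
clf = LogisticRegression(max_iter=1000).fit(X, y)

print(clf.classes_)      # all three classes handled in one model
print(clf.score(X, y))
```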
28. Explain Correlation and Covariance.
Correlation → Standardized measure of relationship between two variables (-1 to +1).
Covariance → Direction of relationship, not standardized.
👉 Example: Height & Weight
Positive correlation = Taller → Heavier.
Covariance only shows whether they vary together, not the strength of the relationship.
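The height/weight example with made-up measurements:

```python
import numpy as np

height = np.array([150.0, 160.0, 170.0, 180.0, 190.0])  # cm (hypothetical)
weight = np.array([50.0, 58.0, 65.0, 74.0, 80.0])       # kg (hypothetical)

cov = np.cov(height, weight)[0, 1]        # positive: they vary together
corr = np.corrcoef(height, weight)[0, 1]  # near +1: strong linear relation
print(cov, corr)
```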
29. What is P-value?
P-value = Probability of observing results as extreme as current data, assuming the null hypothesis is true.
Small p-value (< 0.05) → Reject null → result is statistically significant.
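A sketch with SciPy's exact binomial test (the coin-flip numbers are made up):

```python
from scipy import stats

# 60 heads in 100 flips of a supposedly fair coin (null: p = 0.5).
result = stats.binomtest(60, n=100, p=0.5)

# The p-value lands just above 0.05, so we fail to reject fairness.
print(result.pvalue)
```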
30. What are Parametric and Non-Parametric Models?
Parametric models → Fixed number of parameters. Assume distribution. (Linear Regression, Logistic Regression, Naive
Bayes).
Non-Parametric models → Flexible, parameters grow with data. (KNN, Decision Trees, Random Forest).
31. What is Reinforcement Learning?
Agent learns by interacting with environment, receiving rewards/penalties.