ML Important Questions for Placements + Data Science Basics

Pages: 81 · Grade: A · Uploaded on 30-11-2025 · Written in 2025/2026 · Type: Exam (elaborations)
This document provides a complete set of Machine Learning (ML) interview questions and answers, ideal for placements, internships, and data-science-related job interviews. It covers all essential ML topics such as supervised learning, unsupervised learning, regression, classification, clustering, feature engineering, model evaluation metrics, overfitting, regularization, bias–variance tradeoff, gradient descent, SVM, decision trees, ensemble methods, neural networks, and more. All concepts are explained in a simple, clear, and easy-to-remember manner, making it perfect for quick revision. This document is suitable for students preparing for AI, ML, and data science roles, as well as for university viva, exams, and interview preparation. Accurate, well-structured, and interview-focused — this study material helps you confidently answer frequently asked ML questions in real interviews.

Content preview

1. What are Different Kernels in SVM?
In SVM, a kernel is a function that transforms input data into a higher-dimensional space so it can be separated more easily.
• Linear Kernel → Used when data is linearly separable.
• Polynomial Kernel → Works for non-linear data by adding polynomial features.
• RBF (Radial Basis Function / Gaussian) → Most common, works well for complex boundaries.
• Sigmoid Kernel → Similar to neural networks, less used today.
👉 Example: For image classification, RBF is commonly used.
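These kernels can be sketched as plain functions (a toy illustration; gamma, degree, and the constant c are hyperparameters, and the values below are arbitrary):

```python
import math

# Toy kernel functions on two feature vectors x and z (hyperparameters arbitrary).
def linear_kernel(x, z):
    return sum(xi * zi for xi, zi in zip(x, z))          # plain dot product

def poly_kernel(x, z, degree=2, c=1.0):
    return (linear_kernel(x, z) + c) ** degree           # adds polynomial interactions

def rbf_kernel(x, z, gamma=0.5):
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)                    # similarity decays with distance

x, z = [1.0, 2.0], [2.0, 0.0]
print(linear_kernel(x, z))           # 2.0
print(poly_kernel(x, z))             # (2 + 1)^2 = 9.0
print(round(rbf_kernel(x, z), 4))    # exp(-0.5 * 5) ≈ 0.0821
```

In practice a library such as scikit-learn applies these internally via the `kernel=` parameter of `SVC`.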
2. Why was Machine Learning Introduced?
In traditional programming we give rules + data → output. But for complex problems (like speech recognition or image classification), writing the rules manually is impossible.
So Machine Learning was introduced to let computers learn patterns from data automatically instead of being explicitly programmed.
3. Explain the Difference Between Classification and Regression?
• Classification → Predicts categories (discrete outputs).
👉 Example: Spam or Not Spam.
• Regression → Predicts continuous values.
👉 Example: Predicting house prices.
4. What is Bias in Machine Learning?
Bias is the error that comes from using a simplified model that cannot capture the actual patterns in data.
👉 Example: Fitting a straight line to data that follows a curve → high bias → underfitting.
5. What is Cross-Validation?
Cross-validation is a technique to evaluate model performance by splitting data into multiple parts.
• In k-fold cross-validation, the dataset is split into k parts; the model is trained on k−1 parts, tested on the remaining part, and this is repeated k times.
• The average score shows how well the model generalizes.
👉 Prevents overfitting and gives a more reliable accuracy measure.
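The splitting step can be sketched from scratch (a minimal illustration; real projects typically use a library helper such as scikit-learn's KFold):

```python
# k-fold index generator: each of the k folds serves as the test set once.
def kfold_indices(n, k):
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

folds = list(kfold_indices(10, 5))
print(len(folds))    # 5 train/test splits
print(folds[0][1])   # first test fold: [0, 1]
```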
6. What are Support Vectors in SVM?
Support vectors are the data points closest to the decision boundary (hyperplane).
• They are the most important points because they define where the boundary lies.
👉 Removing them would change the boundary.
7. Explain SVM Algorithm in Detail
SVM (Support Vector Machine) is a supervised ML algorithm used for classification and regression.
• It tries to find the best hyperplane that separates classes with the maximum margin.
• Margin = distance between boundary and the nearest data points (support vectors).
• With kernels (RBF, polynomial), SVM can handle non-linear data.
👉 Example: In email spam detection, SVM finds the decision boundary that separates spam from not spam with the widest possible gap.
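As a quick numeric check of the margin definition: for a linear SVM with weight vector w, the margin width is 2 / ||w|| (the w below is hypothetical; a real one comes from training):

```python
import math

# Margin width of a linear SVM = 2 / ||w|| (w here is a made-up example).
w = [3.0, 4.0]
margin = 2 / math.sqrt(sum(wi * wi for wi in w))
print(margin)  # 2 / 5 = 0.4
```

This is why SVM training minimizes ||w||: a smaller weight norm means a wider margin.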


8. What is PCA? When do you use it?
PCA (Principal Component Analysis) is a dimensionality reduction technique.
• It transforms features into fewer “principal components” that capture the maximum variance in the data.
• Used when the dataset has too many features (high dimensionality).
👉 Example: Reducing 100 features in image recognition down to 20 while still keeping most information.
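A minimal from-scratch sketch of PCA (toy data; libraries like scikit-learn's PCA do this plus extras such as whitening):

```python
import numpy as np

# PCA from scratch: center the data, eigendecompose the covariance matrix,
# and project onto the top-k eigenvectors (the principal components).
def pca(X, k):
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]        # largest-variance directions first
    return Xc @ eigvecs[:, order[:k]]

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
Z = pca(X, 1)
print(Z.shape)  # (5, 1): two features reduced to one component
```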
9. What is ‘Naive’ in a Naive Bayes?
Naive Bayes assumes that all features are independent (no correlation).
• This is a “naive” assumption because in real life, features are often related.
• Despite that, it works surprisingly well in practice.
👉 Example: In spam detection, the model assumes “money” and “win” occur independently, but together they often indicate spam.
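Numerically, the independence assumption means the class-conditional likelihood simply multiplies (the per-word probabilities below are made up for illustration):

```python
# Under the naive independence assumption, the likelihood factorizes:
# P("money", "win" | spam) = P("money" | spam) * P("win" | spam).
# The probabilities below are hypothetical, chosen only to show the arithmetic.
p_money_given_spam = 0.4
p_win_given_spam = 0.3
joint = p_money_given_spam * p_win_given_spam
print(round(joint, 2))  # 0.12
```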
10. What is Unsupervised Learning?
Unsupervised learning is when we train models on unlabeled data (no outputs given).
• The model tries to find patterns, groups, or structures.
👉 Examples:
• K-Means clustering → grouping customers.
• PCA → dimensionality reduction.
11. What is Supervised Learning?
Supervised learning is when we train models on labeled data (inputs + correct outputs).
• The model learns the mapping from input to output.
👉 Examples:
• Classification → spam detection.
• Regression → predicting house prices.
12. What are Different Types of Machine Learning algorithms?
• Supervised Learning → Classification (SVM, Decision Trees, Logistic Regression), Regression (Linear Regression).
• Unsupervised Learning → Clustering (K-Means, DBSCAN), Dimensionality Reduction (PCA).
• Reinforcement Learning → Q-learning, Deep Q-Networks, Policy Gradients.
• Semi-Supervised Learning → Mix of labeled + unlabeled data (Label Propagation, FixMatch).
• Self-Supervised Learning → Model generates its own labels (BERT, GPT, SimCLR).
13. What is F1 score? How would you use it?
• F1 Score = Harmonic mean of Precision and Recall.

F1 = 2 × (Precision × Recall) / (Precision + Recall)

• Used when you need a balance between Precision and Recall, especially for imbalanced datasets.
👉 Example: In medical diagnosis, both catching most patients (recall) and avoiding false alarms (precision) matter.
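The formula in one line of code (toy precision/recall values for illustration):

```python
# F1 as the harmonic mean of precision and recall.
def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.8, 0.6), 4))  # 0.6857
```

Note the harmonic mean punishes imbalance: a model with precision 1.0 but recall near 0 still gets an F1 near 0.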


14. Define Precision and Recall?
• Precision → Out of predicted positives, how many are actually positive?
Precision = TP / (TP + FP)
• Recall (Sensitivity / True Positive Rate) → Out of all actual positives, how many were caught?
Recall = TP / (TP + FN)
👉 Example: In spam detection:
• Precision = % of predicted spam that really is spam.
• Recall = % of actual spam emails detected.
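Both metrics can be computed directly from the confusion-matrix counts (toy labels below, with 1 = spam):

```python
# Precision and recall computed from TP, FP, FN counts.
def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fp), tp / (tp + fn)

y_true = [1, 1, 1, 0, 0]   # actual labels (1 = spam)
y_pred = [1, 1, 0, 1, 0]   # model predictions
p, r = precision_recall(y_true, y_pred)
print(round(p, 3), round(r, 3))  # 0.667 0.667 (TP=2, FP=1, FN=1)
```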
15. How to Tackle Overfitting and Underfitting?
• Overfitting solutions: Regularization (L1, L2), Dropout, Cross-validation, Pruning, Early stopping, More data.
• Underfitting solutions: Use a more complex model, Add more features, Train longer.
16. What is a Neural Network?
• A neural network is a set of layers of neurons, inspired by the human brain.
• Each neuron applies weights to its inputs, sums them, applies an activation (like ReLU or Sigmoid), and passes the output forward.
• Stacked layers learn complex patterns.
👉 Example: CNN for image classification, RNN for text.
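The per-neuron computation described above, as a tiny sketch (toy weights and inputs):

```python
# A single neuron: weighted sum of inputs plus bias, then an activation.
def relu(x):
    return max(0.0, x)

def neuron(inputs, weights, bias, activation=relu):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

out = neuron([1.0, 2.0], [0.5, -0.25], 0.1)
print(out)  # 0.5*1 - 0.25*2 + 0.1 = 0.1
```

A layer is just many such neurons sharing the same inputs; stacking layers gives the network its depth.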
17. What are Loss Function and Cost Functions? Explain the Difference.
• Loss function → Error for a single data point.
• Cost function → Average loss across the whole dataset.
👉 Example: MSE for one house = Loss. Mean MSE over 1000 houses = Cost.
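The distinction in code, using squared error (toy house-price values):

```python
# Squared-error loss for one point vs. mean cost over the whole dataset.
def loss(y_true, y_pred):
    return (y_true - y_pred) ** 2

def cost(ys_true, ys_pred):
    return sum(loss(t, p) for t, p in zip(ys_true, ys_pred)) / len(ys_true)

print(loss(3.0, 2.0))                # 1.0 -> loss for one point
print(cost([3.0, 5.0], [2.0, 5.0]))  # 0.5 -> average loss (the cost)
```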
18. What is Ensemble Learning?
• Ensemble = Combining multiple models to improve accuracy.
• Types:
o Bagging (Bootstrap Aggregation, e.g., Random Forest)
o Boosting (XGBoost, AdaBoost, LightGBM)
o Stacking (combining models with a meta-model)
19. How do you decide which Machine Learning Algorithm to use?
• It depends on:
o Type of problem → Classification, Regression, Clustering.
o Size of data → Large data = Neural Nets; small data = Decision Trees/Logistic Regression.
o Accuracy vs Interpretability → Linear models are simple; Neural Nets are complex but powerful.
20. How to Handle Outlier Values?
• Methods:
o Remove them (if data error).
o Cap them (winsorization).
o Transform (log scaling).
o Use robust models (Tree-based models handle outliers well).
21. What is a Random Forest? How does it work?
• Random Forest = Ensemble of many Decision Trees.
• Each tree is trained on a random subset of data + features.
• Final output = majority vote (classification) or average (regression).
👉 Advantage: Reduces overfitting, works well on most problems.
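The aggregation step can be sketched in a few lines (hypothetical votes from five trees; training the trees themselves is omitted):

```python
from collections import Counter

# Random Forest aggregation step: each tree votes, the majority class wins.
def forest_predict(tree_predictions):
    return Counter(tree_predictions).most_common(1)[0][0]

votes = ["spam", "spam", "not spam", "spam", "not spam"]  # 5 hypothetical trees
print(forest_predict(votes))  # spam (3 votes vs 2)
```

For regression the same idea applies with the mean of the trees' outputs instead of a vote.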
22. What is Collaborative Filtering? And Content-Based Filtering?
• Collaborative Filtering → Uses user-item interactions.
👉 Example: “People who liked movie A also liked movie B.”
• Content-Based Filtering → Uses item features.
👉 Example: If you liked “Avengers,” system recommends similar action movies.
23. What is Clustering?
• Clustering = Grouping similar data points without labels.
👉 Example: Customer segmentation in marketing.
Algorithms: K-Means, DBSCAN, Hierarchical Clustering.
24. How can you select K for K-means Clustering?
• Methods:
o Elbow Method → Look at plot of inertia vs K, choose where curve bends.
o Silhouette Score → Measures how well clusters are separated.
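The quantity the elbow method plots is inertia; it can be computed from scratch given candidate centroids (toy points and centroids below):

```python
# Inertia = sum of squared distances from each point to its nearest centroid.
# The elbow method computes this for several K values and picks the bend.
def inertia(points, centroids):
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
        for p in points
    )

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
centroids = [(0.0, 0.5), (10.0, 10.0)]
print(inertia(points, centroids))  # 0.25 + 0.25 + 0.0 = 0.5
```

Inertia always decreases as K grows, which is why you look for the bend rather than the minimum.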
25. What are Recommender Systems?
• Systems that suggest items to users.
• Types:
o Collaborative filtering
o Content-based filtering
o Hybrid systems
👉 Example: Netflix recommending movies, Amazon recommending products.
26. How do you check the Normality of a dataset?
• Visual methods: Histogram, Q-Q plot.
• Statistical tests: Shapiro-Wilk, Kolmogorov-Smirnov test.
• Skewness & kurtosis measures.
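One of those numeric checks, sample skewness, is easy to compute from scratch (a quick indicator only; use a formal test like Shapiro-Wilk for a real decision):

```python
# Sample skewness (third standardized moment): near 0 suggests symmetry.
def skewness(xs):
    n = len(xs)
    mean = sum(xs) / n
    std = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return sum(((x - mean) / std) ** 3 for x in xs) / n

print(round(skewness([1.0, 2.0, 3.0, 4.0, 5.0]), 6))  # ~0 for symmetric data
```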
27. Can Logistic Regression be used for more than 2 classes?
• Yes ✅ using Multinomial Logistic Regression (Softmax Regression) or the One-vs-Rest (OvR) strategy.
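The key ingredient of the multinomial version is the softmax function, sketched here with arbitrary class scores:

```python
import math

# Softmax turns K raw class scores into probabilities that sum to 1,
# which is the core of multinomial (softmax) logistic regression.
def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(round(sum(probs), 6))     # 1.0
print(probs.index(max(probs)))  # 0 -> the class with the highest score wins
```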
28. Explain Correlation and Covariance.
• Correlation → Standardized measure of the relationship between two variables (−1 to +1).
• Covariance → Direction of the relationship, not standardized.
👉 Example: Height & Weight
• Positive correlation = Taller → Heavier.
• Covariance just shows whether they vary together, not the strength.
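Both quantities from scratch, on made-up height/weight data (correlation is just covariance divided by the two standard deviations):

```python
# Population covariance and Pearson correlation, from scratch.
def covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    return covariance(xs, ys) / (covariance(xs, xs) ** 0.5 * covariance(ys, ys) ** 0.5)

heights = [150.0, 160.0, 170.0, 180.0]  # made-up height/weight pairs
weights = [50.0, 60.0, 70.0, 80.0]
print(covariance(heights, weights))             # 125.0 -> they vary together
print(round(correlation(heights, weights), 6))  # 1.0 -> perfectly linear
```

Note the covariance (125.0) depends on the units, while the correlation is unit-free and bounded in [−1, 1].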
29. What is P-value?
• P-value = Probability of observing results as extreme as the current data, assuming the null hypothesis is true.
• Small p-value (< 0.05) → Reject null → the result is statistically significant.
30. What are Parametric and Non-Parametric Models?
• Parametric models → Fixed number of parameters; assume a distribution. (Linear Regression, Logistic Regression, Naive Bayes).
• Non-Parametric models → Flexible; parameters grow with data. (KNN, Decision Trees, Random Forest).
31. What is Reinforcement Learning?
• An agent learns by interacting with an environment, receiving rewards or penalties for its actions, and gradually improves its strategy (policy) to maximize total reward.
👉 Example: training an agent to play games or control a robot.
