In SVM, a kernel is a function that transforms input data into a higher-dimensional space so it can be separated more easily.
Linear Kernel → Used when data is linearly separable.
Polynomial Kernel → Works for non-linear data by adding polynomial features.
RBF (Radial Basis Function / Gaussian) → Most common, works well for complex boundaries.
Sigmoid Kernel → Behaves like a neural-network activation; rarely used today.
👉 Example: For image classification, RBF is commonly used.
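A minimal sketch of why kernel choice matters, using scikit-learn on a toy concentric-circles dataset (the data and settings here are illustrative, not from the original notes):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

# The RBF kernel implicitly maps points to a space where a separating
# hyperplane exists, so it scores far higher on this data.
print(f"linear: {linear_acc:.2f}, rbf: {rbf_acc:.2f}")
```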
2. Why was Machine Learning Introduced?
Traditional programming means we give rules + data → output. But for complex problems (like speech recognition, image
classification), writing rules manually is impossible.
So, Machine Learning was introduced to let computers learn patterns from data automatically instead of being explicitly
programmed.
3. Explain the Difference Between Classification and Regression?
Classification → Predicts categories (discrete outputs).
👉 Example: Spam or Not Spam.
Regression → Predicts continuous values.
👉 Example: Predicting house prices.
4. What is Bias in Machine Learning?
Bias is the error that comes from using a simplified model that cannot capture the actual patterns in data.
👉 Example: Fitting a straight line to data that follows a curve → high bias → underfitting.
5. What is Cross-Validation?
Cross-validation is a technique to evaluate model performance by splitting data into multiple parts.
In k-fold cross-validation, the dataset is split into k parts; the model is trained on k-1 parts, tested on the remaining part, and this is repeated k times.
The average score shows how well the model generalizes.
👉 Prevents overfitting and gives a more reliable accuracy measure.
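A small 5-fold cross-validation sketch with scikit-learn (iris data and logistic regression chosen here purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold CV: train on 4 folds, test on the held-out fold, repeat 5 times.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

# The mean of the 5 fold scores estimates generalization performance.
print(scores.mean())
```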
6. What are Support Vectors in SVM?
Support vectors are the data points closest to the decision boundary (hyperplane).
They are the most important points because they define where the boundary lies.
👉 Removing them would change the boundary.
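A tiny sketch of support vectors on made-up 2-D points (a large C approximates a hard margin so only the boundary-defining points are kept):

```python
import numpy as np
from sklearn.svm import SVC

# Two small hypothetical clusters.
X = np.array([[0, 0], [1, 1], [1, 0], [3, 3], [4, 4], [4, 3]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Only the points nearest the hyperplane are support vectors;
# the other points could move without changing the boundary.
print(clf.support_vectors_)
```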
7. Explain SVM Algorithm in Detail
SVM (Support Vector Machine) is a supervised ML algorithm used for classification and regression.
It tries to find the best hyperplane that separates classes with the maximum margin.
Margin = distance between boundary and the nearest data points (support vectors).
With kernels (RBF, polynomial), SVM can handle non-linear data.
👉 Example: In email spam detection, SVM finds the decision boundary that separates spam from not spam with the widest
possible gap.
8. What is PCA? When do you use it?
PCA (Principal Component Analysis) is a dimensionality reduction technique.
It transforms features into fewer “principal components” that capture the maximum variance in data.
Used when the dataset has too many features (high dimensionality).
👉 Example: Reducing 100 features in image recognition down to 20 while still keeping most information.
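A short PCA sketch on scikit-learn's digits images (64 pixel features reduced to 20, in the spirit of the example above):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # 64 pixel features per image
pca = PCA(n_components=20).fit(X)
X_reduced = pca.transform(X)

print(X_reduced.shape)                      # 64 columns reduced to 20
print(pca.explained_variance_ratio_.sum())  # fraction of variance kept
```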
9. What is ‘Naive’ in Naive Bayes?
Naive Bayes assumes that all features are independent (no correlation).
This is a “naive” assumption because in real life, features are often related.
Despite that, it works surprisingly well in practice.
👉 Example: In spam detection, the model assumes “money” and “win” occur independently, but together they often
indicate spam.
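A toy Naive Bayes spam sketch on four made-up emails (the texts and labels are hypothetical):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win money now", "meeting at noon", "win a free prize", "lunch tomorrow"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

vec = CountVectorizer().fit(emails)
# Each word's count is treated as independent of the others given the class.
clf = MultinomialNB().fit(vec.transform(emails), labels)

pred = clf.predict(vec.transform(["free money win"]))[0]  # classified as spam
```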
10. What is Unsupervised Learning?
Unsupervised learning is when we train models on unlabeled data (no outputs given).
The model tries to find patterns, groups, or structures.
👉 Examples:
K-Means clustering → grouping customers.
PCA → dimensionality reduction.
11. What is Supervised Learning?
Supervised learning is when we train models on labeled data (inputs + correct outputs).
The model learns the mapping from input to output.
👉 Examples:
Classification → spam detection.
Regression → predicting house prices.
12. What are Different Types of Machine Learning algorithms?
Supervised Learning → Classification (SVM, Decision Trees, Logistic Regression), Regression (Linear Regression).
Unsupervised Learning → Clustering (K-Means, DBSCAN), Dimensionality Reduction (PCA).
Reinforcement Learning → Q-learning, Deep Q-Networks, Policy Gradients.
Semi-Supervised Learning → Mix of labeled + unlabeled data (Label Propagation, FixMatch).
Self-Supervised Learning → Model generates its own labels (BERT, GPT, SimCLR).
13. What is F1 score? How would you use it?
F1 Score = Harmonic mean of Precision and Recall.
F1=2⋅Precision⋅RecallPrecision+RecallF1 = 2 \cdot \frac{Precision \cdot Recall}{Precision +
Recall}F1=2⋅Precision+RecallPrecision⋅Recall
Used when you need a balance between Precision and Recall, especially for imbalanced datasets.
👉 Example: In medical diagnosis, both catching most patients (recall) and avoiding false alarms (precision) matter.
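A quick check of the formula with scikit-learn on made-up labels:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]   # 3 TP, 1 FP, 1 FN

p = precision_score(y_true, y_pred)   # 3 / (3 + 1) = 0.75
r = recall_score(y_true, y_pred)      # 3 / (3 + 1) = 0.75
f1 = f1_score(y_true, y_pred)         # harmonic mean of p and r
```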
14. Define Precision and Recall?
Precision → Out of predicted positives, how many are actually positive?
Precision = TP / (TP + FP)
Recall (Sensitivity / True Positive Rate) → Out of all actual positives, how many were caught?
Recall = TP / (TP + FN)
👉 Example: In spam detection:
Precision = % of predicted spam that really is spam.
Recall = % of actual spam emails detected.
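The two formulas worked through on hypothetical spam-filter counts:

```python
# Made-up counts: 80 true positives, 20 false positives, 40 false negatives.
tp, fp, fn = 80, 20, 40

precision = tp / (tp + fp)  # 80/100 = 0.8 of flagged mail really is spam
recall = tp / (tp + fn)     # 80/120 ≈ 0.67 of actual spam was caught
```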
15. How to Tackle Overfitting and Underfitting?
Overfitting solutions: Regularization (L1, L2), Dropout, Cross-validation, Pruning, Early stopping, More data.
Underfitting solutions: Use more complex model, Add more features, Train longer.
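A minimal sketch of one overfitting remedy, L2 regularization, on synthetic data (all values here are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
y = X[:, 0] + 0.1 * rng.normal(size=30)   # only feature 0 truly matters

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)       # L2 penalty shrinks the weights

# The penalized model has smaller coefficients, reducing variance.
print(np.linalg.norm(ols.coef_), np.linalg.norm(ridge.coef_))
```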
16. What is a Neural Network?
A neural network is a set of layers (neurons) inspired by the human brain.
Each neuron applies weights to inputs, sums them, applies activation (like ReLU or Sigmoid), and passes output forward.
Stacked layers learn complex patterns.
👉 Example: CNN for image classification, RNN for text.
17. What are Loss Function and Cost Functions? Explain the Difference.
Loss function → Error for a single data point.
Cost function → Average loss across the whole dataset.
👉 Example: MSE for one house = Loss. Mean MSE over 1000 houses = Cost.
18. What is Ensemble Learning?
Ensemble = Combining multiple models to improve accuracy.
Types:
o Bagging (Bootstrap Aggregation, e.g., Random Forest)
o Boosting (XGBoost, AdaBoost, LightGBM)
o Stacking (combining models with a meta-model)
19. How do you decide which Machine Learning Algorithm to use?
Depends on:
o Type of problem → Classification, Regression, Clustering.
o Size of data → Large data = Neural Nets, small data = Decision Trees/Logistic Regression.
o Accuracy vs Interpretability → Linear models are simple; Neural Nets are complex but powerful.
20. How to Handle Outlier Values?
Methods:
o Remove them (if data error).
o Cap them (winsorization).
o Transform (log scaling).
o Use robust models (Tree-based models handle outliers well).
21. What is a Random Forest? How does it work?
Random Forest = Ensemble of many Decision Trees.
Each tree is trained on a random subset of data + features.
Final output = majority vote (classification) or average (regression).
👉 Advantage: Reduces overfitting, works well on most problems.
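A small comparison sketch of a single tree vs a forest on synthetic data (dataset parameters are arbitrary; on most such problems the ensemble's average beats any one tree):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

tree_acc = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
forest_acc = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0), X, y, cv=5
).mean()

# Voting across 100 trees trained on random subsets reduces variance.
print(tree_acc, forest_acc)
```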
22. What is Collaborative Filtering? And Content-Based Filtering?
Collaborative Filtering → Uses user-item interactions.
👉 Example: “People who liked movie A also liked movie B.”
Content-Based Filtering → Uses item features.
👉 Example: If you liked “Avengers,” system recommends similar action movies.
23. What is Clustering?
Clustering = Grouping similar data points without labels.
👉 Example: Customer segmentation in marketing.
Algorithms: K-Means, DBSCAN, Hierarchical Clustering.
24. How can you select K for K-means Clustering?
Methods:
o Elbow Method → Look at plot of inertia vs K, choose where curve bends.
o Silhouette Score → Measures how well clusters are separated.
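A sketch of the silhouette method on synthetic blobs (the three cluster centers are made up so the "right" K is known to be 3):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three well-separated synthetic blobs.
X, _ = make_blobs(n_samples=300, centers=[[-5, -5], [0, 5], [5, -5]],
                  cluster_std=1.0, random_state=0)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)   # higher = better separation

best_k = max(scores, key=scores.get)
print(best_k)
```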
25. What are Recommender Systems?
Systems that suggest items to users.
Types:
o Collaborative filtering
o Content-based filtering
o Hybrid systems
👉 Example: Netflix recommending movies, Amazon recommending products.
26. How do you check the Normality of a dataset?
Visual methods: Histogram, Q-Q plot.
Statistical tests: Shapiro-Wilk, Kolmogorov-Smirnov test.
Skewness & Kurtosis measures.
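A quick Shapiro-Wilk sketch with SciPy on synthetic samples (one normal, one deliberately skewed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_data = rng.normal(size=500)
skewed_data = rng.exponential(size=500)   # clearly non-normal

p_normal = stats.shapiro(normal_data).pvalue
p_skewed = stats.shapiro(skewed_data).pvalue

# A tiny p-value is strong evidence against normality.
print(p_normal, p_skewed)
```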
27. Can Logistic Regression be used for more than 2 classes?
Yes ✅ using Multinomial Logistic Regression (Softmax Regression) or One-vs-Rest (OvR) strategy.
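A short sketch on the 3-class iris dataset (scikit-learn's default solver handles the multinomial case with softmax):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)   # 3 flower classes
clf = LogisticRegression(max_iter=1000).fit(X, y)

print(clf.classes_)      # all three classes handled in one model
print(clf.score(X, y))
```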
28. Explain Correlation and Covariance.
Correlation → Standardized measure of relationship between two variables (-1 to +1).
Covariance → Direction of relationship, not standardized.
👉 Example: Height & Weight
Positive correlation = Taller → Heavier.
Covariance only shows whether they vary together, not the strength of the relationship.
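The height/weight example with made-up measurements:

```python
import numpy as np

height = np.array([150.0, 160.0, 170.0, 180.0, 190.0])  # cm (hypothetical)
weight = np.array([50.0, 58.0, 65.0, 74.0, 80.0])       # kg (hypothetical)

cov = np.cov(height, weight)[0, 1]        # positive: they vary together
corr = np.corrcoef(height, weight)[0, 1]  # near +1: strong linear relation
print(cov, corr)
```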
29. What is P-value?
P-value = Probability of observing results as extreme as current data, assuming the null hypothesis is true.
Small p-value (< 0.05) → Reject null → result is statistically significant.
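A sketch with SciPy's exact binomial test (the coin-flip numbers are made up):

```python
from scipy import stats

# 60 heads in 100 flips of a supposedly fair coin (null: p = 0.5).
result = stats.binomtest(60, n=100, p=0.5)

# The p-value lands just above 0.05, so we fail to reject fairness.
print(result.pvalue)
```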
30. What are Parametric and Non-Parametric Models?
Parametric models → Fixed number of parameters. Assume distribution. (Linear Regression, Logistic Regression, Naive
Bayes).
Non-Parametric models → Flexible, parameters grow with data. (KNN, Decision Trees, Random Forest).
31. What is Reinforcement Learning?
Agent learns by interacting with environment, receiving rewards/penalties.