Questions and Answers Already Passed
What is the essence of data science?
✔✔It’s about transforming raw data into actionable insights that drive decisions.
How do data scientists deal with messy data?
✔✔By cleaning, transforming, and organizing it so that it is consistent and ready for analysis.
What does the term "feature engineering" mean?
✔✔The process of creating new features or modifying existing ones to improve the model's
performance.
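A minimal sketch of feature engineering, assuming a hypothetical list of order records (the field names `order_date`, `total`, and `items` are illustrative, not from any real dataset). It derives two new features from existing ones:

```python
from datetime import date

# Hypothetical raw records; field names are assumptions for illustration.
orders = [
    {"order_date": date(2024, 3, 2), "total": 120.0, "items": 4},
    {"order_date": date(2024, 3, 4), "total": 45.0, "items": 1},
]

def add_features(row):
    """Return a copy of the row with two derived features added."""
    out = dict(row)
    # New feature: average price per item.
    out["price_per_item"] = row["total"] / row["items"]
    # New feature: weekend flag (weekday() is 5 or 6 for Saturday/Sunday).
    out["is_weekend"] = row["order_date"].weekday() >= 5
    return out

engineered = [add_features(r) for r in orders]
```

Whether such derived features actually help depends on the model and the data, so they are usually checked against a validation set.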
How does a machine learning model "learn"?
✔✔By finding patterns and relationships within the data to make predictions or decisions.
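One common way this "learning" happens is gradient descent. The sketch below is a simplified illustration, assuming toy data that follows y = 2x and a one-parameter model; the function name `fit_slope` and its default settings are hypothetical:

```python
# Toy data following y = 2x; the model must discover the slope from examples.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def fit_slope(xs, ys, lr=0.01, steps=500):
    """Fit y = w * x by repeatedly nudging w to reduce mean squared error."""
    w = 0.0  # start with no knowledge of the relationship
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad  # step against the gradient
    return w

w = fit_slope(xs, ys)  # converges close to 2.0 on this data
```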
What is the purpose of a loss function in machine learning?
✔✔To quantify how far off a model's predictions are from the actual outcomes, guiding
improvements.
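As an illustration, mean squared error is one widely used loss function for regression. This is a sketch with made-up numbers, not the only possible choice of loss:

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

actual = [3.0, 5.0, 7.0]
close_guess = [3.0, 5.0, 8.0]   # off by 1 on one example
poor_guess = [0.0, 0.0, 0.0]    # far off on every example

loss_close = mean_squared_error(actual, close_guess)
loss_poor = mean_squared_error(actual, poor_guess)
```

The closer predictions get a smaller loss, which is the signal a training procedure uses to decide which way to adjust the model.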
How does a neural network mimic the brain?
✔✔It uses layers of interconnected nodes to process information, loosely analogous to how
neurons communicate.
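A rough sketch of the layered structure, assuming a tiny network with two inputs, two hidden nodes, and one output. The weights here are hand-picked hypothetical values, not trained ones:

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One layer: each node takes a weighted sum of all inputs, then applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

# Hypothetical hand-picked weights for illustration only.
hidden_w = [[0.5, -0.5], [1.0, 1.0]]
hidden_b = [0.0, -1.0]
out_w = [[1.0, -1.0]]
out_b = [0.0]

def forward(x):
    """Pass the input through the hidden layer, then the output layer."""
    hidden = dense(x, hidden_w, hidden_b)
    return dense(hidden, out_w, out_b)[0]

score = forward([1.0, 1.0])
```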
What does "scaling" mean in the context of machine learning?
✔✔Adjusting the features so they are on a similar scale, allowing models to learn more
efficiently.
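Two common scaling approaches are min-max scaling and standardization. The sketch below uses only the Python standard library and made-up values for `incomes` and `ages`:

```python
import statistics

def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

incomes = [30000.0, 60000.0, 90000.0]
ages = [20.0, 40.0, 60.0]

scaled_incomes = min_max_scale(incomes)
scaled_ages = min_max_scale(ages)
```

After scaling, both features span the same range, so neither dominates purely because of its units.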
What is a deep learning model?
✔✔A type of neural network with many layers that allows for complex pattern recognition.
Why do data scientists often use random forests?
✔✔Because they combine multiple decision trees to improve accuracy and reduce overfitting.
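The ensemble idea can be sketched with bootstrap sampling and majority voting. Note that this is a deliberate simplification: each "tree" is a one-split stump on a single feature, whereas real random forests grow deeper trees and also randomize which features each split may use. All names and data are hypothetical:

```python
import random
from collections import Counter

def fit_stump(points):
    """Pick the threshold with the fewest errors (predict 1 when x > threshold)."""
    best = None
    for threshold in sorted({x for x, _ in points}):
        errors = sum((x > threshold) != bool(label) for x, label in points)
        if best is None or errors < best[0]:
            best = (errors, threshold)
    return best[1]

def fit_forest(points, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(points) for _ in points]
        forest.append(fit_stump(sample))
    return forest

def predict(forest, x):
    """Majority vote across all stumps."""
    votes = Counter(int(x > threshold) for threshold in forest)
    return votes.most_common(1)[0][0]

data = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
forest = fit_forest(data)
```

Because each stump sees a slightly different sample, individual mistakes tend to be outvoted, which is the intuition behind the reduced overfitting.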
How do you handle missing data in a dataset?
✔✔By imputing, deleting, or flagging the missing data, depending on the nature of the analysis.
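The three options can be sketched on a small made-up column in which `None` marks a missing value. Mean imputation is only one of several possible imputation strategies:

```python
import statistics

ages = [34, None, 29, None, 41]

# Option 1: delete the missing entries.
dropped = [a for a in ages if a is not None]

# Option 2: impute with the mean of the observed values.
mean_age = statistics.fmean(dropped)
imputed = [a if a is not None else mean_age for a in ages]

# Option 3: flag missingness so a model can use it as a signal.
was_missing = [a is None for a in ages]
```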
What is a hyperparameter in machine learning?
✔✔A configuration setting chosen before training, rather than learned from the data, that
controls the model’s learning process, like the number of trees in a random forest.
Why is data visualization important in data science?
✔✔It helps turn complex data into easy-to-understand visuals, revealing trends and insights at a
glance.
What makes clustering algorithms so special?
✔✔They group similar data points together, finding hidden structures without requiring labeled
data.
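As one example of a clustering algorithm, here is a minimal k-means sketch on one-dimensional data. The starting centers are chosen by hand for illustration; practical implementations typically pick them with a randomized strategy:

```python
def kmeans_1d(values, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

values = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
```

No labels were supplied, yet the two groups are recovered from the distances between points alone.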
What is the relationship between bias and variance in machine learning?
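✔✔Bias is error from overly simplistic models that underfit, while variance is error from
overly complex models that are too sensitive to the training data. Reducing one tends to
increase the other, so the goal is a balance between the two.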
What is the "black box" problem in machine learning?
✔✔It refers to models, like deep neural networks, whose decision-making process is difficult to
interpret or understand.
What is the significance of precision and recall in a classification task?
✔✔Precision measures the share of positive predictions that are correct, while recall measures
the share of actual positives the model finds.
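Both metrics come from counting true positives, false positives, and false negatives. A short sketch with made-up binary labels (1 = positive, 0 = negative):

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when there are no positives to count.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
# 2 true positives, 1 false positive, 2 false negatives:
# precision = 2/3, recall = 2/4
```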
How do you know when your model is "good enough"?
✔✔By testing its performance on unseen data and ensuring it generalizes well beyond the
training set.
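A holdout split is one simple way to get unseen data. The sketch below assumes a toy dataset and a hypothetical one-threshold "model"; the helper names are illustrative, not from any library:

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(train):
    """Hypothetical model: midpoint between the two class means of x."""
    zeros = [x for x, y in train if y == 0]
    ones = [x for x, y in train if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def accuracy(threshold, rows):
    """Fraction of rows where the thresholded prediction matches the label."""
    return sum(int(x >= threshold) == y for x, y in rows) / len(rows)

# Toy data: label is 1 when x is at least 10.
rows = [(x, int(x >= 10)) for x in range(20)]
train, test = train_test_split(rows)

threshold = fit_threshold(train)        # learned from training rows only
test_accuracy = accuracy(threshold, test)  # measured on rows the model never saw
```

A large gap between training accuracy and test accuracy is a common sign that the model has not generalized.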
Why is feature selection important?
✔✔It helps eliminate irrelevant or redundant features, which can speed up training and often
improves accuracy by reducing noise.
What’s the difference between batch processing and real-time processing in data science?
✔✔Batch processing handles data in chunks, while real-time processing handles data as it
arrives.
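The contrast can be sketched by computing the same statistic both ways. This is a simplified illustration with made-up sensor readings; the class name `RunningMean` is hypothetical:

```python
def batch_mean(values):
    """Batch style: wait for the whole chunk, then compute once."""
    return sum(values) / len(values)

class RunningMean:
    """Streaming style: update the result as each value arrives."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        # Incremental update: no need to store earlier values.
        self.mean += (value - self.mean) / self.count
        return self.mean

readings = [4.0, 8.0, 6.0, 2.0]

stream = RunningMean()
running = [stream.update(r) for r in readings]  # a result after every reading
final_batch = batch_mean(readings)              # a single result at the end
```

Both approaches end at the same value; the difference is when results become available and how much data must be held in memory.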
What role does the "training set" play in machine learning?
✔✔It’s the data the model uses to learn the patterns and relationships that predict outcomes.