Data Science correct answers Extract knowledge from large-scale data and use it for future
purposes such as prediction, decision making, or recommendation
Why is Data Science an important topic these days correct answers 1. New sources of data that
not exist before
2. New capabilities to acquire, store, and process data
3. New algorithms and methods to analyze data
Ingredients of Data Science correct answers Data, ML algorithms, Big Data manipulation
techniques
Machine Learning correct answers learn from past data to predictions on future data
Observation correct answers data sample
Features correct answers attributes that represent an observation
Labels correct answers values assigned to observations
Training Data correct answers past observations given to the ML algorithm for training
Testing/Prediction Data correct answers observations given to a predictive model for prediction
Training Stage correct answers building a predictive model based on the training dataset
Testing Stage correct answers applying the trained model to forecast what is going to happen in
the future
, Supervised Learning correct answers learning from labeled observations
Unsupervised Learning correct answers learning from unlabeled observations
Semi-Supervised Learning correct answers labels are provided only for a part of the training
data. The remaining data is unlabeled
Reinforcement Learning correct answers learning from an agent taking actions in an environment
so as to maximize a long-term reward
Transfer Learning correct answers learning from a dataset while solving a problem, and then
applying the extracted knowledge to a different but related dataset/problem
Active Learning correct answers similar to semi-supervised learning, but the algorithm is able to
interactively query the user or some other information source to obtain the labels as needed
What are the two important approaches to supervised learning correct answers Classification:
predict a discrete valued output for each observation
Regression: predict a continuous valued output for each observation
KNN Classifier correct answers classification algorithm that classifies objects based on the
closest training samples in the feature space
KNN advantages and disadvantages correct answers Advantages: easy and simple to implement,
low computational complexity
Disadvantages: computationally intensive for large-scale problems: inefficient for Big Data,
choosing best k is challenging