Edition by Bilder - Ch. 1-6, 9781439855676, with Rationales
What is the purpose of data analysis?
- To sample data wherever it's found
- To quantify existing data
- To archive data effectively
- To gather insights from raw data - ANSWER: - To gather insights from raw data
To combine functions, use the ________________:
- parentheses
- slash
- pipe operator
- two commas - ANSWER: - pipe operator
Which of the following is NOT true?
- Data formatting ensures that data is consistent and easily understandable.
- In statistics, coherence is an indication of the quality of the information in a single data set.
- Fully coherent data is consistent and can be reliably combined for analysis.
- In a data set, data is usually collected from a single source and stored in a single format. - ANSWER: -
In a data set, data is usually collected from a single source and stored in a single format.
Which of the following is NOT a reason to perform data normalization?
- Enable a fair comparison between different features
- Make some analyses easier
- Minimize the effects of outliers
- Eliminate outliers - ANSWER: - Eliminate outliers
What are descriptive statistics?
- A shallow analysis of data that tells very little about it
- A way of relating different data sets
- A method for showing some basic features of a data set
- A method for revealing details about a data set - ANSWER: - A method for showing some basic
features of a data set
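Rationale sketch: the basic features that descriptive statistics report (count, center, spread, range) can be computed in a few lines. The sample values below are made up for illustration and do not come from any data set in this guide.

```python
import statistics

# Hypothetical sample: engine sizes in litres (invented for illustration).
values = [1.6, 2.0, 2.0, 2.4, 3.0, 3.5]

# A descriptive summary reports basic features only; it makes no inference.
summary = {
    "count": len(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "min": min(values),
    "max": max(values),
    "stdev": statistics.stdev(values),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```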
What is the purpose of an ANOVA test?
- It helps find correlations between different groups of a categorical variable.
- It helps compare correlating categories in different data sets.
- It determines which variable is most statistically significant.
- It is not a useful test except in certain specific cases. - ANSWER: - It helps find correlations between
different groups of a categorical variable.
Which of the following is NOT true about a model?
- A model cannot predict a value given only one other value.
- Models work by relating one or more independent variables to dependent variables.
- The more data you have, the more accurate your model will be.
- Different types of models may be more accurate in different situations. - ANSWER: - A model cannot
predict a value given only one other value.
A positive correlation is one in which _______________.
- a causative relationship is shown
- both variables move in the same direction
- only one variable moves
- both variables move in opposite directions - ANSWER: - both variables move in the same direction
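Rationale sketch: the sign of the Pearson correlation coefficient shows the direction in which two variables move together. The helper name pearson_r and the study-hours data are hypothetical, chosen only to illustrate the idea; a positive value does not by itself show causation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours studied versus two invented score columns.
hours = [1, 2, 3, 4, 5]
score_up = [52, 55, 61, 64, 70]    # rises as hours rise
score_down = [70, 64, 61, 55, 52]  # falls as hours rise

r_pos = pearson_r(hours, score_up)    # positive: same direction
r_neg = pearson_r(hours, score_down)  # negative: opposite directions
print(r_pos, r_neg)
```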
Which is NOT true for comparing multiple linear regression (MLR) and simple linear regression (SLR)?
Polynomial regression will have a smaller MSE than regular regression.
A lower mean squared error (MSE) always implies a better fit.
R2 will have a smaller MSE.
The MSE for an MLR model will be smaller than the MSE for an SLR model. - ANSWER: A lower mean
squared error (MSE) always implies a better fit.
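Rationale sketch: one way to see why a lower MSE does not always mean a better fit is to compare a simple line with a model that merely memorizes its training points. The data and both models below are invented for illustration; the memorizing model stands in for any overfitted model.

```python
def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical data that roughly follows y = 2x.
train_x = [1, 2, 3, 4]
train_y = [2.1, 3.9, 6.2, 7.8]
new_x = [5, 6]
new_y = [10.1, 11.9]

def line_model(x):
    # Simple model: assumes the slope is exactly 2.
    return 2.0 * x

lookup = dict(zip(train_x, train_y))
fallback = sum(train_y) / len(train_y)

def memorizing_model(x):
    # Overfitted model: returns the stored training value, else the mean.
    return lookup.get(x, fallback)

train_mse_line = mse(train_y, [line_model(x) for x in train_x])
train_mse_memo = mse(train_y, [memorizing_model(x) for x in train_x])
new_mse_line = mse(new_y, [line_model(x) for x in new_x])
new_mse_memo = mse(new_y, [memorizing_model(x) for x in new_x])

print("training MSE:", train_mse_line, train_mse_memo)
print("new-data MSE:", new_mse_line, new_mse_memo)
```

The memorizing model has the lower MSE on the training data but the higher MSE on the unseen points, which is why MSE alone does not settle which model fits better.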
A testing set is _________.
a selected portion of the data set that is known to function well within the model
multiple data sets that have been run on the model
a large portion of a data set that is used to build a sound model
a small portion of a data set that is used to see whether a model works - ANSWER: a small portion of a
data set that is used to see whether a model works
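Rationale sketch: a testing set is typically carved out by shuffling the rows and holding back a small fraction. The function name train_test_split, the 20% fraction, and the seed are assumptions made for this example, not values taken from the guide.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and hold back a small fraction for testing."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The large remainder builds the model; the small part checks it.
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(10))  # hypothetical row identifiers
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), "training rows,", len(test_rows), "testing rows")
```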
Which of the following is NOT true?
Coherence is irrelevant when assessing the quality of a data set.
Data formatting ensures that data is consistent and easily understandable.
Data is collected from different sources and may be stored in different formats.
Fully coherent data is consistent and can be reliably combined for analysis. - ANSWER: Coherence is
irrelevant when assessing the quality of a data set.
What is the F-test score?
It measures level of confidence in results.
It is the variance between sample group means divided by variance within the sample group.
If it is small, it indicates a strong correlation between variable categories and the target variable.
It's a measure of whether variance is statistically significant. - ANSWER: It is the variance between
sample group means divided by variance within the sample group.
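Rationale sketch: the ratio in the answer can be computed by hand as the mean square between groups divided by the mean square within groups (one-way ANOVA). The function name f_statistic and the three small groups are hypothetical.

```python
def f_statistic(groups):
    """One-way ANOVA F: between-group variance over within-group variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of each observation around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical measurements for three categories.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = f_statistic(groups)
print(f_value)
```

A large F means the group means differ by more than the spread inside the groups would explain.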
Which of these is NOT a method for normalizing data?
Simple feature scaling
Compound Y
Z-score
Min-max - ANSWER: Compound Y
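Rationale sketch: the three genuine methods differ only in how they rescale a column. The function names and sample lengths are invented for illustration; the z-score here uses the sample standard deviation, which is an assumption (some texts use the population form).

```python
import statistics

def simple_feature_scaling(values):
    """Divide by the maximum, so the largest value becomes 1."""
    max_value = max(values)
    return [v / max_value for v in values]

def min_max(values):
    """Map the minimum to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Center on the mean and scale by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(v - mean) / sd for v in values]

# Hypothetical column of lengths.
lengths = [150.0, 160.0, 170.0, 200.0]
scaled = simple_feature_scaling(lengths)
ranged = min_max(lengths)
standardized = z_score(lengths)
print(scaled, ranged, standardized, sep="\n")
```

None of these removes outliers; an extreme value is rescaled along with everything else.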
Which of the following is NOT a task facilitated by R?
Model evaluation
Data generation
Model development
Data cleaning - ANSWER: Data generation
Functions contained in packages such as dplyr are used to:
Prevent unwanted operations
Identify users of the data set
Select a data set to use
Perform common operations - ANSWER: Perform common operations
What does a p-value (P-score) measure?
It is the ratio of variance between group means over the variance within each of the sample group
means.
If it is large, it indicates a strong correlation between variable categories and the target variable.
It indicates whether the ANOVA test result is statistically significant.
It indicates the validity of data within a data set - ANSWER: It indicates whether the ANOVA test result
is statistically significant.
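Rationale sketch: statistical software normally reads the p-value for an ANOVA off the F distribution. The example below swaps in a permutation test instead, because it needs only the standard library: it estimates how often randomly reshuffled group labels give an F at least as large as the observed one. The function names, data, permutation count, and seed are all assumptions made for illustration.

```python
import random

def f_statistic(groups):
    """One-way ANOVA F: between-group variance over within-group variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf")  # no spread inside any group
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def permutation_p_value(groups, n_permutations=2000, seed=0):
    """Estimate the p-value of the observed F by shuffling group labels."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical groups: clearly separated versus identical means.
p_separated = permutation_p_value([[1, 2, 3], [2, 3, 4], [9, 10, 11]])
p_same = permutation_p_value([[1, 2, 3], [2, 3, 1], [3, 1, 2]])
print(p_separated, p_same)
```

A small p-value suggests the observed difference between groups is unlikely to come from chance alone; a large one gives no such evidence.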
Data analysis plays an important role in which of the following scenarios? Select 3 answers.
Finding data.
Predicting the future.