Badm 211 HWK questions
Display rows of data – answer credit_df.head(17)
Choose the answer that best describes the display you have created:
A) There are six numeric predictors, four categorical predictors, and a numeric response
variable.B) There are six numeric predictors, five categorical predictors, and a numeric
response variable.C) There are two predictors measured in currency, four predictors
measured in integer values, and four categorical predictors.D) There are three
predictors measured in currency, three predictors measured in integer values, and four
categorical predictors. - answerC) There are two predictors measured in currency, four
predictors measured in integer values, and four categorical predictors
description of the dataset showing the number of samples, mean, standard deviation,
minimum, quartiles, and maximum values for all attributes – answer credit_df.describe()
onvert all categorical variables to their dummy-coded values in the same dataframe
using the all-inclusive k method. Use underscore as a prefix separator. -
answercredit_df = pd.get_dummies(credit_df, prefix_sep='_')
print(credit_df.columns)
Catergorical Data v. numeric - answerdummies will show as catergory_subtype for
catergorical data
credit_df.plot.scatter(x='Income',y="Balance",)
plt.legend() - answerSCATTER PLOT
dataForPlot = credit_df.groupby('Gender_Male').mean().Balance.plot(kind='bar',
figsize=[5,3])
dataForPlot.set_ylabel('Avg. Balance') - answerBAR CHART
ax = credit_df.Balance.hist()
ax.set_xlabel('Balance')
ax.set_ylabel('count')
plt.show() - answerHISTOGRAM
ax = credit_df.boxplot(column="Balance", by="Education")
ax.set_ylabel('Balance')
plt.suptitle('')
plt.title('')
plt.show() - answerBOXPLOT
corr = credit_df.corr()
, sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns) -
answerHEATMAP
The highest bivariate correlations are between binary categorical variables. -
answerTRUE
corr = credit_df.corr()
plt.show() - answerCORRELATION CHART
pd.pivot_table(credit_df, values='Balance', index=['Education_bin'],
columns=['Gender_Male'], aggfunc=np.mean, margins=False) - answerPIVOT TABLE
When creating a scatterplot between two variables with many overlapping points,
__________ can be used to make the plot more interpretable. - answerjitter
Adding features to a multivariate model may result in ________________. - answerthe
curse of dimensionality
A popular aggregation technique for time-series data is a _____________. -
answermoving average
Gertrude from the Economics section has requested that you create predictive model to
estimate which of four levels of income a potential customer will have. She has provided
you with 11 variables, so you will need _________ samples to achieve acceptable
accuracy. - answer264
Which of the following is an example of prediction? - answerforecasting sales
A categorical variable may be best defined as a ______________. - answerpredictor
variable
The BostonHousing dataset has two dependent variables, MEDV and CAT_MEDV. The
first is a numerical measure of housing values while the second is a binomial
classification with the class boundary set at $30,000. You wish to use linear regression
for predicting MEDV and logistic regression for predicting CAT_MEDV, but you must
encode the categorical variables with one less choice for each category because of
________________. - answermulticollinearity
If you have 20 predictors and 2 classes, then you'll need a minimum of _________
cases. - answer240
he use of color in a heatmap makes it easy to identify _______________ - answerhigh
and low correlations
As variables are added to a multivariate model, the data space becomes _________ -
answersparse
Display rows of data – answer credit_df.head(17)
Choose the answer that best describes the display you have created:
A) There are six numeric predictors, four categorical predictors, and a numeric response
variable.B) There are six numeric predictors, five categorical predictors, and a numeric
response variable.C) There are two predictors measured in currency, four predictors
measured in integer values, and four categorical predictors.D) There are three
predictors measured in currency, three predictors measured in integer values, and four
categorical predictors. - answerC) There are two predictors measured in currency, four
predictors measured in integer values, and four categorical predictors
description of the dataset showing the number of samples, mean, standard deviation,
minimum, quartiles, and maximum values for all attributes – answer credit_df.describe()
onvert all categorical variables to their dummy-coded values in the same dataframe
using the all-inclusive k method. Use underscore as a prefix separator. -
answercredit_df = pd.get_dummies(credit_df, prefix_sep='_')
print(credit_df.columns)
Catergorical Data v. numeric - answerdummies will show as catergory_subtype for
catergorical data
credit_df.plot.scatter(x='Income',y="Balance",)
plt.legend() - answerSCATTER PLOT
dataForPlot = credit_df.groupby('Gender_Male').mean().Balance.plot(kind='bar',
figsize=[5,3])
dataForPlot.set_ylabel('Avg. Balance') - answerBAR CHART
ax = credit_df.Balance.hist()
ax.set_xlabel('Balance')
ax.set_ylabel('count')
plt.show() - answerHISTOGRAM
ax = credit_df.boxplot(column="Balance", by="Education")
ax.set_ylabel('Balance')
plt.suptitle('')
plt.title('')
plt.show() - answerBOXPLOT
corr = credit_df.corr()
, sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns) -
answerHEATMAP
The highest bivariate correlations are between binary categorical variables. -
answerTRUE
corr = credit_df.corr()
plt.show() - answerCORRELATION CHART
pd.pivot_table(credit_df, values='Balance', index=['Education_bin'],
columns=['Gender_Male'], aggfunc=np.mean, margins=False) - answerPIVOT TABLE
When creating a scatterplot between two variables with many overlapping points,
__________ can be used to make the plot more interpretable. - answerjitter
Adding features to a multivariate model may result in ________________. - answerthe
curse of dimensionality
A popular aggregation technique for time-series data is a _____________. -
answermoving average
Gertrude from the Economics section has requested that you create predictive model to
estimate which of four levels of income a potential customer will have. She has provided
you with 11 variables, so you will need _________ samples to achieve acceptable
accuracy. - answer264
Which of the following is an example of prediction? - answerforecasting sales
A categorical variable may be best defined as a ______________. - answerpredictor
variable
The BostonHousing dataset has two dependent variables, MEDV and CAT_MEDV. The
first is a numerical measure of housing values while the second is a binomial
classification with the class boundary set at $30,000. You wish to use linear regression
for predicting MEDV and logistic regression for predicting CAT_MEDV, but you must
encode the categorical variables with one less choice for each category because of
________________. - answermulticollinearity
If you have 20 predictors and 2 classes, then you'll need a minimum of _________
cases. - answer240
he use of color in a heatmap makes it easy to identify _______________ - answerhigh
and low correlations
As variables are added to a multivariate model, the data space becomes _________ -
answersparse