Missing values can be imputed (replaced with other values). If my dataset has 1000
rows and 200 missing values for the variable age, what could I impute for age? (This
question is not asking which of the values you SHOULD use, just what you COULD use.) -
correct answer Impute the mean
Impute the most common value
Impute the median
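As a minimal sketch of the three options above, assuming pandas and a small made-up age column (the name df_age and its values are hypothetical, not the course data), each imputation is a single fillna call:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a dataset with missing ages.
df_age = pd.DataFrame({"age": [22.0, 35.0, np.nan, 35.0, 58.0, np.nan]})

# Option 1: replace missing ages with the mean of the observed ages.
mean_filled = df_age["age"].fillna(df_age["age"].mean())

# Option 2: replace missing ages with the median of the observed ages.
median_filled = df_age["age"].fillna(df_age["age"].median())

# Option 3: replace missing ages with the most common value (the mode).
# mode() returns a Series because ties are possible; [0] takes the first one.
mode_filled = df_age["age"].fillna(df_age["age"].mode()[0])

print(mean_filled.tolist())
print(median_filled.tolist())
print(mode_filled.tolist())
```

Each result is a new Series; the original column is left unchanged, so the three options can be compared side by side before choosing one.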
Assume all packages needed to make the code run successfully are loaded.
Assume the test data ('test.csv') is loaded into your environment and named 'df', with
something like
df = pd.read_csv("../data/test.csv")
What would produce the first 7 rows from a dataframe? - correct answer df.head(7)
Or, if the problem shows output rather than a row count, pick df.head(n) where n is ONE plus the highest row number shown, since row numbering starts at 0
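A small sketch of how head behaves, assuming pandas and a made-up ten-row dataframe in place of test.csv (the column name value is hypothetical). With the default index, row labels start at 0, so the first 7 rows are labeled 0 through 6:

```python
import pandas as pd

# Hypothetical stand-in for the dataframe loaded from test.csv.
df = pd.DataFrame({"value": range(10)})

# head(n) returns the first n rows; with no argument it returns 5.
first_seven = df.head(7)

print(first_seven)
print(len(first_seven))        # number of rows returned
print(first_seven.index[-1])   # label of the last row returned
```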
Price is a variable we are interested in building a model on (later, once we've learned
that stuff), which makes missing values and outliers particularly important to address. If
price has an outlier value that is really, really extreme, what should we do with it?
(The choices I am offering you below are very narrow. There is obviously more we
could do, but given what you see in the dataset and what I have said before about this
issue, what would you do?) - correct answer Delete those rows
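Deleting the rows could look something like the sketch below. It assumes pandas, a made-up price column, and an arbitrary cutoff chosen only for illustration; the right cutoff depends on what you actually see in the data.

```python
import pandas as pd

# Hypothetical prices with one extreme outlier.
df_price = pd.DataFrame({"price": [250000, 310000, 275000, 99000000, 298000]})

# Assumed cutoff for illustration only; it is not taken from the course material.
upper_limit = 10_000_000

# Keep only the rows at or below the cutoff, which drops the outlier rows.
cleaned = df_price[df_price["price"] <= upper_limit].reset_index(drop=True)

print(len(df_price), "rows before,", len(cleaned), "rows after")
```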
Let's take a look at the homes dataset.
What is the mean Acreage of the homes?
How many records (just the count of rows) does the dataset have?
What is the mean totalheatedsqft in this dataset?
What is the mean totalbedrooms?
What is the mean Total Bathrooms? - correct answer .267
7478
2708
2.9
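Answers like these come from a row count and a few column means. Here is a rough sketch, assuming pandas and a tiny made-up dataframe (the name homes, the values, and the exact column spellings are hypothetical and may differ from the real file):

```python
import pandas as pd

# Hypothetical three-row stand-in for the homes dataset.
homes = pd.DataFrame({
    "acreage": [0.25, 0.30, 0.20],
    "totalheatedsqft": [2500, 3000, 2600],
    "totalbedrooms": [3, 4, 2],
})

# Count of rows (records) in the dataset.
row_count = len(homes)

# Mean of each column of interest.
mean_acreage = homes["acreage"].mean()
mean_sqft = homes["totalheatedsqft"].mean()
mean_bedrooms = homes["totalbedrooms"].mean()

print(row_count, round(mean_acreage, 3), mean_sqft, mean_bedrooms)
```

Calling homes.describe() would show the count and mean of every numeric column in one table.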
We are learning a lot from our exploration of the homes dataset. Let's take a look at
price, which is listed as "lastsaleprice".
What strikes you as weird about the price variable in this dataset? (This is a
non-technical way of asking you what the outlier is.)
(Choose the best answer.) - correct answer Many homes appear to have been sold for
100
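One way to spot a suspicious repeated price is to count how often each value occurs. A minimal sketch, assuming pandas and made-up prices (only the column name lastsaleprice comes from the question; homes_sample and its values are hypothetical):

```python
import pandas as pd

# Hypothetical sample in which a price of 100 shows up repeatedly.
homes_sample = pd.DataFrame(
    {"lastsaleprice": [100, 250000, 100, 310000, 100, 275000]}
)

# value_counts() lists each distinct price with its frequency, most frequent first.
price_counts = homes_sample["lastsaleprice"].value_counts()
most_common_price = price_counts.index[0]

print(price_counts)
print("Most common price:", most_common_price)
```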
In the titanic dataset we used in the videos: