INSTRUCTIONS FOR QUESTIONS 1-5
For each of the following five questions, select the probability distribution that could best
be used to model the described scenario. Each distribution might be used zero, one, or
more than one time in the five questions.
These scenarios are meant to be simple and straightforward; if you're an expert in the
field the question asks about, please do not rely on your expertise to fill in all the extra
complexity (you'll end up making the questions below more difficult than I intended).
Question 1
1.4 pts
Number of people clicking an online banner ad each hour
Binomial
Exponential
Geometric
Correct!
Poisson
Weibull
Question 2
1.4 pts
Time from when a generator is turned on until it fails
Binomial
Exponential
Geometric
Poisson
Correct!
Weibull
Question 3
1.4 pts
Number of hits to a real estate web site each minute
Binomial
Exponential
Geometric
Correct!
Poisson
Weibull
Question 4
1.4 pts
Number of people entering a grocery store each minute
Binomial
Exponential
Geometric
Correct!
Poisson
Weibull
Question 5
1.4 pts
Time between hits on a real estate web site
Binomial
Correct!
Exponential
Geometric
Poisson
Weibull
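
The link between the "number of events per interval" scenarios and the "time between events" scenario can be illustrated with a small simulation. The sketch below is not part of the quiz; it assumes arrivals are independent with a constant average rate, and the rate of 5 hits per minute, the simulated horizon, and the function names are all made up for illustration. It draws exponential gaps between arrivals and then counts arrivals per minute; under these assumptions the per-minute counts should behave like a Poisson variable (mean and variance both close to the rate).

```python
import random
from statistics import mean, pvariance


def simulate_arrivals(rate, horizon, seed=0):
    """Return arrival times in [0, horizon) whose gaps are exponential with mean 1/rate."""
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def counts_per_interval(times, horizon):
    """Count arrivals falling in each unit-length interval [k, k+1)."""
    counts = [0] * horizon
    for t in times:
        counts[int(t)] += 1
    return counts


rate = 5.0       # hypothetical: 5 hits per minute on average
horizon = 20000  # hypothetical: number of minutes simulated

times = simulate_arrivals(rate, horizon)
counts = counts_per_interval(times, horizon)
gaps = [later - earlier for earlier, later in zip(times, times[1:])]

print("mean count per minute:", round(mean(counts), 2))
print("variance of counts:   ", round(pvariance(counts), 2))
print("mean gap (minutes):   ", round(mean(gaps), 3))
```

A Weibull time-to-failure model generalizes the exponential case by letting the failure rate rise or fall over time; Python's standard library offers `random.weibullvariate(alpha, beta)` for experimenting with that in the same way.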
INFORMATION FOR QUESTIONS 6-7
Five classification models were built for predicting whether a neighborhood will soon see
a large rise in home prices, based on public elementary school ratings and other factors.
The training data set was missing the school rating variable for every new school (3% of
the data points).
Because ratings are unavailable for newly opened schools, it is believed that
locations that have recently experienced high population growth are more likely to
have missing school rating data.
• Model 1 used imputation, filling in the missing data with the average
school rating from the rest of the data.
• Model 2 used imputation, building a regression model to fill in the missing
school rating data based on other variables.
• Model 3 used imputation, first building a classification model to estimate
(based on other variables) whether a new school is likely to have been built
as a result of recent population growth (or whether it has been built for
another purpose, e.g. to replace a very old school), and then using that
classification to select one of two regression models to fill in an estimate of the
school rating; there are two different regression models (based on other
variables), one for neighborhoods with new schools built due to population
growth, and one for neighborhoods with new schools built for other reasons.
• Model 4 used a binary variable to identify locations with missing information.
• Model 5 used a categorical variable: first, a classification model was used
to estimate whether a new school is likely to have been built as a result of
recent population growth; and then each neighborhood was categorized
as "data available", "missing, population growth", or "missing, other
reason".
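
To make the differences between these approaches concrete, here is a minimal sketch of the data preparation behind Models 1, 4, and 5. It is only an illustration on made-up data: the toy ratings, the `growth_flags` list (standing in for the output of the classification model described in Model 5), and all function names are assumptions, not part of the original problem. The regression-based imputations of Models 2 and 3 are omitted because they would need additional predictor variables.

```python
from statistics import mean

# Hypothetical toy data: None marks a missing school rating.
ratings = [7.0, None, 5.5, 8.0, None, 6.5]

# Hypothetical stand-in for the Model 5 classifier output:
# True means "new school likely built due to population growth".
growth_flags = [False, True, False, False, False, True]


def impute_mean(values):
    """Model 1 style: replace missing entries with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def missing_indicator(values):
    """Model 4 style: binary variable, 1 where the rating is missing, else 0."""
    return [1 if v is None else 0 for v in values]


def missing_category(values, growth):
    """Model 5 style: three-level categorical variable."""
    labels = []
    for v, g in zip(values, growth):
        if v is not None:
            labels.append("data available")
        elif g:
            labels.append("missing, population growth")
        else:
            labels.append("missing, other reason")
    return labels


print(impute_mean(ratings))
print(missing_indicator(ratings))
print(missing_category(ratings, growth_flags))
```

Note that mean imputation gives every missing point the same value regardless of why it is missing, whereas the indicator and categorical variables let the downstream classifier treat missingness itself as a signal.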