When you can collect data quickly, when the data is representative, and when the amount of
data is small compared to the whole population.
Do you have to decide the sample size ahead of time for A/B tests?
No, and we can run the hypothesis test any time we want.
What is full factorial design?
You test every combination and then use ANOVA to determine the importance of each
factor.
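As a minimal sketch of what "every combination" means, the snippet below enumerates a full factorial design with the standard library. The factor names and levels are hypothetical, chosen only for illustration, and the ANOVA step is not shown.

```python
from itertools import product

# Hypothetical factors and levels, used only for illustration.
factors = {
    "font": ["serif", "sans"],
    "color": ["blue", "green"],
    "layout": ["wide", "narrow"],
}


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


design = full_factorial(factors)
print(len(design))  # 2 * 2 * 2 = 8 runs
```

The number of runs is the product of the number of levels per factor, which is why full factorial designs grow quickly as factors are added.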
What is fractional factorial design?
When you test a subset of the entire set of combinations.
What is a balanced design?
You test each choice the same number of times and each pair of choices the same number of
times.
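One way to see both definitions together is a small sketch that builds a half fraction of a two-level design and checks that it is balanced. The construction used here (keep the runs whose coded levels, -1 and +1, multiply to +1) is one standard choice and an assumption of this example, as is reading "pair of choices" as a pair of levels from two different factors. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product


def half_fraction(n_factors):
    """Keep the runs of a two-level full factorial whose coded levels multiply to +1."""
    runs = []
    for run in product([-1, 1], repeat=n_factors):
        sign = 1
        for level in run:
            sign *= level
        if sign == 1:
            runs.append(run)
    return runs


def is_balanced(runs):
    """Check that each level, and each pair of levels from two factors, appears equally often."""
    n = len(runs[0])
    single = Counter((i, run[i]) for run in runs for i in range(n))
    pairs = Counter(
        ((i, run[i]), (j, run[j]))
        for run in runs
        for i, j in combinations(range(n), 2)
    )
    # Every level and every pair of levels must actually occur (two levels per factor).
    all_present = len(single) == 2 * n and len(pairs) == 4 * (n * (n - 1) // 2)
    return all_present and len(set(single.values())) == 1 and len(set(pairs.values())) == 1


design = half_fraction(3)
print(len(design))          # 4 of the 8 possible runs
print(is_balanced(design))  # True
```

Dropping any one run from this design breaks the balance, since one level of each factor is then tested less often than the other.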
When does regression work well to determine important factors?
If there aren't significant interactions between the factors.
What is exploration?
Focusing on getting more information; in this case, to determine with more certainty
which ad is really the best.
What is exploitation?
Focusing on getting immediate value; in this example, showing the ad that seems
to be doing best so far, because it seems most likely to be clicked.
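The two goals can be contrasted in a few lines. This is only a sketch: the click counts are made up, and picking an ad uniformly at random is just one simple way to explore, assumed here for illustration.

```python
import random

# Hypothetical results so far: ad -> (clicks, impressions).
results = {"ad_A": (12, 100), "ad_B": (9, 50), "ad_C": (1, 10)}


def exploit(results):
    """Show the ad with the best observed click rate so far."""
    return max(results, key=lambda ad: results[ad][0] / results[ad][1])


def explore(results, rng=random):
    """Pick any ad at random, to learn more about ads we are unsure of."""
    return rng.choice(sorted(results))


print(exploit(results))  # ad_B, with an observed click rate of 9/50
print(explore(results))  # any of the three ads
```

Note that ad_C has only 10 impressions, so its low observed rate is weak evidence; that uncertainty is the reason to keep exploring.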
What is the multi-armed bandit approach, and how does it balance exploration and
exploitation?
We start with no information and have an equal probability of selecting each alternative.
After performing some tests, we have more information, so we can update the
probabilities of each one being best and start assigning new tests according to those
probabilities. We keep testing multiple alternatives, so we're still doing exploration. But
we make it more likely to pick the best ones, so we're also doing exploitation.
What are some of the parameters in the multi-armed bandit approach?
The number of tests between recalculating probabilities; how to update the probabilities; and
how to pick an alternative to test based on probabilities and/or expected values. For
updating, we can use Bayesian updates or estimate from the observed distribution.
What are common reasons that data sets are missing values?
* a person accidentally types in the wrong value
* a person did not want to reveal the true value
* an automated system did not work correctly to record the value
What are some examples of why there might be bias in missing data?
* Income: people with higher incomes are less likely to omit this answer
* Radar gun: a car that passes the radar gun very slowly might be treated as an anomaly and
its speed might not be recorded in the system
* Heart transplants: if there's a variable "date of death", it will be missing for patients still
living, and thus the missing data will naturally include more successful transplant cases
What are three ways of dealing with missing data?
Discard the data, use categorical variables to indicate missing data, or estimate the missing
values (imputation). The first two don't require imputation.
What are the pros and cons of throwing away data points with missing values?
Pros: not potentially introducing errors; easy to implement
Cons: don't want to lose too many data points; potential for censored or biased missing
data
What is the categorical variable approach?
If the data is categorical, we just add another category, "missing". With quantitative
variables, you include interaction variables between the categorical variable that indicates
missingness and the other variables.
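A rough sketch of both cases follows. The helper names and the example columns are hypothetical. For the quantitative case, the sketch assumes the missing value is stored as 0 alongside a 0/1 missing indicator, and that the other variables are themselves fully observed; the source only states that interaction variables are included.

```python
def add_missing_category(values, label="missing"):
    """Categorical case: treat a missing entry as its own category."""
    return [label if v is None else v for v in values]


def encode_quantitative(rows, target, others):
    """Quantitative case: add a missing indicator plus indicator-by-variable interactions.

    Assumption: the missing value itself is stored as 0.0 so the row can still be used.
    """
    encoded = []
    for row in rows:
        is_missing = 1 if row[target] is None else 0
        new_row = dict(row)
        new_row[target] = 0.0 if is_missing else row[target]
        new_row[f"{target}_missing"] = is_missing
        for other in others:
            # Interaction term: equals the other variable only when target is missing.
            new_row[f"{target}_missing_x_{other}"] = is_missing * row[other]
        encoded.append(new_row)
    return encoded


colors = add_missing_category(["red", None, "blue"])
rows = [{"age": 30, "income": None}, {"age": 40, "income": 50.0}]
encoded = encode_quantitative(rows, "income", ["age"])
```

With the interaction columns in place, a regression can fit separate coefficients for the other variables on the rows where the target is missing.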
Why wouldn't you want to fill in missing quantitative variables with 0?
It can lead to problems if some types of data points are more likely than others to have
missing data. The coefficients of the other variables might be pulled in one direction or
another to try to account for the missing data.
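A small made-up example shows how strong the effect can be. The data below is synthetic (y is exactly 2x), and the assumption that only the largest x values go missing is chosen to make the bias obvious. The slope is computed with a plain least-squares formula rather than a library call.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line of y on a single predictor x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic data: y is exactly 2 * x.
true_x = list(range(1, 11))
ys = [2 * x for x in true_x]

# Assumption for illustration: x goes missing only when it is large (x >= 8).
observed_x = [x if x < 8 else None for x in true_x]
zero_filled = [0 if x is None else x for x in observed_x]

slope_full = ols_slope(true_x, ys)       # 2.0 on the complete data
slope_zero = ols_slope(zero_filled, ys)  # negative after zero-filling
print(slope_full, slope_zero)
```

Because the rows with missing x all have large y, filling in 0 places high-y points at x = 0, and the fitted slope changes sign even though the true relationship is unchanged.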