Exam (worked answers)

ISYE 6501 - Midterm 2 Exam Questions And Answers Verified 100% Correct

Pages: 11
Grade: A+
Uploaded on: 26-06-2025
Written in: 2024/2025

What are the advantages and disadvantages of imputing missing data with the mean, median (numeric data), or mode (categorical data)?

Advantages: it hedges against being too wrong, and it is easy to compute.

Disadvantages: the imputation can be biased. For example, people with high incomes are less likely to answer an income question on a survey, so the mean or median will underestimate the missing values.

What are the advantages and disadvantages of using regression for imputation?

Advantages: it reduces or eliminates the problem of bias and gives better estimates of the missing values.

Disadvantages: we have to build, validate, and test a whole separate model just to fill in the missing data, and then do it all over again to get the answer we actually want. We also use the same data twice: once for imputation and a second time to fit the model.

How does regression imputation with added variability compare to regression imputation without it?

Without added variability, the imputed values are more accurate on average, but their variability is less accurate. With added variability, they are less accurate on average, but their variability is more accurate.

When should you not use imputation?

As a rule of thumb, when more than 5% of the data for a factor is missing.

What is the binomial distribution?

The probability of getting x successes out of n independent, identically distributed Bernoulli(p) trials; for example, the number of successful coin flips in n flips.

What happens to the binomial distribution when n is large?

It converges to the normal distribution.

What is a Bernoulli distribution?

It is like flipping a coin once. It models a single event and is most useful when we put many of them together.

What are some examples of a geometric distribution?

How many interviews until the first job offer; how many hits until a baseball bat breaks.

What is a geometric distribution?

How many Bernoulli trials until some event occurs: the probability of having x Bernoulli(p) failures before the first success, or x Bernoulli(p) successes before the first failure.
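A minimal Python sketch of the binomial and geometric probabilities described above, using the standard formulas P(X = x) = C(n, x) p^x (1 - p)^(n - x) for the binomial and P(X = x) = (1 - p)^x p for x failures before the first success. The function names and the interview success rate p = 0.2 are illustrative assumptions, not from the course. The last loop compares binomial probabilities with a normal density of mean np and variance np(1 - p) to show the large-n behavior.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x): exactly x successes in n i.i.d. Bernoulli(p) trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def geometric_pmf(x: int, p: float) -> float:
    """P(X = x): exactly x Bernoulli(p) failures before the first success."""
    return (1 - p) ** x * p


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


# Exactly 3 heads in 10 fair coin flips: C(10, 3) / 2**10 = 120 / 1024.
print(binomial_pmf(3, 10, 0.5))  # 0.1171875

# Hypothetical: each interview independently yields an offer with p = 0.2;
# probability of exactly 2 rejections before the first offer.
print(geometric_pmf(2, 0.2))

# For large n, Binomial(n, p) is close to a normal with mean np and variance np(1 - p).
n, p = 1000, 0.3
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
for x in (280, 300, 320):
    print(x, round(binomial_pmf(x, n, p), 5), round(normal_pdf(x, mu, sigma), 5))
```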

Institution: ISYE 6501 -
Course: ISYE 6501 -

When you can collect data quickly. When the data is representative and the amount of
data is small compared to the whole population

Do you have to decide the sample size ahead of time for A/B tests?

No, and we can run the hypothesis test anytime we want.

What is full factorial design?

You test every combination of factor levels and then use ANOVA to determine the importance of each factor.
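A short Python sketch of generating a full factorial design with itertools.product. The factor names and levels are hypothetical ad features; fitting ANOVA to the responses measured for each run would be a separate step (for example with a statistics package) and is not shown here.

```python
from itertools import product

# Hypothetical ad-test factors and their levels (not from the course material).
factors = {
    "headline": ["A", "B"],
    "image": ["photo", "illustration", "none"],
    "button_color": ["red", "blue"],
}

# Full factorial design: one run for every combination of levels.
runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]

print(len(runs))  # 12 (2 * 3 * 2)
for run in runs[:3]:
    print(run)
```

The number of runs is the product of the level counts, which is why full factorial designs grow quickly as factors are added.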

What is fractional factorial design?

You test only a subset of the entire set of combinations.

What is a balanced design?

You test each choice the same number of times, and each pair of choices the same number of times.
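As a hedged illustration of fractional factorial and balanced designs, this Python sketch builds the textbook half fraction of a three-factor, two-level design by setting C = A * B, then checks the balance condition stated above. The factors A, B, C, the -1/+1 coding, and the is_balanced helper are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations, product

LEVELS = (-1, 1)  # two-level factors coded as low (-1) and high (+1)

# Full 2^3 factorial for hypothetical factors A, B, C: 8 runs.
full = list(product(LEVELS, repeat=3))

# Half fraction 2^(3-1): keep only runs where C = A * B (4 runs).
half = [run for run in full if run[2] == run[0] * run[1]]


def is_balanced(design, levels=LEVELS):
    """True if every level appears equally often in each factor and every
    pair of levels appears equally often in each pair of factors."""
    k = len(design[0])
    for i in range(k):
        counts = Counter(run[i] for run in design)
        if len({counts[a] for a in levels}) != 1:
            return False
    for i, j in combinations(range(k), 2):
        counts = Counter((run[i], run[j]) for run in design)
        if len({counts[(a, b)] for a in levels for b in levels}) != 1:
            return False
    return True


print(half)                   # [(-1, -1, 1), (-1, 1, -1), (1, -1, -1), (1, 1, 1)]
print(is_balanced(half))      # True
print(is_balanced(full[:4]))  # False: factor A is always at its low level
```

In this half fraction the main effect of C cannot be separated from the A-B interaction (they are aliased), so a design like this is most useful when such interactions are small.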




When does regression work well to determine important factors?

When there aren't significant interactions between the factors.

What is exploration?

Focusing on getting more information; in this case, determining with more certainty which ad is really the best.

What is exploitation?

Focusing on getting immediate value; in this example, showing the ad that seems to be doing best so far, because it seems most likely to be clicked.
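One common way to make the exploration/exploitation balance concrete is a multi-armed bandit with Bayesian (Beta-Bernoulli) updates, usually called Thompson sampling. This Python sketch assumes three ads with hypothetical click rates, a uniform Beta(1, 1) prior (so every ad starts equally likely to be chosen), and posteriors recalculated after every batch of 50 tests; the course describes the approach only in general terms, so these specific choices are illustrative.

```python
import random


def thompson_bandit(true_ctr, n_batches, batch_size=50, seed=0):
    """Sketch of a Bernoulli multi-armed bandit with Beta posteriors
    (Thompson sampling). true_ctr holds each ad's hypothetical click rate,
    which the algorithm only observes through simulated clicks."""
    rng = random.Random(seed)
    k = len(true_ctr)
    clicks = [0] * k
    no_clicks = [0] * k
    shown = [0] * k
    for _ in range(n_batches):
        # Choose the whole batch from the current posteriors. With no data every
        # posterior is Beta(1, 1), so each ad is equally likely to be picked.
        batch = []
        for _ in range(batch_size):
            draws = [rng.betavariate(1 + clicks[a], 1 + no_clicks[a]) for a in range(k)]
            # Taking the largest draw picks each ad with the posterior
            # probability that it is the best one.
            batch.append(max(range(k), key=draws.__getitem__))
        # Run the batch, then update the posteriors once (Bayesian update).
        for a in batch:
            shown[a] += 1
            if rng.random() < true_ctr[a]:
                clicks[a] += 1
            else:
                no_clicks[a] += 1
    return shown, clicks


shown, clicks = thompson_bandit([0.05, 0.10, 0.20], n_batches=60)
print("times shown:", shown)
print("clicks:     ", clicks)
```

A larger batch_size means fewer recalculations of the probabilities but a slower reaction to new information.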

What is the multi-armed bandit approach, and how does it balance exploration and exploitation?

We start with no information, so each alternative has an equal probability of being selected. After performing some tests, we have more information, so we update the probability that each alternative is the best and assign new tests according to those probabilities. Because we keep testing multiple alternatives, we are still exploring; because we make the best-looking alternatives more likely to be picked, we are also exploiting.

What are some of the parameters in the multi-armed bandit approach?

The number of tests between recalculating probabilities; how to update the probabilities; and how to pick an alternative to test based on the probabilities and/or expected values. For updating, we can use Bayesian updates or estimate the probabilities from the observed distribution.

What are common reasons that data sets are missing values?

* a person accidentally types in the wrong value
* a person did not want to reveal the true value
* an automated system did not work correctly to record the value

What are some examples of why there might be bias in missing data?

* Income: people with higher incomes are less likely to answer this question.
* Radar gun: a car that passes the radar gun very slowly might be treated as an anomaly, so its speed might not be recorded in the system.
* Heart transplants: if there is a variable "date of death", it will be missing for patients who are still living, so the missing data will naturally include more of the successful transplant cases.

What are three ways of dealing with missing data?

Discard the data points with missing values, use categorical variables to indicate missing data, or estimate the missing values (imputation).

What are the pros and cons of throwing away data points with missing values?

Pros: we don't risk introducing errors, and it is easy to implement.

Cons: we may lose too many data points, and the missing data may be censored or biased.
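To illustrate the "biased missing data" risk of throwing away data points, this Python sketch simulates a survey in which higher earners skip the income question more often. The income distribution and skip rates are invented for the example; the point is only that the mean of the remaining rows lands below the true mean.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical incomes for 10,000 survey respondents.
incomes = [rng.lognormvariate(10.5, 0.6) for _ in range(10_000)]
median_income = statistics.median(incomes)

# Hypothetical response behavior: people above the median answer the income
# question half the time; people at or below it answer 95% of the time.
answered = [x for x in incomes if rng.random() < (0.5 if x > median_income else 0.95)]

print(f"rows kept:             {len(answered)} of {len(incomes)}")
print(f"true mean income:      {statistics.fmean(incomes):,.0f}")
print(f"mean after discarding: {statistics.fmean(answered):,.0f}")
```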

What is the categorical variable approach?

If the data is categorical, we just add another category, "missing". For quantitative variables, we add a categorical variable indicating whether the value is missing and include interaction variables between it and the other variables.
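A minimal sketch of the categorical variable approach for one data point, in Python. The column names (color, income, age), the placeholder value 0 for a missing income, and the single interaction term are hypothetical choices; the approach described above includes interactions between the missing-value indicator and the other variables, and only one is shown here.

```python
from typing import Optional


def encode_row(color: Optional[str], income: Optional[float], age: float) -> dict:
    """Hypothetical encoding of one data point with the categorical-variable
    approach to missing values."""
    # Categorical variable: "missing" simply becomes one more category.
    row = {"color": color if color is not None else "missing"}

    # Quantitative variable: a 0/1 indicator marks whether income is missing.
    income_missing = 1 if income is None else 0
    row["income_missing"] = income_missing
    # Placeholder so the column is numeric; the indicator and interaction
    # terms are what let a model treat these rows differently.
    row["income"] = 0.0 if income is None else income
    # Interaction between the indicator and another variable, so a model can
    # fit a separate age effect for rows whose income is missing.
    row["age_x_income_missing"] = age * income_missing
    return row


print(encode_row("red", 52_000.0, 41))
print(encode_row(None, None, 35))
```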

Why wouldn't you want to fill in missing quantitative variables with 0?

It can cause problems if some types of data points are more likely than others to have missing data: the coefficients of the other variables might be pulled in one direction or another to try to account for the missing data.

Seller: TopGradeGuru