Exam (worked answers)

A+ GRADE-INTRODUCTION TO DATA SCIENCE QUESTIONS AND ANSWERS FOR EXAM PREP. 2024/2025 UPDATE.

Grade: A+

Uploaded on: 02-09-2024

Written in: 2024/2025

Content preview

COMMONLY ASKED QUESTIONS FOR DATA SCIENCE.
ANSWERS RESEARCHED AND PROVIDED. 2024/2025
UPDATE

1. What is type 1 error and type 2 error? Type 1 error: falsely
concluding that the intervention was successful. Known as a false
positive result.
Type 2 error: falsely concluding that the intervention was not
successful. Known as a false negative.
2. What can we do about overfitting? > Regularization (penalizing
model complexity while we're training)
> L2 regularization penalizes really big weights: complexity(model) =
sum of squares of weights
> Instead of minimizing only loss, regularization minimizes
loss + complexity, which is called structural risk minimization
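A minimal sketch of the loss + complexity idea above. The function names and the weighting factor `lam` are assumptions introduced for illustration; the source only states that complexity is the sum of squared weights.

```python
def l2_penalty(weights):
    """Model complexity measured as the sum of squared weights."""
    return sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam):
    """Structural risk: data loss plus lam times the L2 complexity term.

    lam is a hypothetical tuning parameter that sets how strongly
    complexity is penalized.
    """
    return data_loss + lam * l2_penalty(weights)


# Same data loss, but the second model has much larger weights.
small = regularized_loss(0.5, [0.1, -0.2, 0.3], lam=0.1)
large = regularized_loss(0.5, [3.0, -4.0, 5.0], lam=0.1)
print(small < large)
```

With equal data loss, the model with the larger weights ends up with the larger total, so training is pushed toward smaller weights.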
3. Describe true positive, false positive, false negative, true negative:
> True positive - we correctly called wolf; the town is saved.
> False positive - we called wolf falsely; the town is mad.
> False negative - there was a wolf but we didn't spot it. Chickens
are eaten.
> True negative - no wolf, no alarm. All is well.
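The four outcomes can be counted from paired labels. This is a sketch under the assumption that 1 means "wolf" (positive) and 0 means "no wolf" (negative); the function name and the sample data are made up for illustration.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for binary labels where 1 is positive."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1  # called wolf, and there was a wolf
        elif p == 1 and a == 0:
            fp += 1  # called wolf, but there was none
        elif p == 0 and a == 1:
            fn += 1  # missed a real wolf
        else:
            tn += 1  # no wolf, no alarm
    return tp, fp, fn, tn


# Illustrative data only.
actual = [1, 0, 1, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1]
counts = confusion_counts(actual, predicted)
print(counts)
```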
4. What is precision? True Positive / (True Positive + False Positive)
When you classify something as positive, how often are you right?
5. What is recall? True positive / (True positive + False negative)
Of all the items that are actually positive, how many did you
correctly classify as positive?
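The two ratios can be computed directly from the counts. A small sketch with made-up counts; returning 0.0 when the denominator is zero is a convention assumed here, not something the source specifies.

```python
def precision(tp, fp):
    """Share of positive predictions that were correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Illustrative counts: 8 true positives, 2 false positives, 4 false negatives.
p = precision(tp=8, fp=2)
r = recall(tp=8, fn=4)
print(p, r)
```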
6. What is an ROC curve? A graph showing the performance of a
classification model at all classification thresholds. The curve plots
two parameters, each ranging from 0 to 1: true positive rate (TPR,
i.e. recall) on the y axis and false positive rate (FPR) on the x axis.
FPR equals 1 - specificity, where specificity is the true negative
rate (true negative / (true negative + false positive)).
7. What is false positive rate? False positive / (false positive +
true negative)
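To make the threshold sweep concrete, the sketch below computes one (FPR, TPR) point per threshold. It assumes binary labels (1 = positive), a score per example where higher means more likely positive, and at least one example of each class; the function name and the data are illustrative.

```python
def roc_points(actual, scores):
    """Return (FPR, TPR) pairs, one per distinct score used as a threshold.

    An example is predicted positive when its score is >= the threshold.
    Assumes both classes occur in `actual`, otherwise a rate is undefined.
    """
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= t and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= t and a == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Illustrative data only.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(actual, scores)
for fpr, tpr in points:
    print(round(fpr, 2), round(tpr, 2))
```

Plotting the second value of each pair against the first gives the ROC curve, which runs from (0, 0) to (1, 1).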

8. What is bias? An error from erroneous assumptions in the
learning algorithm. High bias can cause an algorithm to miss the relevant
relations between features and target outputs (underfitting).
Bias is also the effect on the model when the sample systematically
misrepresents the 'real' data. Most datasets are a convenience
sample - the data easiest to collect.
9. What is variance? An error from sensitivity to small fluctuations in
the training set. High variance can cause an algorithm to model the
random noise in the training data, rather than the intended outputs
(overfitting).
It is the effect on the model of having been built from this sample
rather than that sample.
Variance measures how inconsistent the predictions are with one
another.
10. What is skewness? Asymmetry in a statistical distribution, in which
the curve appears distorted or skewed either to the left or to the right.
Skewness can be quantified to define the extent to which a distribution
differs from a normal distribution.
A distribution whose tail goes towards negative values is said to have
negative skewness.
11. What is kurtosis? The sharpness of the peak of a
frequency-distribution curve (more precisely, how heavy its tails are
compared with a normal distribution).
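Both quantities can be computed from central moments. The sketch below uses the plain population (moment-based) definitions, which is one common convention among several; under it a normal distribution has kurtosis 3. The function names and sample data are illustrative.

```python
def central_moment(values, k):
    """k-th central moment: mean of (value - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Third standardized moment; negative means a longer left tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Fourth standardized moment (not excess kurtosis)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Illustrative data only.
left_tailed = [1, 8, 9, 9, 10, 10, 10]
symmetric = [1, 2, 3, 4, 5]
print(skewness(left_tailed) < 0)
print(skewness(symmetric), kurtosis(symmetric))
```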
12. What are the different ways to handle missing values?
1. Delete the entire row/column
2. Replace by a fixed value (e.g. "unknown")
3. General statistic replacement (replace values by a statistic
associated with a particular column, like mean or median)
4. Grouped statistic replacement (replace values by a statistic
associated with a particular group)
5. Imputation - predict values based on nearest neighbors or
likelihood
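Three of the strategies above (deletion, column-mean replacement and group-mean replacement) can be sketched in plain Python. The record layout with `group` and `value` keys, the use of `None` for a missing value, and the function names are assumptions made for this example.

```python
from statistics import mean

# Illustrative records; None marks a missing value.
rows = [
    {"group": "a", "value": 1.0},
    {"group": "a", "value": None},
    {"group": "b", "value": 5.0},
    {"group": "b", "value": 7.0},
    {"group": "b", "value": None},
]


def drop_missing(rows):
    """Strategy 1: delete every row with a missing value."""
    return [r for r in rows if r["value"] is not None]


def fill_with_mean(rows):
    """Strategy 3: replace missing values by the column mean."""
    m = mean(r["value"] for r in rows if r["value"] is not None)
    return [dict(r, value=m if r["value"] is None else r["value"]) for r in rows]


def fill_with_group_mean(rows):
    """Strategy 4: replace missing values by the mean of the row's group."""
    by_group = {}
    for r in rows:
        if r["value"] is not None:
            by_group.setdefault(r["group"], []).append(r["value"])
    group_means = {g: mean(vs) for g, vs in by_group.items()}
    return [
        dict(r, value=group_means[r["group"]] if r["value"] is None else r["value"])
        for r in rows
    ]


print(len(drop_missing(rows)))
print(fill_with_group_mean(rows)[4]["value"])
```

The group-mean version assumes every group has at least one observed value; a group with none would need a fallback such as the column mean.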
13. What kind of feature transformation can you perform on
numeric features?
1. Rounding: round a numeric value to the nearest decimal, or make it
discrete so it can be turned into a categorical later
2. Discretization: binning of a variable to become categorical for
better value management
3. Scaling (change the scale of the variable for better
understanding), e.g. min-max, z-score, etc.
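The two scaling methods named above can be sketched as follows. The function names are illustrative, the z-score uses the population standard deviation (an assumed choice), and both functions assume the values are not all identical.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score_scale(values):
    """Center on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Illustrative data only.
values = [2.0, 4.0, 6.0, 8.0]
scaled = min_max_scale(values)
standardized = z_score_scale(values)
print(scaled)
print(standardized)
```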
14. What are some types of discretization methods? 1. Equal-width
binning - bins have equal ranges, roughly the same distribution as the
original variable
2. Equal-density (frequency) binning - bins have an equal number of
examples/records/rows, giving a uniform distribution
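The difference between the two methods shows up clearly on data with an outlier. A sketch with illustrative function names and data; bins are numbered from 0, and the equal-width version assumes the values are not all identical.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins ranges of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum would fall just past the last range, so clamp it in.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Assign bins by rank so each bin holds about the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


# Illustrative data with one outlier.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]
by_width = equal_width_bins(values, 3)
by_frequency = equal_frequency_bins(values, 3)
print(by_width)
print(by_frequency)
```

Equal-width binning puts almost everything in the first bin because the outlier stretches the range, while equal-frequency binning spreads the rows evenly.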
15. What are the 5 categories of feature generation? 1. Indicator
features (Attributes that isolate key information)

Document information

Course: DATA SCIENCE
Uploaded on: 2 September 2024
Number of pages: 17
Written in: 2024/2025
Type: Exam (worked answers)
Contains: Questions and answers
Topics: data science

Seller: TopRevision, University Of California - Los Angeles (UCLA)
