Exam (elaborations)

NCTJ PA PAST EXAM QUESTIONS AND ANSWERS 2024 (GRADED A)

Grade: A+
Uploaded on: 22-04-2024
Written in: 2023/2024


Describe the steps for developing a stratified sample - ✔✔1) Identify the strata (each stratum is defined by a combination of values of the stratification variables).
2) Draw a random sample from each stratum. To keep the sample representative, sample the same proportion of records from every stratum, so that each stratum's share of the sample matches its share of the full dataset.
3) Combine these samples to create the stratified sample.
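
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `stratified_sample`, the record fields (`sex`, `region`) and the 10% sampling fraction are all assumptions made up for the example.

```python
import random
from collections import defaultdict

def stratified_sample(records, strata_keys, fraction, seed=0):
    """Sample the same fraction of records from every stratum."""
    rng = random.Random(seed)
    # Step 1: identify the strata, one per combination of variable values.
    strata = defaultdict(list)
    for rec in records:
        strata[tuple(rec[k] for k in strata_keys)].append(rec)
    # Step 2: draw a simple random sample from each stratum.
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))
        # Step 3: combine the per-stratum samples.
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical data: 100 records, strata sizes 30 / 30 / 20 / 20.
records = [
    {"id": i,
     "sex": "F" if i % 2 else "M",
     "region": "north" if i < 60 else "south"}
    for i in range(100)
]
sample = stratified_sample(records, ["sex", "region"], fraction=0.1)
```

With equal sampling fractions, each stratum contributes to the sample in proportion to its size in the full dataset (here 3, 3, 2 and 2 records).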

What are the advantages and disadvantages of unstructured data - ✔✔Advantage: Unstructured data includes information that cannot be stored in a tabular format, so it can provide insights and qualitative information that cannot be included in a structured dataset.
Disadvantage: Unstructured data often requires more complex methods to process before it can be used as input to a predictive model, and analyzing it can be more time-consuming and resource-intensive.
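
As a rough illustration of the extra processing step mentioned above, the sketch below turns free-text notes into word-count features that a predictive model could consume. The notes, the vocabulary and the function name `bag_of_words` are hypothetical, and bag-of-words is only one of many possible approaches.

```python
import re
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in a piece of free text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

# Hypothetical unstructured inputs.
notes = [
    "Claimant reported water damage; damage was extensive.",
    "No damage found after inspection.",
]
vocabulary = ["damage", "water", "inspection"]

# One numeric row per note: a tabular representation of the text.
features = [bag_of_words(note, vocabulary) for note in notes]
```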

Describe two similarities and two differences between K-means clustering and hierarchical clustering. - ✔✔Similarities:
- Both can be used to generate new features from multiple predictor variables.
- Both are unsupervised learning techniques that group observations to show structures and relationships in the data without reference to a target variable.
Differences:
- K-means clustering requires preselecting k, the number of clusters; hierarchical clustering does not.
- Hierarchical clustering produces nested clusters; K-means does not.
- K-means only considers dissimilarity among observations and has no notion of dissimilarity among clusters, whereas hierarchical clustering uses a linkage to measure dissimilarity between clusters.
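
To make the "nested clusters" and "dissimilarity among clusters" points concrete, here is a small sketch of agglomerative (bottom-up) hierarchical clustering on one-dimensional data. Single linkage and the sample points are assumptions chosen to keep the example short; the function name `single_linkage` is hypothetical.

```python
def single_linkage(points):
    """Agglomerative clustering on 1-D points.

    Returns the partition at every level, from n singleton clusters
    down to one cluster. No value of k is chosen in advance.
    """
    clusters = [[p] for p in sorted(points)]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Dissimilarity between clusters (single linkage): the smallest gap
        # between their members. With sorted 1-D data only neighbouring
        # clusters can be closest.
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        # Merge the two closest clusters into one.
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
        levels.append([list(c) for c in clusters])
    return levels

levels = single_linkage([1.0, 1.5, 5.0, 5.2, 9.0])
```

Each level is obtained by merging two clusters of the previous level, so the partitions are nested, and a number of clusters can be chosen afterwards by picking a level.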

Explain the tradeoff between selecting a value of K=2 and K=4 - ✔✔There is a tradeoff between the percentage of total variance explained by the clusters and the complexity of the clustering model. K=2 explains a lower percentage of total variance but represents a simpler model; K=4 explains more of the variance at the cost of a more complex model.
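
The tradeoff can be illustrated by computing the share of total variance explained (one minus within-cluster sum of squares over total sum of squares) for two partitions of the same data. The data and both partitions below are written by hand for illustration rather than fitted by K-means, and the function name `variance_explained` is hypothetical.

```python
from statistics import fmean

def variance_explained(clusters):
    """Share of total variance explained by a partition: 1 - WSS / TSS."""
    points = [x for c in clusters for x in c]
    grand_mean = fmean(points)
    total_ss = sum((x - grand_mean) ** 2 for x in points)
    within_ss = sum((x - fmean(c)) ** 2 for c in clusters for x in c)
    return 1 - within_ss / total_ss

# Hypothetical 1-D data split into two and into four clusters.
k2 = variance_explained([[1, 2, 4, 5], [10, 11, 13, 14]])
k4 = variance_explained([[1, 2], [4, 5], [10, 11], [13, 14]])
```

Here the four-cluster partition explains about 99% of the variance against about 89% for two clusters, but it has twice as many clusters to describe and interpret.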

Issues with too many features in K-means analysis - ✔✔
- The clusters become harder to interpret, which makes them less useful for a predictive model where the features need to be interpreted.
- Outliers in any of the features should be considered. If an outlier is far enough from the other observations, it may be assigned its own cluster.
- It becomes harder to differentiate between observations that are close and those that are far apart.
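
The last point can be checked with a small simulation: as the number of features grows, the largest and smallest pairwise distances become nearly equal. The sketch assumes uniformly random features and Euclidean distance; the function name `distance_contrast` and the dimensions used are made up for the example.

```python
import math
import random

def distance_contrast(n_points, n_dims, seed=0):
    """Ratio of the largest to the smallest pairwise Euclidean distance."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    dists = [math.dist(pts[i], pts[j])
             for i in range(n_points)
             for j in range(i + 1, n_points)]
    return max(dists) / min(dists)

low_dim = distance_contrast(50, 2)     # few features: large contrast
high_dim = distance_contrast(50, 500)  # many features: ratio close to 1
```

A ratio close to 1 means "near" and "far" observations are almost indistinguishable, which undermines a distance-based method such as K-means.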

What are the differences between a GLM and a linear model on transformed data - ✔✔1. The normal linear regression has a log transformation applied to the response variable, and the GLM does not. The log transformation is reasonable for a variable that is right-skewed.
2. The GLM has the flexibility to select the probability distribution that best fits the shape of the response variable, whereas the normal linear regression model only allows for one distribution (the normal).
3. In the normal linear model the variance of the (transformed) response variable is constant, while in the GLM the variance can be a function of the mean.
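
The three differences can be written side by side. The notation below is an assumption for illustration (the source gives no formulas), and the GLM is shown with a log link purely to make the comparison direct.

```latex
% Linear model on the log-transformed response
\log Y_i = x_i^\top \beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)
\quad\Rightarrow\quad
\mathrm{E}[\log Y_i] = x_i^\top \beta, \qquad \mathrm{Var}(\log Y_i) = \sigma^2

% GLM with a log link (assumed here for comparison)
\log \mathrm{E}[Y_i] = x_i^\top \beta, \qquad
\mathrm{Var}(Y_i) = \phi \, V(\mu_i), \qquad \mu_i = \mathrm{E}[Y_i]
% e.g. V(\mu) = \mu for a Poisson response, V(\mu) = \mu^2 for a gamma response
```

The transformed model describes the mean of log Y with constant variance, while the GLM describes the log of the mean of Y and lets the variance depend on that mean. Because E[log Y] and log E[Y] generally differ, the two models are not interchangeable.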

How to identify heteroscedasticity - ✔✔Use a Residuals vs. Fitted plot. Check whether the spread of the residuals changes as the prediction increases; a spread that widens with the fitted values (a funnel shape) indicates heteroscedasticity.
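
The funnel shape can also be checked numerically by comparing the residual spread for low and high fitted values. The sketch below uses simulated data whose noise grows with the predictor (an assumption made purely so the effect is visible) and a hand-written least-squares fit.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(0)

# Simulated data: the noise standard deviation grows with x.
x = [i / 2 for i in range(1, 201)]
y = [2.0 + 3.0 * xi + rng.gauss(0, 0.5 * xi) for xi in x]

# Ordinary least squares fit of y on x.
mx, my = fmean(x), fmean(y)
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx

fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Compare residual spread in the lower and upper halves of the fitted values.
pairs = sorted(zip(fitted, residuals))
half = len(pairs) // 2
spread_low = pstdev([r for _, r in pairs[:half]])
spread_high = pstdev([r for _, r in pairs[half:]])
```

A clearly larger `spread_high` than `spread_low` is the numeric counterpart of the funnel seen in the plot.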

Interpret the Complexity Parameter table - ✔✔The complexity parameter determines the threshold of improvement needed to produce an additional split in the tree. This table lists the impact that changing the complexity parameter value has on the reported error metrics, including the cross-validation error (xerror). Cross-validation error measures how the model performs on unseen data, which penalizes both underfit and overfit models.
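
One common way to read such a table is to pick the row with the lowest cross-validation error. The sketch below assumes a table laid out like the one printed by R's rpart (columns CP, nsplit, rel error, xerror); the source does not name the software, and every number here is hypothetical.

```python
# Hypothetical complexity parameter table; values are made up.
cp_table = [
    {"CP": 0.30, "nsplit": 0, "rel_error": 1.00, "xerror": 1.02},
    {"CP": 0.10, "nsplit": 1, "rel_error": 0.70, "xerror": 0.74},
    {"CP": 0.03, "nsplit": 3, "rel_error": 0.50, "xerror": 0.58},
    {"CP": 0.01, "nsplit": 6, "rel_error": 0.41, "xerror": 0.63},
]

# Training error (rel_error) keeps falling as splits are added, but
# xerror rises again once the tree starts to overfit.
best = min(cp_table, key=lambda row: row["xerror"])
```

In this made-up table the tree with three splits has the lowest xerror, so pruning back to its CP value would be a reasonable choice.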

Document information

Course: NCTJ PA
Uploaded on: 22 April 2024
Number of pages: 23
Written in: 2023/2024
Type: Exam (elaborations)
Contains: Questions and answers


Seller: morren
