Exam (worked solutions)

NCTJ PA PAST EXAM QUESTIONS AND ANSWERS 2024 (GRADED A)

Pages: 23
Grade: A+
Uploaded: 22-04-2024
Written in: 2023/2024





Describe the steps for developing a stratified sample -
1) Identify the strata (each stratum is defined by one combination of the stratifying variables).
2) Draw a random sample from each stratum. Each stratum's sample size should be the same proportion of that stratum's total number of records, so that the combined sample is representative.
3) Combine all these samples to create the stratified sample.
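
The three steps above can be sketched in Python as follows. This is a minimal illustration, assuming records are dicts and that the same sampling fraction is applied to every stratum (proportional allocation); the function name `stratified_sample` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, strata_keys, fraction, seed=None):
    """Draw the same fraction of records from every stratum (proportional allocation).

    records: list of dicts; strata_keys: the variables whose combinations define the strata.
    """
    rng = random.Random(seed)

    # Step 1: identify the strata, one per combination of the stratifying variables
    strata = defaultdict(list)
    for rec in records:
        strata[tuple(rec[k] for k in strata_keys)].append(rec)

    # Step 2: draw a random sample of the same proportion from each stratum
    sample = []
    for members in strata.values():
        n = round(fraction * len(members))
        sample.extend(rng.sample(members, n))

    # Step 3: the combined per-stratum samples form the stratified sample
    return sample
```

With 60 records in region "A" and 40 in region "B", `stratified_sample(records, ["region"], 0.1)` returns 6 records from A and 4 from B, preserving the 60/40 mix.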

What are the advantages and disadvantages of unstructured data -
Advantage: Unstructured data includes information that cannot be stored in a tabular format, so it can provide insights and qualitative information that a structured dataset cannot capture.
Disadvantage: Unstructured data often requires more complex methods to process before it can be used as input to a predictive model. Analyzing unstructured data can also be more time-consuming and resource-intensive.
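
To illustrate the extra processing step, here is a minimal sketch that turns free-text documents into a structured table of word counts (a bag-of-words representation). It assumes simple lowercase whitespace tokenization; the function name `bag_of_words` is hypothetical, and real text pipelines usually also handle punctuation, stop words and stemming.

```python
from collections import Counter


def bag_of_words(documents):
    """Convert free-text documents into a table of word counts (one row per document)."""
    # Naive tokenization: lowercase and split on whitespace
    tokenized = [doc.lower().split() for doc in documents]
    # Every distinct word becomes one column of the structured table
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows
```

For example, `bag_of_words(["water damage roof", "roof fire"])` gives the columns `['damage', 'fire', 'roof', 'water']` and the rows `[1, 0, 1, 1]` and `[0, 1, 1, 0]`, which a predictive model can then use as numeric features.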

Describe two similarities and two differences between K-means clustering and hierarchical clustering. -
Similarities:
- Both can be used to generate new features from multiple predictor variables.
- Both are unsupervised learning techniques that group observations to reveal structure and relationships in the data without reference to a target variable.
Differences:
- K-means clustering requires preselecting the number of clusters, k; hierarchical clustering does not.
- Hierarchical clustering produces nested clusters, whereas K-means produces a single, non-nested partition.
- K-means only measures dissimilarity between observations (and their cluster centers) and has no notion of dissimilarity between clusters, whereas hierarchical clustering uses a linkage to measure dissimilarity between clusters.
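
The linkage is what gives hierarchical clustering its notion of dissimilarity between clusters. A minimal sketch of three common linkages (complete, single and average) using Euclidean distance is shown below; the function name `linkage` is hypothetical, and other linkages (such as centroid linkage) also exist.

```python
import math
from itertools import product


def linkage(cluster_a, cluster_b, method="complete"):
    """Dissimilarity between two clusters of points, as used by hierarchical clustering."""
    # All pairwise Euclidean distances between a point in A and a point in B
    pairwise = [math.dist(a, b) for a, b in product(cluster_a, cluster_b)]
    if method == "complete":
        return max(pairwise)  # largest pairwise distance
    if method == "single":
        return min(pairwise)  # smallest pairwise distance
    if method == "average":
        return sum(pairwise) / len(pairwise)  # mean pairwise distance
    raise ValueError(f"unknown linkage: {method}")
```

At each step, agglomerative hierarchical clustering merges the two clusters with the smallest linkage value. K-means has no equivalent step because it only compares observations with cluster centers.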

Explain the tradeoff between selecting a value of K=2 and K=4 -
There is a tradeoff between the percentage of total variance explained by the clusters and the complexity of the clustering model. K=2 explains a lower percentage of the total variance but gives a simpler model; K=4 explains a higher percentage at the cost of a more complex model with more clusters to interpret.
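
A minimal sketch of how the "percentage of total variance explained" can be computed: a plain Lloyd's-algorithm K-means, plus the ratio of between-cluster sum of squares to total sum of squares. All names here are hypothetical, and points are assumed to be tuples of numbers.

```python
import random


def sq_dist(p, q):
    # Squared Euclidean distance between two points given as tuples
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    # Coordinate-wise mean of a non-empty list of points
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm; k must be chosen in advance."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct observations
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members (keep it if empty)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(centroid(members) if members else centers[j])
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return labels, centers


def variance_explained(points, labels):
    """Between-cluster sum of squares divided by total sum of squares."""
    overall = centroid(points)
    total_ss = sum(sq_dist(p, overall) for p in points)
    within_ss = 0.0
    for j in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == j]
        c = centroid(members)
        within_ss += sum(sq_dist(p, c) for p in members)
    return 1 - within_ss / total_ss
```

Running `kmeans` with `k=2` and `k=4` on the same data and comparing `variance_explained` for the two label sets makes the tradeoff concrete. The K=4 ratio is typically higher, but it comes with twice as many clusters to interpret.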

Issues with too many features in K-means analysis -
- The clusters become harder to interpret, which makes the resulting signal less useful for a predictive model in which the features need to be interpreted.
- Outliers in any of the features need to be considered: if an observation is far enough from all the others, it may be assigned its own cluster.
- It becomes harder to distinguish observations that are close together from those that are far apart, because distances between observations become more similar as the number of features grows.
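
The last point can be demonstrated numerically. The minimal sketch below assumes uniformly random data and measures the relative contrast `(max - min) / min` between the distances from one random query point to a set of random points. The function name `relative_contrast` is hypothetical.

```python
import math
import random


def relative_contrast(n_points, n_features, seed=0):
    """(max - min) / min of the distances from a random query point to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_features)]
    points = [[rng.random() for _ in range(n_features)] for _ in range(n_points)]
    dists = [math.dist(query, p) for p in points]
    return (max(dists) - min(dists)) / min(dists)
```

With 200 points, the contrast is large with 2 features but falls well below 1 with 500 features, so the nearest and farthest observations become hard to tell apart.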

What are the differences between a GLM and a linear model on transformed data -
1. The normal linear regression model has a log transformation applied to the response variable, whereas the GLM does not. The log transformation is reasonable for a right-skewed response variable.
2. The GLM has the flexibility to select a probability distribution that best fits the shape of the response variable, whereas the normal linear regression model only allows for the normal distribution.
3. In the normal linear model the variance of the (transformed) response variable is constant, while in the GLM the variance can be a function of the mean.
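
One practical consequence of difference 1 appears when predictions are brought back to the original scale. The sketch below assumes intercept-only models and a positive response. A log-link GLM (for example Poisson or gamma) fitted by maximum likelihood estimates the mean as the sample mean. A normal linear model on log(y) estimates mean(log y), and exponentiating that gives the geometric mean. The function name is hypothetical.

```python
import math
import statistics


def intercept_only_fits(y):
    """Fitted mean on the original scale from two intercept-only models of a positive response."""
    # Log-link GLM (e.g. Poisson or gamma), intercept only: the MLE of the mean is the sample mean
    glm_mean = statistics.fmean(y)
    # Normal linear model on log(y), intercept only: the fitted value is mean(log y);
    # exponentiating it gives the geometric mean, not the arithmetic mean
    log_linear_mean = math.exp(statistics.fmean(math.log(v) for v in y))
    return glm_mean, log_linear_mean
```

For `y = [1, 10, 100]` the GLM gives 37 while the back-transformed log-linear model gives 10. A naive back-transformation of the log-linear model therefore understates the mean of a right-skewed response.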

How to identify heteroscedasticity -
Use the Residuals vs. Fitted plot: does the spread of the residuals increase as the fitted (predicted) value increases, producing a funnel shape?
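
A rough numeric companion to the plot is sketched below. It sorts observations by fitted value, splits them into bins, and compares the standard deviation of the residuals in each bin. A steadily increasing spread corresponds to the funnel shape. The function name and the binning approach are assumptions, not a formal test, and each bin needs at least one observation.

```python
import statistics


def residual_spread_by_fitted(fitted, residuals, n_bins=3):
    """Standard deviation of residuals within bins of increasing fitted value."""
    pairs = sorted(zip(fitted, residuals))  # order observations by fitted value
    size = len(pairs) // n_bins
    spreads = []
    for b in range(n_bins):
        # The last bin takes any remainder
        end = (b + 1) * size if b < n_bins - 1 else len(pairs)
        chunk = [r for _, r in pairs[b * size:end]]
        spreads.append(statistics.pstdev(chunk))
    return spreads
```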

Interpret the Complexity Parameter table -
The complexity parameter (cp) sets the minimum improvement in fit needed to make an additional split in the tree. The table lists the effect that changing the complexity parameter value has on test metrics, including the cross-validation error (xerror). Cross-validation error measures how the model performs on unseen data, so it penalizes both underfit and overfit models.
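
A minimal sketch of reading the table to choose cp. It assumes rows that mirror the CP, nsplit, xerror and xstd columns of an R `rpart` cptable, with `CpRow` and `select_cp` as hypothetical names. The default picks the cp with the lowest xerror. The optional one-standard-error rule is a common alternative not stated in the answer above: it picks the simplest tree whose xerror is within one xstd of the minimum.

```python
from typing import NamedTuple


class CpRow(NamedTuple):
    cp: float
    nsplit: int
    xerror: float
    xstd: float


def select_cp(rows, one_se=False):
    """Choose a complexity parameter from CP-table rows."""
    best = min(rows, key=lambda r: r.xerror)  # lowest cross-validation error
    if not one_se:
        return best.cp
    threshold = best.xerror + best.xstd
    # Simplest tree (fewest splits) whose xerror is within one standard error of the minimum
    simplest = min((r for r in rows if r.xerror <= threshold), key=lambda r: r.nsplit)
    return simplest.cp
```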

Seller: morren (Teachme2-tutor)