ALL ABOUT ISYE 6501 WEEK 8 COMPLETE SOLUTION

Pages: 146
Grade: A+
Uploaded on: 30-04-2022
Written in: 2022/2023

Question 11.1

Using the crime data set uscrime.txt from Questions 8.2, 9.1, and 10.1, build a regression model using:
1. Stepwise regression
2. Lasso
3. Elastic net
For Parts 2 and 3, remember to scale the data first – otherwise, the regression coefficients will be on
different scales and the constraint won’t have the desired effect.

For Parts 2 and 3, use the glmnet function in R.

Notes on R:
• For the elastic net model, what we called λ in the videos, glmnet calls "alpha"; you can get a range of
results by varying alpha from 1 (lasso) to 0 (ridge regression) [and, of course, other values of alpha in
between].
• In a function call like glmnet(x,y,family="mgaussian",alpha=1) the predictors x need to be in R's matrix
format, rather than data frame format. You can convert a data frame to a matrix using as.matrix – for
example, x <- as.matrix(data[,1:n-1])
• Rather than specifying a value of T, glmnet returns models for a variety of values of T.
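
A small note on the as.matrix example in the second bullet: in R the colon operator binds tighter than minus, so 1:n-1 means (1:n)-1, not 1:(n-1). The sketch below, on a tiny made-up data frame (the name df and its columns are hypothetical), suggests why the example still selects the first n-1 columns: the index 0 is silently dropped.

```r
# Hypothetical 4-column data frame: three predictors and a response in the last column.
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, y = c(1, 0, 1))
n <- ncol(df)

idx_question <- 1:n-1     # parsed as (1:n) - 1, i.e. 0 1 2 3
idx_explicit <- 1:(n-1)   # 1 2 3

# A zero index selects nothing, so both forms pick the first n-1 columns.
x1 <- as.matrix(df[, idx_question])
x2 <- as.matrix(df[, idx_explicit])
print(identical(x1, x2))
```

Writing 1:(n-1) or -n states the intent more directly and does not rely on the dropped zero.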


Data Analysis -

The uscrime dataset records the number of offenses per 100,000 population. This is a continuous response with
a set of possible "predictors" –

#Variable  Description
#M         percentage of males aged 14–24 in total state population
#So        indicator variable for a southern state
#Ed        mean years of schooling of the population aged 25 years or over
#Po1       per capita expenditure on police protection in 1960
#Po2       per capita expenditure on police protection in 1959
#LF        labor force participation rate of civilian urban males in the age-group 14-24
#M.F       number of males per 100 females
#Pop       state population in 1960 in hundred thousand
#NW        percentage of nonwhites in the population
#U1        unemployment rate of urban males 14–24
#U2        unemployment rate of urban males 35–39
#Wealth    wealth: median value of transferable assets or family income
#Ineq      income inequality: percentage of families earning below half the median income
#Prob      probability of imprisonment: ratio of number of commitments to number of offenses
#Time      average time in months served by offenders in state prisons before their first release
#Crime     crime rate: number of offenses per 100,000 population in 1960

First, to understand more about the data, after loading it into a table I looked at the data summary and at a
box plot to check for possible outliers. I performed this check mostly for discovery and have not removed any
data point from the set for this assignment. The Crime values 1969, 1674 and 1993 showed up as the 3 highest
values, outside the whiskers of the box plot. Using the Grubbs test we could possibly remove these outliers,
but I skipped this step.
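
The box plot check described above can be sketched with base R's boxplot.stats, which reports the points lying beyond the whiskers. The vector below is made up for illustration: only the three high values (1674, 1969, 1993) come from the text, and the remaining values are hypothetical stand-ins, not the real uscrime data.

```r
# Hypothetical bulk of crime rates (NOT the real data) plus the three
# high values named in the text.
crime <- c(510, 540, 580, 630, 660, 680, 700, 750, 800, 850, 860,
           930, 950, 960, 970, 1000, 1050, 1100,
           1674, 1969, 1993)

# boxplot.stats flags points more than 1.5 * IQR beyond the hinges,
# which are the points drawn outside the whiskers by boxplot().
bs <- boxplot.stats(crime)
outside_whiskers <- bs$out
print(outside_whiskers)
```

A formal test such as the Grubbs test would need an extra package, so it is left out of this sketch.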

Next, I looked at the correlation matrix to check whether any pairs of variables are correlated with each other. I
found a strong linear correlation between Po1 and Po2, with a correlation coefficient of 0.99. Wealth and Ineq
are also strongly negatively correlated, with a correlation coefficient of -0.88.
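
A minimal sketch of this kind of check, assuming simulated data rather than uscrime.txt: the column names Po1, Po2 and Ed are borrowed from the variable list, but the values and the 0.8 threshold are hypothetical choices for illustration.

```r
set.seed(1)
n <- 47

# Simulated stand-in: Po2 is built to track Po1 closely, Ed is independent.
po1 <- rnorm(n, mean = 85, sd = 30)
sim <- data.frame(
  Po1 = po1,
  Po2 = 0.95 * po1 + rnorm(n, sd = 2),
  Ed  = rnorm(n, mean = 10.5, sd = 1)
)

cm <- cor(sim)

# Keep each pair once (upper triangle) and flag |r| above the threshold.
idx <- which(abs(cm) > 0.8 & upper.tri(cm), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]
)
print(high_pairs)
```

Pairs flagged this way are candidates for dropping one member before fitting, since near-duplicate predictors make individual coefficients unstable.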

I also checked the scatter plots of the predictors against Crime to get a visual idea of the correlations, which
suggested that not all of them are significant for our model.

I. Stepwise regression –

The underlying assumption of stepwise regression is that the predictor variables are not very highly correlated. In
each step of the process a variable is added to or removed from the set of predictors. If we start with 0 predictors
and keep adding, it is forward selection; if we start with all of them and keep removing variables, it is backward
selection. In the R code I performed backward selection on the scaled data (except for the column So).
This process showed that there could be 8 factors.

Step: AIC=503.93
.outcome ~ M + Ed + Po1 + M.F + U1 + U2 + Ineq + Prob

         Df Sum of Sq     RSS    AIC
<none>                1453068 503.93
- M.F     1    103159 1556227 505.16
- U1      1    127044 1580112 505.87
- Prob    1    247978 1701046 509.34
- U2      1    255443 1708511 509.55
- M       1    296790 1749858 510.67
- Ineq    1    445788 1898855 514.51
- Ed      1    738244 2191312 521.24
- Po1     1   1672038 3125105 537.93
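
The call that produced this output is not shown in the preview. As a rough sketch of backward selection by AIC, the example below swaps in base R's step() on simulated data; the variable names x1 to x4 and y are hypothetical, and only x1 and x2 truly influence y.

```r
set.seed(15)
n <- 47

# Simulated data: y depends on x1 and x2 only; x3 and x4 are noise.
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
sim$y <- 3 * sim$x1 - 2 * sim$x2 + rnorm(n)

# Start from the full model and drop one term at a time while AIC improves.
full <- lm(y ~ ., data = sim)
back <- step(full, direction = "backward", trace = 0)

kept <- attr(terms(back), "term.labels")
print(kept)
```

Setting trace = 1 prints a table per step in the same layout as the output above, with one row per candidate removal.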

In the next step I used these 8 variables to build a regression model to check whether they are indeed significant.
In this step, the adjusted R2 was 0.74, but not all factors were significant. I repeated this step twice, removing
M.F and U1 from the initial selection of predictors, and used cross validation to evaluate the final model. This time,
using 6 factors, the R2 was 0.66, not much lower than for the initially suggested model with 8 variables.
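
The cross validation code is not part of this preview. One way to compute a cross-validated R2 for a linear model, sketched in base R on simulated data (the names sim, x1, x2, y and the choice of 5 folds are assumptions):

```r
set.seed(15)
n <- 47
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 3 * sim$x1 - 2 * sim$x2 + rnorm(n)

# Assign every row to one of k folds at random.
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# Predict each fold from a model fitted on the other folds.
pred <- numeric(n)
for (i in seq_len(k)) {
  held_out <- fold == i
  fit <- lm(y ~ x1 + x2, data = sim[!held_out, ])
  pred[held_out] <- predict(fit, newdata = sim[held_out, ])
}

# R2 from out-of-fold predictions, compared with the in-sample R2.
sse <- sum((sim$y - pred)^2)
sst <- sum((sim$y - mean(sim$y))^2)
cv_r2 <- 1 - sse / sst
in_sample_r2 <- summary(lm(y ~ x1 + x2, data = sim))$r.squared
print(c(cv = cv_r2, in_sample = in_sample_r2))
```

The cross-validated value cannot exceed the in-sample one for a least-squares fit, which is why the R2 figures from cross validation in this report are lower than the fitted ones.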


II. Lasso –

The least absolute shrinkage and selection operator (lasso) is a shrinkage and selection method for linear
regression. It minimizes the usual sum of squared errors, subject to a bound on the sum of the absolute values of
the coefficients. For predictor selection with lasso, I used –

lasso=cv.glmnet(x=as.matrix(uscrime_scaled[,-16]), y=as.matrix(uscrime_scaled$Crime), alpha=1, nfolds = 5,
type.measure="mse", family="gaussian")

I plotted the cross-validated MSE against lambda, as well as the number of predictor variables against lambda. Then
I found the lambda value with the smallest cvm and finally looked at the coefficient of each predictor at that
minimum lambda value. This process showed that there are 10 variables that might be significant, and using these I
created my first model, with an R2 value of 0.74.

#fit a model with the variables that have nonzero coefficients

mod_lasso = lm(Crime ~So+M+Ed+Po1+LF+M.F+NW+U2+Ineq+Prob, data = uscrime_scaled)
summary(mod_lasso)

But not all the predictors were significant, so I refitted the model with only the following predictors. This
time the adjusted R2 was 0.73.

#remove factors with p > 0.05
mod_lasso_2 = lm(Crime ~M+ Ed+ Po1+ U2+ Ineq+Prob, data = uscrime_scaled)
summary(mod_lasso_2)

Before removing the 4 variables, the R2 from cross validation was 0.58, whereas after removing the 4
non-significant factors it went up to 0.64.
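
The selection step in this section (picking the predictors whose coefficients are nonzero at the lambda with the smallest cross-validated error) can be sketched as follows. The data are simulated, not uscrime.txt, and the names x, y and x1 to x6 are hypothetical; the cv.glmnet arguments mirror the call quoted above.

```r
library(glmnet)

set.seed(15)
n <- 47

# Simulated predictors; only x1 and x2 truly drive the response.
x <- matrix(rnorm(n * 6), nrow = n,
            dimnames = list(NULL, paste0("x", 1:6)))
y <- 3 * x[, "x1"] - 2 * x[, "x2"] + rnorm(n)

# Scale predictors so the L1 penalty treats them equally.
x_scaled <- scale(x)

cv_fit <- cv.glmnet(x_scaled, y, alpha = 1, nfolds = 5,
                    type.measure = "mse", family = "gaussian")

# Coefficients at the lambda with the smallest cross-validated MSE.
coefs <- as.matrix(coef(cv_fit, s = "lambda.min"))
selected <- setdiff(rownames(coefs)[coefs[, 1] != 0], "(Intercept)")
print(selected)
```

Using s = "lambda.1se" instead of "lambda.min" usually keeps fewer predictors, which may reduce the number of non-significant terms that have to be removed afterwards.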


III. Elastic Net –

The elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of
the lasso and ridge methods. For our analysis, I ran a loop over alpha values from 0 to 1 in steps of 0.1 and
noted the R2 for each. The best value of alpha was 0.9, and with that alpha the cv.glmnet method calculated
the coefficients for each variable. The method gave the following 9 predictors. Using them in a regression model
gave an R2 of 0.72, but with a number of non-significant coefficients. The R2 using all these 9 predictors
after applying cross validation came to only 0.485607.

#use the predictors from the process
mod_Elastic_net = lm(Crime ~So+M+Ed+Po1+M.F+Pop+NW+U1+U2+Wealth+Ineq+Prob, data
=uscrime_scaled)
summary(mod_Elastic_net)
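
The alpha loop itself is not included in this preview. A minimal sketch of such a search, on simulated data with hypothetical names, is given below. It compares alphas by their smallest cross-validated MSE rather than by R2, and it reuses one fold assignment (foldid) so that the alphas are compared on identical splits; both are choices of this sketch, not necessarily what was done in the report.

```r
library(glmnet)

set.seed(15)
n <- 47

# Simulated, scaled predictors; only x1 and x2 truly drive the response.
x <- scale(matrix(rnorm(n * 6), nrow = n,
                  dimnames = list(NULL, paste0("x", 1:6))))
y <- 3 * x[, "x1"] - 2 * x[, "x2"] + rnorm(n)

# One shared fold assignment for every alpha.
foldid <- sample(rep(1:5, length.out = n))

alphas <- seq(0, 1, by = 0.1)
cv_mse <- sapply(alphas, function(a) {
  fit <- cv.glmnet(x, y, alpha = a, foldid = foldid,
                   type.measure = "mse", family = "gaussian")
  min(fit$cvm)   # best cross-validated MSE over the lambda path
})

best_alpha <- alphas[which.min(cv_mse)]
print(data.frame(alpha = alphas, cv_mse = cv_mse))
print(best_alpha)
```

Differences between neighbouring alphas are often small relative to fold-to-fold noise, so the chosen alpha may change with the seed.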

Comparison –

Based on the limited data we had, stepwise regression gave us the fewest predictors with a good value of
R2 and adjusted R2. Even after applying stepwise regression we needed to discard some of the variables
based on their p-values, but it still did a better job than the other two methods: elastic net chose 9
variables and lasso chose 10 to start with.


R code -


rm(list = ls())
set.seed(15)

library(MASS)
library(glmnet)

## Loading required package: Matrix
