Exam (solutions)

ALL ABOUT ISYE 6501 WEEK 8 COMPLETE SOLUTION

Pages: 146
Grade: A+
Uploaded on: 30-04-2022
Written in: 2022/2023

Question 11.1

Using the crime data set uscrime.txt from Questions 8.2, 9.1, and 10.1, build a regression model using:
1. Stepwise regression
2. Lasso
3. Elastic net
For Parts 2 and 3, remember to scale the data first – otherwise, the regression coefficients will be on
different scales and the constraint won’t have the desired effect.
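A minimal sketch of that scaling step follows. It assumes the data have already been read into a data frame (for example with read.table("uscrime.txt", header = TRUE)) and that the response column is named Crime. The helper name scale_predictors is hypothetical. By default it leaves the 0/1 indicator So and the response unscaled. That default is an assumption, and you can change it through the keep argument.

```r
# Hypothetical helper: standardize each predictor to mean 0 and sd 1,
# leaving the columns listed in `keep` (by default the 0/1 indicator So
# and the response Crime) on their original scale.
scale_predictors <- function(dat, keep = c("So", "Crime")) {
  to_scale <- setdiff(names(dat), keep)
  # scale() returns a one-column matrix; as.vector() turns it back into a vector
  dat[to_scale] <- lapply(dat[to_scale], function(col) as.vector(scale(col)))
  dat
}
```

Calling uscrime_scaled <- scale_predictors(uscrime) would then give a scaled data frame of the kind used in the rest of this write-up.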

For Parts 2 and 3, use the glmnet function in R.

Notes on R:
• For the elastic net model, what we called λ in the videos, glmnet calls “alpha”; you can get a range of results by varying alpha from 1 (lasso) to 0 (ridge regression) [and, of course, other values of alpha in between].
• In a function call like glmnet(x,y,family="mgaussian",alpha=1) the predictors x need to be in R’s matrix format, rather than data frame format. You can convert a data frame to a matrix using as.matrix – for example, x <- as.matrix(data[,1:(n-1)])
• Rather than specifying a value of T, glmnet returns models for a variety of values of T.
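The sketch below shows what the last two notes mean in practice. It runs on simulated data rather than uscrime.txt, and the predictor names x1 to x5 are placeholders. It uses only standard glmnet calls. glmnet() fits a whole path of models, one for each penalty value stored in fit$lambda, and coef() reads off the coefficients at any point on that path. For a single numeric response, family = "gaussian" is the usual choice. "mgaussian" is glmnet's multi-response variant.

```r
library(glmnet)

set.seed(1)
# Simulated stand-in data: 50 rows, 5 predictors, response driven by x1 and x2
n <- 50
X <- matrix(rnorm(n * 5), nrow = n, dimnames = list(NULL, paste0("x", 1:5)))
y <- 3 * X[, 1] - 2 * X[, 2] + rnorm(n)

# X is already a matrix; a data frame would need as.matrix() first
fit <- glmnet(X, y, family = "gaussian", alpha = 1)

# glmnet picks its own decreasing sequence of penalty values
print(length(fit$lambda))
print(head(fit$lambda))

# Coefficients (intercept plus one per predictor) at the smallest lambda fitted
print(coef(fit, s = min(fit$lambda)))
```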

Data Analysis -

The uscrime dataset records the number of offenses per 100,000 population (Crime), a continuous response, together with a set of possible predictors:

#Variable Description
#M percentage of males aged 14–24 in total state population
#So indicator variable for a southern state
#Ed mean years of schooling of the population aged 25 years or over
#Po1 per capita expenditure on police protection in 1960
#Po2 per capita expenditure on police protection in 1959
#LF labor force participation rate of civilian urban males in the age-group 14–24
#M.F number of males per 100 females
#Pop state population in 1960 in hundred thousand
#NW percentage of nonwhites in the population
#U1 unemployment rate of urban males 14–24
#U2 unemployment rate of urban males 35–39
#Wealth wealth: median value of transferable assets or family income
#Ineq income inequality: percentage of families earning below half the median income
#Prob probability of imprisonment: ratio of number of commitments to number of offenses
#Time average time in months served by offenders in state prisons before their first release
#Crime crime rate: number of offenses per 100,000 population in 1960

To understand the data better, I first loaded it into a table and looked at the data summary. I then used a box plot to check for possible outliers. The three highest Crime values, 1969, 1674 and 1993, fall outside the upper whisker of the box plot. A Grubbs test could be used to decide whether to remove these outliers, but I skipped this step. I performed the check mostly for discovery and did not remove any data points for this assignment.
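You can reproduce the box-plot check with base R's boxplot.stats(). It returns the points beyond the whiskers, and R's default rule places the whiskers at 1.5 times the spread between the hinges. The Grubbs test mentioned above is available as grubbs.test() in the outliers package. This sketch does not assume that package is installed and skips the test when it is missing. Both helper names are hypothetical.

```r
# Values lying beyond the box-plot whiskers (default coef = 1.5)
whisker_outliers <- function(x) boxplot.stats(x)$out

# Grubbs test on the value furthest from the mean (type = 10 tests a single
# outlier); returns NULL if the 'outliers' package is not installed
grubbs_extreme <- function(x) {
  if (!requireNamespace("outliers", quietly = TRUE)) return(NULL)
  outliers::grubbs.test(x, type = 10)
}
```

For this data set, the calls would be whisker_outliers(uscrime$Crime) and grubbs_extreme(uscrime$Crime). The Grubbs p-value then indicates whether the Crime value furthest from the mean is a statistically significant outlier.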

Next, I looked at the correlation matrix to check whether any pairs of variables are correlated with each other. There is a strong linear correlation between Po1 and Po2 (correlation coefficient 0.99). Wealth and Ineq are also strongly negatively correlated (correlation coefficient -0.88).
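The small sketch below pulls the strongly correlated pairs out of the correlation matrix, so you do not have to scan it by eye. The function name and the 0.8 cut-off are illustrative assumptions. With that cut-off, the two pairs reported above (Po1/Po2 at 0.99 and Wealth/Ineq at -0.88) would be among the rows returned for the uscrime data.

```r
# Hypothetical helper: variable pairs whose absolute Pearson correlation
# exceeds `threshold`, each pair listed once (upper triangle only)
high_cor_pairs <- function(dat, threshold = 0.8) {
  cm <- cor(dat)
  idx <- which(abs(cm) > threshold & upper.tri(cm), arr.ind = TRUE)
  data.frame(var1 = rownames(cm)[idx[, 1]],
             var2 = colnames(cm)[idx[, 2]],
             r = cm[idx],
             stringsAsFactors = FALSE)
}
```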

I also checked scatter plots of each predictor against Crime to get a visual idea of the correlations. They suggested that not all of the predictors are likely to be significant for our model.

I. Stepwise regression –

Stepwise regression works best when the predictor variables are not very highly correlated. At each step of the process, a variable is either added to or removed from the set of predictors. Starting with 0 predictors and adding them one at a time is forward selection. Starting with all of them and removing variables one at a time is backward elimination. In the R code I performed backward elimination on the scaled data (all columns scaled except So). This process suggested a model with 8 factors:

Step:  AIC=503.93
.outcome ~ M + Ed + Po1 + M.F + U1 + U2 + Ineq + Prob

        Df Sum of Sq     RSS    AIC
<none>               1453068 503.93
- M.F    1    103159 1556227 505.16
- U1     1    127044 1580112 505.87
- Prob   1    247978 1701046 509.34
- U2     1    255443 1708511 509.55
- M      1    296790 1749858 510.67
- Ed     1    445788 1898855 514.51
- Ineq   1    738244 2191312 521.24
- Po1    1   1672038 3125105 537.93
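The write-up does not show the exact call that produced this output. The following is therefore only one way to run a backward search by AIC, using stepAIC() from MASS. It assumes a data frame whose response column is named Crime, such as the scaled uscrime data. The wrapper name backward_select is hypothetical.

```r
library(MASS)

# Backward elimination by AIC: start from the model with every predictor,
# repeatedly drop the term whose removal lowers AIC the most, and stop
# when no single removal lowers it further.
backward_select <- function(dat) {
  full <- lm(Crime ~ ., data = dat)
  stepAIC(full, direction = "backward", trace = FALSE)
}
```

Calling summary() on the result of backward_select(uscrime_scaled) shows the selected model and its coefficient p-values. Setting trace = TRUE prints step tables like the one above.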

In the next step I used these 8 variables to build a regression model and check whether they are indeed significant. The adjusted R² was 0.74, but not all factors were significant. After repeating this step twice, I removed M.F and U1 from the initial selection of predictors and used cross-validation to evaluate the final model. With these 6 factors the cross-validated R² was 0.66, not much lower than for the initial model with 8 variables.
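Cross-validated R² appears several times in this write-up, but the computation is never shown. One common approach is k-fold cross-validation, sketched below under that assumption. The model is fit on k-1 folds and used to predict the held-out fold, and the out-of-fold predictions are then compared with the observed response. The helper cv_r2 and its defaults (5 folds, seed 42) are hypothetical, so its numbers will not necessarily match the ones reported here.

```r
# Hypothetical helper: k-fold cross-validated R^2 for a linear model formula.
# Note: set.seed() here also resets the global random number stream.
cv_r2 <- function(formula, dat, k = 5, seed = 42) {
  set.seed(seed)
  folds <- sample(rep(seq_len(k), length.out = nrow(dat)))
  response <- all.vars(formula)[1]
  pred <- numeric(nrow(dat))
  for (i in seq_len(k)) {
    test <- folds == i
    fit <- lm(formula, data = dat[!test, , drop = FALSE])
    pred[test] <- predict(fit, newdata = dat[test, , drop = FALSE])
  }
  y <- dat[[response]]
  # R^2 = 1 - SSE / SST, using only out-of-fold predictions
  1 - sum((y - pred)^2) / sum((y - mean(y))^2)
}
```

For example, cv_r2(Crime ~ M + Ed + Po1 + U2 + Ineq + Prob, uscrime_scaled) would evaluate the six-factor stepwise model in this way.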

II. Lasso –

The least absolute shrinkage and selection operator (lasso) is a shrinkage and selection method for linear regression. It minimizes the usual sum of squared errors, subject to a bound on the sum of the absolute values of the coefficients.
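Written out, with T the coefficient bound mentioned in the assignment notes, the lasso solves the first problem below. The second line is the equivalent penalized form that glmnet documents for the Gaussian family. There, glmnet's λ (not the course's λ, which glmnet calls alpha) plays the role of T: a larger λ corresponds to a tighter bound. α mixes the lasso (α = 1) and ridge (α = 0) penalties, and it is the same parameter varied in the elastic net part.

```latex
\min_{\beta_0,\beta}\ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}x_{ij}\beta_j\Big)^2
\quad\text{subject to}\quad \sum_{j=1}^{p}|\beta_j|\le T

\min_{\beta_0,\beta}\ \frac{1}{2n}\sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}x_{ij}\beta_j\Big)^2
+\lambda\Big[\frac{1-\alpha}{2}\sum_{j=1}^{p}\beta_j^2+\alpha\sum_{j=1}^{p}|\beta_j|\Big]
```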
For predictor selection with the lasso, I used:

lasso=cv.glmnet(x=as.matrix(uscrime_scaled[,-16]), y=as.matrix(uscrime_scaled$Crime), alpha=1, nfolds = 5,
type.measure="mse", family="gaussian")
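The steps described next can be sketched as follows: finding the λ with the smallest cvm and reading off the coefficients at that λ. The helper name lasso_selected is hypothetical, and the demo uses simulated data. cv.glmnet() stores the minimizing value as lambda.min, and coef(..., s = "lambda.min") returns the coefficients at that value. plot(cvfit) draws the cross-validated MSE against log(λ), with the number of non-zero coefficients along the top axis.

```r
library(glmnet)

# Hypothetical helper: names of predictors whose coefficient is non-zero at
# lambda.min, the lambda with the smallest mean cross-validated error (cvm)
lasso_selected <- function(cvfit) {
  b <- as.matrix(coef(cvfit, s = "lambda.min"))
  setdiff(rownames(b)[b[, 1] != 0], "(Intercept)")
}

# Demo on simulated data (stand-in for uscrime_scaled)
set.seed(7)
X <- matrix(rnorm(60 * 6), nrow = 60, dimnames = list(NULL, paste0("x", 1:6)))
y <- 4 * X[, 1] + 2 * X[, 2] + rnorm(60)
cvfit <- cv.glmnet(X, y, alpha = 1, nfolds = 5,
                   type.measure = "mse", family = "gaussian")
print(cvfit$lambda.min)
print(lasso_selected(cvfit))
```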

I plotted the cross-validated MSE against lambda, as well as the number of predictor variables against lambda. Then I found the lambda value with the smallest cvm (mean cross-validated error) and looked at each predictor's coefficient at that lambda. This process showed 10 possibly significant variables. Using these, I created my first model, which had an R² of 0.74.

#fit a model with the variables that have non-zero coefficients

mod_lasso = lm(Crime ~ So+M+Ed+Po1+LF+M.F+NW+U2+Ineq+Prob, data = uscrime_scaled)
summary(mod_lasso)

However, not all the predictors were significant, so I rebuilt the model with only the following predictors. This time the adjusted R² was 0.73:

#remove factors with p > 0.05
mod_lasso_2 = lm(Crime ~ M+Ed+Po1+U2+Ineq+Prob, data = uscrime_scaled)
summary(mod_lasso_2)
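You can also drop the predictors with p > 0.05 programmatically using summary(). The minimal sketch below uses a hypothetical helper, significant_terms. It assumes numeric predictors, as in uscrime, so that each coefficient name is a column name. It drops all non-significant terms at once, as the text does. However, p-values can shift after each removal, so dropping one term at a time is a common alternative.

```r
# Hypothetical helper: predictors whose coefficient p-value is below `alpha`
significant_terms <- function(fit, alpha = 0.05) {
  p <- summary(fit)$coefficients[, "Pr(>|t|)"]
  setdiff(names(p)[p < alpha], "(Intercept)")
}
```

For example, reformulate(significant_terms(mod_lasso), response = "Crime") builds the reduced formula, which can then be passed to lm() with data = uscrime_scaled.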

Before removing the 4 non-significant variables, the cross-validated R² was 0.58. After removing them, it rose to 0.64.

III. Elastic Net –

The elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods. For our analysis, I ran a loop over alpha values from 0 to 1 in steps of 0.1 and noted the R² for each. The best value of alpha was 0.9, and with that alpha the cv.glmnet method calculated the coefficients for each variable. The method gave the following 9 predictors. Using them in a regression model gave an R² of 0.72, but with a number of non-significant coefficients. After cross-validation, the R² using all these 9 predictors came to only 0.485607.
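The alpha loop can be sketched as below. The write-up does not say exactly how the R² for each alpha was computed. This sketch therefore assumes a cross-validated R² of 1 - min(cvm) / var(y), taken at each alpha's best λ. It also fixes the fold assignment through cv.glmnet's foldid argument, so every alpha is compared on the same folds. The glmnet documentation suggests this when choosing alpha. The function name alpha_scan is hypothetical.

```r
library(glmnet)

# Hypothetical helper: cross-validated R^2 for each alpha in `alphas`
# (alpha = 1 is lasso, alpha = 0 is ridge), using the same folds throughout
alpha_scan <- function(X, y, alphas = seq(0, 1, by = 0.1), nfolds = 5, seed = 42) {
  set.seed(seed)
  foldid <- sample(rep(seq_len(nfolds), length.out = length(y)))
  r2 <- sapply(alphas, function(a) {
    cvfit <- cv.glmnet(X, y, alpha = a, foldid = foldid,
                       type.measure = "mse", family = "gaussian")
    # assumed definition: 1 - (best mean CV squared error) / (variance of y)
    1 - min(cvfit$cvm) / var(y)
  })
  data.frame(alpha = alphas, r2 = r2)
}

# Demo on simulated data (stand-in for the scaled uscrime predictors)
set.seed(11)
X <- matrix(rnorm(60 * 5), nrow = 60, dimnames = list(NULL, paste0("x", 1:5)))
y <- 3 * X[, 1] - 2 * X[, 2] + rnorm(60)
res <- alpha_scan(X, y)
print(res[which.max(res$r2), ])
```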

#use the predictors from the process
mod_Elastic_net = lm(Crime ~ So+M+Ed+Po1+M.F+Pop+NW+U1+U2+Wealth+Ineq+Prob, data = uscrime_scaled)
summary(mod_Elastic_net)

Comparison –

Based on the limited data we had, stepwise regression gave us the fewest predictors with a good R² and adjusted R². Even after stepwise regression we still needed to discard some variables based on their p-values. It nevertheless did a better job than the other two methods: elastic net chose 9 variables and lasso chose 10 to start with.

R code -

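rm(list = ls())   # clear the workspace
set.seed(15)      # make random fold assignments reproducible

library(MASS)     # provides stepAIC() for stepwise selection
library(glmnet)   # lasso and elastic net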

## Loading required package: Matrix


Written for

Institution: Georgia Institute Of Technology
Course: ISYE 6501


Seller: smartzone
