
ISYE 6501 WEEK 1 HOMEWORK – SAMPLE SOLUTIONS

Pages: 9
Grade: A+
Uploaded on: 15-03-2022
Written in: 2022/2023

The file credit_card_data.txt contains a dataset with 654 data points, 6 continuous and 4 binary predictor variables. It has anonymized credit card applications with a binary response variable (last column) indicating if the application was positive or negative. The dataset is the “Credit Approval Data Set” from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Credit+Approval) without the categorical variables and without data points that have missing values.

1. Using the support vector machine function ksvm contained in the R package kernlab, find a good classifier for this data. Show the equation of your classifier, and how well it classifies the data points in the full data set. (Don’t worry about test/validation data yet; we’ll cover that topic soon.)

Notes on ksvm

• You can use scaled=TRUE to get ksvm to scale the data as part of calculating a classifier.

• The term λ we used in the SVM lesson to trade off the two components of correctness and margin is called C in ksvm. One of the challenges of this homework is to find a value of C that works well; for many values of C, almost all predictions will be “yes” or almost all predictions will be “no”.

• ksvm does not directly return the coefficients a0 and a1...am. Instead, you need to do the last step of the calculation yourself. Here’s an example of the steps to take (assuming your data is stored in a matrix called data):

# load the kernlab package, which provides ksvm
library(kernlab)
# call ksvm. Vanilladot is a simple linear kernel.
model <- ksvm(as.matrix(data[,1:10]), as.factor(data[,11]), type="C-svc", kernel="vanilladot", C=100, scaled=TRUE)
# calculate a1...am
# a <- colSums(data[model@SVindex,1:10] * model@coef[[1]]) # for unscaled data
a <- colSums(model@xmatrix[[1]] * model@coef[[1]]) # for scaled data
a
# calculate a0
a0 <- -model@b
a0
# see what the model predicts
pred <- predict(model,data[,1:10])
pred
# see what fraction of the model’s predictions match the actual classification
sum(pred == data[,11]) / nrow(data)

SOLUTION: There are multiple possible answers. See file HW1-Q2-1.R for the R code for one answer.
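The note about C can be explored with a small sweep. The sketch below is only an illustration: it uses a synthetic 10-predictor dataset generated in the code as a stand-in for the credit card file, and the grid of C values is a hypothetical choice, not the one used in HW1-Q2-1.R.

```r
library(kernlab)

set.seed(1)
# Synthetic stand-in (assumption): 10 predictors, binary response in column 11.
n <- 200
X <- matrix(rnorm(n * 10), nrow = n)
y <- as.integer(X[, 1] - X[, 2] + 0.5 * rnorm(n) > 0)
data <- cbind(X, y)

# Hypothetical grid of C values to compare.
c_values <- c(0.001, 0.01, 1, 100)

# Fit one linear SVM per C and record the fraction of training points
# that the model classifies correctly.
accuracy <- sapply(c_values, function(cost) {
  model <- ksvm(as.matrix(data[, 1:10]), as.factor(data[, 11]),
                type = "C-svc", kernel = "vanilladot",
                C = cost, scaled = TRUE)
  pred <- predict(model, data[, 1:10])
  sum(pred == data[, 11]) / nrow(data)
})

results <- data.frame(C = c_values, accuracy = accuracy)
results
```

On the real data, the same loop would show which values of C collapse to nearly all-“yes” or all-“no” predictions and which give a useful classifier.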
Please note that a good solution doesn’t have to try both of the possibilities in the code; they’re both shown to help you learn, but they’re not necessary.

One possible linear classifier you can use, for scaled data z, is
-0.8z1 - 0.8z2 - 0.7z3 + 0.3z4 + 1.1z5 - 0.2z6 + 0.5z7 - 0.1z8 - 0.8z9 + 0.5z10 + 0. = 0.
It predicts 565 of the 654 points (about 86.4%) correctly. (Note that this is its performance on the training data; as you saw in Module 3, that’s not a reliable estimate of its true predictive ability.)

This quality of linear classifier can be found for a wide range of values of C (from 0.01 to 1000, and beyond). Using unscaled data, it’s a lot harder to find a C that does this well. It’s also possible to find a better nonlinear classifier using a different kernel; kudos to those of you who went even deeper and tried this!

2. Using the k-nearest-neighbors classification function kknn contained in the R kknn package, suggest a good value of k, and show how well it classifies the data points in the full data set. Don’t forget to scale the data (scale=TRUE in kknn).
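One way to approach part 2 is sketched below. It is not the course's solution: it assumes a leave-one-out loop (so that a point is never counted as its own nearest neighbor), a 0.5 threshold on the fitted value, a synthetic dataset generated in the code, and a hypothetical set of k values.

```r
library(kknn)

set.seed(1)
# Synthetic stand-in (assumption): predictors V1..V10, binary response V11.
n <- 120
data <- as.data.frame(matrix(rnorm(n * 10), nrow = n))
data$V11 <- as.integer(data$V1 + data$V2 > 0)

# Leave-one-out accuracy for a given k: predict each point from all others.
loo_accuracy <- function(k) {
  pred <- integer(nrow(data))
  for (i in seq_len(nrow(data))) {
    model <- kknn(V11 ~ ., data[-i, ], data[i, ], k = k, scale = TRUE)
    # With a numeric response, fitted() is a weighted average of the
    # neighbors' 0/1 values, so round it at 0.5 to get a class.
    pred[i] <- as.integer(fitted(model) >= 0.5)
  }
  mean(pred == data$V11)
}

k_values <- c(1, 5, 15)   # hypothetical candidates
acc <- sapply(k_values, loo_accuracy)
data.frame(k = k_values, accuracy = acc)
```

Comparing the accuracy column across k values is then a simple basis for suggesting a good k.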


Preview of the content

ISYE 6501 WEEK 1 HOMEWORK – SAMPLE SOLUTIONS




IMPORTANT NOTE
These homework solutions show multiple approaches and some optional extensions for most of
the questions in the assignment. You don’t need to submit all this in your assignments; they’re
included here just to help you learn more – because remember, the main goal of the homework
assignments, and of the entire course, is to help you learn as much as you can, and develop
your analytics skills as much as possible!




Question 1

Describe a situation or problem from your job, everyday life, current events, etc., for which
a classification model would be appropriate. List some (up to 5) predictors that you might
use.

One possible answer:

Being students at Georgia Tech, the Teaching Assistants for the course suggested the following
example. A college admissions officer has a large pool of applicants and must decide who will make
up the next incoming class. The applicants must be put into different categories – admit,
waitlist, and deny – so a classification model is appropriate. Some common factors used in
college admissions classification are high school GPA, rank in high school class, SAT and/or ACT
score, number of advanced placement courses taken, quality of written essay(s), quality of
letters of recommendation, and quantity and depth of extracurricular activities.

If the goal of the model was to automate a process to make decisions that are similar to those
made in the past, then previous admit/waitlist/deny decisions could be used as the response.
Alternatively, if the goal of the model was to make better admissions decisions, then a different
measure could be used as the response – for example, if the goal is to maximize the academic
success of students, then each admitted student’s college GPA could be the response; if the
goal is to maximize the post-graduation success of admitted students, then some measure of
career success could be the response; etc.

Question 2

The file credit_card_data.txt contains a dataset with 654 data points, 6 continuous and 4 binary
predictor variables. It has anonymized credit card applications with a binary response variable
(last column) indicating if the application was positive or negative. The dataset is the “Credit
Approval Data Set” from the UCI Machine Learning Repository
(https://archive.ics.uci.edu/ml/datasets/Credit+Approval) without the categorical variables and
without data points that have missing values.

1. Using the support vector machine function ksvm contained in the R package kernlab, find a
good classifier for this data. Show the equation of your classifier, and how well it classifies
the data points in the full data set. (Don’t worry about test/validation data yet; we’ll cover
that topic soon.)

Notes on ksvm

• You can use scaled=TRUE to get ksvm to scale the data as part of calculating a classifier.

• The term λ we used in the SVM lesson to trade off the two components of correctness and
margin is called C in ksvm. One of the challenges of this homework is to find a value of C
that works well; for many values of C, almost all predictions will be “yes” or almost all
predictions will be “no”.

• ksvm does not directly return the coefficients a0 and a1...am. Instead, you need to do the last
step of the calculation yourself. Here’s an example of the steps to take (assuming your data
is stored in a matrix called data):

# load the kernlab package, which provides ksvm
library(kernlab)
# call ksvm. Vanilladot is a simple linear kernel.
model <- ksvm(as.matrix(data[,1:10]), as.factor(data[,11]),
              type="C-svc", kernel="vanilladot", C=100, scaled=TRUE)
# calculate a1...am
# a <- colSums(data[model@SVindex,1:10] * model@coef[[1]]) # for unscaled data
a <- colSums(model@xmatrix[[1]] * model@coef[[1]]) # for scaled data
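To see why the colSums step yields the coefficients a1...am, here is a small base-R sketch. The matrix sv, the vector alpha, and the offset b are made-up stand-ins for model@xmatrix[[1]], model@coef[[1]], and model@b; treating a non-negative score as class 1 is also an assumption made only for illustration.

```r
# Hypothetical stand-ins for model@xmatrix[[1]], model@coef[[1]] and model@b.
sv <- matrix(c( 1.0, -0.5,
               -1.2,  0.3,
                0.4,  0.9), nrow = 3, byrow = TRUE)  # 3 support vectors, 2 predictors
alpha <- c(0.7, -1.0, 0.3)  # one signed coefficient per support vector
b <- 0.25

# sv * alpha recycles alpha down each column, so row i is multiplied by
# alpha[i]; summing each column then gives one weight per predictor.
a <- colSums(sv * alpha)
a0 <- -b

# The same weights written as an explicit matrix product.
a_check <- as.vector(t(sv) %*% alpha)

# Score new (already scaled) points with a0 + a1*z1 + a2*z2 and
# classify by the sign of the score.
z <- matrix(c( 0.5, 0.5,
              -1.0, 0.2), nrow = 2, byrow = TRUE)
score <- as.vector(z %*% a) + a0
label <- ifelse(score >= 0, 1, 0)
```

The agreement between a and a_check shows that the elementwise product followed by colSums is just the weighted sum of the support vectors.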

