ISYE 6414 FINAL EXAM (REAL EXAM) QUESTIONS
AND ANSWERS 2022-2024/ GRADED A | VERSION 2
Instructions
This R Markdown file includes the questions, the empty code chunk sections for your code, and the text
blocks for your responses. Answer the questions below by completing this R Markdown file. You must answer
the questions using this file. You can change the format from pdf to Word or html and make other slight
adjustments to get the file to knit but otherwise keep the formatting the same. Once you’ve finished answering
the questions, submit your responses in a single knitted file (just like the homework peer assessments).
There are 3 sections. Partial credit may be given if your code is correct but your conclusion is incorrect or
vice versa.
Next Steps:
1. Save this .Rmd file in your R working directory - the same directory into which you will download the
heart.csv data file. Having both files in the same directory will help in reading the .csv file.
2. Read the question and create the R code necessary within the code chunk section immediately below
each question. Knitting this file will generate the output and insert it into the section below the code
chunk.
3. Type your code and/or answer(s) to the questions in the text block provided immediately after the
question prompt.
4. We recommend knitting the file often, not only at the end of the exam, to avoid working through knitting
problems right before the exam submission. We will apply a 10% grade reduction if you do not submit
the knitted file. We will also apply a 20% grade reduction if you don’t submit the file via Canvas.
5. Submit the knitted file on Canvas.
Example Question Format:
(8a) This will be the exam question. Each question is already copied from Canvas and inserted into an individual
text block below, so you do not need to copy/paste the questions from the online Canvas exam.
Response to question (8a):
# Example code chunk area. Enter your code below the comment and
# between the ```{r} and ```
This is the section where you type your written answers to the question. Depending on the question asked,
your typed response may be a number, a list of variables, a few sentences, or a combination of these elements.
Ready? Let’s begin. We wish you the best of luck!
Final Exam Part 2 - Data Set Background
For this exam, you will be building a logistic regression model to predict if an individual has heart disease,
and you will be building a standard linear regression model to predict resting blood pressure.
The heart.csv data set consists of the following 13 variables:
1. age: age in years
2. sex: (M, F)
3. cp: chest pain type
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
7. restecg: resting electrocardiographic results (3 levels)
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment
12. ca: number of major vessels (0-3) colored by fluoroscopy
13. target: have disease or not (1=yes, 0=no)
Read the data and answer the questions below.
Read Data
# Import the libraries
library(car)
## Loading required package: carData
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 3.0-2
# Ensure that the sampling type is correct
RNGkind(sample.kind="Rejection")
# Read the data
data = read.csv('heart.csv', header= TRUE)
# Create a dummy variable for males
data$sexM = ifelse(data$sex=='M', 1, 0)
data$sex = NULL
# Split into training and testing sets
set.seed(6414)
smp_siz = floor(0.75*nrow(data))
train_ind = sample(seq_len(nrow(data)),size = smp_siz)
train = data[train_ind,]
test = data[-train_ind,]
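As a quick sanity check on the split logic above, the following minimal sketch repeats the same steps on a small made-up data frame. The `toy` data frame and its `id` column are hypothetical stand-ins, not the exam data; the point is only that a 75% sample of row indices plus negative indexing partitions the rows into two non-overlapping pieces.

```r
# Hypothetical stand-in for heart.csv: 20 rows (NOT the exam data)
toy = data.frame(id = 1:20, target = rep(c(0, 1), 10))

# Same sampler setting and seed as the exam chunk
RNGkind(sample.kind = "Rejection")
set.seed(6414)

n_train = floor(0.75 * nrow(toy))                  # 15 of the 20 rows
idx = sample(seq_len(nrow(toy)), size = n_train)   # indices drawn without replacement
toy_train = toy[idx, ]
toy_test = toy[-idx, ]                             # negative indexing keeps the remaining rows

# The two pieces should be disjoint and together cover every row
nrow(toy_train)
nrow(toy_test)
length(intersect(toy_train$id, toy_test$id))
```

Because `sample()` draws without replacement by default, no row can land in both sets.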
1: 19pts - Classification
For this section you will be building logistic regression models that classify whether an individual has heart
disease. For GLMs, we need replications in order to perform residual analysis. The following code aggregates
the training data using a subset of the predictors to ensure that we have replications.
# Aggregate the training data
train.agg.n = aggregate(target~sexM+exang+slope+ca,data=train,FUN=length)
train.agg.y = aggregate(target~sexM+exang+slope+ca,data=train,FUN=sum)
train.agg = cbind(train.agg.y,total=train.agg.n$target)
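To illustrate what the aggregation step produces, here is a minimal sketch on a small made-up data set. The `toy` data frame is hypothetical (not heart.csv) and uses only two of the predictors. Each row of the aggregated result is one predictor combination, with the number of positive cases in `target` and the group size in `total`. The final lines show one common way such grouped data can be passed to a binomial GLM, using a two-column response of successes and failures; this is an assumption about usage, not necessarily the model form the exam questions ask for.

```r
# Hypothetical toy training data (NOT heart.csv): two binary predictors, binary response
toy = data.frame(
  sexM   = c(1, 1, 1, 0, 0, 0, 1, 0),
  exang  = c(0, 0, 1, 1, 1, 0, 0, 0),
  target = c(1, 0, 1, 0, 1, 1, 1, 0)
)

# Same pattern as the exam chunk: group sizes (length) and event counts (sum)
toy.n = aggregate(target ~ sexM + exang, data = toy, FUN = length)
toy.y = aggregate(target ~ sexM + exang, data = toy, FUN = sum)
toy.agg = cbind(toy.y, total = toy.n$target)
toy.agg

# Assumed usage: grouped binomial fit with cbind(successes, failures) as the response
fit = glm(cbind(target, total - target) ~ sexM + exang,
          family = binomial, data = toy.agg)

# One deviance residual per predictor combination
residuals(fit, type = "deviance")
```

Both `aggregate()` calls use the same formula and data, so their rows come back in the same order and `cbind()` lines the counts up correctly.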