What is a common percentage split of data for training vs. validation? What about
training vs. validation vs. testing? - ANSWER Around 70/30 is common for
training/validation. Around 70/15/15 is common for training/validation/testing.
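A minimal sketch of a 70/15/15 split, assuming a simple random shuffle of row indices is acceptable for the data at hand. The function name `split_indices` and its parameters are hypothetical, chosen only for illustration.

```python
import random

def split_indices(n, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle the indices 0..n-1 and cut them into train/validation/test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)      # seeded so the split is reproducible
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]          # whatever remains goes to the test set
    return train, val, test

train, val, test = split_indices(200)
print(len(train), len(val), len(test))
```

Each index lands in exactly one of the three sets, so the test set is never seen while fitting or choosing a model.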
If someone describes a time series model as 'double exponential smoothing', which
components are involved? Which one is NOT involved? - ANSWER Level and
trend are involved. Seasonality is not.
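A rough sketch of double exponential smoothing (Holt's linear method), showing that only a level and a trend are tracked and no seasonal term appears. The function names, the initialization of the trend as the first difference, and the sample values are assumptions made for illustration.

```python
def double_exponential_smoothing(x, alpha, beta):
    """Return the final (level, trend) after smoothing the series x."""
    if len(x) < 2:
        raise ValueError("need at least two observations")
    level = x[0]
    trend = x[1] - x[0]                   # assumed initial trend: first difference
    for value in x[1:]:
        prev_level = level
        # Level: blend the new observation with the previous one-step forecast.
        level = alpha * value + (1 - alpha) * (prev_level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level, trend

def forecast(level, trend, steps):
    """Project the level forward along the trend; there is no seasonal factor."""
    return [level + k * trend for k in range(1, steps + 1)]

level, trend = double_exponential_smoothing([2, 4, 6, 8, 10], alpha=0.5, beta=0.5)
print(forecast(level, trend, 3))
```

On a perfectly linear series like this one, the forecasts simply continue the line.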
How are ARIMA and GARCH similar? How are they different? - ANSWER Both can be
used to model time series data. ARIMA models the values of the series itself,
whereas GARCH models its variance (volatility).
Assuming ARIMA(p, d, q), what values of p, d, and q essentially represent a
basic exponential smoothing model? - ANSWER ARIMA(0, 1, 1)
Reference = lecture slides
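A short derivation of why ARIMA(0, 1, 1) matches basic exponential smoothing. The symbols here are assumptions, not taken from the card: alpha is the smoothing constant, \hat{x}_t is the smoothed forecast, and e_t = x_t - \hat{x}_t is the one-step forecast error.

```latex
\begin{align}
\hat{x}_{t+1} &= \alpha x_t + (1-\alpha)\,\hat{x}_t = \hat{x}_t + \alpha e_t \\
x_{t+1} - x_t &= (\hat{x}_{t+1} + e_{t+1}) - (\hat{x}_t + e_t) \\
              &= e_{t+1} - (1-\alpha)\, e_t
\end{align}
```

The first difference of the series (d = 1) equals the current error plus one lagged error term (q = 1), with no autoregressive terms (p = 0). Under the sign convention x_t - x_{t-1} = e_t + \theta e_{t-1}, the moving-average coefficient is \theta = -(1 - \alpha).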
When does ARIMA perform better than exponential smoothing for short-term
forecasting? How many samples are generally required for an ARIMA model? - ANSWER
When the data is more stable, with fewer peaks, valleys, and outliers. You want 40
past data points for ARIMA to work well.
Which of the following sets of equations is used to detect a decrease vs. an increase
in the context of CUSUM analysis?
S_t = max{0, S_{t-1} + (x_t - mu - C)}
Is S_t >= T?
vs.
S_t = max{0, S_{t-1} + (mu - x_t - C)}
Is S_t >= T? - ANSWER Detecting an increase:
S_t = max{0, S_{t-1} + (x_t - mu - C)}
Is S_t >= T?
Detecting a decrease:
S_t = max{0, S_{t-1} + (mu - x_t - C)}
Is S_t >= T?
See ISYE-6501 Module 6 (L1-L2) Playlist v0822, timestamp 5:49
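The two CUSUM update rules above differ only in the sign of the deviation from the mean, which a minimal sketch can show. The function name `cusum`, the `direction` argument, and the sample data are hypothetical. `mu` is the baseline mean, `C` the slack term, and `T` the threshold.

```python
def cusum(xs, mu, C, T, direction="increase"):
    """Return the index of the first observation where S_t >= T, or None."""
    s = 0.0
    for t, x in enumerate(xs):
        if direction == "increase":
            step = x - mu - C             # positive when x sits well above the mean
        else:
            step = mu - x - C             # positive when x sits well below the mean
        s = max(0.0, s + step)            # the running sum never drops below zero
        if s >= T:
            return t
    return None

rising = [10, 10, 10, 13, 13, 13]
falling = [10, 10, 7, 7, 7]
up_index = cusum(rising, mu=10, C=1, T=5, direction="increase")
down_index = cusum(falling, mu=10, C=1, T=5, direction="decrease")
no_change = cusum(rising, mu=10, C=1, T=5, direction="decrease")
print(up_index, down_index, no_change)
```

A larger `C` or `T` makes the detector slower to fire but less prone to false alarms.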
In a regression model with two variables A and B, what does it mean if the coefficient
for variable A (years of education) is a lot larger (and positive) than the coefficient
for variable B (age in years)?
e.g.
income = a0 + a1 (years of education) + a2 (age in years) - ANSWER The larger
coefficient (assuming both coefficients are positive) represents a bigger impact on
the response variable: one additional year of education is associated with a larger
change in income than one additional year of age.
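A small sketch of how to read the coefficients in the income equation above. The coefficient values are made up purely for illustration and do not come from any fitted model.

```python
# Hypothetical coefficients, for illustration only.
A0 = 20000   # intercept
A1 = 3000    # coefficient for years of education
A2 = 500     # coefficient for age in years

def predict_income(years_of_education, age_in_years):
    """income = a0 + a1 * (years of education) + a2 * (age in years)"""
    return A0 + A1 * years_of_education + A2 * age_in_years

base = predict_income(12, 30)
extra_education = predict_income(13, 30) - base   # effect of one more year of education
extra_age = predict_income(12, 31) - base         # effect of one more year of age
print(extra_education, extra_age)
```

Holding the other variable fixed, a one-unit change in a variable shifts the prediction by exactly its coefficient. Both variables here are measured in years, so the two coefficients can be compared directly.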
Two lm models: A has an R^2 of 0.99 and B has an R^2 of 0.80 on the training data set.
Which one will likely perform better on the validation or test data set? -
ANSWER It is not possible to tell. A high R^2 on the training data may reflect
overfitting rather than real predictive power.
What model would you use for any nonseasonal series of numbers that exhibits
patterns and is not a series of random events?
a. ARIMA b. CART c. Cross-validation d. CUSUM e. Exponential smoothing f. GARCH
g. k-means clustering h. k-nearest-neighbor classification i. Linear regression
j. Logistic regression k. Support vector machine - ANSWER a
If you wanted to forecast volatility of stock data, which model would you choose? Keep
in mind, this data may have high heteroskedasticity.
a. ARIMA b. CART c. Cross-validation d. CUSUM e. Exponential smoothing f. GARCH
g. k-means clustering h. k-nearest-neighbor classification i. Linear regression
j. Logistic regression k. Support vector machine - ANSWER f
Which model is a decision tree where each fork is a split in a predictor variable and
each end node contains a prediction for the outcome variable?
a. ARIMA b. CART c. Cross-validation d. CUSUM e. Exponential smoothing f. GARCH
g. k-means clustering h. k-nearest-neighbor classification i. Linear regression
j. Logistic regression k. Support vector machine - ANSWER b
What is the difference between logistic and linear regression models? - ANSWER
Linear regression predicts a continuous dependent variable from a given set of
independent features, whereas logistic regression predicts a categorical one.
Linear regression is used to solve regression problems, whereas logistic
regression is used to solve classification problems.
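A minimal sketch of the contrast, assuming a single feature `x` and hypothetical coefficients `a0` and `a1`. The linear model returns a number on a continuous scale. The logistic model passes the same linear combination through a sigmoid to get a probability, which is then thresholded into a class.

```python
import math

def linear_predict(x, a0, a1):
    """Continuous output: any real number."""
    return a0 + a1 * x

def logistic_predict_proba(x, a0, a1):
    """Probability of the positive class, always between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(a0 + a1 * x)))

def logistic_classify(x, a0, a1, threshold=0.5):
    """Categorical output: class 1 if the probability reaches the threshold, else 0."""
    return 1 if logistic_predict_proba(x, a0, a1) >= threshold else 0

print(linear_predict(2, a0=1, a1=3))
print(logistic_predict_proba(0, a0=0, a1=1))
print(logistic_classify(3, a0=-2, a1=1), logistic_classify(1, a0=-2, a1=1))
```

The 0.5 threshold is only a default assumption; it can be moved when one kind of misclassification is more costly than the other.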
What model is often used when differentiating between two things? Often the ANSWER
is