ISYE 6501 MIDTERM 1 EXAM
REPORTED QUESTIONS AND
ANSWERS
ARIMA Moving average - Answer-Uses previous errors e_t as predictors. An order-q moving
average goes back q time periods for errors.
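As a worked equation, an order-q moving-average model can be sketched as below. The symbols mu (a constant mean) and theta_i (the weights on past errors) are notation assumed here for illustration; the notes themselves only name the errors e_t.

```latex
x_t = \mu + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \dots + \theta_q e_{t-q}
```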
ARIMA(p,d,q) types - Answer-White noise: ARIMA(0,0,0)
Random walk: ARIMA(0,1,0)
AR (Autoregressive model): ARIMA(p,0,0)
MA (moving average) model: ARIMA(0,0,q)
Basic exponential smoothing model: ARIMA(0,1,1)
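A minimal sketch of two of the cases listed above, assuming Gaussian errors with mean 0 and a fixed random seed (both are illustrative choices, and the function names are hypothetical). It shows that differencing a random walk once (d = 1) gives back white noise, which is why a random walk is ARIMA(0,1,0).

```python
import random

def white_noise(n, seed=0):
    """ARIMA(0,0,0): independent errors with mean 0 (Gaussian here by assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def random_walk(errors):
    """ARIMA(0,1,0): each value is the previous value plus a new error."""
    walk, level = [], 0.0
    for e in errors:
        level += e
        walk.append(level)
    return walk

def difference(series):
    """First difference (the d = 1 step): x_t - x_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]

errors = white_noise(200)
walk = random_walk(errors)
recovered = difference(walk)  # matches errors[1:] up to floating-point rounding
```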
When should you use an ARIMA model? - Answer-Generally for short-term
forecasting -- you need at least 40 past data points for ARIMA to work well.
What does GARCH stand for? - Answer-Generalized Autoregressive Conditional
Heteroskedasticity
GARCH - Answer-Estimates or forecasts the variance of something for which we have
time series data (i.e., estimates the amount of error)
2 differences between GARCH and ARIMA - Answer-1) GARCH uses variances/squared errors,
not observations or linear errors
2) GARCH uses raw variances, not differences of variances
In a simple linear regression, how do you measure the quality of the line? - Answer-By
the sum of squared errors. The best-fit regression line uses the values of a0 and a1
that minimize the sum of squared errors.
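A minimal sketch of the idea, using the standard closed-form least-squares solution for a line y = a0 + a1*x. The data points and function names are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (a0, a1) minimizing the sum of squared errors for y = a0 + a1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return a0, a1

def sum_squared_errors(xs, ys, a0, a1):
    """Quality measure of the line: smaller is better."""
    return sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, roughly y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a0, a1 = fit_line(xs, ys)
best = sum_squared_errors(xs, ys, a0, a1)
```

Nudging a0 or a1 away from the fitted values can only increase the sum of squared errors.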
Maximum likelihood - Answer-The parameters that give the highest probability (likelihood) of the observed data
How to find largest value of maximum likelihood expression - Answer-Find the largest
value of the exponent, as the exponential function gets larger as exponents get larger
Maximum Likelihood Fit - Answer-If errors are normally distributed and independent and
identically distributed (i.i.d.), the maximum likelihood fit is the set of parameters
that minimizes the sum of squared errors
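The link between maximum likelihood and least squares can be sketched as a short derivation. The symbols y_i (observed value), \hat{y}_i (model prediction), and \sigma (error standard deviation) are notation assumed here, not taken from the notes.

```latex
L = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}
    \exp\!\left(-\frac{(y_i - \hat{y}_i)^2}{2\sigma^2}\right)
\quad\Longrightarrow\quad
\ln L = -n \ln\!\left(\sigma\sqrt{2\pi}\right)
        - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```

For a fixed \sigma, the first term does not depend on the model parameters, so maximizing \ln L is the same as minimizing \sum_i (y_i - \hat{y}_i)^2.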
AIC (Akaike Information Criterion) - Answer-AIC = 2k - 2ln(L*)
L*: maximum likelihood value
k: number of parameters being estimated
2k is a penalty term that balances likelihood with simplicity and helps avoid overfitting
Prefer models with smaller AIC
Has nice properties if there are infinitely many data points
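A minimal sketch of the AIC calculation. The function takes ln(L*) directly, since fitting routines usually report the log-likelihood; the two models and their numbers are hypothetical.

```python
def aic(k, log_likelihood):
    """AIC = 2k - 2 ln(L*), where log_likelihood is ln(L*)."""
    return 2 * k - 2 * log_likelihood

# Hypothetical models: B fits slightly better but uses twice as many parameters.
aic_a = aic(k=3, log_likelihood=-100.0)
aic_b = aic(k=6, log_likelihood=-99.0)
preferred = "A" if aic_a < aic_b else "B"
```

Here the small gain in likelihood for model B does not offset its larger penalty, so the smaller-AIC model A is preferred.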
BIC (Bayesian Information Criterion) - Answer-Similar to AIC, except that it accounts for
both the number of parameters and the number of data points
Penalty term is larger than AIC's, so it encourages models with fewer parameters than AIC
Only use if you have more data points than parameters
BIC = k ln(n) - 2ln(L*)
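A minimal sketch of the BIC calculation, again taking ln(L*) as input. It also checks when the BIC penalty k ln(n) exceeds the AIC penalty 2k: that happens once ln(n) > 2, i.e. for n of 8 or more. The input values are hypothetical.

```python
import math

def bic(k, n, log_likelihood):
    """BIC = k ln(n) - 2 ln(L*), where log_likelihood is ln(L*)."""
    return k * math.log(n) - 2 * log_likelihood

def bic_penalty_exceeds_aic(n):
    """True when k ln(n) > 2k for any k > 0, i.e. when ln(n) > 2."""
    return math.log(n) > 2

value = bic(k=3, n=100, log_likelihood=-100.0)
```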
BIC rule of thumb - Answer-|BIC1 - BIC2| > 10: smaller BIC model is very likely better
6 < |BIC1 - BIC2| < 10: smaller BIC model is likely better
2 < |BIC1 - BIC2| < 6: smaller BIC model is somewhat likely better
0 < |BIC1 - BIC2| < 2: smaller BIC model is slightly likely better
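The rule of thumb can be sketched as a small lookup. The notes do not say what happens exactly at the boundaries (10, 6, 2), so this sketch assumes a boundary value falls into the weaker band; the function name and inputs are hypothetical.

```python
def compare_bic(bic1, bic2):
    """Return (preferred, strength); preferred is 1 or 2, or None on a tie."""
    diff = abs(bic1 - bic2)
    if diff == 0:
        return None, "no preference"
    preferred = 1 if bic1 < bic2 else 2  # smaller BIC is better
    if diff > 10:
        strength = "very likely"
    elif diff > 6:      # assumption: exactly 10 counts as "likely"
        strength = "likely"
    elif diff > 2:      # assumption: exactly 6 counts as "somewhat likely"
        strength = "somewhat likely"
    else:               # assumption: exactly 2 counts as "slightly likely"
        strength = "slightly likely"
    return preferred, strength

result = compare_bic(213.8, 221.0)
```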
When is there causation? - Answer-cause is before effect
idea of causation makes sense
no outside factors causing the relationship
Two warnings with p-values - Answer-With large amounts of data, p-values get small
even when attributes are not at all related to the response.
P-values are only probabilities, even when meaningful
Confidence Interval - Answer-where the coefficient probably lies and how close that is to
0
T-statistic - Answer-coefficient divided by its standard error
related to p-value
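A minimal sketch for the slope of a simple linear regression, assuming the usual standard-error formula sqrt((SSE / (n - 2)) / Sxx). The data and the function name are made up for illustration.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (a1, standard_error, t) for the slope of y = a0 + a1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a0 = y_bar - a1 * x_bar
    sse = sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    standard_error = math.sqrt(sse / (n - 2) / sxx)
    return a1, standard_error, a1 / standard_error

# Hypothetical data, roughly y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a1, se, t = slope_t_statistic(xs, ys)
```

A large absolute t value corresponds to a small p-value for that coefficient.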
Coefficient - Answer-If the coefficient, when multiplied by the attribute value, does not
make much difference to the response, the attribute may not matter much even if its
p-value is very low
R-squared versus adjusted R-squared - Answer-R-squared: estimate of how much
variability your model accounts for
Adjusted R-squared: adjusts for the number of attributes used.
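A minimal sketch of both measures, using the standard definitions R^2 = 1 - SSE/SST and adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), where p is the number of attributes. The observed and predicted values are hypothetical.

```python
def r_squared(ys, predictions):
    """Fraction of the variability in ys accounted for by the predictions."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst

def adjusted_r_squared(r2, n, p):
    """Penalize R-squared for using p attributes on n data points."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

ys = [1.0, 2.0, 3.0, 4.0, 5.0]
predictions = [1.1, 1.9, 3.2, 3.8, 5.0]  # from a hypothetical one-attribute model
r2 = r_squared(ys, predictions)
adj = adjusted_r_squared(r2, n=len(ys), p=1)
```

Adjusted R-squared is never larger than R-squared, and the gap grows as more attributes are added.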
Box-Cox Transformation - Answer-A power transformation (logarithmic in its limiting
case) that stretches out the smaller range to enlarge its variability and shrinks the
larger range to reduce its variability
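A minimal sketch of the transformation, assuming the standard one-parameter form (y^lambda - 1) / lambda, with ln(y) when lambda is 0, on strictly positive data. The function name and sample values are hypothetical; in practice lambda is chosen by software to make the result as close to normal as possible.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of strictly positive values with parameter lam."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

# Hypothetical data whose spread grows with its level
values = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
transformed = box_cox(values, lam=0)
gaps = [b - a for a, b in zip(transformed, transformed[1:])]
```

With lambda = 0 the widening gaps in the raw data (1, 2, 4, 8, 16) all become the same size, ln(2), so the small range is stretched and the large range is shrunk.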
Detrending data - Answer-detrend factor by factor
Principal Component Analysis - Answer-PCA -- used when you have a lot of factors and
wish to know which are most important for predicting a response
PCA transforms the data to (1) remove correlations within the data and (2) rank the new
coordinates by importance
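A minimal sketch of both properties for two factors only, using the closed-form eigen-decomposition of the 2x2 covariance matrix so that no external library is needed. The data and the function name are hypothetical; real use would rely on a linear algebra library and usually scales the data first. Note that PCA as sketched here looks only at the factors, not at the response.

```python
import math

def pca_2d(points):
    """Return (eigenvalues, scores) for 2-D data; eigenvalues are largest first."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    a = sum(x * x for x, _ in centered) / (n - 1)  # variance of factor 1
    c = sum(y * y for _, y in centered) / (n - 1)  # variance of factor 2
    b = sum(x * y for x, y in centered) / (n - 1)  # covariance
    mid = (a + c) / 2
    half_gap = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigenvalues = [mid + half_gap, mid - half_gap]  # ranked by importance
    # Direction of the first principal component (largest eigenvalue).
    theta = 0.5 * math.atan2(2 * b, a - c)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    scores = [(x * v1[0] + y * v1[1], x * v2[0] + y * v2[1]) for x, y in centered]
    return eigenvalues, scores

def covariance(us, vs):
    """Sample covariance of two equal-length lists."""
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    return sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs)) / (n - 1)

# Two strongly correlated hypothetical factors
points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
eigenvalues, scores = pca_2d(points)
pc1 = [s[0] for s in scores]
pc2 = [s[1] for s in scores]
```

The transformed coordinates pc1 and pc2 are uncorrelated, and the variance of each equals its eigenvalue, which is what ranks the coordinates by importance.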