ISYE 6501 Study Guide With Complete Questions And Answers All Correct.
Classification problems are commonly solved using what model(s)? - correct answer Support Vector Machine

Clustering problems are commonly solved using what model(s)? - correct answer k-means

Response Prediction questions are commonly solved using what model(s)? - correct answer ARIMA, CART, exponential smoothing, linear regression, logistic regression, random forest

Validation questions are commonly solved using what model(s)? - correct answer Cross validation

Variance Estimation questions are commonly solved using what model(s)? - correct answer GARCH

Examples of models that are designed for use with time series data - correct answer ARIMA, CUSUM, exponential smoothing, GARCH

In the soft classification SVM model, we select coefficients a_0 ... a_m to minimize sum_j max(0, 1 - (sum_i a_i * x_ij + a_0) * y_j) + C * sum_i a_i^2. If we want to have a larger margin even though it means possibly having more classification error, the value of C should get: - correct answer Larger. Explanation: In this formulation C multiplies the margin term sum_i a_i^2, so a larger C puts more weight on a wide margin relative to classification error.

Best way to split data - correct answer 70% for training, 15% for validation, 15% for test

Purpose of a test set - correct answer Estimate quality of selected model

Purpose of a training set - correct answer Fit parameters of all models

Purpose of a validation set - correct answer Compare all models and select best

True or False: The most useful classification models are the ones that correctly classify the highest fraction of data points. - correct answer False. Explanation: Sometimes the cost of a false positive is so high that it's worth accepting more false negatives, or vice versa. Lesson 10.6

A model is built to determine whether data points belong to a category or not. A "true negative" result is: - correct answer A data point that is not in the category, and the model correctly says so. Explanation: 'True' and 'false' refer to whether the model is correct or not, and 'positive' and 'negative' refer to whether the model says the point is in the category.
Lesson 10.5

A logistic regression model can be especially useful when the response... - correct answer is binary (zero or one) or is a probability (a number between zero and one). Lesson 10.4

True or False: When using a random forest model, it's easy to interpret how its results are determined. - correct answer False. Explanation: Unlike a model like regression where we can show the result as a simple linear combination of each attribute times its regression coefficient, in a random forest model there are so many different trees used simultaneously that it's difficult to interpret exactly how any factor or factors affect the result. Lesson 10.3

A common rule of thumb is to stop branching if a leaf would contain less than 5% of the data points. Why not keep branching and allow models to find very close fits to each very small subset of data? - correct answer Fitting to very small subsets of data will cause overfitting. Explanation: With too few data points, the models will fit to random patterns as well as real ones. Lesson 10.2

True or false: In a regression tree, every leaf of the tree has a different regression model that might use different attributes, have different coefficients, etc. - correct answer True. Explanation: Each leaf's individual model is tailored to the subset of data points that follow all of the branches leading to the leaf. Lesson 10.1

True or false: Tree-based approaches can be used for other models besides regression. - correct answer True. Explanation: For example, a classification tree might have a different SVM or KNN model at each leaf. It might even use SVM at some leaves and KNN at others (though that's probably rare). Lesson 10.1

What does "heteroscedasticity" mean? - correct answer The variance is different in different ranges of the data. Lesson 9.1

You might want to de-trend data before... - correct answer ...using time-series data in a regression model.
Explanation: Factor-based models like regression generally don't account for time-based effects like trend. Lesson 9.2

Which of the following does principal component analysis (PCA) do? - correct answer Transform data so there's no correlation between dimensions. Rank the new dimensions in likely order of importance.
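As an illustration of the PCA answer above, here is a minimal sketch for 2-D data using only the Python standard library. The function name `pca_2d`, the helper `covariance`, and the sample points are assumptions made for this example; a real analysis would normally use a linear algebra library. It shows the two properties named in the answer: the transformed dimensions are uncorrelated, and they come out ranked by variance.

```python
import math

def pca_2d(points):
    """PCA for 2-D points via the closed-form eigen-decomposition
    of the 2x2 sample covariance matrix (illustrative helper)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix, largest first.
    mid = (a + c) / 2
    half_gap = math.hypot((a - c) / 2, b)
    eigenvalues = [mid + half_gap, mid - half_gap]

    # Unit eigenvector for the largest eigenvalue; the second is perpendicular.
    if b != 0:
        vx, vy = b, eigenvalues[0] - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    first = (vx / norm, vy / norm)
    second = (-first[1], first[0])

    # Project the centered data onto the two principal components.
    scores = [(x * first[0] + y * first[1], x * second[0] + y * second[1])
              for x, y in centered]
    return eigenvalues, scores

def covariance(u, v):
    """Sample covariance of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((p - mean_u) * (q - mean_v) for p, q in zip(u, v)) / (n - 1)

# Made-up, strongly correlated sample data.
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
eigenvalues, scores = pca_2d(points)
pc1 = [s[0] for s in scores]
pc2 = [s[1] for s in scores]
print("variance per component (ranked):", eigenvalues)
print("covariance between components:", covariance(pc1, pc2))
```

The covariance between the two new dimensions comes out at zero up to floating-point error, and the first component carries the larger variance.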
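The soft classification SVM question above can be checked numerically. The sketch below evaluates the objective exactly as the question writes it (classification error plus C times the sum of squared coefficients) for two candidate classifiers. The function name `soft_margin_objective`, the toy data, and the two candidates are assumptions for illustration only; no optimizer is run.

```python
def soft_margin_objective(coeffs, intercept, X, y, C):
    """Return (objective, error_term, margin_term) for the formulation
    in the question: sum of hinge errors + C * sum of squared coefficients."""
    error_term = 0.0
    for row, label in zip(X, y):
        score = sum(a * x for a, x in zip(coeffs, row)) + intercept
        error_term += max(0.0, 1.0 - score * label)
    margin_term = sum(a * a for a in coeffs)
    return error_term + C * margin_term, error_term, margin_term

# Hypothetical 1-D data with labels -1 / +1.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [-1, -1, 1, 1]

# Two hypothetical candidates; the margin width is 2 / ||a||.
narrow = ([1.0], 0.0)   # margin width 2, no classification error
wide = ([0.5], 0.0)     # margin width 4, some classification error

for C in (0.1, 10.0):
    obj_narrow = soft_margin_objective(*narrow, X, y, C)[0]
    obj_wide = soft_margin_objective(*wide, X, y, C)[0]
    better = "wide" if obj_wide < obj_narrow else "narrow"
    print(f"C={C}: narrow={obj_narrow}, wide={obj_wide}, preferred={better}")
```

With the small C the narrow-margin candidate has the lower objective, and with the large C the wide-margin candidate does, which matches the answer "Larger". Note that many textbooks and libraries place the constant on the error term instead, in which case the direction reverses.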
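The Lesson 10.5 and 10.6 answers above (what a "true negative" is, and why the highest accuracy is not always the most useful) can be sketched as follows. The labels, the two sets of predictions, and the costs of 1 per false positive and 50 per false negative are all made up for this example.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN. 'T'/'F' says whether the model was correct;
    'P'/'N' says whether the model put the point in the category."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, guess in zip(actual, predicted):
        correct = "T" if truth == guess else "F"
        sign = "P" if guess else "N"
        counts[correct + sign] += 1
    return counts

def accuracy(counts):
    """Fraction of data points classified correctly."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())

def total_cost(counts, cost_fp, cost_fn):
    """Total cost when each kind of mistake has its own price."""
    return counts["FP"] * cost_fp + counts["FN"] * cost_fn

# Hypothetical data: 1 = in the category, 0 = not in the category.
actual      = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # misses one positive
predicted_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # raises two false alarms

counts_a = confusion_counts(actual, predicted_a)
counts_b = confusion_counts(actual, predicted_b)

# Assumed costs: a false negative is 50 times worse than a false positive.
for name, counts in (("A", counts_a), ("B", counts_b)):
    print(name, counts, "accuracy:", accuracy(counts),
          "cost:", total_cost(counts, cost_fp=1, cost_fn=50))
```

Under these assumed costs, model B classifies fewer points correctly than model A but is far cheaper overall, because it trades one expensive false negative for two cheap false positives.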
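The 70% / 15% / 15% split from the answers above can be sketched with the standard library alone. The function name `split_data`, its parameters, and the fixed seed are assumptions for this example; the fractions are the ones given in the guide.

```python
import random

def split_data(rows, train_frac=0.70, valid_frac=0.15, seed=0):
    """Shuffle the rows, then cut them into training, validation,
    and test sets. Whatever is left after the first two cuts is the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

train, valid, test = split_data(range(100))
print(len(train), len(valid), len(test))
```

Each set then plays the role described above: fit every model's parameters on `train`, compare the models and pick the best on `valid`, and estimate the quality of the selected model on `test`.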
Written for
- Institution: ISYE 6501
- Course: ISYE 6501
Document information
- Uploaded on: May 22, 2024
- Number of pages: 10
- Written in: 2023/2024
- Type: Exam (elaborations)
- Contains: Questions & answers
Subjects
- isye 6501