ISYE 6501 Final Exam Questions and Answers | Updated 2026 | Graded A+
What do descriptive questions ask? - answer-What happened? (e.g., which customers are most alike)
What do predictive questions ask? - answer-What will happen? (e.g., what will Google's stock price be?)
What do prescriptive questions ask? - answer-What action(s) would be best? (e.g., where to put traffic lights)
What is a model? - answer-A real-life situation expressed as math.
What do classifiers help you do? - answer-Differentiate
What is a soft classifier and when is it used? - answer-In some cases, there won't be a line that separates all of the labeled examples, so we use a classifier that minimizes the number of mistakes.
What does it mean when the classifier/decision boundary is almost parallel to the vertical axis? - answer-The horizontal attribute is all that is needed.
What does it mean when the classifier/decision boundary is almost parallel to the horizontal axis? - answer-The vertical attribute is all that is needed.
What is time-series data? - answer-The same data recorded over time, often at equal intervals.
What is quantitative data? - answer-Numbers with a meaning: higher means more, lower means less (e.g., age, sales, temperature, income)
What is categorical data? - answer-Numbers without meaning (e.g., zip codes), non-numeric data (e.g., hair color), binary data (e.g., male/female, yes/no, on/off)
Which of these is time-series data? A. The average cost of a house in the United States every year since 1820 B. The height of each professional basketball player in the NBA at the start of the season - answer-A
Which of these is structured data? A. The contents of a person's Twitter feed B. The amount of money in a person's bank account - answer-B
What is structured data? - answer-Data that can be stored in a structured way
What is unstructured data?
- answer-Data that is not easily described and stored (e.g., written text)
A survey of 25 people recorded each person's family size and type of car. Which of these is a data point? A. The 14th person's family size and car type B. The 14th person's family size C. The car type of each person - answer-A. A data point is all the information about one observation.
The farther the wrongly classified point is from the line ___ - answer-The bigger the mistake we've made
What happens as lambda gets larger? - answer-The term including the margin gets larger, so the importance of a large margin outweighs avoiding mistakes in classifying known data points.
What happens as lambda drops toward zero? - answer-The margin term also drops toward zero, so the importance of minimizing mistakes in classifying known data points outweighs having a large margin.
What can SVMs be used for? - answer-To find a classifier with maximum separation (margin) between the two sets of points.
When to use SVM? - answer-If it's impossible to avoid classification errors, SVM can find a classifier that trades off reducing errors and enlarging the margin.
What does this formula describe? - answer-Error for data point j
What does this formula describe? - answer-Total error
To maximize the distance between the two lines, what do we need to minimize? - answer-The sum of the squares of the coefficients
What value do we give for more costly errors? - answer-$m_j > 1$
What does this mean in the context of giving a loan? - answer-Giving a bad loan is twice as costly as withholding a good loan.
What value do we give for less costly errors? - answer-$m_j < 1$
Why is it important to scale our data when using SVM? - answer-We're looking to minimize the sum of the squares of the coefficients, but if our data has very different scales, a small change in one coefficient could swamp a huge change in another.
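The SVM cards above refer to formulas that did not survive in this copy. As a rough illustration only, the sketch below assumes the standard soft-margin form: the error for data point j is max(0, 1 - y_j (sum_i a_i x_ij + a_0)), the total error is the (optionally m_j-weighted) sum of those errors, and the margin term is lambda times the sum of squared coefficients. The function names and the toy data are hypothetical, not taken from the course material.

```python
def point_error(a, a0, x, y):
    """Assumed hinge error for one point: 0 if it is on the correct side with full margin."""
    score = sum(ai * xi for ai, xi in zip(a, x)) + a0
    return max(0.0, 1.0 - y * score)


def svm_objective(a, a0, X, Y, lam, m=None):
    """Return (total_error, margin_term) for coefficients a and intercept a0.

    m holds optional per-point cost multipliers m_j (m_j > 1 for more costly
    errors, m_j < 1 for less costly ones); it defaults to 1 for every point.
    """
    if m is None:
        m = [1.0] * len(Y)
    total_error = sum(mj * point_error(a, a0, x, y) for mj, x, y in zip(m, X, Y))
    margin_term = lam * sum(ai * ai for ai in a)
    return total_error, margin_term


# Hypothetical toy data: labels are +1 / -1, and the third point is misclassified.
X = [[2.0, 2.0], [-2.0, -2.0], [0.5, -1.0]]
Y = [1, -1, 1]
a, a0 = [1.0, 1.0], 0.0

equal_cost = svm_objective(a, a0, X, Y, lam=0.1)
# Treat a mistake on the third point as twice as costly (m_j = 2).
double_cost = svm_objective(a, a0, X, Y, lam=0.1, m=[1.0, 1.0, 2.0])
print(equal_cost, double_cost)
```

Raising `lam` inflates only the margin term, so the coefficients (and with them the margin) dominate the trade-off; lowering it toward zero leaves the total error as the main thing being minimized.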
What does it signify when a coefficient for a classifier is close to zero? - answer-It means the corresponding attribute is probably not relevant.
What do kernel methods allow for in SVMs? - answer-Nonlinear classifiers
What is the common range for scaled data? - answer-Between 0 and 1
What is the formula for min-max scaling? - answer-Find the min and max for a factor, then compute $x_{\text{scaled}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$.
What is common standardization and its formula? - answer-Scaling a factor to have a mean of 0 and a standard deviation of 1: $x_{\text{standardized}} = \frac{x - \mu}{\sigma}$, where $\mu$ is the factor's mean and $\sigma$ its standard deviation.
What is the formula for general scaling between b and a? - answer-$x_{\text{scaled}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}(b - a) + a$
When do you use scaling? - answer-Data in a bounded range (e.g., neural networks, RGB values, SAT scores, batting averages)
When do you use standardization? - answer-PCA or clustering
When is KNN used? - answer-Used for solving classification problems, including those in which there are more than two classes.
How do you deal with attributes that might be more important than others in KNN? - answer-You weight each dimension's distance differently. The larger the weight, the higher the impact.
A small value of k will lead to ... - answer-a large variance in predictions.
Setting a large value of k will ... - answer-lead to a large model bias.
What are real effects? - answer-Real relationships between attributes and responses. They are the same in all data sets.
What are random effects? - answer-They are random but look like real effects. They are different in all data sets.
Why can't we measure a model's effectiveness on data it was trained on? - answer-The model's performance on its training data is usually too optimistic. The model is fit to both real and random patterns in the data, so it becomes overly specialized to the specific randomness in the training set, which doesn't exist in other data.
If we use the same data to fit a model as we do to estimate how good it is, what is likely to happen? - answer-The model will appear to be better than it really is. The model will be fit to both real and random patterns in the data.
The model's effectiveness on this data set will include both types of patterns, but its true effectiveness on other data sets (with different random patterns) will only include the real patterns.
When comparing models, if we use the same data to pick the best model as we do to estimate how good the best one is, what is likely to happen? - answer-The model will appear to be better than it really is. The model with the highest measured performance is likely to be both good and lucky in its fit to random patterns.
What is a training set used for? - answer-Used to fit the models
What is a validation set used for? - answer-Used to choose the best model
Why would we use two sets? - answer-If the first set, the training set, had unique random effects that the classifier was fit to, we wouldn't be counting those benefits when we measure effectiveness on the validation set.
What effects does randomness have on training/validation performance? - answer-Sometimes the randomness will make the performance look worse than it really is, and sometimes it will make the performance look better than it really is.
How are high-performing models affected by randomness? - answer-They are often boosted by above-average random effects, making them look better than they really are.
What is a test data set used for? - answer-To estimate the performance of the chosen model
When do we need a validation set? - answer-When we are choosing between multiple models.
What are the data splits when working with one model? - answer-70-90% training, 10-30% test
What are the data splits when comparing models? - answer-50-70% training, split the rest between validation and test
What are two methods of splitting data? - answer-Random and rotation
What is the rotation method of splitting data? - answer-You take turns selecting points. 5 data point rotation sequence: (Training - Validation - Training - Test - Training)
What is the advantage of rotation over randomness?
- answer-We make sure each part of the data is equally separated.
What is the disadvantage of using rotation? - answer-We have to make sure we aren't creating some other type of bias when we assign points.
What is k-fold cross validation? - answer-Split the training/validation data into k parts; we train on k-1 parts and validate on the remaining part, repeating so that each part is used for validation once.
What metric do you use for k-fold cross validation when comparing models? - answer-The average of all k evaluations.
What do we use when important data only appears in the validation or test sets? - answer-Cross-validation
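The k-fold procedure described in the cards above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the fold assignment uses a simple rotation (point i goes to fold i mod k), and the "model" is a hypothetical stand-in that predicts the training mean and is scored by mean absolute error. None of these names come from the course material.

```python
def k_fold_indices(n, k):
    """Assign point i to fold i % k (a rotation-style split)."""
    return [[i for i in range(n) if i % k == fold] for fold in range(k)]


def cross_validate(data, k, fit, evaluate):
    """Train on k-1 folds, validate on the held-out fold, and average the k scores."""
    scores = []
    for held_out in k_fold_indices(len(data), k):
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        valid = [data[i] for i in held_out]
        model = fit(train)
        scores.append(evaluate(model, valid))
    return sum(scores) / k, scores


# Hypothetical toy model: predict the mean of the training values.
def fit_mean(train):
    return sum(train) / len(train)


# Hypothetical metric: mean absolute error on the held-out fold (lower is better).
def mean_abs_error(model, valid):
    return sum(abs(v - model) for v in valid) / len(valid)


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
average_score, fold_scores = cross_validate(data, 3, fit_mean, mean_abs_error)
print(average_score, fold_scores)
```

When comparing models, the same call would be repeated with each candidate's `fit` function and the averages compared; a separate test set is still needed to estimate the chosen model's performance.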
Written for
- Institution: ISYE 6501
- Course: ISYE 6501
Document information
- Uploaded on: March 30, 2026
- Number of pages: 47
- Written in: 2025/2026
- Type: Exam (worked answers)
- Contains: Questions and answers