ISYE 6501 Exam 2 Questions with Verified Solutions (2024 Update)
What are some of the consequences of outliers in k-means? - ANSWER An outlier drags the cluster center artificially to one side.

Because k-means is a heuristic and thus fast, what can we do? - ANSWER Run it several times with different initial cluster centers and choose the best result. We can also try different values of k.

How does bias/variance change as k changes in KNN? - ANSWER The higher the k, the higher the bias; the lower the k, the higher the variance. When k = 1, the model is at its most complex and is thus likely to overfit the data.

How do we find the best value of k in k-means? - ANSWER Elbow method: for each value of k, we calculate the total distance from each data point to its cluster center and plot that total against k. We look for the kink (the "elbow") in the graph.

When clustering for prediction, how do we choose the prediction? - ANSWER When we see a new point, we just choose whichever cluster center is closest.

What is the difference between classification and clustering? - ANSWER With classification models, we know each data point's attributes and we already know the right classification for the data points (supervised). In clustering (unsupervised), we know the attributes, but we don't know what group any of the data points are in.

What is the difference between supervised learning and unsupervised learning? - ANSWER Supervised: the response is known. Unsupervised: the response is not known.

The k-means algorithm for clustering is a "heuristic" because - ANSWER it isn't guaranteed to get the best answer, but it will get to a solution quickly.

A group of astronomers has a set of long-exposure CCD images of various distant objects. They do not yet know which type of object each one is, and would like your help using analytics to determine which ones look similar. Which is more appropriate: classification or clustering? - ANSWER Clustering

Suppose one astronomer has categorized hundreds of the images by hand, and now wants your help using analytics to automatically determine which category each new image belongs to. Which is more appropriate: classification or clustering? - ANSWER Classification

Which of these is generally a good reason to remove an outlier from your data set? A. The outlier is incorrectly entered data, not real data. B. Outliers like this only happen occasionally. - ANSWER A. If the data point isn't a true one, you should remove it from your data set.

What is an outlier? - ANSWER A data point that is very different from the rest.

What graph or plot can we use to find outliers? - ANSWER Box and whisker plot

What are the parts of a box and whisker plot? - ANSWER The bottom and top of the box are the 25th and 75th percentiles. The line in the middle of the box is the median. The whiskers stretch up and down to a reasonable range of values (the 10th and 90th or the 5th and 95th percentiles).

Where would outliers exist in a box and whisker plot? - ANSWER Outside of the whiskers

What are some ways to deal with outliers that are bad data? - ANSWER Omit them or use imputation.
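
The k-means answers above (restart several times, compare values of k with the elbow method, predict with the closest center) can be sketched in a few lines of plain Python. This is a minimal illustration, not course code: the function names (sq_dist, nearest, kmeans_once, kmeans), the toy points, the seed, and the restart count are all assumptions made for the example. It also scores a clustering by total squared distance, which is the quantity k-means usually minimizes, rather than plain distance.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centers):
    """Index of the center closest to point (also used to predict new points)."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))

def kmeans_once(points, k, rng, max_iter=100):
    """One run of the k-means heuristic from a random choice of starting centers."""
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: put every point in the group of its closest center.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Update step: move each center to the mean of its group.
        new_centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:  # nothing moved, so the run has converged
            break
        centers = new_centers
    total = sum(sq_dist(p, centers[nearest(p, centers)]) for p in points)
    return centers, total

def kmeans(points, k, restarts=50, seed=0):
    """Run the heuristic several times and keep the run with the smallest total."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(restarts))
    return min(runs, key=lambda run: run[1])

# Hypothetical toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]

# Elbow method: total distance for each k; look for where the drop levels off.
totals = {k: kmeans(points, k)[1] for k in range(1, 6)}
for k, total in totals.items():
    print(k, round(total, 2))

# Prediction: a new point is assigned to whichever cluster center is closest.
centers, _ = kmeans(points, 3)
predicted_center = centers[nearest((19, 2), centers)]
```

On data like this the total should fall sharply up to k = 3 and only slightly afterwards, which is the kink the elbow method looks for. A single run can stop at a poor solution, which is why the sketch keeps the best of many restarts.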
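
The KNN answer above says that k = 1 is the most complex model and tends to overfit. A small sketch can make that concrete. The training set below is hypothetical, with one deliberately noisy label, and the names knn_predict and training_accuracy are made up for this example.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to query."""
    neighbors = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Hypothetical data: two groups, with (2, 2) carrying a noisy "b" label.
train = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"), ((2, 2), "b"),
         ((6, 6), "b"), ((6, 7), "b"), ((7, 6), "b"), ((7, 7), "b")]

def training_accuracy(k):
    """Fraction of training points that the model classifies as labeled."""
    hits = sum(knn_predict(train, point, k) == label for point, label in train)
    return hits / len(train)

acc_k1 = training_accuracy(1)  # every point is its own nearest neighbor
acc_k3 = training_accuracy(3)  # the noisy point is outvoted by its neighbors

# A new point next to the noisy label: k = 1 copies the noise, k = 3 smooths it.
pred_k1 = knn_predict(train, (2.2, 2.2), 1)
pred_k3 = knn_predict(train, (2.2, 2.2), 3)
```

With k = 1 the model reproduces the training labels exactly, noise included, which is the low-bias, high-variance end of the trade-off. With k = 3 it gives up the noisy point on the training set but makes a more stable prediction nearby.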
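
The box and whisker answers above can also be written as a short sketch. It assumes whiskers at the 5th and 95th percentiles (one of the two options mentioned) and a simple linear-interpolation percentile. The readings, the value 250 standing in for an incorrectly entered data point, and all function names are hypothetical.

```python
import statistics

def percentile(sorted_vals, pct):
    """Linear-interpolation percentile of an already sorted list (pct in 0..100)."""
    pos = (len(sorted_vals) - 1) * pct / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def box_plot_summary(values, whisker_pct=5):
    """The five numbers that define the box and the whiskers."""
    s = sorted(values)
    return {
        "low_whisker": percentile(s, whisker_pct),
        "q1": percentile(s, 25),        # bottom of the box
        "median": percentile(s, 50),    # line inside the box
        "q3": percentile(s, 75),        # top of the box
        "high_whisker": percentile(s, 100 - whisker_pct),
    }

def outside_whiskers(values, whisker_pct=5):
    """Values beyond either whisker: candidates to inspect, not automatic removals."""
    box = box_plot_summary(values, whisker_pct)
    return [v for v in values
            if v < box["low_whisker"] or v > box["high_whisker"]]

# Hypothetical readings: 10..29 plus one value assumed to be a data-entry error.
readings = list(range(10, 30)) + [250]
summary = box_plot_summary(readings)
flagged = outside_whiskers(readings)

# Two ways to handle a point known to be bad data: omit it or impute it.
bad = {250}
omitted = [v for v in readings if v not in bad]
imputed = [statistics.median(omitted) if v in bad else v for v in readings]
```

Because percentile whiskers always leave a few points outside, a legitimate low reading gets flagged alongside the bad value here. That is why the flagged list is only a starting point: a point is removed or imputed only once it is known not to be real data.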
Document information
- Uploaded on: February 21, 2024
- Number of pages: 3
- Written in: 2023/2024
- Type: Exam (elaborations)
- Contains: Questions & answers