SCRIPT 2026 QUESTIONS WITH SOLUTIONS
◍ When C = 1.1, what does that mean?
Answer: The value is 10% higher purely because of where that interval
falls in the cycle.
◍ When clustering for prediction, how do we choose the prediction?
Answer: When we see a new point, we assign it to whichever cluster center
is closest.
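A minimal sketch of this rule, assuming Euclidean distance and a set of
already-fitted centers. The names `nearest_cluster`, `centers`, and
`new_point` are made up for illustration.

```python
import math

def nearest_cluster(point, centers):
    """Return the index of the cluster center closest to `point`."""
    # Euclidean distance is an assumption; any distance measure could be used.
    distances = [math.dist(point, c) for c in centers]
    return distances.index(min(distances))

# Hypothetical centers from a previously fitted clustering.
centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
new_point = (4.0, 6.0)
label = nearest_cluster(new_point, centers)  # index of the predicted cluster
```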
◍ Why is k-means an expectation-maximization algorithm?
Answer: Finding the mean of all the points in a cluster is similar to
finding an expectation. Assigning data points to cluster centers is the
maximization step. Really we are minimizing, but we can think of it as
maximizing the negative of the distance to a cluster center.
◍ How do we find the cluster centers?
Answer: We take the mean of all the data points in the cluster.
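The two alternating steps (assign each point to its nearest center, then
move each center to the mean of its points) might be sketched as follows.
This is an illustration under assumptions, not a reference implementation:
it uses Euclidean distance, takes the starting centers as given, and leaves
an empty cluster's center where it was.

```python
import math

def assign(points, centers):
    """Assignment step: label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points]

def update(points, labels, centers):
    """Mean step: move each center to the mean of the points assigned to it."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            new_centers.append(old)  # assumption: keep an empty cluster's center
            continue
        new_centers.append(tuple(sum(p[d] for p in members) / len(members)
                                 for d in range(len(old))))
    return new_centers

def kmeans(points, centers, max_iter=100):
    """Alternate the two steps until the centers stop moving."""
    for _ in range(max_iter):
        new_centers = update(points, assign(points, centers), centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

points = [(0, 0), (0, 2), (10, 10), (10, 12)]
final_centers, final_labels = kmeans(points, [(0, 0), (10, 10)])
```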
◍ What happens to p-values when you have a lot of data?
Answer: They get small even when attributes are not at all related to the
response.
◍ When does exponential smoothing work well?
Answer: When the data is stationary (i.e., the mean, variance, and other
measures are all expected to be constant over time).
◍ What are non-parametric methods?
Answer: Methods where we don't force any specific form onto the predictor
(e.g., KNN).
◍ When should you use Bayesian regression?
Answer: When there's not much data and you want to combine it with expert
opinion.
◍ What does it signify when a coefficient for a classifier is close to zero?
Answer: It means the corresponding attribute is probably not relevant.
◍ True or False: When using a random forest model, it's easy to interpret how
its results are determined.
Answer: False. Unlike a model like regression where we can show the result
as a simple linear combination of each attribute times its regression
coefficient, in a random forest model there are so many different trees used
simultaneously that it's difficult to interpret exactly how any factor or
factors affect the result.
◍ How do you find the implied regression coefficients in PCR?
Answer: You multiply the eigenvectors by the new coefficients (the ones
fitted on the principal components).
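As a rough illustration, the implied coefficient of original attribute j is
the sum over components k of V[j][k] * b[k]. The eigenvector matrix `V` and
the component coefficients `b` below are made-up values, and the sketch
assumes any scaling of the data is undone separately.

```python
def implied_coefficients(eigenvectors, pc_coefs):
    """Map coefficients on principal components back to the original attributes.

    eigenvectors[j][k] is the loading of original attribute j on component k;
    pc_coefs[k] is the regression coefficient fitted on component k.
    """
    return [sum(v_jk * b_k for v_jk, b_k in zip(row, pc_coefs))
            for row in eigenvectors]

# Hypothetical values, for illustration only.
V = [[0.6, -0.8],
     [0.8, 0.6]]
b = [2.0, 1.0]
a = implied_coefficients(V, b)  # one implied coefficient per original attribute
```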
◍ What is a validation set used for?
Answer: It is used to choose the best model.
◍ How is the prediction calculated in random forests when doing regression
trees?
Answer: Use the average of the responses predicted by the individual trees.
◍ What are the steps in random forests?
Answer:
1. Introduce randomness via bootstrapping.
2. Branching: randomly choose a small number of factors (set X). A common
number of factors to use is log(n). Choose the best factor within X to
branch on.
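The two sources of randomness above could be sketched like this. The helper
names are hypothetical, n is taken to be the total number of factors, the
natural log is assumed, and the step of choosing the best factor within X
(which needs a split criterion) is left out.

```python
import math
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, giving each tree its own data set."""
    return [rng.choice(rows) for _ in rows]

def candidate_factors(n_factors, rng):
    """Randomly pick the subset X of factor indices considered at one branch."""
    # Assumption: about log(n) factors (natural log), and at least one.
    size = max(1, min(n_factors, round(math.log(n_factors))))
    return rng.sample(range(n_factors), size)

rng = random.Random(0)  # seeded only to make the sketch repeatable
rows = [(1.0, 2.0, "yes"), (3.0, 1.0, "no"), (2.5, 0.5, "no")]
sample = bootstrap_sample(rows, rng)
subset = candidate_factors(20, rng)
```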
◍ What does TP mean?
Answer: True positive: a point that is in the category and is correctly
classified.
◍ How do you deal with attributes that might be more important than others in
KNN?
Answer: You weight each dimension's distance differently. The larger the
weight, the higher the impact.
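One way to picture this is a weighted Euclidean distance inside a small KNN
classifier. The function names, the weights, and the majority vote are
assumptions made for the sketch.

```python
import math
from collections import Counter

def weighted_distance(a, b, weights):
    """Euclidean distance with a per-dimension weight on each squared difference."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def knn_predict(query, points, labels, weights, k=3):
    """Classify `query` by majority vote among its k nearest points."""
    order = sorted(range(len(points)),
                   key=lambda i: weighted_distance(query, points[i], weights))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

points = [(0.0, 0.0), (1.0, 10.0)]
labels = ["a", "b"]
query = (0.9, 1.0)
by_first = knn_predict(query, points, labels, weights=(1.0, 0.0), k=1)
by_second = knn_predict(query, points, labels, weights=(0.0, 1.0), k=1)
```

With all the weight on the first attribute the query sits nearest the second
point; with all the weight on the second attribute it sits nearest the first.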
◍ What does ROC/AUC give you, and what doesn't it?
Answer: It gives a quick-and-dirty estimate of quality but does not
differentiate between the cost of FN and FP.
◍ What are the parameters in a GARCH model?
Answer: p and q
◍ What does FN mean?
Answer: False negative: a point that is in the category, but the model says
it is not.
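To make TP and FN concrete, here is a small counting sketch. It assumes
labels are booleans (True means "in the category") and also counts FP and
TN for completeness; the function name is hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count TP, FN, FP, TN for boolean labels (True = in the category)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # in category, model says yes
    fn = sum(1 for a, p in pairs if a and not p)      # in category, model says no
    fp = sum(1 for a, p in pairs if not a and p)      # not in category, model says yes
    tn = sum(1 for a, p in pairs if not a and not p)  # not in category, model says no
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}

actual = [True, True, False, False, True]
predicted = [True, False, False, True, True]
counts = confusion_counts(actual, predicted)
```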
◍ What is time-series data?
Answer: The same data recorded over time, often at equal intervals.
◍ What is an outlier?
Answer: A data point that is very different from the rest
◍ When do you use standardization?
Answer: For PCA or clustering.
◍ What is the idea behind random forests?
Answer: Introduce randomness. We generate many different trees. They will
have different strengths and weaknesses. The average of all these trees is
better than a single tree with specific strengths and weaknesses
◍ What graph or plot can we use to find outliers?
Answer: box-and-whisker plot
◍ When trying to answer questions about how a system works, what is
important?
Answer: The coefficients.
◍ What is cumulative sum (CUSUM) used for?
Answer: Detecting a change: an increase, a decrease, or both.
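A sketch of CUSUM for detecting an increase, using the common form
S_t = max(0, S_{t-1} + (x_t - mu - c)) and flagging a change once S_t
reaches a threshold. The parameter names `mu`, `c`, and `threshold` are
assumptions; a decrease can be detected the same way with (mu - x_t - c).

```python
def cusum_increase(xs, mu, c, threshold):
    """Return the first index where the running sum reaches `threshold`, else None."""
    s = 0.0
    for t, x in enumerate(xs):
        # Accumulate how far x sits above the expected mean, less a slack c;
        # the running sum is never allowed to drop below zero.
        s = max(0.0, s + (x - mu - c))
        if s >= threshold:
            return t
    return None

change_at = cusum_increase([10, 10, 11, 13, 14, 15], mu=10, c=1, threshold=5)
```

A larger `c` or `threshold` makes detection slower but less prone to false
alarms.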
◍ What effects does randomness have on training/validation performance?
Answer: sometimes the randomness will make the performance look worse
than it really is, and sometimes the randomness will make the performance
look better than it really is
◍ What will a large value of K lead to?
Answer: A large variance in predictions.
◍ How do you calculate the observed trend?
Answer: It is S_t - S_{t-1}, the difference between the two most recent
baselines.
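The observed trend usually feeds a smoothed trend estimate. A sketch of
exponential smoothing with trend, assuming the standard updates
S_t = alpha * x_t + (1 - alpha) * (S_{t-1} + T_{t-1}) and
T_t = beta * (S_t - S_{t-1}) + (1 - beta) * T_{t-1}, with the initial trend
set to zero (an assumption):

```python
def smoothing_with_trend(xs, alpha, beta):
    """Return the baselines S_t and trend estimates T_t for the observations xs."""
    s_prev, t_prev = xs[0], 0.0  # assumption: S_0 = x_0 and T_0 = 0
    baselines, trends = [s_prev], [t_prev]
    for x in xs[1:]:
        s = alpha * x + (1 - alpha) * (s_prev + t_prev)
        observed_trend = s - s_prev  # difference between the two baselines
        t = beta * observed_trend + (1 - beta) * t_prev
        baselines.append(s)
        trends.append(t)
        s_prev, t_prev = s, t
    return baselines, trends

baselines, trends = smoothing_with_trend([10, 12, 14], alpha=0.5, beta=0.5)
```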