STUDY GUIDE 2026 FULL QUESTIONS AND
SOLUTIONS GRADED A+
◍ For the Weibull distribution, what does k = 1 mean?
Answer:
· It models a failure rate that is constant over time
· The Weibull distribution reduces to the exponential distribution
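Here is a minimal sketch of why k = 1 gives a constant failure rate. It assumes the common shape/scale parameterization (shape k, scale λ, written `lam`). The function names are illustrative and do not come from any library.

```python
import math

def weibull_pdf(x: float, k: float, lam: float) -> float:
    """Weibull density with shape k and scale lam, for x >= 0."""
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))

def weibull_hazard(x: float, k: float, lam: float) -> float:
    """Weibull failure (hazard) rate h(x) = f(x) / (1 - F(x))."""
    return (k / lam) * (x / lam) ** (k - 1)

def exponential_pdf(x: float, lam: float) -> float:
    """Exponential density with mean lam (rate 1 / lam)."""
    return (1 / lam) * math.exp(-x / lam)

# With k = 1 the Weibull density matches the exponential one,
# and the hazard stays at 1 / lam = 0.5 for every x
for x in (0.5, 1.0, 2.0, 5.0):
    print(x, weibull_pdf(x, 1, 2.0), exponential_pdf(x, 2.0), weibull_hazard(x, 1, 2.0))
```

With k < 1 the hazard decreases over time, and with k > 1 it increases. k = 1 is the boundary between those two cases.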
◍ What are constraints?
Answer: restrictions on the values that the variables (decisions) can take
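◍ What is the Cox proportional hazards model?
Answer: Like a logistic regression model, it uses an exponential function of a
linear combination of the predictors. It models the hazard rate, from which
the survival probability can be found.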
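The model multiplies a baseline hazard h0(t) by exp(sum of coefficient × predictor). That is where the "proportional" part comes from: the ratio of two individuals' hazards does not depend on t. The sketch below shows this. The coefficients and covariates are hypothetical, and h0(t) is passed in as a number rather than estimated.

```python
import math
from typing import Sequence

def cox_hazard(baseline_hazard: float, coefs: Sequence[float], x: Sequence[float]) -> float:
    """Hazard h(t | x) = h0(t) * exp(sum_j coefs[j] * x[j])."""
    linear_part = sum(c * xi for c, xi in zip(coefs, x))
    return baseline_hazard * math.exp(linear_part)

coefs = [0.5, -0.2]       # hypothetical fitted coefficients
person_a = [2.0, 1.0]     # hypothetical covariate values
person_b = [0.0, 1.0]

for h0 in (0.01, 0.1, 1.0):  # baseline hazard at three different times
    ratio = cox_hazard(h0, coefs, person_a) / cox_hazard(h0, coefs, person_b)
    print(h0, ratio)  # same ratio, exp(0.5 * 2.0), at every baseline value
```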
◍ What are the points where the different functions connect (e.g., in a spline)?
Answer: They are called knots.
◍ What is a stochastic simulation?
Answer: · Used when the system has randomness; it may give a different output
even with the same input
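Here is a minimal sketch of a stochastic simulation. It assumes a hypothetical two-step process whose step times are exponentially distributed. The inputs stay fixed, but runs that use different random streams give different outputs.

```python
import random
from typing import Optional

def simulate_avg_time(n_jobs: int, mean_step1: float, mean_step2: float,
                      seed: Optional[int] = None) -> float:
    """Average time through two sequential steps with random (exponential) durations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_jobs):
        # expovariate takes a rate, so pass 1 / mean
        total += rng.expovariate(1 / mean_step1) + rng.expovariate(1 / mean_step2)
    return total / n_jobs

# Same inputs, different random streams -> different outputs
print(simulate_avg_time(1000, 2.0, 3.0, seed=1))
print(simulate_avg_time(1000, 2.0, 3.0, seed=2))
```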
◍ Bias-variance tradeoff.
Answer:
· Underfit models have high bias and low variance: they miss some real effects,
but they also avoid picking up variance from random effects
· Overfit models have low bias and high variance: they fit real patterns well,
but they also pick up a lot of unwanted variance from random patterns
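The following is a rough illustration. It assumes synthetic data from a hypothetical sine curve plus noise, with polynomial fits done in NumPy. Raising the polynomial degree always lowers (or keeps) the training error. Comparing that with the error on fresh points from the same curve shows whether the extra flexibility is capturing real pattern or random noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_curve(x: np.ndarray) -> np.ndarray:
    return np.sin(np.pi * x)

x_train = np.linspace(-1, 1, 20)
x_test = np.linspace(-0.95, 0.95, 20)
y_train = true_curve(x_train) + rng.normal(0, 0.2, x_train.size)
y_test = true_curve(x_test) + rng.normal(0, 0.2, x_test.size)

def sse(y: np.ndarray, y_hat: np.ndarray) -> float:
    return float(np.sum((y - y_hat) ** 2))

results = {}
for degree in (1, 3, 9):  # underfit, reasonable, likely overfit
    coefs = np.polyfit(x_train, y_train, degree)
    train_err = sse(y_train, np.polyval(coefs, x_train))
    test_err = sse(y_test, np.polyval(coefs, x_test))
    results[degree] = (train_err, test_err)
    print(f"degree {degree}: train SSE {train_err:.3f}, test SSE {test_err:.3f}")
```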
◍ What are two differences between the GARCH and ARIMA models?
Answer:
1. GARCH uses variances/squared errors, whereas ARIMA uses observations and
linear error terms.
2. GARCH uses the raw variances, whereas ARIMA uses differences.
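To make the "squared errors" point concrete, here is a sketch of the GARCH(1,1) variance recursion, sigma2[t] = omega + alpha * e[t-1]^2 + beta * sigma2[t-1]. The parameter values are hypothetical, and the sketch does no fitting.

```python
from typing import List, Sequence

def garch11_variances(errors: Sequence[float], omega: float, alpha: float,
                      beta: float, initial_variance: float) -> List[float]:
    """Conditional variances for GARCH(1,1):
    sigma2[t] = omega + alpha * errors[t-1]**2 + beta * sigma2[t-1]."""
    sigma2 = [initial_variance]
    for e in errors:
        # The squared past error and the past variance drive the next variance
        sigma2.append(omega + alpha * e ** 2 + beta * sigma2[-1])
    return sigma2

# Hypothetical parameters; the large shock (3.0) raises the next period's variance
print(garch11_variances([0.5, 3.0, 0.1], omega=0.1, alpha=0.2, beta=0.7, initial_variance=1.0))
```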
◍ Non-zero-sum game.
Answer: the total benefit to the players is not fixed; it might be higher or lower
◍ When C = 1, what does that mean?
Answer: no effect
◍ What is a common error measure for simple linear regression?
Answer: the sum of squared errors
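Here is a minimal sketch of computing the sum of squared errors for a simple linear regression fit by least squares, using the closed-form slope and intercept formulas. The data points are made up.

```python
from typing import Sequence, Tuple

def fit_simple_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept a0 and slope a1 for y ~ a0 + a1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return a0, a1

def sum_squared_error(x: Sequence[float], y: Sequence[float], a0: float, a1: float) -> float:
    """Sum over data points of (actual - predicted)^2."""
    return sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))

x = [0.0, 1.0, 2.0]
y = [0.0, 1.0, 1.0]
a0, a1 = fit_simple_linear(x, y)
print(a0, a1, sum_squared_error(x, y, a0, a1))  # about 0.1667, 0.5, 0.1667
```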
◍ What is a general non-convex program?
Answer: an optimization problem that is not convex; it can have local optima
that are not globally optimal, so it is usually harder to solve
◍ Maximum likelihood.
Answer: the parameter values that give the highest probability (likelihood) of
the observed data
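Here is a small sketch of maximum likelihood using hypothetical 0/1 (Bernoulli) data. A grid search over the success probability p picks the value with the highest log-likelihood. For this model, that value is the sample mean.

```python
import math
from typing import Sequence

def bernoulli_log_likelihood(p: float, data: Sequence[int]) -> float:
    """log P(data | p) for independent 0/1 outcomes."""
    return sum(math.log(p) if d == 1 else math.log(1 - p) for d in data)

data = [1, 0, 1, 1, 0, 1, 1, 1]           # hypothetical observations
grid = [i / 1000 for i in range(1, 1000)]  # candidate p values in (0, 1)
p_hat = max(grid, key=lambda p: bernoulli_log_likelihood(p, data))
print(p_hat, sum(data) / len(data))  # 0.75 0.75
```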
◍ What are the three levels (layers) of neurons in a neural network?
Answer: the input layer, the hidden layer(s), and the output layer
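Here is a minimal forward-pass sketch showing how the three layers connect. It assumes a hypothetical network with 3 inputs, 4 hidden neurons, 1 output, sigmoid activations, and random (untrained) weights.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
W_hidden = rng.normal(size=(3, 4))  # input layer (3) -> hidden layer (4)
b_hidden = np.zeros(4)
W_output = rng.normal(size=(4, 1))  # hidden layer (4) -> output layer (1)
b_output = np.zeros(1)

def forward(x: np.ndarray) -> np.ndarray:
    hidden = sigmoid(x @ W_hidden + b_hidden)     # hidden-layer activations
    return sigmoid(hidden @ W_output + b_output)  # output-layer activation

x = np.array([[0.2, -1.0, 0.5]])  # one data point with 3 attributes
print(forward(x))
```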
◍ What are variables in an optimization model?
Answer: the decisions that the optimization solver picks the best values for;
they must be things we can alter or control
◍ What are the basic steps to solve an optimization problem?
Answer:
1) Initialization: pick values for all the variables (they may be simple, bad,
and not satisfy all of the constraints).
2) Find an improving direction t and make a change in that direction by some
amount called the step size (θ).
3) Repeat step 2 from the new solution, which is the old solution plus the
improving direction times the step size.
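The three steps map onto a short loop. This sketch rests on several assumptions. The objective is a hypothetical smooth function, (x - 3)^2 + (y + 1)^2. The improving direction is the negative gradient, which is one common choice but not the only one. The step size is fixed at θ = 0.1. The loop stops once the improvement becomes tiny.

```python
from typing import Tuple

Point = Tuple[float, float]

def objective(p: Point) -> float:
    x, y = p
    return (x - 3) ** 2 + (y + 1) ** 2

def gradient(p: Point) -> Point:
    x, y = p
    return (2 * (x - 3), 2 * (y + 1))

def improve(start: Point, step_size: float = 0.1, tol: float = 1e-10,
            max_iter: int = 10_000) -> Point:
    current = start  # Step 1: initialization (need not be good)
    for _ in range(max_iter):
        gx, gy = gradient(current)
        direction = (-gx, -gy)  # Step 2: improving (downhill) direction
        new = (current[0] + step_size * direction[0],
               current[1] + step_size * direction[1])
        # Step 3: repeat from the new solution until it barely improves
        if objective(current) - objective(new) < tol:
            return new
        current = new
    return current

print(improve((0.0, 0.0)))  # converges toward (3, -1)
```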
◍ What is a test data set used for?
Answer: to estimate the performance of the chosen model
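Here is a sketch of one way to split data so that the test set is touched only once, after a model has been chosen. The 70/15/15 proportions and the function name are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def train_val_test_split(rows: Sequence[T], seed: int = 0,
                         train_frac: float = 0.7, val_frac: float = 0.15
                         ) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle, then split into training, validation, and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]               # fit candidate models
    val = shuffled[n_train:n_train + n_val]  # compare models, pick one
    test = shuffled[n_train + n_val:]        # estimate the chosen model's performance
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```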
◍ ARIMA moving average.
Answer: Uses previous errors e(t-1), e(t-2), ... as predictors. An order-q moving
average goes back q time periods of errors.
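Here is a sketch of the moving-average part as a one-step prediction. It assumes the convention x̂(t) = μ + θ1·e(t-1) + ... + θq·e(t-q); textbooks and software differ on the sign convention for θ. The values of μ, the θ's, and the errors are all hypothetical.

```python
from typing import Sequence

def ma_forecast(mu: float, thetas: Sequence[float], past_errors: Sequence[float]) -> float:
    """One-step MA(q) prediction; past_errors[-1] is e(t-1), past_errors[-2] is e(t-2), ..."""
    q = len(thetas)
    if len(past_errors) < q:
        raise ValueError("need at least q past errors")
    recent = list(reversed(past_errors[-q:]))  # [e(t-1), e(t-2), ..., e(t-q)]
    return mu + sum(theta * e for theta, e in zip(thetas, recent))

# Order q = 2: only the last two errors matter
print(ma_forecast(10.0, [0.5, 0.25], [7.0, 2.0, -4.0]))  # 8.5
```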
◍ How could we detect outliers when there are multiple dimensions?
Answer: we could fit a model and then flag the points with a large error
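Here is a sketch of the "fit a model, flag large errors" idea. It uses hypothetical two-predictor data with one planted outlier. Two choices here are assumptions: linear regression as the model, and a cutoff of 3 standard deviations for flagging residuals. Other models or cutoffs may suit real data better.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = rng.uniform(0, 10, size=(n, 2))  # two predictors
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.1, n)
y[17] += 5.0                         # plant an outlier

# Fit y ~ a0 + a1*x1 + a2*x2 by least squares
design = np.column_stack([np.ones(n), X])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
residuals = y - design @ coefs

# Flag points whose error is far larger than typical
cutoff = 3 * residuals.std()
flagged = np.flatnonzero(np.abs(residuals) > cutoff)
print(flagged)  # the planted outlier (index 17) should appear here
```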
◍ Coefficient.
Answer: even with a very low p-value, a coefficient may make little practical
difference once it is multiplied by the attribute value
◍ Columns.
Answer: The 'answer' for each data point (response/outcome)
◍ Setting a large value of k will ....
Answer: lead to a large model bias.
◍ Types of variance estimation models.
Answer: GARCH
◍ What is unstructured data?
Answer: Data that is not easily described and stored (e.g., written text)
◍ What are the drawbacks of random forests?
Answer: The results are harder to explain/interpret, and random forests can't
give us a specific regression or classification model from the data.
◍ What can change detection be used for?
Answer: determining whether action might be needed, determining the impact of
past actions, and determining changes to help with planning
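One common change-detection method is CUSUM. Here is a minimal sketch for detecting an increase. It assumes a known in-control mean μ, a slack value C, and a threshold T, all hypothetical here. The function returns the first index where the cumulative sum reaches T.

```python
from typing import Optional, Sequence

def cusum_increase(data: Sequence[float], mu: float, c: float,
                   threshold: float) -> Optional[int]:
    """S_t = max(0, S_{t-1} + (x_t - mu - c)); return first t with S_t >= threshold, else None."""
    s = 0.0
    for t, x in enumerate(data):
        s = max(0.0, s + (x - mu - c))
        if s >= threshold:
            return t
    return None

# Hypothetical series: stable at 0, then shifts up to 3 at index 10
series = [0.0] * 10 + [3.0] * 10
print(cusum_increase(series, mu=0.0, c=0.5, threshold=5.0))  # 11
```

Larger values of C or T make the method slower to signal a change but less prone to false alarms.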
◍ What are modules (in a simulation)?
Answer: parts of the process being simulated (e.g., queues, storage)
◍ What type of models are network models?
Answer: linear programs
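As an illustration, a shortest-path problem on a tiny hypothetical network can be written as a linear program. There is one flow variable per arc, plus flow-conservation equality constraints. The sketch solves it with SciPy's `linprog` and assumes SciPy is installed.

```python
import numpy as np
from scipy.optimize import linprog

nodes = ["s", "a", "b", "t"]
arcs = [("s", "a", 1.0), ("s", "b", 4.0), ("a", "b", 2.0), ("a", "t", 6.0), ("b", "t", 1.0)]

# Flow conservation: (flow out) - (flow in) = +1 at source, -1 at sink, 0 elsewhere
A_eq = np.zeros((len(nodes), len(arcs)))
for j, (u, v, _) in enumerate(arcs):
    A_eq[nodes.index(u), j] = 1.0
    A_eq[nodes.index(v), j] = -1.0
b_eq = np.array([1.0, 0.0, 0.0, -1.0])
cost = np.array([w for _, _, w in arcs])

res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * len(arcs), method="highs")
print(res.fun, res.x)  # cost 4.0 via s -> a -> b -> t
```

When supplies and capacities are integers, network LPs like this have integer optimal solutions, which is part of why they are convenient to solve.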
◍ What is elastic net?
Answer: a variable selection method that works by minimizing the squared error
while constraining a combination of the absolute values of the coefficients and
their squares
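Written out, one common form of the elastic net is shown below. The notation is assumed here: coefficients a_j, predictors x_ij, a budget τ, and a mixing weight λ between 0 and 1.

```latex
\min_{a_0, \dots, a_m} \sum_{i=1}^{n} \Big( y_i - \big(a_0 + \sum_{j=1}^{m} a_j x_{ij}\big) \Big)^2
\quad \text{subject to} \quad
\lambda \sum_{j=1}^{m} |a_j| + (1 - \lambda) \sum_{j=1}^{m} a_j^2 \le \tau
```

Setting λ = 1 gives the lasso constraint, and λ = 0 gives ridge regression.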
◍ Describe what the K-means algorithm does, step by step.
Answer:
1. Pick k cluster centers within the range of the data.
2. Assign each point to the closest cluster center.
3. Recalculate each cluster center as the mean of its assigned points.
4. Repeat steps 2 and 3 until there is no change.
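Here is a compact NumPy sketch of those steps. It assumes Euclidean distance and hypothetical 2-D data. The initial centers are k random data points, which is one way to pick centers within the data range.

```python
import numpy as np

def k_means(points: np.ndarray, k: int, seed: int = 0, max_iter: int = 100):
    rng = np.random.default_rng(seed)
    # Step 1: use k distinct data points as the initial centers
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its closest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: stop when the assignments no longer change
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: recalculate each center as the mean of its points
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, labels

data = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, labels = k_means(data, k=2)
print(centers, labels)
```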
◍ What does p equal for Euclidean distance?
Answer: 2
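The p comes from the Minkowski (p-norm) distance. With p = 1 it is rectilinear (Manhattan) distance, with p = 2 it is Euclidean distance, and as p grows it approaches the largest single-coordinate difference. Here is a minimal sketch.

```python
from typing import Sequence

def minkowski(u: Sequence[float], v: Sequence[float], p: float) -> float:
    """(sum_i |u_i - v_i|^p)^(1/p); p = float('inf') gives the max coordinate difference."""
    diffs = [abs(a - b) for a, b in zip(u, v)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1 / p)

print(minkowski((0, 0), (3, 4), 1))             # 7.0
print(minkowski((0, 0), (3, 4), 2))             # 5.0
print(minkowski((0, 0), (3, 4), float("inf")))  # 4
```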
◍ What are first-order differences?
Answer: the observation at time t minus the previous observation at time t-1,
i.e., x(t) - x(t-1)
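Here is a tiny sketch of first-order differences. Applying the same function twice gives second-order differences.

```python
from typing import List, Sequence

def first_differences(x: Sequence[float]) -> List[float]:
    """[x(t) - x(t-1) for t = 1, ..., n-1]; the result is one element shorter than x."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]

print(first_differences([1, 4, 9, 16]))                     # [3, 5, 7]
print(first_differences(first_differences([1, 4, 9, 16])))  # [2, 2]
```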
◍ Why do we need to use the same random numbers (same seed)?
Answer: Otherwise we couldn't accurately compare the two sets of replications:
even though they come from the same distributions, the replications would still
be different.
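Here is a sketch of the idea using a hypothetical single-server queue. The average wait comes from the Lindley recursion, with exponential interarrival and service times. Giving both configurations the same seed means they see the same underlying random draws. The comparison then reflects the change in service rate rather than the luck of the draw.

```python
import random

def avg_wait(service_rate: float, arrival_rate: float, n_customers: int, seed: int) -> float:
    """Average time in queue for a single-server FIFO queue (Lindley recursion)."""
    rng = random.Random(seed)
    wait, total = 0.0, 0.0
    for _ in range(n_customers):
        total += wait
        service = rng.expovariate(service_rate)
        interarrival = rng.expovariate(arrival_rate)
        wait = max(0.0, wait + service - interarrival)
    return total / n_customers

# Same seed for both configurations (common random numbers)
for seed in range(3):
    slow = avg_wait(1.0, 0.8, 2000, seed)
    fast = avg_wait(1.25, 0.8, 2000, seed)
    print(seed, round(slow, 3), round(fast, 3), round(slow - fast, 3))
```

If the two configurations used different seeds, the run-to-run noise could be large enough to hide, or even reverse, the true difference.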
◍ When is empirical Bayesian modeling useful?
Answer: when there is not a lot of data
◍ When you use a higher p-value threshold...
Answer: you increase the possibility of including irrelevant factors
◍ How do you validate a simulation?
Answer: Use real data to check that the simulation is giving reasonable results.
· Real and simulated averages don't match = problem
· Averages match but variances don't match = problem
◍ What can you detrend?
Answer: the response and the predictors in factor-based models
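Here is a minimal sketch of detrending one series. It assumes a linear trend in time, fit by least squares with NumPy. The same step would be applied to the response and to each predictor.

```python
import numpy as np

def detrend_linear(values: np.ndarray) -> np.ndarray:
    """Subtract a least-squares straight line in time from the series."""
    t = np.arange(len(values), dtype=float)
    slope, intercept = np.polyfit(t, values, 1)
    return values - (intercept + slope * t)

t = np.arange(12, dtype=float)
series = 5.0 + 2.0 * t + np.array([1.0, -1.0] * 6)  # trend plus an alternating pattern
print(np.round(detrend_linear(series), 3))
```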
◍ How does C relate to λ (SVM)?
Answer: C has an inverse relationship to λ. As C increases, minimizing the error
is prioritized; as C decreases, maximizing the margin is prioritized.
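For reference, the soft-margin SVM can be written in two equivalent ways. The notation is assumed here: coefficients a_j and labels y_i ∈ {-1, 1}. In the λ form, λ weights the margin term. In the C form, C weights the error term instead.

```latex
\text{(λ form)}\quad \min_{a_0,\dots,a_m} \sum_{i=1}^{n} \max\Big\{0,\; 1 - \Big(\sum_{j=1}^{m} a_j x_{ij} + a_0\Big) y_i\Big\} + \lambda \sum_{j=1}^{m} a_j^2

\text{(C form)}\quad \min_{a_0,\dots,a_m} \frac{1}{2}\sum_{j=1}^{m} a_j^2 + C \sum_{i=1}^{n} \max\Big\{0,\; 1 - \Big(\sum_{j=1}^{m} a_j x_{ij} + a_0\Big) y_i\Big\}
```

Multiplying the λ form by C, with Cλ = 1/2, turns it into the C form, so C = 1/(2λ). Increasing C therefore plays the same role as decreasing λ.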
◍ What does this formula describe? [formula missing]
Answer: Total error
◍ What is a deterministic simulation?
Answer: · The same inputs always give the same outputs (no randomness)