ISYE 6501 MIDTERM 1 QUESTIONS AND ANSWERS
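Rows - ans--- Data points: each row holds the values for one data point in a data table.
Columns - ans--- Attributes (features, predictors) of each data point. One column may be the response (outcome), the 'answer' for each data point.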
Structured Data - ans--- Quantitative, Categorical, Binary, Unrelated, Time Series
Unstructured Data - ans--- Text
Support Vector Machine (SVM) - ans--- Supervised machine learning algorithm used for both classification and regression problems.
Mostly used for classification: plot each data item as a point in n-dimensional space (n is the number of features you have), with the value of each feature being the value of a particular coordinate.
Then classify by finding a hyperplane that separates the 2 classes well.
Support vectors are the coordinates of the individual observations closest to the boundary; the classifier is the hyperplane (a line in two dimensions) that best segregates the two classes.
What do you want to find with an SVM model? - ans--- Values of a0, a1, ..., am that classify the points correctly and give the maximum gap (margin) between the two parallel lines.
What should the sum for the green points in an SVM model be? - ans--- For each green point j, the sum a1*x1j + a2*x2j + ... + am*xmj + a0 should be greater than or equal to 1.
What should the sum for the red points in an SVM model be? - ans--- For each red point j, the same sum should be less than or equal to -1.
What should the sum be for green and red points together? - ans--- For every point j, (a1*x1j + a2*x2j + ... + am*xmj + a0) * yj should be greater than or equal to 1, because yj is 1 for green and -1 for red.
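The constraints above can be checked directly for a candidate classifier. The following is a minimal sketch, assuming 2-D toy data and hand-picked (not fitted) coefficients; the function names and the data are hypothetical and only illustrate the conditions.

```python
import math

def satisfies_margin(a, a0, points, labels):
    """Return True if every point j satisfies (a . x_j + a0) * y_j >= 1."""
    for x, y in zip(points, labels):
        score = sum(ai * xi for ai, xi in zip(a, x)) + a0
        if score * y < 1:
            return False
    return True

def margin_width(a):
    """Distance between the parallel lines a.x + a0 = 1 and a.x + a0 = -1."""
    return 2 / math.sqrt(sum(ai * ai for ai in a))

# Hypothetical toy data: green points are labeled +1, red points -1.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

# Hand-picked coefficients (an assumption, not the output of a solver).
a, a0 = (0.5, 0.5), -1.0

ok = satisfies_margin(a, a0, points, labels)
width = margin_width(a)
```

Smaller coefficients give a wider margin, which is why maximizing the margin amounts to keeping the sum of squared coefficients small while still satisfying every constraint.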
First principal component - ans--- PCA: a linear combination of the original predictor variables that captures the maximum variance in the data set. It determines the direction of highest variability in the data. The more variability the first component captures, the more information it carries. No other component can have higher variability than the first principal component.
It minimizes the sum of squared distances between the data points and the line.
Second principal component - ans--- PCA: also a linear combination of the original predictors; it captures as much of the remaining variance in the data set as possible and is uncorrelated with Z¹, the first principal component. In other words, the correlation between the first and second components is zero.
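The two properties on these cards (the first component has the largest variance, and the two components are uncorrelated) can be checked numerically. This is a rough sketch for 2-D data only, using the closed-form angle of the main axis of a 2x2 covariance matrix; the data set and all function names are made up for illustration.

```python
import math

def principal_axes_2d(data):
    """Return (mean, d1, d2): the mean and the unit directions of the
    first and second principal components of 2-D data."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mx) ** 2 for p in data) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in data) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / (n - 1)
    # Angle of the direction of maximum variance.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    d1 = (math.cos(theta), math.sin(theta))
    d2 = (-math.sin(theta), math.cos(theta))  # perpendicular to d1
    return (mx, my), d1, d2

def scores(data, mean, direction):
    """Project centered data onto a direction (a linear combination)."""
    return [(p[0] - mean[0]) * direction[0] + (p[1] - mean[1]) * direction[1]
            for p in data]

def covariance(u, v):
    """Sample covariance of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / (n - 1)

# Hypothetical data lying roughly along a diagonal line.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
mean, d1, d2 = principal_axes_2d(data)
z1 = scores(data, mean, d1)   # first principal component
z2 = scores(data, mean, d2)   # second principal component
var1, var2 = covariance(z1, z1), covariance(z2, z2)
cov12 = covariance(z1, z2)    # should be numerically zero
```

For more than two predictors the same idea is usually implemented with an eigendecomposition of the covariance matrix rather than a single angle.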
What if it's not possible to separate green and red points in an SVM model? - ans--- Use a soft classifier. In a soft classification context, we might add an extra multiplier for each type of error: the less we want to accept misclassifying a type of point, the larger its penalty.
Soft Classifier - ans--- Accounts for errors in SVM classification by trading off minimizing the errors we make against maximizing the margin.
To trade off between them, we pick a lambda value and minimize a combination of error and margin. As lambda gets large, the margin term gets large, so the importance of a large margin outweighs avoiding mistakes in classifying known data points. As lambda drops toward 0, avoiding mistakes matters more than the margin.
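The trade-off can be made concrete with a small sketch. It assumes one common form of the objective, the total hinge error plus lambda times the sum of squared coefficients; the cards do not spell out the exact formula, so treat that form, the toy data, and the two hand-picked candidate classifiers as assumptions.

```python
def soft_margin_objective(a, a0, points, labels, lam):
    """Total hinge error plus lam times the sum of squared coefficients."""
    error = 0.0
    for x, y in zip(points, labels):
        score = sum(ai * xi for ai, xi in zip(a, x)) + a0
        # A point contributes error only if it violates score * y >= 1.
        error += max(0.0, 1.0 - score * y)
    margin_term = lam * sum(ai * ai for ai in a)
    return error + margin_term

# Hypothetical data: the last red point sits close to the green points.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0), (1.2, 1.2)]
labels = [1, 1, -1, -1, -1]

# Two hand-picked candidates (not fitted by a solver).
wide = ((0.5, 0.5), -1.0)      # wide margin, one margin violation
narrow = ((1.25, 1.25), -4.0)  # narrow margin, no violations

for lam in (0.1, 1.0):
    obj_wide = soft_margin_objective(*wide, points, labels, lam)
    obj_narrow = soft_margin_objective(*narrow, points, labels, lam)
    better = "wide" if obj_wide < obj_narrow else "narrow"
    print(f"lambda={lam}: wide={obj_wide:.3f} narrow={obj_narrow:.3f} -> {better}")
```

With the small lambda the error-free narrow classifier scores better; with the larger lambda the wide-margin classifier wins even though it violates the margin for one point.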
Should you scale your data in an SVM model? - ans--- Yes, so that the orders of magnitude of the attributes are approximately the same.
Data must be in a bounded range.
Common scaling: data between 0 and 1
a. Scale factor by factor
b. Scale linearly
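Linear scaling of each factor to the range 0 to 1 can be sketched as follows. The helper names and the sample table are hypothetical; a constant column is mapped to 0.0 here, which is one possible convention rather than a rule from the notes.

```python
def min_max_scale(column):
    """Linearly rescale one factor (column) so its values lie in [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: no spread to scale, so return zeros (an assumption).
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def scale_table(rows):
    """Scale a table factor by factor, i.e. each column independently."""
    columns = list(zip(*rows))
    scaled_columns = [min_max_scale(col) for col in columns]
    return [list(row) for row in zip(*scaled_columns)]

# Hypothetical table: the second factor is two orders of magnitude larger.
rows = [[1, 100], [2, 300], [3, 200]]
scaled = scale_table(rows)
```

After scaling, both factors span the same range, so neither dominates the classifier's distance calculations purely because of its units.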
How can you tell which coefficients matter in an SVM model? - ans--- If a coefficient's value is very close to 0, the corresponding attribute is probably not relevant for classification.
Does SVM work the same for multiple dimensions? - ans--- Yes
Does an SVM classifier need to be a straight line? - ans--- No. SVM can be generalized using kernel methods that allow for nonlinear classifiers. Software packages typically provide a kernel SVM function that you can use to solve for both linear and nonlinear classifiers.
Can classification questions be answered as probabilities in SVM? - ans--- Yes.
K Nearest Neighbor Algorithm - ans--- To find the class of a new point, pick the k closest points to the new one; the new point's class is the most common class among those k neighbors.
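The procedure on this card can be sketched in a few lines. This is a minimal illustration assuming straight-line (Euclidean) distance and a simple majority vote; the function name and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_classify(new_point, points, labels, k):
    """Return the most common label among the k points closest to new_point."""
    # Sort the known points by their distance to the new point.
    by_distance = sorted(
        zip(points, labels),
        key=lambda pair: math.dist(new_point, pair[0]),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data with two well-separated classes.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["red", "red", "red", "green", "green", "green"]

class_near_red = knn_classify((2, 2), points, labels, k=3)
class_near_green = knn_classify((6, 5), points, labels, k=3)
```

An odd k avoids tied votes when there are two classes.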
What should you do about varying levels of importance across attributes with K Nearest Neighbors? - ans--- Some attributes might be more important than others to the classification; deal with this by weighting each dimension's distance differently.
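One way to weight each dimension differently is a weighted Euclidean distance. The sketch below is illustrative only; the weights are hand-picked assumptions and would normally be chosen by validation.

```python
import math

def weighted_distance(p, q, weights):
    """Euclidean distance in which dimension i is scaled by weights[i]."""
    return math.sqrt(
        sum(w * (pi - qi) ** 2 for pi, qi, w in zip(p, q, weights))
    )

new_point = (0.0, 0.0)
neighbor_a = (1.0, 0.0)  # differs only in the first attribute
neighbor_b = (0.0, 3.0)  # differs only in the second attribute

# Equal weights: neighbor_a is the closer point.
equal_a = weighted_distance(new_point, neighbor_a, (1, 1))
equal_b = weighted_distance(new_point, neighbor_b, (1, 1))

# The first attribute is treated as far more important: neighbor_b is now closer.
weighted_a = weighted_distance(new_point, neighbor_a, (16, 1))
weighted_b = weighted_distance(new_point, neighbor_b, (16, 1))
```

Changing the weights changes which points count as nearest, and so can change the class that K Nearest Neighbors assigns.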