A+ Grade
Modeling Error
- correct answer Given a particular NN architecture, the actual model that represents the real world may
not be in that hypothesis space.
When model complexity increases, modeling error decreases, but optimization error increases.
Estimation Error
- correct answer Even if you find the best hypothesis (the weights and parameters that minimize training
error), it may not generalize to the test set.
Optimization Error
- correct answer Even if your NN can perfectly model the world, your algorithm may not find good weights
that model the function.
When model complexity increases, modeling error decreases, but optimization error increases.
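The three cards above fit into one decomposition of a model's excess error. A common way to write it (the notation here is my own, not from the cards: f* is the true function, h* the best hypothesis in the class, ĥ the training-error minimizer, and h̃ what the optimizer actually returns):

```latex
\underbrace{\mathrm{err}(\tilde{h}) - \mathrm{err}(\hat{h})}_{\text{optimization error}}
\;+\;
\underbrace{\mathrm{err}(\hat{h}) - \mathrm{err}(h^{*})}_{\text{estimation error}}
\;+\;
\underbrace{\mathrm{err}(h^{*}) - \mathrm{err}(f^{*})}_{\text{modeling error}}
```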
Effectiveness of transfer learning under certain conditions
- correct answer Remove the last FC layer of the CNN and initialize it randomly, then run the new data
through the network to train only that layer.
To train the NN for transfer learning, freeze the CNN's early layers and learn parameters only in the FC
layers.
Performs very well on a very small amount of training data if it is similar to the original data.
Does not work very well if the target task's dataset is very different.
If you have enough data in the target domain and it is different from the source, it is better to just train
on the new data.
Transfer learning = reuse features learned on a very large dataset for a completely new task
Steps:
Train on very large dataset
Take custom dataset and initialize network with weights trained in Step 1 (replace last fully connected
layer since classes in new network will be different)
Final step -> continue training on new dataset
Can either retrain all weights ("fine-tune") or freeze (i.e., not update) weights in certain layers (freezing
reduces the number of parameters that you need to learn)
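The freeze-and-retrain recipe above can be sketched without any deep-learning framework. In the toy numpy example below, the "pretrained" early layer is just a frozen random projection with a ReLU (in real transfer learning those weights would come from training on a large dataset); only the freshly initialized last layer is updated. All names and data here are hypothetical, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "early layer": stands in for pretrained CNN features.
# These weights are never updated during training on the new task.
W_frozen = rng.normal(size=(10, 32)) / np.sqrt(10)

def features(x):
    return np.maximum(0.0, x @ W_frozen)      # forward pass only

# Toy "new task" data: 200 inputs with binary labels.
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Replaced last layer: randomly/zero initialized, the only trained part.
w = np.zeros(32)
b = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

H = features(X)            # frozen layers: features computed once
losses = []
for _ in range(200):       # train only the head (logistic regression)
    p = sigmoid(H @ w + b)
    losses.append(-np.mean(y * np.log(p + 1e-9)
                           + (1 - y) * np.log(1 - p + 1e-9)))
    grad = p - y                          # dLoss/dlogits
    w -= 0.1 * (H.T @ grad) / len(y)      # update head weights only
    b -= 0.1 * grad.mean()                # W_frozen is left untouched

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because the frozen features are computed once up front, each training step only touches the 33 head parameters, which is exactly why freezing "reduces the number of parameters you need to learn."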
AlexNet
- correct answer 2x(CONV=>MAXPOOL=>NORM)=>3xCONV=>MAXPOOL=>3xFC
ReLU, specialized normalization layers, PCA-based data augmentation, Dropout, and ensembling (used 7
NNs with different random weights)
Critical development: More depth and ReLU
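One of the AlexNet tricks listed above, dropout, fits in a few lines of numpy. The sketch below uses the "inverted" formulation common today (survivors are rescaled at train time so the expected activation is unchanged); this is an illustrative variant, not necessarily the exact scaling used in the original paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(a, p=0.5, train=True):
    """Inverted dropout: zero each unit with prob p, scale survivors
    by 1/(1-p) so the expected activation is unchanged."""
    if not train:
        return a                          # identity at test time
    mask = rng.random(a.shape) >= p       # keep each unit with prob 1-p
    return a * mask / (1.0 - p)

acts = np.ones(10_000)
dropped = dropout(acts)
print(round(dropped.mean(), 2))           # close to 1.0 in expectation
```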
VGGNet
- correct answer 2x(2xCONV=>POOL)=>3x(3xCONV=>POOL)=>3xFC
Repeated application of 3x3 Conv (stride 1, padding 1) & 2x2 Max Pooling (stride 2) blocks
Very large number of parameters (most in the FC layers); most memory in the Conv layers (you are storing
the activations produced in the forward pass)
Critical Development: Blocks of repeated structures
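The claim that most parameters sit in the FC layers can be checked by direct arithmetic. The sketch below counts weights and biases for a VGG-16-style configuration (the standard channel plan; padding 1 keeps spatial size, so the final conv map is 7x7x512 for 224x224 inputs):

```python
# (in_channels, out_channels) for each 3x3 conv in VGG-16.
cfg = [(3, 64), (64, 64),                    # block 1
       (64, 128), (128, 128),                # block 2
       (128, 256), (256, 256), (256, 256),   # block 3
       (256, 512), (512, 512), (512, 512),   # block 4
       (512, 512), (512, 512), (512, 512)]   # block 5

# Each conv layer: 3*3*c_in*c_out weights + c_out biases.
conv_params = sum(3 * 3 * c_in * c_out + c_out for c_in, c_out in cfg)

# FC layers: flattened 7x7x512 map -> 4096 -> 4096 -> 1000 classes.
fc_shapes = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]
fc_params = sum(n_in * n_out + n_out for n_in, n_out in fc_shapes)

print(f"conv: {conv_params:,}  fc: {fc_params:,}")
# conv: 14,714,688  fc: 123,642,856 -> most parameters are in the FC layers
```

The first FC layer alone (25088 x 4096) accounts for over 100M of the roughly 138M total parameters, which is why removing or shrinking it is such a large saving.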
Inception Net
- correct answer Deeper and more complex than VGGNet
Average Pooling before FC Layer
Repeated blocks stacked over and over to form the NN
Blocks are made of simple layers: FC, Conv, MaxPool, and softmax
Parallel filters of different sizes to get features at multiple scales
Critical Development: Blocks of parallel paths
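The parallel-path idea can be sketched in 1-D with numpy: filters of different sizes run on the same input, and "same" padding keeps every branch's output the same length, so the branches can be concatenated channel-wise just like an Inception block's output. Filter values here are random, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=64)                        # 1-D input "feature map"

# Parallel branches with filters of different sizes (1, 3, 5), to
# capture features at multiple scales, as in an Inception block.
filters = [rng.normal(size=k) for k in (1, 3, 5)]
branch_outs = [np.convolve(x, f, mode="same") for f in filters]

# 'same' padding => every branch output has length 64, so the
# outputs can be stacked (concatenated along the channel axis).
block_out = np.stack(branch_outs)
print(block_out.shape)                         # (3, 64)
```

In a real Inception block the branches are 2-D convolutions (plus a max-pool branch) and the channel counts differ per branch, but the key mechanism is the same: equal spatial sizes let parallel paths be concatenated depth-wise.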