Modeling Error - Answers Given a particular NN architecture, the function that actually describes
the real world may not lie in the space of functions that architecture can represent.
When model complexity increases, modeling error reduces, but optimization error increases.
Estimation Error - Answers Even if we find the best hypothesis (the weights and parameters that
minimize training error), it may not generalize to the test set.
Optimization Error - Answers Even if your NN can perfectly model the world, your algo may not
find good weights that model the function.
When model complexity increases, modeling error reduces, but optimization error increases.
Effectiveness of transfer learning under certain conditions - Answers Remove last FC layer of
CNN and initialize it randomly, then run new data through network to train only that layer
To train the NN for transfer learning, freeze the CNN layers (or the early layers) and learn the
parameters in the FC layers.
Performs very well with a very small amount of training data, if it is similar to the original data
Does not work very well if the target task's dataset is very different from the source dataset
If you have enough data in the target domain and it is different from the source, it is better to
just train on the new data
Transfer learning = reuse features learned on a very large dataset for a new task
Steps:
1. Train on very large dataset
2. Take custom dataset and initialize network with weights trained in Step 1 (replace last fully
connected layer since classes in new network will be different)
3. Final step -> continue training on new dataset
Can either retrain all weights ("finetune") or freeze (i.e., not update) weights in certain layers
(freezing reduces the number of parameters that you need to learn)
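The steps above can be sketched in plain Python on a toy network. This is only an illustration under stated assumptions: the "pretrained" weights, the tiny dataset, the learning rate, and all function names are made up, and a real setup would use a deep learning framework and a real pretrained CNN.

```python
import random

random.seed(0)

# Step 1 stand-in: "pretrained" feature-extractor weights (hypothetical values;
# in practice these come from training on a very large dataset).
pretrained_w = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]  # 3 features from 2 inputs


def features(x, w):
    """Frozen layers: a linear map followed by ReLU."""
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w]


def predict(x, w, head):
    """New last layer: a single linear output on top of the features."""
    return sum(h * f for h, f in zip(head, features(x, w)))


# Step 2: keep the pretrained weights, replace the last layer with a random one.
frozen_w = [row[:] for row in pretrained_w]
head = [random.uniform(-0.1, 0.1) for _ in frozen_w]

# Small custom dataset (made up for illustration): (input, target) pairs.
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0), ([1.0, 1.0], 3.0), ([2.0, 1.0], 4.0)]


def mse(w, head):
    return sum((predict(x, w, head) - y) ** 2 for x, y in data) / len(data)


loss_before = mse(frozen_w, head)

# Step 3: continue training, updating only the head (frozen_w is never modified).
lr = 0.05
for _ in range(500):
    for x, y in data:
        f = features(x, frozen_w)
        err = predict(x, frozen_w, head) - y
        head = [h - lr * 2 * err * fi for h, fi in zip(head, f)]

loss_after = mse(frozen_w, head)
print(loss_before, loss_after)
```

Finetuning instead of freezing would mean also computing gradients for frozen_w and updating it in the same loop, which is more parameters to learn.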
AlexNet - Answers 2x(CONV=>MAXPOOL=>NORM)=>3xCONV=>MAXPOOL=>3xFC
ReLU, specialized normalization layers, PCA-based data augmentation, Dropout, Ensembling
(used 7 NN with different random weights)
Critical development: More depth and ReLU
VGGNet - Answers 2x(2xCONV=>POOL)=>3x(3xCONV=>POOL)=>3xFC
Repeated application of 3x3 Conv (stride of 1, padding of 1) & 2x2 Max Pooling (stride 2) blocks
Very large number of parameters (most in FC layers); most memory is used in Conv layers (you are
storing the activations produced in the forward pass)
Critical Development: Blocks of repeated structures
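A quick count makes the parameters-vs-memory point concrete. This is a rough sketch that assumes the standard VGG-16 configuration, which is not spelled out in these notes: 224x224x3 input, channel widths 64/128/256/512/512, padding of 1 so each conv keeps the spatial size, and FC sizes 4096/4096/1000. The function name is hypothetical.

```python
# (number of 3x3 conv layers, output channels) per stage; each stage ends in a 2x2 pool.
STAGES = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
FC_SIZES = [4096, 4096, 1000]


def vgg_counts(size=224, in_ch=3):
    """Return (conv_params, fc_params, conv_activations, fc_activations)."""
    conv_params = conv_acts = 0
    for n_convs, out_ch in STAGES:
        for _ in range(n_convs):
            conv_params += 3 * 3 * in_ch * out_ch + out_ch  # weights + biases
            conv_acts += size * size * out_ch  # stride 1, padding 1 keeps the size
            in_ch = out_ch
        size //= 2  # 2x2 max pooling with stride 2 halves height and width

    fc_params = fc_acts = 0
    fan_in = size * size * in_ch  # flattened output of the last pooling layer
    for fan_out in FC_SIZES:
        fc_params += fan_in * fan_out + fan_out
        fc_acts += fan_out
        fan_in = fan_out
    return conv_params, fc_params, conv_acts, fc_acts


conv_p, fc_p, conv_a, fc_a = vgg_counts()
print("params:      conv", conv_p, "fc", fc_p)
print("activations: conv", conv_a, "fc", fc_a)
```

Under these assumptions the FC layers hold most of the parameters, while the conv layers produce almost all of the stored activations.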
Inception Net - Answers Deeper and more complex than VGGNet
Average Pooling before FC Layer
The same block is stacked repeatedly to form the NN
Blocks are made of simple layers: FC, Conv, MaxPool, and softmax
Parallel filters of different sizes to get features at multiple scales
Critical Development: Blocks of parallel paths
Uses the Network in Network concept, i.e., 1x1 convolution as a sort of dimensionality reduction
(see slide)
Downside: increased computational work
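To see why the 1x1 convolution helps with the computational cost, here is a small multiplication count. The shapes (28x28x192 input, 32 filters of size 5x5, reduction to 16 channels) are illustrative assumptions, not numbers from these notes, and the helper name is hypothetical.

```python
def conv_mults(h, w, in_ch, out_ch, k):
    """Multiplications for a k x k convolution whose output is also h x w."""
    return h * w * out_ch * k * k * in_ch


H = W = 28      # assumed spatial size
IN_CH = 192     # assumed input channels
OUT_CH = 32     # assumed number of 5x5 filters
REDUCED = 16    # assumed channels after the 1x1 "bottleneck"

# 5x5 convolution applied directly to all 192 input channels.
direct = conv_mults(H, W, IN_CH, OUT_CH, 5)

# 1x1 convolution down to 16 channels first, then the 5x5 convolution.
bottleneck = conv_mults(H, W, IN_CH, REDUCED, 1) + conv_mults(H, W, REDUCED, OUT_CH, 5)

print(direct, bottleneck, direct / bottleneck)
```

With these assumed shapes the bottleneck version needs roughly a tenth of the multiplications, which is the dimensionality-reduction effect of the 1x1 convolution.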
ResNet - Answers Allow information from a layer to propagate to a future layer
Passes the output of the layer at depth x forward (skip connection) and adds it to the output of
the layer at x+1, so that layer only has to learn the residual
Averaging block at end
Critical Development: Passing residuals of previous layers forward
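A minimal sketch of the skip connection idea on plain Python lists, assuming a block of the form y = x + F(x) with F = linear -> ReLU -> linear. The function names and weights are hypothetical, and this simplified version leaves out details such as batch normalization and the ReLU that the original architecture applies after the addition.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(v, w):
    """Matrix-vector product; w is a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, v)) for row in w]


def residual_block(x, w1, w2):
    """y = x + F(x): the layers only have to model the residual F."""
    fx = linear(relu(linear(x, w1)), w2)
    return [xi + fi for xi, fi in zip(x, fx)]


x = [1.0, -2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

# With all-zero weights F(x) = 0, so the block passes its input through unchanged.
print(residual_block(x, zeros, zeros))
# With identity weights F(x) = relu(x), which is added on top of x.
print(residual_block(x, identity, identity))
```

The zero-weight case shows why information propagates easily: a block that has learned nothing still behaves like the identity instead of destroying its input.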
Convolutional layers and how they work (forward/backward) - Answers
https://www.youtube.com/watch?v=Lakz2MoHy6o&t=1299s
(Don't have a good short summary)
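As a stand-in for a short summary, here is a 1D sketch of the forward and backward pass, assuming a single filter, stride 1, no padding, and cross-correlation (the usual "convolution" in deep learning). Function names are hypothetical; the 2D case follows the same pattern with two spatial indices.

```python
def conv1d_forward(x, w):
    """Forward: y[i] = sum_k x[i + k] * w[k] (slide the filter over the input)."""
    n_out = len(x) - len(w) + 1
    return [sum(x[i + k] * w[k] for k in range(len(w))) for i in range(n_out)]


def conv1d_backward(x, w, dy):
    """Backward: given dy = dL/dy, return (dL/dx, dL/dw)."""
    # Each weight w[k] touches x[i + k] for every output i, so its gradient sums over i.
    dw = [sum(dy[i] * x[i + k] for i in range(len(dy))) for k in range(len(w))]
    # Each input x[j] contributes to every output whose window covers it.
    dx = [0.0] * len(x)
    for i in range(len(dy)):
        for k in range(len(w)):
            dx[i + k] += dy[i] * w[k]
    return dx, dw


x = [1.0, 2.0, -1.0, 0.5, 3.0]
w = [0.5, -1.0, 2.0]

y = conv1d_forward(x, w)
# Take the loss L = sum(y), so the upstream gradient dL/dy is all ones.
dx, dw = conv1d_backward(x, w, [1.0] * len(y))
print(y, dx, dw)
```

Note that dw is itself a cross-correlation of the input with dy, and dx spreads each dy[i] back over the window of inputs that produced y[i].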
Equivariance - Answers If the input changes, the output changes in the same way if f(g(x)