Convolution Features - answer-edges
colors
textures
motifs (corners, shapes)
Receptive field - answer-A region of an image (image patch) from which the
node receives input. Usually denoted by a K1 x K2 matrix.
Convolution vs Cross-correlation - answer-Convolution: flip the kernel (rotate
it 180°) and take the dot product with the image patch
Cross-correlation: take the dot product with the image patch without flipping
the kernel
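The difference can be seen on a tiny example. This is a minimal sketch in plain Python, assuming "valid" sliding (no padding, stride 1); the function names and the sample image and kernel are illustrative and not part of the original card.

```python
def cross_correlate2d(image, kernel):
    """Valid cross-correlation: slide the kernel over the image without flipping it."""
    k1, k2 = len(kernel), len(kernel[0])
    out_h = len(image) - k1 + 1
    out_w = len(image[0]) - k2 + 1
    return [
        [
            # Dot product of the kernel with the K1 x K2 patch at (r, c).
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k1) for j in range(k2))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def flip180(kernel):
    """Rotate the kernel by 180 degrees (reverse the rows, then each row)."""
    return [row[::-1] for row in kernel[::-1]]


def convolve2d(image, kernel):
    """Convolution is cross-correlation with the flipped kernel."""
    return cross_correlate2d(image, flip180(kernel))


# Hypothetical sample data.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]

xcorr = cross_correlate2d(image, kernel)  # [[-4, -4], [-4, -4]]
conv = convolve2d(image, kernel)          # [[4, 4], [4, 4]]
```

For a kernel that is symmetric under a 180° rotation, the two operations give the same result.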
Advantage of using image patch - answer-1. Reduces the input parameters to
K1 x K2 + 1 (bias) for each output node. Thus, with N output nodes, the total
number of parameters is N x (K1 x K2 + 1).
2. Explicitly maintains spatial information
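As a rough worked example of the saving, the sketch below compares the patch-based count, K1 x K2 weights plus one bias per output node, with a fully connected layer in which every output node sees the whole image. The sizes (100 output nodes, 3 x 3 patches, a 28 x 28 image) and the function names are assumptions chosen for illustration.

```python
def patch_params(num_outputs, k1, k2):
    """Each output node has K1 x K2 weights plus one bias."""
    return num_outputs * (k1 * k2 + 1)


def fully_connected_params(num_outputs, height, width):
    """Each output node has one weight per pixel plus one bias."""
    return num_outputs * (height * width + 1)


# Hypothetical sizes.
with_patches = patch_params(100, 3, 3)                    # 100 * (9 + 1)   = 1000
fully_connected = fully_connected_params(100, 28, 28)     # 100 * (784 + 1) = 78500
```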
How many learned parameters does a max pooling layer have? - answer-None
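A minimal sketch of max pooling makes this concrete: the only inputs besides the feature map are the window size and stride, which are fixed hyperparameters rather than learned weights. The function name, default values, and sample feature map are illustrative assumptions.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Take the maximum over each size x size window; nothing here is learned."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(feature_map[r * stride + i][c * stride + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 4 x 4 feature map.
feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [3, 2, 1, 1]]

pooled = max_pool2d(feature_map)  # [[4, 5], [3, 7]]
```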
Invariance - answer-If the feature shifts, rotates, or otherwise changes
slightly in the image, the output value remains the same. (For example, we
classify the image as a cat regardless of where the cat is in the image)
Equivariance - answer-If the feature translates or moves a little bit, the output
values move by the same translation and can be detected in the new location.
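Both properties can be illustrated in one dimension. In this sketch, which assumes valid cross-correlation with stride 1 and uses made-up signals, shifting the input by two positions shifts the feature response by two positions (equivariance), while a global max over the response is unchanged (invariance).

```python
def cross_correlate1d(signal, kernel):
    """Valid 1D cross-correlation with stride 1."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


kernel = [1, -1]                      # simple edge detector (illustrative)
signal = [0, 0, 5, 0, 0, 0, 0]
shifted = [0, 0, 0, 0, 5, 0, 0]       # same feature, moved right by two

out = cross_correlate1d(signal, kernel)           # [0, -5, 5, 0, 0, 0]
out_shifted = cross_correlate1d(shifted, kernel)  # [0, 0, 0, -5, 5, 0]

# Equivariance: the response moved by the same two positions.
is_equivariant = out_shifted[2:] == out[:-2]

# Invariance: a global max pool gives the same value for both inputs.
is_invariant = max(out) == max(out_shifted)
```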
Why would different kernels learn different features? - answer-Because we
initialize them to different values, they start at different points in weight
space, receive different gradients, and converge toward different local
minima --> the kernels learn different features.
If cross-correlation is the forward pass,
then gradient w.r.t. the input is ... - answer-CONVOLUTION between the
upstream gradient and the kernel weights
If cross-correlation is the forward pass,
then gradient w.r.t the kernel is ... - answer-CROSS-CORRELATION between the
upstream gradient and the input
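Both gradient rules can be checked numerically. The sketch below assumes a single-channel, stride-1, unpadded forward pass and interprets the input gradient as a full convolution (zero-pad the upstream gradient, flip the kernel, then cross-correlate). All names and sample values are illustrative. Because the loss used here is linear in both the input and the kernel, bumping one entry by 1 changes the loss by exactly the corresponding gradient entry.

```python
def xcorr2d(a, k):
    """Valid cross-correlation of 2D list a with kernel k (no flip)."""
    kh, kw = len(k), len(k[0])
    return [[sum(a[r + i][c + j] * k[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(len(a[0]) - kw + 1)]
            for r in range(len(a) - kh + 1)]


def flip180(k):
    """Rotate a kernel by 180 degrees."""
    return [row[::-1] for row in k[::-1]]


def zero_pad(a, ph, pw):
    """Pad a 2D list with ph rows and pw columns of zeros on every side."""
    width = len(a[0]) + 2 * pw
    rows = [[0] * pw + list(row) + [0] * pw for row in a]
    border = [[0] * width for _ in range(ph)]
    return border + rows + [[0] * width for _ in range(ph)]


def grad_kernel(x, upstream):
    """dL/dW: cross-correlation between the input and the upstream gradient."""
    return xcorr2d(x, upstream)


def grad_input(w, upstream):
    """dL/dX: full convolution between the upstream gradient and the kernel."""
    padded = zero_pad(upstream, len(w) - 1, len(w[0]) - 1)
    return xcorr2d(padded, flip180(w))


def loss(x, w, upstream):
    """L = sum(upstream * Y), so that dL/dY equals upstream."""
    y = xcorr2d(x, w)
    return sum(u * v for ur, yr in zip(upstream, y) for u, v in zip(ur, yr))


# Hypothetical sample values (integers keep the check exact).
x = [[1, 2, 0],
     [0, 1, 3],
     [2, 1, 1]]
w = [[1, 2],
     [3, 4]]
upstream = [[1, 0],
            [0, -1]]

dw = grad_kernel(x, upstream)
dx = grad_input(w, upstream)

# Finite-difference check on one input entry: bump x[1][1] by 1.
bumped = [row[:] for row in x]
bumped[1][1] += 1
matches = loss(bumped, w, upstream) - loss(x, w, upstream) == dx[1][1]
```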
LeNet - answer-simple conv architecture:
Conv - MaxPool - Conv - MaxPool - FC - FC - Gaussian (=MSE loss)
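To see how the spatial size shrinks through this stack, the sketch below tracks the feature-map width through the Conv - MaxPool - Conv - MaxPool portion. The sizes are assumptions taken from the commonly cited LeNet-5 configuration (32 x 32 input, 5 x 5 kernels with stride 1, 2 x 2 pooling with stride 2, 16 channels after the second conv); the card itself does not state them.

```python
def out_size(size, kernel, stride=1):
    """Output width of an unpadded conv or pooling layer."""
    return (size - kernel) // stride + 1


# (layer name, kernel size, stride) -- assumed LeNet-5-style values.
layers = [("Conv", 5, 1), ("MaxPool", 2, 2), ("Conv", 5, 1), ("MaxPool", 2, 2)]

size = 32  # assumed input width and height
shapes = []
for name, kernel, stride in layers:
    size = out_size(size, kernel, stride)
    shapes.append((name, size))

# shapes: Conv 28, MaxPool 14, Conv 10, MaxPool 5
flattened = 16 * size * size  # 16 assumed channels -> 400 inputs to the first FC layer
```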