network with sigmoid activation function. Derive the update equations for weights and
biases.
Backpropagation is a commonly used technique for training neural networks. It computes
the gradients layer by layer, in reverse order.
The neural network will be built with three layers: an input layer with two input neurons, one
hidden layer with two neurons, and an output layer with a single neuron.
Gradient descent is an iterative optimization algorithm for finding the minimum of a
function; in our case we want to minimize the error function. To find a local minimum of a
function using gradient descent, one takes steps proportional to the negative of the gradient of
the function at the current point.
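As a minimal sketch of this idea in Python, the following snippet minimizes the illustrative one-variable function $f(w) = (w - 3)^2$, whose gradient is $2(w - 3)$; the starting point and learning rate are assumptions chosen for the example:

def gradient(w):
    # Derivative of the illustrative function f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w = 0.0        # arbitrary starting point
alpha = 0.1    # learning rate (step size)
for _ in range(100):
    w = w - alpha * gradient(w)   # step opposite the gradient
print(w)  # approaches the minimum at w = 3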
Forward Pass: Given input data $X$ and target output $Y$, calculate the predicted
output $\hat{Y}$ using the current weights and biases. Input: $X = [x_1, \dots, x_d]$. Hidden
layer: net input to neuron $j$: $z_j^h = \sum_k w_{jk} x_k + b_j^h$; output of neuron $j$: $a_j^h = \sigma(z_j^h)$.
Output layer: net input to the output neuron: $z^o = \sum_j w_j^o a_j^h + b^o$; output (prediction): $\hat{y} = \sigma(z^o)$.
Then apply the sigmoid activation function to the hidden layer and the output layer. The most
commonly used activation function is the sigmoid: $\sigma(z) = \frac{1}{1 + \exp(-z)}$.
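As a concrete illustration of the forward pass and the sigmoid, here is a minimal sketch in Python (using NumPy) for the 2-2-1 architecture above; all numeric weight, bias, and input values are illustrative assumptions, not values from the text:

import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative parameters for the 2-2-1 network (assumed values).
W_h = np.array([[0.15, 0.20],    # weights into hidden neuron 1 (w1, w2)
                [0.25, 0.30]])   # weights into hidden neuron 2 (w3, w4)
b_h = np.array([0.35, 0.35])     # hidden-layer biases
W_o = np.array([0.40, 0.45])     # weights into the output neuron (w5, w6)
b_o = 0.60                       # output bias

def forward(x):
    z_h = W_h @ x + b_h      # net input to each hidden neuron
    a_h = sigmoid(z_h)       # hidden activations
    z_o = W_o @ a_h + b_o    # net input to the output neuron
    y_hat = sigmoid(z_o)     # prediction
    return a_h, y_hat

a_h, y_hat = forward(np.array([0.05, 0.10]))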
Calculate Error: to find out how the network performed, compute the error between the
predicted output $\hat{Y}$ and the actual output $Y$ using the squared error:
$\text{Error} = \frac{1}{2}(\text{Prediction} - \text{Actual})^2$, so the error is zero when the prediction equals the actual output.
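In code, this error term is a one-line function, consistent with the formula above:

def squared_error(prediction, actual):
    # Error = 1/2 * (Prediction - Actual)^2; zero when they match
    return 0.5 * (prediction - actual) ** 2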
Reducing error: the main goal of training is to reduce the error, that is, the difference
between the prediction and the actual output. Since the actual output is constant, "not changing", the
only way to reduce the error is to change the prediction value. Decomposing the
prediction into its basic elements shows that the weights are the variable
elements affecting the prediction value. In other words, in order to change the prediction value, we
need to change the weight values.
$\text{Prediction} = h_1 w_5 + h_2 w_6 = (i_1 w_1 + i_2 w_2)\,w_5 + (i_1 w_3 + i_2 w_4)\,w_6$
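For example, applying the chain rule to this decomposition (with the activations omitted, as in the expression above) shows how the error depends on a single weight such as $w_5$:

$\frac{\partial\,\text{Error}}{\partial w_5} = \frac{\partial\,\text{Error}}{\partial\,\text{Prediction}} \cdot \frac{\partial\,\text{Prediction}}{\partial w_5} = (\text{Prediction} - \text{Actual}) \cdot h_1$

so the gradient with respect to each weight is available in closed form, which is exactly what the update rule below uses.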
To change or update the weight values so that the error is reduced, back-propagation is
used.
Backward Pass: Compute the gradient of the loss function with respect to the output layer
weights and biases. Backpropagation, short for “backward propagation of errors”, is a
mechanism used to update the weights using gradient descent. It calculates the gradient of the
error function with respect to the neural network’s weights. The calculation proceeds
backwards through the network. Neural network training is about finding weights that
minimize prediction error. We usually start our training with a set of randomly generated
weights. Then, backpropagation is used to update the weights in an attempt to correctly map
arbitrary inputs to outputs. Let the initial weights be $w_1, \dots, w_6$.
$W_x = W_x - \alpha \, \frac{\partial\,\text{Error}}{\partial W_x}$
New Weight = Old Weight − learning rate × Gradient
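Putting the forward pass, the error, the backward pass, and the update rule together, here is a minimal single training step sketch in Python for the 2-2-1 sigmoid network; the learning rate and the NumPy-based structure are assumptions for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, y, W_h, b_h, W_o, b_o, alpha=0.5):
    # Forward pass (same equations as in the text).
    z_h = W_h @ x + b_h
    a_h = sigmoid(z_h)
    z_o = W_o @ a_h + b_o
    y_hat = sigmoid(z_o)

    # Backward pass: propagate dError/d(net input) from the output back.
    # y_hat * (1 - y_hat) is the sigmoid derivative sigma'(z) = sigma(z)(1 - sigma(z)).
    delta_o = (y_hat - y) * y_hat * (1.0 - y_hat)   # dError/dz_o
    grad_W_o = delta_o * a_h                        # dError/dw5, dError/dw6
    grad_b_o = delta_o
    delta_h = delta_o * W_o * a_h * (1.0 - a_h)     # dError/dz_h
    grad_W_h = np.outer(delta_h, x)                 # dError/dw1 ... dError/dw4
    grad_b_h = delta_h

    # Gradient-descent update: new weight = old weight - alpha * gradient.
    W_o = W_o - alpha * grad_W_o
    b_o = b_o - alpha * grad_b_o
    W_h = W_h - alpha * grad_W_h
    b_h = b_h - alpha * grad_b_h
    return W_h, b_h, W_o, b_o

Calling train_step repeatedly on the same (X, Y) pair drives the squared error down, which is the behaviour the derivation above predicts.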