Incomplete - Need to study more

Batch Normalization is important for training deep neural networks. VGG-16 and VGG-19 were developed before BN, so they trained an 11-layer network first, then added a few layers, trained again, and so on. Inception used auxiliary losses during training - not strictly necessary, but they help propagate the loss signal back into the early layers.

Residual Nets - two important properties:
1. If the weights of a residual block are zero, the block behaves as an identity transformation, so the network can choose what it doesn't need. It is easy for the model to learn not to use layers it doesn't need; L2 regularization pushes weights toward zero, which nudges unneeded blocks toward the identity.
2. Gradient flow in the backward pass is easy (the skip connection gives gradients a direct path), so deeper nets can be designed.

DenseNet and FractalNet - Study!

Recurrent Neural Network - Variable Size Data

x -------> [ RNN ] --------> y

Every time x is fed in, the RNN's hidden state is updated. Here's the difference: the internal hidden state is fed back into the model on the next input, and so on. So each step: input -> update hidden state -> ...
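To make the BN point concrete, here is a minimal numpy sketch of the batch-norm forward step (normalize each feature over the batch, then scale and shift); the function name and shapes are just illustrative assumptions:

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the batch, then apply learnable scale/shift.
    mu = x.mean(axis=0)                        # per-feature mean over the batch
    var = x.var(axis=0)                        # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)      # zero-mean, unit-variance features
    return gamma * x_hat + beta                # scale (gamma) and shift (beta)

x = np.random.randn(32, 10)                    # batch of 32 examples, 10 features
out = batchnorm_forward(x, gamma=np.ones(10), beta=np.zeros(10))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # ~0 and ~1 per feature
```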
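A tiny numpy sketch of the residual-block identity property noted above: if the residual branch's weights are zero, the block output is exactly the input. The weight shapes and ReLU choice here are assumptions for illustration, not any particular paper's block:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def residual_block(x, W1, W2):
    # Residual branch F(x) = W2 @ relu(W1 @ x); the skip connection adds x back.
    return W2 @ relu(W1 @ x) + x

x = np.random.randn(4)
W_zero = np.zeros((4, 4))
# With the residual weights at zero, F(x) = 0 and the block is the identity,
# which is the state L2 regularization pulls unneeded blocks toward.
assert np.allclose(residual_block(x, W_zero, W_zero), x)
```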
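And a minimal numpy sketch of the RNN recurrence described above: the same hidden state h is updated from each input and fed back in on the next step. The tanh nonlinearity and matrix names (W_xh, W_hh) follow the standard vanilla-RNN formulation; the sizes are arbitrary:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # h_t = tanh(W_xh @ x_t + W_hh @ h_prev + b): the new hidden state mixes
    # the current input with the previous state and is fed back next step.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

np.random.seed(0)
W_xh, W_hh, b_h = np.random.randn(5, 3), np.random.randn(5, 5), np.zeros(5)

h = np.zeros(5)                          # initial hidden state
for x_t in np.random.randn(7, 3):        # a length-7 sequence; any length works
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
# The same update is applied at every step, which is why the RNN handles
# variable-size input: h after the loop summarizes the whole sequence.
```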