
Posts

Showing posts from January, 2019

Transpose Convolution or Deconvolution

Transpose Convolution Explained (Source: Primary)

Upsampling vs Transpose Convolution: https://stackoverflow.com/questions/48226783/

Upsampling [not trainable]: In Keras with the TensorFlow backend, UpSampling2D simply calls TensorFlow's resize_images function, which is plain interpolation (resizing) with nothing to train.

For Caffe: https://stackoverflow.com/questions/40872914/

layer {
  name: "upsample"
  type: "Deconvolution"
  bottom: "{{bottom_name}}"
  top: "{{top_name}}"
  convolution_param {
    kernel_size: {{2 * factor - factor % 2}}
    stride: {{factor}}
    num_output: {{C}}
    group: {{C}}
    pad: {{ceil((factor - 1) / 2.)}}
    weight_filler: { type: "bilinear" }
    bias_term: false
  }
  param { lr_mult: 0 decay_mult: 0 }
}

By specifying num_output: {{C}} and group: {{C}}, it behaves as a channel-wise convolution. The filter shape of this deconvolution…
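To see the Keras side of this concretely, here is a minimal sketch (assuming the tf.keras API; the 8x8x3 input shape, filter count, and kernel size are arbitrary illustration choices) contrasting a non-trainable UpSampling2D with a trainable Conv2DTranspose:

# Sketch: UpSampling2D (pure interpolation, no weights) vs Conv2DTranspose (learned upsampling).
# Input shape and layer settings are arbitrary illustration choices.
import tensorflow as tf
from tensorflow.keras import layers, models

inp = layers.Input(shape=(8, 8, 3))

# Fixed interpolation: just resizes the feature map by 2x, nothing to learn.
up = layers.UpSampling2D(size=(2, 2))(inp)

# Transpose convolution: learns a kernel while also upsampling by 2x.
deconv = layers.Conv2DTranspose(filters=3, kernel_size=4, strides=2, padding="same")(inp)

print(models.Model(inp, up).count_params())      # 0: nothing trainable
print(models.Model(inp, deconv).count_params())  # 147 = 4*4*3*3 + 3: trainable kernel + bias

Both outputs have spatial size 16x16; the only difference is whether the upsampling is fixed interpolation or learned.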

Weight Initialization

Why do we initialize with random weights? (Why not start with all-zero or all-equal weights?) Because then the update would be identical for every neuron: all neurons compute the same thing and receive the same gradient, so the symmetry is never broken.

Why do we initialize with small weights? (Why not large weights?) And are there problems with weights that are too small? Yes, and the effect depends on the activation. Say we use tanh: as we move forward through the network the activations stay roughly zero-mean (because tanh is zero-centered), but their standard deviation keeps shrinking, so by the last layers the activations are almost zero and the final output is nearly zero. In backprop the gradients are then also small, so the weights barely update: vanishing gradients. With weights that are too large, the activations fall into the saturation region of tanh, and at saturation the gradient is again low, so we hit the same problem. (What happens with ReLU is a separate question.)

So the weight initialization must be chosen based on the activation function: https://machinelearningmastery.com/weight-initialization-for-deep-learning-neura
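A quick way to check the tanh argument above is to push random data through a stack of tanh layers under different weight scales, in the spirit of the cs231n activation-statistics experiment. This is a minimal sketch (numpy only; the width, depth, and the 0.01 / 1.0 scales are arbitrary illustration choices):

# Sketch: std of tanh activations layer by layer for different weight-initialization scales.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 500))          # 1000 samples, 500 features

def forward_stats(scale, n_layers=10, width=500):
    h = x
    stds = []
    for _ in range(n_layers):
        W = rng.standard_normal((h.shape[1], width)) * scale
        h = np.tanh(h @ W)                    # forward through one tanh layer
        stds.append(h.std())
    return np.round(stds, 4)

print("scale 0.01 :", forward_stats(0.01))                # stds collapse toward 0 -> tiny gradients
print("scale 1.0  :", forward_stats(1.0))                 # activations saturate near +/-1 -> tiny gradients
print("Xavier-like:", forward_stats(1.0 / np.sqrt(500)))  # stds stay roughly stable across layers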

Implement Neural Network

All the input features are weighted to obtain one output; if we want more outputs we need more weights accordingly. So if the input has D features (laid out as a row), a single output needs D weights, which form one column (of height D) of the weight matrix, and each additional output adds another column.

[Figure made with https://www.draw.io/]

Depending on how you lay out the input, the dimensions of the weight matrix and the output follow accordingly. There are two cases:

Case 1: Y = X * W [this is the case the figure above considers]. Each row of X is a single data point, so the number of columns of X is the number of input features. Then:

Size of Weight Matrix = Feature Length of Input x Feature Length of Output, i.e. W = Input x Output.

So in the output Y, each row is again a single data point, and the number of output features is the number of columns.

Case 2: Y = W * X [https://medium.com/usf-msds/deep-learning-best-practices-1
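The two layouts can be verified directly from numpy shapes. A minimal sketch (the batch size N = 4, D = 3 input features, and M = 2 outputs are arbitrary illustration choices):

# Sketch: Case 1 (rows are samples, Y = X @ W) vs Case 2 (columns are samples, Y = W @ X).
import numpy as np

N, D, M = 4, 3, 2                     # batch size, input features, output features
rng = np.random.default_rng(0)

# Case 1: W has shape (D, M) = Input x Output.
X1 = rng.standard_normal((N, D))
W1 = rng.standard_normal((D, M))
Y1 = X1 @ W1
print(Y1.shape)                       # (4, 2): each row of Y is one sample

# Case 2: W has shape (M, D) = Output x Input, and X holds one sample per column.
X2 = X1.T                             # (D, N)
W2 = W1.T                             # (M, D)
Y2 = W2 @ X2
print(Y2.shape)                       # (2, 4): each column of Y is one sample
print(np.allclose(Y1, Y2.T))          # True: the two conventions are transposes of each other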

Activation Functions in Convolutional Neural Network / Data Preprocessing

Slides from Stanford cs231n, Lecture 6.

Sigmoid: output range 0 (as the input goes to -inf) to 1 (as the input goes to +inf). Problems:

a) Saturation: at saturation the gradient is (near) zero, so the weights stop updating, or update very slowly. Stopping updates could in principle mean the model is already perfect and fits the training data exactly, but if a neuron is saturated right at initialization, that is a problem. Saturation is a problem because it erodes the plasticity of neural networks and usually results in worse test performance, similar to overfitting. [Source]

b) Outputs are not zero-centered: the output range is 0 to 1, so the outputs are always positive rather than symmetric around zero. [Time 12:00] Consider a neuron f = wx + b whose input x is the (all-positive) sigmoid output of the previous layer, with the loss L downstream of f. The weight gradient is dL/dw = (dL/df) * (df/dw), and df/dw = x is always positive, while dL/df can be positive or negative. So the gradients on all the weights of that neuron are all positive or all negative at once. That is why we want x to be a mixture of positive and negative values.

c) The exp() function is computationally expensive, though that is not a big issue.

Tanh: output range -1 to 1 [zero-centered!]
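The saturation point in (a) is easy to check numerically, since the sigmoid's local gradient is sigmoid(x) * (1 - sigmoid(x)), which peaks at 0.25 at x = 0 and practically vanishes for large |x|. A minimal sketch (the probe values are arbitrary illustration choices):

# Sketch: sigmoid saturation. The local gradient sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
# is at most 0.25 (at x = 0) and is essentially zero once |x| is large.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

xs = np.array([-20.0, -5.0, -1.0, 0.0, 1.0, 5.0, 20.0])
s = sigmoid(xs)
grad = s * (1.0 - s)

for x, g in zip(xs, grad):
    print(f"x = {x:6.1f}   sigmoid'(x) = {g:.2e}")
# Around x = +/-20 the local gradient is ~2e-9, so any weight feeding a saturated
# sigmoid neuron receives an almost-zero update during backprop.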