Algidus

Posts

Transpose Convolution or Deconvolution

Transpose Convolution Explained
Source: Primary
Read More
Upsampling vs Transpose Convolution: https://stackoverflow.com/questions/48226783/
Upsampling [not trainable]: In Keras with the TensorFlow backend, the upsampling layer calls TensorFlow's resize_images function, which is essentially an interpolation (like resizing an image) and has no trainable weights.
For Caffe: https://stackoverflow.com/questions/40872914/
layer {
  name: "upsample", type: "Deconvolution"
  bottom: "{{bottom_name}}" top: "{{top_name}}"
  convolution_param {
    kernel_size: {{2 * factor - factor % 2}} stride: {{factor}}
    num_output: {{C}} group: {{C}}
    pad: {{ceil((factor - 1) / 2.)}}
    weight_filler: { type: "bilinear" } bias_term: false
  }
  param { lr_mult: 0 decay_mult: 0 }
}
By specifying num_output: {{C}} group: {{C}}, it behaves as channel-wise convolution. The filter shape of this deconvolu...
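The kernel_size and pad expressions in the Caffe layer can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the output-size formula is the usual one for a transposed convolution without dilation, which is assumed to apply here.

```python
import math


def caffe_upsample_params(factor):
    """Evaluate the template expressions from the layer for an integer factor."""
    kernel_size = 2 * factor - factor % 2
    stride = factor
    pad = math.ceil((factor - 1) / 2.0)
    return kernel_size, stride, pad


def deconv_output_size(in_size, kernel_size, stride, pad):
    """Spatial output size of a transposed convolution (no dilation assumed)."""
    return stride * (in_size - 1) + kernel_size - 2 * pad


if __name__ == "__main__":
    for factor in (2, 3, 4):
        k, s, p = caffe_upsample_params(factor)
        out = deconv_output_size(10, k, s, p)
        print("factor", factor, "kernel", k, "stride", s, "pad", p, "10 ->", out)
```

With these settings an input of size 10 comes out at 10 x factor, which is what an upsampling layer should do.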

Weight Initialization

Weight Initialization
Why do we initialize with random weights? (Why don't we start with all-zero or all-equal weights?) Because every neuron would then receive the same update, so the neurons would never learn different features.
Why do we initialize with small weights? (Why not large weights?) And are there any problems with weights that are too small?
Too-small weights are a problem, and how much depends on the activation. Take tanh: as we move forward through the network the mean stays around zero (because tanh is zero-centered), but the standard deviation keeps shrinking, so in the last layers the activations are tiny and the final output is almost zero. In backprop the gradients are then also small, so the weights barely update: the vanishing gradient problem.
With large weights, the activations fall in the saturation region, where the gradient is also small, so the same thing happens. But what about ReLU?
So weight initialization must differ depending on the activation function: https://machinelearningmastery.com/weight-initialization-for-deep-learning-neura...
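The tanh behaviour described here can be reproduced with a small pure-Python experiment. It is a rough sketch, not the post's own code: the layer width, depth, sample count, seed and the two weight scales (0.01 and 1.0) are arbitrary choices made for illustration.

```python
import math
import random


def activation_stats(weight_std, n_layers=8, width=50, n_samples=20, seed=0):
    """Push random inputs through tanh layers with N(0, weight_std) weights.

    Returns the standard deviation of the activations after each layer and
    the fraction of last-layer activations with |value| > 0.99 (saturated).
    """
    rng = random.Random(seed)
    x = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(n_samples)]
    stds = []
    flat = []
    for _ in range(n_layers):
        w = [[rng.gauss(0.0, weight_std) for _ in range(width)]
             for _ in range(width)]
        x = [[math.tanh(sum(row[i] * w[i][j] for i in range(width)))
              for j in range(width)]
             for row in x]
        flat = [v for row in x for v in row]
        mean = sum(flat) / len(flat)
        stds.append(math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat)))
    saturated = sum(1 for v in flat if abs(v) > 0.99) / float(len(flat))
    return stds, saturated


if __name__ == "__main__":
    small_stds, small_sat = activation_stats(0.01)
    large_stds, large_sat = activation_stats(1.0)
    print("small weights, std per layer:", small_stds)
    print("large weights, saturated fraction in last layer:", large_sat)
```

With the small scale the per-layer standard deviation collapses towards zero; with the large scale most activations sit near -1 or +1, where the tanh gradient is close to zero.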

Implement Neural Network

Implementing Neural Network
All the input features are weighted to obtain one output; if we want more outputs, we need more weights. With D input features (laid out along a row of the input), each output needs D weights (one column of the weight matrix, in the implementation considered here). For more outputs, we add more columns.
https://www.draw.io/ File
The dimensions of the weight matrix depend on how you lay out the input. There are two cases:
Case 1: Y = X * W [This is the case the above figure considers]
Each row of X is a single data sample, so the number of columns of X is the number of input features. Then,
Size of Weight Matrix = Feature Length of Input x Feature Length of Output
W = Input x Output
So in the output Y, each row is a single data sample, and the number of columns is the number of output features.
Case 2: Y = W * X [https://medium.com/usf-msds/deep-learning-best-practic...
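A tiny shape check for Case 1 (Y = X * W), written in plain Python so it runs without any library. The matmul and shape helpers and the example numbers are invented for illustration: 2 samples, 3 input features, 2 output features.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    assert len(a[0]) == len(b), "columns of a must equal rows of b"
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def shape(m):
    """Return (rows, columns) of a list-of-rows matrix."""
    return (len(m), len(m[0]))


# Each row of X is one data sample; columns are input features (D = 3).
X = [[1, 2, 3],
     [4, 5, 6]]

# W is (input features) x (output features) = 3 x 2.
W = [[1, 0],
     [0, 1],
     [1, 1]]

Y = matmul(X, W)

if __name__ == "__main__":
    print("X", shape(X), "W", shape(W), "Y", shape(Y))
    print(Y)
```

Each row of Y is still one sample, and adding a column to W adds one output feature.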

Activation Functions in Convolutional Neural Network / Data Preprocessing

Slides from Stanford cs231n - Lec 6
Sigmoid:
Output range: 0 (for -inf) to 1 (for +inf)
Problems:
a) Saturation - In the saturated regions the gradient is close to zero, so the weights stop updating or update very slowly. If updates stop late in training, it may just mean the model fits the training data perfectly. But if a neuron is already saturated at initialization, that is a problem. Saturation erodes the plasticity of neural networks and usually results in worse test performance, much like overfitting. [ Source ]
b) Outputs are not zero-centered [output range is 0 to 1]. (Time 12:00) The weight gradient is dL/dw = dL/df * df/dw, with f = wx + b. Here df/dw = x, which is always +ve when x comes out of a sigmoid, while dL/df can be +ve or -ve. So the gradients of all the weights of a neuron are either all +ve or all -ve. That is why we want x to be a mixture of +ve and -ve values.
c) exp() is computationally expensive - not a big issue though
Tanh: Output range -1 to 1 [ Zero Cent...
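Points a) and b) can be checked numerically. This is a small sketch with illustrative helper names (sigmoid_grad, weight_grads) and made-up input values; it is not code from the lecture.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)


def weight_grads(upstream, x):
    """dL/dw_i = dL/df * x_i for f = w.x + b, with a scalar upstream dL/df."""
    return [upstream * xi for xi in x]


if __name__ == "__main__":
    # a) Saturation: the gradient is largest at 0 and tiny far from 0.
    print(sigmoid_grad(0.0), sigmoid_grad(10.0), sigmoid_grad(-10.0))

    # b) Inputs that come from a sigmoid are all positive, so every weight
    #    gradient takes the sign of the upstream gradient.
    x = [sigmoid(z) for z in (-2.0, 0.5, 3.0)]
    print(weight_grads(+0.7, x))
    print(weight_grads(-0.7, x))
```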

Number of Filters in CNN

Let's say we have 3 channels in the input [layer] and 64 channels/feature maps in the output [layer]. How are 3 channels converted to 64 channels? How many filters do we need?
To produce one feature map in the output you need 3 filters, one for each input channel:
R - (one filter conv) - |
G - (one filter conv) - | --> (summed after conv) --> one output
B - (one filter conv) - |
So to generate 64 channels/feature maps in the output, you'll need 3x64 filters/kernels. Simply put, Number of Filters = No. of Input Channels x No. of Output Channels.
Or, you can think of the filter as not being 2D at all. Say we have an input of HxWxN, where HxW is the height and width of the image and N is the depth or number of channels. Then each filter is AxBxN: not a 2D filter but one that spans all N channels, and the convolution is a dot product between the filter and the AxBxN patch under it at each position.
[PS: Each filter has a bias, so AxBxN + 1 = number of params for 1 output] [Source]
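The counting argument can be written as a short function. The 3x3 kernel size in the example is an assumption (the text does not fix A and B), and the function name is made up; each output map is taken to use one weight per kernel position per input channel, plus one bias.

```python
def conv_layer_counts(in_channels, out_channels, kernel_h, kernel_w):
    """Count 2D kernels and trainable parameters of a standard conv layer."""
    # One 2D kernel per (input channel, output channel) pair.
    kernels_2d = in_channels * out_channels
    # Per output map: kernel_h x kernel_w x in_channels weights plus one bias.
    params_per_output = kernel_h * kernel_w * in_channels + 1
    total_params = params_per_output * out_channels
    return kernels_2d, params_per_output, total_params


if __name__ == "__main__":
    # 3 input channels (RGB), 64 output feature maps, assumed 3x3 kernels.
    print(conv_layer_counts(3, 64, 3, 3))
```

For the RGB to 64 feature maps example this gives 192 two-dimensional kernels, 28 parameters per output map and 1792 parameters in total.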

Mutually Exclusive Vs. Statistically Independent

Source: https://math.stackexchange.com/questions/941150/what-is-the-difference-between-independent-and-mutually-exclusive-events
Summary:
Mutually Exclusive: Events cannot happen at the same time. [In one experiment]
Statistically Independent: Occurrence of one event doesn't affect the other. [In two experiments]
Mutually Exclusive: [Wikipedia] In logic and probability theory, two events (or propositions) are mutually exclusive or disjoint if they cannot both occur at the same time (be true). A clear example is the set of outcomes of a single coin toss, which can result in either heads or tails, but not both.
Statistically Independent: When two events are said to be independent of each other, it means that the probability that one event occurs in no way affects the probability of the other event occurring. An example of two independent events is as follows: say you rolled a die and flipped a coin. The probability of getting any number face on the die in ...
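Both definitions can be verified by enumerating small sample spaces with exact fractions. The helper names below are illustrative, and a fair coin and a fair die are assumed.

```python
from fractions import Fraction
from itertools import product


def prob(event, space):
    """Probability of an event over a finite, equally likely sample space."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))


# One coin toss: heads and tails are mutually exclusive.
coin = ["H", "T"]


def is_heads(o):
    return o == "H"


def is_tails(o):
    return o == "T"


p_heads_and_tails = prob(lambda o: is_heads(o) and is_tails(o), coin)

# One die roll and one coin flip: "die shows 6" and "coin shows heads".
die_and_coin = list(product(range(1, 7), ["H", "T"]))


def die_is_six(o):
    return o[0] == 6


def coin_is_heads(o):
    return o[1] == "H"


p_six = prob(die_is_six, die_and_coin)
p_head = prob(coin_is_heads, die_and_coin)
p_six_and_head = prob(lambda o: die_is_six(o) and coin_is_heads(o), die_and_coin)

if __name__ == "__main__":
    print("P(heads and tails) =", p_heads_and_tails)
    print("P(six) * P(head) =", p_six * p_head, " P(six and head) =", p_six_and_head)
```

Mutually exclusive events have joint probability 0, so unless one of them has probability 0 they cannot be independent: P(heads) x P(tails) = 1/4, not 0.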

Setting Up Sublime Text 3 for Python (2 and 3) Development

This is the continuation of this tutorial: http://algidus.blogspot.com/2017/12/installing-python-3-and-2-in-windows-10.html
We will now set up Sublime Text 3 to run both Python 2 and Python 3 code.
First, download Sublime Text 3: https://www.sublimetext.com/3
By default, the "Build System" of Sublime Text 3 is set to "Automatic". To change it, go to the menu bar (press "Alt" if it's not visible), select "Tools" and then "Build System", and choose "Python" from the options.
In the previous tutorial, we kept Python 3 as our default Python version (by setting its path in the "Environment Variables"). So when you select "Python" as the "Build System" in Sublime Text 3, your code runs in Python 3.
Now we're going to add a custom Build System for "Python 2". To add a Build System for "Python 2...
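A quick way to see which interpreter a given Build System actually runs is to build a small script like the one below. It is only a suggested check, not part of the original tutorial; interpreter_summary is a made-up helper name, and the script is written to run under both Python 2 and Python 3.

```python
import sys


def interpreter_summary():
    """Describe the interpreter that is executing this script."""
    major, minor, micro = sys.version_info[0], sys.version_info[1], sys.version_info[2]
    return "Python {}.{}.{} at {}".format(major, minor, micro, sys.executable)


if __name__ == "__main__":
    print(interpreter_summary())
```

Build it once with each Build System selected; the reported version and executable path should change accordingly.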

Installing Python 3 and 2 in Windows 10 (x64)

This is a tutorial on how to install Python 3 and Python 2 on Windows 10 (x64) and how to use them independently and interchangeably.
Download both versions of Python from the official website: https://www.python.org/downloads
For this tutorial I downloaded:
Python 3: https://www.python.org/ftp/python/3.6.3/python-3.6.3-amd64.exe
Python 2: https://www.python.org/ftp/python/2.7.14/python-2.7.14.amd64.msi
You can install either Python 3 or Python 2 first. The idea behind running both versions independently, without them interfering with each other, is to add only one of them to the PATH in the Environment Variables. This will become clear later in this tutorial. I wanted Python 3 as my primary Python (i.e. the one set in the PATH of the Environment Variables), so I installed it first.
Installing Python 3
1. Open the downloaded Python 3 exe file.
2. [For Default Installation] We will be installing it in the default location [If you want to install it in your desired lo...