

Showing posts from February, 2019

Functions Vs PDF

Functions vs. Probability Distribution Functions (PDF): Is a neural network learning a function or a PDF?

PDF: a PDF is also a function, but with certain restrictions and rules. For example, the input of a PDF is restricted: possible inputs can only be taken from a sample space (containing the possible values of a Random Variable (RV)), and the outputs over that sample space must sum (or integrate) to 1. [https://stats.stackexchange.com/questions/347431]

Strictly speaking, neural networks are fitting a non-linear function. They can be interpreted as fitting a probability density function if suitable activation functions are chosen and certain conditions are respected (values must be non-negative and normalize to 1, etc.). But that is a question of how you choose to interpret their output, not of what they are actually doing. Under the hood, they are still non-linear function estimators, which you are choosing to apply to the specific problem of PDF estimation. Classifier
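A minimal sketch of the distinction (numpy only, all names and sizes made up here): the forward pass below is just a non-linear function producing unconstrained real numbers; only the softmax on top, which is an interpretation choice, makes the outputs non-negative and sum to 1 so they can be read as a distribution over classes.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    # A tiny two-layer network: under the hood it is just a non-linear function.
    h = np.tanh(x @ W1 + b1)      # non-linear hidden layer
    return h @ W2 + b2            # raw, unconstrained real-valued outputs

def softmax(logits):
    # Map raw outputs to values that are non-negative and sum to 1,
    # so they *can* be read as a probability distribution over classes.
    z = logits - logits.max()     # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

logits = forward(x, W1, b1, W2, b2)   # just a function's output
probs = softmax(logits)               # reinterpreted as a distribution
print(probs, probs.sum())             # non-negative, sums to 1.0
```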

Frequentist Vs Bayesian Probability

Probability and Information Theory [Goodfellow book]: quantifying uncertainty and then deriving new uncertain statements.

Information theory: quantify the amount of uncertainty in a probability distribution. Probability theory: make uncertain statements and reason in the presence of uncertainty.

Uncertain: means you don't know anything about it; there is no probability distribution. Stochastic (non-deterministic): means it changes in ways that are not fully predictable. Stochasticity = uncertainty + a probability distribution.

Probability: Frequentist probability: if the experiment were repeated an infinite number of times, then (say) 40% of the trials would have that outcome. Bayesian probability: but what if the experiment is not repeatable? Like diagnosing a patient with flu, or finding out whether the sun has exploded. Here we use probability as a degree of belief (qualitative).
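As a rough illustration of the frequentist reading (a simulated coin flip with a made-up bias of 0.4): the probability is the relative frequency that the outcome approaches as the experiment is repeated more and more times.

```python
import numpy as np

# Frequentist reading of "P(heads) = 0.4": if we could repeat the flip a huge
# number of times, about 40% of the flips would come up heads.
rng = np.random.default_rng(0)
p_true = 0.4
for n in [10, 1_000, 100_000]:
    flips = rng.random(n) < p_true   # simulate n repetitions of the experiment
    print(n, flips.mean())           # relative frequency approaches 0.4 as n grows
```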

Maximum Likelihood Estimation

https://stats.stackexchange.com/questions/2641/what-is-the-difference-between-likelihood-and-probability#2647

Maximum Likelihood Estimation: maximum likelihood is like regression. Given the data, we try to find the best probability distribution for it, i.e. the best parameters of the distribution. Likelihood(theta | data) = P(data | theta) [http://complx.me/2017-01-22-mle-linear-regression/]

Statistics: statistics is not a statement about individuals; it is a statement about the parameters of the distribution that was used to model the data. Even among those studies we began with, some might be wrong. Statistics is summarizing the data.

Maximum likelihood: MLE is finding the 'model' / 'distribution' that is 'most consistent' with your data. Say you have data and you are willing to assume it is Gaussian, or a combination / mixture of Gaussians. But there are an infinite number of
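A small sketch of the idea, assuming the observed data is modelled as a single Gaussian (the data and candidate parameters below are made up): the log-likelihood scores how consistent a (mu, sigma) pair is with the data, and the sample mean and (biased) sample standard deviation are the parameters that maximize it.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1_000)   # pretend this is observed data

def log_likelihood(mu, sigma, x):
    # log L(mu, sigma | data) = sum_i log N(x_i; mu, sigma^2)
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2))

# Closed-form MLE for a Gaussian: sample mean and (biased) sample std.
mu_hat, sigma_hat = data.mean(), data.std()   # data.std() uses ddof=0, the ML estimate

# Any other parameter choice is less consistent with the data:
print(log_likelihood(mu_hat, sigma_hat, data))   # highest log-likelihood
print(log_likelihood(4.0, 2.0, data))            # lower
print(log_likelihood(5.0, 3.0, data))            # lower
```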

Numpy Axis

Numpy Axis: Source
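Since the linked source is not reproduced here, a quick self-contained example: the axis you pass to a numpy reduction is the axis that gets collapsed (summed away).

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # shape (2, 3): axis 0 = rows, axis 1 = columns
print(a)
# [[0 1 2]
#  [3 4 5]]

print(a.sum(axis=0))   # collapse axis 0 (down the rows)      -> [3 5 7], shape (3,)
print(a.sum(axis=1))   # collapse axis 1 (across the columns) -> [3 12],  shape (2,)
```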

Batch Normalization

[Mini] Batch Normalization:

First, read about weight initialization ("Previous Post"). We want unit-Gaussian activations (outputs)? We just make them so [forcefully].

We have an N x D input: N = the current batch size (number of training examples in a forward pass), D = the dimension of each example (number of features). We compute the empirical mean and variance independently for each dimension, i.e. each feature, over the current mini-batch. [This is for fully connected layers; it depends heavily on the layer type - mainly fully connected vs. convolutional layers, as we'll see later.]

Batch norm is usually inserted after fully connected or convolutional layers and before the nonlinearity: --> FC --> BN --> tanh --> FC --> BN --> tanh

We are scaling the input connected to each neuron, and each neuron is a feature. We can apply this the same way to a fully connected network; the only difference is that, with
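A minimal numpy sketch of the fully connected case described above (gamma, beta, and eps are assumed names for the learnable scale, shift, and numerical-stability constant): the mean and variance are computed per feature over the N examples in the current mini-batch.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # x: (N, D) mini-batch -- N examples, D features.
    # gamma, beta: (D,) learnable scale and shift.
    mu = x.mean(axis=0)                     # per-feature mean over the batch, shape (D,)
    var = x.var(axis=0)                     # per-feature variance over the batch, shape (D,)
    x_hat = (x - mu) / np.sqrt(var + eps)   # "forcefully" unit-gaussian activations
    return gamma * x_hat + beta             # scale/shift so the network can undo it if needed

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=5.0, size=(32, 4))   # N=32 examples, D=4 features
out = batchnorm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3))   # ~0 for each feature
print(out.std(axis=0).round(3))    # ~1 for each feature
```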