

Showing posts from April, 2022

NLP - Topic Modeling - LDA - Latent Dirichlet Allocation

NLP - processing natural text (or speech) to find patterns. https://www.youtube.com/watch?v=xvqsFTUsOmc

Can be used for:
1. Sentiment analysis - whether a document/text is a positive response or a negative one.
2. Topic modeling - finding topic(s) in a document; for example, in email, whether it is a financial, personal, or project email. A document can be a mixture of topics - for example 80% financial, 15% project, 5% personal.
3. Text generation - example: autocomplete - Markov Chains (only look at the previous state), LSTMs (look at many previous states).

Topic Modeling: a popular method is LDA (Latent Dirichlet Allocation). Latent = hidden; Dirichlet = a probability distribution. A document or a text can be considered a distribution of topics, and identifying each topic and its percentage is what LDA does. A topic, in turn, is a distribution of words. So every document is a mix of topics, and every topic is a mix of words. LDA doesn't tell you what each topic is; it just gives the number of topics and the percentage of each topic in a document.
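Here is a minimal sketch of that workflow using scikit-learn's LatentDirichletAllocation. The tiny corpus, the two-topic choice, and all the example words are invented purely for illustration; they are not from the post.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = [
    "quarterly budget revenue invoice payment",     # a "financial"-looking email
    "project deadline sprint meeting deliverable",  # a "project"-looking email
    "invoice payment project meeting budget",       # a mix of both
]

# LDA works on word counts (bag of words), not raw text.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(documents)

# We only tell LDA how many topics to look for; it never names them.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(counts)  # each row: that document's topic percentages

for i, mix in enumerate(doc_topic):
    print(f"document {i}: topic mix = {mix.round(2)}")

# Each topic is itself a distribution over words, so a document is a
# mix of topics and each topic is a mix of words.
words = vectorizer.get_feature_names_out()
for t, topic in enumerate(lda.components_):
    top = [words[j] for j in topic.argsort()[-3:][::-1]]
    print(f"topic {t}: top words = {top}")
```

Note that the output only gives topic indices and percentages; deciding that topic 0 "means" financial email is still up to you, which is exactly the limitation mentioned above.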

Naive Bayes algorithm

Probability = given the distribution, what's the probability of the data occurring?
Likelihood = given the data, what's the likelihood that the distribution is correct? Or: given the data, what is the probability that it belongs to a distribution?

Bayes' Theorem: https://www.youtube.com/watch?v=R13BD8qKeTg

How is it different from Naive Bayes? Naive Bayes is considered "naive" because it makes an assumption that is virtually impossible to see in real-life data: the conditional probability is calculated as the pure product of the individual probabilities of the components. This implies absolute independence of the features - a condition probably never met in real life.

Let's take Naive Bayes at an intuitive level to figure out what foods taste good together. Naive Bayes will make the following assumption: if you like pickles, and you like ice cream, it will assume independence, give you a pickle ice cream, and think that you'll like it.

Another way to look at it: https://www.
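Here is a minimal numeric sketch of that independence assumption, using the pickle / ice cream intuition. Every probability below is an invented number purely for illustration.

```python
# Suppose, from past data, we estimated these (made-up) probabilities:
p_like = 0.5                    # prior: P(you like a food)
p_dislike = 0.5
p_pickles_given_like = 0.8      # P(contains pickles | foods you like)
p_icecream_given_like = 0.9     # P(contains ice cream | foods you like)
p_pickles_given_dislike = 0.3
p_icecream_given_dislike = 0.4

# Naive Bayes scores a class as the prior times the product of the
# individual feature probabilities, as if the features were independent.
score_like = p_like * p_pickles_given_like * p_icecream_given_like
score_dislike = p_dislike * p_pickles_given_dislike * p_icecream_given_dislike

# Normalize the scores to get P(like | pickles AND ice cream).
p_like_given_both = score_like / (score_like + score_dislike)
print(f"P(like | pickle ice cream) = {p_like_given_both:.2f}")

# The model happily predicts you'd like pickle ice cream, because it never
# models how the two ingredients interact - only each one on its own.
```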

Null Hypothesis - Significance Testing

Null Hypothesis: https://www.reddit.com/r/explainlikeimfive/comments/7q46fd/eli5_what_is_the_null_hypothesis/

The null hypothesis basically says: nothing is wrong - everything is fair - the event happened merely by chance. https://inst.eecs.berkeley.edu/~cs174/sp08/lecs/lec10/lec10.pdf

Let's say you roll a die 5 times and every time you get a 6. You might think something is wrong with the die. So the null hypothesis becomes: the die is fair. Now we do testing and see whether it is actually fair.

To prove this right or wrong, we run a likelihood experiment. We take the data and see if it makes sense with the distribution we assumed (say, a normal distribution). We take the observed value and find the probability of getting a result like that under the assumed distribution: if it's really low, we reject the null hypothesis; otherwise we assume the result was just chance. That probability - of seeing a result at least as extreme as the one observed, assuming the null hypothesis is true - is the p-value. If the p-value is 0.05 or lower (the usual threshold), we reject the null hypothesis.
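A quick numeric sketch of the die example: under the null hypothesis that the die is fair, the chance of rolling five sixes in a row is (1/6)^5, which is far below the usual 0.05 cutoff, so we reject the null. The code is just illustrative arithmetic.

```python
# Under the null hypothesis (fair die), each roll shows a six with probability 1/6.
p_six = 1 / 6
n_rolls = 5

# Probability of the observed result (five sixes in five rolls) if the null is true.
p_value = p_six ** n_rolls
print(f"p-value = {p_value:.6f}")   # about 0.000129

# Compare against the conventional significance threshold.
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: the die is probably not fair.")
else:
    print("Fail to reject: the result could plausibly be chance.")
```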