NLP - Topic Modeling - LDA - Latent Dirichlet Allocation

NLP (Natural Language Processing) - processing natural-language text (or speech) to find patterns

https://www.youtube.com/watch?v=xvqsFTUsOmc

Can be used for

1. sentiment analysis - deciding whether a document/text is a positive response or a negative one.

2. topic modeling - finding the topic(s) in a document. Example: in email, whether it is a financial, personal or project email. A document can be a mixture of topics, for example 80% financial, 15% project, 5% personal.

3. Text generation - Example: autocomplete - Markov Chains (only look at the previous state), LSTMs (look at a lot of previous states)
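To make the Markov Chain idea in item 3 concrete, here is a minimal sketch of a word-level text generator. The function names (build_chain, generate) and the tiny corpus are made up for illustration. The point is that the next word is picked by looking only at the current word.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length, seed=0):
    """Walk the chain: the next word depends only on the current word."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: no word ever followed this one
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

# Toy corpus, invented for this example.
corpus = "the cat sat on the mat and the cat ate the fish"
chain = build_chain(corpus)
print(generate(chain, "the", 8))
```

An LSTM would instead keep a memory of many earlier words, which is why it can produce more coherent text than this one-step lookup.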

2. Topic Modeling

Popular method - LDA (Latent Dirichlet Allocation)

Latent = Hidden

Dirichlet = a probability distribution over probability distributions (here, over the possible topic mixtures)
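A small sketch of what a Dirichlet draw looks like, using only the standard library. It uses the standard trick of normalizing independent Gamma draws. The helper name sample_dirichlet, the three topics and the alpha values are assumptions chosen for illustration.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(42)
# Three hypothetical topics: financial, project, personal.
for alpha in (0.1, 1.0, 10.0):
    mixture = sample_dirichlet([alpha] * 3, rng)
    # Each mixture is a list of 3 non-negative numbers that sum to 1.
    print(alpha, [round(p, 2) for p in mixture])
```

Each draw is itself a set of percentages that add up to 100%. Small alpha values tend to give mixtures dominated by one topic, while large values tend to give more even mixtures.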

A document or a text can be considered a distribution of topics.

Identifying the topics and their percentages is done by LDA

A topic can be a distribution of words.

So, every document is a mix of topics and every topic is a mix of words

LDA doesn't tell you what each topic is (it doesn't name the topics). For the number of topics you chose, it just gives the percentage of each topic in a document.

For each topic it also gives distribution of words.
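A worked example of "every document is a mix of topics and every topic is a mix of words". The document mixture is the 80/15/5 email example from above. The per-topic word distributions are invented numbers, only there to show the arithmetic. The chance of seeing a word in the document is the sum, over topics, of (topic share in document) x (word share in topic).

```python
# Topic mixture of one email (from the example above).
doc_topics = {"financial": 0.80, "project": 0.15, "personal": 0.05}

# Hypothetical word distributions per topic (each sums to 1).
topic_words = {
    "financial": {"invoice": 0.5, "payment": 0.4, "meeting": 0.1},
    "project": {"deadline": 0.5, "meeting": 0.4, "invoice": 0.1},
    "personal": {"dinner": 0.6, "weekend": 0.4},
}

def word_probabilities(doc_topics, topic_words):
    """P(word | document) = sum over topics of P(topic | doc) * P(word | topic)."""
    probs = {}
    for topic, topic_share in doc_topics.items():
        for word, word_share in topic_words[topic].items():
            probs[word] = probs.get(word, 0.0) + topic_share * word_share
    return probs

word_probs = word_probabilities(doc_topics, topic_words)
for word, p in sorted(word_probs.items(), key=lambda item: -item[1]):
    print(f"{word}: {p:.3f}")
```

For example, "invoice" gets 0.80 x 0.5 from the financial topic plus 0.15 x 0.1 from the project topic, so 0.415 in total.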

How many topics? Who decides? - you do during initialization

So we choose the number of topics.

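LDA then goes through each word in a document and randomly assigns each word a topic. This random assignment is only the starting point. LDA then repeatedly revisits each word and reassigns its topic, based on how common that topic already is in the document and how common the word already is in that topic, until the assignments settle.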
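A rough sketch of this in plain Python. Step 1 is the random assignment described above. Step 2, the repeated reassignment, is written here as collapsed Gibbs sampling, which is one common way to fit LDA and is an assumption on my part, not necessarily what a given library does. The function name lda_gibbs, the alpha and beta values and the toy documents are all made up for illustration.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy LDA via collapsed Gibbs sampling (illustrative, not optimized)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    # Count tables: topic counts per document, word counts per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Step 1: randomly assign every word a topic.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assign = []
        for w in doc:
            t = rng.randrange(num_topics)
            doc_assign.append(t)
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1
        assignments.append(doc_assign)

    # Step 2 (assumed refinement): revisit each word and resample its topic.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = assignments[d][i]
                # Remove the word's current assignment from the counts.
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                # Weight = (topic share in doc) * (word share in topic).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                t = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1

    # Turn counts into per-document topic percentages.
    mixtures = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return mixtures, topic_word

# Toy documents, invented for this example.
docs = [
    "invoice payment bank invoice payment".split(),
    "bank payment invoice bank".split(),
    "deadline sprint release deadline sprint".split(),
    "release sprint deadline release".split(),
]
mixtures, topic_word = lda_gibbs(docs, num_topics=2)
for doc, mixture in zip(docs, mixtures):
    print([round(p, 2) for p in mixture], " ".join(doc))
```

The output is one list of topic percentages per document. The topics come back as numbers (0, 1, ...), not names. Naming them is still up to you, by looking at the most frequent words in topic_word.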

spaCy is like a more modern version of NLTK and might replace it one day.

So that's how NLP helps.

We are trying to find out why I like something (for example, a movie).

We use these techniques to cluster the different movies into groups

Find out what you liked about the group

Then recommend other movies from the same group
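The recommendation steps above can be sketched like this. The movie titles and their topic mixtures are hypothetical (in practice the mixtures would come from running LDA on plots or reviews). Grouping by each movie's strongest topic is a simple stand-in for clustering, chosen here only to keep the example short.

```python
from collections import defaultdict

# Hypothetical topic mixtures per movie (each list sums to 1).
movie_topics = {
    "Movie A": [0.85, 0.10, 0.05],
    "Movie B": [0.70, 0.20, 0.10],
    "Movie C": [0.05, 0.90, 0.05],
    "Movie D": [0.10, 0.15, 0.75],
    "Movie E": [0.20, 0.70, 0.10],
}

def dominant_topic(mixture):
    """Index of the topic with the largest share."""
    return max(range(len(mixture)), key=lambda k: mixture[k])

def group_by_topic(movie_topics):
    """Put each movie in the group of its dominant topic."""
    groups = defaultdict(list)
    for title, mixture in movie_topics.items():
        groups[dominant_topic(mixture)].append(title)
    return groups

def recommend(liked, movie_topics):
    """Recommend the other movies from the liked movie's group."""
    groups = group_by_topic(movie_topics)
    topic = dominant_topic(movie_topics[liked])
    return [title for title in groups[topic] if title != liked]

print(recommend("Movie A", movie_topics))  # ['Movie B']
```

The top words of the shared topic are what hint at "what you liked about the group".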

