NLP (Natural Language Processing) - processing natural-language text (or speech) to find patterns
https://www.youtube.com/watch?v=xvqsFTUsOmc
Can be used for
1. Sentiment analysis - deciding whether a document/text is a positive response or a negative one.
2. Topic modeling - finding the topic(s) in a document. Example: for an email, whether it is a financial, personal or project email. A document can be a mixture of topics - for example 80% financial, 15% project, 5% personal.
3. Text generation - Example: autocomplete - Markov chains (look only at the previous state), LSTMs (look at a lot of previous states)
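A minimal sketch of the Markov chain idea for autocomplete: the next word is picked using only the current word. The corpus and the function names (build_chain, generate) are made up for illustration and are not from the video.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed right after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length, seed=0):
    """Walk the chain: each next word depends only on the current word."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: no word ever followed this one
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

# Toy corpus (hypothetical); a real autocomplete would train on far more text.
corpus = "the cat sat on the mat and the cat ate the fish"
chain = build_chain(corpus)
print(generate(chain, "the", 6))
```

Because the chain forgets everything except the current word, the output is locally plausible but drifts quickly. That is the limitation LSTMs address by carrying information from many earlier states.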
2. Topic Modeling
Popular method - LDA (Latent Dirichlet Allocation)
Latent = Hidden
Dirichlet = a type of probability distribution (a distribution over mixes, i.e. over sets of proportions that add up to 1)
A document or a text can be considered a distribution of topics.
LDA identifies the topics and each topic's percentage in the document.
A topic can be a distribution of words.
So, every document is a mix of topics and every topic is a mix of words
LDA doesn't tell you what each topic is (it gives no names or labels). It just gives the percentage of each topic in a document.
For each topic it also gives the distribution of words; you read those words to work out what the topic is about.
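A small sketch of "document = mix of topics, topic = mix of words". The 80/15/5 split is the email example from these notes; the word lists and their probabilities are invented for illustration. LDA itself would only number the topics, so the names here are labels a person added afterwards.

```python
# Share of each topic in one document (the email example from the notes).
doc_topics = {"financial": 0.80, "project": 0.15, "personal": 0.05}

# Word distribution per topic (made-up numbers; each row sums to 1).
topic_words = {
    "financial": {"invoice": 0.5, "budget": 0.4, "meeting": 0.1},
    "project": {"deadline": 0.5, "meeting": 0.4, "budget": 0.1},
    "personal": {"dinner": 0.7, "weekend": 0.3},
}

def word_probabilities(doc_topics, topic_words):
    """P(word | document) = sum over topics of P(topic | document) * P(word | topic)."""
    probs = {}
    for topic, share in doc_topics.items():
        for word, p in topic_words[topic].items():
            probs[word] = probs.get(word, 0.0) + share * p
    return probs

probs = word_probabilities(doc_topics, topic_words)
for word, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{word}: {p:.3f}")
```

Words from the dominant topic end up most likely in the document, while a word shared by several topics (here "budget" and "meeting") collects probability from each of them.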
How many topics? Who decides? - you do, during initialization
So we choose the number of topics.
LDA then goes through each word in each document and randomly assigns it to one of the topics (a starting guess).
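A rough sketch of this random starting assignment only, not of full LDA. The documents, the function name random_init and the seed are assumptions for illustration; topics are plain numbers 0..num_topics-1 because LDA does not name them.

```python
import random

def random_init(documents, num_topics, seed=0):
    """Give every word position in every document a random topic id."""
    rng = random.Random(seed)
    return [[rng.randrange(num_topics) for _ in doc] for doc in documents]

# Toy tokenized documents (hypothetical).
docs = [
    ["budget", "invoice", "meeting", "budget"],
    ["dinner", "weekend", "meeting"],
]
num_topics = 3  # chosen by us, as noted above

assignments = random_init(docs, num_topics)
for doc, topics in zip(docs, assignments):
    print(list(zip(doc, topics)))
```

At this point the assignments are pure noise: the same word can land in different topics in different places. They only become meaningful once the algorithm revisits and updates them.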