Machine Learning Guide

19. Natural Language Processing 2

July 11th, 2017

Classical/shallow Natural Language Processing algorithms.

## Episode

- Edit distance: Levenshtein distance
- Stemming/lemmatization: Porter stemmer
- N-grams, tokens: regex
- Language models
  - Machine translation, spelling correction, speech recognition
- Classification / sentiment analysis: SVM, Naive Bayes
- Information extraction (POS, NER); models: MaxEnt, Hidden Markov Models (HMM), Conditional Random Fields (CRF)
- Generative vs discriminative models
  - Generative: HMM, Naive Bayes, LDA
  - Discriminative: SVMs, MaxEnt / logistic regression, ANNs
  - Pros/cons: generative models can get by with less training data, and NLP datasets tend to be small
  - MaxEnt vs Naive Bayes: Bayes' independence assumption double-counts correlated features such as "Hong" and "Kong"
- Topic modeling and keyword extraction: Latent Dirichlet Allocation (LDA)
  - LDA ~= LSA ~= LSI: latent Dirichlet allocation, latent semantic analysis, latent semantic indexing
- Search / relevance / document similarity: bag-of-words, TF-IDF
- Similarity metrics: Jaccard, cosine, Euclidean

Code sketches for several of these topics follow the resources below.

## Resources

- Speech and Language Processing (http://amzn.to/2uZaNyg)
- Stanford NLP YouTube (https://www.youtube.com/playlist?list=PL6397E4B26D00A269)
  - Set up youtube-dl (https://github.com/rg3/youtube-dl) and run `youtube-dl -x --audio-format mp3 https://www.youtube.com/playlist?list=PL6397E4B26D00A269`
- NLTK Book (http://www.nltk.org/book)
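## Code Sketches

A minimal pure-Python sketch of Levenshtein edit distance, using the classic dynamic-programming recurrence and keeping just one previous row in memory (function name and test strings are my own illustration):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, and
    substitutions needed to turn string a into string b."""
    # prev[j] = distance from the current prefix of a to b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```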
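Porter stemming versus lemmatization, sketched with NLTK; the lemmatizer assumes the `wordnet` corpus has already been fetched with `nltk.download('wordnet')`:

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()  # requires nltk.download('wordnet')

print(stemmer.stem("ponies"))                # 'poni' -- crude suffix stripping
print(lemmatizer.lemmatize("ponies"))        # 'pony' -- dictionary lookup
print(stemmer.stem("was"))                   # 'wa'   -- the stemmer knows no grammar
print(lemmatizer.lemmatize("was", pos="v"))  # 'be'   -- lemma of the verb
```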
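A rough regex tokenizer and an n-gram window in plain Python; real tokenizers handle punctuation, contractions, and Unicode far more carefully:

```python
import re

text = "The quick brown fox jumps over the lazy dog"
tokens = re.findall(r"[a-z']+", text.lower())  # naive word tokenizer

def ngrams(tokens, n):
    """All windows of n consecutive tokens."""
    return list(zip(*(tokens[i:] for i in range(n))))

print(tokens[:4])             # ['the', 'quick', 'brown', 'fox']
print(ngrams(tokens, 2)[:3])  # [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
```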
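A toy maximum-likelihood bigram language model: this probability-ranking idea is what lets machine translation, spelling correction, and speech recognition prefer one candidate sentence over another. Real models smooth the counts to handle unseen bigrams; the corpus here is made up:

```python
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the log .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def prob(word, prev):
    """Maximum-likelihood estimate of P(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(prob("cat", "the"))  # 0.25: 'the' occurs 4 times, 'the cat' once
```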
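A Naive Bayes sentiment classifier sketched with scikit-learn (my library choice, not one named in the notes); the four made-up training sentences are far too few for a real model:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["loved it, great movie", "what a wonderful film",
               "terrible acting, awful plot", "boring and bad"]
train_labels = ["pos", "pos", "neg", "neg"]

# bag-of-words counts -> multinomial Naive Bayes
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["what a great film"]))  # ['pos'] on this toy data
```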
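POS tagging and NER with NLTK; note that NLTK's default `pos_tag` is an averaged-perceptron (discriminative) tagger, while HMM and CRF taggers like those discussed in the episode live elsewhere in `nltk.tag`:

```python
import nltk
# one-time data fetches, if not already installed:
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
# nltk.download('maxent_ne_chunker'); nltk.download('words')

tokens = nltk.word_tokenize("Bill Gates founded Microsoft in Albuquerque")
tagged = nltk.pos_tag(tokens)  # POS: [('Bill', 'NNP'), ('Gates', 'NNP'), ...]
print(tagged)
print(nltk.ne_chunk(tagged))   # NER: groups names into PERSON, GPE, etc.
```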
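LDA topic modeling sketched with scikit-learn on a tiny made-up corpus; real topic models need many documents before the topics stabilize:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the cat sat on the mat", "dogs and cats make good pets",
        "stock markets fell sharply today", "investors sold shares and bonds"]

vec = CountVectorizer(stop_words="english")
counts = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

words = vec.get_feature_names_out()  # scikit-learn >= 1.0
for k, weights in enumerate(lda.components_):
    top = [words[i] for i in weights.argsort()[-3:]]
    print(f"topic {k}: {top}")  # rough animal vs. finance split, at best
```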
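Bag-of-words with TF-IDF weighting plus pairwise cosine similarity, the standard baseline for search/relevance and document similarity, again with scikit-learn:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "stocks rallied on strong earnings"]

tfidf = TfidfVectorizer().fit_transform(docs)  # docs -> weighted term vectors

# 3x3 matrix: docs 0 and 1 should score far closer than either does to doc 2
print(cosine_similarity(tfidf))
```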
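Finally, the three measures from the list in plain Python: Jaccard similarity on token sets, cosine similarity and Euclidean distance on vectors:

```python
import math

def jaccard(a: set, b: set) -> float:
    """Overlap of two sets: |intersection| / |union|."""
    return len(a & b) / len(a | b)

def cosine(u, v) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

def euclidean(u, v) -> float:
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

print(jaccard({"the", "cat", "sat"}, {"the", "dog", "sat"}))  # 0.5
print(cosine([1, 0, 1], [1, 1, 0]))                           # 0.5
print(euclidean([0, 0], [3, 4]))                              # 5.0
```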