Course Outline
Detailed training outline
- Introduction to NLP
- Understanding NLP
- NLP Frameworks
- Commercial applications of NLP
- Scraping data from the web
- Working with various APIs to retrieve text data
- Working with and storing text corpora: saving content and relevant metadata
- Advantages of using Python, and an NLTK crash course
- Practical Understanding of a Corpus and Dataset
- Why do we need a corpus?
- Corpus Analysis
- Types of data attributes
- Different file formats for corpora
- Preparing a dataset for NLP applications
- Understanding the Structure of Sentences
- Components of NLP
- Natural language understanding
- Morphological analysis: stem, word, token, part-of-speech tags
- Syntactic analysis
- Semantic analysis
- Handling ambiguity
- Text data preprocessing
- Corpus: raw text
- Sentence tokenization
- Stemming for raw text
- Lemmatization of raw text
- Stop word removal
- Corpus: raw sentences
- Word tokenization
- Word lemmatization
- Working with Term-Document/Document-Term matrices
- Text tokenization into n-grams and sentences
- Practical and customized preprocessing
- Corpus: raw text
- Analyzing Text data
- Basic features of NLP
- Parsers and parsing
- POS tagging and taggers
- Named entity recognition
- N-grams
- Bag of words
- Statistical features of NLP
- Concepts of Linear algebra for NLP
- Probability theory for NLP
- Vectorization
- Encoders and Decoders
- Normalization
- Probabilistic Models
- Advanced feature engineering and NLP
- Basics of word2vec
- Components of word2vec model
- Logic of the word2vec model
- Extension of the word2vec concept
- Application of word2vec model
- Case study: Application of bag of words: automatic text summarization using simplified and true Luhn's algorithms
- Basic features of NLP
- Document Clustering, Classification and Topic Modeling
- Document clustering and pattern mining (hierarchical clustering, k-means clustering, etc.)
- Comparing and classifying documents using TF-IDF, Jaccard and cosine distance measures
- Document classification using Naïve Bayes and Maximum Entropy
- Identifying Important Text Elements
- Reducing dimensionality: Principal Component Analysis, Singular Value Decomposition, and non-negative matrix factorization
- Topic modeling and information retrieval using Latent Semantic Analysis
- Entity Extraction, Sentiment Analysis and Advanced Topic Modeling
- Positive vs. negative: degree of sentiment
- Item Response Theory
- Part-of-speech tagging and its application: finding people, places and organizations mentioned in text
- Advanced topic modeling: Latent Dirichlet Allocation
- Case studies
- Mining unstructured user reviews
- Sentiment classification and visualization of Product Review Data
- Mining search logs for usage patterns
- Text classification
- Topic modeling
Knowledge and awareness of NLP principles and an appreciation of AI applications in business
21 Hours
