Course Outline

Comprehensive training syllabus

  1. Introduction to NLP
    • Understanding NLP
    • NLP Frameworks
    • Commercial applications of NLP
    • Web data scraping
    • Leveraging various APIs for text data retrieval
    • Managing and storing text corpora, including content and relevant metadata
    • Benefits of using Python and a crash course on NLTK
  2. Practical Insights into Corpora and Datasets
    • The necessity of corpora
    • Corpus analysis
    • Types of data attributes
    • Different file formats for corpora
    • Preparing datasets for NLP applications
  3. Understanding Sentence Structure
    • Components of NLP
    • Natural language understanding
    • Morphological analysis: stems, words, tokens, part-of-speech tags
    • Syntactic analysis
    • Semantic analysis
    • Handling ambiguity
  4. Text Data Preprocessing
    • Corpus - raw text
      • Sentence tokenization
      • Stemming for raw text
      • Lemmatization of raw text
      • Stop word removal
    • Corpus - raw sentences
      • Word tokenization
      • Word lemmatization
    • Working with Term-Document/Document-Term matrices
    • Tokenizing text into n-grams and sentences
    • Practical and customized preprocessing
  5. Analyzing Text Data
    • Basic features of NLP
      • Parsers and parsing
      • POS tagging and taggers
      • Named entity recognition
      • N-grams
      • Bag of words
    • Statistical features of NLP
      • Linear algebra concepts for NLP
      • Probabilistic theory for NLP
      • TF-IDF
      • Vectorization
      • Encoders and Decoders
      • Normalization
      • Probabilistic Models
    • Advanced feature engineering and NLP
      • Word2vec fundamentals
      • Components of the word2vec model
      • Logic behind the word2vec model
      • Extending the word2vec concept
      • Applications of the word2vec model
    • Case study: Applying the bag of words model to automatic text summarization using the original and simplified Luhn algorithms
  6. Document Clustering, Classification, and Topic Modeling
    • Document clustering and pattern mining (hierarchical clustering, k-means clustering, etc.)
    • Comparing and classifying documents using TF-IDF, Jaccard, and cosine distance measures
    • Document classification using Naïve Bayes and Maximum Entropy
  7. Identifying Key Text Elements
    • Dimensionality reduction: Principal Component Analysis, Singular Value Decomposition, non-negative matrix factorization
    • Topic modeling and information retrieval using Latent Semantic Analysis
  8. Entity Extraction, Sentiment Analysis, and Advanced Topic Modeling
    • Positive vs. negative: degree of sentiment
    • Item Response Theory
    • Part-of-speech tagging and its application: identifying people, places, and organizations mentioned in text
    • Advanced topic modeling: Latent Dirichlet Allocation
  9. Case Studies
    • Mining unstructured user reviews
    • Sentiment classification and visualization of product review data
    • Mining search logs for usage patterns
    • Text classification
    • Topic modeling

Requirements

Basic knowledge of NLP principles and an understanding of how AI is applied in business contexts.

Duration

21 hours
