Deep Learning For Natural Language Processing

A complete crash course on Natural Language Processing, from basic text classification all the way to Large Language Models, Reinforcement Learning from Human Feedback, and more.

Term: Summer Semester

Time: Tuesdays, 13:30-15:20

Course Overview

This course covers both foundational and up-to-date methodologies for Natural Language Processing (NLP), which today form the backbone of popular AI tools. Starting from the basics (the mathematics of deep learning, backpropagation, etc.), you will learn how increasingly advanced models can understand natural language. By the end of this course, you will have gained an understanding of:

  • Key machine learning paradigms and concepts
  • Both basic and advanced machine learning algorithms/models applied to NLP
  • Evaluation, optimisation, and comparison of NLP models
  • Applying NLP models and techniques to real-world problems

Prerequisites

  • Basic knowledge of linear algebra and calculus
  • Programming experience in Python
  • Probability and statistics fundamentals

Material

  • Lectures are available publicly on YouTube.
  • Other material (slides, exercises, homework, exam solutions, etc.) is on Moodle.

Schedule

Week 1 (Apr 22): NLP tasks and evaluation
  • Course Logistics
  • Definitions of the text classification and text generation tasks
  • Evaluation
Week 2 (Apr 29): Mathematical foundations of deep learning
  • Function minimization
  • Efficient computation of gradients
  • Stochastic Gradient Descent (SGD)
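To give a feel for the Week 2 material, here is a minimal minibatch-SGD sketch on a toy least-squares problem (the dataset, learning rate, and batch size are illustrative assumptions, not course code):

```python
# Minibatch SGD on a toy least-squares objective (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                 # toy inputs
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ true_w + 0.1 * rng.normal(size=1000)   # noisy targets

w = np.zeros(5)                                # parameters to learn
lr, batch_size = 0.1, 32

for epoch in range(20):
    perm = rng.permutation(len(X))             # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        grad = 2 * xb.T @ (xb @ w - yb) / len(idx)  # gradient of the mean squared error
        w -= lr * grad                         # SGD update step

print(np.round(w, 2))                          # should be close to true_w
```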
Week 3 (May 6): Log-linear models
  • Ambiguity of human language
  • Fundamentals of vectors and linear functions
  • Binary classification
  • Tokenization
  • Bag-Of-Words (BoW) and Byte-Pair Encoding (BPE)
  • Nonlinear mapping and sigmoid function
  • Log-linear model (Logistic Regression)
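A minimal sketch of the Week 3 pipeline: a bag-of-words representation fed into a logistic-regression (log-linear) binary classifier. The tiny corpus and hyperparameters are made up for illustration:

```python
# Bag-of-words + logistic regression on a toy sentiment dataset.
import numpy as np

docs = ["good movie", "great plot", "bad movie", "awful plot"]
labels = np.array([1, 1, 0, 0])                 # 1 = positive, 0 = negative

vocab = sorted({tok for d in docs for tok in d.split()})
index = {tok: i for i, tok in enumerate(vocab)}

def bow(doc):
    """Count-based bag-of-words vector for one document."""
    v = np.zeros(len(vocab))
    for tok in doc.split():
        v[index[tok]] += 1
    return v

X = np.stack([bow(d) for d in docs])
w, b = np.zeros(len(vocab)), 0.0

sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(500):                            # full-batch gradient descent
    p = sigmoid(X @ w + b)                      # predicted P(y = 1 | x)
    grad_w = X.T @ (p - labels) / len(docs)     # gradient of binary cross-entropy
    grad_b = np.mean(p - labels)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

print(sigmoid(bow("good plot") @ w + b))        # probability the new doc is positive
```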
Week 4 (May 13): Deep Neural Networks
  • Loss functions and Binary Cross-Entropy
  • Online and minibatch SGD
  • Multi-class classification
  • Continuous Bag-Of-Words (CBOW)
  • Softmax and Temperature
  • Categorical Cross-Entropy Loss
  • Linearity and non-linearity in neural networks
  • ReLU
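A small illustration of temperature-scaled softmax and categorical cross-entropy from Week 4 (the logits and target below are made-up numbers):

```python
# Softmax with temperature and categorical cross-entropy for a one-hot target.
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a flatter distribution."""
    z = logits / temperature
    z = z - z.max()                    # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
for t in (0.5, 1.0, 2.0):
    print(t, np.round(softmax(logits, t), 3))

# Categorical cross-entropy when the correct class is class 0:
probs = softmax(logits)
loss = -np.log(probs[0])
print("cross-entropy:", round(loss, 3))
```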
Week 5 (May 20): Language models and word embeddings
  • Fundamentals of probability
  • Classic Language Model
  • Markov chain property for word probability
  • Maximum Likelihood Estimation
  • Perplexity
  • Neural language models
  • Word vector lookup
  • Basic text generation with neural LM
  • Greedy decoding and sampling
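A minimal count-based (Markov-chain) bigram language model with maximum-likelihood estimates and perplexity, illustrating the Week 5 concepts; the two-sentence corpus is an illustrative assumption:

```python
# Bigram language model estimated by maximum likelihood, plus perplexity.
from collections import Counter
import math

corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]

bigrams = Counter((w1, w2) for sent in corpus for w1, w2 in zip(sent, sent[1:]))
unigrams = Counter(w for sent in corpus for w in sent[:-1])

def prob(w2, w1):
    """MLE estimate P(w2 | w1) = count(w1, w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

def perplexity(sentence):
    """Exponential of the average negative log-probability per predicted token."""
    log_p = sum(math.log(prob(w2, w1)) for w1, w2 in zip(sentence, sentence[1:]))
    n = len(sentence) - 1
    return math.exp(-log_p / n)

print(perplexity(["<s>", "the", "cat", "sat", "</s>"]))
```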
Week 6 (May 27): Learning Word Embeddings
  • Dot product between vectors and cosine similarity
  • Distributional Hypothesis
  • Negative sampling
  • Word2Vec (CBOW and Skip-gram variants)
  • FastText
  • Limits of cosine similarity
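A quick illustration of the dot product and cosine similarity between word vectors from Week 6 (the 3-dimensional vectors are made-up numbers, not trained embeddings):

```python
# Cosine similarity between toy word vectors.
import numpy as np

def cosine(u, v):
    """Dot product of the vectors divided by the product of their norms."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

cat = np.array([0.9, 0.1, 0.3])
dog = np.array([0.8, 0.2, 0.4])
car = np.array([0.1, 0.9, 0.0])

print(cosine(cat, dog))   # high: similar directions
print(cosine(cat, car))   # lower: dissimilar directions
```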
Week 7 (Jun 3): Recurrent neural networks (RNNs)
  • RNN abstraction
  • States and outputs
  • Acceptor and encoder RNNs
  • Bidirectional RNN
  • Simple RNN (Elman network)
  • Vanishing/exploding gradient
  • Gates (hard and soft)
  • LSTMs (input, keep, and forget gates; solving the vanishing gradient)
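A minimal numpy sketch of a simple (Elman) RNN used as an acceptor, as introduced in Week 7; the sizes and random initialisation are illustrative assumptions:

```python
# One-layer Elman RNN run over a toy sequence; the final state summarises the input.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 3                                # input and hidden sizes
W_x = rng.normal(scale=0.5, size=(d_h, d_in))   # input-to-hidden weights
W_h = rng.normal(scale=0.5, size=(d_h, d_h))    # hidden-to-hidden (recurrent) weights
b = np.zeros(d_h)

def rnn_step(h_prev, x_t):
    """One Elman step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

sequence = rng.normal(size=(5, d_in))           # a toy sequence of 5 input vectors
h = np.zeros(d_h)                               # initial state
for x_t in sequence:
    h = rnn_step(h, x_t)                        # acceptor use: keep only the final state

print(h)
```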
Week 8 (Jun 10): Autoregressive encoder-decoder with RNNs and attention
  • NLP “sequence” tasks (classification, labeling, generation)
  • The issue of variable length generation
  • PAD and EOS tokens as “dirty” solutions
  • Encoder-Decoder architecture
  • Teacher forcing
  • Fundamentals of attention (formalization, explainability, generalization, calculation)
  • Cross vs self attention
Week 9 (Jun 24): Transformers, Self-attention, and BERT (double lecture)
  • Motivation for the Transformer architecture
  • Contextualized text representations
  • The encoder block
  • Scaled dot-product attention
  • Multi-head attention
  • Parallelizing MHA
  • Residual connection
  • Positional embeddings
  • Encoder vs Decoder block
  • Transfer learning
  • BERT
  • Pretraining objectives of BERT (MLM, NSP)
  • On LM development in academia vs industry
  • Model complexity and explainability
  • Finetuning
  • Decoder heads in BERT
  • Finetuning tasks
  • Pretraining variants
  • Pretrained LMs architectures
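A minimal sketch of single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, as covered in Week 9; the shapes and random inputs are illustrative assumptions, and no masking is applied:

```python
# Scaled dot-product attention for a single head (no mask, no batching).
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)     # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention weights over keys for each query, then a weighted sum of values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_queries, n_keys) similarity scores
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 8))             # 2 queries of dimension 8
K = rng.normal(size=(5, 8))             # 5 keys
V = rng.normal(size=(5, 8))             # 5 values

out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.shape)            # (2, 8) (2, 5)
```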
Week 10 (Jul 1): Decoder-only Models and GPT
  • Types of Transformers (encoder-decoder, encoder-only, decoder-only)
  • Attention masks
  • Full, prefix and masked language modelling
  • Autoregressive decoder-only transformers (GPT-2)
  • Zero-shot, one-shot, and few-shot learning
  • In-context learning
  • Zero-shot, one-shot and few-shot prompting
  • Hallucinations
  • Brief intro on reasoning and LLMs
  • Data contamination
  • Continuous prompts
Week 11 (Jul 8): Contemporary LLMs and Explainability
  • Instruction tuning
  • RLHF
  • Toolformer
  • Motivations for Explainable AI
  • Elements of Explainable AI
  • Local vs global explanations
  • Ante-hoc and post-hoc explanations
  • Saliency vs textual explanations
  • Evaluating explanations
  • Cognitive biases in humans
  • Wrong agreement
  • Likeability-Effectiveness tradeoff
  • Fairwashing
  • Manipulation
  • The bouncer problem
  • Anthropomorphization of AI
Week 12 (Jul 15): Exam simulation
Week 13 (Jul 22): Guest lectures (Mechanistic interpretability, Privacy and Security in LLMs, Culture-aware LLMs)
  • Do Language Models Dream of Electric Clocks? (Federico Tiblias, UKP Lab)
  • Privacy and Security in Large Language Models (Anmol Goel, UKP Lab)
  • Global Voices, Responsible Models: Cultural Evaluation and Adaptation in LLMs (Cecilia Liu, UKP Lab)