Deep Learning For Natural Language Processing

A complete crash course on Natural Language Processing, from basic text classification all the way to Large Language Models, Reinforcement Learning from Human Feedback, and more.

Term: Summer Semester

Time: Tuesdays, 13:30-15:20

Course Overview

This course covers both foundational and up-to-date methodologies for Natural Language Processing (NLP), which today form the backbone of popular AI tools. Starting from the basics (the mathematics of deep learning, backpropagation, etc.), you will learn how increasingly advanced models can understand natural language. By the end of this course, you will have gained an understanding of:

  • Key machine learning paradigms and concepts
  • Both basic and advanced machine learning algorithms/models applied to NLP
  • Evaluation, optimisation, and comparison of NLP models
  • Applying NLP models and techniques to real-world problems

Prerequisites

  • Basic knowledge of linear algebra and calculus
  • Programming experience in Python
  • Probability and statistics fundamentals

Material

  • Lectures are available publicly on YouTube.
  • Other material (slides, exercises, homework, exam solutions, etc.) is on Moodle.

Schedule

Week 1 (Apr 22): NLP tasks and evaluation
  • Course Logistics
  • Definitions of the text classification and text generation tasks
  • Evaluation
Week 2 (Apr 29): Mathematical foundations of deep learning
  • Function minimization
  • Efficient computation of gradients
  • Stochastic Gradient Descent (SGD)
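To give a feel for the Week 2 material, here is a minimal minibatch-SGD sketch on a toy least-squares problem (the dataset, learning rate, and batch size are illustrative assumptions, not course code):

```python
# Minibatch SGD on a toy least-squares objective (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                 # toy inputs
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ true_w + 0.1 * rng.normal(size=1000)   # noisy targets

w = np.zeros(5)                                # parameters to learn
lr, batch_size = 0.1, 32

for epoch in range(20):
    perm = rng.permutation(len(X))             # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        grad = 2 * xb.T @ (xb @ w - yb) / len(idx)  # gradient of the mean squared error
        w -= lr * grad                         # SGD update step

print(np.round(w, 2))                          # should be close to true_w
```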
Week 3 (May 6): Log-linear models
  • Ambiguity of human language
  • Fundamentals of vectors and linear functions
  • Binary classification
  • Tokenization
  • Bag-Of-Words (BoW) and Byte-Pair Encoding (BPE)
  • Nonlinear mapping and sigmoid function
  • Log-linear model (Logistic Regression)
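A minimal sketch of the Week 3 pipeline: a bag-of-words representation fed into a logistic-regression (log-linear) binary classifier. The tiny corpus and hyperparameters are made up for illustration:

```python
# Bag-of-words + logistic regression on a toy sentiment dataset.
import numpy as np

docs = ["good movie", "great plot", "bad movie", "awful plot"]
labels = np.array([1, 1, 0, 0])                 # 1 = positive, 0 = negative

vocab = sorted({tok for d in docs for tok in d.split()})
index = {tok: i for i, tok in enumerate(vocab)}

def bow(doc):
    """Count-based bag-of-words vector for one document."""
    v = np.zeros(len(vocab))
    for tok in doc.split():
        v[index[tok]] += 1
    return v

X = np.stack([bow(d) for d in docs])
w, b = np.zeros(len(vocab)), 0.0

sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(500):                            # full-batch gradient descent
    p = sigmoid(X @ w + b)                      # predicted P(y = 1 | x)
    grad_w = X.T @ (p - labels) / len(docs)     # gradient of binary cross-entropy
    grad_b = np.mean(p - labels)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

print(sigmoid(bow("good plot") @ w + b))        # probability the new doc is positive
```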
Week 4 (May 13): Deep Neural Networks
  • Loss functions and Binary Cross-Entropy
  • Online and minibatch SGD
  • Multi-class classification
  • Continuous Bag-Of-Words (CBOW)
  • Softmax and Temperature
  • Categorical Cross-Entropy Loss
  • Linearity and non-linearity in neural networks
  • ReLU
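A small illustration of temperature-scaled softmax and categorical cross-entropy from Week 4 (the logits and target below are made-up numbers):

```python
# Softmax with temperature and categorical cross-entropy for a one-hot target.
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a flatter distribution."""
    z = logits / temperature
    z = z - z.max()                    # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
for t in (0.5, 1.0, 2.0):
    print(t, np.round(softmax(logits, t), 3))

# Categorical cross-entropy when the correct class is class 0:
probs = softmax(logits)
loss = -np.log(probs[0])
print("cross-entropy:", round(loss, 3))
```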
Week 5 (May 20): Language models and word embeddings
  • Fundamentals of probability
  • Classic Language Model
  • Markov chain property for word probability
  • Maximum Likelihood Estimation
  • Perplexity
  • Neural language models
  • Word vector lookup
  • Basic text generation with neural LM
  • Greedy decoding and sampling
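A minimal count-based (Markov-chain) bigram language model with maximum-likelihood estimates and perplexity, illustrating the Week 5 concepts; the two-sentence corpus is an illustrative assumption:

```python
# Bigram language model estimated by maximum likelihood, plus perplexity.
from collections import Counter
import math

corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]

bigrams = Counter((w1, w2) for sent in corpus for w1, w2 in zip(sent, sent[1:]))
unigrams = Counter(w for sent in corpus for w in sent[:-1])

def prob(w2, w1):
    """MLE estimate P(w2 | w1) = count(w1, w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

def perplexity(sentence):
    """Exponential of the average negative log-probability per predicted token."""
    log_p = sum(math.log(prob(w2, w1)) for w1, w2 in zip(sentence, sentence[1:]))
    n = len(sentence) - 1
    return math.exp(-log_p / n)

print(perplexity(["<s>", "the", "cat", "sat", "</s>"]))
```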
Week 6 (May 27): Learning Word Embeddings
  • Dot product between vectors and cosine similarity
  • Distributional Hypothesis
  • Negative sampling
  • Word2Vec (CBOW and Skip-gram variants)
  • FastText
  • Limits of cosine similarity
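A quick illustration of the dot product and cosine similarity between word vectors from Week 6 (the 3-dimensional vectors are made-up numbers, not trained embeddings):

```python
# Cosine similarity between toy word vectors.
import numpy as np

def cosine(u, v):
    """Dot product of the vectors divided by the product of their norms."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

cat = np.array([0.9, 0.1, 0.3])
dog = np.array([0.8, 0.2, 0.4])
car = np.array([0.1, 0.9, 0.0])

print(cosine(cat, dog))   # high: similar directions
print(cosine(cat, car))   # lower: dissimilar directions
```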
Week 7 (Jun 3): Recurrent neural networks (RNNs)
  • RNN abstraction
  • States and outputs
  • Acceptor and encoder RNNs
  • Bidirectional RNN
  • Simple RNN (Elman network)
  • Vanishing/exploding gradient
  • Gates (hard and soft)
  • LSTMs (input, keep, and forget gates; solving the vanishing gradient)
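A minimal numpy sketch of a simple (Elman) RNN used as an acceptor, as introduced in Week 7; the sizes and random initialisation are illustrative assumptions:

```python
# One-layer Elman RNN run over a toy sequence; the final state summarises the input.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 3                                # input and hidden sizes
W_x = rng.normal(scale=0.5, size=(d_h, d_in))   # input-to-hidden weights
W_h = rng.normal(scale=0.5, size=(d_h, d_h))    # hidden-to-hidden (recurrent) weights
b = np.zeros(d_h)

def rnn_step(h_prev, x_t):
    """One Elman step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

sequence = rng.normal(size=(5, d_in))           # a toy sequence of 5 input vectors
h = np.zeros(d_h)                               # initial state
for x_t in sequence:
    h = rnn_step(h, x_t)                        # acceptor use: keep only the final state

print(h)
```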
Week 8 (Jun 10): Autoregressive encoder-decoder with RNNs and attention
  • NLP “sequence” tasks (classification, labeling, generation)
  • The issue of variable length generation
  • PAD and EOS tokens as “dirty” solutions
  • Encoder-Decoder architecture
  • Teacher forcing
  • Fundamentals of attention (formalization, explainability, generalization, calculation)
  • Cross vs self attention
Week 9 (Jun 24): Transformers, Self-attention, and BERT (double lecture)
  • Motivation for the Transformer architecture
  • Contextualized text representations
  • The encoder block
  • Scaled dot-product attention
  • Multi-head attention
  • Parallelizing MHA
  • Residual connection
  • Positional embeddings
  • Encoder vs Decoder block
  • Transfer learning
  • BERT
  • Pretraining objectives of BERT (MLM, NSP)
  • On LM development in academia vs industry
  • Model complexity and explainability
  • Finetuning
  • Decoder heads in BERT
  • Finetuning tasks
  • Pretraining variants
  • Pretrained LMs architectures
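A minimal sketch of single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, as covered in Week 9; the shapes and random inputs are illustrative assumptions, and no masking is applied:

```python
# Scaled dot-product attention for a single head (no mask, no batching).
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)     # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention weights over keys for each query, then a weighted sum of values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_queries, n_keys) similarity scores
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 8))             # 2 queries of dimension 8
K = rng.normal(size=(5, 8))             # 5 keys
V = rng.normal(size=(5, 8))             # 5 values

out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.shape)            # (2, 8) (2, 5)
```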
Week 10 (Jul 1): Decoder-only Models and GPT
  • Types of Transformers (encoder-decoder, encoder-only, decoder-only)
  • Attention masks
  • Full, prefix and masked language modelling
  • Autoregressive decoder-only transformers (GPT-2)
  • Zero-shot, one-shot, and few-shot learning
  • In-context learning
  • Zero-shot, one-shot and few-shot prompting
  • Hallucinations
  • Brief intro on reasoning and LLMs
  • Data contamination
  • Continuous prompts
Week 11 (Jul 8): Contemporary LLMs and Explainability
  • Instruction tuning
  • RLHF
  • Toolformer
  • Motivations for Explainable AI
  • Elements of Explainable AI
  • Local vs global explanations
  • Ante-hoc and post-hoc explanations
  • Saliency vs textual explanations
  • Evaluating explanations
  • Cognitive biases in humans
  • Wrong agreement
  • Likeability-Effectiveness tradeoff
  • Fairwashing
  • Manipulation
  • The bouncer problem
  • Anthropomorphization of AI
Week 12 (Jul 15): Exam simulation
Week 13 (Jul 22): Guest lectures (Mechanistic interpretability, Privacy and Security in LLMs, Culture-aware LLMs)
  • Do Language Models Dream of Electric Clocks? (Federico Tiblias, UKP Lab)
  • Privacy and Security in Large Language Models (Anmol Goel, UKP Lab)
  • Global Voices, Responsible Models: Cultural Evaluation and Adaptation in LLMs (Cecilia Liu, UKP Lab)