Detailed Calendar
Required and additional readings, updated (bi)weekly. Additional readings are optional.
Introduction to Language Models
Lecture 1: Introduction Aug 21
Required Readings
- None
Lecture 2: n-gram LMs I Aug 23
Required Readings
- Jurafsky and Martin, Chap 3.1-3.3
Lecture 3: n-gram LMs II Aug 28
Required Readings
- Jurafsky and Martin, Chap 3.4-3.7
Additional Readings
- Mitchell, Chap 2, Estimating Probabilities
Early Neural Language Models
Lecture 4: Word Embeddings Aug 30
Required Readings
- Jurafsky and Martin, Chap 6.1-6.7
Lecture 5: Word Embeddings II Sep 6
Required Readings
- Jurafsky and Martin, Chap 6.8-6.12
Additional Readings
- Mikolov et al., ICLR 2013. Efficient Estimation of Word Representations in Vector Space
- Mikolov et al., NeurIPS 2013. Distributed Representations of Words and Phrases and their Compositionality
- Jay Alammar. The Illustrated Word2vec
Lecture 6: Logistic Regression I Sep 11
Required Readings
- Jurafsky and Martin, Chap 5
Lecture 7: Logistic Regression II Sep 13
Required Readings
- Jurafsky and Martin, Chap 5
Lecture 8: Feedforward Neural Network Language Models Sep 18
Required Readings
- Jurafsky and Martin, Chap 7
Lecture 9: Recurrent Neural Network Language Models Sep 20
Required Readings
- Jurafsky and Martin, Chap 9.1-9.2
Modern Neural Language Models
Lecture 10: Sequence-to-Sequence and Attention Sep 25
Required Readings
- Jurafsky and Martin, Chap 9.3.2-9.3.3; 9.7-9.8
Lecture 11: Transformer Building Blocks Sep 27
Required Readings
- Jurafsky and Martin, Chap 10.1
Lecture 12: Invited Lecture - Language Grounding by Jesse Thomason Oct 2
Lecture 13: PyTorch for Transformers Oct 4
Additional Readings
- Iyyer, CS685 Spring 2023: Tokenization
Lecture 14: Transformer Building Blocks II Oct 16
Required Readings
- Jurafsky and Martin, Chap 10.2
Large Language Models
Lecture 15: Pre-training Transformers Oct 18
Required Readings
- Jurafsky and Martin, Chap 11.1-11.2
Lecture 16: Pre-training Transformers II Oct 23
Required Readings
- Jurafsky and Martin, Chap 11.3
Lecture 17: Generating from Language Models Oct 25
Required Readings
- Jurafsky and Martin, Chap 10.4
Lecture 18: Generating from Language Models II Oct 30
Required Readings
- Jurafsky and Martin, Chap 13.5.2
Additional Readings
- Holtzman et al., ICLR 2020. The Curious Case of Neural Text Degeneration
- WordPiece Modeling
Lecture 19: Generating from Language Models III Nov 1
Additional Readings
Lecture 20: LLMs: Limitations and Harms Nov 6
Additional Readings
Lecture 21: RLHF Nov 8
Additional Readings
- Chip Huyen’s blog post on RLHF: a good balance of humor and technical detail, with many references for further reading.
- HuggingFace Blog Post: Illustrating RLHF by Nathan Lambert et al.: focuses mainly on the RLHF algorithm itself, providing a brief history of RL, the seminal work that led to RLHF, and practical tools for applying it.
- Argilla Blog Post: Finetuning an LLM: RLHF and alternatives
- Yoav Goldberg’s post: Hypotheses on why RLHF works.
- Proximal Policy Optimization (PPO): The Key to LLM Alignment: more detail on the PPO algorithm and how it improves on earlier RL algorithms.