Detailed Calendar

Required and additional readings, to be updated (bi)weekly. Additional readings are not mandatory.

Datasets in NLP

Weeks 1 and 2

Aug 22 Lecture Introduction, Historical Perspective and Overview

Additional Readings

Fair ML Book Chapter 7. Datasets
Sambasivan et al., 2021: “Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI

Aug 24 Lecture Data Collection and Data Ethics

Additional Readings

Paullada et al., 2021 Data and its (dis)contents
Raji et al., 2022 Ethical Challenges of Data Collection & Use in Machine Learning Research

Aug 29 More on Collection

Deng et al., 2009 ImageNet: A large-scale hierarchical image database
Kwiatkowski et al., 2019 Natural Questions: A Benchmark for Question Answering Research
Sakaguchi et al., 2019 WinoGrande: An Adversarial Winograd Schema Challenge at Scale

Additional Readings

Bowman et al., 2015 A large annotated corpus for learning natural language inference
Nie et al., 2020 Adversarial NLI: A New Benchmark for Natural Language Understanding

Aug 31 More on Data Ethics

Bender et al., 2021 On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜
Koch et al., 2021 Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research

Additional Readings

D’Ignazio and Klein, 2020 Data Feminism Book: Intro and Chapter 1
Strubell et al., 2019 Energy and Policy Considerations for Deep Learning in NLP

Biases and Mitigation

Weeks 3, 4 and 5

Sep 7 Lecture Biases: An Overview

Additional Readings

Geirhos et al., 2020 Shortcut Learning in Deep Neural Networks
Hort et al., 2022 Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey
Feder et al., 2021 Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond

Sep 12 Spurious Biases I

Torralba & Efros, 2011 Unbiased Look at Dataset Bias
Geva et al., 2019 Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets
McCoy et al., 2019 Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in NLI

Sep 14 Spurious Biases II

Gardner et al., 2021 Competency Problems: On Finding and Removing Artifacts in Language Data
Eisenstein, 2022 Informativeness and Invariance: Two Perspectives on Spurious Correlations in Natural Language

Sep 19 Data-Centric Bias Mitigation

Srivastava et al., 2020 Robustness to spurious correlations via human annotations
Dixon et al., 2018 Measuring and mitigating unintended bias in text classification
Gardner et al., 2019 On Making Reading Comprehension More Comprehensive

Sep 21 Data Augmentation for Bias Mitigation

Ng et al., 2020 SSMBA: Self-Supervised Manifold Based Data Augmentation for Improving O.O.D. Robustness
Kaushik et al., 2019 Learning the Difference that Makes a Difference with Counterfactually-Augmented Data

Project Proposal due by 11:59 PM PT at the latest.

Estimating Data Quality

Weeks 6, 7 and 8

Sep 26 Lecture Estimates of Data Quality

Additional Readings

Le Bras et al., 2020 Adversarial Filters of Dataset Biases
Swayamdipta et al., 2020 Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics
Liu et al., 2022 WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation
Ethayarajh et al., 2022 Understanding Dataset Difficulty with V-Usable Information

Sep 28 Aggregate vs. Point-wise Estimates of Data Quality

Ghorbani & Zou, 2019 Data Shapley: Equitable Valuation of Data for Machine Learning
Perez et al., 2021 Rissanen Data Analysis: Examining Dataset Characteristics via Description Length
Mindermann et al., 2022 Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt

Oct 3 Anomalies, Outliers, and Out-of-Distribution Examples

Hendrycks et al., 2018 Deep Anomaly Detection with Outlier Exposure
Ren et al., 2019 Likelihood Ratios for Out-of-Distribution Detection

Oct 5 Disagreements, Subjectivity and Ambiguity I

Pavlick et al., 2019 Inherent Disagreements in Human Textual Inferences
Röttger et al., 2022 Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks
Denton et al., 2021 Whose Ground Truth? Accounting for Individual and Collective Identities Underlying Dataset Annotation

Oct 12 Disagreements, Subjectivity and Ambiguity II

Miceli et al., 2020 Between Subjectivity and Imposition: Power Dynamics in Data Annotation for Computer Vision
Davani et al., 2021 Dealing with Disagreements: Looking Beyond the Majority Vote in Subjective Annotations

Data for Accountability

Weeks 9 and 10

Oct 17 Creating Evaluation Sets

Recht et al., 2019 Do ImageNet Classifiers Generalize to ImageNet?
Card et al., 2020 With Little Power Comes Great Responsibility
Clark et al., 2021 All That’s ‘Human’ Is Not Gold: Evaluating Human Evaluation of Generated Text

Additional Readings

Ethayarajh & Jurafsky, 2020 Utility is in the eye of the user: a critique of NLP leaderboards

Oct 19 Counterfactual Evaluation

Gardner et al., 2020 Evaluating Models’ Local Decision Boundaries via Contrast Sets
Ross et al., 2021 Tailor: Generating and Perturbing Text with Semantic Controls

Oct 24 Adversarial Evaluation

Jia and Liang, 2017 Adversarial Examples for Evaluating Reading Comprehension Systems
Kiela et al., 2021 Dynabench: Rethinking Benchmarking in NLP
Li and Michael, 2022 Overconfidence in the Face of Ambiguity with Adversarial Data

Oct 26 Contextualizing Decisions

Gebru et al., 2018 Datasheets for Datasets
Bender and Friedman, 2018 Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science

Oct 28

Project Proposal due by 11:59 PM PT at the latest.

Beyond Labeled Datasets

Weeks 11, 12 and 13

Oct 31 Unlabeled Data

Dodge et al., 2021 Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
Lee et al., 2022 Deduplicating Training Data Makes Language Models Better
Gururangan et al., 2022 Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection

Nov 2 Prompts as Data?

Wei et al., 2022 Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Nov 7 Data Privacy and Security

Amodei et al., 2016 Concrete Problems in AI Safety
Carlini et al., 2020 Extracting Training Data from Large Language Models
Henderson et al., 2022 Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset

Nov 9 Towards Better Data Citizenship

Jo & Gebru, 2019 Lessons from Archives: Strategies for Collecting Sociocultural Data in Machine Learning
Hutchinson et al., 2021 Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure

Outro and Presentations

Nov 14 Lecture Outro

Nov 16 Project Presentations

Nov 21 Project Presentations

Nov 28 Project Presentations

Nov 30 Project Presentations

Dec 7