Swabha Swayamdipta

Postdoctoral Investigator • MOSAIC, Allen Institute for AI

My research focuses on studying biases in datasets and models. Good biases, such as structural inductive biases, help language understanding; check out my PhD thesis on these. But biases can also be undesirable, e.g. spurious correlations commonly found in crowd-sourced, large-scale datasets due to annotation artifacts, or social prejudices of human annotators and task designers (coming soon!).


I obtained my PhD from Carnegie Mellon University in May 2019, where I was advised by Noah Smith and Chris Dyer. During most of my PhD I was a visiting student at the University of Washington in Seattle.

Update: I am looking for academic positions in Winter / Spring 2021!


Nov 2, 2020 Was delighted to be an invited speaker for Responsible AI at the Microsoft E+D Product Leaders Conference.
Sep 22, 2020 The preprint of our EMNLP paper, Dataset Cartography, is now available on arXiv. Camera-ready version and code coming soon!
Sep 15, 2020 Our paper Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics has been accepted to the Proceedings of EMNLP, and GDaug has been accepted to Findings of EMNLP.
Aug 13, 2020 Completed one year as a postdoctoral investigator at AI2!
Jul 8, 2020 Our paper Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks received an Honorable Mention Award at ACL 2020!