My research focuses on studying biases in datasets and models. Good biases, such as structural inductive biases, help language understanding; check out my PhD thesis on these. But biases can also be undesirable, e.g. spurious correlations commonly found in crowd-sourced, large-scale datasets due to annotation artifacts, or social prejudices of human annotators and task designers (coming soon!).
I obtained my PhD from Carnegie Mellon University in May 2019, where I was advised by Noah Smith and Chris Dyer. During most of my PhD I was a visiting student at the University of Washington in Seattle.
Update: I am looking for academic positions in Winter / Spring 2021!
|Nov 2, 2020||Was delighted to be an invited speaker for Responsible AI at the Microsoft E+D Product Leaders Conference.|
|Sep 22, 2020||A preprint of our EMNLP paper Dataset Cartography is now available on arXiv. Camera-ready version and code coming soon!|
|Sep 15, 2020||Our paper Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics has been accepted to the Proceedings of EMNLP, and GDaug has been accepted to Findings of EMNLP.|
|Aug 13, 2020||Completed one year as a postdoctoral investigator at AI2!|
|Jul 8, 2020||Our paper Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks received an Honorable Mention Award at ACL 2020!|