An illustration of data distributions. Task data consists of an observable task distribution, usually non-randomly sampled from a wider distribution (light grey ellipse) within an even larger target domain, which is not necessarily one of the domains included in the original LM pretraining corpus – though overlap is possible. We explore the benefits of continued pretraining on data from both the task distribution and the domain distribution.
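
In practice, continued pretraining means resuming the LM's original self-supervised objective (masked language modeling, in the case of RoBERTa) on unlabeled text drawn from the domain or task distribution. Below is a minimal sketch using the Hugging Face transformers and datasets libraries; the corpus file name and the hyperparameters are illustrative assumptions, not our exact experimental setup.

# A minimal sketch of domain-/task-adaptive continued pretraining.
# Hyperparameters and the corpus path are placeholders, not the paper's setup.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# "domain_corpus.txt" stands in for unlabeled text sampled from the
# domain distribution (DAPT) or the task distribution (TAPT).
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Continue with the same masked-LM objective used in pretraining.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="adapted-checkpoint",
    per_device_train_batch_size=8,
    num_train_epochs=1,  # illustrative; adaptive pretraining typically runs far longer
    learning_rate=5e-5,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()

The resulting checkpoint can then be fine-tuned on the labeled task data in the usual way, replacing the off-the-shelf LM as the starting point.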

Our paper received an honorable mention for the Best Paper Award at ACL 2020.

@inproceedings{gururangan-etal-2020-dont,
    title = "Don{'}t Stop Pretraining: Adapt Language Models to Domains and Tasks",
    author = "Gururangan, Suchin  and Marasovi{\'c}, Ana  and Swayamdipta, Swabha  and
      Lo, Kyle  and Beltagy, Iz  and Downey, Doug  and Smith, Noah A.",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
}