Gradient Localization Improves Lifelong Pretraining of Language Models

Jared Fernandez

PhD student at CMU LTI working on ML efficiency.