Training

Efficient Hardware Scaling and Diminishing Returns in Large-Scale Training of Language Models

Transactions on Machine Learning Research, 2025.