Shane Bergsma, Nolan Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness, Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training, Under review (arXiv, 2025)
[paper]
Shane Bergsma, Nolan Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness, Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs, ICLR, 2025
[paper]
Daria Soboleva, Faisal Al-Khateeb, Robert Myers, et al., SlimPajama: A 627B token cleaned and deduplicated version of RedPajama, 2023
[dataset][blog post][github]
Nolan Dey*, Daria Soboleva*, Faisal Al-Khateeb, et al., BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model, Efficient Natural Language and Speech Processing Workshop, NeurIPS, 2023 (* Equal contribution)
[model][paper][Davis Blalock's roundup][Jeremy Howard (fast.ai)]
Faisal Al-Khateeb, Nolan Dey, Daria Soboleva, Joel Hestness, Position Interpolation Improves ALiBi Extrapolation, arXiv, 2023
[paper]
Zhiqiang Shen, Tianhua Tao, Liqun Ma, Willie Neiswanger, Joel Hestness, Natalia Vassilieva, Daria Soboleva, Eric Xing, SlimPajama-DC: Understanding Data Combinations for LLM Training, arXiv, 2023
[paper]
Daria Soboleva, Ondrej Skopek, Márius Šajgalík, et al., Replacing Human Audio with Synthetic Audio for On-device Unspoken Punctuation Prediction, ICASSP, 2021
[paper]
Patents
Aleksandr Boymel, Daria Soboleva, Multi-phase training of machine learning models for search ranking, Filed, 2022 (Yandex)
[patent]