Tag: training pipeline
Checkpoint Averaging and EMA: Stabilizing Large Language Model Training
Checkpoint averaging and exponential moving averages (EMA) stabilize large language model training by combining weights from multiple model snapshots, which improves performance and reduces variance. The article reports gains of 1-2% at minimal overhead and notes that both techniques are now standard for models with more than 1B parameters.
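A minimal sketch of the two ideas, assuming each checkpoint is a plain dict mapping parameter names to lists of floats rather than framework tensors. The function names `average_checkpoints` and `ema_update` and the default decay of 0.999 are illustrative assumptions, not taken from the article. Uniform averaging takes the element-wise mean over saved snapshots; EMA keeps a running average that is blended with the current weights after each step.

```python
from typing import Dict, List

# Hypothetical representation: a checkpoint maps each parameter name to a
# flat list of float weights (real frameworks would use tensors instead).
Checkpoint = Dict[str, List[float]]


def average_checkpoints(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Uniform checkpoint averaging: element-wise mean of every parameter."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged: Checkpoint = {}
    for name in checkpoints[0]:
        # Group the i-th weight of this parameter across all checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(values) / n for values in columns]
    return averaged


def ema_update(ema: Checkpoint, current: Checkpoint, decay: float = 0.999) -> Checkpoint:
    """One EMA step per parameter: ema <- decay * ema + (1 - decay) * current.

    The 0.999 default is an assumed, commonly used value, not one from the article.
    """
    return {
        name: [decay * e + (1.0 - decay) * c for e, c in zip(ema[name], current[name])]
        for name in ema
    }


if __name__ == "__main__":
    snapshots = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}, {"w": [5.0, 6.0]}]
    print(average_checkpoints(snapshots))  # {'w': [3.0, 4.0]}

    ema = {"w": [0.0]}
    for _ in range(3):
        ema = ema_update(ema, {"w": [1.0]}, decay=0.5)
    print(ema)  # {'w': [0.875]}
```

Averaging needs all snapshots at the end (or a running sum), while EMA needs only one extra copy of the weights updated during training, which is one reason it adds little overhead.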