Momentum

Muon is a second-order optimizer for deep learning models, designed to accelerate training and reduce memory usage. It leverages information about the curvature of the loss landscape to achieve faster convergence and more efficient memory utilization. By overcoming historical computational barriers and standardizing its usage, Muon brings the theoretical advantages of second-order optimization to the scale required for LLMs, potentially reshaping both practice and expectations in deep learning.