Memory Efficiency
DenseNet: How Connections Revolutionized Deep Learning
·4380 words·21 mins
This series explores DenseNet’s revolutionary approach to neural connectivity, which mitigated vanishing gradients and encouraged feature reuse by connecting each layer to all subsequent layers. It examines the architecture’s mathematical foundations and practical implementation, and discusses how its limitations, notably the memory cost of concatenating feature maps, eventually paved the way for Vision Transformers. We trace the evolution from convolutional networks to hybrid architectures, showing how each innovation built on previous breakthroughs while addressing their shortcomings in the pursuit of ever more efficient and powerful deep learning models.
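For a concrete feel for the connectivity pattern the series builds on, here is a minimal PyTorch sketch of a dense layer and block, assuming the standard BN–ReLU–Conv ordering from the DenseNet paper; the channel counts and growth rate are illustrative, not the posts’ exact configuration.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One dense layer: BN -> ReLU -> 3x3 conv, output concatenated onto input."""
    def __init__(self, in_channels: int, growth_rate: int):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x_l = H_l([x_0, ..., x_{l-1}]): each layer sees every earlier
        # feature map, so features are reused and gradients flow directly
        # back to early layers.
        new_features = self.conv(torch.relu(self.norm(x)))
        return torch.cat([x, new_features], dim=1)

def dense_block(in_channels: int, num_layers: int, growth_rate: int = 32) -> nn.Sequential:
    """Chain dense layers; the channel count grows by growth_rate per layer."""
    return nn.Sequential(*[
        DenseLayer(in_channels + i * growth_rate, growth_rate)
        for i in range(num_layers)
    ])
```

A four-layer block starting from 64 channels with growth rate 32 ends at 64 + 4 × 32 = 192 channels; that steadily growing concatenation is both the source of DenseNet’s feature reuse and the memory cost the series discusses.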
Muon: Second-Order Optimizer for Hidden Layers
·1209 words·6 mins
Muon is a second-order optimizer for the hidden layers of deep learning models, designed to accelerate training and reduce memory usage. It exploits curvature information from the loss landscape to converge faster while keeping optimizer state small. By overcoming the computational barriers that historically made second-order methods impractical at scale, Muon brings their theoretical advantages to the scale required for LLMs, potentially reshaping both practice and expectations in deep learning.
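As a rough preview of the update rule, here is a hedged sketch of a Muon-style step in PyTorch: heavy-ball momentum followed by approximate orthogonalization of the 2D update via a Newton–Schulz iteration, which is how the optimizer sidesteps the cost of explicit second-order computations. The quintic coefficients and hyperparameters follow the public reference implementation and are assumptions here, not necessarily what the post presents.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize a 2D matrix with a quintic Newton-Schulz
    iteration (coefficients taken from the public reference implementation)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G.float()
    X = X / (X.norm() + 1e-7)   # Frobenius norm bounds the spectral norm by 1
    transposed = X.size(0) > X.size(1)
    if transposed:
        X = X.T                  # iterate on the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    if transposed:
        X = X.T
    return X.to(G.dtype)

@torch.no_grad()
def muon_step(param: torch.Tensor, momentum_buf: torch.Tensor,
              lr: float = 0.02, beta: float = 0.95) -> None:
    """One Muon-style update for a hidden-layer weight matrix (sketch)."""
    momentum_buf.mul_(beta).add_(param.grad)            # heavy-ball momentum
    param.add_(newton_schulz_orthogonalize(momentum_buf), alpha=-lr)
```

In the reference design, the only per-parameter state is a single momentum buffer, so memory overhead matches SGD-with-momentum rather than Adam’s two moment buffers, which is where the memory savings mentioned above come from.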