
Attention

DenseNet: How Connections Revolutionized Deep Learning
·4380 words·21 mins
This series explores DenseNet’s revolutionary approach to neural connectivity, which mitigated vanishing gradients and improved feature reuse, examines its mathematical foundations and practical implementation, and discusses how its limitations eventually paved the way for Vision Transformers. We trace the evolution from convolutional networks to hybrid architectures, showing how each innovation built on previous breakthroughs while addressing their shortcomings in the pursuit of more efficient and powerful deep learning models.
Transformers & Attention
·866 words·5 mins
This blog post explains how self-attention and softmax work in Transformer models, both crucial for modern NLP. It breaks down how self-attention lets models capture relationships between tokens and how softmax is computed for efficiency and numerical stability.
Infini-Attention Paper Review
·438 words·3 mins
Infini-Attention introduces a novel approach to scaling Transformer models to infinitely long inputs while keeping memory and computation bounded.