
Attention

Transformers & Attention
·866 words·5 mins
This blog post explains how self-attention and the softmax function work in Transformer models, which underpin modern NLP. It breaks down how self-attention lets a model capture relationships between tokens and how softmax is implemented for efficient, numerically stable computation.
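As a rough illustration of the two ideas this post covers, here is a minimal NumPy sketch (my own, not code from the post): scaled dot-product self-attention with the softmax computed by subtracting the row-wise maximum, which keeps large logits from overflowing without changing the result.

```python
import numpy as np

def stable_softmax(x, axis=-1):
    # Subtracting the max before exponentiating avoids overflow;
    # the final probabilities are mathematically unchanged.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model); Wq/Wk/Wv project tokens to queries, keys, values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Each token scores every token, then mixes the values by those weights.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = stable_softmax(scores, axis=-1)
    return weights @ V

# Toy usage: 4 tokens, model width 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)  # shape (4, 8)
```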
Infini-Attention Paper Review
·438 words·3 mins
Infini-Attention introduces a novel approach to scaling Transformer models to infinitely long inputs while keeping memory and computation bounded.
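For context on the kind of mechanism the review discusses, below is a heavily simplified sketch of segment-wise attention combined with a fixed-size compressive memory. It is an assumption-laden illustration of the general idea, not the paper's exact formulation; the function name, the `gate` parameter, and the ELU+1 feature map are choices made here for the sketch.

```python
import numpy as np

def elu_plus_one(x):
    # ELU(x) + 1 keeps features positive so they can act as attention kernels.
    return np.where(x > 0, x + 1.0, np.exp(x))

def stable_softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def infini_attention_segment(Q, K, V, M, z, gate=0.5):
    """Process one segment of a long stream (illustrative only).

    Q, K, V: (segment_len, d) projections for the current segment.
    M: (d, d) compressive memory summarizing previous segments.
    z: (d,) running normalization term for the memory read-out.
    gate: blend between memory read-out and local attention (hypothetical fixed value).
    """
    d = Q.shape[-1]
    # 1) Read from the fixed-size memory with a linear-attention style lookup.
    sQ = elu_plus_one(Q)
    A_mem = (sQ @ M) / (sQ @ z[:, None] + 1e-6)
    # 2) Ordinary scaled dot-product attention within the current segment.
    A_local = stable_softmax(Q @ K.T / np.sqrt(d)) @ V
    # 3) Fold the current segment into the memory for later segments.
    sK = elu_plus_one(K)
    M = M + sK.T @ V
    z = z + sK.sum(axis=0)
    # 4) Blend memory read-out with local attention.
    out = gate * A_mem + (1.0 - gate) * A_local
    return out, M, z
```

Because the memory `M` and normalizer `z` have fixed shapes, the per-segment cost stays constant no matter how long the stream grows, which is the property the summary above refers to.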