Softmax

Transformers & Attention
· 866 words · 5 mins
This post explains how self-attention and softmax work in Transformer models, the building blocks of modern NLP. It breaks down how self-attention lets a model capture relationships between tokens, and how softmax can be computed efficiently and with numerical stability.
Softmax
· 1713 words · 9 mins
Softmax is a pivotal component of neural network architectures, providing a way to convert raw scores (logits) into interpretable probabilities.
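As a quick illustration of the idea both posts cover, here is a minimal sketch (in Python with NumPy, not taken from either post) of a numerically stable softmax: subtracting the maximum score before exponentiating leaves the result unchanged but prevents overflow.

import numpy as np

def softmax(scores: np.ndarray, axis: int = -1) -> np.ndarray:
    """Convert raw scores (logits) into probabilities.

    Subtracting the per-row maximum before exponentiating avoids
    overflow without changing the result, since softmax is invariant
    to shifting all scores by a constant.
    """
    shifted = scores - np.max(scores, axis=axis, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=axis, keepdims=True)

# Example: three raw attention scores become a probability distribution.
print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.659, 0.242, 0.099]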