Softmax

Transformers & Attention
· 866 words · 5 mins
This post explains how self-attention and softmax work in Transformer models, the building blocks of modern NLP. It breaks down how self-attention lets a model capture relationships between tokens, and how softmax can be computed efficiently and with numerical stability.
Softmax
· 1713 words · 9 mins
Softmax is a pivotal component of neural network architectures, providing a way to convert raw scores (logits) into interpretable probabilities.
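As a quick illustration of the idea both posts cover, here is a minimal sketch (in Python with NumPy, not taken from either post) of a numerically stable softmax: subtracting the maximum score before exponentiating leaves the result unchanged but prevents overflow.

import numpy as np

def softmax(scores: np.ndarray, axis: int = -1) -> np.ndarray:
    """Convert raw scores (logits) into probabilities.

    Subtracting the per-row maximum before exponentiating avoids
    overflow without changing the result, since softmax is invariant
    to shifting all scores by a constant.
    """
    shifted = scores - np.max(scores, axis=axis, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=axis, keepdims=True)

# Example: three raw attention scores become a probability distribution.
print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.659, 0.242, 0.099]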