Attention Is All You Need - Visually
Introduction
1. Introduction to the Transformer Model
2. Dominant Sequence Transduction Models
Model Architecture
3. Transformer: A New Architecture
4. Encoder and Decoder Stacks
5. Attention Mechanisms
Attention Mechanisms
6. Scaled Dot-Product Attention
7. Multi-Head Attention
Applications of Attention
8. Self-Attention
Model Architecture
9. Position-Wise Feed-Forward Networks
Embeddings and Softmax
10. Embeddings and Positional Encoding
Why Self-Attention
11. Why Self-Attention?
Training
12. Training and Optimizations
Results
13. Machine Translation Results
Model Variations
14. Model Variations and Performance
Parsing Task
15. English Constituency Parsing
Conclusion
16. Conclusion and Future Work