,OUTLINE
Introduction
Background
Model Architecture
Why Self-Attention
Training
Results
Conclusion
Appendix
,Introduction
RNNs, LSTMs, and GRUs have been firmly established as state-of-the-art
approaches to sequence modeling and transduction problems
such as language modeling and machine translation.
Numerous efforts have since continued to push the boundaries of
recurrent language models and encoder-decoder architectures.
Recurrent models typically factor computation along the symbol positions of
the input and output sequences.
Each hidden state depends on the previous one, so computation is inherently sequential,
which precludes parallelization within training examples.
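
To make the sequential dependency concrete, here is a minimal sketch of a toy scalar recurrence, h_t = tanh(w_h * h_{t-1} + w_x * x_t). The function name rnn_hidden_states, the weights w_h and w_x, and the initial state h0 are illustrative assumptions, not values from any real model. The point is only that the loop body for position t needs the result from position t-1, so the positions of one example cannot be computed in parallel.

```python
import math


def rnn_hidden_states(inputs, w_h=0.5, w_x=1.0, h0=0.0):
    """Return the hidden state at every position of a toy scalar RNN.

    w_h, w_x and h0 are hypothetical values chosen for illustration.
    """
    states = []
    h = h0
    for x in inputs:
        # h_t depends on h_{t-1}: this step cannot start until the
        # previous one has finished.
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states


states = rnn_hidden_states([1.0, 0.0, -1.0])
# Changing only the first input changes every later hidden state.
perturbed = rnn_hidden_states([2.0, 0.0, -1.0])
```

Because every state is a function of all earlier inputs, perturbing the first symbol propagates through the whole sequence. Different training examples can still be processed side by side in a batch; it is the positions within one example that must run in order.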