In deep NLP, recurrent neural networks (RNNs) are used to generate a sequence of words from an image, video, or another sentence. However, the information in the input must be compressed into a fixed-size, low-dimensional vector, which inevitably loses information. This is particularly problematic when generating long sequences of words. Even LSTMs have a finite memory!
Attention mechanisms allow the RNN to attend to any part of the input image/video/sentence when generating the next word. This leads to better translation and to interesting new ways of introspecting our deep NLP models. In this session we'll dive into the seminal work of Bahdanau, Cho, and Bengio to get a better understanding of how and why these architectures work so well.
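To make the idea concrete, here is a minimal NumPy sketch of the additive (Bahdanau-style) attention score, e_i = v^T tanh(W s + U h_i), followed by a softmax and a weighted sum of the encoder annotations. All variable names, shapes, and the toy dimensions are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def additive_attention(s_prev, h, W_a, U_a, v_a):
    """Bahdanau-style additive attention (illustrative sketch).
    s_prev: previous decoder state, shape (n,)
    h:      encoder annotations, shape (T, m), one row per source position
    W_a:    projection of the decoder state, shape (d, n)
    U_a:    projection of the annotations, shape (d, m)
    v_a:    scoring vector, shape (d,)
    Returns the context vector (m,) and the attention weights (T,).
    """
    # alignment scores: e_i = v_a^T tanh(W_a s_prev + U_a h_i)
    scores = np.tanh(s_prev @ W_a.T + h @ U_a.T) @ v_a  # shape (T,)
    alpha = softmax(scores)   # weights over source positions, sum to 1
    context = alpha @ h       # expected annotation under alpha
    return context, alpha

# toy example: T=4 encoder steps, decoder dim n=3, encoder dim m=5, attention dim d=6
rng = np.random.default_rng(0)
s = rng.standard_normal(3)
h = rng.standard_normal((4, 5))
W = rng.standard_normal((6, 3))
U = rng.standard_normal((6, 5))
v = rng.standard_normal(6)
context, alpha = additive_attention(s, h, W, U, v)
```

The weights `alpha` are what makes these models introspectable: plotting them per generated word shows which source positions the decoder attended to.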
Blog: Attention and Memory in Deep Learning and NLP - Wild ML
Paper: Neural Machine Translation by Jointly Learning to Align and Translate, D Bahdanau, K Cho, Y Bengio - ICLR 2015
Code: a TensorFlow implementation of a sequence-to-sequence model with an attention mechanism is described here.