In deep NLP, recurrent neural networks (RNNs) are used to generate a sequence of words from an image, a video, or another sentence. However, the entire input must first be compressed into a single fixed-length, low-dimensional vector, which inevitably loses information. This is particularly problematic when generating long sequences of words. Even LSTMs have finite memory!
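To make that bottleneck concrete, here is a toy NumPy sketch (our own illustration, not code from the session): a vanilla RNN encoder folds any input sequence, however long, into one fixed-size hidden state, and in a plain sequence-to-sequence model that single vector is all the decoder ever sees.

```python
import numpy as np

def rnn_encode(inputs, W_x, W_h):
    """Encode a variable-length sequence into one fixed-size vector.

    inputs: (T, d_in) sequence of word vectors; whatever T is,
    the output is always the final hidden state of shape (d_h,).
    """
    h = np.zeros(W_h.shape[0])
    for x in inputs:
        h = np.tanh(W_x @ x + W_h @ h)  # vanilla RNN update
    return h  # everything the decoder will ever see

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
W_x = rng.normal(size=(d_h, d_in))
W_h = rng.normal(size=(d_h, d_h))
short_seq = rng.normal(size=(3, d_in))   # a 3-word "sentence"
long_seq = rng.normal(size=(50, d_in))   # a 50-word "sentence"
print(rnn_encode(short_seq, W_x, W_h).shape)  # (8,)
print(rnn_encode(long_seq, W_x, W_h).shape)   # (8,) -- same capacity for 50 words
```

A 3-word sentence and a 50-word sentence end up squeezed into vectors of exactly the same size, which is why translation quality degrades on long inputs.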
Attention mechanisms allow the RNN to attend to any part of the input image, video, or sentence when generating the next word. This leads to better translations and interesting new ways to introspect our deep NLP models. In this session we'll dive into the seminal work of Bahdanau, Cho, and Bengio to get a better understanding of how and why these architectures work so well.
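As a sketch of the mechanism itself, the NumPy snippet below implements the paper's additive alignment model, e_j = v^T tanh(W_q s + W_k h_j), followed by a softmax over source positions. The names W_q, W_k, and v and the dimensions are made-up stand-ins for the learned parameters (the paper calls them W_a, U_a, and v_a); it is a minimal illustration, not the TensorFlow implementation linked below.

```python
import numpy as np

def additive_attention(query, keys, W_q, W_k, v):
    """Bahdanau-style additive attention (minimal sketch).

    query: decoder hidden state, shape (d_dec,)
    keys:  encoder hidden states, shape (T, d_enc)
    W_q: (d_att, d_dec), W_k: (d_att, d_enc), v: (d_att,)
    Returns a context vector and the attention weights.
    """
    # Alignment scores: e_t = v^T tanh(W_q q + W_k h_t) for each source position t
    scores = np.tanh(keys @ W_k.T + W_q @ query) @ v  # shape (T,)
    # Softmax over source positions gives the attention weights
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context vector: weighted sum of encoder states
    context = weights @ keys                          # shape (d_enc,)
    return context, weights

rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 5, 8, 6, 4   # 5 source positions, toy dimensions
keys = rng.normal(size=(T, d_enc))
query = rng.normal(size=d_dec)
W_q = rng.normal(size=(d_att, d_dec))
W_k = rng.normal(size=(d_att, d_enc))
v = rng.normal(size=d_att)

context, weights = additive_attention(query, keys, W_q, W_k, v)
print(weights)        # sums to 1; one weight per source word
print(context.shape)  # (8,)
```

Because the decoder recomputes these weights at every output step, it can "look back" at different source words for each word it generates; plotting the weights over source and target positions is exactly the alignment visualisation shown in the paper.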
Blog: Attention and Memory in Deep Learning and NLP - Wild ML
Paper: Neural Machine Translation by Jointly Learning to Align and Translate, D Bahdanau, K Cho, Y Bengio - ICLR 2015
Code: A TensorFlow implementation of a sequence-to-sequence model with an attention mechanism is described here.