Attention Mechanisms in Deep Learning

26th June 2017 in London at CodeNode

This SkillsCast was filmed at Attention Mechanisms in Deep Learning

Skillscast coming soon.

In deep NLP, recurrent neural networks (RNNs) are used to generate a sequence of words from an image, video, or another sentence. However, the information in the input must be compressed into a fixed-length, low-dimensional vector, which inevitably loses information. This is particularly problematic when generating long sequences of words. Even LSTMs have finite memory!

Attention mechanisms allow the RNN to attend to any part of the input image/video/sentence when generating the next word. This leads to better translations and interesting new ways to introspect our deep NLP models. In this session we'll dive into the seminal work of Bahdanau, Cho, and Bengio to better understand how and why these architectures work so well.
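The core of the Bahdanau et al. approach is an additive scoring function: the decoder compares its previous state against every encoder hidden state, turns the scores into a probability distribution with a softmax, and takes a weighted sum of the encoder states as the context for the next word. A minimal NumPy sketch of that computation (variable names and dimensions are illustrative; in a real model the matrices W_a, U_a and vector v_a are learned jointly with the RNN):

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def additive_attention(h_enc, s_prev, W_a, U_a, v_a):
    """Bahdanau-style additive attention.

    h_enc:  (T, d_h) encoder hidden states, one per input position
    s_prev: (d_s,)   previous decoder hidden state
    Returns the attention weights (T,) and the context vector (d_h,).
    """
    # alignment scores: e_t = v_a^T tanh(W_a s_prev + U_a h_t)
    scores = np.tanh(h_enc @ U_a.T + s_prev @ W_a.T) @ v_a  # shape (T,)
    alpha = softmax(scores)   # attention weights, sum to 1 over input positions
    context = alpha @ h_enc   # weighted sum of encoder states, shape (d_h,)
    return alpha, context

# toy example with random parameters
rng = np.random.default_rng(0)
T, d_h, d_s, d_a = 5, 8, 6, 4
h_enc = rng.standard_normal((T, d_h))
s_prev = rng.standard_normal(d_s)
W_a = rng.standard_normal((d_a, d_s))
U_a = rng.standard_normal((d_a, d_h))
v_a = rng.standard_normal(d_a)

alpha, context = additive_attention(h_enc, s_prev, W_a, U_a, v_a)
```

Because the weights alpha form a distribution over input positions, plotting them for each generated word gives the alignment visualisations that make these models easy to introspect.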

Blog: Attention and Memory in Deep Learning and NLP - WildML

Paper: Neural Machine Translation by Jointly Learning to Align and Translate, D Bahdanau, K Cho, Y Bengio - ICLR 2015

Code: a TensorFlow implementation of a sequence-to-sequence model with an attention mechanism is described here.

Background Material

Oxford CS: Deep Learning for Natural Language Processing 2016-2017 Lecture 8: slides and recording.


