## Week 6 - Recurrent Networks

In which we discuss the construction of networks that can process variable-length sequences by maintaining a memory of past inputs, and demonstrate how they can be applied to spoken and written utterances.

### Learning Objectives

By the end of the session the student will be able to:

- describe the need for systems with memory when processing sequences
- describe the structure and operation of simple recurrent networks
- implement a recurrent network for a simple problem in classifying sequences
- describe the limitations of simple recurrent layers and the difficulties in training them
- describe some more advanced types of recurrent nodes
- use Keras to implement and train recurrent networks to solve sequence classification problems in both speech and text

### Outline

- Text and speech as sequences
- Recurrent networks
- Advanced RNN nodes
  - S. Hochreiter and J. Schmidhuber, "Long short-term memory", Neural Computation, 1997.
  - J. Chung, C. Gulcehre, K. Cho and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modelling", arXiv, 2014.
- Contextualised word embeddings
  - M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee and L. Zettlemoyer, "Deep contextualized word representations", arXiv, 2018.

We contrast the typical way in which people process images with the way they process language. Interpreting language involves building a mental representation over time, which implies that we must treat an utterance as a sequence of events processed in order. This in turn means we need some memory of earlier elements of the sequence as we process new ones. A simple way to incorporate memory into sequence processing is to give the system, at each time step, access to both past inputs and past outputs. This is analogous to recursive filter design.

We imagine a neural network unfolded over time, taking one input vector at a time from a sequence of vectors. At each time step, each node in the network has access to both the current input and the outputs of nodes at the previous time step. When the network reaches the last input in the sequence, the final output can be taken as the processing result for the whole sequence (i.e. an input sequence maps to a single output label). Alternatively, the network can produce one output per time step (i.e. an input sequence maps to an output sequence of the same length).
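The unfolding described above can be sketched in plain NumPy: a simple (Elman-style) recurrent layer is just a loop in which each step combines the current input with the previous hidden state. The layer sizes and random weights below are illustrative assumptions, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 8          # input and hidden sizes (assumed for illustration)
W_x = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input-to-hidden weights
W_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden-to-hidden weights: the "memory"
b = np.zeros(n_hidden)

def run_rnn(sequence):
    """Process a sequence of input vectors; return the hidden state at every step."""
    h = np.zeros(n_hidden)     # the hidden state starts empty
    outputs = []
    for x_t in sequence:       # one time step per input vector
        # Each step sees the current input AND the previous step's output.
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        outputs.append(h)
    return np.array(outputs)

seq = rng.normal(size=(5, n_in))   # a length-5 input sequence
outs = run_rnn(seq)                # sequence-to-sequence: one vector per step
final = outs[-1]                   # sequence-to-one: last state summarises the sequence
```

Keeping all of `outs` corresponds to the sequence-to-sequence case; keeping only `final` corresponds to mapping the whole sequence to one label.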

The simple recurrent network model has a number of limitations: it can be difficult to train and tends to have rather limited memory for past inputs. It has been suggested that this is because the estimated gradients of the error with respect to the weights shrink very quickly as the distance along the sequence grows - the problem of the "vanishing gradient". We introduce two more recurrent node types that attempt to address this problem: the long short-term memory node (LSTM) and the gated recurrent unit (GRU). The LSTM allows richer context information, rather than just the node outputs, to pass between nodes across time, and it has specific operations for adding information to and removing information from this context. The GRU is a simpler variant of the LSTM which can give similar performance in some cases at lower computational cost. We also note that RNNs can be employed bi-directionally to give simultaneous left-to-right and right-to-left analysis.
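In Keras, which the learning objectives ask you to use, these advanced nodes are drop-in layer choices, and the `Bidirectional` wrapper provides the simultaneous left-to-right and right-to-left analysis. The sketch below is a minimal sequence-to-one classifier; the vocabulary size, sequence length, and layer widths are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(20,)),                         # sequences of 20 token ids (assumed length)
    layers.Embedding(input_dim=1000, output_dim=16),  # learned word embeddings
    # Bidirectional runs one LSTM left-to-right and one right-to-left,
    # concatenating their outputs. Swapping layers.LSTM for layers.GRU
    # gives the cheaper gated recurrent unit instead.
    layers.Bidirectional(layers.LSTM(32)),            # sequence-to-one: final states only
    layers.Dense(1, activation="sigmoid"),            # binary label for the whole sequence
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Random data, just to show the fit/predict workflow.
x = np.random.randint(0, 1000, size=(64, 20))
y = np.random.randint(0, 2, size=(64, 1))
model.fit(x, y, epochs=1, verbose=0)
preds = model.predict(x, verbose=0)
```

One prediction per sequence comes out; asking the recurrent layer for `return_sequences=True` instead would give the sequence-to-sequence variant.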

RNNs can be used to generate word embeddings that are sensitive to the context in which words occur. The ELMo approach to deep contextualised embeddings (Peters et al., 2018) starts from context-free embeddings and then finds an embedding for each word that is useful for predicting the whole training sequence in order.
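The core idea can be sketched in Keras (this is a simplified illustration, not the full ELMo architecture): context-free embeddings are passed through a bidirectional LSTM, so the output vector for each token depends on the words around it. All sizes here are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(12,))             # 12 token ids per sentence (assumed length)
static = layers.Embedding(500, 16)(inputs)    # context-free embeddings, one per word type
# return_sequences=True keeps one output vector per token, so the result is
# a contextual embedding for every position rather than a sequence summary.
contextual = layers.Bidirectional(layers.LSTM(24, return_sequences=True))(static)
embedder = keras.Model(inputs, contextual)

tokens = np.random.randint(0, 500, size=(2, 12))
vecs = embedder.predict(tokens, verbose=0)    # one 48-dim contextual vector per token
```

Unlike the static embedding table, the same token id now receives different vectors in different sentences, because its representation is computed from the whole sequence.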

### Research Paper of the Week

- S. Karlekar, T. Niu, M. Bansal, Detecting Linguistic Characteristics of Alzheimer’s Dementia by Interpreting Neural Models, Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018.

### Web Resources

- LinkedIn Learning video course:
**Building Recommender Systems with Machine Learning and AI**, in particular: **Intro to Recurrent Neural Networks**. Accessible to all UCL staff and students through this sign-on.

### Readings

Be sure to read one or more of these discussions of recurrent neural networks:

- Understanding RNNs and LSTMs
- Introduction to recurrent neural networks (with calculations)
- Introduction to recurrent neural networks

### Exercises

Implement answers to the problems described in the notebooks below. Save your completed notebooks into your personal Google Drive account.

Last modified: 22:45 11-Mar-2022.