## Week 6 - Recurrent Networks

In which we discuss the construction of networks that can process variable-length sequences by maintaining a memory of past inputs, and demonstrate how they can be applied to spoken and written utterances.

### Learning Objectives

By the end of the session the student will be able to:

- describe the need for systems with memory when processing sequences
- describe the structure and operation of simple recurrent networks
- implement a recurrent network for a simple problem in classifying sequences
- describe the limitations of simple recurrent layers and the difficulties in training them
- describe some more advanced types of recurrent nodes
- use Keras to implement and train recurrent networks to solve sequence classification problems in both speech and text

### Outline

- Text and speech as sequences
- Recurrent networks
- Advanced RNN nodes
  - S. Hochreiter and J. Schmidhuber, "Long short-term memory", Neural Computation, 1997.
  - J. Chung, C. Gulcehre, K. Cho and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modelling", arXiv, 2014.
- Contextualised word embeddings
  - M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee and L. Zettlemoyer, "Deep contextualized word representations", arXiv, 2018.

We contrast the typical way in which people process images with the way they process language. Interpreting language involves building a mental representation over time, which implies that we must treat an utterance as a sequence of events processed in order. This in turn means we need some memory of earlier elements of the sequence as we process new ones. A simple way to incorporate memory into sequence processing is to give the system, at each time step, access to both past inputs and past outputs. This is analogous to recursive filter design.

We imagine a neural network unfolded over time, taking one input vector at a time from a sequence of vectors. At each time step, each node in the network has access to both the current input and the outputs of nodes at the previous time step. When the network reaches the last input in the sequence, the final output can be taken as the processing result for the whole sequence (i.e. an input sequence maps to a single output label). Alternatively, the network can produce one output per time step (i.e. an input sequence maps to an output sequence of the same length).
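The unfolding described above can be sketched in plain NumPy: a simple (Elman-style) recurrent layer is just a loop in which each step combines the current input with the previous hidden state. The layer sizes and random weights below are illustrative assumptions, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 8          # input and hidden sizes (assumed for illustration)
W_x = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input-to-hidden weights
W_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden-to-hidden weights: the "memory"
b = np.zeros(n_hidden)

def run_rnn(sequence):
    """Process a sequence of input vectors; return the hidden state at every step."""
    h = np.zeros(n_hidden)     # the hidden state starts empty
    outputs = []
    for x_t in sequence:       # one time step per input vector
        # Each step sees the current input AND the previous step's output.
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        outputs.append(h)
    return np.array(outputs)

seq = rng.normal(size=(5, n_in))   # a length-5 input sequence
outs = run_rnn(seq)                # sequence-to-sequence: one vector per step
final = outs[-1]                   # sequence-to-one: last state summarises the sequence
```

Keeping all of `outs` corresponds to the sequence-to-sequence case; keeping only `final` corresponds to mapping the whole sequence to one label.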

The simple recurrent network model has a number of limitations: it can be difficult to train and tends to have rather limited memory for past inputs. It has been suggested that this is because the estimated gradients of the error with respect to the weights shrink very quickly as the distance along the sequence grows - the problem of the "vanishing gradient". We introduce two more recurrent node types that attempt to address this problem: the long short-term memory node (LSTM) and the gated recurrent unit (GRU). The LSTM allows richer context information, rather than just the node outputs, to pass between nodes across time, and it has specific operations for adding information to and removing information from this context. The GRU is a simpler variant of the LSTM which can give similar performance in some cases at lower computational cost. We also note that RNNs can be employed bi-directionally to give simultaneous left-to-right and right-to-left analysis.
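In Keras, which the learning objectives ask you to use, these advanced nodes are drop-in layer choices, and the `Bidirectional` wrapper provides the simultaneous left-to-right and right-to-left analysis. The sketch below is a minimal sequence-to-one classifier; the vocabulary size, sequence length, and layer widths are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(20,)),                         # sequences of 20 token ids (assumed length)
    layers.Embedding(input_dim=1000, output_dim=16),  # learned word embeddings
    # Bidirectional runs one LSTM left-to-right and one right-to-left,
    # concatenating their outputs. Swapping layers.LSTM for layers.GRU
    # gives the cheaper gated recurrent unit instead.
    layers.Bidirectional(layers.LSTM(32)),            # sequence-to-one: final states only
    layers.Dense(1, activation="sigmoid"),            # binary label for the whole sequence
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Random data, just to show the fit/predict workflow.
x = np.random.randint(0, 1000, size=(64, 20))
y = np.random.randint(0, 2, size=(64, 1))
model.fit(x, y, epochs=1, verbose=0)
preds = model.predict(x, verbose=0)
```

One prediction per sequence comes out; asking the recurrent layer for `return_sequences=True` instead would give the sequence-to-sequence variant.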

RNNs can be used to generate word embeddings that are sensitive to the context in which words occur. The ELMo approach to deep contextualised embeddings (Peters et al., 2018) starts from context-free embeddings and then finds an embedding for each word that is useful for predicting the whole training sequence in order.
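The core idea can be sketched in Keras (this is a simplified illustration, not the full ELMo architecture): context-free embeddings are passed through a bidirectional LSTM, so the output vector for each token depends on the words around it. All sizes here are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(12,))             # 12 token ids per sentence (assumed length)
static = layers.Embedding(500, 16)(inputs)    # context-free embeddings, one per word type
# return_sequences=True keeps one output vector per token, so the result is
# a contextual embedding for every position rather than a sequence summary.
contextual = layers.Bidirectional(layers.LSTM(24, return_sequences=True))(static)
embedder = keras.Model(inputs, contextual)

tokens = np.random.randint(0, 500, size=(2, 12))
vecs = embedder.predict(tokens, verbose=0)    # one 48-dim contextual vector per token
```

Unlike the static embedding table, the same token id now receives different vectors in different sentences, because its representation is computed from the whole sequence.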

### Research Paper of the Week

- S. Karlekar, T. Niu, M. Bansal, Detecting Linguistic Characteristics of Alzheimer’s Dementia by Interpreting Neural Models, Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018.

### Web Resources

- LinkedIn Learning video course:
**Building Recommender Systems with Machine Learning and AI**, in particular: **Intro to Recurrent Neural Networks**. Accessible to all UCL staff and students through this sign-on.

### Readings

Be sure to read one or more of these discussions of recurrent neural networks:

- Understanding RNNs and LSTMs
- Introduction to recurrent neural networks (with calculations)
- Introduction to recurrent neural networks

### Exercises

Implement answers to the problems described in the notebooks below. Save your completed notebooks into your personal Google Drive account.

Last modified: 22:45 11-Mar-2022.