Week 7 - Language Modelling
In which we build models that predict the continuation of a sentence, and which can be used to generate text and to improve the accuracy of speech recognition and machine translation.
Learning Objectives
By the end of the session the student will be able to:
- explain the purpose of a language model
- describe how language models can be built from corpora using n-gram statistics
- describe some of the problems in using n-grams for language modelling
- explain how the performance of language models can be computed using perplexity
- describe how language models can be built using recurrent neural networks
- list some of the advantages of neural language models over n-gram models
- describe some applications of language models
- use Keras to implement, train and evaluate a language model
Outline
- Probability of a sentence
We discuss what it means to ascribe a probability to a sentence, and how that relates to conventional linguistic views of meaningfulness and grammaticality. The chain-rule factorisation behind this view is sketched after the links below.
- Colorless green ideas sleep furiously - Wikipedia
- Rethinking language: How probabilities shape the words we use by Thomas Griffiths.
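A language model assigns a probability to an entire sentence. The standard starting point, worth keeping in mind throughout the session, is the chain rule, which factorises that probability into a product of next-word predictions:

```latex
P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
```

Every model in this session, n-gram or neural, is an estimator of the conditional terms on the right-hand side.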
- Statistical language models
We introduce a probabilistic model of text based on n-grams, describing how n-gram counts are used to estimate probabilities. We discuss problems with the n-gram approach, and one method for assessing the quality of a language model: perplexity. A worked bigram example follows the links below.
- Probability for linguists. An introduction to probability theory for linguists.
- R. Sanketh, Language modelling with NLTK. A tutorial introduction to building an n-gram model using NLTK.
- NLTK language modelling documentation.
- Perplexity - from Wikipedia.
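To make the n-gram estimates and the perplexity computation concrete, here is a minimal sketch in plain Python. The toy corpus and the choice of bigrams are illustrative assumptions, not the session's data:

```python
import math
from collections import Counter

# Toy corpus of pre-tokenised sentences with start/end markers (illustrative).
corpus = [["<s>", "a", "cat", "sat", "</s>"],
          ["<s>", "the", "cat", "ran", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]

# Maximum-likelihood bigram estimates: P(w2 | w1) = count(w1 w2) / count(w1).
unigram_counts = Counter(w for sent in corpus for w in sent)
bigram_counts = Counter((sent[i], sent[i + 1])
                        for sent in corpus for i in range(len(sent) - 1))

def bigram_prob(w1, w2):
    # Unseen bigrams get probability zero -- the sparsity problem that
    # motivates smoothing methods such as add-one or Kneser-Ney.
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

def perplexity(sentence):
    # Perplexity is the exponentiated average negative log-probability
    # per predicted word: exp(-(1/N) * sum_i log P(w_i | w_{i-1})).
    log_prob = sum(math.log(bigram_prob(sentence[i], sentence[i + 1]))
                   for i in range(len(sentence) - 1))
    return math.exp(-log_prob / (len(sentence) - 1))

# A sentence not seen in training, but built entirely from seen bigrams.
print(perplexity(["<s>", "the", "cat", "sat", "</s>"]))  # ~1.57
```

Note that calling `perplexity` on a sentence containing an unseen bigram raises an error on `log(0)`, which is exactly the problem smoothing is designed to fix.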
- Neural language models
We look at how recurrent networks can be used to build neural language models. We demonstrate the training of a neural language model and compare its performance with that of an n-gram model trained on the same data. A minimal Keras sketch follows the reference below.
- T. Mikolov, G. Zweig, Context dependent recurrent neural network language model, 2012 IEEE Spoken Language Technology Workshop.
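As a pointer towards the Keras exercise, here is a minimal sketch of a word-level recurrent language model. The vocabulary size, sequence length and layer sizes are illustrative assumptions, and the random arrays stand in for real integer-encoded text:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size = 5000   # illustrative vocabulary size
seq_len = 20        # context length: predict word 21 from the previous 20
embed_dim = 64
hidden_dim = 128

# Embedding -> LSTM -> softmax over the vocabulary: given a sequence of
# word ids, predict a distribution over the next word.
model = keras.Sequential([
    layers.Embedding(vocab_size, embed_dim),
    layers.LSTM(hidden_dim),
    layers.Dense(vocab_size, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Placeholder data: in the exercise these would come from a real corpus.
X = np.random.randint(0, vocab_size, size=(1000, seq_len))  # contexts
y = np.random.randint(0, vocab_size, size=(1000,))          # next words
model.fit(X, y, epochs=1, batch_size=32)

# The cross-entropy loss relates directly to the evaluation metric from
# the previous topic: perplexity = exp(loss).
```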
- Applications of language models
We look at how language models can be used to generate text, and how machine translation and speech recognition systems use them to improve the quality of their output. A sketch of sampling-based generation follows the links below.
- A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language models are unsupervised multitask learners, 2019. The paper describes the GPT-2 language model.
- Talk to Transformer. See how a state-of-the-art language model can generate text given a starting prompt.
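To connect text generation back to the model above, here is a hedged sketch of sampling a continuation from a trained next-word model. It assumes `model` is the Keras model from the previous sketch; the temperature parameter is an illustrative extra, not part of the session material:

```python
import numpy as np

def generate(model, seed_ids, length, context_len=20, temperature=1.0):
    """Sample a continuation from a trained next-word model (sketch)."""
    ids = list(seed_ids)
    for _ in range(length):
        context = np.array([ids[-context_len:]])  # most recent words as context
        # Predicted distribution over the next word.
        probs = model.predict(context, verbose=0)[0].astype("float64")
        # Temperature < 1 makes sampling greedier; > 1 makes it more diverse.
        logits = np.log(probs + 1e-9) / temperature
        probs = np.exp(logits) / np.exp(logits).sum()
        ids.append(int(np.random.choice(len(probs), p=probs)))
    return ids
```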
Research Paper of the Week
- D. Kauchak, Improving Text Simplification Language Modeling Using Unsimplified Text Data, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013.
Readings
Be sure to read one or more of these discussions of language models:
- N-gram Language Models, from Speech and Language Processing (3rd edition), Jurafsky & Martin, 2019.
- Sequence Processing with Recurrent Networks, from Speech and Language Processing (3rd edition), Jurafsky & Martin, 2019.
Exercises
Implement answers to the problems described in the notebooks below. Save your completed notebooks into your personal Google Drive account.
Last modified: 22:45 11-Mar-2022.