Week 7 - Language Modelling
In which we build models that predict the continuation of a sentence, and which can be used to generate text and to improve the accuracy of speech recognition and machine translation.
Learning Objectives
By the end of the session the student will be able to:
- explain the purpose of a language model
- describe how language models can be built from corpora using n-gram statistics
- describe some of the problems in using n-grams for language modelling
- explain how the performance of language models can be computed using perplexity
- describe how language models can be built using recurrent neural networks
- list some of the advantages of neural language models over n-gram models
- describe some applications of language models
- use Keras to implement, train and evaluate a language model
Outline
- Probability of a sentence
We discuss what it means to ascribe a probability to a sentence, and how that relates to conventional linguistic views of meaningfulness and grammaticality. The chain-rule factorisation behind this view is sketched after the links below.
- Colorless green ideas sleep furiously - Wikipedia
- Rethinking language: How probabilities shape the words we use by Thomas Griffiths.
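A language model assigns a probability to an entire sentence. The standard starting point, worth keeping in mind throughout the session, is the chain rule, which factorises that probability into a product of next-word predictions:

```latex
P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
```

Every model in this session, n-gram or neural, is an estimator of the conditional terms on the right-hand side.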
- Statistical language models
We introduce a probabilistic model of text based on n-grams, describing how n-gram counts are used to estimate probabilities. We discuss problems with the n-gram approach, and one method for assessing the quality of a language model: perplexity. A worked bigram example follows the links below.
- Probability for linguists. An introduction to probability theory for linguists.
- R. Sanketh, Language modelling with NLTK. A tutorial introduction to building an n-gram model using NLTK.
- NLTK language modelling documentation.
- Perplexity - from Wikipedia.
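To make the n-gram estimates and the perplexity computation concrete, here is a minimal sketch in plain Python. The toy corpus and the choice of bigrams are illustrative assumptions, not the session's data:

```python
import math
from collections import Counter

# Toy corpus of pre-tokenised sentences with start/end markers (illustrative).
corpus = [["<s>", "a", "cat", "sat", "</s>"],
          ["<s>", "the", "cat", "ran", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]

# Maximum-likelihood bigram estimates: P(w2 | w1) = count(w1 w2) / count(w1).
unigram_counts = Counter(w for sent in corpus for w in sent)
bigram_counts = Counter((sent[i], sent[i + 1])
                        for sent in corpus for i in range(len(sent) - 1))

def bigram_prob(w1, w2):
    # Unseen bigrams get probability zero -- the sparsity problem that
    # motivates smoothing methods such as add-one or Kneser-Ney.
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

def perplexity(sentence):
    # Perplexity is the exponentiated average negative log-probability
    # per predicted word: exp(-(1/N) * sum_i log P(w_i | w_{i-1})).
    log_prob = sum(math.log(bigram_prob(sentence[i], sentence[i + 1]))
                   for i in range(len(sentence) - 1))
    return math.exp(-log_prob / (len(sentence) - 1))

# A sentence not seen in training, but built entirely from seen bigrams.
print(perplexity(["<s>", "the", "cat", "sat", "</s>"]))  # ~1.57
```

Note that calling `perplexity` on a sentence containing an unseen bigram raises an error on `log(0)`, which is exactly the problem smoothing is designed to fix.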
- Neural language models
We look at how recurrent networks can be used to build neural language models. We demonstrate the training of a neural language model and compare its performance with that of an n-gram model trained on the same data. A minimal Keras sketch follows the reference below.
- T. Mikolov, G. Zweig, Context dependent recurrent neural network language model, 2012 IEEE Spoken Language Technology Workshop.
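As a pointer towards the Keras exercise, here is a minimal sketch of a word-level recurrent language model. The vocabulary size, sequence length and layer sizes are illustrative assumptions, and the random arrays stand in for real integer-encoded text:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size = 5000   # illustrative vocabulary size
seq_len = 20        # context length: predict word 21 from the previous 20
embed_dim = 64
hidden_dim = 128

# Embedding -> LSTM -> softmax over the vocabulary: given a sequence of
# word ids, predict a distribution over the next word.
model = keras.Sequential([
    layers.Embedding(vocab_size, embed_dim),
    layers.LSTM(hidden_dim),
    layers.Dense(vocab_size, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Placeholder data: in the exercise these would come from a real corpus.
X = np.random.randint(0, vocab_size, size=(1000, seq_len))  # contexts
y = np.random.randint(0, vocab_size, size=(1000,))          # next words
model.fit(X, y, epochs=1, batch_size=32)

# The cross-entropy loss relates directly to the evaluation metric from
# the previous topic: perplexity = exp(loss).
```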
- Applications of language models
We look at how language models can be used to generate text, and how machine translation and speech recognition systems use them to improve the quality of their output. A sketch of sampling-based generation follows the links below.
- A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language models are unsupervised multitask learners, 2019. The paper describes the GPT-2 language model.
- Talk to Transformer. See how a state-of-the-art language model can generate text given a starting prompt.
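To connect text generation back to the model above, here is a hedged sketch of sampling a continuation from a trained next-word model. It assumes `model` is the Keras model from the previous sketch; the temperature parameter is an illustrative extra, not part of the session material:

```python
import numpy as np

def generate(model, seed_ids, length, context_len=20, temperature=1.0):
    """Sample a continuation from a trained next-word model (sketch)."""
    ids = list(seed_ids)
    for _ in range(length):
        context = np.array([ids[-context_len:]])  # most recent words as context
        # Predicted distribution over the next word.
        probs = model.predict(context, verbose=0)[0].astype("float64")
        # Temperature < 1 makes sampling greedier; > 1 makes it more diverse.
        logits = np.log(probs + 1e-9) / temperature
        probs = np.exp(logits) / np.exp(logits).sum()
        ids.append(int(np.random.choice(len(probs), p=probs)))
    return ids
```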
Research Paper of the Week
- D. Kauchak, Improving Text Simplification Language Modeling Using Unsimplified Text Data, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013.
Readings
Be sure to read one or more of these discussions of language models:
- N-gram Language Models, from Speech and Language Processing (3rd edition), Jurafsky & Martin, 2019.
- Sequence Processing with Recurrent Networks, from Speech and Language Processing (3rd edition), Jurafsky & Martin, 2019.
Exercises
Implement answers to the problems described in the notebooks below. Save your completed notebooks into your personal Google Drive account.
Last modified: 22:45 11-Mar-2022.