PALS0039 Introduction to Deep Learning for Speech and Language Processing
UCL Division of Psychology and Language Sciences

Week 5 - Lexical Semantics and Word Embeddings

In which we investigate vector representations of word semantics, methods for computing word similarity and the Word2Vec method for unsupervised learning of word embeddings.

Learning Objectives

By the end of the session the student will be able to:


  1. Preamble: What is "Meaning"?
     We reflect on what it means to say that words have meaning. We contrast (i) dictionary meanings, in which words are defined in terms of other words, (ii) grounded meanings, in which words are defined with respect to the world, and (iii) understanding, in which a conscious being knows what a word means.

  2. Semantic Relationships
     We discuss how the meanings of words might be represented in a computer system. Using the analogy of a thesaurus, we can represent different kinds of word relations: synonym and antonym, hyponym and hypernym, meronym and holonym. A network of such relations can then be traversed from word to word, and properties of the route, for example the number of links between two words, provide numerical estimates of word similarity.

  3. Vector Semantics
     We introduce the idea that the meanings of words can be expressed as numerical vectors such that the distance between vectors captures word similarity. These vectors can be estimated using the distributional hypothesis: words of related meaning tend to occur in similar contexts. We show how vectors can be estimated from co-occurrence statistics, and how they can be compared using Euclidean and cosine distance.

  4. Word Embeddings with Word2Vec
     We discuss how the problem of estimating semantic vectors can be formulated as a problem in deep learning. We describe the Word2Vec approach of Mikolov et al. (2013), and show some examples of word clustering and of vector arithmetic applied to word meaning relationships.

  5. Other Approaches to Word Embeddings
     We briefly mention other approaches to the calculation of word embeddings. The GloVe approach of Pennington et al. (2014) frames the problem as one of factorising large matrices of co-occurrence counts. The FastText approach of Joulin et al. (2017) studies embeddings of fixed-length character chunks (character n-grams) rather than whole words. More recent approaches to contextualised embeddings (such as BERT) will be discussed in a later lecture.

  6. Using Pre-trained Word Embeddings
     We show how pre-trained word embeddings can be used within Colab to calculate word similarity, and how they can act as the first-layer input to a DNN classifier in Keras.
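
The idea under Semantic Relationships of traversing a network of relations to obtain a numerical similarity can be sketched as follows. This is a minimal illustration only: the is-a links are a tiny hand-made graph (not taken from WordNet), and the score 1 / (1 + shortest path length) is assumed here as one common way of turning a path length into a similarity.

```python
from collections import deque

# Toy is-a (hyponym -> hypernym) links; hypothetical, for illustration only.
HYPERNYMS = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
}

def build_graph(hypernyms):
    """Turn the directed is-a links into an undirected adjacency map."""
    graph = {}
    for child, parent in hypernyms.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph

def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the number of links, or None if unconnected."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

def path_similarity(graph, word_a, word_b):
    """Similarity in (0, 1]: identical words score 1, distant words score less."""
    dist = shortest_path_length(graph, word_a, word_b)
    return None if dist is None else 1.0 / (1.0 + dist)

GRAPH = build_graph(HYPERNYMS)
```

In this toy graph "dog" and "wolf" are two links apart while "dog" and "cat" are four links apart, so the first pair receives the higher score.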
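
The estimation of vectors from co-occurrence statistics described under Vector Semantics, and their comparison by Euclidean and cosine distance, might look like the sketch below. The four-sentence corpus and the window size of 2 are assumptions made for the example; vectors estimated from so little text are not meaningful in themselves.

```python
import math
from collections import Counter, defaultdict

# Tiny invented corpus; real estimates need far more text.
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "the car needs fuel",
]

def cooccurrence_vectors(sentences, window=2):
    """Count, for each word, the words appearing within `window` positions."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    vocab = sorted(counts)
    # One vector per word, with one dimension per vocabulary word.
    return vocab, {w: [counts[w][c] for c in vocab] for w in vocab}

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - (dot / norm if norm else 0.0)

VOCAB, VECTORS = cooccurrence_vectors(CORPUS)
```

With this corpus the cosine distance between "cat" and "dog" comes out smaller than that between "cat" and "car", because "cat" and "dog" share more context words. Cosine distance ignores vector length, which makes it less sensitive than Euclidean distance to how often a word occurs.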
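
The vector arithmetic mentioned under Word Embeddings with Word2Vec can be illustrated with a small sketch. The three-dimensional vectors below are hand-made for the example and are not the output of any trained model; the function name `analogy` is likewise hypothetical. The sketch only shows the mechanics of answering "a is to b as c is to ?" by searching for the word nearest to b - a + c.

```python
import math

# Hand-made 3-d vectors, invented for illustration; trained embeddings
# typically have hundreds of dimensions learned from a large corpus.
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def analogy(vectors, a, b, c):
    """Solve 'a is to b as c is to ?' by finding the word nearest b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words are excluded, as is usual in analogy tests.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(vectors[w], target))

ANSWER = analogy(TOY_VECTORS, "man", "king", "woman")
```

With these toy vectors the nearest remaining word to king - man + woman is "queen". Whether a real set of embeddings behaves this way depends on the training data and has to be checked empirically.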
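
The character chunks used by FastText, mentioned under Other Approaches to Word Embeddings, can be sketched as character n-grams. The sketch assumes that each word is wrapped in the boundary markers `<` and `>` and that n ranges from 3 to 6, which is how the published subword model is usually described; the function name `char_ngrams` is hypothetical.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of n characters across the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams

TRIGRAMS = char_ngrams("where", n_min=3, n_max=3)
```

A word's vector can then be built from the vectors of its n-grams, which allows a vector to be produced even for a word that never occurred in the training text.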
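
Using pre-trained embeddings as the first layer of a classifier usually starts by building an embedding matrix whose rows line up with the integer ids of the task vocabulary. The sketch below assumes the plain-text "word v1 v2 ..." layout used by GloVe downloads, and substitutes three made-up four-dimensional vectors for a real file; the helper names are hypothetical.

```python
import io

# Three made-up 4-d vectors standing in for a real pre-trained file,
# which would contain hundreds of thousands of lines.
FAKE_GLOVE = io.StringIO(
    "cat 0.1 0.2 0.3 0.4\n"
    "dog 0.2 0.1 0.4 0.3\n"
    "car 0.9 0.8 0.1 0.0\n"
)

def load_vectors(handle):
    """Parse 'word v1 v2 ...' lines into a dict of word -> list of floats."""
    vectors = {}
    for line in handle:
        parts = line.split()
        if parts:
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def build_embedding_matrix(task_vocab, vectors, dim):
    """Row 0 is reserved for padding; words without a vector stay all-zero."""
    word_index = {word: i + 1 for i, word in enumerate(task_vocab)}
    matrix = [[0.0] * dim for _ in range(len(task_vocab) + 1)]
    for word, row in word_index.items():
        if word in vectors:
            matrix[row] = vectors[word]
    return word_index, matrix

PRETRAINED = load_vectors(FAKE_GLOVE)
WORD_INDEX, MATRIX = build_embedding_matrix(["dog", "cat", "bus"], PRETRAINED, dim=4)
```

In Keras, a matrix like this would typically be supplied as the initial weights of an `Embedding` layer that is marked as not trainable, so that the classifier starts from the pre-trained vectors rather than from random values.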

Research Paper of the Week

Web Resources


Be sure to read one or more of these discussions of word embedding:


Implement answers to the problems described in the notebooks below. Save your completed notebooks into your personal Google Drive account.

    1. WordNet and SimLex-999
    2. Word2Vec
    3. GloVe

Last modified: 22:45 11-Mar-2022.