Week 5 - Lexical Semantics and Word Embeddings
In which we investigate vector representations of word meaning, methods for computing word similarity, and the Word2Vec method for unsupervised learning of word embeddings.
Learning Objectives
By the end of the session the student will be able to:
- explain the advantages of representing the meaning of words and sentences as feature vectors
- explain how vector representations of word meaning arise from the distributional hypothesis
- outline the Word2Vec method for unsupervised learning of word embeddings
- describe some applications of word embeddings
- use the Keras toolkit to train word embeddings and apply them to text processing tasks
Outline
- Preamble: What is "Meaning"?
We reflect on what we mean by the proposition that words have meaning. We contrast (i) Dictionary meanings, in which words are defined in terms of other words, (ii) Grounded meanings, in which words are defined with respect to the world, and (iii) Understanding, in which a conscious being knows what a word means.
- Semantic Relationships
We discuss how we might represent the meanings of words to a computer system. Using the analogy of a thesaurus, we can represent different kinds of word relations: synonym and antonym, hyponym and hypernym, meronym and holonym. Then, using a network of such relations, we can traverse links between words, which in turn provides numerical estimates of word similarity (illustrated in the WordNet sketch after this outline).
- WordNet, a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.
- F. Hill, R. Reichart, A. Korhonen, SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation, Computational Linguistics (2015). SimLex data on GitHub.
- Vector Semantics
We introduce the idea that the meanings of words can be expressed as numerical vectors such that the distance between vectors captures word similarity. These vectors can be estimated using the distributional hypothesis: that words with related meanings occur in similar contexts. We show how vectors can be estimated from co-occurrence statistics, and how they may be compared using Euclidean and cosine distance metrics (see the co-occurrence sketch after this outline).
- Distributional semantics. From Wikipedia.
- Word Embeddings with Word2Vec
We discuss how the problem of estimating semantic vectors can be formulated as a deep learning problem. We describe the Word2Vec approach of Mikolov et al. (2013), and show some examples of word clustering and the application of vector arithmetic to word meaning relationships (see the Gensim sketch after this outline).
- T. Mikolov, I. Sutskever, K. Chen, G. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems (2013).
- Other Approaches to Word Embeddings
We briefly mention other approaches to the calculation of word embeddings. The GloVe approach of Pennington et al. (2014) frames the problem as one of factorising large matrices of co-occurrence counts. The FastText approach of Joulin et al. (2017) builds embeddings from character n-grams rather than whole words. More recent approaches to contextualised embeddings (such as BERT) will be discussed in a later lecture.
- J. Pennington, R. Socher, C. Manning, GloVe: Global Vectors for Word Representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (2014).
- A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification, arXiv:1607.01759 (2017).
- Using pre-trained word embeddings
We show how pre-trained word embeddings can be used within Colab to calculate word similarity and to serve as the first layer of a DNN classifier in Keras (see the Keras sketch after this outline).
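As a companion to the semantic relationships topic above, here is a minimal sketch using NLTK's WordNet interface (an assumed setup; the word choices are purely illustrative) that inspects lexical relations and computes a path-based similarity score:

```python
# Minimal sketch: exploring WordNet relations and a path-based similarity score with NLTK.
# Assumes NLTK is installed; the WordNet corpus is downloaded on first run.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

dog = wn.synset("dog.n.01")
cat = wn.synset("cat.n.01")

print(dog.definition())          # dictionary-style gloss for the synset
print(dog.hypernyms())           # more general concepts (hypernyms)
print(dog.hyponyms()[:3])        # a few more specific concepts (hyponyms)
print(dog.part_meronyms())       # named parts, if any (meronyms)
print(dog.path_similarity(cat))  # similarity from the shortest path in the hypernym hierarchy
```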
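For the vector semantics topic, a minimal sketch with invented co-occurrence counts, comparing word vectors using Euclidean distance and cosine similarity in NumPy:

```python
# Minimal sketch: toy co-occurrence vectors (invented numbers) compared with Euclidean
# distance and cosine similarity. Rows are target words; columns are counts of
# co-occurrence with some unnamed context words.
import numpy as np

vectors = {
    "cat": np.array([10.0, 0.0, 7.0, 2.0]),
    "dog": np.array([8.0, 1.0, 6.0, 3.0]),
    "car": np.array([0.0, 9.0, 1.0, 8.0]),
}

def euclidean(u, v):
    return np.linalg.norm(u - v)

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(euclidean(vectors["cat"], vectors["dog"]))          # small distance: similar contexts
print(cosine_similarity(vectors["cat"], vectors["dog"]))  # close to 1: similar contexts
print(cosine_similarity(vectors["cat"], vectors["car"]))  # closer to 0: different contexts
```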
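For the Word2Vec topic, a minimal sketch using the Gensim library (an assumed toolkit, not necessarily the one used in the lecture) to train a skip-gram model on a tiny invented corpus and query its nearest neighbours:

```python
# Minimal sketch: a skip-gram Word2Vec model trained with Gensim on a tiny invented
# corpus. Real experiments require far more text for the vectors to be meaningful.
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "dog", "chased", "the", "cat"],
]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1: skip-gram
print(model.wv["king"][:5])                   # first few dimensions of the embedding
print(model.wv.most_similar("king", topn=3))  # nearest neighbours in the vector space

# Vector arithmetic over meanings; only meaningful with a large pre-trained model:
# model.wv.most_similar(positive=["king", "woman"], negative=["man"])
```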
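For the pre-trained embeddings topic, a minimal Keras sketch in which a (here random) embedding matrix stands in for pre-trained vectors and forms the frozen first layer of a small text classifier; the vocabulary size, embedding dimension, and sequence length are illustrative assumptions:

```python
# Minimal sketch: a pre-trained embedding matrix used as the frozen first layer of a
# Keras text classifier. The matrix is random here so the snippet runs; in practice its
# rows would be GloVe or Word2Vec vectors indexed by the tokenizer's word indices.
import numpy as np
from tensorflow.keras import layers, models

vocab_size, embed_dim, max_len = 10000, 100, 50             # illustrative sizes
embedding_matrix = np.random.rand(vocab_size, embed_dim)    # stand-in for loaded vectors

embedding = layers.Embedding(vocab_size, embed_dim, trainable=False)
embedding.build((None,))                    # create the weight so it can be overwritten
embedding.set_weights([embedding_matrix])   # copy in the pre-trained vectors

model = models.Sequential([
    layers.Input(shape=(max_len,)),         # integer word indices, padded to max_len
    embedding,
    layers.GlobalAveragePooling1D(),        # average the word vectors in each document
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary text classification head
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

Setting trainable=False keeps the pre-trained vectors fixed during training; setting it to True would instead fine-tune them on the classification task.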
Research Paper of the Week
- B. Eisner, T. Rocktäschel, I. Augenstein, M. Bošnjak, S. Riedel, emoji2vec: Learning Emoji Representations from their Description, Proceedings of the 4th International Workshop on Natural Language Processing for Social Media, EMNLP 2016.
Web Resources
- Word Vector Representations: word2vec. Stanford University video course: Natural Language Processing with Deep Learning.
Readings
Be sure to read one or more of these discussions of word embedding:
- Vector Semantics and Embeddings, from Speech and Language Processing (3rd edition), Jurafsky & Martin, 2019.
- Word2Vec tutorial: the skip-gram model
- Word2Vec in Keras tutorial
Exercises
Implement answers to the problems described in the notebooks below. Save your completed notebooks into your personal Google Drive account.
Last modified: 22:45 11-Mar-2022.