Week 5 - Lexical Semantics and Word Embeddings
In which we investigate vector representations of word meaning, methods for computing word similarity, and the Word2Vec method for unsupervised learning of word embeddings.
Learning Objectives
By the end of the session the student will be able to:
- explain the advantages of representing the meaning of words and sentences as feature vectors
- explain how vector representations of word meaning arise from the distributional hypothesis
- outline the Word2Vec method for unsupervised learning of word embeddings
- describe some applications of word embeddings
- use the Keras toolkit to train word embeddings and apply them to text processing tasks
Outline
- Preamble: What is "Meaning"?
We reflect on what we mean by the proposition that words have meaning. We contrast (i) Dictionary meanings, in which words are defined in terms of other words, (ii) Grounded meanings, in which words are defined with respect to the world, and (iii) Understanding, in which a conscious being knows what a word means.
- Semantic Relationships
We discuss how we might represent the meanings of words to a computer system. Using the analogy of a thesaurus, we can represent different kinds of word relations: synonym and antonym, hyponym and hypernym, meronym and holonym. Then, using a network of such relations, we can traverse links between words, which in turn provides numerical estimates of word similarity (illustrated in the WordNet sketch after this outline).
- WordNet, a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.
- F. Hill, R. Reichart, A. Korhonen, SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation, Computational Linguistics (2015). SimLex data on GitHub.
- Vector Semantics
We introduce the idea that the meanings of words can be expressed as numerical vectors such that the distance between vectors captures word similarity. These vectors can be estimated using the distributional hypothesis: that words with related meanings occur in similar contexts. We show how vectors can be estimated from co-occurrence statistics, and how they may be compared using Euclidean and cosine distance metrics (see the co-occurrence sketch after this outline).
- Distributional semantics. From Wikipedia.
- Word Embeddings with Word2Vec
We discuss how the problem of estimating semantic vectors can be formulated as a deep learning problem. We describe the Word2Vec approach of Mikolov et al. (2013), and show some examples of word clustering and the application of vector arithmetic to word meaning relationships (see the Gensim sketch after this outline).
- T. Mikolov, I. Sutskever, K. Chen, G. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems (2013).
- Other Approaches to Word Embeddings
We briefly mention other approaches to the calculation of word embeddings. The GloVe approach of Pennington et al. (2014) frames the problem as one of factorising large matrices of co-occurrence counts. The FastText approach of Joulin et al. (2017) builds embeddings from character n-grams rather than whole words. More recent approaches to contextualised embeddings (such as BERT) will be discussed in a later lecture.
- J. Pennington, R. Socher, C. Manning, GloVe: Global Vectors for Word Representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (2014).
- A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification, arXiv:1607.01759 (2017).
- Using pre-trained word embeddings
We show how pre-trained word embeddings can be used within Colab to calculate word similarity and to serve as the first layer of a DNN classifier in Keras (see the Keras sketch after this outline).
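As a companion to the semantic relationships topic above, here is a minimal sketch using NLTK's WordNet interface (an assumed setup; the word choices are purely illustrative) that inspects lexical relations and computes a path-based similarity score:

```python
# Minimal sketch: exploring WordNet relations and a path-based similarity score with NLTK.
# Assumes NLTK is installed; the WordNet corpus is downloaded on first run.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

dog = wn.synset("dog.n.01")
cat = wn.synset("cat.n.01")

print(dog.definition())          # dictionary-style gloss for the synset
print(dog.hypernyms())           # more general concepts (hypernyms)
print(dog.hyponyms()[:3])        # a few more specific concepts (hyponyms)
print(dog.part_meronyms())       # named parts, if any (meronyms)
print(dog.path_similarity(cat))  # similarity from the shortest path in the hypernym hierarchy
```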
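For the vector semantics topic, a minimal sketch with invented co-occurrence counts, comparing word vectors using Euclidean distance and cosine similarity in NumPy:

```python
# Minimal sketch: toy co-occurrence vectors (invented numbers) compared with Euclidean
# distance and cosine similarity. Rows are target words; columns are counts of
# co-occurrence with some unnamed context words.
import numpy as np

vectors = {
    "cat": np.array([10.0, 0.0, 7.0, 2.0]),
    "dog": np.array([8.0, 1.0, 6.0, 3.0]),
    "car": np.array([0.0, 9.0, 1.0, 8.0]),
}

def euclidean(u, v):
    return np.linalg.norm(u - v)

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(euclidean(vectors["cat"], vectors["dog"]))          # small distance: similar contexts
print(cosine_similarity(vectors["cat"], vectors["dog"]))  # close to 1: similar contexts
print(cosine_similarity(vectors["cat"], vectors["car"]))  # closer to 0: different contexts
```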
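For the Word2Vec topic, a minimal sketch using the Gensim library (an assumed toolkit, not necessarily the one used in the lecture) to train a skip-gram model on a tiny invented corpus and query its nearest neighbours:

```python
# Minimal sketch: a skip-gram Word2Vec model trained with Gensim on a tiny invented
# corpus. Real experiments require far more text for the vectors to be meaningful.
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "dog", "chased", "the", "cat"],
]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1: skip-gram
print(model.wv["king"][:5])                   # first few dimensions of the embedding
print(model.wv.most_similar("king", topn=3))  # nearest neighbours in the vector space

# Vector arithmetic over meanings; only meaningful with a large pre-trained model:
# model.wv.most_similar(positive=["king", "woman"], negative=["man"])
```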
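For the pre-trained embeddings topic, a minimal Keras sketch in which a (here random) embedding matrix stands in for pre-trained vectors and forms the frozen first layer of a small text classifier; the vocabulary size, embedding dimension, and sequence length are illustrative assumptions:

```python
# Minimal sketch: a pre-trained embedding matrix used as the frozen first layer of a
# Keras text classifier. The matrix is random here so the snippet runs; in practice its
# rows would be GloVe or Word2Vec vectors indexed by the tokenizer's word indices.
import numpy as np
from tensorflow.keras import layers, models

vocab_size, embed_dim, max_len = 10000, 100, 50             # illustrative sizes
embedding_matrix = np.random.rand(vocab_size, embed_dim)    # stand-in for loaded vectors

embedding = layers.Embedding(vocab_size, embed_dim, trainable=False)
embedding.build((None,))                    # create the weight so it can be overwritten
embedding.set_weights([embedding_matrix])   # copy in the pre-trained vectors

model = models.Sequential([
    layers.Input(shape=(max_len,)),         # integer word indices, padded to max_len
    embedding,
    layers.GlobalAveragePooling1D(),        # average the word vectors in each document
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary text classification head
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

Setting trainable=False keeps the pre-trained vectors fixed during training; setting it to True would instead fine-tune them on the classification task.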
Research Paper of the Week
- B. Eisner, T. Rocktäschel, I. Augenstein, M. Bošnjak, S. Riedel, emoji2vec: Learning Emoji Representations from their Description, Proceedings of the 4th International Workshop on Natural Language Processing for Social Media, EMNLP 2016.
Web Resources
- Word Vector Representations: word2vec. Stanford University video course: Natural Language Processing with Deep Learning.
Readings
Be sure to read one or more of these discussions of word embedding:
- Vector Semantics and Embeddings, from Speech and Language Processing (3rd edition), Jurafsky & Martin, 2019.
- Word2Vec tutorial: the skip-gram model
- Word2Vec in Keras tutorial
Exercises
Implement answers to the problems described in the notebooks below. Save your completed notebooks into your personal Google Drive account.
Last modified: 22:45 11-Mar-2022.