Week 4 - Preparation of Text and Speech for Machine Learning
In which we look at how to prepare text and speech materials to make them compatible with machine learning approaches to classification and regression.
Learning Objectives
By the end of the session the student will be able to:
- describe typical pre-processing steps used to generate numerical coding of text
- describe the process of tokenisation and the normalisation applied to word tokens
- explain how words can be described in terms of one-hot vectors
- explain how documents can be encoded as bags of words
- describe how a DNN can be applied to document classification
- describe typical pre-processing steps used to generate numerical coding of speech recordings
- explain how spoken utterances can be described as a series of short-time signal analyses
- explain how spoken utterances can be encoded as a fixed-length vector
- describe how a DNN can be applied to spoken utterance classification
- describe how a DNN can be applied to spoken utterance regression
- use the Keras toolkit to implement, train and test DNN models for text and speech problems
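The DNN and Keras objectives above can be previewed with a minimal sketch. It assumes TensorFlow's bundled Keras (`from tensorflow import keras`) and uses randomly generated bag-of-words vectors and labels purely as placeholders, so the accuracy it reports is meaningless. The vocabulary size, layer width and epoch count are arbitrary choices for illustration, not values taken from the course.

```python
import numpy as np
from tensorflow import keras

# Placeholder data (assumption): 200 "documents", each a bag-of-words count
# vector over a hypothetical 50-word dictionary, with 3 class labels.
rng = np.random.default_rng(0)
vocab_size, n_classes = 50, 3
X = rng.poisson(1.0, size=(200, vocab_size)).astype("float32")
y = rng.integers(0, n_classes, size=200)

# A small fully-connected DNN: bag-of-words vector in, class probabilities out.
model = keras.Sequential([
    keras.Input(shape=(vocab_size,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Train on the first 150 documents, test on the remaining 50.
model.fit(X[:150], y[:150], epochs=5, batch_size=16, verbose=0)
loss, acc = model.evaluate(X[150:], y[150:], verbose=0)
probs = model.predict(X[150:], verbose=0)   # one probability row per document
print(probs.shape, round(float(acc), 2))
```

For a regression task, such as estimating speaker age from a fixed-length utterance vector, the usual change is to end the network with a single linear unit (`keras.layers.Dense(1)`) and train with a squared-error loss (`loss="mse"`).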
Outline
- Data preparation for machine learning
We present a high-level overview of the problems of converting text and speech data into a form suited for machine learning. We discuss a general approach for summarising variable-length sequences to fixed-length vectors.
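One common way to summarise a variable-length sequence is to apply fixed summary statistics (sometimes called functionals) to each feature dimension over time. The sketch below assumes the sequence is already a NumPy array of shape `(n_frames, n_features)`; the helper name `summarise` and the choice of mean, standard deviation, minimum and maximum are illustrative, not the course's prescribed set.

```python
import numpy as np

def summarise(frames: np.ndarray) -> np.ndarray:
    """Map a (n_frames, n_features) array to a fixed-length vector.

    Concatenates the per-feature mean, standard deviation, minimum and
    maximum over time, so the output length is always 4 * n_features,
    however many frames the input has.
    """
    return np.concatenate([
        frames.mean(axis=0),
        frames.std(axis=0),
        frames.min(axis=0),
        frames.max(axis=0),
    ])

rng = np.random.default_rng(1)
short_utt = rng.normal(size=(37, 13))    # e.g. 37 frames of 13 features
long_utt = rng.normal(size=(412, 13))    # a much longer utterance
print(summarise(short_utt).shape, summarise(long_utt).shape)  # (52,) (52,)
```

Both inputs map to vectors of the same length, which is what a standard DNN input layer requires. The cost is that information about temporal order is discarded.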
- Text preparation
We discuss the different pre-processing steps necessary to convert text into numerical form suitable for machine learning: (i) Tokenisation, (ii) Stop word removal, (iii) Normalisation, stemming and lemmatisation, (iv) Building a dictionary, (v) One-hot coding of words, (vi) Bag of words document model.
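Steps (i) to (vi) can be sketched in plain Python. This is a toy version under stated assumptions: the regular-expression tokeniser, the short stop list and the suffix-stripping `toy_stem` function are all hypothetical stand-ins, and lemmatisation is not attempted. NLTK provides fuller tools for these steps (for example `PorterStemmer` and `WordNetLemmatizer` in `nltk.stem`).

```python
import re

# A tiny illustrative stop list (assumption); real stop lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "are", "on", "in"}

def tokenise(text):
    # (i) Split text into word tokens: runs of letters, digits or apostrophes.
    return re.findall(r"[A-Za-z0-9']+", text)

def toy_stem(word):
    # (iii) Crude suffix stripping standing in for a real stemmer.
    # This is NOT the Porter algorithm.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    tokens = [t.lower() for t in tokenise(text)]        # (iii) case normalisation
    tokens = [t for t in tokens if t not in STOP_WORDS]  # (ii) stop word removal
    return [toy_stem(t) for t in tokens]                 # (iii) stemming

docs = ["The cat sat on the mat.", "Dogs and cats are playing in the garden."]
processed = [preprocess(d) for d in docs]

# (iv) Dictionary: map each distinct term to an integer index.
vocab = {term: i for i, term in
         enumerate(sorted({t for doc in processed for t in doc}))}

def one_hot(term):
    # (v) One-hot vector: all zeros except a 1 at the term's index.
    vec = [0] * len(vocab)
    vec[vocab[term]] = 1
    return vec

def bag_of_words(tokens):
    # (vi) Bag of words: the sum of the one-hot vectors of the document's
    # terms, i.e. a count per dictionary entry; word order is discarded.
    vec = [0] * len(vocab)
    for t in tokens:
        if t in vocab:
            vec[vocab[t]] += 1
    return vec

print(processed)  # [['cat', 'sat', 'mat'], ['dog', 'cat', 'play', 'garden']]
print(bag_of_words(processed[0]))  # [1, 0, 0, 1, 0, 1]
```

Stemming maps "cats" and "cat" to the same dictionary entry, so the two documents share a feature they would otherwise not share.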
- NLTK: Natural Language Toolkit. A textbook is also available.
- Term Frequency and Inverse Document Frequency from Wikipedia.
- Using TF-IDF for document ranking in Python
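Term frequency and inverse document frequency can be computed directly from token lists. The sketch below uses the basic definitions tf(t, d) = count of t in d / number of tokens in d and idf(t) = log(N / df(t)). It assumes every document is non-empty. Library implementations often differ: scikit-learn's `TfidfVectorizer`, for example, applies a smoothed idf and normalises each document vector by default.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of non-empty token lists. Returns one {term: weight} dict per document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), where N is the number of documents and
               df(t) is the number of documents containing t
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({t: (c / total) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights

docs = [["cat", "sat", "mat"],
        ["dog", "cat", "play", "garden"],
        ["dog", "sat", "log"]]
w = tf_idf(docs)
print(round(w[0]["mat"], 3), round(w[0]["cat"], 3))  # 0.366 0.135
```

"mat" occurs in only one document, so it receives a higher weight than "cat", which occurs in two. A term present in every document would get an idf, and hence a weight, of zero.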
- Speech preparation
We discuss the different pre-processing steps necessary to convert speech recordings into a form suitable for machine learning: (i) Recording, (ii) Segmentation, (iii) Short-time analysis, (iv) Summarisation.
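Steps (iii) and (iv) can be illustrated with NumPy on a synthetic signal. The sketch assumes a 16 kHz mono recording held as a float array, 25 ms Hamming-windowed frames with a 10 ms hop, and just two per-frame features (log energy and zero-crossing rate). These settings are common choices rather than the course's specification. Toolkits such as OpenSMILE compute many more features per frame, for example MFCCs and pitch.

```python
import numpy as np

def frame_signal(x, sr, win_ms=25.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping frames, shape (n_frames, win_len)."""
    win = int(sr * win_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + (len(x) - win) // hop   # assumes len(x) >= win
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_features(x, sr, win_ms=25.0, hop_ms=10.0):
    """(iii) Short-time analysis: one small feature vector per frame."""
    frames = frame_signal(x, sr, win_ms, hop_ms)
    frames = frames * np.hamming(frames.shape[1])   # taper each frame
    # Log energy of each windowed frame (small constant avoids log(0)).
    log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    # Zero-crossing rate: fraction of adjacent sample pairs whose sign changes.
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.stack([log_energy, zcr], axis=1)      # (n_frames, 2)

# Synthetic 1-second "recording": a 220 Hz tone plus a little noise.
sr = 16000
rng = np.random.default_rng(2)
t = np.arange(sr) / sr
x = 0.5 * np.sin(2 * np.pi * 220 * t) + 0.01 * rng.normal(size=sr)

feats = short_time_features(x, sr)
# (iv) Summarisation: per-feature mean and standard deviation over frames.
summary = np.concatenate([feats.mean(axis=0), feats.std(axis=0)])
print(feats.shape, summary.shape)  # (98, 2) (4,)
```

With a 400-sample window and a 160-sample hop, one second of audio yields 1 + (16000 - 400) // 160 = 98 frames. The summary vector stays length 4 whatever the utterance duration.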
- Prompt and Record program
- OpenSMILE Toolkit
- Gaussian Mixture Models explained - mathematical description.
- Kerkeni, L.; Serrestou, Y.; Mbarki, M.; Raoof, K. and Mahjoub, M., Speech Emotion Recognition: Methods and Cases Study, In Proceedings of the 10th International Conference on Agents and Artificial Intelligence (2018) 175-182.
Research Paper of the Week
- A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification, arXiv:1607.01759, 2017.
- M. Huckvale, A. Webb, A Comparison of Human and Machine Estimation of Speaker Age, Third International Conference on Statistical Language and Speech Processing, Budapest, 2015.
Web Resources
- LinkedIn Video Course: NLP with Python for Machine Learning Essential Training. Accessible to all UCL staff and students through UCL single sign-on.
Readings
Be sure to read one or more of these discussions of text and speech processing:
- Ch.3 Processing Raw Text from S. Bird, E. Klein and E. Loper, Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit.
- Hands On: Existing Toolkits and Practical Tutorial. Describes use of the OpenSMILE toolkit for analysing speech utterances into a bag of features.
Tutorial Notebooks
- Introduction to NLTK - natural language processing toolkit
Exercises
Implement answers to the problems described in the notebooks below. Save your completed notebooks into your personal Google Drive account.
Last modified: 22:45 11-Mar-2022.