PALS0039 Introduction to Deep Learning for Speech and Language Processing

Week 4 - Preparation of Text and Speech for Machine Learning

In which we look at how to prepare text and speech materials to make them compatible with machine learning approaches to classification and regression.

Learning Objectives

By the end of the session the student will be able to:

describe typical pre-processing steps used to generate numerical coding of text
describe the processes of tokenisation and normalisation applied to word tokens
explain how words can be described in terms of one-hot vectors
explain how documents can be encoded as bags of words
describe how a DNN can be applied to document classification
describe typical pre-processing steps used to generate numerical coding of speech recordings
explain how spoken utterances can be described as a series of short-time signal analyses
explain how spoken utterances can be encoded as a fixed-length vector
describe how a DNN can be applied to spoken utterance classification
describe how a DNN can be applied to spoken utterance regression
use the Keras toolkit to implement, train and test DNN models for text and speech problems

Outline

Data preparation for machine learning

We present a high-level overview of the problems of converting text and speech data into a form suitable for machine learning. We discuss a general approach for summarising variable-length sequences as fixed-length vectors.
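
As a rough illustration of the summarisation idea, the sketch below collapses a sequence of feature frames of any length into one vector of fixed size by taking the mean and standard deviation of each feature dimension. The function name `summarise` and the toy data are assumptions made for this example; mean and standard deviation are only one possible choice of summary statistics.

```python
from statistics import mean, pstdev

def summarise(frames):
    """Collapse a variable-length sequence of equal-width feature frames
    into one fixed-length vector: the per-dimension means followed by
    the per-dimension (population) standard deviations."""
    if not frames:
        raise ValueError("need at least one frame")
    columns = list(zip(*frames))  # transpose: one tuple per feature dimension
    return [mean(c) for c in columns] + [pstdev(c) for c in columns]

# Two toy sequences of different length, each frame having 2 features.
short_seq = [[1.0, 10.0], [3.0, 10.0]]
long_seq = [[1.0, 0.0], [2.0, 2.0], [3.0, 4.0], [2.0, 2.0]]

short_vec = summarise(short_seq)
long_vec = summarise(long_seq)
print(len(short_vec), len(long_vec))  # both vectors have 4 elements
```

Whatever the length of the input, the output has twice as many elements as there are features per frame, so it can be fed to a model that expects a fixed number of inputs.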

Example walkthrough of data cleaning.

Text preparation

We discuss the pre-processing steps necessary to convert text into a numerical form suitable for machine learning: (i) Tokenisation, (ii) Stop word removal, (iii) Normalisation, stemming and lemmatisation, (iv) Building a dictionary, (v) One-hot coding of words, (vi) Bag of words document model.
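
The steps listed above can be sketched in plain Python as follows. This is a minimal illustration, not the method used in the course notebooks: the stop word list, the toy `crude_stem` function (which only strips a final "s") and the two example sentences are all assumptions made for this example, and lower-casing stands in for normalisation. A real system would use a library such as NLTK for these steps.

```python
import re

STOP_WORDS = {"the", "a", "on", "and", "is"}  # tiny illustrative list (assumption)

def tokenise(text):
    """Lower-case the text (a simple normalisation) and split into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(token):
    """Hypothetical toy stemmer: strip a plural 's'. Not a real stemming algorithm."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

def preprocess(text):
    """Tokenise, remove stop words, then stem what remains."""
    return [crude_stem(t) for t in tokenise(text) if t not in STOP_WORDS]

def build_dictionary(token_lists):
    """Map every distinct token to an integer index (alphabetical order)."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    return {word: index for index, word in enumerate(vocab)}

def one_hot(word, dictionary):
    """Vector of zeros with a single 1 at the word's dictionary index."""
    vec = [0] * len(dictionary)
    vec[dictionary[word]] = 1
    return vec

def bag_of_words(tokens, dictionary):
    """Count how often each dictionary word occurs; word order is discarded."""
    vec = [0] * len(dictionary)
    for token in tokens:
        if token in dictionary:  # ignore out-of-vocabulary words
            vec[dictionary[token]] += 1
    return vec

documents = ["The cat sat on the mat.", "The dogs chased the cats!"]
token_lists = [preprocess(d) for d in documents]
dictionary = build_dictionary(token_lists)  # cat, chased, dog, mat, sat
bags = [bag_of_words(tokens, dictionary) for tokens in token_lists]
print(one_hot("dog", dictionary))  # [0, 0, 1, 0, 0]
print(bags)                        # [[1, 0, 0, 1, 1], [1, 1, 1, 0, 0]]
```

Each document becomes a vector with one element per dictionary word, so documents of different lengths share the same fixed-size representation.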

NLTK: Natural Language Toolkit. A textbook is also available.
Term Frequency and Inverse Document Frequency from Wikipedia.
Using TF-IDF for document ranking in Python
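
To make the term frequency and inverse document frequency idea in the links above more concrete, here is a minimal sketch using one common textbook definition: term frequency is the count of a term divided by the document length, and inverse document frequency is the natural log of the number of documents divided by the number of documents containing the term. Other variants exist (smoothing, different log bases), and the function name `tf_idf` and the toy documents are assumptions for this example.

```python
import math
from collections import Counter

def tf_idf(token_lists):
    """Return one {term: weight} dict per document, where
    weight = (count / document length) * log(N / document frequency)."""
    n_docs = len(token_lists)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    weights = []
    for tokens in token_lists:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [["cat", "sat", "mat"], ["dog", "chased", "cat"]]
weights = tf_idf(docs)
for doc_weights in weights:
    print(doc_weights)
```

Under this definition a term that occurs in every document (here "cat") receives a weight of zero, while terms confined to one document are weighted up.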

Speech preparation

We discuss the different pre-processing steps necessary to convert speech recordings into a form suitable for machine learning: (i) Recording, (ii) Segmentation, (iii) Short-time analysis, (iv) Summarisation.
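
The short-time analysis and summarisation steps can be sketched as below. This is an illustration under stated assumptions rather than the course's own pipeline: a synthetic sine tone stands in for a recorded and segmented utterance, the two frame features (log energy and zero-crossing rate) are chosen only because they are simple to compute, and the 25 ms frame and 10 ms hop are typical values rather than values given in the source. Toolkits such as OpenSMILE compute far richer feature sets.

```python
import math
from statistics import mean, pstdev

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames (any trailing partial frame is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def log_energy(frame):
    """Natural log of the mean squared amplitude (small floor avoids log(0))."""
    return math.log(sum(x * x for x in frame) / len(frame) + 1e-10)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)

def utterance_vector(samples, rate, frame_ms=25, hop_ms=10):
    """Short-time analysis followed by summarisation to a fixed-length vector:
    [mean log energy, mean ZCR, std log energy, std ZCR]."""
    frame_len = rate * frame_ms // 1000
    hop = rate * hop_ms // 1000
    frames = frame_signal(samples, frame_len, hop)
    features = [(log_energy(f), zero_crossing_rate(f)) for f in frames]
    columns = list(zip(*features))
    return [mean(c) for c in columns] + [pstdev(c) for c in columns]

RATE = 8000  # samples per second (assumed for this example)

def tone(n_samples, freq=200.0):
    """Synthetic stand-in for a recorded utterance: a pure sine tone."""
    return [math.sin(2 * math.pi * freq * n / RATE) for n in range(n_samples)]

short_vec = utterance_vector(tone(4000), RATE)  # 0.5 s "utterance"
long_vec = utterance_vector(tone(9600), RATE)   # 1.2 s "utterance"
print(len(short_vec), len(long_vec))  # both vectors have 4 elements
```

The two signals differ in duration and therefore in their number of frames, but both are reduced to a vector of the same size, which is what a DNN classifier or regressor with a fixed input layer requires.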

Prompt and Record program
OpenSMILE Toolkit
Gaussian Mixture Models explained - mathematical description.
Kerkeni, L.; Serrestou, Y.; Mbarki, M.; Raoof, K. and Mahjoub, M., Speech Emotion Recognition: Methods and Cases Study, In Proceedings of the 10th International Conference on Agents and Artificial Intelligence (2018) 175-182.

Research Paper of the Week

A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification, arXiv:1607.01759, 2017.
M. Huckvale, A. Webb, A Comparison of Human and Machine Estimation of Speaker Age, Third International Conference on Statistical Language and Speech Processing, Budapest, 2015.

Web Resources

LinkedIn Video Course: NLP with Python for Machine Learning Essential Training. Accessible to all UCL staff and students through this sign on.

Readings

Be sure to read one or more of these discussions of text and speech processing:

Ch.3 Processing Raw Text from S. Bird, E. Klein and E. Loper, Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit.
Hands On: Existing Toolkits and Practical Tutorial. Describes use of the OpenSMILE toolkit for analysing speech utterances into a bag of features.

Tutorial Notebooks

Introduction to NLTK - natural language processing toolkit

Exercises

Implement answers to the problems described in the notebooks below. Save your completed notebooks into your personal Google Drive account.

Last modified: 22:45 11-Mar-2022.