PALS0039 Introduction to Deep Learning for Speech and Language Processing

Week 2 - Principles of Machine Learning

In which we present the essential process of using machine learning for simple regression and classification tasks, including data preparation, algorithm selection and hyperparameter selection, with particular emphasis on the problem of overfitting and the need for cross-validation.

Learning Objectives

By the end of the session the student will be able to:

describe when machine learning methods are appropriate to solve a task
explain the difference between regression tasks and classification tasks
try out some machine learning methods other than deep learning
describe the overfitting problem, and the importance of cross-validation
describe the basic steps in setting up a machine learning model
solve simple regression and classification tasks using Python

Outline

When to use Machine Learning

Fundamentally, machine learning (ML) is about learning rules and patterns from data and then applying them to new samples of the problem. ML is preferred over hand-written rules in four situations: when the list of rules would be long and complex, when no good rule-based solution exists, when the problem is so hard that we have no insight into how it might be addressed, or when the data change over time. ML is not appropriate when a causal model of the problem already exists (except to approximate it with less computation), when only a small number of examples are available, when we need to test an explicit hypothesis about the problem, or when we cannot state the objective of learning in mathematical form.

A. L. Samuel, "Some studies in machine learning using the game of checkers", IBM Journal of Research and Development, vol. 3, 1959.
A. Géron, "Hands-On Machine Learning with Scikit-Learn and TensorFlow", O'Reilly, 2019, Chapter 1.

The process of Machine Learning

For ML to be useful, it must generalise from the given data to new data. It is therefore essential to evaluate an ML model on separate, held-out data that were not used to train it. Repeating this evaluation over several different partitions of the data, so that every sample is held out once, is called "cross-validation". When planning a machine learning solution to a problem, there are common steps to follow: collecting the data, designing representations of the data, choosing an ML method and training a model, and evaluating performance with a metric of success.
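The held-out evaluation described above can be sketched in plain Python. This is a minimal illustration, not code from the course. The helper names `k_fold_indices` and `cross_validate`, the toy data and the slope-only model are all hypothetical. In practice a library routine such as scikit-learn's `cross_val_score` would normally be used instead.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(data, fit, score, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold; repeat for every fold."""
    scores = []
    for held_out in k_fold_indices(len(data), k, seed):
        held = set(held_out)
        train = [row for i, row in enumerate(data) if i not in held]
        test = [data[i] for i in held_out]
        model = fit(train)                  # learn only from the training folds
        scores.append(score(model, test))   # evaluate only on unseen data
    return scores


if __name__ == "__main__":
    # Hypothetical toy data: y is roughly 2x plus Gaussian noise.
    rng = random.Random(1)
    data = [(x, 2 * x + rng.gauss(0, 0.5)) for x in range(20)]

    def fit(train):
        # Toy "model": least-squares slope of a line through the origin.
        return sum(x * y for x, y in train) / sum(x * x for x, _ in train)

    def score(slope, test):
        # Mean squared error on the held-out fold.
        return mean((y - slope * x) ** 2 for x, y in test)

    scores = cross_validate(data, fit, score, k=5)
    print("per-fold MSE:", [round(s, 3) for s in scores])
    print("mean MSE:", round(mean(scores), 3))
```

Every sample is scored exactly once, by a model that never saw it during training. The spread of the per-fold scores gives a rough idea of how stable the estimate is.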

A. Géron, "Hands-On Machine Learning with Scikit-Learn and TensorFlow", O'Reilly, 2019, Chapter 2.

Machine Learning methods (not deep learning)

We give four examples. The first is supervised learning for a regression task: using vowel formant F1 to predict vowel formant F2. The second is supervised learning for a classification task: predicting speaker gender from vowel formant frequencies. The third is unsupervised learning, using a clustering problem. The fourth is reinforcement learning: training an agent that moves around a virtual 2D world to collect food while avoiding poison.
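As a rough sketch of the two supervised examples, the code below does two things. First, it fits a straight line predicting F2 from F1 by ordinary least squares. Second, it classifies (F1, F2) points with a nearest-centroid rule. The formant values and labels are made-up toy numbers, not real measurements. Nearest-centroid is just one simple classifier chosen for illustration, and the course materials may use different methods.

```python
from math import dist
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def fit_centroids(points, labels):
    """Nearest-centroid classifier: store the mean (F1, F2) of each class."""
    centroids = {}
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        centroids[lab] = tuple(mean(col) for col in zip(*members))
    return centroids


def predict_class(centroids, point):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: dist(centroids[lab], point))


if __name__ == "__main__":
    # Regression: made-up (F1, F2) values in Hz.
    f1 = [300, 400, 500, 600, 700]
    f2 = [2300, 2000, 1800, 1500, 1300]
    a, b = fit_line(f1, f2)
    print(f"F2 = {a:.0f} + ({b:.2f}) * F1")   # F2 = 3030 + (-2.50) * F1
    print("predicted F2 at F1=450 Hz:", a + b * 450)

    # Classification: made-up (F1, F2) points with hypothetical labels.
    points = [(300, 2100), (350, 2000), (400, 2600), (450, 2500)]
    labels = ["male", "male", "female", "female"]
    centroids = fit_centroids(points, labels)
    print(predict_class(centroids, (420, 2450)))   # female
```

Both are supervised: each training point comes with the answer (an F2 value or a class label) that the model learns to reproduce.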

Machine Learning step-by-step

We look in more detail at the steps needed to implement an ML application: (1) prepare the data, (2) generate and select features, (3) choose an ML strategy, (4) choose an ML method, (5) choose hyperparameters, (6) train the model on the data, and (7) carry out a final evaluation. As a running example, we use an application that recognises a speaker's emotional state from their speech. We also consider some of the problems in ML that can lead to poor performance.
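Several of these steps can be sketched end to end in standard-library Python. This is a hypothetical illustration under stated assumptions. Synthetic two-dimensional features stand in for real acoustic features of emotional speech, and k-nearest-neighbours stands in for the chosen ML method. Its number of neighbours, k, is the hyperparameter tuned on a development set before a single final evaluation on the test set.

```python
import random
from collections import Counter
from math import dist
from statistics import mean, pstdev


def make_toy_data(n_per_class=60, seed=0):
    """Synthetic stand-in data: two Gaussian blobs labelled 0 and 1."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in ((0, (0.0, 0.0)), (1, (2.0, 1.0))):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    rng.shuffle(data)
    return data


def split(data, train=0.6, dev=0.2):
    """Step (1): partition into training, development and test sets."""
    a = int(len(data) * train)
    b = a + int(len(data) * dev)
    return data[:a], data[a:b], data[b:]


def fit_scaler(rows):
    """Standardisation statistics, computed from the TRAINING set only."""
    cols = list(zip(*(x for x, _ in rows)))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def scale(rows, scaler):
    """Apply (x - mean) / sd to every feature, keeping the labels."""
    mus, sds = scaler
    return [(tuple((v - m) / s for v, m, s in zip(x, mus, sds)), y)
            for x, y in rows]


def knn_predict(train, x, k):
    """Majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda row: dist(row[0], x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test rows whose label kNN predicts correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


if __name__ == "__main__":
    train, dev, test = split(make_toy_data())
    scaler = fit_scaler(train)
    train, dev, test = [scale(s, scaler) for s in (train, dev, test)]

    # Step (5): choose the hyperparameter k on the development set, never on test.
    dev_scores = {k: accuracy(train, dev, k) for k in (1, 3, 5, 9, 15)}
    best_k = max(dev_scores, key=dev_scores.get)

    # Step (7): final evaluation, touching the test set once with the chosen k.
    print("dev accuracy by k:", dev_scores)
    print("chosen k:", best_k, "test accuracy:", round(accuracy(train, test, best_k), 3))
```

The standardisation statistics come from the training set only. Computing them from all the data would leak information about the test set into training and give an over-optimistic final score.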

Research Paper of the Week

"The unreasonable effectiveness of data", Alon Halevy, Peter Norvig, Fernando Pereira, IEEE Intelligent Systems, 2009.

Web Resources

LinkedIn video course: Python for Data Science Essential Training, "Introduction to machine learning". Accessible to all UCL staff and students through the UCL sign-on.
Pandas tutorials.
Pandas tutorial for beginners.
Machine learning in Python step by step.

Readings

Be sure to read one or more of these descriptions of the practice of Machine Learning:

Chapter 2 - End-to-end Machine Learning project, in Hands-On Machine Learning with Scikit-Learn and TensorFlow, by Aurélien Géron.
A tour of machine learning algorithms.

Tutorial Notebooks

Introduction to Pandas

Exercises

Implement answers to the problems described in the notebooks below. Save your completed notebooks into your personal Google Drive account.

Last modified: 22:45 11-Mar-2022.