Week 2 - Principles of Machine Learning
In which we present the essential process of using machine learning for simple regression and classification tasks, including data preparation, algorithm selection and hyperparameter selection, with particular emphasis on the problem of overfitting and the need for cross-validation.
Learning Objectives
By the end of the session the student will be able to:
- describe when machine learning methods are appropriate to solve a task
- explain the difference between regression tasks and classification tasks
- experience some methods for machine learning other than deep learning
- describe the overfitting problem, and the importance of cross-validation
- describe the basic steps in setting up a machine learning model
- solve simple regression and classification tasks using Python
Outline
- When to use Machine Learning
Fundamentally Machine Learning is about learning rules and patterns from data and then applying them to new samples of the problem. ML is preferred over hand-generated rules when the list of rules is long and complex, where no good solutions exist, if the problem is so hard that there are no insights into how it might be addressed, or if the data fluctuate over time. ML is not appropriate when causal models exist (except to approximate these with less computation), when only a small number of examples exist, where there is a need to test an explicit hypothesis about the problem, or where we cannot state the objective of learning in mathematical form.
- A. Samuel, "Some studies in machine learning using the game of checkers", IBM Journal of Research and Development 3 1959.
- A. Geron, "Hands on machine learning with Scikit-learn and Tensorflow", O'Reilly, 2019, Chapter 1.
- The process of Machine Learning
For ML to be useful it must generalise from the given data to new data.So it is essential to evaluate ML using separate or held-out data. This is called "cross-validation". When planning a machine learning solution to a problem there are common steps to follow in collecting the data, designing the representations of the data, choosing an ML method and training a model, and evaluating the performance using a metric for success.
- A. Geron, "Hands on machine learning with Scikit-learn and Tensorflow", O'Reilly, 2019, Chapter 2.
- Machine Learning methods (not deep learning)
We give an example of supervised learning of a regression task using vowel formant F1 to predict vowel formant F2. We give an example of supervised learning of a classification task using prediction of gender from vowel formant frequencies. We give an example of unsupervised learning using a clustering problem. We give an example of reinforcement learning by training an agent to collect food but ignore poison when moving around a virtual 2D world.
- Machine Learning step-by-step
We look in more detail at the steps necessary to implement a ML application: (1) Prepare the data, (2) Generate and select features, (3) Choose ML strategy, (4) Choose ML method, (5) Choose hyper-parameters, (6) Train model on data, (7) Final evaluation. We use the development of an application that recognises a speaker's emotional state from their speech as an example. We consider some of the problems in ML that can lead to poor performance.
Research Paper of the Week
- The unreasonable effectiveness of data, Alon Halevy, Peter Norvig, Fernando Pereira, IEEE Intelligent Systems Magazine, 2009.
Web Resources
- Linkedin Video course: Python-for-data-science-essential-training: introduction-to-machine-learning. Accessible to all UCL staff and students through this sign on.
- Pandas tutorials.
- Pandas tutorial for beginners.
- Machine learning in Python step by step.
Readings
Be sure to read one or more of these descriptions of the practice of Machine Learning:
- Chapter 2 - End-to-end Machine Learning project, in Hands-On Machine Learning with Scikit-Learn and TensorFlow, by Aurélien Géron.
- A tour of machine learning algorithms.
Tutorial Notebooks
Exercises
Implement answers to the problems described in the notebooks below. Save your completed notebooks into your personal Google Drive account.
Word count: . Last modified: 22:45 11-Mar-2022.