## Week 3 - Artificial Neural Networks

In which we look at the general structure of artificial neural networks, including the mathematical description of processing within a node, and how networks can learn by gradient descent.

### Learning Objectives

By the end of the session the student will be able to:

- describe the key developments in the history of artificial neural networks for machine learning
- describe the perceptron learning algorithm
- explain how gradient descent works in multi-layer networks
- use the Keras toolkit to implement, train and test neural network models for simple problems

### Outline

- Neural networks for machine learning
- The Perceptron
  - Warren McCulloch and Walter Pitts, A Logical Calculus of the Ideas Immanent in Nervous Activity, 1943
  - Frank Rosenblatt, The Perceptron: A Perceiving and Recognizing Automaton, 1957
  - Perceptron learning animation
- Multiple layers of perceptrons
  - Marvin Minsky and Seymour Papert, Perceptrons, 1969
- Introduction to learning by gradient descent
- Automatic differentiation
- Deep neural networks
  - A. Krizhevsky, I. Sutskever, G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NeurIPS, 2012
  - D. Kingma, J. Ba, Adam: A Method for Stochastic Optimization, 2015
- Programming DNNs

We view the brain as an information processing system, looking at the operation of neurons and how a network of neurons can perform complex calculations. We discuss how artificial neural networks can be motivated by the processing networks of the brain without being simulations of biological neurons.

We discuss a simple mathematical model of a neuron proposed by McCulloch and Pitts, and a means to train such a model from data proposed by Rosenblatt: the Perceptron learning rule.
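As a sketch of the idea (not the course's own code), the Perceptron learning rule can be written in a few lines of NumPy. Here a single threshold unit learns the AND function; the choice of task, learning rate and epoch count are illustrative assumptions:

```python
import numpy as np

# Training data for the AND function (an illustrative, linearly separable task)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

w = np.zeros(2)  # weights
b = 0.0          # bias
lr = 0.1         # learning rate

def predict(x):
    # McCulloch-Pitts style threshold unit: fire if the weighted sum exceeds 0
    return int(np.dot(w, x) + b > 0)

for epoch in range(20):
    errors = 0
    for xi, target in zip(X, y):
        # Perceptron rule: nudge weights toward each misclassified example
        update = lr * (target - predict(xi))
        w += update * xi
        b += update
        errors += int(update != 0)
    if errors == 0:  # converged: every training point classified correctly
        break

print([predict(xi) for xi in X])
```

Because AND is linearly separable, the rule is guaranteed to converge; XOR, the classic counterexample raised by Minsky and Papert, is not.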

We look at the criticisms of perceptrons as a means to perform information processing, and a solution to the problem of training multiple layers of perceptrons through the use of gradient descent.

We briefly review the history of how networks of multi-layer perceptrons turned into "deep" networks. Problems in extending gradient descent to large networks were gradually overcome by improvements in algorithms and an increase in computer power. We outline common activation functions, loss functions and optimisation methods.
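For reference, the common activation and loss functions mentioned above can be written directly in NumPy (a sketch; the epsilon guard and stability trick are standard conventions, not course-specific):

```python
import numpy as np

# Common activation functions
def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the row maximum for numerical stability before exponentiating
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# A common loss function for classification
def cross_entropy(probs, targets):
    # targets are one-hot rows; a small epsilon guards against log(0)
    return -np.mean(np.sum(targets * np.log(probs + 1e-12), axis=-1))
```

Optimisers such as Adam (see the reading above) then decide how the gradients of these losses are turned into parameter updates.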

We introduce the Keras toolkit for building, training and running deep neural networks.
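A minimal Keras sketch, assuming TensorFlow 2.x is installed; the task (learning XOR) and the layer sizes are illustrative choices, not the session's worked example:

```python
import numpy as np
import tensorflow as tf

# XOR training data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype="float32")
y = np.array([0, 1, 1, 0], dtype="float32")

# Build a small feed-forward network: 2 inputs -> 8 hidden units -> 1 output
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Compile with an optimiser and loss, then train
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=500, verbose=0)

preds = model.predict(X, verbose=0)
print(preds.round().ravel())
```

The same three steps — build, compile, fit — carry over to the exercise notebooks below.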

### Research Paper of the Week

- I. Howard, M. Huckvale, Two-level recognition of isolated words using neural nets, First IEE International Conference on Artificial Neural Networks, 1989.

### Web Resources

- How Deep Neural Networks Work, excellent introductory video. [If you want to be challenged, try to build a version of this example in Keras - classify 4-pixel inputs as solid, or made up of a vertical, horizontal or diagonal line]
- LinkedIn video course: **Building Recommender Systems with Machine Learning and AI: History of artificial neural networks**. Accessible to all UCL staff and students through this sign-on.
- LinkedIn video course: **Artificial Intelligence Foundations: Neural Networks**. Accessible to all UCL staff and students through this sign-on.
- Introduction to Neural Networks, YouTube videos by 3Blue1Brown.

### Readings

Be sure to read one or more of these discussions of deep learning:

- Keras tutorial: deep learning in Python.
- Your First Deep Learning Project in Python with Keras Step-By-Step.

### Exercises

Implement answers to the problems described in the notebooks below. Save your completed notebooks into your personal Google Drive account.

Last modified: 22:45 11-Mar-2022.