Speech Processing by Computer

LECTURE 9

SPEECH RECOGNITION

Objectives

By the end of the session you should:

• be able to describe the basic architecture of an isolated word recognition system

• be able to list the main sources of variability which present problems for such systems and describe how they are typically overcome

• be able to describe what end-point detection does

• be able to justify the choice of acoustic parameters used for matching

• be able to explain why non-linear time alignment is necessary and how it works in general terms

Outline

9. Isolated Word Recognition

9.1. Aims

9.1.1. Word identification

9.2. Applications

9.2.1. Command and control

9.3. Problems

9.3.1. Segmentation

9.3.1.1. Isolated words

9.3.2. Speaker variability

9.3.2.1. Speaker dependency

9.3.3. Environmental variability

9.3.3.1. Close-talking microphone

9.3.4. Discrimination

9.3.4.1. Limited vocabulary

9.4. Construction

9.4.1. End-point detection

9.4.2. Acoustic processing

9.4.2.1. Spectral shape parameters

9.4.3. Matching

9.4.3.1. Spectrum distance

9.4.3.2. Time alignment
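
The construction stages in section 9.4 can be illustrated with a minimal sketch. It is not the method of any particular system, and every function name, the energy threshold and the toy data are assumptions made for illustration. It assumes end-point detection by a simple frame-energy threshold, a Euclidean distance between spectral-shape vectors, and non-linear time alignment by dynamic time warping (one common technique for this step).

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def find_endpoints(energies, threshold):
    """First and last frame index whose energy exceeds the threshold, or None."""
    active = [i for i, e in enumerate(energies) if e > threshold]
    return (active[0], active[-1]) if active else None

def frame_distance(a, b):
    """Euclidean distance between two spectral-shape vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(test, template):
    """Cumulative frame distance along the best non-linear time alignment."""
    n, m = len(test), len(template)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(test[i - 1], template[j - 1])
            # A frame may be repeated, skipped over, or matched one-to-one.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognise(test, templates):
    """Return the vocabulary word whose stored template is closest to the test word."""
    return min(templates, key=lambda word: dtw_distance(test, templates[word]))

# Toy signal: silence, a short burst, silence (hypothetical values).
signal = [0.0] * 4 + [1.0, -1.0, 1.0, -1.0] + [0.0] * 4
endpoints = find_endpoints(frame_energies(signal, frame_len=2), threshold=0.5)

# Toy two-word vocabulary; each frame is a two-element spectral vector.
templates = {
    "yes": [[1.0, 0.0], [0.0, 1.0]],
    "no": [[0.0, 1.0], [1.0, 0.0]],
}
# The test word is "yes" spoken with its first frame held for twice as long.
test_word = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
best = recognise(test_word, templates)
```

In this toy case the stretched test word aligns with the "yes" template at zero cumulative distance, which a frame-by-frame linear comparison of sequences of different length could not achieve. Practical systems usually add constraints on the allowed alignment path and more robust end-point rules.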

Reading

G. Chollet, "Automatic Speech and Speaker Recognition", in Fundamentals of Speech Synthesis and Speech Recognition, ed. E. Keller, Wiley, 1994.

J.N. Holmes, "Speech recognition by pattern matching of whole words", in Speech Synthesis and Recognition, Van Nostrand, 1988.