Speech Processing by Computer

LECTURE 9

SPEECH RECOGNITION

Objectives

By the end of the session you should:

be able to describe the basic architecture of an isolated word recognition system

be able to list the main sources of variability that cause problems for such systems, and explain how each is typically overcome

be able to describe what end-point detection does

be able to justify the choice of acoustic parameters used for matching

be able to explain why non-linear time alignment is necessary and how it works in general terms

Outline

9. Isolated word recognition

9.1. Aims

9.1.1. Word identification

9.2. Applications

9.2.1. Command and control

9.3. Problems

9.3.1. Segmentation

9.3.1.1. Isolated words

9.3.2. Speaker variability

9.3.2.1. Speaker dependency

9.3.3. Environmental variability

9.3.3.1. Close-talking microphone

9.3.4. Discrimination

9.3.4.1. Limited vocabulary

9.4. Construction

9.4.1. End-point detection

9.4.2. Acoustic processing

9.4.2.1. Spectral shape parameters

9.4.3. Matching

9.4.3.1. Spectrum distance

9.4.3.2. Time alignment
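
The construction stages listed under 9.4 can be illustrated with a small sketch. It is only a rough illustration under stated assumptions, not the system described in the lecture: end-point detection is reduced to a single fixed energy threshold, each frame is assumed to be a short list of spectral shape values, the spectrum distance is taken to be Euclidean, and time alignment uses a basic dynamic programming (dynamic time warping) recursion. All function names and the toy data are hypothetical.

```python
import math


def find_endpoints(energies, threshold):
    """Return (start, end) frame indices of the region above the threshold.

    Assumption: one fixed energy threshold marks the word; 'end' is exclusive.
    Returns None if no frame exceeds the threshold.
    """
    above = [i for i, e in enumerate(energies) if e > threshold]
    if not above:
        return None
    return above[0], above[-1] + 1


def frame_distance(a, b):
    """Spectrum distance between two frames (assumed Euclidean here)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(word, template):
    """Accumulated frame distance along the best non-linear time alignment."""
    n, m = len(word), len(template)
    inf = float("inf")
    # cost[i][j] = best accumulated distance aligning word[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(word[i - 1], template[j - 1])
            # Allow the path to stretch the word, stretch the template,
            # or advance both together.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognise(word, templates):
    """Return the label of the stored template closest to the spoken word."""
    return min(templates, key=lambda label: dtw_distance(word, templates[label]))


# Toy one-dimensional "spectral" frames, invented for illustration only.
templates = {
    "yes": [[1.0], [3.0], [1.0]],
    "no": [[5.0], [5.0], [5.0]],
}
# A slower utterance of "yes": same shape, but stretched in time.
spoken = [[1.0], [1.0], [3.0], [3.0], [1.0]]
print(recognise(spoken, templates))
```

The stretched utterance has five frames against a three-frame template, so a frame-by-frame comparison would not line up; the alignment step lets repeated frames map onto a single template frame, which is the point of non-linear time alignment.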

Reading

G. Chollet, "Automatic Speech and Speaker Recognition", in Fundamentals of Speech Synthesis and Speech Recognition, ed. E. Keller, Wiley, 1994.

J.N. Holmes, "Speech recognition by pattern matching of whole words", in Speech Synthesis and Recognition, Van Nostrand, 1988.