Speech Processing by Computer

 

LAB 10

LARGE VOCABULARY SPEECH RECOGNITION

 

In this lab we will look at the phone probability table and the word lattice that a research prototype large vocabulary recogniser generates for some short input sentences.

 

The recogniser is called AbbotDemo. It uses a recurrent neural network to estimate phone probabilities, followed by a standard decoder and language model.  More information can be found at http://svr-www.eng.cam.ac.uk/~ajr/abbotDoc/index.html.
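
To make the pipeline concrete, the sketch below shows in outline how per-frame phone probabilities from a network can be turned into a most probable frame-by-frame phone labelling by a Viterbi search. It is only an illustration: the phone set, the probabilities and the transition penalties are invented, and AbbotDemo's real decoder also applies a pronunciation dictionary and a word-level language model.

    import numpy as np

    phones = ["r", "eh", "k", "ax", "g", "n", "ay", "z"]    # toy phone set, invented
    np.random.seed(0)
    posteriors = np.random.dirichlet(np.ones(len(phones)), size=25)  # 25 frames of fake posteriors

    stay   = np.log(0.9)                          # staying in the same phone is cheap
    switch = np.log(0.1 / (len(phones) - 1))      # moving to a different phone is penalised

    def viterbi(post):
        """Most probable phone per frame under a simple transition model."""
        T, N = post.shape
        logp = np.log(post + 1e-12)
        best = logp[0].copy()                     # best log-score ending in each phone
        back = np.zeros((T, N), dtype=int)        # back-pointers for the trace-back
        for t in range(1, T):
            new_best = np.empty(N)
            for j in range(N):
                trans = np.where(np.arange(N) == j, stay, switch)
                scores = best + trans
                back[t, j] = int(np.argmax(scores))
                new_best[j] = scores[back[t, j]] + logp[t, j]
            best = new_best
        path = [int(np.argmax(best))]             # trace back the best path
        for t in range(T - 1, 0, -1):
            path.append(back[t, path[-1]])
        return [phones[i] for i in reversed(path)]

    print(" ".join(viterbi(posteriors)))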

 

1. Choose a sentence of about 3 or 4 words that is relatively ambiguous in its interpretation. An example would be “Wreck a nice beach” (“recognise speech”). We are particularly interested in how the system deals with assimilations, elisions and other contextual effects.

2. Record a version of your sentence using SFSWin with a sampling rate of 16,000 samples/sec. Save your SFS file to p:\lab\yourname.sfs
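
If you want to double-check the sampling rate before handing the file over, the sketch below reads the header of a WAV copy of the recording. The SFS file itself is read with the SFS tools, and the filename here is only a placeholder.

    import wave

    # The filename is a placeholder; substitute a WAV export of your own recording.
    with wave.open("yourname.wav", "rb") as w:
        print("sample rate :", w.getframerate())                   # expect 16000
        print("channels    :", w.getnchannels())
        print("duration (s):", w.getnframes() / w.getframerate())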

3. Mark will then run the AbbotRecogniser over your recording, saving the results back to your SFS file. He will tell you the ‘best’ interpretation.

4. View the results of the recognition and make sure you understand what you are seeing.

5. Print out a spectrogram and the phone probability table. Look at the phonetic interpretations the system has made (one way to plot the two on a shared time axis is sketched after the questions below).

a. Where are they correct?

b. Where are they incorrect?

c. Where does the system produce ambiguity?

d. Can you see places where contextual variation has confused the system?
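
One way to line the phone probability table up against the spectrogram is to plot both on a shared time axis, as in the sketch below. The WAV export, the saved posterior array and the 16 ms frame shift are all assumptions made for illustration; the printouts from SFSWin serve the same purpose.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.io import wavfile
    from scipy.signal import spectrogram

    rate, audio = wavfile.read("yourname.wav")            # placeholder WAV export
    post = np.load("yourname_posteriors.npy")             # frames x phones, placeholder
    frame_shift = 0.016                                    # assumed posterior frame shift (s)

    f, t, S = spectrogram(audio.astype(float), fs=rate, nperseg=400, noverlap=240)

    fig, (ax_spec, ax_post) = plt.subplots(2, 1, sharex=True, figsize=(10, 6))
    ax_spec.pcolormesh(t, f, 10 * np.log10(S + 1e-10), shading="auto")
    ax_spec.set_ylabel("Frequency (Hz)")
    ax_post.imshow(post.T, aspect="auto", origin="lower",
                   extent=[0, post.shape[0] * frame_shift, 0, post.shape[1]])
    ax_post.set_ylabel("Phone index")
    ax_post.set_xlabel("Time (s)")
    plt.tight_layout()
    plt.show()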

6. Print out the word lattice on a separate sheet, using an identical timescale to the spectrogram, and study the words that were hypothesised (one way to draw the lattice to scale is sketched after the questions below).

a. Can you see the reasons behind the hypotheses?

b. Why is the same word hypothesised multiple times?

c. What happens to the variety of word hypotheses when the phonetic probabilities are ambiguous?
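
To print the lattice to the same timescale as the spectrogram, it can help to redraw the hypotheses as labelled bars against time, as sketched below. The entries in the list are invented for illustration; the real start times, end times and words come out of the lattice stored in your SFS file.

    import matplotlib.pyplot as plt

    # Invented (start, end, word) hypotheses standing in for a real lattice.
    lattice = [
        (0.10, 0.45, "wreck"),
        (0.10, 0.95, "recognise"),
        (0.45, 0.60, "a"),
        (0.60, 0.95, "nice"),
        (0.95, 1.40, "beach"),
        (0.95, 1.40, "speech"),
    ]

    fig, ax = plt.subplots(figsize=(10, 3))
    for row, (start, end, word) in enumerate(lattice):
        ax.hlines(row, start, end, linewidth=6)            # one bar per hypothesis
        ax.text(start, row + 0.2, word, fontsize=9)
    ax.set_xlim(0, 1.5)                                    # match the spectrogram's time range
    ax.set_xlabel("Time (s)")
    ax.set_yticks([])
    plt.tight_layout()
    plt.show()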

7. Repeat the exercise for each student, using a different sentence each time.