Speech Processing by Computer
LAB 10
LARGE VOCABULARY SPEECH RECOGNITION
In this lab we will look at the phone probability table and the word lattice generated for some short input sentences by a research prototype large vocabulary recogniser. The recogniser, called AbbotDemo, uses a recurrent neural network to estimate phone probabilities, followed by a standard decoder and language model. More information can be found at http://svr-www.eng.cam.ac.uk/~ajr/abbotDoc/index.html.
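To get a feel for what a phone probability table is, here is a minimal sketch (invented for illustration, not part of the lab or of AbbotDemo) of how per-frame phone probabilities relate to a phone string. The phone set and probability values below are made up; AbbotDemo's real decoder searches over whole word sequences, combining these acoustic probabilities with a language model, rather than simply taking the best phone in each frame.

```python
# Toy illustration (not AbbotDemo's actual decoder): given a table of
# per-frame phone probabilities, a naive "decoder" picks the most
# likely phone in each frame and collapses adjacent repeats.

# Hypothetical phone probability table: rows are successive analysis
# frames, columns are a tiny invented phone set.
phones = ["r", "eh", "k", "ax", "n"]
frame_probs = [
    [0.70, 0.10, 0.10, 0.05, 0.05],  # frame 0: most likely "r"
    [0.60, 0.20, 0.10, 0.05, 0.05],  # frame 1: still "r"
    [0.10, 0.70, 0.10, 0.05, 0.05],  # frame 2: "eh"
    [0.10, 0.60, 0.20, 0.05, 0.05],  # frame 3: "eh"
    [0.10, 0.10, 0.70, 0.05, 0.05],  # frame 4: "k"
]

def greedy_phone_string(table, phone_set):
    """Pick the argmax phone per frame, then collapse adjacent repeats."""
    best = [phone_set[max(range(len(row)), key=row.__getitem__)]
            for row in table]
    return [p for i, p in enumerate(best) if i == 0 or p != best[i - 1]]

print(greedy_phone_string(frame_probs, phones))  # ['r', 'eh', 'k']
```

When several phones have similar probabilities in a frame, this greedy choice becomes unreliable, which is exactly the ambiguity you are asked to look for in the printed table later in the lab.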
1. Choose a sentence of about 3 or 4 words which is relatively ambiguous in terms of interpretation. An example would be “Wreck a nice beach” (“recognise speech”). We are particularly interested in how the system deals with assimilations, elisions and other contextual effects.
2. Record a version of your sentence using SFSWin with a sampling rate of 16,000 samples/sec. Save your SFS file to p:\lab\yourname.sfs.
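If you want to double-check that a recording really is at 16,000 samples/sec, one option is to export a WAV copy of it and inspect that with Python's standard wave module (the filename below is hypothetical; the SFS file itself is a different format that this module cannot read):

```python
import wave

def check_rate(path, expected=16000):
    """Report a WAV file's sampling rate and duration; return True if
    the rate matches the expected value (16 kHz for this lab)."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        print(f"{path}: {rate} samples/sec, "
              f"{w.getnframes() / rate:.2f} s of audio")
        return rate == expected

# Example (hypothetical filename):
# check_rate("yourname.wav")
```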
3. Mark will then run the AbbotRecogniser over your recording, saving the results back to your SFS file. He will tell you the ‘best’ interpretation.
4. View the results of the recognition and make sure you understand what you are seeing.
5. Print out a spectrogram and the phone probability table. Look at the phonetic interpretations the system has made.
a. Where are they correct?
b. Where are they incorrect?
c. Where does the system produce ambiguity?
d. Can you see places where contextual variation has confused the system?
6. Print out the word lattice on a separate sheet with an identical timescale to the spectrogram. Study the words that were hypothesised.
a. Can you see the reasons behind the hypotheses?
b. Why is the same word hypothesised multiple times?
c. What happens to the variety of word hypotheses when the phonetic probabilities are ambiguous?
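As background for these questions, a word lattice can be thought of as a set of time-stamped word hypotheses, each with a score; the decoder's job is to find the highest-scoring sequence of words that spans the utterance. Here is a minimal sketch with an invented lattice (the words, frame numbers, and log probabilities are made up for illustration and are not AbbotDemo's actual output format):

```python
# Toy word lattice: each entry is (start frame, end frame, word,
# log probability). Note that the same word can appear several times
# with different time alignments and scores -- this is one reason a
# word is hypothesised more than once in a real lattice.
lattice = [
    (0, 30, "wreck",     -4.0),
    (0, 30, "recognise", -6.0),   # competing word over the same span
    (0, 55, "recognise", -4.5),   # same word, different end time
    (30, 40, "a",        -1.0),
    (40, 70, "nice",     -2.0),
    (55, 70, "speech",   -3.5),
    (70, 100, "beach",   -2.5),
    (70, 100, "speech",  -3.0),
]

def best_path(lattice, start, end):
    """Dynamic programming over lattice edges: find the highest-scoring
    word sequence that exactly covers frames start..end."""
    best = {start: (0.0, [])}  # frame -> (score, word sequence so far)
    # Sorting by start frame is a valid processing order because every
    # edge ends later than it begins.
    for s, e, word, logp in sorted(lattice):
        if s in best:
            score = best[s][0] + logp
            if e not in best or score > best[e][0]:
                best[e] = (score, best[s][1] + [word])
    return best.get(end)

print(best_path(lattice, 0, 100))
# -> (-9.5, ['wreck', 'a', 'nice', 'beach'])
```

When the acoustic scores of competing edges are close (as with “beach” and “speech” above), many alternative paths survive, which is why ambiguous phonetic probabilities produce a wider variety of word hypotheses in the lattice.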
7. Repeat the exercise for each student, using a different sentence each time.