Speech Processing by Computer






In this lab session we will test an isolated-word, template-based recogniser on a relatively easy vocabulary, made harder by the presence of multiple speakers.


1. Vocabulary design and capture


a. Design a vocabulary of about 20 words.  Choose words that vary in segmental structure and length, for example place names or animal names.  Add a few minimal pairs (words that differ in a single sound, such as "cat" and "cap").


b. Use the 'learn' script to record each student saying each word in isolation.  When asked to type in the word, use the format <word>.<initials>, e.g. "cat.mh".  This way we can keep track of which templates the system uses to recognise unknown words.
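
The labelling convention above can be checked with a few lines of Python. This is only a sketch for use outside the lab scripts: the function name split_label is hypothetical, and it assumes the initials are whatever follows the last full stop in the label.

```python
def split_label(label):
    """Split a template label such as 'cat.mh' into (word, initials).

    Assumption: the initials follow the LAST full stop, so a word that
    itself contains a full stop is still handled.
    """
    word, sep, initials = label.rpartition(".")
    if not sep or not word or not initials:
        raise ValueError(f"expected <word>.<initials>, got {label!r}")
    return word, initials


if __name__ == "__main__":
    print(split_label("cat.mh"))
```

Running labels through a check like this before recording starts may catch entries typed without initials, which would otherwise make it impossible to tell whose template was matched.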


2. Testing


a. Use the 'recog' script to test the system's performance.  Each student should test with 10 vocabulary words chosen at random.  Record the number of correct responses and the confusions.  Which words are most often confused?  Could you have predicted this?  Does the system tend to use templates spoken by the same speaker?
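
One possible way to tally the results, sketched in Python. It assumes you have noted each trial by hand as a triple (word spoken, speaker's initials, label reported by the recogniser); the function name score_trials and the example trials are made up for illustration and are not output from the lab scripts.

```python
from collections import Counter


def score_trials(trials):
    """Tally recognition trials.

    Each trial is (spoken_word, speaker_initials, recognised_label),
    where recognised_label uses the <word>.<initials> format.
    Returns (number correct, number matched to the speaker's own
    template, Counter of (spoken, recognised) confusions).
    """
    correct = 0
    same_speaker = 0
    confusions = Counter()
    for spoken, speaker, label in trials:
        word, _, initials = label.rpartition(".")
        if word == spoken:
            correct += 1
        else:
            confusions[(spoken, word)] += 1
        if initials == speaker:
            same_speaker += 1
    return correct, same_speaker, confusions


if __name__ == "__main__":
    # Made-up trials, for illustration only.
    example = [
        ("cat", "mh", "cat.mh"),
        ("cat", "mh", "cap.jb"),
        ("dog", "jb", "dog.mh"),
    ]
    correct, same_speaker, confusions = score_trials(example)
    print(correct, same_speaker, confusions.most_common())
```

The most_common() listing puts the most frequent confusions first, which is a convenient starting point for asking whether they involve the minimal pairs in the vocabulary.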


3. Internals


a. Use the 'recog2' script to display the table of best-match distances for an input.  Try out some vocabulary words and compare the smallest distance for the correct answer with the smallest distance for the first incorrect answer.  Can you see any pattern in the ordering of scores?
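
To make the idea of a distance table concrete, here is a minimal Python sketch of how such a table could be produced. It assumes the templates are compared by dynamic time warping (DTW) over sequences of feature vectors with a Euclidean frame distance, which is a common choice for isolated-word template matching; this handout does not say what the lab scripts actually compute, so treat every name below (dtw_distance, rank_templates, margin) as hypothetical.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW distance between two sequences of feature vectors.

    Assumptions: Euclidean frame distance, unit steps (horizontal,
    vertical, diagonal), no path-length normalisation.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def rank_templates(unknown, templates):
    """Return (distance, label) pairs sorted from best to worst match."""
    return sorted((dtw_distance(unknown, frames), label)
                  for label, frames in templates.items())


def margin(ranked, spoken_word):
    """Smallest distance for the correct word and for any other word."""
    def word_of(label):
        return label.rpartition(".")[0]
    best_correct = next((d for d, lab in ranked if word_of(lab) == spoken_word), None)
    best_wrong = next((d for d, lab in ranked if word_of(lab) != spoken_word), None)
    return best_correct, best_wrong


if __name__ == "__main__":
    # Toy one-dimensional "features", for illustration only.
    templates = {"cat.mh": [[0.0], [1.0]], "dog.jb": [[5.0], [5.0]]}
    ranked = rank_templates([[0.0], [0.0], [1.0]], templates)
    print(ranked, margin(ranked, "cat"))
```

A small gap between the two values returned by margin suggests the decision was a close one, even when the top answer happened to be correct.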


4. Architecture


a. Sketch a schematic diagram of the components of the recogniser.  Look inside the scripts learn.bat and recog.bat to see how they work.


b. What problems do you think would arise in extending this system to 100 words, and to 100,000 words?
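
As a starting point for the scaling question, a rough back-of-envelope estimate in Python. It assumes the recogniser compares the input against every stored template and that each comparison fills a full frame-by-frame alignment grid; the figures of 5 templates per word and 50 frames per word are illustrative guesses, not measurements from the lab system.

```python
def frame_comparisons(vocab_size, templates_per_word=5, frames_per_word=50):
    """Estimate frame-to-frame distance computations for one recognition.

    Assumptions: exhaustive search over all templates, and a full
    frames x frames alignment grid per template comparison.
    """
    return vocab_size * templates_per_word * frames_per_word ** 2


if __name__ == "__main__":
    for size in (20, 100, 100_000):
        print(f"{size:>7} words: {frame_comparisons(size):,} frame comparisons")
```

Under these assumptions the work grows linearly with vocabulary size, so computation is only part of the answer: consider also how many recordings would be needed and how many more near-identical words a large vocabulary contains.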