Research in Speech Recognition
If you are interested in doing graduate research in speech recognition in the department, please contact Mark Huckvale.
Activities
The focus of our research in speech recognition has been how knowledge of human linguistic processing can aid the development of machine recognition of speech. This has two main threads: the application of traditional linguistic analysis within the decoder, and the analysis of machine and human recognition performance.
Recent work on linguistic analysis has looked at incorporating morphological analysis into the speech recognition decoder. Morphological analysis allows for smaller pronunciation lexicons, while at the same time increasing the dissimilarity of pronunciations. In combination with a word-level language model, we have shown that this approach can lead to improved word accuracy. See the article by Huckvale & Fang (2001).
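As a rough illustration of why a morph-based lexicon can be smaller than a word-based one, the following Python sketch counts lexicon entries before and after a toy suffix-stripping segmentation. The suffix list, the `segment` function and the word list are assumptions made for illustration only; this is not the phonologically constrained analysis described in the cited papers.

```python
# Toy illustration only: a naive suffix-stripping segmenter, NOT the
# morphological analyser used in the cited work.
SUFFIXES = ("ing", "ed", "s")  # assumed toy suffix inventory


def segment(word):
    """Split a word into [stem, +suffix] if it ends in a known suffix."""
    for suffix in SUFFIXES:
        # Require a stem of at least three letters to avoid silly splits.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return [word[: -len(suffix)], "+" + suffix]
    return [word]


def lexicon_sizes(words):
    """Return (number of distinct words, number of distinct morphs)."""
    word_lexicon = set(words)
    morph_lexicon = {m for w in word_lexicon for m in segment(w)}
    return len(word_lexicon), len(morph_lexicon)


words = ["walk", "walks", "walked", "walking",
         "talk", "talks", "talked", "talking",
         "jump", "jumps", "jumped", "jumping"]

n_words, n_morphs = lexicon_sizes(words)
print(n_words, n_morphs)  # 12 word entries versus 6 morph entries
```

In this toy vocabulary, three stems and three suffixes cover all twelve word forms, so a pronunciation lexicon over morphs needs half as many entries.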
Work on human-machine comparisons has investigated the psycholinguistic aspects of human morphological processing. Interesting parallels can be drawn between our morph recogniser and the results of priming experiments on human listeners. The most recent work here has been an investigation of how information processing accounts of perception can be informed by knowledge of pattern recognition systems engineering (see seminar).
Current work by Gordon Hunter and Mark Huckvale is looking at the statistical properties of words in dialogue turns. The hope is both to develop superior language models specifically for dialogue and also to find relationships between statistical models and conversational analysis.
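One simple statistic of this kind is how a word's frequency at the start of a dialogue turn compares with its frequency overall. The Python sketch below computes both counts; the function name and the example turns are hypothetical and are not taken from the project's data or methods.

```python
from collections import Counter


def turn_position_stats(turns):
    """Count words overall and words occurring in turn-initial position."""
    overall = Counter()
    initial = Counter()
    for turn in turns:
        tokens = turn.lower().split()
        if not tokens:
            continue  # skip empty turns
        overall.update(tokens)
        initial[tokens[0]] += 1
    return overall, initial


# Invented example turns, for illustration only.
turns = ["yes I think so", "well it depends", "yes",
         "I think it does", "well yes"]

overall, initial = turn_position_stats(turns)
total_tokens = sum(overall.values())
total_turns = sum(initial.values())

# Relative frequency of "yes" overall versus at the start of a turn.
print(overall["yes"] / total_tokens)
print(initial["yes"] / total_turns)
```

A word that is much more frequent turn-initially than overall is one case where a dialogue-specific language model might differ from a general one.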
Staff
Research work in speech recognition is led by Mark Huckvale with the assistance of postgraduate research students. Mark is an engineer by training who has worked in speech synthesis and recognition for over 15 years. He is also the author of the Speech Filing System tools, which incorporate some speech recognition functionality.
Research projects
A recent research project on recognition:
1998-2001 Enhanced Language Modelling
Mark Huckvale, Alex Chengyu Fang
Funded by EPSRC.
Publications
Recently published papers in the area (in reverse chronological order):
- Huckvale, M., Hunter, G., "Learning on the job: the application of machine learning within the speech decoder", IOA Workshop on Innovation in Speech Processing, Stratford-on-Avon, May 2001. Download PDF.
- Huckvale, M., Fang, A., "Experiments in applying morphological analysis in speech recognition and their cognitive explanation", IOA Workshop on Innovation in Speech Processing, Stratford-on-Avon, May 2001. Download PDF.
- Fang, A.C., Huckvale, M., "Enhanced Language Modelling with Phonologically Constrained Morphological Analysis", Proc. International Conference on Acoustics, Speech, and Signal Processing, Istanbul, Turkey, 5-9 June 2000. Download PDF.
- Fang, A.C., Huckvale, M., "Out-of-Vocabulary Rate Reduction through Dispersion-Based Lexicon Acquisition", Literary and Linguistic Computing, Vol. 15, No. 3, 2000, pp. 251-263.
- Huckvale, M., "Opportunities for Re-convergence of Engineering and Cognitive Science Accounts of Spoken Word Recognition", Proc. IOA Conference Speech and Hearing, Windermere, November 1998. Download PDF.
- Huckvale, M., "10 Things Engineers have Discovered about Speech Recognition", paper presented at NATO ASI Speech Pattern Processing, Jersey, 1997. Download PDF.
- Huckvale, M., "Learning from the experience of building automatic speech recognition systems", Speech Hearing and Language - Work in Progress, University College London, Dept. Phonetics and Linguistics, 1996. Download PDF.
- Huckvale, M., "Phonetic Characterisation and Lexical Access in Non-segmental Speech Recognition", Proc. Int. Congress Phonetic Science, Stockholm, Sweden, 1995. Download PDF.
- Huckvale, M., "Word Recognition from Tiered Phonological Models", Proc. Institute of Acoustics Conference on Speech and Hearing, Windermere, 1994. Download PDF.
Speech Data
The following data sets are available to research workers in speech recognition at a nominal charge:
UCL1000 - 1000 read sentences
These are 1000 sentences taken from the British National Corpus and read by 5 male and 5 female speakers of southern British English (100 sentences each). Recordings and transcriptions are supplied in standard format. Hear a sample.
If you are interested in these, please contact Mark Huckvale.
Software Tools
The following software tools are available to research workers in speech recognition free of charge:
Speech Filing System
Our general purpose speech analysis toolkit. Contains tools for isolated word recognition, hidden-Markov modelling and semi-automatic annotation. Refer to the Speech Filing System Home Page.
British National Corpus pre-processing tool
A tool for pre-processing the BNC texts into a standard format, removing punctuation and performing text normalisation.
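The kind of processing involved can be sketched in a few lines of Python. This is a minimal illustration under assumed rules (lower-casing, replacing punctuation with spaces, collapsing whitespace); it is not the department's tool, whose actual normalisation rules are not described here, and it does not expand numbers or abbreviations.

```python
import re


def normalise(text):
    """Lower-case text, strip punctuation and collapse whitespace.

    Apostrophes are kept so that forms such as "it's" survive intact.
    """
    text = text.lower()
    # Replace anything that is not a letter, digit, apostrophe or space.
    text = re.sub(r"[^a-z0-9'\s]", " ", text)
    return " ".join(text.split())


print(normalise("Hello, world!  It's 9 o'clock."))
# hello world it's 9 o'clock
```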
Facilities
The Department has very good facilities for conducting research in speech recognition, including:
- an anechoic chamber for recording very high quality speech signals
- facilities for multi-channel recording, including laryngographic analysis
- networked Sun SPARC workstations and PCs running in-house (SFS) and commercial (ESPS) signal processing software
- software for HMM and RNN recognition
- software for language modelling
- access to standard speech and language corpora