The aim of the OSCAR project is to develop and evaluate a novel acoustic and vibrotactile speech analysing hearing aid that is adaptable to the needs and sensory abilities of a wide range of the profoundly and totally hearing disabled population. This heterogeneous population is currently served by a diversity of specialised aids. The OSCAR aid anticipates a versatile product that can meet the needs of many, and so allow economies of scale in manufacture and a common approach to fitting an aid to the individual's residual sensory capacity. It is a DSP-based demonstrator designed for uni- or bi-modal stimulation of the auditory and tactile senses.
The acoustic processing approach has been defined jointly by UCL and Oticon A/S. The tactile stimulation approach has been determined by KTH and is described elsewhere (Spens et al., this meeting). The aid incorporates both conventional "whole-speech" presentation and speech selective analysis that extracts information chosen for its effectiveness in supplementing lipreading. These elements are presented in a form matched to the limited receptive capacity of residual hearing and the tactile sense. The speech-analytic and "whole-speech" processing are selected by a switch so that the user has instant access to the processing that is more appropriate to their immediate needs.
In the speech-analytic mode, the time-pattern of voiced and voiceless excitation is represented, together with amplitude envelope information. For the auditory sense, voice fundamental frequency information is also provided. Here, voiced speech is represented by a frequency and amplitude controlled sinusoid, and voiceless speech by a random noise with a spectrum and dynamic range matched to the user's comfortable hearing area. These acoustic signals can be readily controlled to have frequency and intensity ranges that match the range of residual hearing. They are designed to convey temporally encoded speech information that is psychoacoustically matched to the temporal processing capabilities of residual hearing. The choice of simple and purely temporally coded information is made because the auditory filtering required for the ear to extract this information from the spectrally complex speech signal is severely impaired or absent in profound hearing loss (Faulkner et al., 1990).
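The signal described above can be illustrated with a minimal sketch. It is not the OSCAR implementation: the frame format `(kind, f0_hz, amplitude)`, the 10 ms frame length, the 16 kHz sample rate and the `f0_range` limits are all assumptions made for illustration, and the voiceless noise here is plain white noise with no spectral shaping to a user's comfortable hearing area.

```python
import math
import random


def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def synthesize_pattern(frames, sample_rate=16000, frame_ms=10.0,
                       f0_range=(70.0, 400.0), seed=0):
    """Render (kind, f0_hz, amplitude) frames as a list of samples.

    kind is "voiced", "voiceless" or "silence" (hypothetical labels).
    Voiced frames become a sinusoid whose frequency follows f0_hz,
    limited to f0_range as a stand-in for the user's usable range.
    Voiceless frames become amplitude-scaled white noise.
    """
    rng = random.Random(seed)
    samples_per_frame = int(sample_rate * frame_ms / 1000.0)
    phase = 0.0  # carried across frames so the sinusoid stays continuous
    out = []
    for kind, f0_hz, amplitude in frames:
        for _ in range(samples_per_frame):
            if kind == "voiced":
                f0 = clamp(f0_hz, *f0_range)
                phase += 2.0 * math.pi * f0 / sample_rate
                out.append(amplitude * math.sin(phase))
            elif kind == "voiceless":
                out.append(amplitude * rng.uniform(-1.0, 1.0))
            else:
                out.append(0.0)
    return out


# Example: 50 ms of voicing at 120 Hz followed by 30 ms of frication.
frames = [("voiced", 120.0, 0.5)] * 5 + [("voiceless", 0.0, 0.2)] * 3
signal = synthesize_pattern(frames)
```

Because each frame carries only a category, a frequency and an amplitude, the output contains purely temporal cues, and its frequency and level ranges can be set independently of the spectrum of the original speech.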
Recent analyses (Faulkner and Rosen, 1996) of the speech information transmission of these selected speech elements show that the time pattern of voiced and voiceless excitation is the principal supplementary information for audio-visual consonant identification. Variations in both voice fundamental frequency and speech amplitude contribute the prosodic information that is required for the effective perception of connected speech, but were found not to contribute significantly to the identification of English consonants. These same information elements have also proved effective in laboratory trials with profoundly hearing-impaired subjects (Faulkner et al., 1993).
For the tactile sense, voiced and voiceless excitation and the associated amplitude envelopes are represented by two vibrators. The voiceless pattern is presented by a newly developed tangential vibrator that stimulates the non-Pacinian skin receptors (Huss, this meeting).
The speech-analytic processing is a development of that used in the SiVo-II hearing aid, which was evaluated in the preceding STRIDE project (Faulkner, 1995). A trained artificial neural network (ANN; Wei et al., 1993) extracts the temporal voicing pattern and the voice fundamental frequency from noisy and reverberant speech. Voiceless excitation is detected by a spectral-balance comparator.
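The idea of a spectral-balance comparator can be sketched as follows. The source does not specify the filters or thresholds used in the aid, so everything here is an assumption: a first difference stands in for the high-frequency band, a two-point sum for the low-frequency band, and the function names and the -6 dB and energy-floor values are hypothetical.

```python
import math
import random


def spectral_balance_db(frame):
    """Ratio (dB) of high-band to low-band energy in one frame.

    The first difference x[n] - x[n-1] emphasises high frequencies,
    the two-point sum x[n] + x[n-1] emphasises low frequencies.
    """
    high = sum((frame[i] - frame[i - 1]) ** 2 for i in range(1, len(frame)))
    low = sum((frame[i] + frame[i - 1]) ** 2 for i in range(1, len(frame)))
    eps = 1e-12  # avoids log(0) and division by zero on silent frames
    return 10.0 * math.log10((high + eps) / (low + eps))


def is_voiceless(frame, balance_threshold_db=-6.0, energy_floor=1e-6):
    """Flag a frame as voiceless excitation (hypothetical thresholds)."""
    energy = sum(x * x for x in frame) / len(frame)
    if energy < energy_floor:
        return False  # too quiet to classify; treat as silence
    return spectral_balance_db(frame) > balance_threshold_db


# Example frames at an assumed 16 kHz sample rate (10 ms each).
rng = random.Random(1)
noise_frame = [rng.gauss(0.0, 0.1) for _ in range(160)]
tone_frame = [0.5 * math.sin(2.0 * math.pi * 120.0 * n / 16000.0)
              for n in range(160)]
```

With these example frames, the broadband noise frame is flagged as voiceless, while the 120 Hz tone, whose energy is almost entirely in the low band, is not.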
The ANN is a substantial modification of that used in the preceding STRIDE project. It is trained on quiet and noisy speech to produce an output that corresponds as closely as possible to the cycle-by-cycle period of larynx vibration as indicated by a laryngograph signal. Comparisons by Bosman and Smoorenburg (1996) of the earlier STRIDE ANN with the SHS algorithm (Hermes, 1988) and a reference LPC fundamental frequency method (Entropic, 1991) showed that the STRIDE ANN was remarkably effective in the voiced/voiceless classification of speech in moderate levels of noise. However, its precision in fundamental frequency estimation was poorer than that of these other methods. The temporal analysis window of the STRIDE ANN is short (10.5 ms) compared to the 40 ms or longer windows of the other methods. Bosman and Smoorenburg also showed that the precision of the STRIDE ANN could be substantially improved if its output were integrated over a 40 ms time window. This finding led us to develop and implement a post-processing buffer that significantly improves the ANN performance by comparing each voice fundamental period estimate with the immediately preceding estimates, and adjusting the algorithm's thresholds to favour estimates that are similar to recent ones.
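One way such a post-processing buffer could work is sketched below. This is an illustration under stated assumptions, not the OSCAR algorithm: it assumes each period estimate arrives with a confidence score between 0 and 1, and the class name `PeriodTracker`, the history length, the base threshold, the bonus and the tolerance are all hypothetical.

```python
from collections import deque
from statistics import median


class PeriodTracker:
    """Accept or reject period estimates using recent history."""

    def __init__(self, history=4, base_threshold=0.5, bonus=0.2,
                 tolerance=0.15):
        self.recent = deque(maxlen=history)  # last accepted periods (ms)
        self.base_threshold = base_threshold
        self.bonus = bonus
        self.tolerance = tolerance  # allowed relative deviation

    def reset(self):
        """Forget the history, e.g. after a voiceless or silent stretch."""
        self.recent.clear()

    def update(self, period_ms, confidence):
        """Return period_ms if accepted, otherwise None."""
        threshold = self.base_threshold
        if self.recent:
            reference = median(self.recent)
            if abs(period_ms - reference) <= self.tolerance * reference:
                threshold -= self.bonus  # similar to recent: easier to accept
            else:
                threshold += self.bonus  # dissimilar: demand more evidence
        if confidence >= threshold:
            self.recent.append(period_ms)
            return period_ms
        return None


tracker = PeriodTracker()
accepted = [tracker.update(p, c) for p, c in
            [(8.0, 0.6), (8.2, 0.35), (4.1, 0.6), (4.1, 0.9)]]
```

In this example the weak estimate of 8.2 ms is accepted because it agrees with the preceding 8.0 ms, the moderately confident 4.1 ms estimate (a likely octave error) is rejected, and the same 4.1 ms estimate is accepted once its confidence is high enough to indicate a genuine change.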
Figure 1. Percentage correct scores in audio-visual consonant identification from 11 profoundly hearing-impaired subjects. The box and whisker plot shows the median, interquartile range and range excluding outliers. The white boxes represent scores with a conventional hearing aid, and black boxes scores using the SiVo-II.
User trials of the acoustic aid are now beginning at Instituut voor Doven (Sint-Michielsgestel), Fondation Rothschild (Paris) and UCL. User trials of the tactile aid are in progress at KTH. A bimodal version of the aid, capable of simultaneous acoustic and tactile stimulation, will be used in trials later this year. The acoustic aid is also being used in field trials in Beijing with profoundly hearing-impaired speakers of Mandarin Chinese, a tone language, for whom a fundamental frequency extracting aid is expected to have especial value.
The noise suppression provided by the ANN voice fundamental frequency extraction algorithm has already been validated in speech perceptual tests with profoundly hearing-impaired subjects. In audio-visual consonant identification, the SiVo aid was shown, in a group of 11 listeners, to provide useful lipreading support at speech-to-noise ratios down to +5 dB. At this signal-to-noise ratio, the same listeners received virtually no lipreading support from conventional amplifying aids. These results are shown in Figure 1. Comparable results have also been found for sentence materials.
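For readers unfamiliar with the measure, a speech-to-noise ratio of +5 dB means that the speech level is about 1.78 times the noise level (10^(5/20)). The sketch below shows one common way to mix two signals at a target ratio. It assumes an RMS-based definition of level and uses a synthetic tone and Gaussian noise as placeholders; the source does not state how the test materials were mixed or what noise was used, and the function names are hypothetical.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that rms(speech) / rms(scaled noise) matches snr_db.

    Returns the mixture and the gain applied to the noise.
    """
    gain = rms(speech) / (rms(noise) * 10.0 ** (snr_db / 20.0))
    mixture = [s + gain * n for s, n in zip(speech, noise)]
    return mixture, gain


# Placeholder signals: 100 ms at an assumed 16 kHz sample rate.
rng = random.Random(0)
speech = [0.5 * math.sin(2.0 * math.pi * 150.0 * n / 16000.0)
          for n in range(1600)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]

mixture, gain = mix_at_snr(speech, noise, snr_db=5.0)
achieved_db = 20.0 * math.log10(rms(speech) / (gain * rms(noise)))
```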
The project also involves the development and use of a PC-based system for audio-visual speech perceptual assessment known as ASTEC (Speech Assessment Test Editor And Controller: Pavlovic et al., 1995). The MS-Windows-based ASTEC software is designed to allow the implementation of a wide range of speech perceptual tests and includes a database for the storage and analysis of clinical data. It is being used to carry out tests of audio-visual consonant identification and auditory-alone stress placement that are designed to be comparable across the main European languages. ASTEC has been developed by Brousseau at CNRS Departement de Parole et Langage at Aix-en-Provence.
Other elements of the project include:
The OSCAR project is supported by the TIDE sector of CEC DG XIII.
Bosman, A. J. and Smoorenburg, G. F. (1996) "Evaluation of three pitch tracking algorithms at several signal-to-noise ratios", Acta Acustica, in press.
Deeks, J. and Faulkner, A. (1994) "Residual spectral and temporal processing in signalling timbre contrasts in synthetic vowels" Speech, Hearing and Language, Work in progress, Department of Phonetics and Linguistics, University College London, 8, 141-162.
Drullman, R. and Smoorenburg, G. F. (1996) "Multichannel amplitude compression for the profoundly hearing impaired". Proc. ISAC-96.
Entropic (1991) Entropic Signal Processing System (ESPS) version 4.1 Manual for user-level programs and technical memoranda. Entropic Speech Inc., Washington D.C.
Faulkner, A. (1995) "Final Report of TIDE Project 133/206", Ref. STRIDE/1995/2, Department of Phonetics and Linguistics, University College London.
Faulkner, A., Rosen, S., and Moore, B. C. J. (1990) "Residual frequency selectivity in the profoundly hearing-impaired listener," Br. J. Audiol. 24, 381-392.
Faulkner, A., Walliker, J. R., Howard, I. H., Ball, V., and Fourcin, A. J. (1993) "New developments in speech pattern element hearing aids for the profoundly deaf" Scand. Audiol. Suppl., 38, 124-135.
Hermes, D. J. (1988) "Measurement of pitch by subharmonic summation" J. Acoust. Soc. Am., 91, 2136-2155.
Huss, C. (1996) "Radial and tangential tactile stimulation: some masking aspects". Proc. ISAC-96.
Pavlovic, C., Brousseau, M., Howells, D., Miller, D., Hazan, V., Faulkner, A. and Fourcin, A. (1995) "Analytic assessment and training in speech and hearing using a poly-lingual workstation, EURAUD" In: I. Placencia Porrero and R. Puig de la Bellacasa (Eds) The European Context for Assistive Technology, IOS Press, Amsterdam, pp 332-335.
Vickers, D. A. and Faulkner, A. (1995) "Noise spectrum discrimination by severe-to-profoundly hearing-impaired listeners". In: Psychoacoustics, Speech and Hearing Aids, ed. B. Kollmeier, World Scientific, Singapore, in press.
Wei, J., Howells, D., Fourcin, A. J., and Faulkner, A. (1993) "Larynx period and frication detection methods in speech pattern hearing aids" Speech, Hearing and Language, Work in progress, Department of Phonetics and Linguistics, University College London, 7, 269-276.