Abstracts
UCL
Database Definition
Duncan MARKHAM and Valerie HAZAN
Abstract
This paper describes
the design and recording procedures used in the development
of the UCL Speaker Database. This database contains
high-quality recordings of 45 speakers of South-Eastern
British English: 18 women, 15 men, 6 boys and 6 girls.
The range of materials recorded includes: VCV nonsense
words, Manchester Junior Word lists, semantically unpredictable
sentences, two read texts and semi-spontaneous speech
(description of cartoon and subsequent retelling of
story). A new word-level test, for use with children
aged seven and above - the UCL Markham word test - was
also designed and recorded; its development is described
in some detail. The bulk of the materials collected
for all 45 speakers is being made available to other
researchers as a set of two DVDs.
Vowel
normalization for accent: an investigation of best exemplar
locations in northern and southern British English sentences
Bronwen G. EVANS
and Paul IVERSON
Abstract
Two experiments investigated
whether listeners change their vowel categorization
decisions to adjust to different accents of British
English. Listeners from different regions of England
gave goodness ratings on synthesized vowels embedded
in natural carrier sentences that were spoken with either
a northern or southern English accent. A computer minimization
algorithm adjusted F1, F2, F3, and duration until the
best exemplar of each vowel was found. The results demonstrated
that some listeners normalize their vowel categorization
decisions based on the accent of the carrier sentence,
and that the patterns of normalization are affected
by individual differences in language background (i.e.,
the degree of experience that an individual has had
living in multidialectal environments, and whether the
individuals grew up in the north or south of England).
The patterns of normalization corresponded with the
changes in production that speakers typically make due
to sociolinguistic factors when living in multidialectal
environments (e.g., when an individual moves from the
north to the south of England). However, the results
could not be readily explained by existing exemplar
or category assimilation models.
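As an illustration of the search procedure the abstract describes, the sketch below maximizes a listener goodness rating over F1, F2, F3 and duration by simple coordinate descent. The goodness function, target values and step sizes are stand-ins invented for this example, not the study's algorithm or data.

```python
# Hypothetical sketch of a best-exemplar search: a stand-in goodness
# function (a quadratic peak around an assumed target) is maximized over
# F1, F2, F3 and duration by coordinate descent with a shrinking step.

def goodness(params, target=(500.0, 1500.0, 2500.0, 200.0)):
    """Stand-in goodness rating: higher when params are closer to the
    (assumed) best-exemplar target (F1, F2, F3 in Hz, duration in ms)."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def find_best_exemplar(start, step=50.0, tol=1e-3):
    """Adjust each parameter in turn until no step improves goodness,
    then shrink the step, as a minimization algorithm would."""
    params = list(start)
    while step > tol:
        improved = False
        for i in range(len(params)):
            for delta in (+step, -step):
                trial = params[:]
                trial[i] += delta
                if goodness(trial) > goodness(params):
                    params = trial
                    improved = True
        if not improved:
            step /= 2.0
    return params

best = find_best_exemplar([400.0, 1200.0, 2200.0, 150.0])
```

Starting from an off-target synthesis, the search converges on the assumed target, mimicking how the study's minimization located each listener's best exemplar.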
Studies
in the statistical modelling of dialogue turn pairs
in the British National Corpus
Gordon HUNTER
and Mark HUCKVALE
Abstract
This article describes
some preliminary investigations into the statistical
properties of the transcribed dialogues that were collected
for the British National Corpus of English. Our aim
has been to look for evidence of linguistic structure
which could be used to build better statistical language
models for spontaneous human-human dialogues. We have
concentrated on pairs of successive, relatively short
dialogue turns. We find significant differences in the
lexical distributions for dialogues compared to written
text, as expected. Further experiments using cache,
trigger and cluster-based models applied to pairs of
turns found that interpolating such models with a standard
trigram model resulted in improvements in perplexity
compared with the perplexity scores obtained using a
trigram model alone.
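As a minimal illustration of the interpolation and perplexity measure mentioned in the abstract, the sketch below linearly interpolates a trigram model with a cache model and compares perplexities. The per-word probabilities and the interpolation weight are toy values invented for demonstration, not the corpus results.

```python
import math

# Linear interpolation of two language models and the resulting
# perplexity over a short word sequence (toy numbers only).

def perplexity(probs):
    """Perplexity of a sequence given per-word probabilities."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

# Probabilities a trigram model might assign to the words of one turn,
# and the probabilities a cache model (recent-word frequencies) assigns.
trigram_probs = [0.10, 0.02, 0.05, 0.01]
cache_probs   = [0.05, 0.20, 0.10, 0.15]  # cache boosts recently seen words

lam = 0.7  # interpolation weight on the trigram model (assumed value)
mixed_probs = [lam * t + (1 - lam) * c
               for t, c in zip(trigram_probs, cache_probs)]

ppl_trigram = perplexity(trigram_probs)
ppl_mixed = perplexity(mixed_probs)
```

With these toy numbers the interpolated model assigns higher probability to the words the cache has seen recently, so its perplexity is lower than the trigram model's alone, the direction of effect the abstract reports.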
Evaluation
of a multilingual synthetic talking face as a communication
aid for the hearing impaired
Catherine SICILIANO,
Geoff WILLIAMS, Jonas BESKOW and Andrew FAULKNER
Abstract
The goal of the Synface
project is to develop a multilingual synthetic talking
face to aid the hearing impaired in telephone conversation.
This report describes multilingual perceptual studies
to characterise the potential gain in intelligibility
derived from a synthetic talking head controlled by
hand-annotated speech. Speech materials were simple
Swedish, English and Dutch sentences typical of those
used in speech audiometry. Speech was degraded to simulate
in normal-hearing listeners the information losses that
arise in severe-to-profound hearing impairment. Degradation
was produced by vocoder-like processing using either
two or three frequency bands, each excited by noise.
Twelve native speakers of each of the three languages took
part in intelligibility tests in which each of the two
degraded auditory signals was presented alone, with
the synthetic face, and with a natural video of the
face of the original talker.
Intelligibility in the purely
auditory conditions was low (7% for the 2-band vocoder
and 30% for the 3-band vocoder). The average intelligibility
increase for the synthetic face compared to no face
was 20%, and was statistically highly reliable. The
synthetic face fell short of the advantage of a natural
face by an average of 18%. We conclude that the synthetic
face in its current form is sufficient to provide important
visual speech information.
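The vocoder-like degradation described above can be sketched as follows: split the speech spectrum into a few bands, extract each band's amplitude envelope, and use it to modulate noise limited to the same band. The band edges, envelope smoothing and FFT-based filtering here are illustrative choices, not the study's processing parameters.

```python
import numpy as np

# Noise-excited vocoder sketch: each frequency band's envelope
# modulates band-limited noise; the bands are summed.

def noise_vocoder(signal, fs, band_edges):
    rng = np.random.default_rng(0)
    out = np.zeros(len(signal))
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        # Band-pass by zeroing FFT bins outside [lo, hi).
        in_band = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(np.where(in_band, spectrum, 0), n=len(signal))
        # Envelope: rectify, then smooth with a 10 ms moving average.
        k = int(fs * 0.01)
        env = np.convolve(np.abs(band), np.ones(k) / k, mode="same")
        # Excite the same band with noise, filtered identically.
        noise_spec = np.fft.rfft(rng.standard_normal(len(signal)))
        noise_band = np.fft.irfft(np.where(in_band, noise_spec, 0),
                                  n=len(signal))
        out += env * noise_band
    return out

fs = 16000
t = np.arange(fs) / fs
# A crude amplitude-modulated tone standing in for a speech signal.
speech_like = np.sin(2 * np.pi * 300 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
degraded = noise_vocoder(speech_like, fs, [100, 1500, 5000])  # two bands
```

Using two versus three entries in `band_edges` gives the 2-band and 3-band conditions; fewer bands discard more spectral detail, which is why auditory-alone intelligibility fell from 30% to 7%.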
Effects
of the number of speech-bands and envelope smoothing
condition on the ability to identify intonational patterns
through a simulated cochlear implant speech processor
Matthew SMITH
and Andrew FAULKNER
Abstract
Although many investigations
have examined how dividing the speech spectrum into a
number of channels affects the intelligibility of speech,
there appear to be few studies that have examined the
effects of manipulating the number of channels on the
ability to recognise and identify intonational patterns
in speech. In addition, many studies have demonstrated
that voice fundamental frequency plays a significant
role in the perception of speech intonation patterns;
available evidence also suggests that changes in signal
processing (in terms of simplifying the input waveform)
could improve the perception of voice pitch changes
for users of hearing aids and cochlear implants. In
this study the ability to identify five intonational
modes was assessed using sentences produced by a single
male talker. The identification scores from seven normally-hearing
subjects were examined with speech presented through
1, 4, 8, and 16-band simulations of cochlear implant
signal processors (noise-band vocoders). Additionally,
three different envelope conditions were utilised with
each processor; this allowed the salience of temporal
envelope cues to fundamental frequency to be manipulated
and the consequent effect on identification performance
to be examined. Significant improvements in identification
ability were observed with both increased salience of
temporal envelope cues to fundamental frequency and
with increasing number of speech-bands (although the
improvement in scores from the 8-band condition to the
16-band condition was not statistically significant).
However, the level of identification improvement was
found to vary significantly with intonational mode,
suggesting that the overall identification scores may
have been a reflection of subjects’ ability to
disambiguate one or two intonational modes from the
rest.
A
Choice Theory method for evaluating audiovisual phoneme
recognition
Paul IVERSON
Abstract
This article describes a mathematical method, based
on Choice Theory (e.g., Luce, 1963), that can be used
to predict audiovisual phoneme confusion matrices from
unimodal audio and visual data. The predictions made
from this method can be compared to obtained levels
of audiovisual processing, for the purpose of identifying
individuals whose audiovisual integration processes
are not efficient. A reanalysis of Grant et al.'s (1998)
audiovisual consonant confusion data is presented to
evaluate this method. The results demonstrate that this
method is effective at predicting audiovisual phoneme
recognition responses, and suggest that Grant et al.'s
subjects were highly efficient at integrating audiovisual
information. Matlab code used in these analyses is available
at:
http://www.phon.ucl.ac.uk/home/paul/CT/home.htm.
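One common Choice-Theory-style prediction takes the audiovisual probability of each response as proportional to the product of the unimodal audio and visual probabilities, renormalized over responses. The exact formulation in the article may differ, and the confusion rows below are invented toy values, but the sketch shows the shape of such a prediction.

```python
# Predict one row of an audiovisual confusion matrix from the
# corresponding unimodal rows (product rule with renormalization).

def predict_av(audio_row, visual_row):
    """P_av(r|s) proportional to P_a(r|s) * P_v(r|s)."""
    products = [a * v for a, v in zip(audio_row, visual_row)]
    total = sum(products)
    return [p / total for p in products]

# Toy unimodal confusion rows for one stimulus over three responses.
audio  = [0.6, 0.3, 0.1]   # audio alone confuses responses 1 and 2
visual = [0.5, 0.1, 0.4]   # vision separates response 1 from 2

av = predict_av(audio, visual)
```

Because the two modalities disagree about different competitors, the predicted audiovisual accuracy for the correct response exceeds either unimodal accuracy; comparing such predictions with obtained audiovisual scores is how integration efficiency can be assessed.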