Speech, Hearing and Language: work in progress
Volume 11
1999
ISSN: 1470-8507

The perceptual magnet effect is not specific to speech prototypes: new evidence from music categories.
Sarah BARRETT

Abstract
Previous work on prototypicality in music has led to the claim that music prototypes act in the opposite way to speech prototypes - as anchors rather than magnets. In one such study, professional musicians were given discrimination tasks in which they had to distinguish acoustically similar sounds in the context of both a prototypical and a non-prototypical C-major chord (Acker et al., 1995). In contrast to what has been found for American English listeners for various speech-sound categories (e.g. Kuhl, 1991), professional musicians show enhanced discrimination in the region of the prototype. The present study questions whether the performance of such musically trained subjects is representative of the average listener's perception of music categories. Here, 10 non-musicians as well as 10 musicians were given a discrimination task in which they were required to distinguish prototypical and non-prototypical C-major chords from a series of acoustically similar variants. Unlike the musicians, who showed enhanced discrimination in the context of the prototype, the non-musicians showed reduced discrimination. The results have implications for the applicability of the perceptual magnet effect to domains other than speech and are interpreted in terms of a new theory called A&R theory, which suggests that prototypes have a dual role in the perceptual system depending upon the amount of attention paid to them by the listener.


Designed and built by
Martyn Holland
February 2000.

Barrett  Faulkner(a)  Faulkner(b)  House  Huckvale  Taniguchi  
Estebas
  Wells(a)  Wells(b)  Baker(a)  Rosen  Baker(b)

 

Periodicity and Pitch Information in Simulations of Cochlear Implant Speech Processing
Andrew FAULKNER, Stuart ROSEN and Clare SMITH

Abstract
Pitch, periodicity and aperiodicity are regarded as important cues for the perception of speech. However, modern CIS cochlear implant speech processors, and recent simulations of these processors, provide no explicit representation of these factors. We have constructed four-channel vocoder processors that manipulate the representation of periodicity and pitch information, and examined the effects on the perception of speech and the ability to identify pitch glide direction.

A vocoder providing highly salient pitch and periodicity information used a pulse train source during voiced speech, and a noise source in the absence of voicing. The pulse train was controlled by voice fundamental frequency. A second condition provided a salient auditory contrast to periodicity but no pitch information, through the use of a fixed rate pulse source during voicing, and a noise source at other times. Further processing conditions were independent of input speech excitation. One such condition used a constant pulse train throughout, with neither periodicity nor pitch represented. Two further conditions used a noise source throughout. In one noise condition, the amplitude envelope extracted from each band was low-pass filtered at 32 Hz, eliminating pitch and periodicity cues from the envelope. In the second noise condition, the envelope was low-pass filtered at 400 Hz; this was expected to provide a relatively weak indication of pitch and periodicity.

The vocoder using a pulse source that followed the input fundamental frequency gave substantially higher performance in identification of frequency glides than vocoders using noise carriers, which in turn showed better performance than processors using a fixed rate pulse carrier. However, performance in consonant and vowel identification and sentence recognition was remarkably similar through all of the processors. Connected discourse tracking rates were affected by the envelope filter of the noise carrier processors, although this effect was small. We conclude that whilst the processors achieved the desired control over the salience of pitch and periodicity, the speech tasks used here show little sensitivity to this manipulation.
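The effect of the two envelope filter cut-offs in the noise-carrier conditions can be illustrated with a small sketch. It is not the authors' processor: the first-order smoothing filter, the 8 kHz sampling rate, the 1 kHz test band and the 100 Hz modulation rate are all assumptions chosen for illustration. The sketch rectifies a band signal whose amplitude fluctuates at a voice-like rate and compares how much of that fluctuation survives a 32 Hz and a 400 Hz envelope filter.

```python
import math

def one_pole_lowpass(x, cutoff_hz, fs):
    # First-order smoother; an assumed stand-in for the envelope filter,
    # whose actual design is not given in the abstract.
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, state = [], 0.0
    for v in x:
        state = (1.0 - a) * v + a * state
        y.append(state)
    return y

def envelope(band, cutoff_hz, fs):
    # Full-wave rectify the band signal, then low-pass filter it.
    return one_pole_lowpass([abs(v) for v in band], cutoff_hz, fs)

def modulation_depth(env, f_mod, fs):
    # Size of the envelope component at f_mod relative to the mean level,
    # measured with a single-bin DFT.
    n = len(env)
    mean = sum(env) / n
    re = sum(e * math.cos(2.0 * math.pi * f_mod * i / fs) for i, e in enumerate(env))
    im = sum(e * math.sin(2.0 * math.pi * f_mod * i / fs) for i, e in enumerate(env))
    return 2.0 * math.hypot(re, im) / (n * mean)

FS = 8000      # sampling rate in Hz (assumed)
F0 = 100.0     # modulation rate standing in for voice pitch (assumed)

# Hypothetical band signal: a 1 kHz tone whose amplitude fluctuates at F0.
band = [(1.0 + math.cos(2.0 * math.pi * F0 * i / FS))
        * math.sin(2.0 * math.pi * 1000.0 * i / FS) for i in range(FS // 2)]

skip = FS // 10  # discard the first 100 ms so the filter start-up transient is ignored
depth_32 = modulation_depth(envelope(band, 32.0, FS)[skip:], F0, FS)
depth_400 = modulation_depth(envelope(band, 400.0, FS)[skip:], F0, FS)
print("depth with 32 Hz filter:", round(depth_32, 3))
print("depth with 400 Hz filter:", round(depth_400, 3))
```

With the 400 Hz filter the 100 Hz fluctuation passes almost unchanged, whereas the 32 Hz filter reduces it markedly. A steeper filter than the first-order one assumed here would remove it more completely.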


Effects of the number of channels and speech-to-noise ratio on rate of connected discourse tracking through a simulated cochlear implant speech-processor
Andrew FAULKNER, Stuart ROSEN and Lucy WILKINSON

Abstract
A number of recent studies have investigated simulations of cochlear implant speech processors with the aim of establishing the minimum number of channels required to support speech perception in quiet and in noise. These studies have all used citation form consonant and vowel stimuli or simple sentences. Intelligibility measures for such materials, especially sentences, can often show ceiling effects. The present study has examined this issue using connected discourse tracking, a task that can be less subject to ceiling effects and is more representative of everyday communication. Speech processing employed a real-time sine-excited vocoder having three, four, eight or 12 channels. Amplitude envelopes extracted from each band modulated sinusoidal carrier signals placed at each band centre frequency. Speech-spectrum shaped random noise was added to speech prior to the vocoder processing to give three signal-to-noise ratios of +7, +12, and +17 dB. Noise levels were adjusted in real time according to measurements of speech level. Connected discourse tracking rates through the vocoders increased significantly with number of channels up to 12 in both quiet and noise, and decreased significantly with each increase in the noise level from quiet. For natural speech, these levels of noise had little effect on tracking rate. We conclude that with connected speech, optimal performance from a cochlear implant in the quiet and in modest levels of noise is likely to require more than eight independent frequency channels.
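Adding noise at a fixed speech-to-noise ratio amounts to scaling the noise against the measured speech level. The following is a minimal sketch under stated assumptions: it uses whole-signal RMS levels, white Gaussian noise and a sine wave standing in for speech, whereas the study used speech-spectrum shaped noise adjusted in real time. The function names are hypothetical.

```python
import math
import random

def rms(x):
    # Root-mean-square level of a signal.
    return math.sqrt(sum(v * v for v in x) / len(x))

def mix_at_snr(speech, noise, snr_db):
    # Scale the noise so that 20*log10(rms(speech) / rms(scaled noise)) == snr_db,
    # then add it to the speech before any vocoder processing.
    gain = rms(speech) / (rms(noise) * 10.0 ** (snr_db / 20.0))
    mixed = [s + gain * n for s, n in zip(speech, noise)]
    return mixed, gain

rng = random.Random(0)
# Stand-in signals (assumptions): a 200 Hz sine for speech, white noise for the masker.
speech = [math.sin(2.0 * math.pi * 200.0 * i / 8000.0) for i in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

for snr_db in (7, 12, 17):  # the three ratios named in the abstract
    mixed, gain = mix_at_snr(speech, noise, snr_db)
    achieved = 20.0 * math.log10(rms(speech) / (gain * rms(noise)))
    print(snr_db, round(achieved, 6))
```

A real-time version would replace the whole-signal RMS with a running estimate of speech level, which is closer to the adjustment the abstract describes.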


Intonation Modelling in ProSynth
Jill HOUSE, Jana DANKOVICOVA and Mark HUCKVALE

Abstract
ProSynth uses a hierarchical prosodic structure (implemented in XML) as its core linguistic representation. To model intonation we map template representations of F0 contours onto this structure. The template for a particular pitch pattern is derived from analysis of a labelled speech database. For a falling nuclear pitch accent this template has three turning points: two define the F0 peak and one marks the end of the F0 fall. Statistical analysis confirmed that the alignment and shape of the template are sensitive to the properties of the structure and also provided quantitative values for F0 synthesis. Our results suggest that phonetic interpretation of the nuclear pitch accent is best related to the accented Foot rather than to the accented syllable. In determining parameter values for synthesis, we conclude that F0 information should be integrated with temporal and segmental information.


Opportunities for re-convergence of engineering and cognitive science accounts of spoken word recognition
Mark HUCKVALE

Abstract
This article traces the roots of the divergence between the engineering community and the cognitive science community accounts of word recognition. It argues that although there are cultural differences, when looked at objectively, there is considerable overlap in the desires and motivations of the two communities. It suggests that the criticisms of engineering systems that caused the original divergence in the late 1970s are much less valid today, that re-convergence is timely and will help create a theory of speech processing which will explain both primary and emergent phenomena. It proposes that the study of LVCSR systems as if they were human, and the study of humans as if they were LVCSR systems, could lead to a research agenda which would benefit both communities. It introduces elements of a programme to encourage joint research and co-operation.


Effect of interactive visual feedback on the improvement of English intonation of Japanese EFL learners
Masaki TANIGUCHI and Evelyn ABBERTON

Abstract
This paper is based on research dedicated to helping to improve the teaching and learning of English intonation (prosody) for Japanese EFL learners. It attempts to evaluate the effectiveness of the use of real time interactive visual feedback on the learners' approximation of their fundamental frequency contours to those of native speakers. It also attempts to investigate characteristic features of Japanese EFL learners' English intonation and how their Japanese accents are affecting their English intonation. This investigation enabled us to reaffirm our confidence in the effectiveness of interactive visual feedback of the voice fundamental frequency pattern in helping Japanese EFL learners improve their English intonation. We saw that there was a great difference in improvement between the group of learners who had the advantage of being exposed to interactive visual feedback for an hour every day in the two-week course and the group of learners who did not. We also found that the use of tone marks helped the learners a great deal, but an important finding was that if no tone marks were provided, it was extremely difficult for the learners to improve without any interactive visual feedback. With the use of interactive visual feedback, the learners were able to improve even in material without tone marks.


The intermediate phrase in central Catalan declaratives: a case for questioning the representation of downstep
Eva ESTEBAS I VILAPLANA & John A. MAIDMENT

Abstract
This paper examines two aspects of the intonation of S(ubject) V(erb) O(bject) Central Catalan declaratives produced in reading speech. First, it deals with the identification of an intermediate level of prosodic phrasing in Central Catalan declaratives. Second, it analyses the immediate implications of this intermediate phrase on the phonological representation of the F0 contours and, in particular, on the interpretation of downstep. Three different cues are used for the identification of the prosodic boundaries: a pause, a local F0 fall, and the lengthening of the boundary syllable. In all sentences, an intermediate level of prosodic structure, marked with a H- phrase accent, is observed between the subject and the verb. The tonal representation of the sentences is determined through both an auditory analysis and an acoustic analysis of the data. A pitch reset is observed at the beginning of the second intermediate phrase, which starts with a drastic lowering on the peak of the first pitch accent. Evidence for treating this lowering as an intended downstep movement is provided with the analysis of speaking rate differences.


Overcoming phonetic interference
John C. WELLS

Abstract
The phenomenon of phonetic interference in foreign language learning is addressed by considering first the phonetics of loan-words from Japanese to English and from English to Japanese and then the specific pronunciation difficulties it causes Japanese learners of English. There are well-known problems with [...] and with the phonemic contrasts [...]. Other difficulties are context-dependent: e.g. both /s/ and /t/ before high front vowels. Many involve phonotactics: consonant clusters, final consonants, the Japanese mora vs. the English syllable. Compound stress is also discussed. In dealing with all of these problems, ear-training may be as important for the learner as articulation practice.



Pronunciation preferences in British English: a new survey
John C. WELLS

Abstract
A second poll of BrE pronunciation preferences was carried out in late 1998. It was based on a self-selected sample of nearly 2000 'speech-conscious' respondents, who answered a hundred questions about words of uncertain or controversial pronunciation. The findings allow us to answer questions about lexical incidence and sound changes in progress.


Auditory filter nonlinearity in mild/moderate hearing impairment
Richard J. BAKER and Stuart ROSEN

Abstract
Sensorineural hearing loss has frequently been shown to result in a loss of frequency selectivity. Less attention has been paid to the level dependency of selectivity that is so prominent a feature of normal hearing. The aim of the present study is to characterise such changes in nonlinearity as manifested in the auditory filter shapes of listeners with mild/moderate hearing impairment. Notched-noise masked thresholds were measured over a range of stimulus levels at 2 kHz in hearing-impaired listeners with losses of 20-50 dB. Growth-of-masking functions for different notch widths are more parallel for hearing-impaired than for normal-hearing listeners, indicating a more linear filter. Level-dependent filter shapes estimated from the data show relatively little change in shape across level. The loss of nonlinearity is also evident in the input/output functions derived from the fitted filter shapes. Reductions in nonlinearity are clearly evident even in a listener with only 20 dB hearing loss.


The relationship between speech and nonspeech auditory processing in children with dyslexia
Stuart ROSEN & Eva MANGANARI

Abstract
Although there is good evidence that some dyslexic children show at least small deficits in speech perceptual tasks, the extent to which this results from a general auditory problem, as opposed to a specifically linguistic/phonological one, is not yet clear. Here we have investigated the extent to which performance in backward and forward masking can explain identification and discrimination ability for speech sounds in which the crucial acoustic contrast (the second formant transition) is followed ("ba" vs. "da") or preceded ("ab" vs. "ad") by a vowel. More specifically, we expect children with elevated thresholds in backward masking to be relatively more impaired for tasks involving "ba" and "da" than for tasks involving "ab" and "ad". In order to determine whether poor performance with speech sounds reflects a general deficit for perceiving formant transitions, we also constructed nonspeech analogues of the speech syllables - the contrastive second formant presented in isolation.

Two groups of 8 children matched for age (mean of 13 years) and nonverbal intelligence were selected to be well separated in terms of their performance in reading and spelling. All underwent the same set of auditory tasks: 1) forward, backward and simultaneous masking with a short (20 ms) 1-kHz probe tone in a broadband and notched noise; 2) identification as "b" or "d" of synthetic "ba"-"da" and "ab"-"ad" continua; 3) same/different discrimination of pairs of stimuli drawn from the endpoints of the two speech continua (e.g., "ba-da", "da-ba", "da-da", "ba-ba"), as well as their nonspeech analogues.

There were no differences between dyslexic and control children in forward and simultaneous masking, but thresholds for backward masking in a broadband noise were elevated for the dyslexics as a group. Overall speech identification and discrimination performance was superior for the controls (barely so for identification), but did not differ otherwise for the two speech contrasts (one of which should be influenced by backward masking, and one by forward). Thus, although dyslexics show a clear group deficit in backward masking, this has no simple relationship to the perception of crucial acoustic features in speech. Furthermore, the deficit for the nonspeech analogues was much less marked than for the speech sounds, with [...] of the dyslexic listeners performing equivalently to controls. Either there is a linguistic/phonological component to the speech perception deficit, or there is an important effect of acoustic complexity.


Minimising boredom by maximising likelihood - an efficient estimation of masked thresholds.
Richard J. BAKER and Stuart ROSEN

Abstract
One of the main problems in carrying out psychoacoustic experiments is the time required to measure a single threshold. In this study we compare the accuracy of threshold estimation in a 2I2AFC task for detecting a 2 kHz tone in either a broadband noise or a notched noise. Tone thresholds were estimated in three normal-hearing listeners using either a Levitt procedure to track 79% correct, or a maximum-likelihood estimation (MLE) procedure to track 70, 80 or 90% correct. Given the chosen parameters for the different procedures, the MLE procedure proved to be approximately 2.5 times faster at estimating masked thresholds than the Levitt procedure. Only thresholds using the 70% MLE procedure were significantly different in magnitude from those obtained using the Levitt procedure. To test the repeatability of the measurements, the standard deviations (SDs) of the thresholds were calculated. Statistical analyses show the smallest SDs for the Levitt and 90% MLE procedures, with significantly larger SDs for the 70% and 80% MLE procedures.
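The two kinds of tracking procedure can be sketched briefly. A transformed up-down rule with n correct responses before a step down and one error before a step up converges where p^n = 0.5, so the 79% figure is consistent with a three-down one-up rule (an inference; the abstract does not name the rule). The maximum-likelihood part below is a generic illustration rather than the authors' procedure: the logistic psychometric function, its slope, the candidate grid and the function names are all assumptions.

```python
import math

def levitt_target(n_down):
    # Proportion correct tracked by an n-down 1-up rule: solves p**n_down == 0.5.
    return 0.5 ** (1.0 / n_down)

def p_correct(level, midpoint, slope=1.0, guess=0.5):
    # Assumed logistic psychometric function for a two-alternative task:
    # rises from the 50% guessing rate towards 100% as level increases.
    return guess + (1.0 - guess) / (1.0 + math.exp(-slope * (level - midpoint)))

def ml_midpoint(trials, candidates, slope=1.0):
    # Pick the candidate midpoint that maximises the log-likelihood of the
    # observed (level, correct) trials.
    def loglik(mid):
        total = 0.0
        for level, correct in trials:
            p = min(max(p_correct(level, mid, slope), 1e-9), 1.0 - 1e-9)
            total += math.log(p if correct else 1.0 - p)
        return total
    return max(candidates, key=loglik)

def level_for_target(midpoint, target, slope=1.0, guess=0.5):
    # Level at which the fitted function predicts the target proportion correct.
    return midpoint + math.log((target - guess) / (1.0 - target)) / slope

print(round(levitt_target(3), 3))  # about 0.794, i.e. the 79% point

# Synthetic trials (assumption): 20 presentations at each level from -6 to +6 dB,
# with the number correct set to the rounded expectation for a true midpoint of 0 dB.
trials = []
for level in range(-6, 7):
    n_right = round(20 * p_correct(level, 0.0))
    trials += [(level, True)] * n_right + [(level, False)] * (20 - n_right)

candidates = [i * 0.5 for i in range(-10, 11)]
estimate = ml_midpoint(trials, candidates)
for target in (0.7, 0.8, 0.9):
    print(target, round(level_for_target(estimate, target), 2))
```

In an adaptive run, the estimate would be updated after every trial and the next stimulus placed at the level predicted for the chosen target, which is why a higher target places trials further up the psychometric function.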

