A research project funded by the Economic and Social Research Council (ESRC)
Grant Period: January - December 2004
Grant Award: £46,400 [RES-000-22-0445]
Investigators:
Research Fellows: Kerry Bannister
Overview
This study investigated whether the learning of new phonemic categories is better promoted by an auditory training approach that emphasises exposure to natural speech variability or by one that directs the learner's attention to key acoustic cues through speech enhancement. The focus was the learning of the contrast between the English sounds /r/ and /l/ by Japanese learners of English; different cohorts of learners were trained using one of four training approaches. The study tested the validity of the exemplar model of speech perception, which predicts that exposure to natural variability is essential for category learning. The findings are also intended to help improve the efficiency of intensive auditory training, an approach that is increasingly being used with second-language learners and language- or hearing-impaired children.
Objectives
The study asked the following questions:
1. Is the learning of new phonemic categories best promoted by exposing the learner to natural token variability or by enhancing key acoustic cues to the phonemic contrast?
2. Is exposure to natural tokens essential for category learning, thus supporting an exemplar model of speech perception?
3. Is category learning best promoted by enhancing key acoustic cues or by providing 'negative feedback' when an uninformative acoustic cue is used by the second-language learner?
Training conditions
There were four different training conditions.
- High Variability Phonetic Training. Learners were trained on normal speech.
- Perceptual Fading. At the start, acoustically enhanced stimuli were used. LPC analysis and resynthesis were used to push F3 to more extreme values; PSOLA was used to add 100 ms to the onset duration. The level of enhancement was decreased each day so that by Day 7 there was no enhancement (the same as normal stimuli) and Day 10 had less extreme F3 values than normal.
- All Enhancement. The acoustic enhancement was the same as at the start of the Perceptual Fading training, but the training stimuli remained fully enhanced every day.
- Secondary Cue Variability. At the start of training, the duration of the closure, the duration of the transition, and F2 frequency were set to neutral values (i.e., the stimuli were equated in terms of these secondary cues). These manipulations were accomplished using PSOLA and LPC analysis and resynthesis. The random variability of these cues was increased each day, such that by Day 10 subjects could hear short or long closures, short or long transitions, and high or low F2 values, randomly mixed and matched for both /r/ and /l/.
Figure 1: Spectrograms of /r/-/l/ stimuli used in the Perceptual Fading training condition.
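The day-by-day schedules described for the Perceptual Fading and Secondary Cue Variability conditions can be sketched as follows. This is only an illustration: the text gives the anchor points (full enhancement at the start, none by Day 7, below normal by Day 10; neutral secondary cues at the start, full random variability by Day 10), and the linear change between those points, along with the function names `fading_level` and `secondary_cue_spread`, are assumptions rather than the study's actual settings.

```python
def fading_level(day, first_day=1, zero_day=7):
    """Enhancement level for one training day in Perceptual Fading.

    1.0 = fully enhanced, 0.0 = natural stimuli, negative = F3 values
    less extreme than normal. The linear ramp is an assumption.
    """
    return (zero_day - day) / (zero_day - first_day)


def secondary_cue_spread(day, first_day=1, last_day=10):
    """Proportion of the full secondary-cue variability used on a given day.

    0.0 = cues set to neutral values, 1.0 = full random variability.
    The linear ramp is an assumption.
    """
    return (day - first_day) / (last_day - first_day)


if __name__ == "__main__":
    for day in range(1, 11):
        print(day, round(fading_level(day), 2), round(secondary_cue_spread(day), 2))
```

Under these assumptions the All Enhancement condition corresponds to holding the enhancement level at 1.0 on every day.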
Subjects
The subjects were 61 native speakers of Japanese who were attending English-language courses (41 at Kochi University in Japan, 20 in London). They were screened so that their identification of /r/ and /l/ was less than 90% correct before training. Subjects were randomly assigned to training methods (i.e., each subject was trained using only one of the 4 methods).
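A minimal sketch of the screening and assignment step, under stated assumptions: the 90% cutoff and the four condition names come from the text, while the balanced shuffle-and-deal assignment, the fixed seed, and the names `eligible` and `assign_conditions` are hypothetical and are not taken from the study.

```python
import random

CONDITIONS = [
    "High Variability Phonetic Training",
    "Perceptual Fading",
    "All Enhancement",
    "Secondary Cue Variability",
]


def eligible(pretest_proportion_correct, cutoff=0.90):
    """A learner qualifies if pre-training /r/-/l/ identification is below the cutoff."""
    return pretest_proportion_correct < cutoff


def assign_conditions(subject_ids, conditions=CONDITIONS, seed=0):
    """Randomly assign each subject to exactly one condition.

    Subjects are shuffled and dealt out in turn so that group sizes differ
    by at most one; this balancing is an assumption.
    """
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)
    return {sid: conditions[i % len(conditions)] for i, sid in enumerate(ids)}


if __name__ == "__main__":
    assignment = assign_conditions(range(1, 62))  # 61 placeholder subject IDs
    for name in CONDITIONS:
        print(name, sum(1 for c in assignment.values() if c == name))
```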
Stimuli and speech recordings
The phonemic distinction under investigation was the English /r/-/l/ contrast, which is difficult for Japanese learners of English to perceive because both English phonemes are assimilated to the same native phoneme category (an alveolar flap). Native English listeners primarily perceive this distinction on the basis of changes in the third formant (F3) transition at vowel onset. The phonemes /r/ and /l/ also vary in terms of a number of secondary cues (second formant transition at vowel onset, closure duration, transition duration), although these changes are not as consistent as the changes in F3 transition.
High-quality digital recordings of 12 native speakers of English were made in an anechoic chamber. The recordings included a set of 512 words: minimal pairs of words in which /r/ and /l/ appeared in either initial or medial position, and a small set of ‘filler words’ which did not include /r/ and /l/. A portion of these materials was used in the pre/post training tests, which evaluated the learners’ perception of /l/-/r/ in word-initial (singleton and cluster) and word-medial positions. The rest of the words were used for training. These included 50 minimal pairs of real words in which /r/ and /l/ appeared in initial position only.
Training Procedure
The same basic procedures were used for each training method. The training task was a /r/-/l/ word identification task, with feedback provided after each response. Stimuli were 100 words (from 50 minimal pairs) in which /r/ or /l/ appeared in initial position. There were 10 different talkers, with one talker per day. Ten training sessions were carried out over a 2-3 week period, with 300 trials per session (3 repetitions of each stimulus). After each training session, subjects were given a short tracking test to evaluate how well they could identify natural stimuli from the trained talker.
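The session structure above (100 words, 3 repetitions each, one talker per day for 10 days) can be sketched as a trial list. The word and talker labels below are placeholders, and the random trial order, the per-day seed, and the name `build_session` are assumptions made for illustration; the text does not say how trials were ordered.

```python
import random


def build_session(words, talker, repetitions=3, seed=None):
    """Return a shuffled list of (talker, word) trials for one training session."""
    rng = random.Random(seed)
    trials = [(talker, word) for word in words for _ in range(repetitions)]
    rng.shuffle(trials)  # random presentation order is an assumption
    return trials


# Placeholder labels standing in for the 100 training words and 10 talkers.
words = [f"word_{i:03d}" for i in range(100)]
talkers = [f"talker_{d:02d}" for d in range(1, 11)]

# One session per day, each using a single talker.
schedule = {
    day: build_session(words, talker, seed=day)
    for day, talker in enumerate(talkers, start=1)
}

if __name__ == "__main__":
    for day, trials in schedule.items():
        print(day, trials[0][0], len(trials))
```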
Results
- Across all procedures and stimuli, the identification of /r/-/l/ in initial position improved significantly, by an average of 18 percentage points [F(1,58) = 102.0, p < 0.001] (see Figure 2).
Figure 2: Pre-post training performance for /r/-/l/ in initial position with trained talkers and words
- Training generalized to new talkers and words, as there was no significant interaction of stimuli and improvement (p > 0.05) (see Figure 3).
Figure 3: Pre-post training performance for /r/-/l/ in initial position with new talkers and words
- There was no significant effect of training procedure (p > 0.05), but there was a trend for less improvement in identification in the ‘All Enhancement’ condition.
- The rate of learning did not differ significantly across training conditions (see Figure 4).
Figure 4: Scores achieved on daily tracking test (with normal stimuli) by learners from the four training groups
- Learners began with a strong bias towards identifying stimuli with long closures as /r/ and stimuli with short closures as /l/. Following training, there was a significant reduction in /r/ bias for long-closure stimuli, but there was no effect of training procedure (see Figure 5).
Figure 5: Identification of stimuli with long closure pre- and post-training.
Conclusions
These results enabled us to address our research questions as follows:
- Is the learning of new phonemic categories best promoted by exposing the learner to natural stimuli or to stimuli with signal-processed acoustic cues?
Our results show that it is not necessary to expose listeners to natural stimuli in order to train a new phonetic contrast; there were no significant differences between our natural and signal-processed training conditions. On the positive side, this supports the basic approach of using signal-processed speech in the future, because such stimuli can clearly be effective for training. However, our signal-processed conditions did not actually improve upon natural speech; at the present time training with natural speech appears to be the easiest way of improving /r/-/l/ identification performance.
- Is it more effective to alter perceptual cue weightings by enhancement of the primary acoustic cues or by enhancement of the secondary cue variability?
Exaggerating secondary cue variability in order to draw the learner’s attention away from these unreliable cues was not more effective than using stimuli in which primary cues were enhanced, or indeed than using natural speech stimuli. In general, all training methods were successful to some extent in directing learners’ attention away from uninformative acoustic cues.
Publications
Iverson, P., Hazan, V. and Bannister, K. (2005) Phonetic training with acoustic cue manipulations: A comparison of methods for teaching English /r/-/l/ to Japanese adults. Journal of the Acoustical Society of America, 118, 3267-3278.
Iverson, P., Bannister, K. and Hazan, V. (2004) Auditory training with phonetic variability and acoustic enhancement: A comparison of /r/-/l/ training techniques for Japanese adults. Journal of the Acoustical Society of America, 116, 2573.
Resources
High-quality recordings of a large number of /r/-/l/ minimal pairs (with /r/-/l/ in initial and medial position) were made for 12 speakers to provide testing and training materials for this project. If you would like to have access to these materials for research purposes, please contact Valerie Hazan.
Our collaborators
- Prof. Masaki Taniguchi, Professor of English Phonetics and Speech at Kochi University in Japan, ‘hosted’ our main /l/-/r/ training study over a five-week period.
Related Issues
If you would like to know more about the project, please contact Paul Iverson or Valerie Hazan.