In the training studies, learners were recorded producing a series of key words containing the sounds /l/ and /r/ pre- and post-training.
Our next aim was therefore to judge whether purely perceptual training would
lead to significant improvements in their production of words containing the
key sounds.
Fourteen native British listeners evaluated the pre- and post-training utterances of a subset of Japanese trainees, each of whom had received 10 sessions of perceptual training in one of three conditions. Altogether there were 20 pre/post tokens per trainee, yielding a total of 500 tokens for 25 trainees. The trainees had completed the training study in the following conditions: audiovisual training (10 students), auditory training (10 students) and training with an animated face (5 students).
Two
independent perceptual evaluation tests were carried out: a minimal-pair
identification task and a quality rating task. For each test, 10 listeners
evaluated the productions of each Japanese subject. They were asked to focus on
the /l/-/r/ realization in each word rather than on whether the word as a whole
was pronounced correctly.
Results (see Figure 3) showed that correct identification of the consonants
produced by the L2 learners increased significantly post-training, and the
difference between the pre- and post-training scores was significantly greater
for learners who had been trained audiovisually with a natural face than for
those trained auditorily (or with an artificial face, although this might be
due to a ceiling effect in the production of /r/ for that group). Quality
ratings of the consonants by native listeners also improved significantly
post-training, with a greater improvement in ratings for those trained
audiovisually with a natural face than for those trained audiovisually with an
artificial face.
Overall, therefore, audiovisual
training in the perception of difficult sounds appears to have the greatest
impact on the quality and intelligibility of trainees’ production of the
sounds, even if the effect is relatively small in absolute terms.
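
The pre/post comparison described above can be illustrated with a minimal sketch. It assumes each listener response is stored as an (intended consonant, identified consonant) pair and that a trainee's score is the percentage of responses matching the intended consonant. The function names, the record layout and the toy responses are all hypothetical and invented for illustration; they are not the study's data or its statistical analysis.

```python
from collections import defaultdict


def percent_correct(responses):
    """Percentage of (intended, identified) pairs in which the listener
    identified the consonant the trainee intended to produce."""
    if not responses:
        raise ValueError("no responses to score")
    hits = sum(1 for intended, identified in responses if intended == identified)
    return 100.0 * hits / len(responses)


def mean_gain_by_condition(records):
    """Mean (post - pre) identification score per training condition.

    Each record is a dict with hypothetical keys 'condition', 'pre' and
    'post'; 'pre' and 'post' hold lists of (intended, identified) pairs.
    """
    gains = defaultdict(list)
    for rec in records:
        gain = percent_correct(rec["post"]) - percent_correct(rec["pre"])
        gains[rec["condition"]].append(gain)
    return {cond: sum(g) / len(g) for cond, g in gains.items()}


# Invented toy responses, NOT taken from the study.
toy_records = [
    {
        "condition": "audiovisual",
        "pre": [("l", "l"), ("r", "l"), ("l", "r"), ("r", "r")],
        "post": [("l", "l"), ("r", "r"), ("l", "l"), ("r", "r")],
    },
    {
        "condition": "auditory",
        "pre": [("l", "l"), ("r", "l"), ("l", "l"), ("r", "r")],
        "post": [("l", "l"), ("r", "r"), ("l", "l"), ("r", "l")],
    },
]

print(mean_gain_by_condition(toy_records))
```

A comparison of this kind only gives descriptive gain scores; whether a difference between conditions is significant would still need an appropriate statistical test.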

Figure 3: Percentage of correct /l/-/r/ identification by native English speakers when listening to productions of words containing these sounds by Japanese learners of English before and after training.