In the study, pairs of children and adolescents carried out the diapix task (Van Engen et al., 2010; Baker & Hazan, 2011), a 'spot-the-difference' game in which they had to describe their pictures to each other to work out what the differences were. These conversations were audio-recorded. During the recording, the two participants sat in different rooms and communicated via headsets fitted with a condenser cardioid microphone (Beyerdynamic DT297). The speech of each participant was recorded on a separate channel at a sampling rate of 44100 Hz (16 bit) using an EMU 0404 USB audio interface and Adobe Audition. All children did the task three times. First they heard each other normally ('no barrier', NB condition). After this, in order to elicit clear speech, the voice of one of the speakers ('Speaker A') was either distorted in real time (via a 3-channel noise-excited vocoder, VOC condition) or played against adult multi-talker babble (BAB condition). It was expected that in the VOC and BAB conditions Speaker A would have to enhance their speech for the benefit of Speaker B. In fact, much greater evidence of enhancement was seen in the VOC condition than in the BAB condition. Speaker A was instructed to do most of the talking, and Speaker B was mainly there to ask questions and make suggestions. A different picture scene was shown for each recording.
Every pair started with a recording in the NB condition. The order of the next four recordings was counterbalanced across pairs: some pairs did the two VOC recordings first and others did the two BAB recordings first. Everyone ended with the second NB recording. Participants switched roles between the two recordings in each condition, so that every child was Speaker A once in each of the NB, VOC and BAB conditions.
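The recording order described above can be written out as a short Python sketch. This is only an illustration: the function name, the `Recording` type and the labels `child_1` and `child_2` are hypothetical, and the sketch assumes that the same child is Speaker A in the first recording of every condition, which the description does not specify.

```python
from typing import Dict, List, NamedTuple


class Recording(NamedTuple):
    position: int   # 1-based position within the session
    condition: str  # "NB", "VOC" or "BAB"
    speaker_a: str  # the child who does most of the talking


def session_order(child_1: str, child_2: str, voc_first: bool) -> List[Recording]:
    """Return the six recordings for one pair, in order.

    Assumption (not stated in the description): child_1 is Speaker A in
    the first recording of each condition and child_2 in the second.
    """
    if voc_first:
        middle = ["VOC", "VOC", "BAB", "BAB"]
    else:
        middle = ["BAB", "BAB", "VOC", "VOC"]
    # Every session starts and ends with a 'no barrier' recording.
    conditions = ["NB"] + middle + ["NB"]

    seen: Dict[str, int] = {}  # how often each condition has occurred so far
    order: List[Recording] = []
    for position, condition in enumerate(conditions, start=1):
        count = seen.get(condition, 0)
        # Roles switch between the two recordings of a condition.
        speaker_a = child_1 if count == 0 else child_2
        seen[condition] = count + 1
        order.append(Recording(position, condition, speaker_a))
    return order


if __name__ == "__main__":
    for rec in session_order("child_1", "child_2", voc_first=True):
        print(rec.position, rec.condition, rec.speaker_a)
```

Calling `session_order` with `voc_first=False` gives the other counterbalancing order, with the two BAB recordings before the two VOC recordings.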
Below are example recordings made in the NB, BAB and VOC conditions by three different female pairs describing the Beach 1 Diapix picture. The NB example is taken from a conversation between participants F23 and F24 (13- to 14-year-olds), the BAB example from participants F29 and F30 (11- to 12-year-olds), and the VOC example from participants F31 and F32 (9- to 10-year-olds). In these examples you can hear both speakers clearly, but note that during the recording Speaker B heard Speaker A through noise (BAB) or through a vocoder (VOC).
Baker, R., & Hazan, V. (2011). DiapixUK: task materials for the elicitation of multiple spontaneous speech dialogs. Behavior Research Methods, 43(3), 761–770.
Van Engen, K. J., Baese-Berk, M., Baker, R. E., Choi, A., Kim, M., & Bradlow, A. R. (2010). The Wildcat Corpus of Native- and Foreign-accented English: Communicative Efficiency across Conversational Dyads with Varying Language Alignment Profiles. Language and Speech, 53(4), 510–540.