Speech perception using the UCLID CIS cochlear implant speech processor

Department of Phonetics and Linguistics

Andrew FAULKNER, John R. WALLIKER, Stuart ROSEN, Harriet LANG, and Julian DALEY.

Abstract
The UCLID CIS speech processor (Walliker and Daley, 1997) is a highly flexible experimental tool capable of running a variety of speech processing algorithms. We have investigated an implementation of the CIS method (Wilson et al., 1991) in three adult users of the Ineraid cochlear implant. The first study compared a five-channel UCLID CIS processor to the standard Ineraid four-channel compressed-analogue processor, using consonant identification and sentence perception tests. For consonants in quiet, the CIS processor gave significantly higher scores for the three subjects. For sentences in quiet, two subjects scored similarly with both processors, but the third showed substantially higher scores with the CIS processor, both for speech in quiet and audio-visual speech in noise.

A second study investigated the effects of the relative timing of CIS stimulation pulses between electrodes. We used a CIS processor with a fixed update rate per electrode of 500 Hz, and varied the timing of bi-phasic pulses presented sequentially along the electrode array so that the time interval between the offset of the pulse on one electrode and the onset at its neighbour was either 10 or 260µs. Consonant identification in quiet and noise was measured with both timing configurations. There is no indication that a short 10µs interval between stimulation of adjacent electrodes leads to poorer performance in either quiet or noise; indeed, performance is significantly higher when adjacent electrodes are stimulated closer together in time.

Finally, we have compared in one subject a CIS processor having a 500 Hz overall CIS rate and 10µs inter-channel pulse intervals with a processor that is identical apart from using a 1000 Hz CIS cycle rate. In auditory-alone consonant identification, the higher CIS rate led to higher scores both in quiet and in noise.

1. Introduction
A programme of research is currently in progress to examine issues in cochlear implant speech processing. This is based on the use of the UCLID speech processor (Walliker and Daley, 1997) which is a highly flexible experimental tool capable of running a variety of speech processing algorithms. It can drive up to eight channels simultaneously, and deliver pulses of durations down to 10µs with rise and fall times of less than 1µs.

At present the work is based on patients implanted with the INERAID 6-electrode intra-cochlear array, which allows a direct electrical connection to the speech processor.

The studies presented here had two main purposes. The first was to confirm that the UCLID processor is operating effectively through a replication of previous comparisons of the continuous-interleaved-sampling (CIS) method (Wilson et al., 1991) with the compressed-analogue (CA) processing of the standard INERAID speech processor. The results were expected to verify the electrically measured performance of the UCLID processor when running the CIS algorithm.

The second aim has been to examine the effects of the relative timing of CIS stimulation pulses between electrodes. High-rate CIS methods are now being proposed that require pulses to be delivered with intervals of 50µs or less between electrodes. While electrical interactions are not expected with non-simultaneous stimulation, neural interactions may occur. The effect of the timing of stimulation between electrodes on such interactions is unknown. However, it seems a priori likely that close temporal proximity of stimulation will increase the degree of neural-interaction, especially where a degree of current spread occurs between adjacent electrodes. This might be expected to reduce the sharpness of effective frequency selectivity with possible deleterious effects on speech perception, especially in noise.

2. Study I. Comparison of UCLID CIS and INERAID CA processors
A number of studies have reported the speech perceptual performance of users of the Ineraid cochlear implant with CIS speech processors compared to the Ineraid CA processor (e.g. Wilson et al., 1991; Boex et al., 1996). Typically, even after short periods of use of a CIS processor, most subjects show improved scores across a range of speech tests compared to scores with their much more familiar CA processor. The present study compares the UCLID CIS algorithm to the Ineraid processor in a similar way.

2.1 Speech processing
The Ineraid processor band-pass filters the acoustic input, after an initial slow-acting compression (AGC), using filters centred on 500, 1000, 2000 and 4000 Hz. The four filter outputs are delivered to the four most apical electrodes after a gain adjustment to avoid exceeding comfortable stimulation levels.

The CIS processing method extracts the amplitude envelope from each of a set of band-pass filters, and uses a compressed form of the extracted envelope to amplitude modulate a fixed-rate bi-phasic pulse train delivered to each of the electrodes. The pulses are presented non-simultaneously along the electrode array. Compression is applied after filtering. The CIS processor used here employed a square-root compression, which has the effect of halving level variations on a dB scale. After compression, the envelope signal is hard-limited at the current levels corresponding to the maximum comfortable stimulation level of each electrode.
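The effect of the compression stage can be illustrated with a minimal sketch. It assumes envelope samples on a hypothetical normalised scale and a hypothetical per-electrode ceiling (`max_comfortable`); neither the scale nor the function names come from the UCLID processor itself. The sketch shows that a square-root law maps a 40 dB input range onto a 20 dB output range, and that values above the ceiling are clipped.

```python
import math


def level_db(amplitude, reference=1.0):
    """Level of an amplitude in dB relative to `reference`."""
    return 20.0 * math.log10(amplitude / reference)


def compress_and_limit(envelope, max_comfortable):
    """Square-root compress each envelope sample, then hard-limit it.

    `envelope` holds non-negative values on a hypothetical normalised scale;
    `max_comfortable` is a hypothetical per-electrode ceiling on that scale.
    """
    return [min(math.sqrt(sample), max_comfortable) for sample in envelope]


# Hypothetical envelope samples spanning a 40 dB input range.
envelope = [0.01, 0.1, 1.0]
compressed = compress_and_limit(envelope, max_comfortable=0.8)

input_range_db = level_db(envelope[-1]) - level_db(envelope[0])
# Range after compression alone (before limiting), to isolate the sqrt effect.
sqrt_range_db = (level_db(math.sqrt(envelope[-1]))
                 - level_db(math.sqrt(envelope[0])))

print(f"input range: {input_range_db:.1f} dB")
print(f"range after square-root compression: {sqrt_range_db:.1f} dB")
print("compressed and limited:", [round(v, 3) for v in compressed])
```

The halving follows directly from 20 log10(sqrt(x)) = 10 log10(x). How the compressed value is mapped onto stimulation current for a given electrode is not covered by this sketch.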

We used a five-band CIS processor, whose first four analysis filters were closely matched to the filters used in the Ineraid processor. The fifth filter was centred on 6 kHz. The envelope smoothing filters used a 500 Hz cut-off frequency. In two of the subjects, the rate of stimulation per electrode was 1667 Hz, and the pulses were typically 35µs per phase. For subject RC, 65µs per phase pulses were used to increase dynamic range, with a stimulation rate per electrode of 1235 Hz. The time intervals between each phase of the pulse to one electrode, and that between the offset of the negative phase to one electrode and the onset of the positive phase to the next electrode, were typically 10µs.
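As a rough consistency check on these figures, the sketch below converts each stated per-electrode rate into a cycle period and compares it with the time taken by five sequential bi-phasic pulses. It assumes the "typical" 10µs gaps apply exactly, both between phases and between electrodes; the function names are illustrative, and the text does not state how any remaining time in the cycle was distributed.

```python
def cycle_period_us(rate_hz):
    """Duration of one CIS cycle in microseconds for a per-electrode rate."""
    return 1_000_000.0 / rate_hz


def occupied_time_us(n_electrodes, phase_us, interphase_gap_us,
                     inter_electrode_gap_us):
    """Time from the first pulse onset to the last pulse offset in one cycle."""
    pulse_us = 2 * phase_us + interphase_gap_us
    return n_electrodes * pulse_us + (n_electrodes - 1) * inter_electrode_gap_us


# (per-electrode rate in Hz, phase duration in µs) as stated in the text.
configs = {
    "35 us/phase at 1667 Hz": (1667, 35),
    "65 us/phase at 1235 Hz": (1235, 65),
}

busy = {}
spare = {}
for label, (rate_hz, phase_us) in configs.items():
    period = cycle_period_us(rate_hz)
    # Assumed: exactly 10 µs between phases and between electrodes.
    busy[label] = occupied_time_us(5, phase_us, 10, 10)
    spare[label] = period - busy[label]
    print(f"{label}: period {period:.1f} us, "
          f"occupied {busy[label]} us, spare {spare[label]:.1f} us")
```

Under these assumptions the five-pulse sequence fits within the cycle period in both configurations, with some time left over in each cycle.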

2.2 Subjects
Subjects were three adult users of the Ineraid cochlear implant. In each case only five of the six Ineraid electrodes were usable as the most basal electrode gave rise to non-auditory sensations. The subjects have made use of the UCLID CIS processor only while attending for testing, whereas they have each been using the Ineraid processor daily for several years. Brief information about each subject is given in Table 1.

Table 1: Subject information

Subject  Aetiology                   Age at profound loss  Years of profound deafness to date  Implanted
GA       Progressive from childhood  ~30                   ~30                                 1990: Single channel; 1993: Ineraid
RC       Meningitis                  61                    5                                   1994
AD       Meningitis                  10                    45 (did not use hearing aids)       1991

2.3 Speech Assessments
An 18-item intervocalic consonant test and the BKB sentence materials have been used in speech assessment. Both tests were used in audio-visual and sound only forms and were run in quiet and with speech-spectrum shaped noise added. Signal-to-noise ratios are based on the peak levels of speech and noise.
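A peak-based signal-to-noise ratio can be set by scaling the noise before mixing. The sketch below is only an illustration: it assumes "peak level" means the largest absolute sample value, which may differ from the measurement actually used for these materials, and the sample values and function names are hypothetical.

```python
import math


def peak_level(signal):
    """Largest absolute sample value (assumed definition of peak level)."""
    return max(abs(sample) for sample in signal)


def peak_snr_db(speech, noise):
    """Signal-to-noise ratio in dB based on peak levels."""
    return 20.0 * math.log10(peak_level(speech) / peak_level(noise))


def noise_gain_for_peak_snr(speech, noise, target_snr_db):
    """Linear gain to apply to `noise` so that the peak SNR equals the target."""
    return 10.0 ** ((peak_snr_db(speech, noise) - target_snr_db) / 20.0)


# Hypothetical sample values, for illustration only.
speech = [0.0, 0.5, -0.8, 0.3]
noise = [0.2, -0.4, 0.1, 0.4]

gain = noise_gain_for_peak_snr(speech, noise, target_snr_db=10.0)
scaled_noise = [gain * sample for sample in noise]
mixture = [s + n for s, n in zip(speech, scaled_noise)]

achieved_snr_db = peak_snr_db(speech, scaled_noise)
print(f"noise gain: {gain:.3f}, achieved peak SNR: {achieved_snr_db:.1f} dB")
```

A peak-based ratio of this kind is not directly comparable with ratios based on RMS levels, since speech and noise generally differ in how far their peaks exceed their average level.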

Statistical tests are based on a general linear model analysis of variance (using SPSS) in which the subject factor is treated as a fixed rather than a random effect. This is considered to be reasonable since the subject group is small and individual subjects differ markedly in their hearing abilities using a cochlear implant, as might be expected from their differing histories of deafness. Hence, our subjects cannot reasonably be regarded as representative of a larger sample of implant users.

2.4 Results

2.4.1 Consonant identification
The results from both audio-alone and audio-visual presentation are displayed in Figure 1 as box plots. These represent the median score as the bar within each box. The number of test lists contributing to the data is shown on the X axis. Where there are more than two scores per condition, the box represents the interquartile range, and the whiskers represent the extreme values. Outlying points are shown as individual symbols.
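The quantities drawn in such a box plot can be sketched as follows. The scores are hypothetical, the quartile convention (Python's "inclusive" method) is an assumption that may differ from the one used by the plotting software, and the sketch does not attempt to classify outlying points.

```python
import statistics


def box_summary(scores):
    """Median and extremes, plus quartiles when there are more than two scores."""
    ordered = sorted(scores)
    summary = {
        "median": statistics.median(ordered),
        "min": ordered[0],
        "max": ordered[-1],
    }
    if len(ordered) > 2:
        # Assumed quartile convention; other conventions give slightly
        # different box edges for small samples.
        q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
        summary["q1"] = q1
        summary["q3"] = q3
    return summary


# Hypothetical % correct scores from five test lists in one condition.
five_lists = box_summary([60, 65, 70, 75, 80])
# With only two lists there is no box, just the median and the two scores.
two_lists = box_summary([50, 60])

print(five_lists)
print(two_lists)
```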

Without lipreading, subject AD showed fairly low scores of around 30% correct. Subject GA showed slightly higher scores, while RC scored between 60 and 80% correct. Audio-visually, both GA and RC scored close to ceiling levels.

Figure 1. Consonant identification performance for three subjects using the Ineraid and CIS processors. The left panel shows auditory-alone results, while the right panel shows audio-visual results.

Table 2. GLM ANOVA for % correct consonant identification in quiet.

Source                          df  F       p
Audio/Audio-Visual              1   183.33  0.001  **
Subject                         2   61.493  0.001  **
Processor                       1   5.225   0.035  *
Audio/Audio-Visual * Subject    2   9.106   0.002  **
Audio/Audio-Visual * Processor  1   0.459   0.507  -
Subject * Processor             2   0.211   0.812  -
Error                           17  -       -      -

An ANOVA (Table 2) showed a significant overall effect of processor. On average, scores were around 5% higher with the CIS processor. There were also significant effects of subject, and of audio vs. audio-visual presentation. A significant interaction between subject and audio/audio-visual presentation was also found, reflecting the range of auditory performance in the three subjects.

2.4.2 Sentence perception
The sentence results are shown in Figure 2 and Figure 3 as box plots.

Figure 2. BKB sentence scores for speech in quiet with the Ineraid and UCLID CIS processors. Left panel -- sound alone; right panel -- audio-visual.

Figure 3. Auditory-alone sentence perception for speech at 10 dB signal-to-noise ratio for subject RC only.

Table 3. GLM ANOVA of BKB sentence identification (Key Word Tight scoring)

Source                          df  F      p
Audio/Audio-Visual              1   112.9  .001  **
Subject                         2   4.712  .026  *
Processor                       1   7.028  .018  *
Audio/Audio-Visual * Subject    2   12.49  .001  **
Audio/Audio-Visual * Processor  1   0.558  ns    -
Subject * Processor             2   8.383  .004  **
Error                           25  -      -     -

An ANOVA of the sentence data in quiet (see Table 3) showed a significant effect of processor on performance. There was a strong main effect of audio/audio-visual presentation mode, and an effect of subject. There was also an interaction between processor and subject, and between presentation mode and subject.

2.5 Conclusions
The CIS processor gave significantly higher scores overall for both consonant and sentence materials. For sentences, there was rather more variation in the effect of the processor across subjects.

3. Study II
We have investigated the effects of the temporal proximity of pulse stimulation across electrodes using a CIS processor with a fixed update rate per electrode of 500 Hz and a 200 Hz envelope smoothing filter. The timing of bi-phasic pulses presented sequentially along the electrode array was varied so that the time interval between the offset of the negative pulse phase on one electrode and the onset of the positive pulse phase to its neighbour was either 10 or 260µs. For the 10µs inter-pulse interval (IPI), there was a long interval between stimulation of electrode 5 and electrode 1.

3.1 CIS pulse timing for study II.
Each panel of Figure 4 shows a single cycle of the CIS sequence. Successive biphasic pulses were delivered in sequence to electrodes 1 to 5 (apex to base). Current level is not represented in the figures. Time intervals between the end of a negative pulse and the start of the next positive phase were either 10µs or 260µs. The overall cycle rate was fixed at 500 Hz. Pulse widths and current levels were constant within each subject across both pulse interval conditions.

Figure 4. Inter-electrode pulse timing conditions for Study II. Each panel shows a single 2000µs CIS cycle of pulses delivered across the electrode array. The upper panels show the closely-spaced (10µs) pulses as used with subject RC (left) and GA (right), while the lower panels show the widely-spaced (260µs) pulses for the same two subjects.
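The two timing conditions can be made concrete with a minimal sketch that lays out one 2000µs cycle. It assumes 65µs phases and a 10µs gap between phases, values taken from the description of subject RC's processor in Study I; whether exactly these values were used in Study II is an assumption, and the function names are illustrative.

```python
def cis_cycle_schedule(n_electrodes, phase_us, interphase_gap_us,
                       inter_electrode_gap_us):
    """Return (electrode, onset_us, offset_us) for each pulse in one cycle.

    Electrodes are stimulated in order 1..n. `inter_electrode_gap_us` is the
    time from the end of one bi-phasic pulse to the start of the next.
    """
    schedule = []
    t = 0
    for electrode in range(1, n_electrodes + 1):
        onset = t
        offset = onset + 2 * phase_us + interphase_gap_us
        schedule.append((electrode, onset, offset))
        t = offset + inter_electrode_gap_us
    return schedule


def wraparound_gap_us(schedule, cycle_us):
    """Silent time between the last pulse and the start of the next cycle."""
    return cycle_us - schedule[-1][2]


CYCLE_US = 2000  # one cycle at a 500 Hz cycle rate

# Assumed pulse shape: 65 µs per phase with a 10 µs gap between phases.
close = cis_cycle_schedule(5, 65, 10, inter_electrode_gap_us=10)
wide = cis_cycle_schedule(5, 65, 10, inter_electrode_gap_us=260)

for name, schedule in (("10 us", close), ("260 us", wide)):
    print(name, schedule, "wrap-around gap:",
          wraparound_gap_us(schedule, CYCLE_US), "us")
```

Under these assumptions the closely-spaced condition packs all five pulses into the first 740µs of the cycle, leaving a long silent interval before electrode 1 is stimulated again, whereas the widely-spaced condition distributes the pulses evenly across the cycle.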

3.2 Results of Study II.
Consonant identification scores are shown in Figure 5. Counter to the initial expectations, a longer time interval between pulses delivered to adjacent electrodes did not increase speech performance. Rather, the reverse effect was generally found. An ANOVA (see Table 4) showed significant main effects of inter-pulse interval, audio/audio-visual presentation, signal-to-noise ratio and subject. There were no significant interactions, although the subject x inter-pulse interval x signal-to-noise ratio interaction was close to significance.

Figure 5. Consonant identification in quiet and noise as a function of between-electrode pulse interval. Panels on the left show auditory-only performance for each of the three subjects, while panels to the right show audio-visual performance for subjects AD and GA. White boxes (left box of each pair) show scores with 10µs pulse intervals, and grey boxes scores with a 260µs interval.

Table 4. GLM ANOVA of consonant identification in study II.

Source                              df  F        Sig
Subject                             2   72.567   .001  **
Inter-pulse interval (IPI)          1   9.943    .002  **
Audio/Audio-Visual                  1   123.935  .001  **
Signal-to-noise ratio (SNR)         3   24.518   .001  **
Subject * IPI                       2   .172     .842  -
Subject * Audio/Audio-Visual        1   .038     .846  -
Subject * SNR                       3   .873     .459  -
IPI * Audio/Audio-Visual            1   .089     .766  -
IPI * SNR                           3   1.388    .254  -
Subject * IPI * Audio/Audio-Visual  1   .935     .337  -
Subject * IPI * SNR                 3   2.704    .052  ~*
Error                               69  -        -     -

The results show that the timing of pulses between electrodes has a significant effect on speech performance. Presumably the effect of pulses to adjacent electrodes being closer in time is to increase the neural interaction between electro-cochlear stimulation channels. This would seem likely to impair spectral resolution, but further investigation is needed to understand these effects.

Higher rate CIS inevitably reduces the scope for allowing longer time intervals between pulses to different electrodes, and this study appears to demonstrate that this does not have deleterious effects on speech perception.

4. Study III
The third study examines the effect of CIS cycle rate with both pulse duration and inter-pulse interval (between electrodes) fixed. Having established an effect of the time interval between pulses to adjacent electrodes, we wanted to see how much this effect contributed to the overall effect of increasing CIS rate.

4.1 Method
CIS cycle rates of 500 and 1000 Hz were compared using the same pulse widths and pulse-spacings as in the "closely-spaced" pulses condition of study II. The low-pass cut-off frequency of the envelope extraction filter was fixed at 250 Hz. Stimulation is, therefore, identical except in respect of the CIS cycle rate.

Only one of our Ineraid subjects (RC) has so far taken part, and testing has been with sound-alone consonant identification only. Preliminary results are shown below, based on two VCV lists with each CIS cycle rate at three signal-to-noise ratios.

4.2 Results
The results are shown in Figure 6. An ANOVA (Table 5) showed that both CIS cycle rate and signal-to-noise ratio have significant effects on consonant identification. The interaction of rate and signal-to-noise ratio was not significant.

Figure 6. Sound alone consonant identification by subject RC as a function of speech-to-noise ratio and CIS cycle rate.

Table 5. GLM ANOVA of results of study III.

Source          df  F       Sig
CIS RATE        1   6.900   .016  *
SNR             3   13.906  .000  **
CIS RATE * SNR  2   .558    .580  -
Error           21  -       -     -

4.3 Conclusions of study III
At least in this one subject, an increase of the CIS cycle rate from 500 to 1000 Hz appears to lead to improved speech performance, especially in noise. Other studies of CIS rate have confounded the low-pass cut-off of the envelope extraction filter with CIS rate. Here, however, only the CIS rate has been varied. We do not yet have sufficient data to compare the extent of the effect of CIS rate here to the effect of inter-pulse interval.

5. Concluding comments
We have been able to establish from these studies that the UCLID speech processor is successfully able to run a variety of CIS processing algorithms with flexible and finely timed control of stimulation parameters.

The finding that shorter time intervals between CIS pulses to adjacent electrodes lead to improved speech performance is a somewhat unexpected and important result. Further research is required to understand the mechanisms of electrical stimulation of the auditory nerve and the effects of pulse parameters on the representation of spectral and temporal speech information.

Acknowledgements
Supported by the Clothworker's Foundation and the Hearing Research Trust. We are extremely grateful to the three Ineraid users who have given their time to make this work possible, and to Addenbrooke's Hospital, Cambridge for arranging access to the Ineraid implant users.

References
Boex, C., Pelizzone, M. and Montandon, P. (1996) "Speech recognition with a CIS strategy for the Ineraid multichannel cochlear implant." Am. J. Otol., 17, pp 61-68.

Walliker, J. R. and Daley, J. (1997) BSA Short Papers Meeting on Experimental Studies of Hearing and Deafness.

Wilson, B., Finley, C., Lawson, D., Wolford, R., Eddington, D. and Rabinowitz, W. (1991) Nature, 352, pp 236-238.
