Abstracts
UCL
Database Definition
Duncan MARKHAM and Valerie HAZAN
Abstract
This paper describes
the design and recording procedures used in the development
of the UCL Speaker Database. This database contains
high-quality recordings of 45 speakers of South-Eastern
British English: 18 women, 15 men, 6 boys and 6 girls.
The range of materials recorded includes: VCV nonsense
words, Manchester Junior Word lists, semantically unpredictable
sentences, two read texts and semi-spontaneous speech
(description of cartoon and subsequent retelling of
story). A new word-level test, for use with children
aged seven and above - the UCL Markham word test - was
also designed and recorded; its development is described
in some detail. The bulk of the materials collected
for all 45 speakers is being made available to other
researchers as a set of two DVDs.
Vowel
normalization for accent: an investigation of best exemplar
locations in northern and southern British English sentences
Bronwen G. EVANS
and Paul IVERSON
Abstract
Two experiments investigated
whether listeners change their vowel categorization
decisions to adjust to different accents of British
English. Listeners from different regions of England
gave goodness ratings on synthesized vowels embedded
in natural carrier sentences that were spoken with either
a northern or southern English accent. A computer minimization
algorithm adjusted F1, F2, F3, and duration until the
best exemplar of each vowel was found. The results demonstrated
that some listeners normalize their vowel categorization
decisions based on the accent of the carrier sentence,
and that the patterns of normalization are affected
by individual differences in language background (i.e.,
the degree of experience that an individual has had
living in multidialectal environments, and whether the
individuals grew up in the north or south of England).
The patterns of normalization corresponded with the
changes in production that speakers typically make due
to sociolinguistic factors when living in multidialectal
environments (e.g., when an individual moves from the
north to the south of England). However, the results
could not be readily explained by existing exemplar
or category assimilation models.
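As an illustration of the search procedure the abstract describes, the sketch below maximizes a listener goodness rating over F1, F2, F3 and duration by simple coordinate descent. The goodness function, target values and step sizes are stand-ins invented for this example, not the study's algorithm or data.

```python
# Hypothetical sketch of a best-exemplar search: a stand-in goodness
# function (a quadratic peak around an assumed target) is maximized over
# F1, F2, F3 and duration by coordinate descent with a shrinking step.

def goodness(params, target=(500.0, 1500.0, 2500.0, 200.0)):
    """Stand-in goodness rating: higher when params are closer to the
    (assumed) best-exemplar target (F1, F2, F3 in Hz, duration in ms)."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def find_best_exemplar(start, step=50.0, tol=1e-3):
    """Adjust each parameter in turn until no step improves goodness,
    then shrink the step, as a minimization algorithm would."""
    params = list(start)
    while step > tol:
        improved = False
        for i in range(len(params)):
            for delta in (+step, -step):
                trial = params[:]
                trial[i] += delta
                if goodness(trial) > goodness(params):
                    params = trial
                    improved = True
        if not improved:
            step /= 2.0
    return params

best = find_best_exemplar([400.0, 1200.0, 2200.0, 150.0])
```

Starting from an off-target synthesis, the search converges on the assumed target, mimicking how the study's minimization located each listener's best exemplar.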
Studies
in the statistical modelling of dialogue turn pairs
in the British National Corpus
Gordon HUNTER
and Mark HUCKVALE
Abstract
This article describes
some preliminary investigations into the statistical
properties of the transcribed dialogues that were collected
for the British National Corpus of English. Our aim
has been to look for evidence of linguistic structure
which could be used to build better statistical language
models for spontaneous human-human dialogues. We have
concentrated on pairs of successive, relatively short
dialogue turns. We find significant differences in the
lexical distributions for dialogues compared to written
text, as expected. Further experiments using cache,
trigger and cluster-based models applied to pairs of
turns found that interpolating such models with a standard
trigram model resulted in improvements in perplexity
compared with the perplexity scores obtained using a
trigram model alone.
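As a minimal illustration of the interpolation and perplexity measure mentioned in the abstract, the sketch below linearly interpolates a trigram model with a cache model and compares perplexities. The per-word probabilities and the interpolation weight are toy values invented for demonstration, not the corpus results.

```python
import math

# Linear interpolation of two language models and the resulting
# perplexity over a short word sequence (toy numbers only).

def perplexity(probs):
    """Perplexity of a sequence given per-word probabilities."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

# Probabilities a trigram model might assign to the words of one turn,
# and the probabilities a cache model (recent-word frequencies) assigns.
trigram_probs = [0.10, 0.02, 0.05, 0.01]
cache_probs   = [0.05, 0.20, 0.10, 0.15]  # cache boosts recently seen words

lam = 0.7  # interpolation weight on the trigram model (assumed value)
mixed_probs = [lam * t + (1 - lam) * c
               for t, c in zip(trigram_probs, cache_probs)]

ppl_trigram = perplexity(trigram_probs)
ppl_mixed = perplexity(mixed_probs)
```

With these toy numbers the interpolated model assigns higher probability to the words the cache has seen recently, so its perplexity is lower than the trigram model's alone, the direction of effect the abstract reports.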
Evaluation
of a multilingual synthetic talking face as a communication
aid for the hearing impaired
Catherine SICILIANO,
Geoff WILLIAMS, Jonas BESKOW and Andrew FAULKNER
Abstract
The goal of the Synface
project is to develop a multilingual synthetic talking
face to aid the hearing impaired in telephone conversation.
This report describes multilingual perceptual studies
to characterise the potential gain in intelligibility
derived from a synthetic talking head controlled by
hand-annotated speech. Speech materials were simple
Swedish, English and Dutch sentences typical of those
used in speech audiometry. Speech was degraded to simulate
in normal-hearing listeners the information losses that
arise in severe-to-profound hearing impairment. Degradation
was produced by vocoder-like processing using either
two or three frequency bands, each excited by noise.
Twelve native speakers of each of the three languages took
part in intelligibility tests in which each of the two
degraded auditory signals was presented alone, with
the synthetic face, and with a natural video of the
face of the original talker.
Intelligibility in the purely
auditory conditions was low (7% for the 2-band vocoder
and 30% for the 3-band vocoder). The average intelligibility
increase for the synthetic face compared to no face
was 20%, and was statistically highly reliable. The
synthetic face fell short of the advantage of a natural
face by an average of 18%. We conclude that the synthetic
face in its current form is sufficient to provide important
visual speech information.
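The vocoder-like degradation described above can be sketched as follows: split the speech spectrum into a few bands, extract each band's amplitude envelope, and use it to modulate noise limited to the same band. The band edges, envelope smoothing and FFT-based filtering here are illustrative choices, not the study's processing parameters.

```python
import numpy as np

# Noise-excited vocoder sketch: each frequency band's envelope
# modulates band-limited noise; the bands are summed.

def noise_vocoder(signal, fs, band_edges):
    rng = np.random.default_rng(0)
    out = np.zeros(len(signal))
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        # Band-pass by zeroing FFT bins outside [lo, hi).
        in_band = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(np.where(in_band, spectrum, 0), n=len(signal))
        # Envelope: rectify, then smooth with a 10 ms moving average.
        k = int(fs * 0.01)
        env = np.convolve(np.abs(band), np.ones(k) / k, mode="same")
        # Excite the same band with noise, filtered identically.
        noise_spec = np.fft.rfft(rng.standard_normal(len(signal)))
        noise_band = np.fft.irfft(np.where(in_band, noise_spec, 0),
                                  n=len(signal))
        out += env * noise_band
    return out

fs = 16000
t = np.arange(fs) / fs
# A crude amplitude-modulated tone standing in for a speech signal.
speech_like = np.sin(2 * np.pi * 300 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
degraded = noise_vocoder(speech_like, fs, [100, 1500, 5000])  # two bands
```

Using two versus three entries in `band_edges` gives the 2-band and 3-band conditions; fewer bands discard more spectral detail, which is why auditory-alone intelligibility fell from 30% to 7%.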
Effects
of the number of speech-bands and envelope smoothing
condition on the ability to identify intonational patterns
through a simulated cochlear implant speech processor
Matthew SMITH
and Andrew FAULKNER
Abstract
Although many investigations
have examined how dividing the speech spectrum into a
number of channels affects the intelligibility of speech,
there appear to be few studies that have examined the
effects of manipulating the number of channels on the
ability to recognise and identify intonational patterns
in speech. In addition, many studies have demonstrated
that voice fundamental frequency plays a significant
role in the perception of speech intonation patterns;
available evidence also suggests that changes in signal
processing (in terms of simplifying the input waveform)
could improve the perception of voice pitch changes
for users of hearing aids and cochlear implants. In
this study the ability to identify five intonational
modes was assessed using sentences produced by a single
male talker. The identification scores from seven normally-hearing
subjects were examined with speech presented through
1, 4, 8, and 16-band simulations of cochlear implant
signal processors (noise-band vocoders). Additionally,
three different envelope conditions were utilised with
each processor; this allowed the salience of temporal
envelope cues to fundamental frequency to be manipulated
and the consequent effect on identification performance
to be examined. Significant improvements in identification
ability were observed with both increased salience of
temporal envelope cues to fundamental frequency and
with increasing number of speech-bands (although the
improvement in scores from the 8-band condition to the
16-band condition was not statistically significant).
However, the level of identification improvement was
found to vary significantly with intonational mode,
suggesting that the overall identification scores may
have been a reflection of subjects’ ability to
disambiguate one or two intonational modes from the
rest.
A
Choice Theory method for evaluating audiovisual phoneme
recognition
Paul IVERSON
Abstract
This article describes a mathematical method, based
on Choice Theory (e.g., Luce, 1963), that can be used
to predict audiovisual phoneme confusion matrices from
unimodal audio and visual data. The predictions made
from this method can be compared to obtained levels
of audiovisual processing, for the purpose of identifying
individuals whose audiovisual integration processes
are not efficient. A reanalysis of Grant et al.'s (1998)
audiovisual consonant confusion data is presented to
evaluate this method. The results demonstrate that this
method is effective at predicting audiovisual phoneme
recognition responses, and suggest that Grant et al.'s
subjects were highly efficient at integrating audiovisual
information. Matlab code used in these analyses is available
at:
http://www.phon.ucl.ac.uk/home/paul/CT/home.htm.
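One common Choice-Theory-style prediction takes the audiovisual probability of each response as proportional to the product of the unimodal audio and visual probabilities, renormalized over responses. The exact formulation in the article may differ, and the confusion rows below are invented toy values, but the sketch shows the shape of such a prediction.

```python
# Predict one row of an audiovisual confusion matrix from the
# corresponding unimodal rows (product rule with renormalization).

def predict_av(audio_row, visual_row):
    """P_av(r|s) proportional to P_a(r|s) * P_v(r|s)."""
    products = [a * v for a, v in zip(audio_row, visual_row)]
    total = sum(products)
    return [p / total for p in products]

# Toy unimodal confusion rows for one stimulus over three responses.
audio  = [0.6, 0.3, 0.1]   # audio alone confuses responses 1 and 2
visual = [0.5, 0.1, 0.4]   # vision separates response 1 from 2

av = predict_av(audio, visual)
```

Because the two modalities disagree about different competitors, the predicted audiovisual accuracy for the correct response exceeds either unimodal accuracy; comparing such predictions with obtained audiovisual scores is how integration efficiency can be assessed.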