Justin
IEE Meeting on State-of-the-Art in Speech Synthesis
Please find an abstract below. The paper will actually be jointly
authored by the ProSynth project team, who all contributed to the
work. Thus the names of the final authors on the paper may change.
Yours
Mark Huckvale
=============================================================================
Title:
Assessment of Naturalness in the ProSynth Speech Synthesis Project
Authors: (for ProSynth speech synthesis project)
Sebastian Heid, Sarah Hawkins
Linguistics
University of Cambridge
Jill House, Mark Huckvale
Phonetics & Linguistics
University College London
Abstract:
Scientific progress in any area requires quantification of the mismatch
between the predictions of theory and the measurements of nature. For many
years the objective criterion of intelligibility could be used to evaluate
speech synthesis systems, but with the advent of concatenative and corpus
synthesis techniques, intelligibility under good listening conditions has
reached ceiling levels. However, the perceived naturalness of such systems
still falls far short of the human model, and their intelligibility under
adverse listening conditions deteriorates rapidly by comparison with
natural speech. Other objective measures are therefore required to evaluate
scientific hypotheses about the reasons for the discrepancy.
The most common measure of naturalness used for commercial systems is a mean
opinion score, calculated from a panel of listeners applying a rating scale.
Such measures are expensive to undertake and unreliable unless the panel
size is large. Within the ProSynth synthesis project we have instead been
pursuing carefully designed perceptual experiments which attempt to tap the
degree of difficulty in cognitive processing experienced by listeners when
performing tasks informed by synthetic speech. Within such experiments it
is possible to make statistical comparisons between speech generated
according to models of differing complexity, and thus to assess the
perceptual advantage of an 'improved' over a 'default' model. In our
ProSynth work, we predict such an advantage when systematic phonetic
variation is correctly modelled.
In this paper we will review three perceptual experiments undertaken in the
areas of timing, intonation and long-range coarticulation. The experiments
are based on the hypothesis that 'unnaturalness' disturbs listeners'
processing of speech signals and slows their reaction times. We shall give
some examples where this effect does seem to be present in our data. The
paper concludes with some hard-won recommendations on experimental procedures.
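
Illustrative sketch (not part of the abstract): the abstract does not say
which statistical test the project uses, so the following is only one
plausible way to compare reaction times for a 'default' and an 'improved'
model. It assumes each listener contributes one mean reaction time per
model and applies an exact one-sided sign-flip permutation test to the
paired differences. The function name and the reaction times in the usage
example are hypothetical and are not ProSynth data.

```python
from itertools import product


def paired_sign_flip_test(default_rt, improved_rt):
    """Exact one-sided sign-flip permutation test on paired reaction times.

    Returns (mean difference default - improved, p-value). A small p-value
    suggests reaction times are slower for the default model.
    Enumerates all 2**n sign patterns, so keep n small (say, n <= 20).
    """
    if len(default_rt) != len(improved_rt):
        raise ValueError("need one pair of values per listener")
    if not default_rt:
        raise ValueError("need at least one listener")

    diffs = [d - i for d, i in zip(default_rt, improved_rt)]
    n = len(diffs)
    observed = sum(diffs) / n

    # Under the null hypothesis the sign of each paired difference is
    # arbitrary, so every sign pattern is equally likely.
    at_least_as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        total += 1
        permuted = sum(s * x for s, x in zip(signs, diffs)) / n
        if permuted >= observed - 1e-12:  # tolerance for float rounding
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / total


if __name__ == "__main__":
    # Hypothetical per-listener mean reaction times in milliseconds.
    default_model = [520, 540, 510, 560]
    improved_model = [500, 530, 505, 540]
    mean_diff, p_value = paired_sign_flip_test(default_model, improved_model)
    print(f"mean difference = {mean_diff} ms, one-sided p = {p_value}")
```

A permutation test is used here only because it makes no distributional
assumption about reaction times, which are typically skewed; a paired
t-test or an analysis of variance would be equally reasonable choices.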