IEE Meeting Abstract

From: Mark Huckvale (M.Huckvale@ucl.ac.uk)
Date: Fri Feb 04 2000 - 22:28:59 GMT

    Justin

    IEE Meeting on State-of-the-Art in Speech Synthesis

    Please find an abstract below. The paper will actually be jointly
    authored by the ProSynth project team, who all contributed to the
    work. Thus the names of the final authors on the paper may change.

    Yours

    Mark Huckvale
    =============================================================================
    Title:

    Assessment of Naturalness in the ProSynth Speech Synthesis Project

    Authors: (for ProSynth speech synthesis project)

    Sebastian Heid, Sarah Hawkins
    Linguistics
    University of Cambridge

    Jill House, Mark Huckvale
    Phonetics & Linguistics
    University College London

    Abstract:

    Scientific progress in any area requires quantification of the mismatch
    between the predictions of theory and the measurements of nature. For many
    years the objective criterion of intelligibility could be used to evaluate
    speech synthesis systems, but with the advent of concatenative and corpus
    synthesis techniques, intelligibility under good listening conditions has
    reached ceiling levels. However, the perceived naturalness of such systems
    still falls far short of the human model, and their intelligibility under
    adverse listening conditions deteriorates rapidly by comparison with
    natural speech. Other objective measures are therefore required to evaluate
    scientific hypotheses about the reasons for the discrepancy.

    The most common measure of naturalness used for commercial systems is a mean
    opinion score, calculated from a panel of listeners applying a rating scale.
    Such measures are expensive to undertake and unreliable unless the panel
    size is large. Within the ProSynth synthesis project we have instead been
    pursuing carefully designed perceptual experiments which attempt to tap the
    degree of difficulty in cognitive processing experienced by listeners when
    performing tasks informed by synthetic speech. Within such experiments it
    is possible to make statistical comparisons between speech generated
    according to models of differing complexity, and thus to assess the
    perceptual advantage of an 'improved' over a 'default' model. In our
    ProSynth work, we predict such an advantage when systematic phonetic
    variation is correctly modelled.

    In this paper we will review three perceptual experiments undertaken in the
    areas of timing, intonation and long-range coarticulation. The experiments
    are based on the hypothesis that 'unnaturalness' disturbs listeners'
    processing of speech signals and slows their reaction times. We shall give
    some examples where this effect does seem to be present in our data. The
    paper concludes with some hard-won recommendations on experimental procedures.


