abstract for ICPhS 1999

J.House (jill@phonetics.ucl.ac.uk)
Fri, 16 Oct 1998 12:25:15 +0100

Thanks, Yorkies, for the copy of your abstract. I'll return the compliment
by circulating our final version in a *readable* form. Apologies to those
unable to read the earlier drafts.

Jill

>Date: Thu, 15 Oct 1998 16:49:36 +0100
>To: abstract@trill.berkeley.edu
>From: "J.House" <jill@phon.ucl.ac.uk>
>Subject: abstract for ICPhS 1999
>
>Intonation modelling in ProSynth: an integrated prosodic approach to
speech synthesis
>
>Speech synthesis in ProSynth* uses a rich linguistic representation,
consisting of linked syntactic and prosodic hierarchical structures
implemented in XML. Structural nodes store linguistic attributes and
acoustic-phonetic values derived from a spoken database exemplifying
structures of interest. Phonetic interpretation integrates information
stored at all levels to generate natural-sounding, perceptually robust
speech.
>
>Intonation modelling involves identifying relevant properties of the F0
contour and relating them correctly to the constituents in the prosodic
hierarchy. The contour itself, chosen from a phonological inventory, is
specified as an attribute of the Accent Group (AG), and determined by
discoursal information stored at the top of the hierarchy in the
Intonational Phrase (IP). Components of the AG are Feet, and within these
are Syllables and their constituents, Onsets and Rhymes.
>
>Both frequency scaling and temporal alignment of F0 contours are sensitive
to prosodic structure. For example, the alignment of pitch accent peaks
and valleys is constrained by proximity to upcoming IP, AG, or Foot
boundaries. Further adjustments to timing and frequency depend on
properties of the syllabic constituents. Word boundaries can also affect
timing; though excluded from our strictly layered prosodic hierarchy, Word
information is recovered from the syntactic hierarchy and integrated at
Syllable level.
>
>We describe our use of a labelled speech database to develop a predictive
intonation model for synthesis. Quantitative data are derived from
acoustic-phonetic values extracted at prosodic constituent boundaries.
Additional F0 values are extracted for labels coinciding with turning
points in a template shape associated with each pitch accent and defined in
the AG. For example, in a falling contour (phonologically H*L) we identify
more than just H* and L; minimally we locate the point at which the contour
reaches its peak, the point at which it begins its fall (these two points
do not necessarily coincide), and the point where it levels out. For
synthesis, templates and their associated values are generated at AG level,
then integrated with the boundary values of other constituents as they are
pushed down through the structure, and a "best fit" contour generated. We
compare our results with natural speech and will be evaluating them
perceptually. Our work is important for its use of structure to integrate
F0 synthesis coherently with temporal and segmental properties, and for the
light it sheds on the contribution of different domains (Syllable, Foot,
AG) to pitch accent realisation.
>
>* EPSRC Grant no. GR/L52109
>
>Authors: Jill House, Jana Dankovicova, Mark Huckvale
>Affiliation: University College London, UK
>Author to contact: Jill House
>Postal Address: Dept of Phonetics & Linguistics
> UCL
> Gower Street
> London WC1E 6BT, UK
>E-mail: jill@phonetics.ucl.ac.uk
>Telephone: +44 171 419 3167
>Fax: +44 171 383 4108
>No. of words in abstract: 389
>Subject area: Prosody
>Preferred method of presentation: No preference
>
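
For readers who want to picture the structure described in the abstract, here is a minimal sketch in Python. It is not ProSynth code (the abstract says the real representation is implemented in XML). All class names, field names, timings and F0 values below are hypothetical and chosen only for illustration. The sketch nests Syllables inside Feet inside Accent Groups inside an Intonational Phrase, stores a falling (H*L) template on the AG as three turning points (peak reached, fall begins, contour levels out), and reads F0 off that template by simple linear interpolation. The linear interpolation is a stand-in assumption; the abstract does not say how the "best fit" contour is computed.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Syllable:
    label: str
    start: float  # seconds (hypothetical timing)
    end: float


@dataclass
class Foot:
    syllables: List[Syllable]


@dataclass
class AccentGroup:
    contour: str  # phonological label, e.g. "H*L"
    feet: List[Foot]
    # Template turning points as (time in s, F0 in Hz); values are invented.
    turning_points: List[Tuple[float, float]] = field(default_factory=list)


@dataclass
class IntonationalPhrase:
    discourse: str  # discoursal information held at the top of the hierarchy
    accent_groups: List[AccentGroup]


def f0_at(ag: AccentGroup, t: float) -> float:
    """Linearly interpolate F0 at time t between the AG's turning points.

    Outside the span of the turning points the nearest value is held.
    This is an illustrative assumption, not the ProSynth fitting method.
    """
    pts = sorted(ag.turning_points)
    if not pts:
        raise ValueError("accent group has no turning points")
    if t <= pts[0][0]:
        return pts[0][1]
    if t >= pts[-1][0]:
        return pts[-1][1]
    for (t0, f0), (t1, f1) in zip(pts, pts[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return f1
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
    raise AssertionError("unreachable: t lies within the sorted points")


# A falling contour with a short plateau: the peak is reached at 0.10 s,
# the fall begins at 0.16 s, and the contour levels out at 0.30 s.
ag = AccentGroup(
    contour="H*L",
    feet=[Foot([Syllable("syl1", 0.00, 0.20), Syllable("syl2", 0.20, 0.35)])],
    turning_points=[(0.10, 180.0), (0.16, 180.0), (0.30, 110.0)],
)
ip = IntonationalPhrase(discourse="statement", accent_groups=[ag])

for t in (0.05, 0.13, 0.23, 0.33):
    print(f"{t:.2f} s -> {f0_at(ag, t):.1f} Hz")
```

Keeping the peak and the start of the fall as separate points is what lets the sketch represent the case mentioned in the abstract, where the two do not coincide.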