First Author : Sebastian Heid
Email : heid@phonetik.uni-muenchen.de
[Or to Hawkins: sh110@cam.ac.uk]
Affiliation : Phonetics Laboratory
Department of Linguistics
Sidgwick Avenue
University of Cambridge CB3 9DA
U.K.
Other authors : Sarah Hawkins
Category of Submission
1st choice session : B (assessment)
2nd choice session : H (phonetics and phonology)
Abstract (400 words approx):
This paper describes a hybrid method of automatically producing
natural-sounding formant-based synthetic speech from an existing speech
signal by using a combination of copy-synthesis and estimated articulatory
trajectories as input to an existing synthesizer, HLsyn (Sensimetrics
Corporation). The purpose is to allow subsequent controlled manipulation
of selected acoustic parameters. The method is being developed as part of
Prosynth (http://synth.phon.ucl.ac.uk/prosynth/), a
linguistically-informed, device-independent text-to-speech system.
Prosynth's motivating hypothesis is that the intelligibility of synthetic
speech under adverse listening conditions will only approach that of
natural speech when the synthesizer reproduces the systematic variation of
natural speech. Part of our research addresses the temporal spread of
spectral coarticulatory effects, which requires formant synthesis so that
individual formants can be controlled. But standard formant synthesis sounds
unnatural and is laborious to produce. Extracting parameter values by copy-synthesis
can speed the process up, but copy-synthesizing obstruents is notoriously
difficult. The method described here attempts to circumvent these
disadvantages. It uses utterances with known segmental labels and
durations, and the HLsyn synthesizer. HLsyn uses a small number of
"higher-level" (HL) parameters, representing time-varying cross-sectional
areas of vocal-tract constrictions, to drive the much larger number of
acoustic parameters in the underlying Klatt-type synthesizer. The
originality of our method is that parameter values for vowels and
consonants are obtained in different ways. Vowels and approximants are
copy-synthesized from the acoustic signal. Obstruents and nasals are
synthesized by rule: articulatory trajectories and constriction areas are
estimated from the segment label and its duration, and converted into HL
parameter values. HLsyn calculates the acoustic consequences of the HL
constrictions, and the results modify Klatt parameter values. Further
spectral manipulation can be done by hand as desired, with the constraint
that HLsyn automatically adjusts parameter values so that the resultant
acoustic patterns are consistent with acoustic theory. The strengths of
our method are (i) that simple HLsyn input captures acoustically complex
obstruents, and (ii) that HLsyn parameters automatically produce complex
acoustic properties that accompany consonantal closures, especially at
segment boundaries. These properties are hard to synthesize and are therefore
typically absent from formant synthesis-by-rule, yet they provide some of
the variability we hypothesize contributes to robust, natural-sounding
synthesis. This technique has many potential applications. Our focus is to
map acoustic detail in a variety of prosodic structures, and to assess its
contribution to speech intelligibility, especially in noise and when
cognitive loads are high. Tests will assess speech intelligibility when
listeners have competing tasks involving combinations of auditory vs.
nonauditory modalities, and linguistic vs. nonlinguistic behaviours.
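The hybrid split described above (vowels and approximants copy-synthesized, obstruents and nasals synthesized by rule from label and duration) can be illustrated with a minimal sketch. It is not the authors' implementation and does not call HLsyn. The label sets, the open and target constriction areas, the 20 ms closing/opening ramp and the 5 ms frame rate are all hypothetical values chosen for illustration, and the "lip"/"blade" place names are stand-ins rather than HLsyn parameter names.

```python
from dataclasses import dataclass

# Hypothetical SAMPA-like label sets (illustrative, not the Prosynth inventory).
COPY_SYNTH = {"i", "e", "a", "o", "u", "@", "l", "r", "w", "j"}
STOPS = {"p": "lip", "b": "lip", "t": "blade", "d": "blade"}
FRICATIVES = {"f": "lip", "v": "lip", "s": "blade", "z": "blade"}
NASALS = {"m": "lip", "n": "blade"}

OPEN_AREA = 100.0  # assumed unconstricted area in mm^2 (illustrative)
TARGET_AREA = {"stop": 0.0, "fricative": 5.0, "nasal": 0.0}  # assumed, mm^2


@dataclass
class Segment:
    label: str
    duration: float  # seconds, taken from the known segment durations


def route(seg: Segment) -> str:
    """'copy' for vowels/approximants, 'rule' for obstruents/nasals."""
    if seg.label in COPY_SYNTH:
        return "copy"
    if seg.label in STOPS or seg.label in FRICATIVES or seg.label in NASALS:
        return "rule"
    raise ValueError(f"unknown label: {seg.label}")


def manner_and_place(label: str) -> tuple[str, str]:
    """Look up manner and (hypothetical) constriction place for a label."""
    for table, manner in ((STOPS, "stop"), (FRICATIVES, "fricative"), (NASALS, "nasal")):
        if label in table:
            return manner, table[label]
    raise ValueError(f"not a rule-synthesized label: {label}")


def constriction_trajectory(seg: Segment, frame: float = 0.005, ramp: float = 0.02):
    """Piecewise-linear area track (open -> target -> open) over the segment.

    The ramp time is an assumption and is clipped to half the duration
    so that very short segments still reach their target area.
    """
    manner, place = manner_and_place(seg.label)
    target = TARGET_AREA[manner]
    n = max(1, round(seg.duration / frame))
    r = min(ramp, seg.duration / 2)
    track = []
    for i in range(n):
        t = i * frame
        if t < r:  # closing gesture
            area = OPEN_AREA + (target - OPEN_AREA) * t / r
        elif t > seg.duration - r:  # releasing gesture
            area = target + (OPEN_AREA - target) * (t - (seg.duration - r)) / r
        else:  # held constriction
            area = target
        track.append(area)
    return place, track


def plan(segments: list[Segment]):
    """Tag each segment with its synthesis path; rule segments get an area track."""
    return [
        (s.label, "copy", None) if route(s) == "copy"
        else (s.label, "rule", constriction_trajectory(s))
        for s in segments
    ]
```

For example, `plan([Segment("a", 0.12), Segment("t", 0.10)])` leaves the vowel to copy-synthesis and gives /t/ a 20-frame blade-area track that closes, holds at 0 and begins to reopen. In the method described in the abstract, such area tracks would then be converted into HL parameter values for HLsyn to map onto Klatt parameters.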
end of text
______________________________________________________________________
Dr. Sarah Hawkins Email: sh110@cam.ac.uk
Dept. of Linguistics Phone: +44 1223 33 50 52
University of Cambridge Fax: +44 1223 33 50 53
Sidgwick Avenue or +44 1223 33 50 62
Cambridge CB3 9DA
United Kingdom