---------------------------------------------------------
Draft 2 of abstract on testing for ESCA synthesis workshop.
Heid and Hawkins
26 April 1998
Thanks for comments on draft 1 from Ali, Sebastian, Mark and Jill. In
consequence, draft 2 is rather different!
1. I think it could be cut from:
This technique has many potential applications.
to the end.
I included it because I'm still nervous that we're promising something we
may not deliver, and this gives us an opportunity to describe something
related.
2. Suggestions welcome for:
a. category (assessment may no longer be appropriate)
b. title/a better title
Sarah
--------------------------------------------------------------------------------
Title : Automatic estimation of HLsyn parameters for high-quality formant
synthesis and spectral manipulation.
Automatic parameter estimation for high-quality formant synthesis using
HLsyn.
First Author : Sebastian Heid
Email :
Affiliation : Phonetics Laboratory, Department of Linguistics,
University of Cambridge, U.K.
Other authors : Sarah Hawkins

Category of Submission
1st choice session : B (assessment)
2nd choice session : H (phonetics and phonology)
Abstract (400 words approx): 410 words.
This paper describes a hybrid method of automatically producing
natural-sounding formant-based synthetic speech from an existing speech
signal by using a combination of copy-synthesis and estimated articulatory
trajectories as input to an existing synthesizer, HLsyn (Sensimetrics
Corporation). The purpose is to allow subsequent controlled manipulation
of selected acoustic parameters. The method is being developed as part of
Prosynth (http://synth.phon.ucl.ac.uk/prosynth/), a
linguistically informed, device-independent text-to-speech system.
Prosynth's motivating hypothesis is that the intelligibility of synthetic
speech under adverse listening conditions will approach that of natural
speech only when the synthesizer reproduces the systematic variation of
natural speech. Part of our research addresses the temporal spread of
spectral coarticulatory effects, which requires formant synthesis because
individual formants must be controlled. But standard formant synthesis
sounds unnatural and is time-consuming to produce. Extracting parameter values by
copy-synthesis can speed the process up, but copy-synthesizing obstruents
is notoriously difficult. The method described here attempts to circumvent
these disadvantages. It uses utterances with known segmental labels and
durations, and the HLsyn synthesizer. HLsyn uses a small number of
"higher-level" (HL) parameters, representing time-varying cross-sectional
areas of vocal-tract constrictions, to drive the much larger number of
acoustic parameters in the underlying Klatt-type synthesizer. The
novelty of our method is that parameter values for vowels and
consonants are obtained in different ways. Vowels and approximants are
copy-synthesized from the acoustic signal. Obstruents and nasals are
synthesized by rule: articulatory trajectories and constriction areas are
estimated from the segment label and its duration, and converted into HL
parameter values. HLsyn calculates the acoustic consequences of the HL
constrictions, and the results modify Klatt parameter values. Further
spectral manipulation can be done by hand as desired, subject to the constraint
that HLsyn automatically adjusts parameter values so that the resultant
acoustic patterns are consistent with acoustic theory. The strengths of
our method are (i) that simple HLsyn input captures acoustically complex
obstruents, and (ii) that HLsyn parameters automatically produce complex
acoustic properties that accompany consonantal closures, especially at
segment boundaries. These properties are hard to synthesize and thus
typically absent in formant synthesis-by-rule, yet they provide some of
the variability we hypothesize contributes to robust, natural-sounding
synthesis. This technique has many potential applications. Our focus is to
map acoustic detail in a variety of prosodic structures, and to assess its
contribution to speech intelligibility, especially in noise and when
cognitive loads are high. Tests will assess speech intelligibility when
listeners have competing tasks involving combinations of auditory vs.
nonauditory modalities, and linguistic vs. nonlinguistic behaviours.
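A minimal Python sketch of the segment routing described above follows. It is an illustration under stated assumptions, not the Prosynth or HLsyn code: the segment classes, the `Segment` record, the functions `rule_area_track` and `build_parameter_plan`, the normalized area scale (1.0 = open, 0.0 = complete closure), the trapezoidal closure shape and the fricative minimum areas are all hypothetical, and no HLsyn interface is called.

```python
from dataclasses import dataclass

# Hypothetical segment classes, following the abstract's split:
# vowels and approximants are copy-synthesized; obstruents and nasals by rule.
COPY_CLASSES = {"vowel", "approximant"}
RULE_CLASSES = {"obstruent", "nasal"}

# Hypothetical minimum (normalized) oral constriction areas: fricatives keep a
# narrow channel; stops and nasals (default 0.0) close the oral tract completely.
MIN_AREA = {"s": 0.05, "f": 0.05}


@dataclass
class Segment:
    label: str          # segmental label, e.g. "a", "t", "s"
    seg_class: str      # "vowel", "approximant", "obstruent" or "nasal"
    duration_ms: float  # known segment duration


def rule_area_track(seg, frame_ms=5.0, open_area=1.0):
    """Estimate a normalized oral constriction-area track from label and duration.

    The area falls linearly from open_area to the segment's minimum over
    roughly the first third of the frames, holds at the minimum, and rises
    back over the last third. Shape and values are illustrative assumptions.
    """
    n = max(1, round(seg.duration_ms / frame_ms))
    closed = MIN_AREA.get(seg.label, 0.0)
    ramp = max(1, n // 3)
    track = []
    for i in range(n):
        edge_dist = min(i, n - 1 - i)        # frames to the nearest boundary
        frac = min(1.0, edge_dist / ramp)    # 0 at boundaries, 1 mid-closure
        track.append(closed + (open_area - closed) * (1.0 - frac))
    return track


def build_parameter_plan(segments, measured_tracks, frame_ms=5.0):
    """Route each segment to copy-synthesis or to synthesis by rule.

    measured_tracks maps a segment index to formant values measured from the
    acoustic signal (hypothetical input format). Returns (label, source, values).
    """
    plan = []
    for idx, seg in enumerate(segments):
        if seg.seg_class in COPY_CLASSES:
            plan.append((seg.label, "copy", measured_tracks[idx]))
        elif seg.seg_class in RULE_CLASSES:
            plan.append((seg.label, "rule", rule_area_track(seg, frame_ms)))
        else:
            raise ValueError(f"unknown segment class: {seg.seg_class}")
    return plan


if __name__ == "__main__":
    segs = [Segment("a", "vowel", 100.0), Segment("t", "obstruent", 60.0)]
    formants = {0: [(700.0, 1200.0, 2500.0)] * 20}  # made-up F1-F3 values in Hz
    for label, source, values in build_parameter_plan(segs, formants):
        print(label, source, values[:3])
```

In a full pipeline the rule-based tracks would still have to be converted into HL parameter values and passed to HLsyn, which computes their acoustic consequences; that step is omitted here because its interface is not described in the abstract.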
NOTE: The subject field of your e-mail submission MUST contain the
keywords "submit abstract"
mail to: synth@dwarf.bt.co.uk
Sarah
______________________________________________________________________
Dr. Sarah Hawkins Email: sh110@cam.ac.uk
Dept. of Linguistics Phone: +44 1223 33 50 52
University of Cambridge Fax: +44 1223 33 50 53
Sidgwick Avenue or +44 1223 33 50 62
Cambridge CB3 9DA
United Kingdom