If you are interested in doing graduate research in speech synthesis in the department, please contact Mark Huckvale.
Recent research work in speech synthesis has been concentrated on
the modelling of prosody and the development of synthesis software
tools.
In our prosody work we have been concerned with how to
represent the prosodic structure of text in a hierarchical phonological
representation. This has involved studying prosodic phrasing (breaking
text into intonational phrases) and the categorisation and assignment
of pitch accents within the phrase. Our work on the phonetic
interpretation of these structures involves modelling of fundamental
frequency contours and predicting the durations of syllabic
constituents as a function of the segmental content and the phrase
context.
In our tools work we have made extensive use of XML to
mark up linguistic representations, particularly hierarchical
phonological representations and their phonetic interpretation. The
ProSynth tools convert text to a hierarchical phonological form
expressed in XML. Scripts interpret this form by fleshing out
durations, fundamental frequency and segmental quality in context. The
scripting language ProXML is designed to make such knowledge easy to
state declaratively.
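As a rough illustration of this approach, the sketch below builds a small hierarchical structure in XML and then fills in segment durations from the phrase context. It is written in Python rather than ProXML, and the element names (UTT, IP, FOOT, SYL, SEG), the attribute names, the base durations and the scaling factors are all assumptions made for the example; they are not taken from the ProSynth tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical base durations in milliseconds (illustrative values only).
BASE_MS = {"b": 70, "V": 90, "t": 80, "@": 50}


def build_utterance():
    """Build a toy hierarchy: utterance > intonational phrase > foot > syllable > segment."""
    utt = ET.Element("UTT")
    ip = ET.SubElement(utt, "IP")
    foot = ET.SubElement(ip, "FOOT")
    for strength, phones in [("strong", ["b", "V", "t"]), ("weak", ["@"])]:
        syl = ET.SubElement(foot, "SYL", {"STRENGTH": strength})
        for phone in phones:
            ET.SubElement(syl, "SEG", {"PHON": phone})
    return utt


def assign_durations(utt):
    """Annotate each SEG with a DUR attribute derived from its context."""
    for ip in utt.iter("IP"):
        syllables = list(ip.iter("SYL"))
        for index, syl in enumerate(syllables):
            scale = 1.0
            if syl.get("STRENGTH") == "weak":
                scale *= 0.8  # assumed shortening of weak syllables
            if index == len(syllables) - 1:
                scale *= 1.3  # assumed phrase-final lengthening
            for seg in syl.iter("SEG"):
                seg.set("DUR", str(round(BASE_MS[seg.get("PHON")] * scale)))
    return utt


if __name__ == "__main__":
    print(ET.tostring(assign_durations(build_utterance()), encoding="unicode"))
```

Because the durations are written back into the same tree as attributes, later stages (for example a fundamental frequency model) could read and extend the same document rather than pass data through a separate format.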
Future work will look at how intonation is used to express the information structure of sentences. This will be of use in concept-to-speech systems, in which the text to be spoken is also generated by the computer system.
The Department of Phonetics and Linguistics at UCL has been involved in speech synthesis for over 30 years. Currently three members of staff take an active interest in speech synthesis:
Mark Huckvale (Senior Lecturer in Speech Sciences)
Is an engineer and signal processing expert who has worked on PSOLA
time-domain synthesis, as well as in the implementation of formant
synthesizers. He has built a poly-lectal (multi-accent) synthesis
system for English. He has also been involved in the construction of a
database of read speech for the study of English prosody, with the aim
of constructing statistical models of rhythm and intonation linked to a
richer syntactical description of utterances. His current work on the
ProSynth project (see below) relates to the tools and database
infrastructure.
Jill House (Senior Lecturer in Phonetics)
Is a phonetician who has worked on models of intonation for speech
synthesis systems. As consultant (1989-94) to the Swedish synthesis
company Infovox she designed and implemented a revised prosodic model
for their British English text-to-speech system, and participated in
similar work for French. She also worked on the ESPRIT dialogue project
SunDial, with the aim of appropriately adapting the intonation of
system-generated dialogue responses to the information state of the
dialogue manager. On-going work includes the development of an
integrated phonological/phonetic model in the ProSynth project (see
below) with Sarah Hawkins (University of Cambridge) and John Local
(University of York).
Valerie Hazan (Reader in Speech Sciences)
Is interested in human perception of fine detail acoustic cues in
natural speech signals and in individual differences in the perception
of synthetic speech. This work has led to a number of important
applications for synthetic speech: in an audiometer system for
assessing human speech processing performance; and in a project where
cues are artificially enhanced to improve intelligibility. This last
project has applications in foreign language learning and in prostheses
for the hearing impaired as well as in synthetic speech.
Other members of the Department have worked in the domain in the past:
John Maidment and Michael Ashby are phoneticians who have built components of speech synthesis systems.
John Wells is the author of the Longman Dictionary of English pronunciation, which is available in machine-readable form.
The Department has links with the Survey of English Usage within the English Department at UCL, where work is undertaken on corpus analysis and parsing.
Recent funded research projects on synthesis (in reverse chronological order):
1997-2001 COST258 Naturalness of synthetic speech
Mark Huckvale
Funded by European Union Co-operation in Science and Technology Directorate
1997-2000 & 2000-2001 Integrated Prosodic Approach to Speech Synthesis
Sarah Hawkins (Cambridge), Jill House (UCL), Mark Huckvale (UCL), John Local (York), Richard Ogden (York)
Funded by EPSRC.
1993-96 Speech Pattern Audiometry and Training
Valerie Hazan, Ginny Wilson, Adrian Fourcin, Laryngograph Ltd.
Funded by DTI SMART award.
1993-96 Cue-enhancement in natural and synthetic speech
Valerie Hazan and Adrian Fourcin
Funded by SERC
1989-92 Individual strategies in synthetic speech perception
Valerie Hazan and Adrian Fourcin
Funded by SERC
1989-92 Modelling dynamics of vocal fold vibration for synthesis
David Howard, Jill House, Sarah Palmer
Funded by SERC
1985-89 Pitch accent model of intonation
Jill House and Mike Johnson
Funded by the Speech Research Unit, DRA Malvern
Recent student projects in speech synthesis have included:
2002 Intelligibility of a spelling-regular English accent.
Margaret Shaw and Mark Huckvale
This study is looking at whether mispronunciations that are
logically connected to the spelling are less disruptive to
intelligibility than random mispronunciations. Look at the Regular English pronunciation project web site.
2002 Development of a scale of acceptable rhythm.
Robert Gray and Mark Huckvale
This study is looking at whether it is possible to create a
perceptual scale of rhythmic quality, and to see if there are objective
properties of timing that correlate with listener preferences.
2000 Performance of the ITU test for speech output system assessment.
Yolanda Vazquez-Alvarez and Mark Huckvale
This study looked at the ITU P.85 assessment standard and checked
its reliability and sensitivity. Our results were published in ICSLP
Recent published papers in the area (in reverse order):
The following data sets are available to research workers in speech synthesis at nominal charge:
PROSICE - Extended read texts for study of English prosody
Currently 4 read texts each of 15 minutes in duration (about 2000 words each).
With word-level annotations. Hear a sample.
PROSYNTH - Read sentences for study of English intonation
Over 700 short read sentences exploring systematic variation in
stress pattern and segmental context for simple declarative intonation.
Hear a sample.
If you are interested in these, please contact Mark Huckvale.
The following software tools are available to research workers in speech synthesis free of charge:
Speech Filing System
Our general purpose speech analysis toolkit. Contains tools for
formant synthesis, diphone synthesis and semi-automatic annotation.
Refer to the Speech Filing System Home Page.
ProSynth Synthesis Shell for Windows
This is a Windows application that can be used to build and evaluate speech synthesis systems. You can download it from the ProSynth project pages. It uses XML to encode all intermediate representations between text and sound, and comes complete with a basic set of scripts for synthesis of simple sentences. It requires the MBROLA diphone synthesis subsystem.
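For readers unfamiliar with MBROLA, its input is a plain-text phoneme file in which each line gives a phoneme symbol, a duration in milliseconds and optional pairs of position (as a percentage of the duration) and pitch (in Hz). The sketch below shows how a list of timed segments might be written in that form. It is an illustration only, not the code used by the Synthesis Shell; the function name to_pho and the segment values are assumptions, and the phoneme symbols that are valid in practice depend on the MBROLA voice database in use.

```python
def to_pho(segments):
    """Format segments as MBROLA-style phoneme lines.

    segments: list of (phoneme, duration_ms, targets), where targets is a
    list of (position_percent, pitch_hz) pairs and may be empty.
    """
    lines = ["; illustrative sketch, not ProSynth output"]  # ';' starts a comment
    for phoneme, duration_ms, targets in segments:
        fields = [phoneme, str(duration_ms)]
        for position, pitch in targets:
            fields += [str(position), str(pitch)]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [
        ("_", 100, []),            # silence
        ("b", 70, []),
        ("V", 90, [(50, 120)]),    # pitch target of 120 Hz at the midpoint
        ("t", 80, []),
        ("_", 100, []),
    ]
    print(to_pho(example), end="")
```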
The Department has very good facilities for conducting research in speech synthesis, including: