Integrated Prosodic Approach to Speech Synthesis

An integrated prosodic approach to device-independent,
natural-sounding speech synthesis

A research project funded under the EPSRC Speech and Language programme

Administrative Details

Grant Period:	October 1997 - March 2000
Grant Award:	£268,000
Investigators:	Sarah Hawkins (University of Cambridge); Jill House (University College London); Mark Huckvale (University College London); John Local (University of York); Richard Ogden (University of York)

Overview

Current text-to-speech systems, both concatenative and formant-based, are highly intelligible, but their speech often sounds unnatural because rhythm, intonation and the fine phonetic detail reflecting coarticulatory patterns are inadequately modelled. As a consequence, listening to such speech demands greater cognitive effort, which can cause problems when synthetic speech is used in noisy conditions or over poor communication channels.

This collaborative project between Linguistics departments in Cambridge, London and York aims to construct a model of computational phonology that integrates and extends modern metrical approaches to phonetic interpretation, and to apply this model to the generation of high-quality synthetic speech. The three focal areas of research are intonation, morphological structure and systematic segmental variation. Integrating these is a temporal model that provides a linguistic structure or 'data object' upon which phonetic interpretation is executed and which delivers control information for synthesis.
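As an illustration only, the idea of a linguistic 'data object' that phonetic interpretation annotates without restructuring can be sketched in Python. The sketch is not the project's implementation: the class name Node, the labels "FOOT" and "SYL", the attribute names "strength" and "dur_ms", and all numeric values are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Node:
    """One constituent in a hypothetical prosodic hierarchy."""
    label: str  # illustrative labels such as "FOOT" or "SYL"
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)

    def walk(self) -> Iterator["Node"]:
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def interpret_durations(root: Node, base_ms: int = 200,
                        strong_factor: float = 1.5) -> None:
    """Annotate every syllable with a duration attribute.

    Interpretation only adds attributes; the tree itself is never
    rearranged. The default numbers are placeholders, not project values.
    """
    for node in root.walk():
        if node.label == "SYL":
            is_strong = node.attrs.get("strength") == "strong"
            factor = strong_factor if is_strong else 1.0
            node.attrs["dur_ms"] = str(round(base_ms * factor))


# A single foot containing a strong syllable followed by a weak one.
foot = Node("FOOT", children=[
    Node("SYL", {"strength": "strong"}),
    Node("SYL", {"strength": "weak"}),
])
interpret_durations(foot)
durations = [n.attrs["dur_ms"] for n in foot.walk() if n.label == "SYL"]
```

In this sketch the structure is built once and interpretation only adds attributes to it, so the same tree could in principle be handed to different signal-generation back ends.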

Initially, the project aims to cover a limited range of phenomena in one British English accent, but the complete model is intended to be language- and accent-independent. For signal generation, the project will start with time-domain modification of natural speech signals, supplemented by formant-based synthesis models, although compatibility with concatenative methods will be maintained. Progress will be evaluated using perceptual tests for naturalness, intelligibility and communicative success.

Objectives

demonstration of selected parts of a text-to-speech system constructed on linguistically-motivated, declarative computational principles
development of a system-independent description of the linguistic structures involved
perceptual evaluations using criteria of naturalness and robustness

Progress

For progress on the ProSynth project, refer to the Project Home Page.

UCL Publications

Copies of ProSynth papers and reports generated at UCL (in PDF):

Intonation modelling in ProSynth: an integrated prosodic approach to speech synthesis, Jill House, Jana Dankovičová, Mark Huckvale, Int. Congress of Phonetic Sciences, San Francisco, 1999. A poster with more details is also available.
Representation and processing of linguistic structures for an all-prosodic synthesis system using XML, Mark Huckvale, Proc. EuroSpeech 99, Budapest, 1999.
Intonation modelling in ProSynth, J. House, J. Dankovičová, & M. Huckvale. Speech, Hearing & Language: Work in Progress, 11 (1999), UCL, 51-61.

For a project-wide list, refer to the Project Home Page.

Related Issues

If you would like to know more about the project, or have ideas for original contributions that you could make to the project, please contact Jill House.

Author: Mark Huckvale. Last Changed: 8 June 1999
