Research in Speech Synthesis

If you are interested in doing graduate research in speech synthesis in the department, please contact Mark Huckvale.


Recent research work in speech synthesis has been concentrated on the modelling of prosody and the development of synthesis software tools.

In our prosody work we have been concerned with how to represent the prosodic structure of text in a hierarchical phonological representation. This has involved studying prosodic phrasing (breaking text into intonational phrases) and the categorisation and assignment of pitch accents within the phrase. Our work on the phonetic interpretation of these structures involves modelling of fundamental frequency contours and predicting the durations of syllabic constituents as a function of the segmental content and the phrase context.

In our tools work we have made extensive use of XML to mark up linguistic representations, particularly hierarchical phonological representations and their phonetic interpretation. The ProSynth tools convert text to a hierarchical phonological form expressed in XML. Scripts interpret this form by fleshing out durations, fundamental frequency and segmental quality in context. The scripting language ProXML is designed to make such knowledge easy to state declaratively.
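
As a rough illustration of this idea (not the ProSynth document type or ProXML itself), the following Python sketch parses a small hierarchical representation in XML and applies a context-dependent rule that fills in segment durations. The element names (IP, FOOT, SYL, SEG), the attribute names and all numeric values are hypothetical, chosen only for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are illustrative only,
# not the actual ProSynth document type.
UTTERANCE = """
<IP>
  <FOOT>
    <SYL stress="strong"><SEG ph="k"/><SEG ph="a"/><SEG ph="t"/></SYL>
    <SYL stress="weak"><SEG ph="s"/><SEG ph="@"/></SYL>
  </FOOT>
</IP>
"""

BASE_MS = 80            # assumed default segment duration in milliseconds
STRESS_PERCENT = {"strong": 120, "weak": 70}  # assumed scaling by stress
FINAL_PERCENT = 150     # assumed lengthening of the phrase-final syllable


def assign_durations(root):
    """Annotate every SEG with a DUR attribute derived from its context."""
    syllables = list(root.iter("SYL"))
    for index, syl in enumerate(syllables):
        duration = BASE_MS * STRESS_PERCENT[syl.get("stress")] // 100
        if index == len(syllables) - 1:
            duration = duration * FINAL_PERCENT // 100
        for seg in syl.iter("SEG"):
            seg.set("DUR", str(duration))
    return root


root = assign_durations(ET.fromstring(UTTERANCE.strip()))
durations = [(seg.get("ph"), int(seg.get("DUR"))) for seg in root.iter("SEG")]
print(durations)
```

The point of the sketch is that the rule reads the structure (stress, position in the phrase) and writes its result back into the same XML tree, so later stages can consume the enriched representation.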

Future work will look at how intonation is used to express the information structure of sentences. This will be of use in concept-to-speech systems, in which the text to be spoken is also generated by the computer system.


The Department of Phonetics and Linguistics at UCL has been involved in speech synthesis for over 30 years. Currently we have three members of staff who take an active interest in speech synthesis:

Mark Huckvale (Senior Lecturer in Speech Sciences)
Is an engineer and signal processing expert who has worked on PSOLA time-domain synthesis, as well as in the implementation of formant synthesizers. He has built a poly-lectal (multi-accent) synthesis system for English. He has also been involved in the construction of a database of read speech for the study of English prosody, with the aim of constructing statistical models of rhythm and intonation linked to a richer syntactical description of utterances. His current work on the ProSynth project (see below) relates to the tools and database infrastructure.

Jill House (Senior Lecturer in Phonetics)
Is a phonetician who has worked on models of intonation for speech synthesis systems. As consultant (1989-94) to the Swedish synthesis company Infovox, she designed and implemented a revised prosodic model for their British English text-to-speech system, and participated in similar work for French. She also worked on the ESPRIT dialogue project SunDial, with the aim of appropriately adapting the intonation of system-generated dialogue responses to the information state of the dialogue manager. Ongoing work includes the development of an integrated phonological/phonetic model in the ProSynth project (see below) with Sarah Hawkins (University of Cambridge) and John Local (University of York).

Valerie Hazan (Reader in Speech Sciences)
Is interested in human perception of fine-detail acoustic cues in natural speech signals and in individual differences in the perception of synthetic speech. This work has led to a number of important applications for synthetic speech: in an audiometer system for assessing human speech processing performance, and in a project where cues are artificially enhanced to improve intelligibility. This last project has applications in foreign language learning and in prostheses for the hearing impaired as well as in synthetic speech.

Other members of the Department have worked in the domain in the past:

John Maidment and Michael Ashby are phoneticians who have built components of speech synthesis systems.

John Wells is the author of the Longman Pronunciation Dictionary, which is available in machine-readable form.

The Department has links with the Survey of English Usage within the English Department at UCL, where work is undertaken on corpus analysis and parsing.

Research projects

Recent funded research projects on synthesis (in reverse chronological order):

Recent student projects in speech synthesis have included:


Recent published papers in the area (in reverse chronological order):

Speech Data

The following data sets are available to research workers in speech synthesis at nominal charge:

PROSICE - Extended read texts for study of English prosody
Currently four read texts, each 15 minutes long (about 2000 words each), with word-level annotations. Hear a sample.

PROSYNTH - Read sentences for study of English intonation
Over 700 short read sentences exploring systematic variation in stress pattern and segmental context for simple declarative intonation. Hear a sample.

If you are interested in these, please contact Mark Huckvale.

Software Tools

The following software tools are available to research workers in speech synthesis free of charge:

Speech Filing System
Our general purpose speech analysis toolkit. Contains tools for formant synthesis, diphone synthesis and semi-automatic annotation. Refer to the Speech Filing System Home Page.

ProSynth Synthesis Shell for Windows

This is a Windows application that can be used to build and evaluate speech synthesis systems. You can download it from the ProSynth project pages. It uses XML to encode all intermediate representations between text and sound, and comes complete with a basic set of scripts for synthesis of simple sentences (requires the MBROLA diphone synthesis subsystem).
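
For readers unfamiliar with MBROLA: its input is a plain-text list of phonemes, one per line, each with a duration in milliseconds and optional pitch targets given as pairs of position (percent of the segment) and fundamental frequency (Hz). The Python sketch below shows how a list of timed segments might be written out in that form. It is an illustration only, not the ProSynth shell's own code; the function name, the segment values and the phoneme symbols are assumptions, and the symbols accepted in practice depend on the MBROLA voice database in use.

```python
def to_pho(segments):
    """Render (phoneme, duration_ms, pitch_targets) tuples as MBROLA-style lines.

    pitch_targets is a list of (position_percent, f0_hz) pairs.
    """
    lines = []
    for phoneme, duration_ms, targets in segments:
        fields = [phoneme, str(duration_ms)]
        for position, f0 in targets:
            fields += [str(position), str(f0)]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


# Hypothetical segment values; "_" denotes silence.
segments = [
    ("_", 100, []),
    ("k", 96, []),
    ("{", 96, [(50, 120)]),
    ("t", 96, []),
    ("_", 100, []),
]
pho_text = to_pho(segments)
print(pho_text, end="")
```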


The Department has very good facilities for conducting research in speech synthesis, including: