ProSynth Demonstration Help

This page provides some explanation for the ProSynth Web Demonstration. You can investigate the operation of the synthesizer in more detail by downloading the Windows demonstration executable from the ProSynth Outputs page.


Input Formats

Plain Text with Diacritics

Input should be post-processed text containing only plain words and the prosody diacritics. No text pre-processing is performed, so punctuation, numbers, abbreviations and acronyms are not allowed. For example:

The diacritics are as follows: use '/' (forward-slash) to mark Intonation Phrase boundaries; use '`' (back-quote) to mark Accent Group boundaries; use '\' (back-slash) to mark Foot boundaries. Note that an IP boundary is also an AG boundary, and that an AG boundary is also a Foot boundary.
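The nesting of the three boundary types can be sketched in a few lines of Python. This is only an illustration of the hierarchy described above, not the parser used by ProSynth: the function name `parse_prosody` and the sample string are made up for this page, and the sketch assumes that each diacritic is placed between whitespace-separated words.

```python
from typing import List

def parse_prosody(text: str) -> List[List[List[List[str]]]]:
    """Split diacritic-marked text into IPs -> AGs -> Feet -> words.

    Hypothetical helper: '/' closes an Intonation Phrase, '`' an Accent
    Group and '\\' a Foot. Splitting from the outside in means an IP
    boundary also ends the current AG and Foot, as the text requires.
    """
    phrases = []
    for ip in text.split("/"):
        groups = []
        for ag in ip.split("`"):
            # Each foot becomes a list of words; empty feet are dropped.
            feet = [foot.split() for foot in ag.split("\\")]
            feet = [f for f in feet if f]
            if feet:
                groups.append(feet)
        if groups:
            phrases.append(groups)
    return phrases

# Made-up sample input (not taken from the ProSynth documentation).
sample = "/it was a `lovely \\day for a `walk/"
structure = parse_prosody(sample)
for ip in structure:
    for ag in ip:
        print(ag)
```

For the sample string this yields one Intonation Phrase holding three Accent Groups, the second of which contains two Feet.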

Transcription with Diacritics

Input should be a phonetic transcription in SAMPA notation with prosody diacritics. Spaces are ignored, but no other text or punctuation is allowed.

For example:

The diacritics are as follows: use '/' (forward-slash) to mark Intonation Phrase boundaries; use '`' (back-quote) to mark Accent Group boundaries; use '\' (back-slash) to mark Foot boundaries. Note that an IP boundary is also an AG boundary, and that an AG boundary is also a Foot boundary.

Regular English Text with Diacritics

This is the same as Plain Text input, but pronunciations are calculated according to the rules of Regular English Pronunciation.

Klatt Interpretation Rules

The Klatt rules interpretation implements the rules for segment duration in English sentences described by Dennis Klatt.
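As background, Klatt's published duration model is usually summarized by one formula: each segment has an inherent duration and a minimum duration, and the rules adjust a percentage that scales the part of the duration above the minimum. The sketch below shows that formula only. Whether the demonstration implements it in exactly this form is an assumption, the function name is hypothetical, and the numbers are illustrative values rather than entries from Klatt's tables.

```python
def klatt_duration(inherent_ms: float, minimum_ms: float, percent: float) -> float:
    """Duration = minimum + (inherent - minimum) * percent / 100.

    'percent' starts at 100 and is scaled by each rule that applies
    (for example shortening or lengthening in a given context).
    """
    return minimum_ms + (inherent_ms - minimum_ms) * percent / 100.0

# Illustrative values only (not taken from Klatt's tables).
print(klatt_duration(inherent_ms=100, minimum_ms=50, percent=100))
print(klatt_duration(inherent_ms=100, minimum_ms=50, percent=60))
```

Because only the portion above the minimum is scaled, no combination of shortening rules can push a segment below its minimum duration.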

The pitch contour is a simple declarative accent as used in the ProSynth interpretation.

ProSynth Corpus Interpretation Rules

The ProSynth corpus interpretation implements a CART analysis of the durations found in the ProSynth corpus. This work was done by Paul Carter and John Local at York and is described in: Ogden, R., Local, J., Carter, P., Temporal integration in ProSynth, a Prosodic Speech Synthesis System, Int. Congress of Phonetic Science, San Francisco, 1999.

The pitch contour is a simple declarative accent as analysed from the ProSynth corpus by Jana Dankovicová, Rachael Knight and Jill House at UCL. This is described in: House, J., Dankovicová, J., Huckvale, M., Intonation modelling in ProSynth: an integrated prosodic approach to speech synthesis, Int. Congress of Phonetic Science, San Francisco, 1999.

Formant Synthesis Signal Generation

The Formant Synthesis by Rule English output uses a set of simple formant target and interpolation rules based on the design used in the old JSRU TTS system.

MBROLA Diphone Signal Generation

The MBROLA English output uses the MBROLA diphone signal generation system in combination with the "en1" diphone database to generate an output waveform.

Festival Diphone Signal Generation

The Festival English output uses only the signal generation component of the Festival text-to-speech system, in combination with the RAB_diphone database.

Synthesizer Output Formats

Audio Output

Audio output is in WAV format, from either the formant synthesizer or the diphone synthesizer.

Input Format as XML

This option displays the input text once it has been parsed into a hierarchical phonological structure. A discussion of the structure can be found in: Ogden, R., Hawkins, S., House, J., Huckvale, M., Local, J., Carter, P., Dankovicová, J., Heid, S. (2000). ProSynth: an integrated prosodic approach to device-independent natural-sounding speech synthesis. Computer Speech and Language, 14, 177-210.

Output Format as XML

This option displays the phonological structure after it has been processed by the interpretation rules. At this stage, the phonetic durations and fundamental frequency contour have been specified. More information about the process of interpretation can be found in: Huckvale, M., Representation and processing of linguistic structures for an all-prosodic synthesis system using XML, Proc. EuroSpeech 99, Budapest, 1999.

Synthesizer Control Parameters

Synthesizer control parameters are listed in MBROLA format: a segment name in SAMPA notation, a duration in milliseconds, then any number of pairs of values, each giving a percentage of the time through the segment and a fundamental frequency value in Hertz.
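If you want to inspect the control parameters in a script, a line in this layout can be read as follows. This is a minimal sketch: the `Segment` class, the function name and the sample line are hypothetical and are not output copied from the demonstration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Segment:
    name: str                                # segment name in SAMPA notation
    duration_ms: float                       # duration in milliseconds
    pitch_points: List[Tuple[float, float]]  # (percent through segment, F0 in Hz)

def parse_control_line(line: str) -> Optional[Segment]:
    """Parse one line: name, duration, then (percent, Hz) pairs."""
    line = line.strip()
    # MBROLA treats lines starting with ';' as comments; skip those and blanks.
    if not line or line.startswith(";"):
        return None
    fields = line.split()
    # Name + duration + complete pairs means an even number of fields.
    if len(fields) < 2 or len(fields) % 2 != 0:
        raise ValueError(f"malformed control line: {line!r}")
    values = [float(v) for v in fields[2:]]
    points = list(zip(values[0::2], values[1::2]))
    return Segment(fields[0], float(fields[1]), points)

# Made-up sample line: segment 'h', 80 ms, F0 of 120 Hz at 0% and 125 Hz at 100%.
seg = parse_control_line("h 80 0 120 100 125")
print(seg)
```

Segments with no pitch pairs, as can occur for voiceless sounds, simply produce an empty `pitch_points` list.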


Mark Huckvale, November 2001