This page provides some explanation for the ProSynth Web Demonstration. You can investigate the operation of the synthesizer in more detail by downloading the Windows demonstration executable from the ProSynth Outputs page.
Input should be post-processed text, containing just plain words and the prosody diacritics. No text pre-processing is performed, so punctuation, numbers, abbreviations and acronyms are not allowed. For example
Input should be a phonetic transcription in SAMPA notation, with prosody diacritics. Spaces are ignored, but no other text or punctuation is allowed.
The diacritics are as follows: use '/' (forward-slash) to mark Intonation Phrase boundaries; use '`' (back-quote) to mark Accent Group boundaries; use '\' (back-slash) to mark Foot boundaries. Note that an IP boundary is also an AG boundary, and that an AG boundary is also a Foot boundary.
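The nesting of the three boundary marks can be illustrated with a short Python sketch. This is not the demonstration's own parser: the function name `parse_prosody` and the sample sentence are hypothetical, and the sketch assumes the marks act as separators between units. Because an IP boundary also closes the current Accent Group and Foot, splitting first on '/', then on '`', then on '\' gives the implied hierarchy.

```python
def parse_prosody(text):
    """Split marked-up input into Intonation Phrases -> Accent Groups -> Feet.

    Assumption (not taken from the ProSynth code): each mark separates
    units at its own level, and empty units are discarded.
    """
    phrases = []
    for ip in text.split("/"):            # '/' marks Intonation Phrase boundaries
        groups = []
        for ag in ip.split("`"):          # '`' marks Accent Group boundaries
            # '\' marks Foot boundaries; drop empty or whitespace-only pieces
            feet = [f.strip() for f in ag.split("\\") if f.strip()]
            if feet:
                groups.append(feet)
        if groups:
            phrases.append(groups)
    return phrases


# Hypothetical input, used only to show the nesting.
example = r"/ the `cat \sat / on the `mat /"
for ip in parse_prosody(example):
    print(ip)
```

Each printed line is one Intonation Phrase, given as a list of Accent Groups, each of which is a list of Feet.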
This is the same as Plain text input, but pronunciations are calculated according to the rules of Regular English Pronunciation.
The Klatt rules interpretation implements the rules for segment durations in English sentences described by Dennis Klatt.
The pitch contour is a simple declarative accent as used in the ProSynth interpretation.
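As background to the Klatt duration rules, the model is usually summarised by one formula: each segment has an inherent duration and a minimum duration, and each rule that applies scales a percentage that stretches or compresses the part of the duration above the minimum. The Python sketch below shows that arithmetic only. The function name and the numbers are illustrative assumptions, not values taken from Klatt's tables or from the ProSynth implementation.

```python
def klatt_duration(inherent_ms, minimum_ms, rule_percents):
    """Apply the general Klatt duration formula.

    DUR = MINDUR + (INHDUR - MINDUR) * PRCNT / 100
    where PRCNT starts at 100 and each applicable rule multiplies it
    by its own percentage.
    """
    prcnt = 100.0
    for p in rule_percents:
        prcnt = prcnt * p / 100.0
    return minimum_ms + (inherent_ms - minimum_ms) * prcnt / 100.0


# Illustrative values only: a lengthening rule (140%) and a shortening rule (50%).
print(klatt_duration(100, 50, [140, 50]))
```

Because only the portion above the minimum is scaled, no combination of shortening rules can push a segment below its minimum duration.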
The ProSynth corpus interpretation implements a CART analysis of the durations found in the ProSynth corpus. This work was done by Paul Carter and John Local at York and is described in Ogden, R., Local, J., Carter, P. Temporal integration in ProSynth, a Prosodic Speech Synthesis System, Int. Congress of Phonetic Science, San Francisco, 1999.
The pitch contour is a simple declarative accent as analysed from the ProSynth corpus by Jana Dankovicova, Rachael Knight and Jill House at UCL. This is described in: House, J., Dankovicová, J., Huckvale, M., Intonation modelling in ProSynth: an integrated prosodic approach to speech synthesis, Int. Congress of Phonetic Science, San Francisco, 1999.
The Formant Synthesis by Rule English output uses a set of simple formant target and interpolation rules based on the design used in the old JSRU TTS system.
The MBROLA English output uses the MBROLA diphone signal generation system in combination with the "en1" diphone database to generate an output waveform.
The Festival English output uses only the signal generation component of the Festival text-to-speech system, in combination with the RAB_diphone database.
Mark Huckvale, November 2001