Speech Processing by Computer

 

LAB 7

TEXT-TO-SPEECH SYNTHESIS

 

In this lab session we will use three web-based synthesis systems to investigate the components of a text-to-speech system and the challenges each component faces.

 

1.                  Open each of these systems in a separate browser window:

a.       The AT&T Next-Generation TTS system at:

http://www.research.att.com/~mjm/cgi-bin/ttsdemo

b.      The Lucent Technologies TTS system at:

http://www.bell-labs.com/project/tts/voices-java.html

c.       The Edinburgh Festival TTS system at:

http://www.cstr.ed.ac.uk/projects/festival/userin.html

2.                  Design some sentences that stress different levels of the system:

a.       Text normalisation (e.g. ambiguous abbreviations)

b.      Prosodic phrasing (e.g. garden path sentences)

c.       Intonation (e.g. locations of pitch prominences)

d.      Letter-to-Sound (e.g. some odd pronunciations)
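One way to organise the test material for step 2 is a small table of sentences keyed by the level each one is meant to stress. The sketch below is only illustrative: the sentences themselves and the names `TEST_SENTENCES`, `SYSTEMS` and `checklist` are assumptions of this example, not part of the lab materials, and you should replace them with your own.

```python
# Illustrative test sentences grouped by the TTS level they are meant to stress.
# Every sentence here is an example chosen for this sketch; replace or extend them.
TEST_SENTENCES = {
    "text normalisation": [
        "Dr. Jones lives at 25 Elm Dr. near St. Mary's St.",
        "The meeting on 3/4/2001 cost $1.50 per person.",
    ],
    "prosodic phrasing": [
        "The old man the boats.",
        "The horse raced past the barn fell.",
    ],
    "intonation": [
        "I didn't say she stole the money.",
        "Did you want tea, or coffee?",
    ],
    "letter-to-sound": [
        "The colonel sang in the choir at Worcester.",
        "I will read the book you read yesterday.",
    ],
}

# Short labels for the three systems listed in step 1.
SYSTEMS = ["AT&T", "Lucent", "Festival"]


def checklist(sentences, systems):
    """Return one (level, sentence, system) row per test to be run."""
    return [
        (level, sentence, system)
        for level, items in sentences.items()
        for sentence in items
        for system in systems
    ]


if __name__ == "__main__":
    for level, sentence, system in checklist(TEST_SENTENCES, SYSTEMS):
        print(f"[{system:8}] {level:18} | {sentence}")
```

Printing the checklist gives one row per sentence and system, which can be ticked off as each audio file is saved.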

3.                  Test out the different systems with the sentences.

a.       Save the audio to files on the computer, and view them with WASP

b.      Are there differences between the systems? 

c.       What default assumptions do the systems make?
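Before comparing systems it can help to check the basic properties of each saved file, since different sample rates or durations will show up as differences in the displays. The following is a minimal sketch using Python's standard `wave` module; it assumes the audio was saved as uncompressed PCM WAV. The function names and the test-tone helper are inventions of this example, and the tone exists only so the script can be checked without a real recording.

```python
import math
import struct
import wave


def write_test_tone(path, freq_hz=200.0, seconds=0.5, rate=16000):
    """Write a mono 16-bit sine tone, used only to check that the script works."""
    n = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq_hz * i / rate)))
        for i in range(n)
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 2 bytes per sample = 16-bit audio
        w.setframerate(rate)
        w.writeframes(frames)


def wav_summary(path):
    """Return basic properties of a PCM WAV file as a dictionary."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        n_frames = w.getnframes()
        return {
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": n_frames / rate,
        }


if __name__ == "__main__":
    write_test_tone("tone.wav")
    print(wav_summary("tone.wav"))
```

Running `wav_summary` on the same sentence saved from each system gives a quick comparison of total utterance duration and sample rate.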

4.                  Listen to the durations, pitch and voice quality of these systems. 

a.       What aspects of the speech are still in need of improvement?
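To put a number on what you hear in the pitch, one rough approach is to estimate the fundamental frequency of a short voiced frame by autocorrelation. The sketch below is an assumption of this example rather than the method used by WASP or by any of the three systems; the function name `estimate_f0` and the 50 to 400 Hz search range are illustrative choices, and no voiced/unvoiced decision is made.

```python
import math


def estimate_f0(frame, rate, f0_min=50.0, f0_max=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Picks the lag with the largest autocorrelation inside the search range
    and converts that period to a frequency.
    """
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]                  # remove any DC offset
    lag_min = int(rate / f0_max)                   # shortest period considered
    lag_max = min(int(rate / f0_min), len(x) - 1)  # longest period considered
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return rate / best_lag


if __name__ == "__main__":
    rate = 16000
    # 40 ms of a 200 Hz sine, standing in for a voiced speech frame
    frame = [math.sin(2 * math.pi * 200 * n / rate) for n in range(640)]
    print(round(estimate_f0(frame, rate), 1))
```

Applying this to successive frames of a vowel gives a coarse pitch contour that can be compared between a synthetic and a natural utterance.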

5.                  Compare a couple of synthetic utterances with natural recordings.

a.       Print out spectrograms of a few words of synthetic speech and a few words of natural speech, and identify the largest differences between them.
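If you want to compare the two recordings numerically as well as by eye, a spectrogram can be computed as a sequence of windowed short-time spectra. The sketch below uses only the Python standard library and a direct DFT, which is slow but easy to follow; a real analysis would normally use an FFT library. The function names, the 256-sample frame and the 128-sample hop are assumptions of this example, not settings taken from WASP.

```python
import cmath
import math


def frame_spectrum_db(frame):
    """Magnitude spectrum (dB) of one frame: Hann window, then a direct DFT."""
    n = len(frame)
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):                    # bins from 0 Hz up to rate/2
        acc = sum(
            windowed[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)
        )
        spectrum.append(20 * math.log10(abs(acc) + 1e-12))  # offset avoids log(0)
    return spectrum


def spectrogram(samples, frame_len=256, hop=128):
    """List of per-frame dB spectra: rows are time, columns are frequency bins."""
    return [
        frame_spectrum_db(samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def peak_frequency(spectrum_db, rate, frame_len):
    """Frequency (Hz) of the strongest bin in one spectrum."""
    k = max(range(len(spectrum_db)), key=spectrum_db.__getitem__)
    return k * rate / frame_len


if __name__ == "__main__":
    rate = 8000
    # A 1000 Hz tone stands in for a recording while checking the script
    tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(1024)]
    spec = spectrogram(tone)
    print(len(spec), "frames;", peak_frequency(spec[0], rate, 256), "Hz peak")
```

With two spectrograms of the same word, differences in formant positions and in the smoothness of transitions between frames are good places to start looking.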