At the moment we have found a few sources for phonemic transcriptions
and some stress assignments for 100,000 or so words. There are quite
a few problems of consistency and notation, but in general it is a
reasonable start.
It occurred to me that we should aim to build a lexicon in which the
pronunciation is expressed in terms of a prosodic hierarchy rather than
as a flat segment string. Such a structure would combine the phonetic
content with the stress pattern. The reason is that for real TTS we
will want to concatenate and adjust such structures, and it seems a
little odd to build them from a phonemic transcription during synthesis
(which would make that conversion part of the 'rules' of synthesis
rather than of the lexicon). Given also that morphological information
seems to influence how words are timed, this too should be stated in
our lexical entries.
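To make this concrete, here is a rough sketch of what such an entry
might look like (a Python sketch only; the class names, fields, segment
labels, syllabification and the example word are all illustrative, not
a proposed format):

from dataclasses import dataclass, field
from typing import List

@dataclass
class Syllable:
    onset: List[str]      # consonant segments before the vowel
    nucleus: List[str]    # the vowel (or syllabic consonant)
    coda: List[str]       # consonant segments after the vowel
    stress: int = 0       # 0 = unstressed, 1 = primary, 2 = secondary

@dataclass
class WordEntry:
    orthography: str
    syllables: List[Syllable]   # phonetic content and stress pattern held together
    morphology: List[str] = field(default_factory=list)  # e.g. stem and affixes, where known

# Purely illustrative entry for "greasy"; the transcription and
# morphological split are placeholders, not a committed notation.
greasy = WordEntry(
    orthography="greasy",
    syllables=[
        Syllable(onset=["g", "r"], nucleus=["i:"], coda=[], stress=1),
        Syllable(onset=["s"], nucleus=["i"], coda=[], stress=0),
    ],
    morphology=["grease", "-y"],
)

The point of holding the entry this way is that concatenation and
adjustment at synthesis time can operate directly on syllables and
their stress, rather than re-deriving that structure from a segment
string each time.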
So, ..., how do we go about building such a lexicon and how big should it
be? I would have thought that a lexicon containing all the words in the
Phase 1 Database recordings would be a minimum.
Another question: who will take responsibility? And another: what
software have we got to help build it?
Mark