1. Communicating with each other.
Vote of thanks to Mark for setting up prosynth. I think the
archive probably *should* have a password, if the alternative is
that anyone can log in and read our discussions.
Another issue which will come up soon will be that of
transferring files to each other, which was such a horrendous
aspect of preparing our bid. Now I have taken steps to bring my
system up to speed on this: shortly (within a week or so) I shall
be using a new PC with Windows95, Eudora, Word, and a decent Web
browser. Yay!! So I should at least be able to share files with
Mark and Sarah.  I can't remember what York's compatibility
problems were. I hope we can establish a protocol for this at
our project meeting (item for agenda).
2. Project meeting.
Any advance on this? I teach till 4 on Thursdays, so if you want
to combine EPSRC with Siphtra, my availability would be:
Thur 16 4-7 (or so)
Fri 17 9-11; 5-7 (rather a heavy day...)
Sat 18 a.m.?
Don't suppose any of us relishes the idea of Sat, but it may be
the least worst option, as it were. Otherwise a whole separate
journey required.
Do we want to invite BT along at this point? My own preference
would be to invite them at a later date but this may be silly if
they can help with setting up.
3. Mark's document.
I think this looks very promising, and I look forward to seeing
Mark present the ideas at our meeting (with examples, as I am not
clever enough to be able to conceptualise/visualise exactly what
the formalism will look like). I strongly approve of the
proposals in paras 3 - 5, and need to think further about the
implications of 8. Phonetic interpretation of f0 ideally
requires finer resolution than a 10 ms frame anyway (microprosody).
To be discussed.
4. Extending/interfacing with parser.
I'm assuming Alex's existing parser is reliable enough for
immediate purposes when it comes to segmenting text into
intonational phrase (IP)-sized chunks. If we were starting from
scratch I'd probably go for a shallow parse-for-prosody, but
since we've already got something that works well with a deep
parse, I'd be reluctant to throw anything away unless it proves
cumbersome.
We now need some way of deriving the prosodic structure within
the IP-sized chunks. The data we record for analysis can of
course be hand-annotated for prosodic structure, but for
synthesis we need the output of the parser to yield the correct
structure automatically. Some preliminary thoughts:
Within IPs, we want to be able to identify
* accent groups
* feet
* phonological words
* syllables
at least. For this task, we need information not available in
an orthographic text, so a transcription of some kind will be
necessary. Do we know whether there are dictionary-based
spelling-to-sound systems available which we could plug in to
give us a start? (Laureate??) For limited data we can do
transcriptions by hand just to get going.
* Parsing into syllables is fraught with problems if we need to
insert syllable boundary markers, since syllabification decisions
would have to be made, which goes against the useful concept of
ambisyllabicity. Is there any way of doing a "fuzzy" parse --
identifying syllable nuclei (usually vowels) so we know how many
syllables we're dealing with, and allocating consonants later?
Or should we have a default (maximise onsets, maximise
ambisyllabicity...) from which to deviate? This may in the end
be a non-problem if we get the higher levels of prosodic
structure right, and apply the correct principles of
syllabification on the basis of this structure...
* Words: these are to an extent "given" in the orthographic
string, but grouping into phonological words through
cliticisation needs to be done.
* Feet: these require knowledge of lexical stress, and also a
differentiation between strong and weak syllables.
* Accent groups: these ultimately need access to pragmatic
information but, failing that, minimally require access to the
same info as for feet, plus anything that can be gleaned about
phonological compounds.
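
To make the "fuzzy" syllable parse above a bit more concrete, here is a
rough sketch in Python. It first finds the nuclei only (so we know the
syllable count and which consonant runs are still unallocated), and then
shows one possible default, maximise onsets, from which we could deviate.
The symbol set, the list of legal onsets and the function names are all
made up for illustration; they are not taken from Alex's parser or from
any existing system, and ambisyllabicity is not modelled at all.

```python
# Hypothetical, deliberately tiny symbol inventory; a real system would take
# these from whatever transcription scheme the project adopts.
VOWELS = {"a", "e", "i", "o", "u", "@", "I", "E", "V", "U", "Q"}

# Illustrative subset of legal onsets, NOT a complete list for English.
LEGAL_ONSETS = {
    (), ("p",), ("t",), ("k",), ("s",), ("r",), ("l",), ("n",), ("m",),
    ("s", "t"), ("t", "r"), ("p", "r"), ("p", "l"), ("s", "t", "r"),
}


def fuzzy_parse(phones):
    """Find syllable nuclei and leave the consonants unallocated.

    Returns (nuclei, clusters): nuclei holds the indices of the vowels;
    clusters[i] is the consonant run before nucleus i, and clusters[-1]
    is the run after the last nucleus.
    """
    nuclei = [i for i, p in enumerate(phones) if p in VOWELS]
    clusters = []
    prev = 0
    for n in nuclei:
        clusters.append(list(phones[prev:n]))
        prev = n + 1
    clusters.append(list(phones[prev:]))
    return nuclei, clusters


def maximise_onsets(phones):
    """One possible default: give each medial cluster's longest legal
    suffix to the following syllable, and the remainder to the preceding
    one as coda."""
    nuclei, clusters = fuzzy_parse(phones)
    if not nuclei:
        return [list(phones)]
    syllables = []
    # The word-initial cluster always goes with the first nucleus.
    current = clusters[0] + [phones[nuclei[0]]]
    for k in range(1, len(nuclei)):
        cluster = clusters[k]
        split = len(cluster)
        # Try the whole cluster first, then ever shorter suffixes; the
        # empty onset is always legal, so the loop always finds a split.
        for j in range(len(cluster) + 1):
            if tuple(cluster[j:]) in LEGAL_ONSETS:
                split = j
                break
        syllables.append(current + cluster[:split])
        current = cluster[split:] + [phones[nuclei[k]]]
    syllables.append(current + clusters[-1])
    return syllables
```

With a made-up transcription of "extra" as e k s t r @, fuzzy_parse
reports two nuclei with the run k s t r left unallocated between them,
and maximise_onsets would then split it as ek.str@. Whether that default
is the right one is exactly the question raised above.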
I won't presume to look into the internal composition of
syllables. I'm assuming that's going beyond any parser's remit.
I'm hoping to be able to include some empirical investigation
into the validity of domains such as phonological words and feet,
or what the most useful definition of them might be.
Enough for now as I have to go. Comments and suggestions
welcome.
Jill!