Top-level labelling of intonation

Mark Huckvale (mark@phonetics.ucl.ac.uk)
Fri, 07 May 1999 17:04:41 +0100

This message is directed at Jill and Jana but comments are welcome
from everyone.

I have been looking at how we can synthesise continuous prose using
the XML processing architecture. I have adapted the main utterance
processing programs (uttmake, uttclass, uttcomp, prx, uttmbrol) so that
they can accept a _sequence_ of utterances, rather than a single
utterance.

I have also put in some elementary code for reading XML marked up
text as input. The question now becomes: what _text_ markup is required
to specify the construction of the upper levels of the prosodic
hierarchy?

The elementary code accepts input in this form:

<?xml version='1.0'?>
<PROSYNTH>
<IP>they `say it's a `lie</IP>
<IP>in the `rigging</IP>
</PROSYNTH>

Here, the <IP> tags specify intonational phrase boundaries. I believe
this is always possible since IP boundaries are also word boundaries.
The identification that a word carries stress is made with the reverse
quote mark (which can occur anywhere in the spelling of the word).
If a word is marked for stress, then any stressed syllables in the
lexical pronunciation become heads of feet. ("heads of feet"?!).
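
As a rough illustration of reading this input form, here is a minimal Python sketch. It is not the actual reader in uttmake or the other programs; the function name read_phrases and the (word, stressed) tuple layout are assumptions made for the example. It splits each <IP> on whitespace and treats a reverse quote anywhere in a word as the stress mark.

```python
import xml.etree.ElementTree as ET

# The sample input from the message, unchanged.
SAMPLE = """<?xml version='1.0'?>
<PROSYNTH>
<IP>they `say it's a `lie</IP>
<IP>in the `rigging</IP>
</PROSYNTH>
"""


def read_phrases(xml_text):
    """Return one list per <IP>, each holding (word, stressed) pairs.

    Hypothetical helper: a word counts as stressed if a reverse quote
    occurs anywhere in its spelling; the mark is then removed.
    """
    root = ET.fromstring(xml_text)
    phrases = []
    for ip in root.iter("IP"):
        words = []
        for token in (ip.text or "").split():
            stressed = "`" in token
            words.append((token.replace("`", ""), stressed))
        phrases.append(words)
    return phrases


if __name__ == "__main__":
    for number, phrase in enumerate(read_phrases(SAMPLE), start=1):
        print(number, phrase)
```

Mapping a stressed word onto foot heads would still need the lexical pronunciation, which this sketch does not attempt.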

I now need two other pieces of information: (i) how do I decide
where accent group boundaries come, and (ii) how do I specify the
features on AG nodes in the prosodic hierarchy?

Note that it is a trivial matter to put any IP features into the
elementary mark-up scheme, so the issues are only to do with AGs.

Currently AG boundaries are coterminous with FOOT boundaries - but
this is only because the database is like that. Since in a sense
the reverse quote character marks foot boundaries (to within the
nearest word) we could consider having two types of stress mark:
one which was both a FOOT & AG boundary, and one which was just
a FOOT boundary.
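
To make the two-mark idea concrete, here is a small Python sketch. The choice of characters is purely an assumption for illustration: the existing reverse quote is taken to mean FOOT only, and a hypothetical "^" is taken to mean FOOT and AG together. Nothing in the current programs uses "^", and the names mark_boundaries and group_accent_groups are invented for the example.

```python
FOOT_ONLY = "`"    # existing stress mark, read here as FOOT boundary only (assumption)
FOOT_AND_AG = "^"  # hypothetical second mark: FOOT boundary and AG boundary


def mark_boundaries(phrase):
    """Tag each word with whether it starts a foot and/or an accent group."""
    marked = []
    for token in phrase.split():
        starts_ag = FOOT_AND_AG in token
        starts_foot = starts_ag or FOOT_ONLY in token
        word = token.replace(FOOT_ONLY, "").replace(FOOT_AND_AG, "")
        marked.append({"word": word, "foot": starts_foot, "ag": starts_ag})
    return marked


def group_accent_groups(marked):
    """Split the words into groups, opening a new group at every AG mark.

    Words before the first AG mark are collected in a leading group;
    how such material should really be attached is left open.
    """
    groups = []
    for entry in marked:
        if entry["ag"] or not groups:
            groups.append([])
        groups[-1].append(entry["word"])
    return groups


if __name__ == "__main__":
    marked = mark_boundaries("they ^say it's a `lie")
    print(group_accent_groups(marked))
```

With this reading, "they ^say it's a `lie" has two feet (starting at "say" and "lie") but only one marked accent group, starting at "say".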

This, however, doesn't solve the problem of the features to put on
the AG nodes. I am still not sure whether these are specifiable
by rule from features put on the IP nodes. If they are - fine; if
not then we need a mechanism to specify these too. Perhaps we
could use mark-up such as "they say it's a <AG TYPE='FALL'/>lie"
where the presence of the mark-up indicates the features to be used
on the next AG formed, whichever syllable it starts on.
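
A possible reading of this inline mark-up is sketched below in Python. Several things are assumptions: the attribute value is quoted (XML requires that), stress marks are kept alongside the AG element, a new AG is taken to start at the next stress-marked word, and the function name read_ip is invented. The empty <AG/> element sets a pending feature that is attached to that next stressed word.

```python
import xml.etree.ElementTree as ET

# Assumed form of the proposed mark-up, with a quoted attribute value.
EXAMPLE = "<IP>they `say it's a <AG TYPE='FALL'/>`lie</IP>"


def read_ip(ip_xml):
    """Return (word, stressed, ag_type) triples for one <IP> element.

    ag_type is the TYPE of the most recent <AG/> element, handed to the
    next stress-marked word and then cleared; otherwise it is None.
    """
    ip = ET.fromstring(ip_xml)
    words = []
    pending = None  # feature waiting for the next AG to be formed

    def add_text(text):
        nonlocal pending
        for token in (text or "").split():
            stressed = "`" in token
            ag_type = None
            if stressed and pending is not None:
                ag_type, pending = pending, None
            words.append((token.replace("`", ""), stressed, ag_type))

    add_text(ip.text)          # text before the first child element
    for child in ip:
        if child.tag == "AG":
            pending = child.get("TYPE")
        add_text(child.tail)   # text following this child element
    return words


if __name__ == "__main__":
    for triple in read_ip(EXAMPLE):
        print(triple)
```

Because the feature waits for the next AG rather than the next word, the element can sit anywhere before the syllable on which that AG eventually starts.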

So I need your ideas. Perhaps you (Jill/Jana) could try to write
out a list of possible phrases and we could see how they might
be marked up. I can then concentrate on the specification for
the Fx contour itself and how to predict it from the features
on the IP and AG nodes.

Mark