Durations etc

Paul Carter (pgc104@york.ac.uk)
Mon, 5 Jul 1999 12:49:34 +0100 (BST)

Next message: Sarah Hawkins: "RESONANCE DATABASE 1"
Previous message: Mark Huckvale: "DBASE2 and RESON1 recordings on FTP site"

I've produced a simple durational model for syllables which runs in place
of Mark's Klatt durations model.

The control script and the ProXML script are on the ftp site, in pub/temp.
The files are say001.sh.crypt (control script) and durations001.prx.crypt
(ProXML script). The control script will need to be edited to run on your
local system. The control script includes a fix to alter the xml files to
have rhyme VOI="Y" in open syllables (this fix was taken into account in
building the model). For this you will also need the ProXML script
correct_checked.prx.crypt (also on the ftp site, though hardly worth it
in this case!).

The model is based entirely on a straightforward regression tree analysis
of the database using only the structure and the features that are in the
database (with one small exception, outlined below).

Trees are grown at syllable level, then all further calculations are split
by the decisions made at that level, so trees for onset level are split by
strength and weight, since that split comes out of decisions made at
syllable level (an alternative model might ignore weight at onset level,
since weight as an attribute does not occur at the onset). This procedure
is then repeated down to terminal node level. This has the effect that
terminal nodes can only see up branches, not down again, so (for example)
vowel duration is not here dependent on manner of the following consonant.

The only deviation from this is that I included a knowledge of sequencing
for CNS and VOC, so the model for CNS is sensitive to existence of, and
position in, clusters; similarly, the VOC model is sensitive to existence
of, and position in, diphthongs.

This means that the more sophisticated linguistic decisions we want to
make (such as the sitter/sister distinction in final syllables Richard
referred to) are not built into this model. This is why I say this is a
simple model.

Another aspect of its simplicity is that it is entirely multiplicative:
all decisions are associated with factors.
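As a rough illustration of what "entirely multiplicative" means, here is a
minimal Python sketch (not the ProXML script itself). It assumes the
factors picked up along one path through the trees simply multiply
together. Only 0.2596 comes from the script; the other factor values and
the function name are invented for the example.

```python
from math import prod

# 0.2596 is the SYL factor from the script; everything else is hypothetical.
SYL_BASE = 0.2596

def path_duration(factors):
    """Multiply the SYL base by each factor chosen on the way down
    from the syllable to one terminal node."""
    return SYL_BASE * prod(factors)

# Invented path: one factor per decision (e.g. strength, onset, cluster position).
example = path_duration([1.2, 0.4, 0.9])

# With no further decisions the result is just the SYL base.
base_only = path_duration([])
```

Because nothing is ever added, the order in which the decisions are
applied does not affect the result.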

Since no information above syllable level is incorporated, and the
database has a very high proportion of utterance-final syllables, the
model produces durations that are clearly too long when you synthesise
long phrases. The first block of the code

SYL {
.:DUR *= 0.2596;
}

effectively sets a syllable duration. To speed up the whole utterance,
try adding another factor (JKL suggests something like 0.56):

SYL {
.:DUR *= 0.2596;
.:DUR *= 0.56;
}
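The effect of the extra line is plain arithmetic, sketched here in Python
rather than ProXML. The names are made up for the illustration, and 0.56 is
only the suggested starting value, so treat it as something to tune.

```python
BASE = 0.2596   # factor from the first SYL block
RATE = 0.56     # extra speed-up factor (the value JKL suggests)

def apply_syl_block(dur, rate=RATE):
    """Mimic the two successive '*=' lines in the SYL block."""
    dur *= BASE
    dur *= rate
    return dur

# Two successive multiplications behave like one combined factor
# of roughly 0.145.
combined = BASE * RATE
```

A rate of 1.0 reproduces the original block, so the second factor can be
adjusted without touching the rest of the model.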

The next block of the code is a fix to get round the fact that uttmbrol
expects Klatt-type MINDUR and INHDUR attributes if it finds DUR. Since
this model uses DUR, I've added

CNS,VOC {
:INHDUR=1;
}

to fool uttmbrol into factoring out MINDUR and INHDUR. Both are set to 0
in the defaults, so with INHDUR set to 1 and MINDUR left at 0, Klatt's
formula

Duration=MINDUR+(INHDUR-MINDUR)*DUR

reduces to

Duration=DUR
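The reduction can be checked with a few lines of Python. This is only a
sketch of the formula as quoted above, not of uttmbrol's actual code, and
the function name is made up.

```python
def klatt_duration(dur, mindur=0.0, inhdur=0.0):
    """Klatt's formula: Duration = MINDUR + (INHDUR - MINDUR) * DUR."""
    return mindur + (inhdur - mindur) * dur

# With both defaults at 0 the formula collapses to 0 whatever DUR is.
with_defaults = klatt_duration(0.3)

# Setting INHDUR to 1 (MINDUR still 0) passes DUR straight through.
with_fix = klatt_duration(0.3, inhdur=1.0)
```

This is why the INHDUR=1 line is needed: without it, every CNS and VOC
would come out with zero duration.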

The rest of the code comprises the decisions based on regression tree
analysis, in this order:

SYL {}
RHYME {}
NUC {}
ONSET {}
CODA {}
ACODA {}
VOC {}
CNS {}

CNS in ACODA decisions are built into the ACODA block, just to simplify
the coding.

There are two problems I've identified with the output, both of which I
think are issues relating to the parser rather than this model. Firstly,
there are some incorrect phone selections appearing in the MBROLA control
files (e.g. "it's" as [i: t s], "parchment" as [p A: t S m 3: n t]; NB
[t S], not [tS]). Secondly, anything with an ACODA won't work. I get an
error message about ACODA not having a DUR attribute, though the xml files
in the database do have DUR for ACODA. I am about to code a fix for this,
but it would be better if it got sorted out properly at some stage.

I hope this makes some sort of sense.

Best wishes,
Paul

_ Paul Carter ________________________________________ pgc104@york.ac.uk _
Dept of Language & Linguistic Science|http://www-users.york.ac.uk/~pgc104/
University of York |telephone: +44 (0)1904 432660
Heslington, York. YO10 5DD |fax: +44 (0)1904 432673


This archive was generated by hypermail 2.0b3 on Mon Jul 05 1999 - 12:50:53 BST