CSL paper


Richard Ogden (rao1@york.ac.uk)
Thu, 10 Jun 1999 11:34:22 +0100 (BST)


Dear all,

First year exams/marking are over: I've done a bit of work on our CSL
paper. Enclosed in RTF format.

I have integrated everyone's text into what we already had; some of it
doesn't appear quite where you expected, because I had to stitch things
together. Please let me know where you think things jar, are not clear
enough, etc. Excision/movement/replacement would be preferable to new
text, unless it's just a line or two.

Here are what I think are very obvious structural gaps:
* we haven't put any test results in. Are we going to?
* we have no new text on XML. If we need it, please supply it.
* the bibliography is a total mess! (but can be left till the end)
* we should probably include a few more diagrams/figures

Do we want to add more specific details about our respective work? eg.
temporal model, f0 modelling, /r l/ work, procsy? If so I suggest we limit
it to a couple of paragraphs each. (Paul, Jana, Sebastian: feel free to
contribute!)

Length: at the moment it's about 13 pages, 1.5 space, 12-pt Times. I guess
it shouldn't grow much more.

Could I ask you to give me some response by the end of next week (i.e.
June 18) if at all possible, please? We *must* get moving if we want this
out soon.

Richard

Richard Ogden
rao1@york.ac.uk
http://www.york.ac.uk/~rao1/

ProSynth: An Integrated Prosodic Approach to Device-Independent, Natural-Sounding Speech Synthesis
Paul Carter***, Jana Dankovicová**, Sarah Hawkins*, Sebastian Heid*, Jill House**, Mark Huckvale**, John Local***, Richard Ogden***

* University of Cambridge, ** University College, London, *** University of York
ABSTRACT

This paper outlines ProSynth, an approach to speech synthesis which takes a rich linguistic structure as central to the generation of natural-sounding speech. We start from the assumption that the speech signal is informationally rich, and that this acoustic richness reflects linguistic structural richness and underlies the percept of naturalness. Naturalness achieved through structural richness yields a perceptually robust signal that remains intelligible in adverse listening conditions. ProSynth uses syntactic and phonological parses to model the fine acoustic-phonetic detail of real speech, segmentally, temporally and intonationally. [[In this paper, we present the results of some preliminary tests to evaluate the effects of modelling timing, intonation and fine acoustic-phonetic detail.]]
1. INTRODUCTION
Speech synthesis by rule (text-to-speech, TTS) has restricted uses because it sounds unnatural and is often difficult to understand. Despite recent improvements in grammatical analysis and in deriving correct pronunciations for irregularly-spelled words, there remains a more fundamental problem: the inherent incoherence of the synthesized acoustic signal. It typically lacks the subtle systematic variability of natural speech that underlies the perceptual coherence of syllables and their constituents, and of the longer phrases of which they form part. Intonation is often dull and repetitive, timing and rhythm are poor, and the modifications that word boundaries undergo in connected speech are poorly modelled. Much of this incoherence arises because many modern TTS systems encode linguistic knowledge in ways which are not in tune with current understanding of human speech and language processes.
Segmental intelligibility data illustrate the scale of the problem. When heard in noise, most synthetic speech loses intelligibility much faster than natural speech: natural speech is about 15% less intelligible at a 0 dB signal-to-noise ratio than in quiet, whereas for isolated words/syllables Pratt (1986) reported that typical synthetic speech drops by 35%-50%. We can expect similar results today. Concatenated natural speech avoids those problems related solely to voice quality and local segment boundaries, but suffers just as much from poor models of timing, intonation, and systematic variability in segmental quality that is dependent on word and stress structure. Even when the grammatical analysis is right, one string of words can sound good while another with the same grammatical pattern does not.
Interdependencies between grammatical, prosodic and segmental parameters are well known to phoneticians and to everyone who has synthesized speech. When these components are developed for synthesis in separate modules, the apparent convenience is offset by the need to capture the interdependencies, which often leads to problems of rule ordering and of rule proliferation to correct the effects of earlier rules. Much of the robustness of natural speech is lost by neglecting systematic subphonemic variability, a neglect that results partly from an inappropriate emphasis on phoneme strings rather than on linguistic structure. Recent research in computational phonology (e.g. Bird 1995, Dirksen & Coleman forthcoming) combines highly structured linguistic representations (more technically, signs) with a declarative, computationally tractable formalism. Recent research in phonetics (e.g. Simpson 1992, Manuel et al. 1992, Hawkins & Slater 1994, Manuel 1995, Zsiga 1995) shows that speech is rich in non-phonemic information which contributes to its naturalness and robustness (Hawkins 1995). Work at York (Local 1992a & b, 1994, 1995a & b, Local & Fletcher 1991a & b, Ogden 1992) has shown that it is possible to combine phonological with phonetic knowledge by means of a process known as phonetic interpretation: the assignment of phonetic parameters to pieces of phonological structure. Listening tests have shown that the synthetic speech generated by YorkTalk is interpreted and misinterpreted by listeners in ways very like those found for natural speech (Local 1993).
ProSynth, an integrated prosodic approach to speech synthesis, explores the viability of a phonological model that addresses phonetic weaknesses found in current concatenative and formant-based text-to-speech (TTS) systems, in which the speech often sounds unnatural because the rhythm, intonation and fine phonetic detail reflecting coarticulatory patterns are poor. Building on [1, 2, 3, 4], ProSynth integrates and extends existing knowledge to produce the core of a new model of computational phonology and phonetic interpretation which will deliver high-quality speech synthesis. Key objectives are: (1) demonstration of selected parts of a TTS system constructed on linguistically-motivated, declarative computational principles; (2) a system-independent description of the linguistic structures developed; (3) perceptual test results using criteria of naturalness and robustness. To test the viability of our approach initially, we use a set of representative linguistic structures applied to Southern British English. The three focal areas of research are intonation, morphological structure, and systematic segmental variation.
2. The Phonological Model
In this section, we describe the phonological model used in ProSynth in more detail and motivate the reasons for modelling 'segmental', temporal and intonational fine detail.
2.1 Overview
Central to ProSynth is a model which encodes phonological information in a hierarchical (rather than merely linear) fashion, using structures based on attribute-value pairs. A declarative framework based on constraint satisfaction identifies a complete metrical context for each phonological unit. This context is a prosodic hierarchy with phonological contrasts available at all levels. The complex interacting levels of rules present in traditional layered systems are replaced in ProSynth by a single phonetic interpretation function operating on the entire context. Phonetics is related to phonology via this one-step phonetic interpretation function, which makes use of as much linguistic knowledge as necessary. Systematic phonetic variability is constrained by position in structure, not by a set of phonological rules. The basis of phonetic interpretation is not the segment, but phonological features at places in structure. These declarative principles have been successfully demonstrated in YorkTalk (Local & Ogden 1997; Local 1992) for structures of up to three feet. We thus extend the principle successfully demonstrated in [3, 4] to larger phonological domains.
2.2 Phonetic detail and perceptual coherence
More acoustic-phonetic fine detail is included in ProSynth than is standard in synthetic speech, consistent with the view that the signal will be more robust when it includes the patterns of systematic phonetic variability found in natural speech. This view is based on the argument that it is the informational richness of natural speech that makes it such an effective communicative medium. By informational richness, we mean that the acoustic fine detail of the time-varying speech signal reflects multidimensional properties of both vocal-tract dynamics and linguistic structure. The well-known "redundancy" of the speech signal, whereby a phone can be signalled by a number of more-or-less co-occurring acoustic properties, contributes some of this richness, but in our view other, less well-documented properties are just as important. These properties can be roughly divided into two groups: those that make the speech signal sound as if it comes from a single talker, and those that reflect linguistic structure; or, if you will, those that make it sound as if the talker is using a consistent accent and style of speech.
A speech signal sounds as if it comes from a single talker when its properties reflect details of vocal-tract dynamics. This type of systematic variability contributes to the fundamental acoustic coherence of the speech signal, and hence to its perceptual coherence: listeners associate these time-varying properties with human speech, so that when they bear the right relationships to one another, the perceptual system groups them together into an internally coherent auditory stream (cf. Bregman 199xx, Remez 19xx). A wide range of properties seems to contribute to perceptual coherence. The influence of some, like patterns of formant frequencies, is widely acknowledged (cf. Remez and Rubin 19xx Science paper). Others are known to be important but are not always well understood; examples are the amplitude envelope, which governs some segmental distinctions (cf. Rosen and Howell 19xx) as well as perceptions of rhythm and of 'integration' between stop bursts and following vowels (van Tasell, Soli et al. 19xx); and correlations between the mode of glottal excitation and the behaviour of the upper articulators, especially at abrupt segment boundaries (Gobl and Ní Chasaide 19xx).
A speech signal sounds as if the talker is using a consistent accent and style of speech when the allophonic variation is right. This requires producing often small distinctions that reflect different combinations of linguistic properties. As an example, take the words mistakes and mistimes. Most people have no difficulty hearing that the /t/ of mistimes is aspirated whereas that of mistakes is not. The two words also have quite different rhythms: the first syllable of mistimes has a heavier beat than that of mistakes, even though the words begin with the same four phonemes. The spectrograms of the two words in Figure xx confirm the differences in aspiration of the /t/s, and also show that the /m/, /I/ and /s/ have quite different durations in the two words, consistent with the perceived rhythmic difference. These differences arise because the morphology of the words differs: mis- is a removable prefix in mistimes, but in mistakes it is part of the word stem. These morphological differences are reflected in the syllable structure, as shown on the right of the Figure. In mistimes, /s/ is the coda of syllable 1 and /t/ is the onset of syllable 2, so the /s/ is relatively short, the /t/ closure is long, and the /t/ is aspirated. Conversely, the /s/ and /t/ in mistakes are ambisyllabic, which means that they form both the coda of syllable 1 and the onset of syllable 2. In an onset /st/, the /t/ is always unaspirated (cf. step, stop, start). The differences in the /m/ and the /I/ arise because mist is a phonologically heavy syllable whereas mis is phonologically light, and both syllables are metrically weak. So, in these metrically weak syllables, differences in morphology create differences in syllabification and phonological weight, and these appear as differences in duration or aspiration across all four initial segments.
Legend to Figure xx. Left: spectrograms of the words mistimes (top) and mistakes (bottom) spoken by a British English woman in the sentence "I'd be surprised if Tess _______ it" with main stress on Tess. Right: syllabic structures of each word.
Some types of systematic variability may contribute both to perceptual coherence and to information about linguistic structure. So-called resonance effects (Kelly and Local 1989) provide one example. Resonance effects associated with /r/, for example, manifest acoustically as lowered formant frequencies, and can spread over several syllables, but the factors that determine whether and how far they will spread include syllable stress, the number of consonants in the onset of the syllable, vowel quality, and the number of syllables in the foot (Slater and Hawkins 199x, Tunley 1999). The formant lowering probably reflects slow movements of the tongue body as it accommodates to the complex requirements of the English approximant /r/. On the one hand, including this type of information in synthetic speech makes it sound more natural in a subtle way that is hard to describe in phonetic terms but seems to make the signal "fit together" better; in other words, it seems to make it more coherent. On the other hand, the fact that the temporal extent of rhotic resonance effects depends on linguistic structure means not only that cues to the identity of a single phoneme can be distributed across a number of acoustic segments (sometimes several syllables), but also that aspects of the linguistic structure of the affected syllable(s) can be subtly signalled.
Listeners can use this type of distributed acoustic information to identify naturally-spoken words (Marslen-Wilson and Warren 199x; other MWW refs (Gaskell?); Hawkins and Nguyen, submitted, LabPhon), and when it is included in synthetic speech it can increase phoneme intelligibility in noise by 10-15% or more (Slater and Hawkins, Tunley). Natural-sounding, systematic variation of this type may be especially influential in adverse listening conditions or when cognitive loads are high (cf. Pisoni in van Santen book, Pisoni and Duffy 19xx; SH check these refs) because it is distributed, thus increasing the redundancy of the signal. However, Heid and Hawkins (1999, ICPhS) found similar increases in phoneme intelligibility simply by manipulating the excitation type at fricative-vowel and vowel-fricative boundaries and in the closure periods of voiced stops; these improvements to naturalness were quite local. Thus, although only some of the factors mentioned above have been shown to influence perception, on the basis of our own and others' recent work (Slater and Hawkins, Tunley, Heid & Hawkins ICPhS 1999; Pisoni in van Santen book, Pisoni and Duffy 19xx, Kwong and Stevens 1999), we suggest that most of those whose perceptual contribution has not yet been tested would prove to enhance perception in at least some circumstances, as developed below. [xx This para is not great but will have to do for now.]
2.3 The Prosodic Hierarchy
Our declarative phonological structure makes extensive use of a prosodic hierarchy, with phonological information distributed across the structure. The knowledge is formally represented as a Directed Acyclic Graph (DAG) which is constrained so that re-entrant nodes are found only at the terminal level. Tree-shaped graph structures of this kind are common in phonological analysis; the important addition here is ambisyllabicity. Phonological attributes and their associated values are distributed around the entire prosodic hierarchy, rather than concentrated at the terminal nodes as in many phonological theories. Attributes at any level in the hierarchy may be accessed for use in the temporal model.

Text is parsed into a prosodic hierarchy which has units at the following levels: syllable constituents (Onset, Rhyme, Nucleus, Coda); Syllable; Foot; Accent Group; Intonational Phrase. Linguistic contrast can occur at each level in the hierarchy.
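As a concrete picture of this organisation, here is a minimal sketch in Python. The Node class and the example values are ours, invented for exposition; they are not the project's code.

    class Node:
        """One constituent in the prosodic hierarchy (illustrative only)."""
        def __init__(self, category, **attributes):
            self.category = category        # e.g. "Syllable", "Onset", "CNS"
            self.attributes = attributes    # attribute-value pairs at any level
            self.children = []

        def add(self, child):
            self.children.append(child)
            return child

    # "loving": the terminal /v/ is ambisyllabic, so one Node serves as both
    # the Coda of syllable 1 and the Onset of syllable 2: the only kind of
    # re-entrancy the constrained DAG allows.
    v = Node("CNS", ambisyllabic=True)
    syl1 = Node("Syllable", strength="strong", weight="light")
    syl2 = Node("Syllable", strength="weak")
    syl1.add(Node("Rhyme")).add(Node("Coda")).add(v)
    syl2.add(Node("Onset")).add(v)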
[Figure 1 here: partial tree structure for "It's a lie", showing an IP ① dominating AG ②, Foot ③ and Syllable ④ nodes (each Syllable carrying a feature bundle, e.g. <features for /laI/>), with syllable constituents Onset ⑦, Rhyme ⑤ and Nucleus ⑥; vertical lines indicate headedness.]

Fig. 1. Partial tree structure of the utterance "it's a lie". Indices (such as ①) relate to the XML structure in Fig. 2.
Our prosodic hierarchy, building on House & Hawkins (1995) and Local & Ogden (1997), is a relatively flat, head-driven, strictly layered (Selkirk 1984) structure. Each unit is dominated by a unit at the next highest level (Strict Layer Hypothesis [10]). This produces a linguistically well-motivated and computationally tractable hierarchy. Constituents at each level have a set of possible attributes; relationships between units at the same level are determined by the principle of headedness, and structure-sharing is explicitly recognised through ambisyllabicity.
(Fig. e.g. as in ICPhS paper, but expand to show syllabic constituents properly. Could be more elaborate, to include degenerate Foot/AG, and attributes on selected nodes.)

    IP

    AG              AG

    F       F       F

    S   S   S   S   S
    Fin (d) a   be (tt) er  one

Figure 1. Supra-syllabic tree structure for "Find a better one".
The richness of the hierarchy comes from the information stored within structural nodes in the form of attributes and parameter values. Attributes of the IP, for example, include discourse information which will determine the choice of intonation pattern. The IP consists of one or more Accent Groups (AGs), which in turn include as attributes specifications for the individual pitch accents making up the intonation contour. The rightmost AG, the traditional intonational nucleus, acts as the head of the IP.

There is no separate level of phonological word within our hierarchy. Such a unit does not sit happily in a strictly layered structure: the boundaries of prosodic constituents like AG and Foot may well occur in the middle of a lexical item. Conversely, word boundaries may occur in the middle of a Foot/AG. Lexico-grammatical information may nonetheless be highly relevant to phonetic interpretation and must not be discarded. The computational representation of our prosodic structure, using the extensible mark-up language (XML) (described in more detail below xxx), allows us to get round this problem: word-level and syntactic-level information is hyperlinked into the prosodic hierarchy. In this way lexical boundaries and the grammatical functions of words can be used to inform phonetic interpretation.
Units of Structure and their Attributes
[[WOULD THIS SECTION BENEFIT FROM SOME MORE PICTURES?]]
Input text is parsed into head-driven syntactic and phonological hierarchical structures. The phonological parse allots material to places in the prosodic hierarchy and is supplemented with links to the syntactic parse. The lexicon itself is in the form of a partially parsed representation. Phonetic interpretation may be sensitive to information at any level, so that it is possible to distinguish, for instance, a plosive in the onset of a weak foot-final syllable from an onset plosive in a weak foot-medial syllable.
Headedness: When a unit branches into sub-constituents, one of these constituents is its Head. If the leftmost constituent is the head, the constituent is said to be left-headed. Feet are left-headed. If the rightmost constituent is the head, the structure is right-headed. Properties of a head are shared by the nodes it dominates [11]. Therefore a [+heavy] syllable has a [+heavy] rhyme; the syllable-level resonance features [±grave] and [±round] can also be shared by the nodes they dominate: this is how coarticulation is modelled. Phonetic interpretation proceeds head-first and is therefore determined in a structurally principled fashion without resort to extrinsic ordering.
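Continuing the illustrative sketch above (our code, not ProSynth's), this sharing of properties down the structure might look like the following:

    def percolate(node):
        # Copy a node's attributes down to its daughters unless a daughter
        # already states its own value: a [+heavy] syllable thus yields a
        # [+heavy] rhyme, and [grave]/[round] spread to model coarticulation.
        # (A crude approximation: the model proper shares properties head-first.)
        for child in node.children:
            for attribute, value in node.attributes.items():
                child.attributes.setdefault(attribute, value)
            percolate(child)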
Phonological features: We use binary features, with each attribute having a value, where the value slot can itself be filled by another attribute-value pair. To our set of conventional features we add the features [±rhotic], to allow us to mimic the long-domain resonance effects of /r/ [5, 8], and [±ambisyllabic] for ambisyllabic constituents (see below). Not all features are stated at the terminal nodes in the hierarchy: [±voice], for instance, is a property of the rhyme as a whole, in order to model durational and resonance effects.
Syllables: The Syllable contains the constituents Onset and Rhyme. The Rhyme branches into Nucleus and Coda. Nuclei, onsets and codas can all branch. The syllable is right-headed, the rhyme left-headed. Attributes of the syllable are [weight] (values heavy/light) and [strength] (values strong/weak); these are necessary for the correct assignment of temporal compression (§2.5). Foot-initial syllables are strong.
Weight is defined with regard to the subconstituents of the Rhyme. A Syllable is heavy if its Nucleus attribute [length] has the value long (in segmental terms, if it contains a long vowel or a diphthong). A Syllable is also heavy if its Coda has more than one constituent. EXAMPLES
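Assuming the illustrative Node class above, with a [length] attribute on the Nucleus, the definition can be stated compactly; again this is our sketch, not the system's:

    def find(node, category):
        # Depth-first search for the first constituent of a given category.
        if node.category == category:
            return node
        for child in node.children:
            found = find(child, category)
            if found is not None:
                return found
        return None

    def is_heavy(syllable):
        # Heavy iff the Nucleus is [length: long] or the Coda branches.
        nucleus = find(syllable, "Nucleus")
        coda = find(syllable, "Coda")
        return ((nucleus is not None and nucleus.attributes.get("length") == "long")
                or (coda is not None and len(coda.children) > 1))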
There is no direct relationship between syllable strength and syllable weight. Strong syllables need not be heavy. In loving, /lʌv/ has a SHORT Nucleus and a Coda with only one constituent (corresponding to /v/), yet it is the strong syllable in the Foot. Similarly, weak syllables need not be light. In amazement, the final Syllable has a branching Coda (i.e. more than one constituent) and is therefore HEAVY but WEAK. ProSynth does not make use of extrametricality.
Ambisyllabicity: Constituents which are shared between syllables are marked [+ambisyllabic]. Ambisyllabicity makes it easier to model coarticulation [4] and is an essential piece of knowledge in the overlaying of syllables to produce polysyllabic utterances. It is also used to predict properties such as plosive aspiration in intervocalic clusters (§2.5).

Constituents are [+ambisyllabic] wherever this does not result in a breach of syllable structure constraints. Loving comprises two Syllables, /lʌv/ and /vɪŋ/, since /v/ is both a legitimate Coda for the first Syllable and a legitimate Onset for the second. Loveless has no ambisyllabicity, since /vl/ is neither a legitimate Onset nor a legitimate Coda. Clusters may be entirely ambisyllabic, as in risky (/rɪsk/+/ski/), where /sk/ is a good Coda and Onset cluster; partially ambisyllabic (i.e. one consonant is [+ambisyllabic] and one is [-ambisyllabic]), as in selfish (/sɛlf/+/fɪʃ/); or non-ambisyllabic, as in risk them (/rɪsk/+/ðəm/).
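The condition amounts to an intersection of phonotactic constraints. A toy rendering, with deliberately tiny stand-in onset and coda inventories (a real grammar of English would be far larger):

    LEGAL_ONSETS = {"v", "sk", "f", "l"}    # illustrative fragments only
    LEGAL_CODAS = {"v", "sk", "lf", "st"}

    def can_be_ambisyllabic(cluster):
        # [+ambisyllabic] only if legal as both a Coda and an Onset.
        return cluster in LEGAL_CODAS and cluster in LEGAL_ONSETS

    can_be_ambisyllabic("v")    # True:  "loving"
    can_be_ambisyllabic("sk")   # True:  "risky"
    can_be_ambisyllabic("vl")   # False: "loveless" has no ambisyllabicity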
Feet: All syllables are organised into Feet, which are primarily rhythmic units. The foot is left-headed, with a [+strong] syllable as its head, and includes any [-strong] syllables to the right. Types of feet can be differentiated using the attributes [weight], [strength] and [headedness]. A syllable with the values [+head, +strong] is stressed. Any phrase-initial weak syllables are grouped into a weak, headless foot: when an IP begins with one or more weak, unaccented syllables, we maintain our strictly layered structure by organising them into "degenerate" ([light]) feet, which are in turn contained within similarly [light] AGs.
Accent Groups (AG): AGs are made up of one or more Feet, which are primarily units of timing. An accented syllable is a stressed syllable associated with a pitch accent; an AG is a unit of intonation initiated by such a syllable, and incorporating any following unaccented syllables. The head of the AG is the leftmost heavy foot. A weak foot is also a weak, headless AG.

AG attributes include [headedness], pitch accent specifications, and positional information within the IP.
Intonational Phrase (IP): The IP, the domain of a well-formed, coherent intonation contour, contains one or more AGs; minimally it must include a strong AG. The rightmost AG, traditionally the intonational nucleus, is the head of the IP. It is the largest prosodic domain recognised in the current implementation of our model.
2.4 Segmental information
The temporal extent of systematic spectral variation due to coarticulatory processes is modelled using two intersecting principles. One reflects how much a given allophone blocks the influence of neighbouring sounds, and is akin to coarticulation resistance [12]. The other reflects resonance effects, or how far coarticulatory effects spread. The extent of resonance effects depends on a range of factors, including syllabic weight, stress, accent, position in the foot, vowel height, and featural properties of other segments in the domain of potential influence. For example, intervening bilabials let lingual resonance effects spread to more distant syllables, whereas other lingual consonants may block their spread; similarly, resonance effects usually spread through unstressed but not stressed syllables.
2.5 Temporal information
Timing relations in ProSynth are handled primarily in terms of (1) temporal compression and (2) syllable overlap. Like spectral detail, temporal effects are treated as an aspect of the phonetic interpretation of phonological representations. Linguistic information necessary for temporal interpretation includes a grammar of syllable and word joins, using ambisyllabicity and an appropriate feature system. Such details as formant transition times, and inherent durational differences between close and open vowels, are handled in the statements of phonetic exponency pertaining to each bundle of features at a given place in structure.
A model of temporal compression allows the statement of relationships between syllables at different places in metrical structure [3], using a knowledge database. For instance, the syllable /man/ in the words man, manage, manager and in the utterance "She's a bank manager" has a different degree of temporal compression in each case, which can be related to the metrical structure as a whole. The primary timing unit is the syllable.

Monosyllabic utterances are not compressed, and therefore have a compression factor of 1. Syllables in polysyllabic feet have a compression factor of less than 1. Distance is an expression of the separation between the end of an Onset and the start of a tautosyllabic Coda, and is used to calculate the temporal compression factor for any given syllable. Distance relates to structural information from the prosodic hierarchy, such as the position of the Syllable in the Foot (initial, medial or final), the presence or absence of a preceding Syllable, the position of the Foot in the utterance, and the values of various attributes at all nodes in the hierarchy at or below Syllable level.
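Since the compression formula itself is not given here, the following Python sketch only illustrates the shape of the computation; the function and all its constants are invented for exposition:

    def compression_factor(syllables_in_utterance, position_in_foot, distance):
        # Monosyllabic utterances are uncompressed (factor 1); syllables in
        # polysyllabic feet are compressed (factor < 1) as a function of
        # position in the Foot and of "distance" (here normalised to 0..1).
        # Every constant below is invented for illustration.
        if syllables_in_utterance == 1:
            return 1.0
        base = {"initial": 0.9, "medial": 0.75, "final": 0.8}[position_in_foot]
        return base * (0.8 + 0.2 * distance)   # greater distance, less compression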
Syllable overlap: By overlaying syllables to varying degrees (making reference to ambisyllabicity), it is possible to lengthen or shorten intervocalic consonants systematically. There are morphologically bound differences which can be modelled in this way, provided that the phonological structure is sensitive to them. For instance, the Latinate prefix in- is fully overlaid with the stem to which it attaches, giving a short nasal in innocuous, while the Germanic prefix un- is not overlaid to the same degree, giving a long nasal in unknowing. Differences in aspiration in pairs like mistake and mis-take can likewise be treated as differences in phonological structure and consequent differences in the temporal interpretation of those structures.
2.6 Intonational information
There is a dimension of paradigmatic choice in modelling intonation: the pitch pattern used is not predictable from structure but is determined by discourse factors. The pattern for an IP depends on the pitch accents assigned to AGs, and on boundary tones associated with the edges of domains. The interpretation of the selected pitch contour in terms of f0 is, like other phonetic parameters, structure-dependent. Precise alignment of contour turning-points is constrained by the properties of units at lower levels in the hierarchy.
3. Implementation

We have so far recorded and begun analysis of a speech database, and implemented our phonological representations using XML.
3.1 Design and Construction of a Database

[[So far, this text is Jill's. Perhaps there's a bit more to add, from CAM and YORK?]]
Analysis for modelling is based on a core speech database of over 450 utterances, recorded by a single male speaker of southern British English. Database speech files have been exhaustively labelled to identify segmental and prosodic constituent boundaries, using careful hand-correction of an automated procedure. F0 contours, calculated from a simultaneously recorded Laryngograph signal, can be displayed time-aligned with constituent boundaries.
The database has been designed to exemplify a subset of possible structures within which we can predict that we will find interesting examples of systematic variability. For preliminary modelling, the range of prosodic structures was restricted to a small subset of possible IPs, containing a maximum of 2 AGs. Nuclear AGs all contained a single foot of one or two syllables and exemplified an intonation pattern which was essentially the same: a (low) falling tone (H*L), consistent with an unmarked, utterance-final discourse context. Even these limited structures show systematic variability in the alignment of F0 and the timing of different feet. Accented syllables themselves covered a wide range of sub-syllabic structures, incorporating segmental sequences that differ in the extent to which intervening segments block the spread of coarticulatory effects.
3.2 Linguistic Representation and Processing

[[To follow. Suitable extracts from Mark's Eurospeech paper will be our best starting-point here. MARK: We need some text to add. Could you provide some please?]]
For linguistic representation and processing, we have formatted our computational structures using the Extensible Markup Language, XML [13]. XML provides a powerful and computationally tractable representation for our hierarchical structures. It is also an upcoming internet standard, and one supported by available toolkits such as the Edinburgh Language Technology Group toolkit LT-XML [14].
Currently we are using XML to represent: (1) the lexicon, including parts of speech and word pronunciation data; (2) utterance audio file information, including speaker name, utterance identifier and file name; (3) the utterance word sequence, including time alignment information and cross-references into the syntactic and prosodic hierarchies; (4) the utterance parse, including detailed word tag, phrase structure and syntactic functions; (5) the utterance prosodic structure, including phonetic features derived from the signal.
We use 'hyperlinks' within XML to indicate structural relationships between the syntactic and prosodic hierarchies and the word sequence within an utterance. This allows us, for example, to identify a syllable contained within a particular word or positioned at a particular place within a grammatical phrase. The links also allow us to identify the timing of a word from a phonetic alignment with a signal. Fig. 2 shows a partial XML representation of the parsed utterance "It's a lie", whose tree structure representation is shown in Fig. 1.
    <IP ① START="0.2206" STOP="0.9727">
     <AG ② START="0.2206" STOP="0.9727"> ...
      <FOOT ③ START="0.5011" STOP="0.9727">
       <SYL ④ FPOS="1" RFPOS="1" RWPOS="1" START="0.5011" STOP="0.9727"
            STRENGTH="STRONG" WEIGHT="HEAVY" WPOS="1" WREF="WORD3">
        <ONSET ⑦ START="0.5011" STOP="0.6615" STRENGTH="WEAK">
         <CNS AMBI="N" CNSCMP="N" CNSGRV="N" CNT="Y" FXGRD="52.4" FXMID="115.6"
              NAS="N" RHO="N" SON="Y" START="0.5011" STOP="0.6615" STR="N"
              VOCGRV="N" VOCHEIGHT="CLOSE" VOCRND="N" VOI="Y">l</CNS></ONSET>
        <RHYME ⑤ CHECKED="N" START="0.6516" STOP="0.9727" STRENGTH="WEAK"
               VOI="N" WEIGHT="HEAVY">
         <NUC ⑥ CHECKED="N" LONG="Y" START="0.6516" STOP="0.9727"
              STRENGTH="WEAK" VOI="N" WEIGHT="HEAVY">
          <VOC FXGRD="-160.6" FXMID="106.0" GRV="Y" HEIGHT="OPEN" RND="N"
               START="0.6516" STOP="0.8620">a</VOC>
          <VOC FXGRD="-105.3" FXMID="95.4" GRV="N" HEIGHT="CLOSE" RND="N"
               START="0.8620" STOP="0.9727">I</VOC></NUC>
        </RHYME>
       </SYL>
      </FOOT>
     </AG>
    </IP>

Fig. 2. Partial XML representation of the utterance "it's a lie".
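For illustration, once such an annotation is made well-formed (the circled index marks stripped, the elided material restored), it can be read with any XML parser. A minimal sketch using Python's standard library rather than LT-XML, with a hypothetical file name:

    import xml.etree.ElementTree as ET

    # "utterance.xml" is a hypothetical file holding a full, well-formed
    # version of the Fig. 2 annotation.
    tree = ET.parse("utterance.xml")
    for syl in tree.iter("SYL"):
        duration = float(syl.get("STOP")) - float(syl.get("START"))
        # WREF hyperlinks the prosodic hierarchy back to the word sequence.
        print(syl.get("WREF"), syl.get("STRENGTH"), syl.get("WEIGHT"), duration)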
To annotate an existing audio file with XML annotations requires the following steps: (1) create a basic XML description of the audio data in the file; (2) add a word-level transcription; (3) update each word with its parts of speech and pronunciations; (4) copy over prosodic structures from the lexicon; (5) align the prosodic structure with automatically-derived phone labels on the audio file; (6) transfer parameters of modelled fundamental frequency into the XML structure.
Our database of XML-annotated files can be searched to find structures matching a specific pattern, so that analyses can be made of timing, f0 patterns and, ultimately, segmental realisations in context. To provide the required flexibility of pattern-matching across the syntactic and prosodic hierarchies, we have developed our own pattern-matching system. For example, the following pattern
    UTT
    .WORDSEQ
    ..WORD(ID=$1) /the/
    .IP
    ..AG
    ...FOOT
    ....SYL(WREF=$1)
    .....*RHYME
    ....SYL
    .....ONSET
    ......CNS /j/
searches for and reports the rhyme in the word "the" when it occurs before a syllable containing a /j/ in its onset. The indented structure reflects the pattern of the annotation hierarchy. The pattern-matching language will be extended to express the kind of declarative linguistic knowledge about timing, fundamental frequency form and segmental realisation in context required by our synthesis system.
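A full matcher is beyond a sketch, but the cross-hierarchy idea (the variable $1 binding a WORD's ID to a SYL's WREF) can be mimicked in a few lines over the ElementTree above, assuming the annotation also contains the WORDSEQ hierarchy; this is our own illustrative code, not the project's matcher:

    def rhymes_of_the_before_j(tree):
        # Toy version of the query above: yield the RHYME of each syllable of
        # the word "the" that precedes a syllable with /j/ in its onset. The
        # ID/WREF attributes do the cross-hierarchy "hyperlinking" ($1 above).
        the_ids = {w.get("ID") for w in tree.iter("WORD")
                   if (w.text or "").strip() == "the"}
        syllables = list(tree.iter("SYL"))
        for current, following in zip(syllables, syllables[1:]):
            j_onset = any(c.text == "j" for c in following.findall("ONSET/CNS"))
            if current.get("WREF") in the_ids and j_onset:
                yield current.find("RHYME")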
4. Future work

AREN'T WE GOING TO ADD A SECTION ABOUT THE PERCEPTUAL TESTS HERE? WE COULD E.G. POOL OUR MATERIAL FROM THE ICPhS PAPERS.
Work is in progress [15] to automatically copy-synthesize database items into parameters for HLsyn, a Klatt-like formant synthesizer that synthesizes obstruents by means of pseudo-articulatory parameters. This method allows for easy production of utterances whose parameters can then be edited. Utterances can be altered either to conform to the rules of the model or to break them, thus allowing the perceptual salience of particular aspects of phonological structure to be assessed. Tests will assess speech intelligibility when listeners have competing tasks involving combinations of auditory vs. non-auditory modalities, and linguistic vs. non-linguistic behaviour.
A statistical model based on our hypotheses about the phonological factors relevant for temporal interpretation will be constructed from the database, leading to a fuller non-segmental model of temporal compression. Temporal, intonational and segmental details will be stated as the phonetic exponents of the phonological structure.
5. REFERENCES
1. Hawkins, S. "Arguments for a nonsegmental view of speech perception." Proc. ICPhS XIII, Stockholm, Vol. 3, 18-25, 1995.
2. House, J. & Hawkins, S. "An integrated phonological-phonetic model for text-to-speech synthesis." Proc. ICPhS XIII, Stockholm, Vol. 2, 326-329, 1995.
3. Local, J.K. & Ogden, R. "A model of timing for nonsegmental phonological structure." In J.P.H. van Santen, R.W. Sproat, J.P. Olive & J. Hirschberg (eds.), Progress in Speech Synthesis. Springer, New York, 109-122, 1997.
4. Local, J.K. "Modelling assimilation in a non-segmental rule-free phonology." In G.J. Docherty & D.R. Ladd (eds.), Papers in Laboratory Phonology II. Cambridge: CUP, 190-223, 1992.
5. Kelly, J. & Local, J. Doing Phonology. Manchester: Manchester University Press, 1989.
6. Hawkins, S. & Nguyen, N. "Effects on word recognition of syllable-onset cues to syllable-coda voicing." LabPhon VI, York, 2-4 July 1998.
7. Hawkins, S. & Slater, A. "Spread of CV and V-to-V coarticulation in British English: implications for the intelligibility of synthetic speech." Proc. ICSLP 94, Vol. 1, 57-60, 1994.
8. Tunley, A. "Metrical influences on /r/-colouring in English." LabPhon VI, York, 2-4 July 1998.
9. Fixmer, E. & Hawkins, S. "The influence of quality of information on the McGurk effect." Presented at the Australian Workshop on Auditory-Visual Speech Processing, 1998.
10. Selkirk, E.O. Phonology and Syntax. MIT Press, Cambridge MA, 1984.
11. Broe, M. "A unification-based approach to prosodic analysis." Edinburgh Working Papers in Cognitive Science 7, 27-44, 1991.
12. Bladon, R.A.W. & Al-Bamerni, A. "Coarticulation resistance in English /l/." J. Phonetics 4, 137-150, 1976.
13. http://www.w3.org/TR/1998/REC-xml-19980210
14. http://www.ltg.ed.ac.uk/
15. Heid, S. & Hawkins, S. "Automatic parameter-estimation for high-quality formant synthesis using HLSyn." Presented at the 3rd ESCA Workshop on Speech Synthesis, Jenolan Caves, Australia, 1998.


