Re: Prosynth words

Sarah Hawkins (sh110@cam.ac.uk)
Mon, 24 Nov 1997 22:45:09 +0000 (GMT)

Dear All,

Sorry for the delay in writing: (a) busy, and (b) I did write it last
Friday but the computer crashed just as I was finishing it and I had to
leave.

The main thing to say is that, having gone round 3 or more times looking
for words to form a general database, Ali and I have decided that this
approach is relatively fruitless. As we've said before, there are no
perfect sets for what we currently know that we'll want to look at, and
trying to satisfy everyone's needs in a few well-chosen words always meets
with failure: structured lists soon become unstructured, and the number of
words proliferates undesirably while still not yielding perfect sets. (I
said this better last Friday, but I hope you understand.)

So, I have advised Ali to take another tack. Namely, to get on with her
PhD research. This involves designing a particular experiment to do now,
in which she will look at particular words in particular structures. (She
will fill you in on the details relatively soon; here, I am only concerned
with principles.) Then she will have some words and phrases etc to give to
Alex. Based on her results, she'll presumably design another expt,
involving different words in different structures. Etc: current
experiments will at least partially determine future ones.

This is a different approach to the one originally planned, and I do
apologize for it. It does not preclude a database being worked on now
(e.g. based on Ali's larger word lists, probably with some pruning and
some more additions - see below) but it does mean that we at Cambridge, at
least, find it difficult to envisage working on an immutable database: we
would rather know that we can parse more words as we find we need them. So
we regret the decision made at our last meeting that Alex would deliver a
parsed database, and not a parser. If our reasoning is right, can that
question be re-opened? (I understand there may be some intellectual
property rights etc. How about a compiled parser?)

Less important point re Richard's suggestions of how to cut down Ali's
database: we don't think there's any need to. That list is just something
to work from that conforms to John's synthesis of what we're all wanting.
In particular, if we understand the issues rightly, we especially don't
want to omit inflected forms, affixes, proper names, etc. (I recognize
that proper names often have different rhythmic constraints, but we will
have to face those eventually (albeit not necessarily in this grant
period) because many applications make heavy use of proper names.) So, if
Ali's list is to be cut, we'd prefer there to be independently motivated
reasons.

Third and final point. John's admirable synthesis of everybody's needs,
that led to Ali's 30,000 word list, was structured around syllable codas,
because those are the main focus of York's word joins. Our work has tended
to focus on onsets and ambisyllabic word-medial bits. I see no reason why
we shouldn't graduate to codas (and indeed, word joins have to take onsets
into consideration too....) but in these first stages, I think we should
restrict ourselves to things we already know something about. So Ali will
almost certainly want to add to that 30,000 words. However, because we
can't work with many words at a time (because to do so takes too long) she
probably won't be adding very many words.

Before writing this, I had a brief word with Jill. She seemed happy to
provide Alex with a few phrases that conform to the general requirements
and would let him get on with testing his parsing algorithms. I hope that
this allows everyone to make best use of their time. Obviously, if it
doesn't, we must reconsider our suggestions above. Until we hear from you,
Ali will be working on the design for her next expt.

best,

Sarah

______________________________________________________________________

Dr. Sarah Hawkins Email: sh110@cam.ac.uk
Dept. of Linguistics Phone: +44 1223 33 50 52
University of Cambridge Fax: +44 1223 33 50 53
Sidgwick Avenue or +44 1223 33 50 62
Cambridge CB3 9DA
United Kingdom