testing material - request for help

Paul Carter (pgc104@york.ac.uk)
Fri, 5 Feb 1999 14:52:53 +0000 (GMT)

Dear all,

I am compiling sets of testing material which will compare YorkTalk
timings for three parameters:

1 heavy versus light syllables
2 long versus short vowels
3 ambisyllabic versus half-ambi versus non-ambi consonants
(eg -st- versus -nt- versus -nt#)

All of these are in strong-weak feet.

Below are a statement of my problem and requests for feedback.

STATEMENT OF PROBLEM
====================

Constraints due to the need to avoid duplication of tokens and the need to
test each parameter in all contexts of the other parameters mean that I'm
struggling to find enough tokens in the language (never mind the
database). The detail of the YorkTalk parser means I also have to avoid a
lot of items with morphological inflections.

It would be helpful if 5 items in each cell were sufficient for testing
purposes. This means I need 10 pairs of tokens for each set so that we can
test correct versus incorrect timings for both values of the parameter.
For example, if we're testing long versus short vowels in heavy,
nonambisyllabic contexts, I have the following:

"killed them" correct timing
"willed them" timing for "wheeled them"
"pooled them" correct timing
"deemed them" timing for "dimmed them"

... times five.

I am assuming that one subject will then hear five sets of
correct short V timing
short V with long V timing

with the next subject hearing five sets of
correct long V timing
long V with short V timing

This second subject would then hear (amongst other things):

"killed them" timing for "keeled them"
"willed them" correct timing
"pooled them" timing for "pulled them"
"deemed them" correct timing

If I am wrong in this, please put me right. Maybe these will be mixed up
between the subjects so that, for example, one subject doesn't hear all
correct timings for short vowels. That's for S&S to decide; I'm concerned
with how many items I need in all.

All the above means that each set of material needs 10 pairs of tokens.
There are 12 sets:

(L=light, H=heavy; a=ambi, h=half-ambi, n=nonambi; l=long, s=short)

1 Las versus Lns
2 Las versus Has
3 Lns versus Hns
4 Has versus Hhs
5 Has versus Hns
6 Has versus Hal
7 Hhs versus Hns
8 Hhs versus Hhl
9 Hns versus Hnl
10 Hal versus Hhl
11 Hal versus Hnl
12 Hhl versus Hnl
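
As a cross-check on the list above, the 12 sets appear to be exactly the pairs of cells that differ in one parameter only. The following is a minimal sketch of that check, assuming the eight cell labels are just those that occur in the list; the names CELLS and differs_in_one are illustrative and not part of YorkTalk.

```python
from itertools import combinations

# Cell labels as they occur in the list of sets:
# weight (L/H), ambisyllabicity (a/h/n), vowel length (l/s).
CELLS = ["Las", "Lns", "Has", "Hhs", "Hns", "Hal", "Hhl", "Hnl"]

def differs_in_one(a, b):
    """True if two cell labels differ in exactly one of the three parameters."""
    return sum(x != y for x, y in zip(a, b)) == 1

# Keep only the pairs of cells that contrast a single parameter.
sets = [(a, b) for a, b in combinations(CELLS, 2) if differs_in_one(a, b)]

for number, (a, b) in enumerate(sets, 1):
    print(number, a, "versus", b)
```

With the cells given in this order, the pairs come out in the same order as the numbered list.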

This makes a total of 240 tokens, each of which needs a correct and an
incorrect timing, making 480 MBROLA files, most of which do not come from
the database.

This is why I'd rather not have 10 items per cell because that would
result in 960 MBROLA files. Moreover, I don't think I'll be able to get
enough tokens to make 10 per cell. In fact, it will be a struggle to get
[Has versus Hal] up to 5 items.
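
The arithmetic behind these totals can be sketched as follows. This is only a rough check, assuming (as stated earlier) that 5 items per cell means 10 pairs of tokens per set, two tokens per pair, and two timings per token; the function name file_count is hypothetical.

```python
def file_count(n_sets, items_per_cell):
    """Return (tokens, mbrola_files) for a given number of items per cell."""
    pairs_per_set = 2 * items_per_cell   # 5 items per cell -> 10 pairs per set
    tokens = n_sets * pairs_per_set * 2  # two tokens in each pair
    files = tokens * 2                   # one correct and one incorrect timing each
    return tokens, files

print(file_count(12, 5))   # 5 items per cell
print(file_count(12, 10))  # 10 items per cell
```

On these assumptions, 5 items per cell gives 240 tokens and 480 files, and 10 items per cell gives 480 tokens and 960 files.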

CRIES FOR HELP
==============

SARAH AND SEBASTIAN:

Am I right in the assumptions I'm making above? Particularly, will 5
items per cell be enough? Will we be needing correct versus incorrect
timings for each token ("killed them", "willed them", "pooled them",
"deemed them" in my example) or correct for half and incorrect for the
other half?

Additionally, will each speaker need to hear, for instance:
5 x correct short V
5 x incorrect short V
5 x correct long V
5 x incorrect long V

This would double the number of files needed to 960 and produce severe
difficulties in finding enough tokens in the language.

However, if it were permissible for a speaker to hear, for example, "killed
them" with correct timings AND "keeled them" with the timings for "killed
them", we'd halve the number of files required.

Could you please enlighten me regarding the experimental requirements in
this respect?

JILL AND JANA (but also to S&S):

None of these files will have an F0, since I don't have your model. I
can't even generate a Mark-like F0 for those items which are in the
database since there are not yet XML files with information from the
hand-labelling available. This means presumably that either we test on a
monotone (which sounds awful) or some such simple F0, or you'll need to
add your F0s to all of my files. Comments?

The more rapid the response, the more welcome it will be. I'm not in on
Thursday or Friday of next week and I'd rather know soon whether I have to
put 480 items through the YorkTalk parser and then hand-produce 2 MBROLA
files for each one in a couple of days next week.

Best wishes,
Paul

_ Paul Carter __________________________________________ pgc104@york.ac.uk _
Dept of Language & Linguistic Science | http://www-users.york.ac.uk/~pgc104/
University of York | telephone: +44 (0)1904 432660
Heslington, York. YO10 5DD | fax: +44 (0)1904 432673