Rachel Smith (a former undergraduate here, and about to start as an MPhil
student) will begin about 60 hours of labelling for us on Sept 1 or 2 (of
course I have forgotten which). Sebastian and I are preparing the ground.
We intend to:
1. train her on some of Mark's utterances that Cambridge should label.
2. give her some experience of labelling independently, using more of
Mark's utts from the set we must do.
3. all 3 of us independently label the 20 utterances everyone in the
group was going to do (and send the results to Paul).
4. let Rachel continue with our assigned set of utts to label.
As we recall, and as the minutes of the last prosynth meeting seem to
confirm, it was agreed that:
(1) Paul would
(a) compile a list of labels that we should all use
(b) label 20 utts using the new label set, and circulate
whereupon
(2) we would all label the next 20 utts and send them back to Paul, so
that we can assess labelling reliability.
and
(3) Mark would identify the set of utterances from the database that each
site should label independently.
PROGRESS SO FAR
As we understand it, Paul has labelled most of set0, and has also sent
around 20 files whose labelling was particularly problematic, which he
has re-done with the new system we discussed in July. These 20 files are:
0010 0020 0023 0032 0037 0038 0046 0047 0048 0067 0068 0071 0077 0078 0079
0088 0091 0092 0096 0101
(details are in Paul's email "Twenty label files", in the archive)
As far as we know, (2) and (3) above have not yet been done. So we propose
a couple of things that will let us proceed with Rachel next week. Please
respond ASAP if we've missed something or if you would prefer a different
proposal.
OUR PROPOSAL
(1) 20 files we all label: jhset0: 0-9, 11-19, 21.
(2) files only one site labels:
(a) Cambridge: jhset1 30 to end
(b) UCL: jhset2 30 to end
(c) York: jhset1 0 to 29, and
jhset2 0 to 29
Sebastian estimates that this split is almost fair when measured by number
of labels rather than by number of utterances. York ends up with the most
labels and UCL with the fewest, but only by a few in each case.
If we don't hear from you, or you read this after Tuesday Sept 1, then
please assume that we have gone ahead and started on this basis.
best wishes
Sarah
______________________________________________________________________
Dr. Sarah Hawkins
Dept. of Linguistics, University of Cambridge
Sidgwick Avenue, Cambridge CB3 9DA, United Kingdom
Email: sh110@cam.ac.uk
Phone: +44 1223 33 50 52
Fax: +44 1223 33 50 53 or +44 1223 33 50 62