Sorry to have left early on Friday due to family duties.
While I appreciate that a decision on the construction of the corpus,
including its content, has been made and accepted, I feel I need to state
more clearly the views I was trying to put across on Friday:
1. The corpus should contain transcribed continuous read speech.
2. The corpus should be constructed from material such as MARSEC and PROSICE.
3. The transcriptions should be time-aligned to the actual recording.
4. The transcriptions should be annotated with IPA transcription, stress, and
syllable division, and with other features as discussed on Friday.
5. The transcriptions should be analysed for detailed part-of-speech
information.
6. The transcriptions should be analysed for detailed syntactic information.
Such a corpus will then be integrated and transformed into a database, which
will support analysis at the within-word, between-word, within-phrase,
between-phrase, and clausal levels. Interesting phenomena can then be
retrieved automatically from the database (e.g. all NPs with s+s, sw+s, or
sww+s foot structures) and analysed according to various contexts and criteria.
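To illustrate the kind of retrieval meant here, a minimal Python sketch
follows; it is not a proposed implementation. It assumes a hypothetical
record per phrase holding a syntactic label and one stress mark per syllable
("s" stressed, "w" unstressed), groups the syllables into feet that each
begin with a stressed syllable, and returns the NPs whose foot structure is
s+s, sw+s, or sww+s. The Phrase record, the function names, and the toy
phrases are all invented for illustration and are not taken from MARSEC or
PROSICE.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    # Hypothetical record: syntactic label, orthographic text, and one
    # stress mark per syllable ("s" = stressed, "w" = unstressed).
    label: str
    text: str
    stress: str


def foot_structure(stress: str) -> str:
    """Group syllables into feet, each beginning with a stressed syllable.

    Unstressed syllables before the first stress (anacrusis) are simply
    dropped in this sketch.
    """
    feet = []
    for mark in stress:
        if mark == "s":
            feet.append("s")      # a stressed syllable opens a new foot
        elif feet:
            feet[-1] += "w"       # an unstressed syllable joins the current foot
    return "+".join(feet)


def retrieve(phrases, label, patterns):
    """Return the phrases with the given label whose foot structure is in patterns."""
    return [p for p in phrases
            if p.label == label and foot_structure(p.stress) in patterns]


# Toy, hand-annotated examples (invented, not corpus data).
toy = [
    Phrase("NP", "black cats", "ss"),
    Phrase("NP", "yellow cats", "sws"),
    Phrase("NP", "beautiful cats", "swws"),
    Phrase("NP", "the cat", "ws"),
    Phrase("VP", "saw cats", "ss"),
]

hits = retrieve(toy, "NP", {"s+s", "sw+s", "sww+s"})
for p in hits:
    print(p.text, foot_structure(p.stress))
```

Here "the cat" is excluded because, once the anacrusis is dropped, it has a
single foot, and "saw cats" is excluded because it is not an NP. In a real
database the stress marks would come from the time-aligned annotation and
the labels from the syntactic analysis.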
All of the above is stated in the proposal, either explicitly or
implicitly. The cut in funding should affect only the size of the corpus.
My reservations about the design agreed on Friday concern:
1. the selection of the content
2. the limited context for the selection
3. the validity of the selection, and hence of any generalisation drawn from it
4. the reusability of the corpus for future development
Please don't take this as offensive or rude; I'm only arguing as a matter
of academic discussion, though it sometimes hurts. I remember that at the
Survey somebody would just storm out of the room because of such disagreements.
Alex