I don't suppose this information on its own is very useful, and I'm afraid
I don't have time to do more than give a pointer right now, but just in
case it helps you make decisions:
I seem to remember that Andrew Slater failed to find the perfect source,
so wrote (or contributed to writing - I forget which right now) a number
of unix scripts that created the sort of thing you're looking for from the
machine-readable version of the OALD. The MRC corpus was also mentioned a
lot, but I don't know if it was finally used.
He also found, like many others at that time, that Longman's was not only
very expensive but also impossible to get hold of, for strange reasons,
though we were severely constrained by being on a commercial grant. Longman's
is (or was) a great deal cheaper for "pure" research.
Andrew also used a number of other corpora for this and that, the most
notable perhaps being one distributed from somewhere in Norway (Trondheim?).
A final thought: if BT really want to be involved, and given that the grant
period is so short, would this type of thing be something they could
contribute to? They MUST have something of this sort.
I can't believe I have more knowledge in these areas than Jill and Mark,
so I won't go on. But do let me know if you'd like me to think more about
this.
Sarah
______________________________________________________________________
Dr. Sarah Hawkins Email: sh110@cam.ac.uk
Dept. of Linguistics Phone: +44 1223 33 50 52
University of Cambridge Fax: +44 1223 33 50 53
Sidgwick Avenue or +44 1223 33 50 62
Cambridge CB3 9DA
United Kingdom