Alex Chengyu Fang
Department of Phonetics and Linguistics
University College London
Gower Street, London WC1E 6BT
Phone 0171 380 7777 ext 3169
Fax 0171 383 4108
Email alex@phonetics.ucl.ac.uk
WWW http://www.phon.ucl.ac.uk/home/alex/home.htm
Corpus Construction and Annotation
The HKUST Corpus
HKUST refers to the Hong Kong University of Science and Technology.
The corpus contains one million words of samples from computer science
textbooks written in English. It was completed in 1992 at the Language
Centre of the university.
I started the project and designed the structure of the corpus.
The project resulted in the following publication:
- Fang, A.C. 1992. Building a Corpus of Computer Science English. In
English Language Corpora: Design, Analysis and Exploitation, ed. by J.
Aarts, P. de Haan, and N. Oostdijk. Amsterdam: Rodopi. pp 73-78.
The ICE Corpus
ICE stands for the International Corpus of English, an international
project to investigate varieties of English as an international language.
The project was started by Professor Sidney Greenbaum, the late Director
of the Survey of English Usage, University College London.
For this project I designed both a wordclass tagger and a syntactic
parser. I also extensively revised the annotation schemes with which the
British component (ICE-GB) has been analysed both grammatically and
syntactically.
My publications about ICE include:
- Fang, A.C. Forthcoming. Prepositional Phrases: Towards the Automatic
Determination of their Syntactic Functions.
- Fang, A.C. 1997. Verb Forms and Subcategorisations.
In Oxford Literary and Linguistic Computing, 12:4.
- Fang, A.C. and Yamazaki, S. 1997. The
International Corpus of English and TEFL: In Memory of Professor Sidney
Greenbaum. In Daito Gogaku Kyoiku Ronshu, 5. Tokyo: Daito Bunka University.
pp 7-39.
- Yamazaki, S. and A.C. Fang. 1997. Corpus-Based English Teaching. In
Daito Bunka Daigaku Gogaku Kyoiku Kenkyujo Shoho, No. 15. Tokyo: Daito
Bunka University. pp 2-4.
- Fang, A.C. 1996a. AUTASYS: Automatic
Tagging and Cross-Tagset Mapping. In Comparing English World Wide:
The International Corpus of English, ed. by Sidney Greenbaum. Oxford: Oxford
University Press. pp 110-124.
- Fang, A.C. 1996b. Automatically
Generalising a Wide-Coverage Formal Grammar. In Synchronic Corpus Linguistics,
ed. by C. Percy, C. Meyer, and I. Lancashire. Amsterdam and Atlanta: Rodopi.
pp 131-146.
- Fang, A.C. 1996c. The Survey Parser:
Design and Development. In Comparing English World Wide: The International
Corpus of English, ed. by Sidney Greenbaum. Oxford: Oxford University Press.
pp 142-160.
- Fang, A.C. 1995. The Distribution
of Infinitives of Contemporary British English: A Study Based on the British
ICE Corpus. In Oxford Literary and Linguistic Computing, 10:4. pp 247-257.
- Fang, A.C. 1994. ICE: Applications
and Possibilities in NLP. In Proceedings of International Workshop
on Directions of Lexical Research, 15-17 August 1994, Beijing. pp 23-44.
- Fang, A.C. and G. Nelson. 1994. Tagging
the SEU Corpus: a LOB to ICE Experiment Using AUTASYS. In Oxford Literary
and Linguistic Computing, 9:2. pp 189-194.
The PROSICE Corpus
PROSICE is a corpus for studies of English prosodic features. It is
a collection of ICE-GB texts re-recorded to high technical specifications.
The corpus has been syntactically analysed and temporally aligned. It
was constructed by Dr Mark Huckvale and me.
My main contribution to the project was the syntactic analysis of the
corpus.
Publications include:
- Fang, A.C. and M. Huckvale. 1996. Synchronising
Syntax with Speech Signals. In Speech,
Hearing, and Language, ed. by V. Hazan, M. Holland, S. Rosen. Department
of Phonetics and Linguistics, University College London. pp 11-26.
- Huckvale, M. and A.C. Fang. 1996. PROSICE: A Spoken English Database
for Prosodic Research. In Comparing English World Wide: The International
Corpus of English, ed. by Sidney Greenbaum. Oxford: Oxford University
Press. pp 262-279.
Author: Alex Chengyu Fang - Last updated: 2 October 1997