Alex Chengyu Fang

Department of Phonetics and Linguistics
University College London
Gower Street, London WC1E 6BT

Phone 0171 380 7777 ext 3169
Fax   0171 383 4108
Email alex@phonetics.ucl.ac.uk
WWW   http://www.phon.ucl.ac.uk/home/alex/home.htm


Corpus Construction and Annotation


The HKUST Corpus

HKUST refers to the Hong Kong University of Science and Technology. The corpus contains one million words sampled from computer science textbooks written in English. It was completed in 1992 at the Language Centre of the university.

I started the project and designed the structure of the corpus.

The following publication resulted from the project:


The ICE Corpus

ICE stands for the International Corpus of English, an international project to investigate varieties of English as an international language. The project was started by Professor Sidney Greenbaum, the late Director of the Survey of English Usage, University College London.

My involvement in the project was the design of both a wordclass tagger and a syntactic parser. I also extensively revised the annotation schemes with which the British component (ICE-GB) has been analysed both grammatically and syntactically.

My publications about ICE include:


The PROSICE Corpus

PROSICE is a corpus for studies of English prosodic features. It is a collection of ICE-GB texts re-recorded to high technical specifications. The corpus has been syntactically analysed and temporally aligned. It was constructed by Dr Mark Huckvale and me.

My main contribution to the project was the syntactic analysis of the corpus.

Publications include:


Author: Alex Chengyu Fang - Last updated: 2 October 1997