EUROM1 - Multilingual Speech Corpus

Description

The EUROM1 database contains recordings of 60 speakers in each of seven European languages: Danish, Dutch, British English, French, German, Norwegian and Swedish. It was explicitly designed to aid the phonetic comparison of languages, with similar materials and recording protocols in all languages.

The EUROM1 corpus was collected by the "Enabling Technology and Research" working group within ESPRIT-project 2589 "Speech Assessment Methodology". A compatible Italian database and compatible databases in Eastern European languages are available from the European Language Resources Association.

Overview

Languages:
Danish, Dutch, English, French, German, Norwegian, Swedish
Speakers:
60 speakers per language
Protocol:
20 kHz, 16-bit sampling, anechoic room
Media:
5 CDROMs per language
Content: (for each language)
Many Talker Corpus (30 women, 30 men): 100 numbers, 3 passages, 5 sentences (speech signal)
Few Talker Corpus (5 women, 5 men): 100 numbers x 5, 15 passages, 25 sentences, C(C)VC(V) x 5 (speech + laryngographic signals)
Very Few Talker Corpus (1 woman, 1 man): C(C)VC(V) material embedded in 5 context phrases, context words x 5 (speech + laryngographic signals)
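
As a rough guide to storage requirements, the recording protocol listed above (20 kHz sampling, 16 bits per sample) implies a fixed uncompressed data rate. The sketch below works through that arithmetic. It assumes raw linear PCM with no file headers, and it assumes one channel for speech-only material and two channels for speech plus laryngographic material. The channel layout is an illustrative assumption, not something stated in the corpus documentation, and the function names are hypothetical.

```python
# Sampling parameters taken from the Protocol entry in the overview.
SAMPLE_RATE_HZ = 20_000
BITS_PER_SAMPLE = 16


def bytes_per_second(channels: int = 1) -> int:
    """Uncompressed PCM data rate in bytes per second.

    Assumes raw linear PCM; the channel count is an assumption
    (1 = speech only, 2 = speech + laryngograph).
    """
    return SAMPLE_RATE_HZ * (BITS_PER_SAMPLE // 8) * channels


def megabytes_per_minute(channels: int = 1) -> float:
    """Data volume for one minute of audio, in units of 10**6 bytes."""
    return bytes_per_second(channels) * 60 / 1_000_000


if __name__ == "__main__":
    print(bytes_per_second(1))       # speech only
    print(bytes_per_second(2))       # speech + laryngograph (assumed 2 channels)
    print(megabytes_per_minute(1))   # one minute of speech-only audio
```

Under these assumptions, speech-only material takes 40,000 bytes per second, or 2.4 MB per minute, and two-channel material takes twice that.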

Reference

D. Chan, A. Fourcin, D. Gibbon, B. Granstrom, M. Huckvale, G. Kokkinakis, K. Kvale, L. Lamel, B. Lindberg, A. Moreno, J. Mouropoulos, F. Senia, I. Trancoso, C. Veld & J. Zeiliger, "EUROM - A Spoken Language Resource for the EU", in Eurospeech '95: Proceedings of the 4th European Conference on Speech Communication and Speech Technology, Madrid, Spain, 18-21 September 1995, Vol. 1, pp. 867-870.

General Documentation

Language Specific Documentation

Danish
Danish Project, Danish Technical Appendix
Dutch
Dutch Project, Dutch Technical Appendix
English
English Project, English Technical Appendix
French
French Project
German
German Project, German Technical Appendix
Norwegian
Norwegian Project, Norwegian Technical Appendix
Swedish
Swedish Project, Swedish Technical Appendix

Copyright

The speech data in EUROM1 is the intellectual property of the individual laboratories that made the recordings. The data may be used for research purposes, but it may not be resold in any form.

The copyright holders for the individual languages are:

Danish
Tele Danmark, Jydsk Telefon, Denmark
Dutch
Royal PTT Nederland NV (KPN), TNO Human Factors Research Institute, Soesterberg, The Netherlands
English
University College London, United Kingdom
French
CNRS / INPG (ICP), France
German
Universität Bielefeld, Germany
Norwegian
The Norwegian Institute of Technology, SINTEF DELAB and Telenor Research, Norway
Swedish
Dept of Speech Communication and Music Acoustics, KTH, Sweden

Acknowledgment

This database is the result of the efforts of many people in many countries. Thank you to all those who took part, whether as engineers or as speakers.

Ordering

The database includes all seven languages on over 30 CDs in a presentation folder. We have limited supplies of this database, and we may restrict sales to one per customer. For more information, please contact

EUROM1 Multilingual Corpus (CD) £100 (about US$180)  

University College London - Gower Street - London - WC1E 6BT - Telephone: +44 (0)20 7679 2000 - Copyright © 1999-2013 UCL