EUROM1 - Multilingual Speech Corpus


The EUROM1 database contains recordings of 60 speakers in each of seven European languages: Danish, Dutch, British English, French, German, Norwegian and Swedish. It was designed explicitly to support the phonetic comparison of languages, using similar materials and recording protocols for every language.

The EUROM1 corpus was collected by the "Enabling Technology and Research" working group within ESPRIT project 2589, "Speech Assessment Methodology". A compatible Italian database and compatible databases in Eastern European languages are available from the European Language Resources Association.


Danish, Dutch, English, French, German, Norwegian, Swedish
60 speakers per language
20 kHz, 16-bit sampling, anechoic room
5 CD-ROMs per language
Content (for each language):
Many Talker Corpus (30 women, 30 men): 100 numbers, 3 passages, 5 sentences (speech signal)
Few Talker Corpus (5 women, 5 men): 100 numbers x 5, 15 passages, 25 sentences, C(C)VC(V) x 5 (speech + laryngographic signals)
Very Few Talker Corpus (1 woman, 1 man): C(C)VC(V) material embedded in 5 context phrases, context words x 5 (speech + laryngographic signals)


D. Chan, A. Fourcin, D. Gibbon, B. Granström, M. Huckvale, G. Kokkinakis, K. Kvale, L. Lamel, B. Lindberg, A. Moreno, J. Mouropoulos, F. Senia, I. Trancoso, C. Veld & J. Zeiliger, "EUROM - A Spoken Language Resource for the EU", in Eurospeech'95. Proceedings of the 4th European Conference on Speech Communication and Speech Technology, Madrid, Spain, 18-21 September 1995, Vol. 1, pp. 867-870.

General Documentation

Language-Specific Documentation

Danish: Project; Technical Appendix
Dutch: Project; Technical Appendix
English: Project; Technical Appendix
French: Project
German: Project; Technical Appendix
Norwegian: Project; Technical Appendix
Swedish: Project; Technical Appendix


The speech data in EUROM1 is the intellectual property of the individual laboratories that made the recordings. The data may be used for research purposes, but it may not be resold in any form.

The copyright holders for the individual languages are:

Danish: Tele Danmark, Jydsk Telefon, Denmark
Dutch: Royal PTT Nederland NV (KPN), TNO Human Factors Research Institute, Soesterberg, The Netherlands
English: University College London, United Kingdom
French: CNRS / INPG (ICP), France
German: Universität Bielefeld, Germany
Norwegian: The Norwegian Institute of Technology, SINTEF DELAB and Telenor Research, Norway
Swedish: Dept. of Speech Communication and Music Acoustics, KTH, Sweden


This database is the result of the efforts of many people in many countries. Thank you to all those who took part, whether as engineers or as speakers.


The database includes all seven languages on over 30 CDs in a presentation folder. Supplies of this database are limited, and we may restrict sales to one per customer. For more information, please contact

EUROM1 Multilingual Corpus (CD) £100 (about US$130)  


University College London - Gower Street - London - WC1E 6BT - Telephone: +44 (0)20 7679 2000 - Copyright © 1999-2016 UCL