Applied Research on Voice

This page describes some current and previous research activities in the area of voice and its applications. The principal investigator is Prof. Mark Huckvale.

Prediction of Fatigue and Stress from Voice in Safety-Critical Environments (iVOICE)

The iVOICE project was a feasibility study funded by the European Space Agency under the Artes 20 programme. The project partners were UCL Speech, Hearing and Phonetic Sciences, UCL Mullard Space Sciences Laboratory Centre for Space Medicine and the Gagarin Cosmonaut Training Centre (GCTC) in Star City, Russia. It ran from January 2014 to January 2015.

The goals of iVOICE were to test the feasibility of using changes in the speaking voice as indicators for changes in the levels of fatigue in the speaker or changes in the levels of cognitive load of the speaker. We developed technology that analysed recordings of speech under controlled conditions and predicted levels of fatigue or levels of cognitive load from characteristics of the audio signal.

For fatigue we were fortunate to obtain recordings from seven aeronautical professionals undertaking a training exercise at GCTC in which they had to stay awake for 60 hours. We showed that we could make reasonable predictions of how long each of the speakers had been awake from characteristics of their speech. For a task in which we only asked whether the speaker had slept in the past 24 hours, we achieved a prediction accuracy of 90%. The results of this experiment were reported in Baykaner et al. (2015).

For cognitive load we performed two experiments: one looked at recordings of subjects performing the Stroop test, and the other at recordings of subjects performing a demanding visual task. The first experiment was reported in Huckvale (2014).

Publications

Prediction of Speaker Age from Voice

The estimation of the age of a speaker from his or her voice has both forensic and commercial applications. Previous studies have shown that human listeners are able to estimate the age of a speaker to within 10 years on average, while recent machine age estimation systems seem to show superior performance, with average errors as low as 6 years. However, the machine studies have used highly non-uniform test sets, for which knowledge of the age distribution offers considerable advantage to the system. In this study we compare human and machine performance on the same test data, chosen to be uniformly distributed in age. We show that in this case human and machine accuracy is more similar, with average errors of 9.8 and 8.6 years respectively, although if panels of listeners are consulted, human accuracy can be improved to a value closer to 7.5 years. Both humans and machines have difficulty in accurately predicting the ages of older speakers.
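The advantage that a non-uniform test set gives to a system can be illustrated with a small sketch. It is not the analysis used in the study: the ages are invented, and the "system" is a hypothetical baseline that ignores the voice entirely and always guesses the median age of the test set.

```python
from statistics import median

def mean_absolute_error(true_ages, predicted_ages):
    """Average size of the age error, in years."""
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)

def constant_guess_mae(true_ages):
    """Error of a baseline that always guesses the median age of the test set."""
    guess = median(true_ages)
    return mean_absolute_error(true_ages, [guess] * len(true_ages))

# Hypothetical test sets (invented ages, not data from the study).
skewed_ages = [22, 24, 25, 25, 26, 27, 28, 30, 31, 65]   # mostly young adults
uniform_ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]  # evenly spread in age

skewed_baseline = constant_guess_mae(skewed_ages)
uniform_baseline = constant_guess_mae(uniform_ages)

print(skewed_baseline)   # 5.9
print(uniform_baseline)  # 12.5
```

On the skewed set the baseline already reaches an average error below 6 years without listening to anyone, whereas on the uniform set the same strategy is off by 12.5 years. Average errors are therefore only comparable across studies when the age distributions of the test sets are similar.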

The figure below shows the predicted ages of 52 speakers made by 36 listeners. The mean absolute error of age prediction was about 10 years; that is, we can often estimate a speaker's age to within a decade just by hearing their voice.
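As a minimal sketch of how a mean absolute error is computed, and of why consulting a panel of listeners can help, the example below averages the estimates of several listeners for each speaker. The ages and estimates are invented for illustration and are not data from the listening experiment.

```python
from statistics import mean

def mean_absolute_error(true_ages, predicted_ages):
    """Average size of the age error, in years."""
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)

# Hypothetical data: each row holds one listener's estimates for three speakers.
true_ages = [25, 40, 60]
listener_estimates = [
    [30, 35, 50],
    [20, 48, 66],
    [28, 43, 52],
]

# Error of each listener judged alone.
individual_maes = [mean_absolute_error(true_ages, row) for row in listener_estimates]

# Panel estimate: the mean of all listeners' estimates for each speaker.
panel_estimates = [mean(column) for column in zip(*listener_estimates)]
panel_mae = mean_absolute_error(true_ages, panel_estimates)

print(round(mean(individual_maes), 2))  # average error of a single listener
print(round(panel_mae, 2))              # error of the averaged panel estimate
```

In this toy case the panel error is about 2.3 years, against about 5.9 years for a single listener on average, because over- and under-estimates by different listeners partly cancel. The error of the averaged estimate can never exceed the average of the individual errors, although the size of the gain depends on how independent the listeners' errors are.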

Publications

Monitoring of Psychological Well-Being in Long-Term Missions (VULCAN)

The VULCAN project was a feasibility study funded by the European Space Agency under the Artes 20 programme. The project partners were UCL Speech, Hearing and Phonetic Sciences, UCL Mullard Space Sciences Laboratory Centre for Space Medicine and the Institute for Biomedical Problems (IBMP) in Moscow, Russia. It ran from January 2016 to July 2017.

The VULCAN project was part of a larger endeavour investigating how psychological support may be given to astronauts undertaking a long-term mission, for example a mission to Mars that might take up to two years. VULCAN builds on the outcomes of the iVOICE project that showed how signal analysis and machine learning methods may be applied to the prediction of speaker fatigue and cognitive load from voice recordings.

At the heart of VULCAN is a new technology for Longitudinal Voice Analysis. This is a combination of innovative signal analysis methods together with statistical modelling of sequences to uncover either anomalous recordings or long-term trends in the voice. The effectiveness of the technique was explored by applying it to several thousand spoken messages recorded as part of the Mars500 simulated mission to Mars experiment conducted by IBMP in 2010/11.
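The two aims mentioned above, finding long-term trends and finding anomalous recordings, can be illustrated with a deliberately simple sketch. This is not the VULCAN method, whose details are not given here: it assumes each recording has already been reduced to a single hypothetical feature value (for example a mean pitch in Hz), fits a straight-line trend by least squares, and flags recordings whose deviation from the trend is large relative to the median absolute deviation. The function names and the threshold of 3.5 are illustrative assumptions.

```python
from statistics import mean, median

def linear_trend(values):
    """Least-squares slope and intercept of values against recording index."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def anomalous_indices(values, threshold=3.5):
    """Indices of recordings that lie unusually far from the fitted trend."""
    slope, intercept = linear_trend(values)
    residuals = [y - (intercept + slope * x) for x, y in enumerate(values)]
    centre = median(residuals)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(r - centre) for r in residuals)
    if mad == 0:
        return []
    return [i for i, r in enumerate(residuals) if abs(r - centre) / mad > threshold]

# Hypothetical feature sequence: a slow upward drift with one unusual recording.
feature_values = [120.0 + 0.5 * i for i in range(12)]
feature_values[7] += 15.0

slope, intercept = linear_trend(feature_values)
flagged = anomalous_indices(feature_values)

print(slope > 0)  # True: the feature drifts upwards over the sequence
print(flagged)    # [7]
```

A practical system would work with many features per recording and a proper statistical model of the sequence; the sketch only shows how a trend and an outlier can be separated in principle.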

We have been able to show that the voices of the Mars500 crew did change significantly over the 520-day mission, and that the changes were commensurate with known changes in the psychological health of the speakers as they dealt with the stresses of a long period of isolation.

Publications

Detection of the common cold from voice

We took part in the Interspeech 2017 Computational Paralinguistics cold challenge, in which we built a system to detect whether someone had a cold infection from the sound of their voice. On the development test data, we showed that voice features based on the modulation spectrogram could achieve an unweighted recognition accuracy of 68% from short (3 to 10 second) segments of the speech signal.
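The sketch below assumes that "unweighted recognition accuracy" refers to unweighted average recall, that is, the mean of the recall obtained on each class separately. The labels are invented for illustration and the function name is our own.

```python
from collections import defaultdict

def unweighted_average_recall(true_labels, predicted_labels):
    """Mean of the per-class recalls, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, prediction in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    return sum(correct[label] / total[label] for label in total) / len(total)

# Hypothetical test set: 8 healthy ("H") and 2 cold ("C") recordings.
true_labels = ["H"] * 8 + ["C"] * 2

# A trivial system that never detects a cold.
always_healthy = ["H"] * 10

plain_accuracy = sum(t == p for t, p in zip(true_labels, always_healthy)) / len(true_labels)
uar = unweighted_average_recall(true_labels, always_healthy)

print(plain_accuracy)  # 0.8
print(uar)             # 0.5
```

When one class is much rarer than the other, plain accuracy rewards a system that always picks the common class, whereas the unweighted measure stays at chance level (50% for two classes) for such a system. A figure of 68% should therefore be read against a 50% chance baseline.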

Publications

 

