PALS0009 Introduction to Speech Science

5. Vowels

Key Concepts

Vowels may be described in terms of phonology, phonetics, and acoustics.
There are about 20 phonological choices for vowels in British English.
The Cardinal Vowel system can be used to describe the quality of any vowel in any language.
Vowel quality can be described using terms such as front-back, open-close, rounded-unrounded, short-long, monophthong-diphthong.
The source-filter model of vowel production explains the acoustic form of vowels.
The frequency response of the vocal tract pipe used for vowels can be characterised using the frequencies of the first few formants.
Formant frequencies for a given phonological vowel vary across speakers, even for the "same" articulation.

Learning Objectives

At the end of this topic the student should be able to:

identify the phonological vowel choices used in English
use appropriately the vowel descriptors: long/short, monophthong/diphthong, front/back, open/close and rounded/unrounded
describe the nature of vocal tract articulations that give rise to vowels
explain what is shown on the vowel quadrilateral
describe the acoustics of vowel production in terms of the actions of a filter on a sound source
explain how mechanical analogues of vowel sounds can be constructed
make acoustical measurements of vowels that characterise the frequency response of the vocal tract

Topics

Phonology of English Vowels

Phonology is the study of the pronunciation choices that are available for the words of a language, and of the restrictions on how those choices may be combined in utterances.

There are about 20 phonological choices for vowels in English words (the number varies according to accent, and whether words loaned from other languages are counted). The table below shows words which exemplify 20 choices (in Standard Southern British English). Because there are not enough alphabetic letters to describe the vowel choices conveniently, phoneticians use symbols that are borrowed from the International Phonetic Alphabet to notate them. The most commonly used symbols for English are shown in the table.

Keyword	IPA Symbol	Character in SSBE
Monophthongs
heed	iː	Close, Front, Long
hid	ɪ	Close, Front, Short
head	e	Half-open, Front, Short
had	æ	Open, Front, Short
Hudd	ʌ	Open, Central, Short
hard	ɑː	Open, Back, Long
hod	ɒ	Open, Back, Short
hoard	ɔː	Half-open, Back, Long
hood	ʊ	Close, Back, Short
who’d	uː	Close, Back, Long
heard	ɜː	Half-open, Central, Long
vanilla	ə	Half-open, Central, Short (Schwa)
Diphthongs
bade	eɪ	Fronting Diphthong
bide	aɪ	Fronting Diphthong
buoyed	ɔɪ	Fronting Diphthong
beard	ɪə	Centering Diphthong
bared	eə	Centering Diphthong
Ruhr	ʊə	Centering Diphthong
hoe	əʊ	Backing Diphthong
how	aʊ	Backing Diphthong

A phonetic study of vowels looks at the actual articulation or sound that speakers of some accent use to differentiate these choices.

Within an English accent, the phonological vowel choices can be grouped according to their phonetic character, that is, the actual pronounced form of the vowel choice. In the table above you will see the typical form of the vowels in Standard Southern British English. You will see that the phonetic form of vowels can be characterised using labels such as open versus close, front versus back, short versus long, and monophthong versus diphthong. These labels might be different in different accents; for example, in Geordie the /əʊ/ diphthong (as in "boat") is produced more like the /ɔː/ monophthong (as in "thought").

The vowel labelled Schwa /ə/ has a special character in that it only appears in unstressed syllables.

Vowel Articulation

To articulate a vowel sound, the tongue, jaw and lips are placed to create a tube between larynx and lips (see MRI images below). The soft palate is normally raised, sealing off the nasal cavity (except in nasalised vowels). No constriction occurs that might cause turbulence (frication).

[MRI images of the vocal tract for the vowels iː, e, æ (left column) and uː, ɔː, ɑː (right column)]

To make the vowel sound, phonation must occur in the larynx to send a sound through the tube. This sound is changed by its passage through the supra-laryngeal vocal tract, and the vowel sound radiated from the lips is thus a modified or coloured form of larynx buzz.

The Vowel Quadrilateral

We can define an articulatory vowel space in terms of the limits to the extremes of tongue position that still give rise to non-nasal resonant sounds. The idea is that if the tongue position became more extreme than these limits then turbulence would occur. Three tongue positions can be used to demonstrate this: [i] the closest frontest tongue position, [ɯ] the closest backest tongue position, and [ɑ] the openest backest tongue position, as can be seen in the diagram below [from Catford, 1988]:

Given these upper and back extremes, we now consider how to judge variation in height and frontness. The long-standing convention is that we do so with respect to the highest point on the tongue. Different vowels cause the highest point to trace out a grid of positions as shown in the diagram below [from Catford, 1988]:

The thing to note here is that the highest point of the tongue does not coincide with the point of maximum constriction, as you might have thought. For back vowels, the narrowest constriction can occur much further back than the highest point.

We can now add a vowel articulation representing the frontest openest tongue position [a] to create a quadrilateral that represents the highest point of the tongue for four extreme vowels, as shown in the diagram below:

The edges of this diagram can now be enhanced by considering vowels that divide the vertical edges into three equal steps. There is some debate whether we should do this on articulatory grounds or on acoustic grounds, but let's not worry about that for now. Lastly we can add different lip positions for different parts of the quadrilateral to get the Cardinal Vowel chart:

As you can see from the diagram below, cardinal vowels 1-5 have an unrounded lip position, while cardinal vowels 6-8 are rounded [source: Catford, 1988]:

The invention of the Cardinal Vowel system is usually credited to Daniel Jones although its basis goes back to the work of earlier phoneticians. Hear Daniel Jones himself speaking the first 8 Cardinal Vowels [Source]:

The cardinal vowels set out the limits of the vowel quadrilateral. We can now locate any particular vowel quality as a position in the chart by comparing its quality to each of the cardinal vowels. For Standard Southern British English (SSBE) the monophthongal vowels are located approximately as shown below: [source: Wikipedia]

Acoustics of Vowel Production

The most powerful explanatory model we have for the spectra of speech signals is called the source-filter model. In this model, sounds are generated and then independently shaped (filtered) by the vocal tract acting as a resonating tube. For example, the source of sound in a vowel is vocal fold vibration while the filtering is performed by a characteristically-shaped vocal tract tube extending from larynx to lips. The pitch and voice quality of a vowel are changed by modifying the source; the phonetic quality of a vowel is changed by modifying the filter, i.e. by changing the shape of the vocal tract tube. This diagram shows three different tubes generating three different vowel sound spectra from the same larynx buzz spectrum.
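To make the division of labour between source and filter concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the larynx buzz is approximated by a plain impulse train, each formant by a simple two-pole digital resonator, and the function names, formant frequencies and bandwidths are illustrative choices, not values taken from these notes.

```python
import math

def impulse_train(f0, duration, fs):
    """Crude stand-in for larynx buzz: one unit impulse per pitch period."""
    n_samples = int(round(duration * fs))
    period = int(round(fs / f0))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(signal, freq, bandwidth, fs):
    """Two-pole digital resonator (one formant) with unity gain at 0 Hz."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bandwidth * t)
    b = 2.0 * math.exp(-math.pi * bandwidth * t) * math.cos(2.0 * math.pi * freq * t)
    a = 1.0 - b - c
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesise_vowel(f0, formants, duration=0.1, fs=16000):
    """Pass the source through one resonator per (frequency, bandwidth) pair."""
    signal = impulse_train(f0, duration, fs)      # the source
    for freq, bandwidth in formants:              # the filter
        signal = resonator(signal, freq, bandwidth, fs)
    return signal

def amplitude_at(signal, freq, fs):
    """Amplitude of one frequency component (a single-frequency DFT)."""
    re = sum(x * math.cos(2.0 * math.pi * freq * n / fs) for n, x in enumerate(signal))
    im = sum(x * math.sin(2.0 * math.pi * freq * n / fs) for n, x in enumerate(signal))
    return math.hypot(re, im) / len(signal)

# Illustrative (not measured) formant values for a neutral tube shape.
vowel = synthesise_vowel(100, [(500, 80), (1500, 100)])
print(amplitude_at(vowel, 500, 16000) > amplitude_at(vowel, 1000, 16000))
```

In this sketch, changing f0 alters only the source (and so the pitch), while changing the list of formants alters only the filter (and so the vowel quality).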

Like all simple acoustic systems, we can characterise the effect of a vocal tract tube on the spectrum of a sound passing through it by means of its frequency response. Remember from Week 3 that a frequency response is just a graph of the change in amplitude caused by a system against frequency. The frequency response of a vocal tract tube making a typical vowel can be seen in the middle of the diagram below.

Note how the frequency response diagram describes the effect of the vocal tract system on the input sound to generate the output sound.

When the vocal tract tube is unobstructed, the frequency response of the tube has a shape which contains a small number of peaks and valleys. Peaks in the frequency response graph are caused by the fact that the tube has certain preferred frequencies of vibration. These preferences are called the resonances of the system. Phoneticians use the special term formants for the resonances of the vocal tract demonstrated by peaks in its frequency response. It turns out that a good way to quantify the frequency response of the vocal tract for a vowel is by measuring the frequencies of these formant peaks.
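As a rough worked example of where such preferred frequencies come from, the sketch below uses a standard idealisation that these notes do not themselves spell out: a uniform tube, closed at the larynx end and open at the lips, resonates at odd multiples of c/4L. The function name, the speed of sound (350 m/s) and the tube length (17.5 cm, a commonly quoted round figure for an adult male vocal tract) are assumptions made for illustration.

```python
def tube_resonances(length_m, count=3, speed_of_sound=350.0):
    """Resonances (Hz) of an idealised uniform tube closed at one end.

    The n-th resonance is (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]

# Assumed 17.5 cm tube: resonances near 500, 1500 and 2500 Hz.
print([round(f) for f in tube_resonances(0.175)])
```

Real vowels depart from these values because the tongue, jaw and lips make the tube non-uniform, which is what moves the formants around.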

Formant space

If we measure the first few formant frequencies for vowels, we find something rather remarkable:

Notice how the differences in vowel quality are well described by changes in the first two formant frequencies (formant frequencies = frequencies of the peaks in the vocal tract tube frequency response for that vowel).

In a formant model of the vocal tract frequency response, each peak is considered to be a separate simple resonator which can be completely described by its resonant frequency and bandwidth. Studies of formant frequencies for different phonetic vowel qualities show a rough relation between the frequencies of the first two formants (F1, F2) and the position of the vowel on the vowel quadrilateral. This leads to the rule of thumb that F1 is associated with increasing openness of vowel articulation, while F2 is related to increasing frontness of vowel articulation.
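The idea that each peak is a simple resonator described only by a frequency and a bandwidth can be sketched as follows. This is an illustration rather than a definitive model: it assumes each formant is a second-order resonator with unity gain at 0 Hz, that the resonators are combined by multiplying their gains, and the function names and formant values are hypothetical.

```python
import math

def resonator_gain(f, formant_freq, bandwidth):
    """Gain at frequency f (Hz) of one second-order resonator."""
    f0_sq = formant_freq ** 2 + (bandwidth / 2.0) ** 2
    return f0_sq / math.sqrt((f0_sq - f * f) ** 2 + (bandwidth * f) ** 2)

def tract_response(f, formants):
    """Gain of the whole tube: the product of the individual resonator gains."""
    gain = 1.0
    for formant_freq, bandwidth in formants:
        gain *= resonator_gain(f, formant_freq, bandwidth)
    return gain

def response_peaks(formants, f_max=4000, step=10):
    """Frequencies (on a coarse grid) where the response has a local maximum."""
    freqs = list(range(step, f_max + 1, step))
    gains = [tract_response(f, formants) for f in freqs]
    return [freqs[i] for i in range(1, len(freqs) - 1)
            if gains[i - 1] < gains[i] > gains[i + 1]]

# Illustrative (frequency, bandwidth) pairs in Hz, not measured values.
print(response_peaks([(500, 80), (1500, 100), (2500, 120)]))
```

The peaks found in the combined response sit very close to the formant frequencies that were put in, which is why a handful of formant frequencies is a compact description of the whole curve.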

But don't take this diagram to mean that formant frequencies are the same for all speakers. Formant frequencies scale with vocal tract length, which correlates with a person's height; they also vary with accent. In a famous study, Peterson & Barney (1952) showed considerable variation of F1 & F2 across speakers for the same phonological vowel:

Can you see a relationship between this diagram and the vowel quadrilateral?
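The remark above that formant frequencies scale with vocal tract length can be illustrated with a small calculation. The sketch assumes the simplest possible case, in which a shorter tract is a uniformly scaled-down copy of a longer one, so that every formant rises by the ratio of the two lengths; real speakers differ in more than overall length. The function name and the example lengths are hypothetical.

```python
def scale_formants(formants_hz, reference_length_cm, new_length_cm):
    """Rescale formants for a tract of a different length (uniform scaling)."""
    factor = reference_length_cm / new_length_cm
    return [f * factor for f in formants_hz]

# Assumed example: formants of a 17.5 cm tract mapped to a 14 cm tract.
print(scale_formants([500.0, 1500.0, 2500.0], 17.5, 14.0))
```

Under this assumption a shorter vocal tract shifts the whole formant pattern upwards without changing the ratios between formants.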

Vowels on Spectrograms

The effect of the formant frequencies on the vowel spectrum is clearly seen as thick, dark, slow-moving, horizontal bars on the spectrogram. Click the samples below to see how vowels both sound and look different.

Diphthongs

Diphthongs are vowels which change in quality, as if they were made from two vowel qualities glued together. The gliding movement of a diphthong is caused by movement of the articulators during its production. We can graph this on the vowel quadrilateral. Here is such a graph for the British English diphthongs: [source: Wikipedia]

The movement of the articulators causes shifts in the formant frequencies of the vocal tract tube, which we can observe on a spectrogram:

Mechanical speaking machines

Attempts to build artefacts which produce speech sounds have a long history. Although all modern techniques involve digital computers, here we celebrate attempts to build mechanical speaking machines. Follow the links if you want to hear how they sounded.

Date

Machine

1771

Erasmus Darwin (Charles Darwin's grandfather) reported:

I contrived a wooden mouth with lips of soft leather, and with a valve back part of it for nostrils, both which could be quickly opened or closed by the pressure of the fingers, the vocality was given by a silk ribbon about an inch long and a quarter of an inch wide stretched between two bits of smooth wood a little hollowed; so that when a gentle current of air from bellows was blown on the edge of the ribbon, it gave an agreeable tone, as it vibrated between the wooden sides, much like a human voice. This head pronounced the p, b, m, and the vowel a, with so great nicety as to deceive all who heard it unseen, when it pronounced the words mama, papa, map, pam; and had a most plaintive tone, when the lips were gradually closed. [Darwin, 1806]

1779

Russian Professor Christian Kratzenstein demonstrated a set of vowel resonators in St. Petersburg in 1779. The resonators, shown below, produced vowel-like sounds at a constant pitch when they were excited by a reed.

The Exploratorium's Vocal Vowels exhibit is a modern version of Kratzenstein's resonators. Or you can follow some instructions of mine to make your own vowel resonators from plumbing supplies.

1791

Von Kempelen's speaking machine was a complex device consisting of bellows, a reed, whistles and a leather cup which could be manipulated to make speech-like sounds. We know its basic design from the reconstruction by the English scientist Charles Wheatstone (in the 1830s). Wheatstone's reconstruction is shown below:

Von Kempelen was a colourful character. His most infamous exploit was the creation in 1770 of The Turk, a supposed chess-playing automaton (which was actually controlled by a human chess master hidden inside).

1845

The Euphonia: Joseph Faber's Amazing Talking Machine

Sixteen levers or keys "like those of a piano" projected sixteen elementary sounds by which "every word in all European languages can be distinctly produced." A seventeenth key opened and closed the equivalent of the glottis, an aperture between the vocal cords. "The plan of the machine is the same as that of the human organs of speech, the several parts being worked by strings and levers instead of tendons and muscles." [Henry, 1845]

~1860

Alexander & Melville Graham Bell: physical working model of the human vocal tract.

"Following their father's advice, the boys attempted to copy the vocal organs by making a cast from a human skull and moulding the vocal parts in guttapercha. The lips, tongue, palate, teeth, pharynx, and velum were represented. The lips were a framework of wire, covered with rubber which had been stuffed with cotton batting. Rubber cheeks enclosed the mouth cavity, and the tongue was simulated by wooden sections - likewise covered by a rubber skin and stuffed with batting. The parts were actuated by levers controlled from a keyboard. A larynx 'box' was constructed of tin and had a flexible tube for a windpipe. A vocal-cord orifice was made by stretching a slotted rubber sheet over tin supports." [Flanagan, 1994]

1937

R. R. Riesz's talking mechanism

In 1937, R. R. Riesz demonstrated his mechanical talker which, like the other mechanical devices, was more reminiscent of a musical instrument. The device was shaped like the human vocal tract and constructed primarily of rubber and metal with playing keys similar to those found on a trumpet. The mechanical talking device ... produced fairly good speech with a trained operator ... With the ten control keys (or valves) operated simultaneously with two hands, the device could produce relatively articulate speech. Riesz had, through his use of the ten keys, allowed for control of almost every movable portion of the human vocal tract. Reports from that time stated that its most articulate speech was produced as it said the word 'cigarette'. [Cater, 1983]

1989

The Talking Machine art installation by Martin Riches.

The Talking Machine (1989-1991): 32 pipes and air valves, wind chests, magazine bellows, blower, computer. 230cm.

The white box at the bottom is a sound-proof casing enclosing the blower. The flat white box above that is a magazine bellows which evens out the air pressure no matter how many pipes are being played. The air goes up the red hoses to the four wind chests which carry the pipes. The wind chests are transparent so the movement of the electromagnetic valves can be seen and this is further reinforced by LEDs attached to each valve. The black cables carry the signals from the computer with the user interface to the valves.

2001

Takayuki Arai has built many working physical models of the vocal tract for teaching and learning.

2009

Waseda University Mechanical Talkers - a series of robot talkers by Atsuo Takanishi.

We developed WT-7RII (Waseda Talker No. 7 Refined II) in 2009, which have human-like speech production mechanism. WT-7RII consists of the mechanical models of the lung, the vocal cords, the tongue, the jaw, the palate, the velum, the nasal cavity and the lips. These mechanical models are designed based on human anatomy and they have same size as an adult human male to have similar acoustic characters.

Readings

Essential

N. Hewlett, M. Beck, "Introduction to the Science of Phonetics", Lawrence Erlbaum 2006. Chapter 5 - Basic Principles of Vowel Description. [in library]

Background

Elizabeth Zsiga, "The Sounds of Language", Wiley Blackwell 2013. Chapter 4 - A Map of the Vowels. [in library]
Peter Ladefoged, "Vowels and Consonants", Blackwell 2001. Chapters 3, 4 & 5. [in library]

Laboratory Activities

In this week's laboratory activity, we will measure the formant frequencies of some vowel sounds then attempt to recreate them using a simulated vocal tract.

Individual measurements of F1 & F2. You will measure the formant frequencies in your own productions of the "corner" vowels /i/, /æ/, /ɑ/ and /u/ in the phrase "Who heeds harsh hazards?"
F1-F2 diagram of class vowels. We will plot an F1-F2 graph from all the measurements of the class.
Articulatory synthesis. You will attempt to synthesize a version of your vowels using a software model of a vocal tract, where the articulators need to be positioned to create the appropriate tube shape.
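One possible way to organise the class measurements for the F1-F2 diagram is sketched below. It is not the procedure used in the laboratory: the function names are hypothetical and the numbers are placeholders, not real measurements. The sketch averages F1 and F2 per vowel, then negates both values so that, following the usual plotting convention, close vowels appear at the top and front vowels on the left, as on the vowel quadrilateral.

```python
from collections import defaultdict

def mean_formants(measurements):
    """Average F1 and F2 (Hz) per vowel from (vowel, f1, f2) tuples."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for vowel, f1, f2 in measurements:
        totals[vowel][0] += f1
        totals[vowel][1] += f2
        totals[vowel][2] += 1
    return {vowel: (t[0] / t[2], t[1] / t[2]) for vowel, t in totals.items()}

def plot_position(f1, f2):
    """(x, y) for plotting: F2 decreases rightwards, F1 increases downwards."""
    return (-f2, -f1)

# Placeholder values only, standing in for measurements from the class.
class_data = [("i", 300, 2300), ("i", 320, 2200), ("u", 350, 900)]
for vowel, (f1, f2) in mean_formants(class_data).items():
    print(vowel, plot_position(f1, f2))
```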

The data collected in this laboratory session will be useful in the preparation of your coursework.

Reflections

You can improve your learning by reflecting on your understanding. Come to the tutorial prepared to discuss the items below.

Swedish is said to have 17 monophthongal vowels. British English has 12. Most languages have 5 or 6. What might be the advantages or disadvantages of a particularly large or particularly small inventory of vowels?
What is the difference between a phonological vowel and a phonetic vowel?
What aspects of the cardinal vowel diagram are articulatory and which acoustic?
In the source-filter model, what is the source and what is the filter?
What is a formant? Why are formant frequencies interesting in the study of vowels?
What difficulties does a child face when learning to copy adult vowel sounds?

Last modified: 13:13 02-Feb-2021.