PLIN2108/PLING216 Intermediate Phonetics
UCL Division of Psychology and Language Sciences
PLIN2108/PLING216 Intermediate Phonetics & Phonology A

4. Phonation

Learning Objectives

At the end of this topic the student should be able to:


  1. Phonation
  2. We use the term phonation to refer to any sound-generating process in the larynx. Thus the term can cover whisper as well as voice.

    Cross-linguistic phonetic studies have yielded several insights into the possible states of the glottis. People can control the glottis so that they produce speech sounds with not only regular voicing vibrations at a range of different pitches, but also harsh, soft, creaky, breathy and a variety of other phonation types. These are controllable variations in the actions of the glottis, not just personal idiosyncratic possibilities or involuntary pathological actions. What appears to be an uncontrollable pathological voice quality for one person might be a necessary part of the set of phonological contrasts for someone else. For example, some American English speakers may have a very breathy voice that is considered to be pathological, while Gujarati speakers need a similar voice quality to distinguish the word /ba̤ɾ/ meaning ‘outside’ from the word /baɾ/ meaning ‘twelve’ (Pandit 1957, Ladefoged 1971). Likewise, an American English speaker may have a very creaky voice quality similar to the one employed by speakers of Jalapa Mazatec to distinguish the word /já̰/ meaning ‘he wears’ from the word /já/ meaning ‘tree’ (Kirk et al. 1993). As was noted some time ago, one person's voice disorder might be another person's phoneme (Ladefoged 1983). [M. Gordon, P. Ladefoged, "Phonation types: a cross-linguistic overview", Journal of Phonetics 29 (2001) 383-406]

    Ladefoged suggested that a continuum of phonation types might be defined in terms of the aperture between the arytenoid cartilages, ranging from voiceless (furthest apart), through breathy voiced, to regular, modal voicing, and then on through creaky voice to glottal closure (closest together). See figure below.

  3. Linguistic and paralinguistic uses of phonation
  4. Linguistic uses of phonation
    • Modal phonation (aka normal voicing) is the main sound source for speech production in all spoken languages.
    • Modal phonation is used to create phonetic contrasts in language in the following ways:
      • Presence vs absence of phonation, e.g. voiced vs voiceless fricatives.
      • Sudden stop in phonation, e.g. glottal stop.
      • Relative timing of phonation start, e.g. Voice Onset Time (VOT) in plosives
      • Fundamental frequency change within words, e.g. lexical tone
      • Fundamental frequency change across utterances, e.g. intonation
    • Voice quality changes can also signal phonological contrast in some languages. For example:
      • Modal voice vs. breathy voice, e.g. Newar /na/ "it melts", /na̤/ "knead".
      • Modal voice vs. creaky voice, e.g. Danish /hun/ "female", /hṵn/ "dog". This is an example of the Danish stød; see Language of the Week and Research Paper of the Week.
    • Other laryngeal gestures can play a role in reinforcing other contrasts, for example:
      • Aspiration (glottal friction) can reinforce delayed voicing onset in voiceless plosives.
      • Glottalisation (glottal closure) can reinforce plosives, particularly in geminates.
      • Laryngealisation (creaky voice) in a vowel can reinforce upcoming voiceless plosives.
    Paralinguistic uses of phonation
    • Voice quality varies with speaking style, often depending on the context of the communication. E.g. a lecturing voice, a friendly voice, a confidential voice.
    • Voice quality changes contribute to turn-taking in dialogue. The use of creaky voice to signal the end of a dialogue turn has been noted in various languages (see the Gobl & Ní Chasaide reading or this paper about turn-taking in German).
    • Voice quality changes occur as a consequence of physiological or psychological changes such as mental strain, tiredness, emotional state or health (see this newspaper article on voice and health).
    Other aspects of voice quality

  5. Laryngeal Structures
  6. The larynx sits in the airway between the trachea and the pharynx. At the base of the larynx is the cricoid cartilage. Above and attached to the cricoid are the thyroid cartilage and a pair of arytenoid cartilages. Through ligaments and muscles, the thyroid can rock back and forth against the cricoid and the arytenoids can be made to swivel.

    The thyroid cartilage surrounds and supports the vocal folds, two bands of muscular tissue that are joined together at the front, where they attach to the thyroid cartilage, and separated at the back, where they attach to processes on the arytenoid cartilages. Through muscular control, the arytenoids can be swivelled to draw the vocal folds together across the top of the trachea, thereby closing off the air passageway from the lungs. The vocal folds can be changed in length and tension by movements of the arytenoid and thyroid cartilages, and the tension can also be varied by contracting the thyroarytenoid muscles (sometimes called 'vocalis' muscles) that lie inside the folds. The gap between the vocal folds is called the glottis. The ventricular folds or 'false vocal folds' are fleshy structures above the vocal folds which do not normally take part in phonation.

  7. Modal Phonation
  8. In modal, or normal, voice quality the vocal folds are approximated so that they completely cover the airway. They are also tensed to a degree that sets the fundamental frequency toward the centre of the speaker's range. Air from the lungs forces the folds apart and air flow builds up between them. Two forces pull the folds back to their central position: the natural elasticity of the folds themselves and the Bernoulli effect, which causes a reduction in pressure inside a constricted fluid flow. As these forces pull the folds together, the air flow increases in velocity, which increases the reduction in pressure caused by the Bernoulli effect. Eventually the folds ‘snap’ together, cutting off the flow. This ‘snap’ causes a sudden reduction in pressure immediately above the folds, and it is this reduction which is the main source of sound energy for vocal tract excitation. Once the folds have closed, pressure from the lungs forces them apart again and the cycle repeats; in modal voice the cycles are regular and the closures are complete. The spectrum of modal phonation is similar to that of a sawtooth waveform (shown below). The voice spectrum is dominated by the lower frequencies, and tails off rapidly with increasing frequency.

    Spectrum of glottal signal

    Spectrum of sawtooth signal
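The way a sawtooth spectrum tails off with frequency can be checked numerically. The sketch below is only an illustration of the idealised sawtooth, not of a real glottal signal: it builds one second of a 100 Hz sawtooth with NumPy and prints the level of the first few harmonics relative to the first. The function name `sawtooth_harmonic_levels`, the sampling rate and the fundamental frequency are all assumptions chosen for the example.

```python
import numpy as np

def sawtooth_harmonic_levels(period_samples=160, fs=16000, n_harmonics=8):
    """Levels (dB relative to harmonic 1) of the first harmonics of a sawtooth.

    Assumes fs is a whole multiple of period_samples, so that one second of
    signal holds a whole number of cycles and each harmonic falls on an FFT bin.
    """
    n = fs                                   # one second: FFT bin k is k Hz
    f0 = fs // period_samples                # fundamental frequency in Hz
    # Ideal ramp from -1 towards +1 once per period (not band-limited).
    wave = 2.0 * (np.arange(n) % period_samples) / period_samples - 1.0
    spectrum = np.abs(np.fft.rfft(wave))
    bins = np.arange(1, n_harmonics + 1) * f0
    amps = spectrum[bins]
    return 20.0 * np.log10(amps / amps[0])

levels = sawtooth_harmonic_levels()
for k, level in enumerate(levels, start=1):
    print(f"harmonic {k}: {level:6.1f} dB")
```

The harmonic amplitudes fall roughly in proportion to 1/k, so each doubling of frequency lowers the level by about 6 dB.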

  9. Other Phonation Types
    • In whisper phonation, the folds are tensed and rigid, but held slightly apart. The rigidity prevents the folds from vibrating, while the partially opened glottis forms a narrow opening which causes turbulence.
    • In breathy phonation, the folds are tensed appropriately for vibration but not fully approximated so that complete closures do not occur. This has a number of consequences: firstly that air flow continues throughout the cycle which can lead to turbulence at the glottis, secondly that the closures are less sharp, and thirdly that the vocal folds remain open for a longer portion of the cycle.
    • In creaky phonation, in contrast, the vocal folds are lax but tightly approximated. This can lead to cycles in which the folds are closed for a longer proportion of the cycle and which are irregular in duration. Creaky voice is commonly found at the bottom of a speaker’s pitch range, where the folds are slack anyway. A common form of creaky voice is called diplophonia, where long and short cycles alternate.
    • In falsetto phonation, the vocal folds are extremely tense and are held in such a way that only their inner edges are able to vibrate. This means that phonation is low in amplitude and high in fundamental frequency.

  10. Laboratory Methods
  11. We can make measurements of the voice from audio recordings but these have some fundamental limitations. Firstly, any audio recording can be corrupted by background noise, reverberation or poor recording technique. This means that we might attribute to the voice characteristics that are actually caused by the environment, microphone or recorder. Secondly, while it is relatively easy to see phonation cycles in the audio signal, it is hard to get precise information about the duration and form of each cycle.

    An alternative to audio recording is to use a Laryngograph®. The Laryngograph is a device that monitors vocal fold contact in the larynx without interfering with the processes of articulation. It does this by measuring the electrical impedance through the neck at the level of the larynx. To measure impedance (resistance to current flow) the Laryngograph has two guard-ring electrodes that are placed on the skin on either side of the larynx. A small high-frequency voltage is applied to the centre of one electrode and the circuit is completed using the centre of the other electrode. The use of a high-frequency current and the presence of earthed guard rings ensure that the current flows through the neck rather than across the skin.

    Studies with the Laryngograph and a simultaneous fibrescope imaging system have shown the relationship between the phases of the Laryngograph waveform (Lx) and the phases of vocal fold vibration. With the vocal folds apart, current flow is at a minimum. As the vocal folds snap together within a normal cycle, the current flow rises rapidly, indicating that it is the degree of vocal fold contact that most affects the measured impedance. During the vocal fold closed phase, the current flow rises to a maximum, and as the vocal folds peel apart, the current slowly falls again.

  12. Laboratory measurement of voice quality
  13. Glottal closure marking

    To estimate the regularity of vocal fold vibration it is useful to first delimit individual vocal fold cycles found in the speech signal. It is common to mark the instants of glottal closure, indicated by a sudden increase in energy in the audio signal, or by a sudden increase in current flow from the Laryngograph. Given these glottal closure instants it is then possible to measure how adjacent cycles differ from one another.
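As a rough illustration of closure marking, the sketch below marks a closure wherever an Lx-like signal rises most steeply. It is a minimal example under stated assumptions, not the method used by any particular analysis package: the function name `mark_closures`, the relative threshold of 0.5 and the synthetic test signal (fast linear rise, slow linear fall) are all invented for the example.

```python
import numpy as np

def mark_closures(lx, fs, rel_threshold=0.5):
    """Return estimated glottal closure times (s) from an Lx-like signal.

    A closure is marked at each local maximum of the sample-to-sample rise
    that exceeds rel_threshold times the largest rise in the signal.
    """
    slope = np.diff(np.asarray(lx, dtype=float))
    threshold = rel_threshold * slope.max()
    mid = slope[1:-1]
    # Local maximum of the slope; a flat run of equal slopes resolves to
    # its last sample because of the strict comparison with the next value.
    is_peak = (mid > threshold) & (mid >= slope[:-2]) & (mid > slope[2:])
    return (np.flatnonzero(is_peak) + 1) / fs

# Synthetic stand-in for Lx (an assumption, not real data): each 80-sample
# cycle rises over 4 samples and falls over 76, i.e. 100 Hz at fs = 8000.
fs = 8000
cycle = np.concatenate([np.linspace(0.0, 1.0, 5)[:-1],
                        np.linspace(1.0, 0.0, 77)[:-1]])
lx = np.tile(cycle, 10)
closures = mark_closures(lx, fs)
print(len(closures), "closures; first intervals (s):", np.diff(closures)[:3])
```

Real signals are noisier than this, so a practical marker would usually smooth the signal and enforce a minimum spacing between closures.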

    Measures of average pitch

    The pitch of a vowel is measured in terms of its fundamental frequency (called F0 or Fx), which is simply the number of glottal cycles that occur per second.

    • The mean fundamental frequency can be calculated simply as the number of cycles found in some interval divided by the duration of that interval.
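The calculation can be written out in a few lines. This is a minimal sketch that assumes the glottal closure instants are already available as a list of times in seconds; the function name `mean_f0` and the example times are made up for illustration.

```python
import numpy as np

def mean_f0(closure_times):
    """Mean fundamental frequency (Hz) from glottal closure times in seconds.

    N closure instants delimit N - 1 complete cycles, so the mean F0 is
    (N - 1) divided by the time between the first and last closure.
    """
    times = np.asarray(closure_times, dtype=float)
    if len(times) < 2:
        raise ValueError("need at least two closure instants")
    n_cycles = len(times) - 1
    duration = times[-1] - times[0]
    return n_cycles / duration

# Five hypothetical closure instants delimit four cycles over 33 ms.
print(round(mean_f0([0.000, 0.008, 0.016, 0.025, 0.033]), 1), "Hz")
```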
    Measures of regularity

    There are many possible statistics of voicing irregularity that can be estimated from the delimited glottal cycles. Two common measures are called jitter (period perturbations) and shimmer (amplitude perturbations).

    • Jitter measures the regularity of the pitch epochs - are they spaced equally in time? A common jitter measure is the Period Perturbation Quotient (PPQ), which is the relative percentage variation in glottal cycle duration. It is calculated as the normalised absolute difference between the duration of one cycle and the average cycle duration in a window of 5 cycles centred on the cycle.
    • Shimmer measures the regularity of the size of the speech signal across pitch epochs - are they equal in amplitude? A common shimmer measure is the Amplitude Perturbation Quotient (APQ), which is the relative percentage variation in speech amplitude from cycle to cycle. It is calculated as the normalised absolute difference between the peak amplitude of one cycle and the average peak amplitude in a window of 5 cycles centred on the cycle.
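Both quotients follow the same recipe, so one function can serve for either. The sketch below is one plausible reading of the description above, in which the mean absolute deviation from the 5-cycle local average is normalised by the overall mean and expressed as a percentage. Analysis packages differ in the exact normalisation, so treat the function name `perturbation_quotient` and its details as assumptions rather than as the definition used by any specific tool.

```python
import numpy as np

def perturbation_quotient(values, window=5):
    """Perturbation quotient (%) of per-cycle values.

    Pass cycle durations to get a PPQ-style jitter figure, or per-cycle
    peak amplitudes to get an APQ-style shimmer figure.
    """
    x = np.asarray(values, dtype=float)
    if window % 2 == 0 or len(x) < window:
        raise ValueError("window must be odd and no longer than the data")
    half = window // 2
    # Average of each run of `window` consecutive cycles.
    local_mean = np.convolve(x, np.ones(window) / window, mode="valid")
    # The cycles sitting at the centre of each of those runs.
    centre = x[half:len(x) - half]
    return 100.0 * np.mean(np.abs(centre - local_mean)) / np.mean(x)

# Hypothetical cycle durations (ms) and peak amplitudes for seven cycles.
periods_ms = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.2]
peaks = [0.80, 0.78, 0.83, 0.79, 0.81, 0.80, 0.77]
print("jitter (PPQ-style):", perturbation_quotient(periods_ms))
print("shimmer (APQ-style):", perturbation_quotient(peaks))
```

A perfectly regular voice gives 0% on both measures, because every cycle then equals its local average.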
    Measures of breathiness

    While differences in the duration or the strength of glottal cycles can be reasonably well assessed by measurements of durational or amplitude variability, it is harder to assess the amount of turbulent noise energy added to the signal during phonation. Such turbulence is commonly caused by inadequate or incomplete vocal fold adduction, such that air leaks through the remaining gap, becoming turbulent in the process. This gives rise to a perceived "breathiness" in the voice.

    • The Harmonic to Noise Ratio (HNR) looks at the waveform shape in adjacent cycles and measures how similar they are. If the jitter is low, but waveform cycles are different to each other, then this is likely to be due to added breathiness.
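One simple way to turn cycle-to-cycle similarity into a number is to correlate the signal with a copy of itself delayed by one period, and convert the correlation r into a ratio r / (1 - r) in decibels. The sketch below uses this autocorrelation-style estimate as a stand-in; it is not necessarily how the lab software computes HNR. The function name `hnr_db`, the fixed known period and the synthetic test signals are assumptions made for the example.

```python
import numpy as np

def hnr_db(signal, period_samples):
    """Autocorrelation-style harmonic-to-noise estimate in dB.

    Correlates the signal with itself delayed by one period. A correlation
    r close to 1 means adjacent cycles have nearly the same shape.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    a, b = x[:-period_samples], x[period_samples:]
    r = np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b))
    r = min(max(r, 1e-6), 1.0 - 1e-6)   # keep the logarithm finite
    return 10.0 * np.log10(r / (1.0 - r))

# Synthetic test: a clean periodic signal, then the same signal with added
# noise standing in for turbulence (both are assumptions, not speech data).
rng = np.random.default_rng(0)
n = np.arange(1600)
clean = np.sin(2 * np.pi * n / 80)
noisy = clean + 0.3 * rng.standard_normal(len(n))
print("clean:", hnr_db(clean, 80), "dB; noisy:", hnr_db(noisy, 80), "dB")
```

Because the delay is fixed at one period, jitter would also lower this figure, which is why the measure is most informative about breathiness when jitter is low.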
    Measures of effectiveness

    As well as regularity and breathiness, a third characteristic of voice quality is how effectively the voice carries the linguistic information from speaker to hearer. Voicing has to be loud enough and contain a sufficiently wide range of frequencies to encode speech sounds. A weak or quiet voice typically has little high-frequency energy in its spectrum.

    • The Soft Phonation Index (SPI) assesses effectiveness through the average ratio of energy of the speech signal in the low frequency band (70-1600 Hz) to the high frequency band (1600-4500 Hz). A larger number means that energy is concentrated in the low frequencies, indicating a softer voice.
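The band-energy comparison behind SPI can be sketched with a single FFT. The example below is a simplification that sums all spectral energy in each band over the whole signal, whereas a clinical implementation may average over frames or use harmonic energy only. The function name `spi_like_ratio` and the two-tone test signals are assumptions for illustration.

```python
import numpy as np

def spi_like_ratio(signal, fs, low_band=(70.0, 1600.0), high_band=(1600.0, 4500.0)):
    """Ratio of low-band to high-band spectral energy (a simplified SPI)."""
    x = np.asarray(signal, dtype=float)
    # A Hann window reduces leakage between the two bands.
    power = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    low = power[(freqs >= low_band[0]) & (freqs < low_band[1])].sum()
    high = power[(freqs >= high_band[0]) & (freqs < high_band[1])].sum()
    return low / high

# Two synthetic "voices": the same 200 Hz component, with a 3000 Hz
# component that is weaker in the softer one.
fs = 16000
t = np.arange(fs) / fs
strong = np.sin(2 * np.pi * 200 * t) + 0.10 * np.sin(2 * np.pi * 3000 * t)
soft = np.sin(2 * np.pi * 200 * t) + 0.01 * np.sin(2 * np.pi * 3000 * t)
print("strong voice:", spi_like_ratio(strong, fs))
print("soft voice:  ", spi_like_ratio(soft, fs))
```

The softer signal gives the larger ratio, matching the interpretation that a higher index indicates energy concentrated in the low frequencies.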




Laboratory Activities

The activities will comprise:

  1. Whole-class analysis of voice quality for a modal voiced /ɑː/ vowel. We will look at how measurements of mean fundamental frequency, jitter, shimmer, HNR and SPI vary across the class, and in particular with sex and height.
  2. Individual analysis of breathy and creaky voice. Using the lab computers, record and analyse examples of breathy and creaky voice and compare these to a modal voice recording. Can you interpret the changes in the observed measurements?
  3. Whole-class demonstration of the Laryngograph. The Laryngograph is an instrument for the non-invasive measurement of vocal fold contact area during speech. The output of the Laryngograph for different voice qualities will be demonstrated.

This laboratory session will also be the basis for the practice laboratory report. Further details are on the Moodle site.

Research paper of the week

Phonetics of the Danish stød

In this article, Fischer-Jorgensen provides a comprehensive account of the Danish stød.

The stød is a prosodic feature bound to definite syllables in certain word types and connected to the latter part of the syllable. Phonetically it is a phonation type related to creaky voice. The first part of the syllable is characterized acoustically by a higher pitch level and often a higher intensity level than syllables without stød, and by a relatively high subglottal pressure and airflow, thus generally by a relatively high expenditure of energy. In the second part, the stød phase proper, there is a considerable decrease in intensity, particularly in the lower part of the spectrum and, for the majority of the speakers, a noticeable decrease in fundamental frequency, and/or aperiodicity. [...] On the boundary between the first and the second phase most speakers have a strong contraction of the vocalis and lateralis muscles, obviously preparing for the glottal constriction of the second phase. [...] It is suggested that the stød in Danish perhaps originated from a reinforcement of the first syllable in combination with reduction and loss of a following syllable in Common Scandinavian. The reinforcement may have been accompanied by a rise in pitch, so that developments in different directions (involving stød or tonal accents) were possible.

A remarkable feature of this article is the number of different experimental techniques Fischer-Jorgensen exploited to analyze the characteristics of the stød. You will find analyses based on: waveforms, pitch tracks, intensity curves, duration, spectral structure, glottal waveform shape, airflow, sub-glottal and pharyngeal pressure, fibreoptic imaging of the vocal folds, and electro-myographic measurements of innervation of the laryngeal muscles. She also finds time to discuss the historical origins of the stød in the evolution of Danish from earlier Scandinavian languages.

The article is too long for a detailed read, but you might dip into the introduction and discussion to get an idea of what a comprehensive account of a phonetic phenomenon looks like.

Application of the Week


This week's application of phonetics is the assessment of disorders of the voice, known as dysphonia.

What is dysphonia?

Dysphonia is the medical term for disorders of the voice: an impairment in the ability to produce voice sounds using the vocal organs (it is distinct from dysarthria which signifies dysfunction in the muscles needed to produce speech). Thus, dysphonia is a phonation disorder. The dysphonic voice can be hoarse or excessively breathy, harsh, or rough, but some kind of phonation is still possible (contrasted with the more severe aphonia where phonation is impossible).

Dysphonia has either organic or functional causes due to impairment of any one of the vocal organs. However, typically it is caused by some kind of interruption of the ability of the vocal folds to vibrate normally during exhalation. Thus, it is most often observed in the production of vowel sounds. For example, during typical normal phonation, the vocal folds come together to vibrate in a simple open/closed cycle modulating the airflow from the lungs. Weakness (paresis) of one side of the larynx can prevent simple cyclic vibration and lead to irregular movement in one or both sides of the glottis. This irregular motion is heard as roughness.

[source: Wikipedia]

Causes of dysphonia

[source: NetDoctor]

Assessment of dysphonia

Assessment of dysphonia can be divided into subjective and objective approaches.

In a typical subjective approach, trained listeners assess the voice quality of the patient according to a standard set of perceptual rating scales (see the overview of perceptual evaluation of voice). A typical clinical rating scale is the Buffalo Voice Profile.

In a typical objective approach, a sample of speech (typically a long /ɑː/ vowel) is input to a signal analysis workstation, and a large number of signal-based measures of pitch, regularity, breathiness and effectiveness are calculated. The results can be compared to a database of analyses of normal voices, and/or against a set of other recordings of the patient. One clinical tool for voice analysis is called the Multi-Dimensional Voice Program (MDVP).

The relative effectiveness and reliability of subjective and objective approaches to assessment is still debated.

Language of the Week

This week's language is Danish, as spoken in the greater Copenhagen area [ Source material ].

The superscript ʔ symbols mark examples of the stød, a kind of creaky-voiced laryngeal gesture that is used to create a phonetic contrast. This video demonstrates some minimal pairs, with and without the stød, at around 1:40. Can you hear the difference? (video in Danish!)


You can improve your learning by reflecting on your understanding. Come to the tutorial prepared to discuss the items below.

  1. In the transcription of the Danish passage, what does each of the diacritics mean?
  2. The article by Fischer-Jorgensen suggests a relationship between lexical tone and voice quality. Suggest one reason why there may be a connection.
  3. What is the Bernoulli effect? Why is it important in voice production?
  4. How do you change the loudness of your voice? Is shouting only a change in loudness?
  5. What anatomical structures are involved in changing the pitch of your voice?
  6. Summarise the differences in larynx settings between modal and breathy phonation, and between modal and creaky phonation.
  7. What is the difference between voiceless /h/ and a whispered vowel, if any?
  8. Can whispered speech have intonation? How?
  9. What is meant by 'losing one's voice'?
  10. Why do boys' voices 'break'?

Last modified: 09:57 30-Oct-2016.