PALS1004 Introduction to Speech Science
UCL Division of Psychology and Language Sciences

4. Voice

Learning Objectives

At the end of this topic the student should be able to:


  1. Use of Voice

    Phonation in the larynx acts as a sound source:

    • Modal phonation (normal voicing) is the main sound source for speech production in all spoken languages.

    Phonation and other laryngeal gestures can have linguistic functions:

    • Presence vs absence of phonation, e.g. voiced vs voiceless fricatives.
    • Sudden stop in phonation, e.g. glottal stop.
    • Relative timing of phonation start, e.g. Voice Onset Time (VOT) in plosives.
    • Fundamental frequency change within words, e.g. lexical tone.
    • Fundamental frequency change across utterances, e.g. intonation.
    • Voice quality changes contribute to turn-taking in dialogue.

    Voice also communicates the mood, mental state and health of the speaker:

    • Voice quality varies with speaking style, often depending on the context of the communication, e.g. a friendly conversation, a lecture or a complaint.
    • Voice quality changes occur as a consequence of physiological or psychological changes such as mental strain, tiredness, emotional state or health.

    Voice has a role to play in creating an identity for the speaker:

    • Voice pitch and quality are important features used by listeners to identify a speaker.
    • Some researchers have shown that listeners make judgements of personality based on the quality of a person's voice.

  2. Larynx Anatomy

    The larynx sits in the airway between the trachea and the pharynx; you can see evidence of it as the Adam's apple in the neck. At the base of the larynx is the cricoid cartilage. Above and attached to the cricoid are the thyroid cartilage and a pair of arytenoid cartilages. Through ligaments and muscles, the thyroid can rock back and forth against the cricoid and the arytenoids can be made to swivel.

    The thyroid cartilage surrounds and supports the vocal folds, which are two bands of muscular tissue joined together at the front, where they attach to the thyroid cartilage, and separated at the back, where they attach to processes on the arytenoid cartilages. Through muscular control, the arytenoids can be swivelled to draw the vocal folds together across the top of the trachea, thereby closing off the air passageway from the lungs. The vocal folds can be changed in length and tension by movements of the arytenoid and thyroid cartilages, and the tension can also be varied by contracting the thyroarytenoid muscles (sometimes called 'vocalis' muscles) that lie inside the folds. The gap between the vocal folds is called the glottis. The ventricular folds or 'false vocal folds' are fleshy structures above the vocal folds which do not normally take part in phonation.

  3. Phonation Cycle

    In the normal phonation cycle, the vocal folds are first approximated so that they cover the airway. They are tensed to some degree, which sets the vibration frequency towards the centre of the speaker's range. Air from the lungs forces the folds apart and air flow builds up between them. Two forces pull the folds back to their central position: the natural elasticity of the folds themselves and the Bernoulli effect, which causes a reduction in pressure inside a constricted fluid flow. As these forces pull the folds together, the air flow increases in velocity, which increases the reduction in pressure caused by the Bernoulli effect. Eventually the folds "snap" together, cutting off the flow. This "snap" causes a sudden reduction in pressure immediately above the folds, and it is this reduction which is the main source of energy for vocal tract excitation. Once the folds have closed, air from the lungs forces them apart again and the cycle repeats.
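
    The pressure reduction described above can be illustrated numerically. The sketch below applies Bernoulli's equation for steady, incompressible, lossless flow, which is only a rough idealisation of air flow through the glottis. The function name, the air density and the flow speeds are illustrative assumptions, not measurements.

```python
AIR_DENSITY = 1.2  # kg/m^3, approximate density of air (assumed value)


def bernoulli_pressure_drop(v_wide, v_narrow, density=AIR_DENSITY):
    """Pressure drop in Pa between a wide section and a constriction.

    Follows from p + 0.5 * density * v**2 being constant along the flow.
    """
    return 0.5 * density * (v_narrow**2 - v_wide**2)


# Hypothetical speeds: slow flow in the trachea, faster flow in the glottis.
for v_glottis in (10.0, 20.0, 40.0):
    drop = bernoulli_pressure_drop(1.0, v_glottis)
    print(f"{v_glottis:5.1f} m/s in the constriction -> drop of {drop:7.1f} Pa")
```

    Because the drop grows with the square of the flow speed, the pull on the folds strengthens as the gap narrows and the flow speeds up.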

    Changes in the longitudinal tension of the vocal folds, brought about by movements of the arytenoid and thyroid cartilages that change the length of the folds, cause changes in the repetition frequency of the phonation cycle and hence in the pitch of the voice.

    This video clip shows vocal fold vibration in slow motion using a stroboscope. Notice the processes of adduction/abduction which bring the vocal folds together for phonation and apart for breathing. Notice how the change in length of the vocal folds causes changes in the repetition frequency of vocal fold vibration.

  4. Phonation Types

    In everyday or modal phonation, the vocal folds are fully approximated (adducted) and have moderate longitudinal tension. Phonation is regular and of moderate pitch, with rapid closures and a long closed phase.

    We can contrast modal phonation with other phonation types ('voice qualities'):

    • In whisper phonation, the folds are tensed and rigid, but held slightly apart. The rigidity prevents the folds from vibrating, while the partially opened glottis forms a narrow opening which causes turbulence.
    • In breathy phonation, the folds are tensed appropriately for vibration but not fully approximated (partially adducted) so that complete closures do not occur. This has a number of consequences: first that air flow continues throughout the cycle which can lead to turbulence at the glottis, second that the closures are less sharp, and third that the vocal folds remain open for a longer portion of the cycle.
    • In creaky phonation, in contrast, the vocal folds are lax but tightly approximated (fully adducted), and this can lead to cycles which are closed for a longer proportion of the cycle and which are irregular in duration. Creaky voice is commonly found at the bottom of a speaker’s pitch range, when the folds are slack anyway. A common form of creaky voice is called diplophonia, where long and short cycles alternate.
    • In falsetto phonation, the vocal folds are extremely tense and are held in such a way that only the internal edges of the vocal folds are able to vibrate. This means that the vibration is small in amplitude and high in fundamental frequency.

  5. Voice as Sound

    The voice spectrum is dominated by the lower frequencies and tails off rapidly with increasing frequency. Significant voice energy can often be found up to 5000 Hz. Here is a spectrum of modal phonation:

    The spectrum of modal phonation is similar to that of a sawtooth waveform (shown below).

    The sound generated by phonation passes along the vocal tract tube and is modified according to the frequency response of the tube to emerge as speech sounds. Note that the tube changes the relative amplitudes of frequency components of the sound and hence its timbre but has no effect on its repetition frequency and hence no effect on its pitch. We will look at this acoustic process in more detail in Lecture 5.
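
    The comparison with a sawtooth waveform can be checked numerically. The sketch below synthesises an ideal sawtooth (not a real voice signal) and prints the level of some of its harmonics. The sampling rate and repetition frequency are assumed values chosen so that each harmonic falls exactly on one FFT bin.

```python
import numpy as np

fs = 16000   # sampling rate in Hz (assumed)
f0 = 100     # repetition frequency in Hz (assumed)
n = fs       # one second of signal, so FFT bin k corresponds to k Hz

# Integer arithmetic keeps every cycle exactly fs / f0 samples long.
phase = (np.arange(n) * f0 % fs) / fs
sawtooth = 2.0 * phase - 1.0          # ramps from -1 towards 1, f0 times a second

spectrum = np.abs(np.fft.rfft(sawtooth)) / (n / 2)   # amplitude of each component


def harmonic_level_db(k):
    """Level of the k-th harmonic relative to the first harmonic, in dB."""
    return 20 * np.log10(spectrum[k * f0] / spectrum[f0])


for k in (1, 2, 4, 8, 16):
    print(f"harmonic {k:2d} at {k * f0:4d} Hz: {harmonic_level_db(k):6.1f} dB")
```

    The energy sits only at multiples of the repetition frequency, and the level falls by roughly 6 dB for each doubling of frequency. A filter such as the vocal tract changes these levels but leaves the spacing of the harmonics, and hence the pitch, unchanged.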

  6. Measurement

    Pitch epoch marking

    To estimate the regularity of vocal fold vibration it is useful to first delimit individual vocal fold cycles found in the speech signal. From these events, sometimes called "pitch epochs", it is possible to measure how adjacent cycles differ from one another.
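
    As a rough illustration of epoch marking, the sketch below picks one prominent peak per cycle in a synthetic signal. This simple peak picker is an assumption made for illustration and is much cruder than the methods used in voice analysis software. The function name `mark_epochs`, its parameters and the test signal are all hypothetical.

```python
import numpy as np


def mark_epochs(signal, fs, max_f0=400.0, threshold=0.5):
    """Return sample indices of one prominent positive peak per cycle.

    A peak counts if it is a local maximum at or above `threshold` times
    the largest sample value. After each peak the search skips ahead by
    the shortest plausible period (fs / max_f0 samples).
    """
    min_gap = max(1, int(fs / max_f0))
    limit = threshold * np.max(signal)
    epochs = []
    i = 1
    while i < len(signal) - 1:
        is_peak = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if is_peak and signal[i] >= limit:
            epochs.append(i)
            i += min_gap
        else:
            i += 1
    return np.array(epochs)


# Synthetic test signal: a damped 500 Hz resonance excited every 160 samples,
# that is, 100 times a second at a 16 kHz sampling rate.
fs = 16000
period = 160
n = fs // 2
impulses = np.zeros(n)
impulses[::period] = 1.0
t = np.arange(3 * period) / fs
response = np.exp(-400.0 * t) * np.sin(2 * np.pi * 500.0 * t)
signal = np.convolve(impulses, response)[:n]

epochs = mark_epochs(signal, fs)
print("first epochs (samples):", epochs[:5])
print("cycle lengths (samples):", np.diff(epochs)[:5])
```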

    Measures of average pitch

    The pitch of a vowel is measured in terms of its fundamental frequency, which is simply the number of glottal cycles that occur per second. To measure the average fundamental frequency (called F0 or Fx) over some interval, we simply count the number of cycles and divide by the duration of the interval.
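
    The counting procedure above can be sketched in a few lines. The function name `mean_f0` and the epoch times are hypothetical. Note that N epochs delimit only N - 1 complete cycles.

```python
import numpy as np


def mean_f0(epoch_times):
    """Average fundamental frequency in Hz from epoch times in seconds."""
    epoch_times = np.asarray(epoch_times, dtype=float)
    n_cycles = len(epoch_times) - 1                 # N epochs bound N - 1 cycles
    duration = epoch_times[-1] - epoch_times[0]     # time spanned by those cycles
    return n_cycles / duration


# Hypothetical epochs spaced 8 ms apart: 10 cycles in 80 ms.
epoch_times = np.arange(11) * 0.008
print(f"mean F0 = {mean_f0(epoch_times):.1f} Hz")
```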

    Measures of regularity

    There are many possible statistics of voicing irregularity that can be estimated from the delimited glottal cycles. Two common measures are called jitter (period perturbations) and shimmer (amplitude perturbations).

    • Jitter measures the regularity of the pitch epochs - are they spaced equally in time? A common jitter measure is the Period Perturbation Quotient (PPQ), which is the relative percentage variation in glottal cycle duration. It is calculated as the normalised absolute difference between the duration of one cycle and the average cycle duration in a window of 5 cycles centred on the cycle.
    • Shimmer measures the regularity of the size of the speech signal across pitch epochs - are they equal in amplitude? A common shimmer measure is the Amplitude Perturbation Quotient (APQ), which is the relative percentage variation in speech amplitude from cycle to cycle. It is calculated as the normalised absolute difference between the peak amplitude of one cycle and the average peak amplitude in a window of 5 cycles centred on the cycle.
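
    The two perturbation quotients described above share the same calculation, applied to cycle durations for PPQ and to peak amplitudes for APQ. The sketch below is one reading of that description: the absolute deviations are averaged over all cycles that have a full 5-cycle window and are normalised by the overall mean. Analysis packages may differ in such details, and the function name and data are hypothetical.

```python
import numpy as np


def perturbation_quotient(values, window=5):
    """Mean absolute deviation of each value from the mean of the window
    centred on it, as a percentage of the overall mean value."""
    values = np.asarray(values, dtype=float)
    if len(values) < window:
        raise ValueError("need at least one full window of cycles")
    half = window // 2
    deviations = [
        abs(values[i] - values[i - half:i + half + 1].mean())
        for i in range(half, len(values) - half)
    ]
    return 100.0 * np.mean(deviations) / values.mean()


# Hypothetical cycle durations in ms: perfectly regular, then alternating
# long and short cycles as in diplophonia.
regular_periods = [10.0] * 10
alternating_periods = [9.0, 11.0] * 5
print(f"PPQ regular:     {perturbation_quotient(regular_periods):.2f} %")
print(f"PPQ alternating: {perturbation_quotient(alternating_periods):.2f} %")

# Hypothetical peak amplitudes of successive cycles, for shimmer.
peak_amplitudes = [0.80, 0.84, 0.79, 0.83, 0.81, 0.78, 0.82, 0.80]
print(f"APQ:             {perturbation_quotient(peak_amplitudes):.2f} %")
```
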

    Measures of breathiness

    While differences in the duration or the strength of glottal cycles can be reasonably well assessed by measurements of durational or amplitude variability, it is harder to assess the amount of turbulent noise energy added to the signal during phonation. Such turbulence is commonly caused by inadequate or incomplete vocal fold adduction, such that air leaks through the remaining gap, becoming turbulent in the process. This gives rise to a perceived "breathiness" in the voice.

    The Harmonic to Noise Ratio (HNR) looks at the waveform shape in adjacent cycles and measures how similar they are. If the jitter is low, but waveform cycles are different to each other, then this is likely to be due to added breathiness.
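
    One way to turn the similarity of adjacent cycles into a ratio in decibels is sketched below. It correlates each cycle with the next and treats the average correlation r as the harmonic fraction of the energy, giving HNR = 10 log10(r / (1 - r)). This is a simplified formulation chosen for illustration and is not necessarily the calculation used by any particular analysis package. The function name and the synthetic signals are hypothetical.

```python
import numpy as np


def hnr_db(signal, epochs):
    """Estimate the harmonic to noise ratio in dB from adjacent cycles.

    `epochs` holds the sample index at which each cycle starts.
    """
    correlations = []
    for start, mid, end in zip(epochs[:-2], epochs[1:-1], epochs[2:]):
        first, second = signal[start:mid], signal[mid:end]
        n = min(len(first), len(second))        # compare equal-length stretches
        first, second = first[:n], second[:n]
        energy = np.sqrt(np.dot(first, first) * np.dot(second, second))
        correlations.append(np.dot(first, second) / energy)
    # Clip so that identical cycles do not cause a division by zero.
    r = np.clip(np.mean(correlations), 1e-12, 1.0 - 1e-12)
    return 10.0 * np.log10(r / (1.0 - r))


# Synthetic example: a 100 Hz sinusoid with increasing amounts of added noise.
fs, period, n = 16000, 160, 8000
harmonic = np.sin(2 * np.pi * np.arange(n) / period)
noise = np.random.default_rng(0).standard_normal(n)
epochs = np.arange(0, n, period)

for noise_level in (0.05, 0.1, 0.3):
    estimate = hnr_db(harmonic + noise_level * noise, epochs)
    print(f"noise level {noise_level:4.2f}: HNR about {estimate:5.1f} dB")
```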

    Measures of effectiveness

    As well as regularity and breathiness, a third characteristic of voice quality is how effectively the voice carries linguistic information from speaker to hearer. Voicing has to be loud enough and contain a sufficiently wide range of frequencies to encode speech sounds. A weak or quiet voice typically has little high-frequency energy in its spectrum.

    The Soft Phonation Index (SPI) assesses effectiveness through the average ratio of energy of the speech signal in the low frequency band (70-1600 Hz) to the high frequency band (1600-4500 Hz). A larger number means that energy is concentrated in the low frequencies, giving a soft voice.
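
    The band energy ratio can be sketched as follows. This simplified version computes a single ratio from the spectrum of the whole signal, rather than an average over analysis frames, and the function name and test signals are hypothetical.

```python
import numpy as np


def soft_phonation_index(signal, fs, low_band=(70.0, 1600.0), high_band=(1600.0, 4500.0)):
    """Ratio of signal energy in the low band to energy in the high band."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    low = power[(freqs >= low_band[0]) & (freqs < low_band[1])].sum()
    high = power[(freqs >= high_band[0]) & (freqs < high_band[1])].sum()
    return low / high


# Two synthetic "voices": the same 200 Hz component, with a strong or a
# weak 3000 Hz component standing in for high-frequency energy.
fs = 16000
t = np.arange(fs) / fs
low_part = np.sin(2 * np.pi * 200 * t)
high_part = np.sin(2 * np.pi * 3000 * t)

spi_bright = soft_phonation_index(low_part + 0.5 * high_part, fs)
spi_soft = soft_phonation_index(low_part + 0.1 * high_part, fs)
print(f"bright voice: SPI = {spi_bright:.1f}")
print(f"soft voice:   SPI = {spi_soft:.1f}")
```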




Laboratory Activities

In this week's lab session you will take part in the following activities:

  1. An investigation of how the quantitative measures of mean fundamental frequency, jitter, shimmer, HNR and SPI vary across the class for an /ɑː/ vowel produced with modal, breathy and creaky voice qualities.
  2. Whole-class demonstration of the Laryngograph. The Laryngograph is an instrument for the non-invasive measurement of vocal fold contact area during speech. The output of the Laryngograph for different voice qualities will be demonstrated.


You can improve your learning by reflecting on your understanding. Come to the tutorial prepared to discuss the items below.

  1. What anatomical structures are involved in changing the pitch of your voice?
  2. What is the Bernoulli effect? Why is it important in voice?
  3. How do you change the loudness of your voice? Is shouting only a change in loudness?
  4. Summarise the differences in larynx settings between modal and breathy phonation, and between modal and creaky phonation.
  5. What is the difference between voiceless /h/ and a whispered vowel, if any?
  6. Give examples of voices that differ in pitch, in regularity, in breathiness and in effectiveness.
  7. What is meant by 'losing one's voice'?
  8. Why do boys' voices 'break' at puberty?

Last modified: 13:46 26-Jan-2017.