PALS0009 Introduction to Speech Science
UCL Division of Psychology and Language Sciences

8. Prosody

Key Concepts

Learning Objectives

At the end of this topic the student should be able to:


  1. What is Prosody?
  2. The prosody of a sentence refers to the suprasegmental properties of speech, i.e. properties beyond those described by the individual segments, for example: speaking rate, timing, pausing, articulatory quality, voice quality and pitch. The same sentence (i.e. the same segment sequence) can have different communicative meanings depending on the choice of prosody. Punctuation hints at some aspects of prosody, although prosody has more subtleties than punctuation allows.

    It's raining.                          It's raining?
    She dressed and fed the baby.          She dressed, and fed the baby.
    You can have beans, or cabbage, ...    You can have beans or cabbage.

    The prosody of an utterance is often described in terms of three component aspects: rhythm, stress and intonation. Rhythm refers to the timing pattern of individual syllables, stress refers to the prominence of individual syllables, and intonation refers to changes in voice pitch. However, there is much overlap between these aspects: for example, rhythm is affected by which syllables are chosen to be made prominent, and intonation patterns tend to be executed on stressed syllables. Instead of trying to define these more carefully, we'll look at how changes in prosody are used communicatively in terms of: prosodic phrasing, word focus and sentence function.

    Note: some languages also use pitch changes to help differentiate between words; this is called lexical tone. The languages that use lexical tone are called tone languages and examples are Mandarin, Cantonese and the African language Yoruba.

  3. Prosodic Phrasing
  4. When speaking a long sentence, we naturally break it up into word groups, even if there is no indication in punctuation. In this sentence

    Even if he does come he won't be able to stay very long.

    we make a pause after "come" which also divides the sentence into logical sections.

    This "prosodic phrasing" has multiple functions: it helps the speaker plan the upcoming material, it helps the speaker take breaths, and it helps the listener chunk the material into units for interpretation. In some instances, the phrasing can help the listener choose between alternative interpretations:

    My sister | who lives in Edinburgh | has just had twins ||

    My sister who lives in Edinburgh | has just had twins ||

    Here we use a single bar | to mark a word group boundary, and a double bar || to mark an utterance boundary. How would you mark up the following?

    When I got there the bus had left I was furious

    To mark the end of a prosodic phrase, we naturally slow down our speaking rate as we approach the boundary and may insert a short pause. You may also hear changes in pitch (falls or rises) or changes in voice quality (creakiness).

  5. Word Focus
  6. We can draw the attention of the listener to particular words in an utterance by making them more prominent. We can do this by articulating those words more slowly, more carefully, more effortfully or with a change in pitch.

    A typical use for focus is to differentiate to the listener those aspects of the utterance which are "given" (i.e. already known by both parties), and those that are "new". Interestingly, the fact that a word is not made prominent by the speaker also tells the listener something important - it indicates that these are the background facts that the speaker is assuming to be true. For example:

    WHERE is my meeting on Friday?    I know I have a meeting on Friday, but I don’t know where it is taking place.
    Where is my MEETING on Friday?    I know where other things are taking place on Friday but not the meeting.
    Where is my meeting on FRIDAY?    I know where the meetings are taking place on other days, just not the one on Friday.

    Contrastive focus is when the speaker makes parts of a phrase prominent in order to point out a misunderstanding by the listener. For example: "I wanted the RED pen not the BLUE one".

    Focus is not necessarily predictable from text - most sentences can have alternative readings in which different words are made prominent. The choice of which elements are placed in focus depends on the pragmatics of the communication rather than the sentence structure or word meaning, that is, focus is to do with the speaker's intentions within the dialogue.

  7. Intonation
  8. Pitch movements that occur over the domain of a whole prosodic phrase and which are related to the function or meaning of the whole phrase are called "Intonation". The intonation of a phrase provides additional information to the listener about the speaker's intentions, whether for example the speaker is certain about the facts expressed, or is requesting a response from the listener.

    The primary intonational distinction in English is between falling and rising pitch patterns expressed on the last lexical stress in the phrase. This is called the "nuclear accent" or "nuclear tone". A falling nuclear tone indicates to the listener that the phrase is complete or definite:

    1. She lent him her ↘CAR
    2. Would you leave the ↘ROOM
    3. Do be ↘QUIET

    Note that example 2 is grammatically a question, but is spoken as a command. A rising nuclear tone indicates to the listener that the phrase is open-ended or indefinite, usually inviting a response:

    1. She lent him her ↗CAR (really?)
    2. Would you leave the ↗ROOM (polite request)
    3. Do be ↗QUIET (lack of authority)

    Tonal options of rise and fall can be combined to create rising-falling and falling-rising contours in which the rise can cancel or qualify the definiteness of the fall:

    1. She doesn't lend her car to ↘ANYone (falling - definite statement)
    2. She doesn't lend her car to ↗ANYone (rising - querying the fact)
    3. She doesn't lend her car to ↘↗ANYone (falling+rising - qualified statement)

    A context for the last might be: "she only lends her car to close friends".

    We can summarise some common communicative functions and their typical implementation in terms of changes in pitch:

    Function               Communicative task                         Typical intonation pattern                  Example
    statement              convey information                         low falling                                 it's ˎraining.
    binary question        answer yes/no, agree/disagree, true/false  low rising                                  it's ˏraining?
    wh-question            ask for specific information               high falling                                who are ˋyou?
    alternatives-question  choose from list                           rising on first item, falling on last item  ˏred, green and ˎblue.
    exclamation            emphatic statement                         high falling                                it's ˋraining!
    conditional statement  agree but with conditions                  falling-rising                              I ˇwill (but)
    challenge              express certainty                          rising-falling                              I've told you beˆfore.

    Other intonational functions include an indication of attitude ("good morning" is friendlier than "good morning"), and of grammatical structure ("the red planet, as it's known, is fourth from the sun"). The use of pitch can vary across accents, and it can also have a social function, i.e. to indicate membership of a peer group, such as the contemporary use of a high-rising terminal pitch ("uptalk") among young people.

    You can practise listening to and identifying nuclear tones using the On-line Intonation Practice pages.

  9. Laboratory methods
  10. There are a number of laboratory techniques which are useful in the study of prosody. These include:

    • Annotation: It has proven very useful in phonetic research to annotate speech signals such that the location of segments, syllables and phrases may be found automatically. Such labelling of the signal allows for the large-scale analysis of the phonetic form and variation in the realisation of phonological segments, and has been the basis for much experimental phonetics research as well as for technological applications such as speech recognition and speech synthesis.

      Unfortunately, the manual labelling of speech signals with time-aligned annotations is slow, expensive and error-prone. Thus a number of automatic "phoneme alignment" tools are now available which automatically make an alignment between a phonological transcription and the recorded signal. While such tools may not make as good an alignment as human labellers, the fact that they are automatic means that much larger quantities of material can be annotated.
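
      To make the idea of a time-aligned annotation concrete, here is a minimal sketch of how labelled segments might be stored and queried. The `Segment` class, the `mean_duration` helper, the SAMPA-style labels and all the times are made up for illustration; they are not the format used by any particular alignment tool.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One time-aligned label: start and end times are in seconds."""
    start: float
    end: float
    label: str

    @property
    def duration(self):
        return self.end - self.start


def mean_duration(tier, label):
    """Average duration of every segment carrying the given label."""
    durations = [seg.duration for seg in tier if seg.label == label]
    if not durations:
        return None  # label does not occur in this annotation
    return sum(durations) / len(durations)


# Hypothetical alignment of "It's raining" (illustrative values only).
tier = [
    Segment(0.00, 0.10, "sil"),
    Segment(0.10, 0.16, "I"),
    Segment(0.16, 0.21, "t"),
    Segment(0.21, 0.30, "s"),
    Segment(0.30, 0.36, "r"),
    Segment(0.36, 0.50, "eI"),
    Segment(0.50, 0.56, "n"),
    Segment(0.56, 0.64, "I"),
    Segment(0.64, 0.80, "N"),
]

print(mean_duration(tier, "I"))
```

      Once the labels carry times, questions such as "how long is this vowel on average?" reduce to simple searches over the annotation, which is what makes large-scale analysis practical.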

      The Speech Filing System (SFS) tools contain an automatic alignment tool, demonstrated below:

    • Pitch track: methods exist for estimating the fundamental frequency (Fx) from a recorded speech signal. From this we can derive a fundamental frequency contour or "pitch track". A pitch track shows how the pitch of the voice changes through an utterance, which is a key aspect of its intonation. When we look at an Fx contour we can see many features: (i) changes in fundamental frequency that are associated with pitch accents; (ii) the range of Fx used by the speaker; (iii) voiced and voiceless regions; and (iv) regular and irregular phonation.

      To extract parameters from the pitch track it is common practice to first model the shape of the contour. A common strategy is to stylise the changes in pitch with a sequence of simple shapes, e.g. straight lines:

      The stylised contour can now be represented in terms of the height and slope of a set of pitch segments.
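
      As a rough illustration of straight-line stylisation, the sketch below fits one least-squares line to each stretch of a contour and reports its height and slope. It assumes the breakpoints between stretches are already chosen (here by hand) and that unvoiced frames are stored as `None`; the function names and the synthetic contour are hypothetical, and real stylisation tools choose breakpoints automatically.

```python
def fit_line(ts, fs):
    """Least-squares straight line through (t, f) points: (slope, intercept)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_f = sum(fs) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(ts, fs))
    slope = sxy / sxx
    return slope, mean_f - slope * mean_t


def stylise(times, fx, breakpoints):
    """Fit one line per stretch between consecutive breakpoints.

    Returns (t_start, t_end, height_hz, slope_hz_per_s) for each stretch,
    where height is the fitted value at the middle of the stretch.
    """
    segments = []
    for t0, t1 in zip(breakpoints[:-1], breakpoints[1:]):
        # Keep only voiced frames (Fx not None) inside this stretch.
        pts = [(t, f) for t, f in zip(times, fx)
               if t0 <= t <= t1 and f is not None]
        if len(pts) < 2:
            continue  # too few voiced frames to fit a line
        ts, fs = zip(*pts)
        slope, intercept = fit_line(ts, fs)
        mid = 0.5 * (t0 + t1)
        segments.append((t0, t1, slope * mid + intercept, slope))
    return segments


# Synthetic contour: a rise from 100 to 150 Hz, then a fall to 90 Hz.
times = [i / 100 for i in range(101)]
fx = [100 + 100 * t if t <= 0.5 else 150 - 120 * (t - 0.5) for t in times]

for t0, t1, height, slope in stylise(times, fx, [0.0, 0.5, 1.0]):
    print(f"{t0:.2f}-{t1:.2f} s: height {height:.1f} Hz, slope {slope:+.1f} Hz/s")
```

      On this synthetic contour the fit should recover a rising stretch of about +100 Hz/s followed by a falling stretch of about -120 Hz/s, so the whole contour is summarised by two heights and two slopes.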

    • Fundamental frequency statistics: We have seen how a pitch track can be estimated from a speech signal. In week 4 we also saw how individual pitch epochs can be located. Once such measurements have been made, it is then possible to calculate summary statistics of fundamental frequency use. So that such statistics are descriptive of the typical speaking habits of the speaker, it is common practice to analyse a read passage of at least 2 minutes in duration.

      From the analysis of a passage we can calculate such summary statistics as:

      • Distribution of fundamental frequency is a histogram of how much time was spent by the speaker at each pitch level.
      • Mean, median or modal fundamental frequency are measures of the average fundamental frequency (mean=centre of distribution, median=50th percentile, mode=most commonly used).
      • Range of fundamental frequency is a measure of the breadth of the distribution. This can be measured as the standard deviation (if the distribution is bell-shaped) or in terms of the distance between certain percentiles.
      • Percentage regularity is a measure of what percentage of time the speaker was using regular phonation, i.e. for what fraction of time were glottal cycles similar in duration to their neighbours.
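
      The statistics listed above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function names are hypothetical, the 10 Hz histogram bin width, the 5th to 95th percentile range and the 10% tolerance used to call a glottal cycle "regular" are arbitrary choices, and the input values are made up rather than measured.

```python
import statistics


def fx_summary(fx_values, bin_width=10.0):
    """Summary statistics for voiced Fx samples in Hz (one per frame)."""
    values = sorted(fx_values)
    # Histogram: count of frames falling in each bin (keyed by lower edge).
    hist = {}
    for f in values:
        low_edge = bin_width * int(f // bin_width)
        hist[low_edge] = hist.get(low_edge, 0) + 1
    modal_bin = max(hist, key=hist.get)
    # 99 cut points: index 4 is the 5th percentile, index 94 the 95th.
    pct = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "mode": modal_bin + bin_width / 2,  # centre of the fullest bin
        "stdev": statistics.stdev(values),
        "range_5_95": (pct[4], pct[94]),
        "histogram": hist,
    }


def percent_regular(periods, tolerance=0.1):
    """Percentage of time spent in cycles close in duration to the previous one.

    `periods` are successive glottal cycle durations in seconds; a cycle
    counts as regular if it differs from its predecessor by at most
    `tolerance` (as a fraction of the predecessor's duration).
    """
    if len(periods) < 2:
        return 0.0
    regular_time = sum(cur for prev, cur in zip(periods, periods[1:])
                       if abs(cur - prev) <= tolerance * prev)
    return 100.0 * regular_time / sum(periods[1:])


# Made-up values for illustration only.
summary = fx_summary([100, 110, 110, 120, 130])
print(summary["mean"], summary["median"], summary["mode"])
print(percent_regular([0.010, 0.010, 0.010, 0.020, 0.010]))
```

      With a real passage the input would be thousands of frames, not five; percentile-based ranges are often preferred to the standard deviation when the distribution is skewed.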

      These can be seen in the figure below:




Laboratory Activities

In this week's lab class we will look at intonation using your recordings of a passage and a couple of sentences:


You can improve your learning by reflecting on your understanding. Come to the tutorial prepared to discuss the items below.

  1. Why is prosody considered a suprasegmental characteristic of speech?
  2. How can you make a word stand out in a phrase?
  3. What is the difference between "unstressed", "stressed" and "accented" syllables?
  4. What can statistics such as mean, median, mode, and range tell us about fundamental frequency usage?

Last modified: 13:16 04-Mar-2021.