- Prosody is the study of suprasegmental effects
- Prosody is marked to some extent by punctuation in English writing, but punctuation cannot capture every aspect of prosody
- The form of prosody relates to timing, prominence, and pitch
- The functions of prosody relate to phrasing, focus, and sentence function, and sometimes even lexical choice
- Sentence function is denoted primarily through intonation using nuclear tones
- Prosodic functions have notable phonetic characteristics, and experimental methods exist for us to study the phonetic form of prosody
At the end of this topic the student should be able to:
- list the utterance characteristics associated with prosody
- describe the main communicative functions of prosody in English
- identify by listening where significant rises and falls of pitch occur in simple utterances
- explain how pitch changes can convert statements into yes/no questions
- What is Prosody?
The prosody of a sentence refers to the suprasegmental properties of speech, i.e. properties beyond those described by the individual segments, for example: speaking rate, timing, pausing, articulatory quality, voice quality and pitch. The same sentence (i.e. the same segment sequence) can have different communicative meanings depending on the choice of prosody. Punctuation hints at some aspects of prosody, although prosody has more subtleties than punctuation allows.
- It's raining. It's raining?
- Hello. Hello!
- She dressed and fed the baby. She dressed, and fed the baby.
- You can have beans, or cabbage, ... You can have beans or cabbage.
The prosody of an utterance is often described in terms of three component aspects: rhythm, stress and intonation. Rhythm refers to the timing pattern of individual syllables, stress to the prominence of individual syllables, and intonation to changes in voice pitch. However, there is much overlap between these aspects: for example, rhythm is affected by which syllables are chosen to be made prominent, and intonation patterns tend to be executed on stressed syllables. Instead of trying to define these more carefully, we'll look at how changes in prosody are used communicatively in terms of prosodic phrasing, word focus and sentence function.
Note: some languages also use pitch changes to help differentiate between words; this is called lexical tone. Languages that use lexical tone are called tone languages; examples include Mandarin, Cantonese and the African language Yoruba.
- Prosodic Phrasing
When speaking a long sentence, we naturally break it up into word groups, even if there is no indication in punctuation. In this sentence
Even if he does come he won't be able to stay very long.
we pause after "come", which also divides the sentence into logical sections.
This "prosodic phrasing" has multiple functions: it helps the speaker plan the upcoming material, it helps the speaker take breaths, and it helps the listener chunk the material into units for interpretation. In some instances, the phrasing can help the listener choose between alternative interpretations:
My sister | who lives in Edinburgh | has just had twins ||
My sister who lives in Edinburgh | has just had twins ||
Here we use a single bar | to mark a word group boundary, and a double bar || to mark an utterance boundary. How would you mark up this sentence?
When I got there the bus had left I was furious
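The bar notation above can also be read mechanically. The following is a minimal sketch, not part of the course tools, that splits marked-up text into utterances and word groups; the function name `parse_phrasing` and the variable names are hypothetical.

```python
def parse_phrasing(marked_up: str) -> list[list[str]]:
    """Split bar-marked text into utterances, each a list of word groups.

    A double bar "||" ends an utterance; a single bar "|" ends a word group.
    """
    utterances = []
    for utterance in marked_up.split("||"):
        # Split the utterance into word groups and drop empty fragments.
        groups = [group.strip() for group in utterance.split("|")]
        groups = [group for group in groups if group]
        if groups:
            utterances.append(groups)
    return utterances


# The two phrasings of the "My sister" example give different group counts.
three_groups = parse_phrasing("My sister | who lives in Edinburgh | has just had twins ||")
two_groups = parse_phrasing("My sister who lives in Edinburgh | has just had twins ||")
print(len(three_groups[0]), len(two_groups[0]))  # prints: 3 2
```

Comparing the word groups returned for the two versions makes the difference in phrasing explicit, even though the word sequence is identical.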
To mark the end of a prosodic phrase, we naturally slow down our speaking rate as we approach the boundary and may insert a short pause. You may also hear changes in pitch (falls or rises) or changes in voice quality (creakiness).
- Word Focus
We can draw the attention of the listener to particular words in an utterance by making them more prominent. We can do this by articulating those words more slowly, more carefully, more effortfully or with a change in pitch.
A typical use for focus is to distinguish for the listener those aspects of the utterance which are "given" (i.e. already known by both parties) from those that are "new". Interestingly, the fact that a word is not made prominent by the speaker also tells the listener something important: it indicates that the word belongs to the background facts that the speaker is assuming to be true. For example (capitals mark the prominent word):
- WHERE is my meeting on Friday? (I know I have a meeting on Friday, but I don't know where it is taking place.)
- Where is my MEETING on Friday? (I know where other things are taking place on Friday, but not the meeting.)
- Where is my meeting on FRIDAY? (I know where the meetings are taking place on other days, just not the one on Friday.)
Contrastive focus is when the speaker makes parts of a phrase prominent in order to point out a misunderstanding by the listener. For example: "I wanted the RED pen, not the BLUE one".
Focus is not necessarily predictable from text - most sentences can have alternative readings in which different words are made prominent. The choice of which elements are placed in focus depends on the pragmatics of the communication rather than the sentence structure or word meaning, that is, focus is to do with the speaker's intentions within the dialogue.
- Sentence Function
Pitch movements that occur over the domain of a whole prosodic phrase and which are related to the function or meaning of the whole phrase are called "Intonation". The intonation of a phrase provides additional information to the listener about the speaker's intentions, whether for example the speaker is certain about the facts expressed, or is requesting a response from the listener.
The primary intonational distinction in English is between falling and rising pitch patterns expressed on the last lexical stress in the phrase. This is called the "nuclear accent" or "nuclear tone". A falling nuclear tone indicates to the listener that the phrase is complete or definite:
a. She lent him her ↘CAR
b. Would you leave the ↘ROOM
c. Do be ↘QUIET
Note that b. is grammatically a question, but is spoken as a command. A rising nuclear tone indicates to the listener that the phrase is open-ended or indefinite, usually inviting a response:
a. She lent him her ↗CAR (really?)
b. Would you leave the ↗ROOM (polite request)
c. Do be ↗QUIET (lack of authority)
Tonal options of rise and fall can be combined to create rising-falling and falling-rising contours in which the rise can cancel or qualify the definiteness of the fall:
- She doesn't lend her car to ↘ANYone (falling - definite statement)
- She doesn't lend her car to ↗ANYone (rising - querying the fact)
- She doesn't lend her car to ↘↗ANYone (falling+rising - qualified statement)
A context for the last might be: "she only lends her car to close friends".
We can summarise some common communicative functions and their typical implementation in terms of changes in pitch:
| Function | Communicative task | Typical intonation pattern | Example |
|---|---|---|---|
| statement | convey information | low falling | it's ˎraining. |
| binary question | answer yes/no, agree/disagree, true/false | low rising | it's ˏraining? |
| wh-question | ask for specific information | high falling | who are ˋyou? |
| alternatives-question | choose from list | rising on first item, falling on last item | ˏred, green and ˎblue. |
| exclamation | emphatic statement | high falling | it's ˋraining! |
| conditional statement | agree but with conditions | falling-rising | I ˇwill (but) |
| challenge | express certainty | rising-falling | I've told you beˆfore. |
Other intonational functions include an indication of attitude ("good ↘morning" is friendlier than "good ↘morning"), and of grammatical structure ("the red planet, as it's known, is fourth from the sun"). The use of pitch can vary across accents, and it can also have a social function, i.e. to indicate membership of a peer group, such as the contemporary use of a high-rising terminal pitch ("uptalk") among young people.
You can practise listening to and identifying nuclear tones using the On-line Intonation Practice pages.
- Laboratory methods
There are a number of laboratory techniques which are useful in the study of prosody. These include:
- Annotation: It has proven very useful in phonetic research to annotate speech signals so that the location of segments, syllables and phrases may be found automatically. Such labelling of the signal allows for the large-scale analysis of the phonetic form and variation in the realisation of phonological segments, and has been the basis for much experimental phonetics research as well as for technological applications such as speech recognition and speech synthesis.
  Unfortunately the manual labelling of speech signals with time-aligned annotations is slow, expensive and error-prone. Thus a number of automatic "phoneme alignment" tools are now available which automatically make an alignment between a phonological transcription and the recorded signal. While such tools may not make as good an alignment as human labellers, the fact that they are automatic means that much larger quantities of material can be annotated.
  The Speech Filing System (SFS) tools contain an automatic alignment tool, demonstrated below:
- Pitch track: Methods exist for estimating the fundamental frequency from a recorded speech signal. From this we can derive a fundamental frequency contour or "pitch track". A pitch track shows how the pitch of the voice changes through an utterance, which is a key aspect of its intonation. When we look at a fundamental frequency (Fx) contour we can see many features: (i) changes in fundamental frequency that are associated with pitch accents; (ii) the range of Fx used by the speaker; (iii) voiced and voiceless regions; and (iv) regular and irregular phonation.
  To extract parameters from the pitch track it is common practice to first model the shape of the contour. A common strategy is to stylise the changes in pitch with a sequence of simple shapes, e.g. straight lines:
  The stylised contour can now be represented in terms of the height and slope of a set of pitch segments.
- Fundamental frequency statistics: We have seen how a pitch track can be estimated from a speech signal. In week 4 we also saw how individual pitch epochs can be located. Once such measurements have been made, it is then possible to calculate summary statistics of fundamental frequency use. So that such statistics are descriptive of the typical speaking habits of the speaker, it is common practice to analyse a read passage of at least 2 minutes in duration.
  From the analysis of a passage we can calculate such summary statistics as:
  - Distribution of fundamental frequency is a histogram of how much time was spent by the speaker at each pitch level.
  - Mean, median or modal fundamental frequency are measures of the average fundamental frequency (mean=centre of distribution, median=50th percentile, mode=most commonly used).
  - Range of fundamental frequency is a measure of the breadth of the distribution. This can be measured as the standard deviation (if the distribution is bell-shaped) or in terms of the distance between certain percentiles.
  - Percentage regularity is a measure of what percentage of time the speaker was using regular phonation, i.e. for what fraction of time were glottal cycles similar in duration to their neighbours.
  These can be seen in the figure below:
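The summary statistics described in this section (distribution, mean, median, mode, range and percentage regularity) can be computed along the following lines. This is a minimal sketch using only the Python standard library, not the SFS implementation. The function names, the 10 Hz bin width, the choice of the 10th to 90th percentile distance as the range, and the 10% tolerance for calling two glottal cycles "similar" are all assumptions made for illustration. The input is assumed to be one fundamental frequency value per voiced analysis frame, plus a list of glottal cycle durations in seconds.

```python
import statistics
from collections import Counter


def f0_distribution(f0_hz, bin_width=10.0):
    """Histogram: fraction of voiced frames in each bin, keyed by bin start (Hz)."""
    counts = Counter(int(f // bin_width) * bin_width for f in f0_hz)
    total = len(f0_hz)
    return {start: n / total for start, n in sorted(counts.items())}


def f0_summary(f0_hz, bin_width=10.0):
    """Mean, median, mode and two range measures of fundamental frequency."""
    dist = f0_distribution(f0_hz, bin_width)
    modal_bin = max(dist, key=dist.get)  # start of the most used bin
    deciles = statistics.quantiles(f0_hz, n=10, method="inclusive")
    return {
        "mean": statistics.mean(f0_hz),
        "median": statistics.median(f0_hz),
        "mode": modal_bin + bin_width / 2,  # centre of the most used bin
        "stdev": statistics.stdev(f0_hz),  # meaningful if bell-shaped
        "range_10_90": deciles[-1] - deciles[0],  # assumed percentile choice
    }


def percent_regular(periods_s, tolerance=0.1):
    """Percentage of voiced time spent in cycles similar to their neighbours."""
    def similar(a, b):
        return abs(a - b) <= tolerance * max(a, b)

    regular_time = 0.0
    for i, period in enumerate(periods_s):
        # Previous and next cycle, where they exist.
        neighbours = periods_s[max(0, i - 1):i] + periods_s[i + 1:i + 2]
        if neighbours and all(similar(period, q) for q in neighbours):
            regular_time += period
    return 100.0 * regular_time / sum(periods_s)


# Toy values only; a real analysis would use a read passage of 2 minutes or more.
toy_f0 = [100.0, 110.0, 110.0, 120.0]
print(f0_distribution(toy_f0))
print(f0_summary(toy_f0))
print(percent_regular([0.010, 0.010, 0.010, 0.020]))
```

The mode is taken from the histogram rather than from the raw values because measured fundamental frequencies rarely repeat exactly, so the most used bin is a more useful notion of "most commonly used" pitch.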
- N. Hewlett, M. Beck, "Introduction to the Science of Phonetics", Lawrence Erlbaum, 2006, Chapter 9, Fundamental Frequency.
- J.C. Wells, "English Intonation", Cambridge, 2006. [in library].
In this week's lab class we will look at intonation using your recordings of a passage and a couple of sentences:
- Distributional analysis of the pitch of a read passage
- Analysis of the pitch contour of a sentence and a question
- Pitch manipulation of a statement into a question
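As a rough illustration of the last activity, the sketch below edits the numbers of a pitch contour so that a falling nuclear tone becomes a rising one, which is the typical difference between a statement and a yes/no question. It is not the procedure used in the lab software. The function name `fall_to_rise`, the hand-chosen `nucleus_start` frame, the default rise of 6 semitones and the toy contour are all assumptions. The code only changes contour values; producing audio from the modified contour needs a resynthesis step that is not shown.

```python
def fall_to_rise(f0_hz, nucleus_start, rise_semitones=6.0):
    """Return a copy of the contour with a linear rise from the nucleus to the end.

    f0_hz: per-frame fundamental frequency in Hz, with None for unvoiced frames.
    nucleus_start: frame index where the nuclear tone begins (chosen by hand).
    rise_semitones: size of the rise imposed over the nuclear tone (assumed value).
    """
    voiced = [i for i in range(nucleus_start, len(f0_hz)) if f0_hz[i] is not None]
    if not voiced:
        return list(f0_hz)  # nothing to modify

    first, last = voiced[0], voiced[-1]
    start_hz = f0_hz[first]
    end_hz = start_hz * 2 ** (rise_semitones / 12)  # 12 semitones = one octave
    span = max(last - first, 1)

    out = list(f0_hz)
    for i in voiced:
        # Straight-line interpolation in Hz from start_hz up to end_hz.
        out[i] = start_hz + (end_hz - start_hz) * (i - first) / span
    return out


# Toy contour: the pitch falls over the last five frames (a statement-like fall).
statement = [120, 125, 130, None, 128, 120, 110, 100, 90]
question = fall_to_rise(statement, nucleus_start=4, rise_semitones=12)
print(question)
```

Frames before the nucleus and unvoiced frames are left untouched, so only the nuclear tone changes between the two versions.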
You can improve your learning by reflecting on your understanding. Come to the tutorial prepared to discuss the items below.
- Why is prosody considered a suprasegmental characteristic of speech?
- How can you make a word stand out in a phrase?
- What is the difference between "unstressed", "stressed" and "accented" syllables?
- What can statistics such as mean, median, mode, and range tell us about fundamental frequency usage?
Last modified: 13:16 04-Mar-2021.