PALS0009 Introduction to Speech Science

2. Sound and Hearing

Key Concepts

Sounds are tiny, rapid fluctuations in atmospheric pressure that propagate rapidly away from physical vibrations in air.
The subjective dimensions of sound are loudness, pitch and timbre, which are (non-linearly) related to the physical sound properties of amplitude, repetition frequency and spectrum.
Loudness perception broadly follows Weber's law, which implies that perceptual responses are proportional to the logarithm of stimulus size.
Our hearing is extremely sensitive to small sound pressure variations - and may easily be damaged by loud sounds.
Auditory sensations are determined by how sound travels through the ear and how it is converted to nerve firing in the cochlea.

Learning Objectives

At the end of this topic the student should be able to:

explain the difference between objective and subjective descriptions of sound
give a description and an explanation of Weber's Law
describe the three subjective dimensions of sound: loudness, pitch and timbre
explain in general terms how loudness, pitch and timbre are related to the physical sound attributes of amplitude, repetition frequency and spectrum
express knowledge of the structure of the hearing organ and its functional elements

Topics

Psychoacoustics

We can describe sounds in two radically different ways: objectively in terms of physics or subjectively in terms of perception. In physical terms, sounds are fluctuations in air pressure that propagate rapidly in all directions from a vibrating object. In perceptual terms, sounds are sensations that arise in a listener when the fluctuations in air pressure cause neural activity to be sent from the hearing organ to the brain. The study of the relationship between the objective and the subjective character of sound is called psychoacoustics.

Physical character of sound

We live in an environment of high-pressure gas. At sea level, atmospheric pressure is 101,325Pa, equivalent to the weight of roughly 10,000kg on every square metre. Sounds are tiny, rapid fluctuations in that pressure caused by vibrating objects. Vibrational movements disturb the gas local to the object, and the resultant changes in pressure propagate out rapidly in all directions. At normal temperatures and pressures, sound propagates at about 340ms^-1 (the speed of sound).
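As a rough check of the figure quoted above, the mass whose weight presses on one square metre can be estimated by dividing atmospheric pressure by the acceleration due to gravity. The sketch below assumes standard gravity of 9.80665ms^-2 (a value not given in these notes), and the function name is purely illustrative.

```python
# Estimate the mass whose weight on one square metre equals atmospheric pressure.
ATMOSPHERIC_PRESSURE_PA = 101_325.0  # sea-level pressure, from the text
STANDARD_GRAVITY = 9.80665           # m/s^2, assumed standard value


def equivalent_mass_per_square_metre(pressure_pa, gravity=STANDARD_GRAVITY):
    """Mass in kg whose weight on one square metre produces the given pressure."""
    return pressure_pa / gravity


mass_kg = equivalent_mass_per_square_metre(ATMOSPHERIC_PRESSURE_PA)
print(f"{mass_kg:.0f} kg per square metre")
```

The result is a little over 10,000kg, in line with the rounded figure in the text.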

Don't confuse the propagation of sound with the movement of air. Sound is the propagation of a pressure wave and does not result in any net movement of the air. Air movements, such as caused by wind (or breathing), occur at velocities much lower than the speed of sound.

A microphone converts the sound pressure fluctuations into voltage changes; these can be sampled to create a record of how the pressure fluctuations vary with time. A graph of sound pressure against time is called a sound waveform.
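To make the idea of a sampled waveform concrete, here is a minimal sketch that builds a list of (time, pressure) pairs for a simple sinusoidal pressure fluctuation. The amplitude, frequency and sampling rate are arbitrary illustrative values, and the function name is hypothetical; a real recording would contain measured voltages rather than calculated pressures.

```python
import math


def sample_pressure_wave(amplitude_pa, frequency_hz, duration_s, sample_rate_hz):
    """Return (time, pressure) pairs for a sinusoidal pressure fluctuation."""
    n_samples = int(round(duration_s * sample_rate_hz))
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz  # time of this sample in seconds
        p = amplitude_pa * math.sin(2 * math.pi * frequency_hz * t)
        samples.append((t, p))
    return samples


# Illustrative values: a 100Hz vibration of +/-0.1Pa sampled 8000 times per second
waveform = sample_pressure_wave(
    amplitude_pa=0.1, frequency_hz=100.0, duration_s=0.02, sample_rate_hz=8000
)
print(len(waveform), "samples")
```

Plotting the second value of each pair against the first would give the sound waveform described above.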

Perceptual character of sound

It is easy to show that our perception of a steady-state sound can change in at least three independent ways:

Loudness: the sound can change in quantity
Pitch: the sound can change in terms of its melodic aspect
Timbre: the sound can change in quality or colour

These are the three dimensions of sound perception. We can create pairs of sounds that differ only in loudness, only in pitch, or only in timbre.

In addition, sounds can change in time - either slowly such as the onset and offset of a note played on a musical instrument, or rapidly such as vibrato (fluctuations in pitch) or roughness (fluctuations in loudness).

Spoken language uses sequences of sounds differing in loudness, pitch and timbre to encode information.

Loudness

Our sense of loudness is related to the physical size of the pressure fluctuations picked up by our hearing. Roughly speaking we are sensitive to pressure fluctuations between ±20µPa and ±200Pa which occur on top of atmospheric pressure. Fluctuations below 20µPa are inaudible, while fluctuations above 200Pa cause pain. Even 200Pa is a tiny pressure change: it is only about 0.2% of atmospheric pressure.

Thus our hearing has an enormous range of 1:10,000,000 in pressure between quietest and loudest sounds. For this to be possible, the ear maps the pressure levels to loudness levels in a non-linear way.
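The two figures quoted above (the 1:10,000,000 range and the 0.2% of atmospheric pressure) follow directly from the pressure limits given in the text. The short sketch below just reproduces that arithmetic; the variable names are illustrative.

```python
QUIETEST_PA = 20e-6                  # threshold of hearing, from the text
LOUDEST_PA = 200.0                   # threshold of pain, from the text
ATMOSPHERIC_PRESSURE_PA = 101_325.0  # sea-level atmospheric pressure

# Ratio between the largest and smallest audible pressure fluctuations
pressure_range = LOUDEST_PA / QUIETEST_PA

# Size of the loudest fluctuation relative to atmospheric pressure
fraction_of_atmosphere = LOUDEST_PA / ATMOSPHERIC_PRESSURE_PA

print(f"range 1:{pressure_range:,.0f}")
print(f"{fraction_of_atmosphere:.2%} of atmospheric pressure")
```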

A side note on Weber's Law

When psychologists studied perception across all senses they noted a common phenomenon - that our perceptions of external stimuli are non-linearly related to their physical size. A linear mapping would be one in which a doubling in the physical size of the stimulus would cause a doubling of the perceptual response. This does not turn out to be the case, as discovered by Ernst Weber:

Weber found that the just noticeable difference between two weights was approximately proportional to the weights. Thus, if the weight of 105 g can (only just) be distinguished from that of 100 g, the jnd (or differential threshold) is 5 g, or in the SI system, a force or weight of 0.005xg N. If the mass is doubled, the differential threshold also doubles to 10 g, so that 210 g can be distinguished from 200 g. In this example, a weight (any weight) seems to have to increase by 5% for someone to be able to reliably detect the increase, and this minimum required fractional increase (of 5/100 of the original weight) is referred to as the "Weber fraction" for detecting changes in weight. Other discrimination tasks, such as detecting changes in brightness, or in tone height (pure tone frequency), or in the length of a line shown on a screen, may have different Weber fractions, but they all obey Weber's law in that observed values need to change by at least some small but constant proportion of the current value to ensure human observers will reliably be able to detect that change. [ Wikipedia ]

Weber's law states that the size of the just-noticeable difference between two stimuli is proportional to the size of the stimuli themselves. The implication of this is that our perception is actually related to the logarithm of the stimulus size (this is sometimes called Fechner's Law). In the diagram below you can see that a small change in x (stimulus size) causes a large change in log(x) (perceptual response) only when x is small. Equal steps of log(x) are caused by changes of constant proportion in x - for example in the graph below the change in log(x) from x=10 to x=20 is the same as the change from x=50 to x=100.
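The two ideas in this side note can be checked numerically. The sketch below uses the 5% Weber fraction from the weight example and the x values mentioned for the graph; the function names are illustrative only.

```python
import math

WEIGHT_WEBER_FRACTION = 0.05  # 5%, from the weight example in the text


def just_noticeable_difference(stimulus, weber_fraction):
    """Weber's law: the JND is a constant fraction of the stimulus size."""
    return weber_fraction * stimulus


def log_step(x_from, x_to):
    """Change in log10(x) when the stimulus moves from x_from to x_to."""
    return math.log10(x_to) - math.log10(x_from)


jnd_100g = just_noticeable_difference(100, WEIGHT_WEBER_FRACTION)
jnd_200g = just_noticeable_difference(200, WEIGHT_WEBER_FRACTION)

step_10_to_20 = log_step(10, 20)
step_50_to_100 = log_step(50, 100)

print(jnd_100g, jnd_200g)             # JND doubles when the weight doubles
print(step_10_to_20, step_50_to_100)  # both doublings give the same log step
```

Both doublings produce the same step in log(x), which is why a constant proportional change in the stimulus corresponds to a constant change in the perceptual response.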

Why has evolution given us a perceptual system which responds to the logarithm of the stimulus size? Firstly, such a system makes errors which are also proportional to the stimulus size - and thereby of appropriate consequence. Secondly, a logarithmic mapping allows for a wider dynamic range: it provides perceptible values for a wider range of stimulus sizes.

The audio samples below contrast a sequence of noises increasing linearly in pressure (i.e. 1,2,3,4,5) with a sequence increasing in equal logarithmic steps, each double the last (i.e. 1,2,4,8,16). Note how the second set is more "evenly" spaced in loudness.
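One way to see why the second sequence sounds more evenly spaced is to look at the size of each step on a logarithmic scale. The sketch below computes the change in log10(pressure) between successive items of the two sequences above; the function name is illustrative.

```python
import math


def log_steps(pressures):
    """Change in log10(pressure) between each pair of successive values."""
    return [math.log10(b / a) for a, b in zip(pressures, pressures[1:])]


linear_sequence = [1, 2, 3, 4, 5]
doubling_sequence = [1, 2, 4, 8, 16]

linear_steps = log_steps(linear_sequence)
doubling_steps = log_steps(doubling_sequence)

print([round(s, 3) for s in linear_steps])    # steps shrink: 0.301, 0.176, 0.125, 0.097
print([round(s, 3) for s in doubling_steps])  # every step is 0.301
```

If perception follows the logarithm of pressure, the shrinking steps of the linear sequence should be heard as ever smaller changes in loudness, while the doubling sequence should be heard as equal steps.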

Linear:

Logarithmic:

Since our subjective perception of loudness is non-linearly related to pressure, it is also useful to use a non-linear scale for objective measures of sound level. The decibel scale is commonly used for logarithmic mappings of this kind. On the decibel scale, 1dB is a change of about 12% (×1.12), 6dB is approximately a doubling of amplitude (×2), and 20dB is a change by a factor of ten (×10).
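The decibel values quoted above are consistent with the usual definition for amplitude ratios, in which the decibel change is 20 times the base-10 logarithm of the ratio. A minimal sketch of the conversion in both directions (function names are illustrative):

```python
import math


def ratio_to_db(amplitude_ratio):
    """Convert an amplitude (pressure) ratio to a change in decibels."""
    return 20 * math.log10(amplitude_ratio)


def db_to_ratio(db):
    """Convert a change in decibels to an amplitude (pressure) ratio."""
    return 10 ** (db / 20)


for db in (1, 6, 20):
    print(f"{db}dB corresponds to a ratio of about x{db_to_ratio(db):.3f}")
```

Because the scale is logarithmic, multiplying amplitude ratios corresponds to adding decibel values.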

Using the decibel scale we can then say that our hearing extends over a range of 140dB starting at 20µPa. Conveniently a 1dB change is also close to the Just Noticeable Difference (JND) for loudness, so we can also say that we can perceive about 100 noticeably different levels of loudness.
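The 140dB figure can be reproduced by expressing each pressure limit in decibels relative to the 20µPa threshold. The sketch below assumes the amplitude form of the decibel (20 times the base-10 logarithm of the pressure ratio); the function name is illustrative.

```python
import math

REFERENCE_PRESSURE_PA = 20e-6  # quietest audible fluctuation, from the text


def sound_level_db(pressure_pa, reference_pa=REFERENCE_PRESSURE_PA):
    """Level of a pressure fluctuation in dB relative to the reference pressure."""
    return 20 * math.log10(pressure_pa / reference_pa)


quietest_db = sound_level_db(20e-6)  # the reference itself sits at 0dB
loudest_db = sound_level_db(200.0)   # the pain threshold quoted in the text

print(f"range: {loudest_db - quietest_db:.0f}dB")
```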

Pitch

Our sense of pitch is related to the repetition frequency of the sound pressure fluctuations. However, only a subset of sounds have pressure waveforms showing a repeating pattern - these are called periodic sounds - and hence only these have pitch. Sounds that do not have a repeating pattern are called aperiodic sounds (or noise). Here is a periodic waveform (top) and an aperiodic waveform (bottom):

Repetition frequency is measured in terms of hertz (Hz), that is, the number of repetitions per second. The ear responds to repetition frequencies between about 20Hz and 20,000Hz, although listeners vary and there is some debate about whether the perception of high repetition frequencies (above 4000Hz) is best described by the pitch mechanism or the timbre mechanism.

Our sense of pitch allows us to rank periodic sounds of constant loudness on a scale from low to high. In music, the mapping between pitch and repetition frequency is performed using a logarithmic scale. One semitone corresponds to a change of about 6% in repetition frequency, so 12 semitones correspond to a doubling in repetition frequency, or a change of one octave.

In the diagram below, notice how the change in frequency from one note to the next is about ×1.06 (e.g. note 71 to 72 is 523.25/493.88 = 1.059).
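The note numbers and frequencies quoted above can be reproduced with a short calculation. The sketch assumes the common MIDI convention in which note 69 is the A above middle C, tuned to 440Hz; that convention is not stated in these notes, but it matches the values given for notes 71 and 72. The function name is illustrative.

```python
SEMITONE_RATIO = 2 ** (1 / 12)  # twelve equal steps multiply up to one octave (x2)


def note_to_frequency(note_number, reference_note=69, reference_hz=440.0):
    """Repetition frequency in Hz of a note, assuming note 69 = 440Hz."""
    return reference_hz * SEMITONE_RATIO ** (note_number - reference_note)


f71 = note_to_frequency(71)
f72 = note_to_frequency(72)

print(f"semitone ratio: {SEMITONE_RATIO:.4f}")
print(f"note 71: {f71:.2f}Hz, note 72: {f72:.2f}Hz, ratio: {f72 / f71:.4f}")
```

Every pair of adjacent notes has the same frequency ratio, which is what makes the musical scale logarithmic in repetition frequency.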

The JND for pitch is very small, less than 1%. This means that there are at least 1400 noticeably different pitches between 20 and 4000Hz, although there are only 120 notes in the musical scale.
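A rough way to relate a JND to a count of distinguishable pitches is to ask how many successive JND-sized steps fit between the two ends of the range. The sketch below assumes, as a simplification not made in these notes, that the JND is the same fraction of frequency everywhere in the range; the function names are illustrative.

```python
import math


def count_jnd_steps(low_hz, high_hz, jnd_fraction):
    """Number of steps of size jnd_fraction needed to climb from low_hz to high_hz."""
    return math.log(high_hz / low_hz) / math.log(1 + jnd_fraction)


def jnd_fraction_for_steps(low_hz, high_hz, n_steps):
    """Fractional step size that divides the range into n_steps equal ratios."""
    return (high_hz / low_hz) ** (1 / n_steps) - 1


steps_at_one_percent = count_jnd_steps(20, 4000, 0.01)
fraction_for_1400 = jnd_fraction_for_steps(20, 4000, 1400)

print(f"{steps_at_one_percent:.0f} steps with a 1% JND")
print(f"{fraction_for_1400:.2%} JND gives 1400 steps")
```

Under this simplification, a JND of exactly 1% gives roughly 530 steps, and a count of 1400 corresponds to a JND a little under 0.4%.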

Timbre

Sounds not only vary in overall amplitude and repetition frequency, they also vary in terms of the waveform shape. Here are two waveforms of the same size and repetition frequency but different shape:

The change in waveform shape changes the proportions of the different elementary sound frequency components present in the sound. Our ear is able to isolate these different components, and our sense of timbre is primarily related to the amplitudes and frequencies of these components.

One way to understand the analysis performed by the ear is to visualise it as a set of 20 or so energy detectors, each tuned to a range of frequency values. Sound components that fall within the range of acceptable frequencies of a detector will tend to excite it. Since sounds vary in the amplitudes of their components, the detectors will have different outputs for different sounds. If the difference between the detector outputs for two sounds is more than about 1dB, then we hear the sounds as having a different "quality" or timbre.
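The detector picture above can be sketched in a few lines of code. This is only a toy model under stated assumptions: the 20 detectors are given non-overlapping bands spaced logarithmically between 20Hz and 20,000Hz (the notes do not specify the spacing, and the ear's real filters overlap), each sound is described as a hypothetical list of (frequency, amplitude) components, and all names are illustrative.

```python
import math


def band_edges(low_hz, high_hz, n_bands):
    """Logarithmically spaced band edges (an assumed spacing)."""
    ratio = (high_hz / low_hz) ** (1 / n_bands)
    return [low_hz * ratio ** i for i in range(n_bands + 1)]


def detector_levels_db(components, edges, floor=1e-12):
    """Level in dB of each detector for a list of (frequency_hz, amplitude) pairs."""
    energies = [0.0] * (len(edges) - 1)
    for freq, amp in components:
        for i in range(len(energies)):
            if edges[i] <= freq < edges[i + 1]:
                energies[i] += amp ** 2  # energy is proportional to amplitude squared
                break
    # A small floor avoids taking the logarithm of zero for silent detectors
    return [10 * math.log10(e + floor) for e in energies]


def differ_in_timbre(sound_a, sound_b, edges, threshold_db=1.0):
    """True if any detector output differs by more than the threshold."""
    levels_a = detector_levels_db(sound_a, edges)
    levels_b = detector_levels_db(sound_b, edges)
    return any(abs(a - b) > threshold_db for a, b in zip(levels_a, levels_b))


EDGES = band_edges(20.0, 20_000.0, 20)

# Two hypothetical sounds: same component frequencies, different amplitudes
sound_a = [(200.0, 1.0), (400.0, 0.5), (600.0, 0.25)]
sound_b = [(200.0, 1.0), (400.0, 0.1), (600.0, 0.6)]

print(differ_in_timbre(sound_a, sound_b, EDGES))
print(differ_in_timbre(sound_a, sound_a, EDGES))
```

In this toy model the two sounds share the same component frequencies, so they would have the same repetition frequency, yet the detectors covering 400Hz and 600Hz report clearly different levels, which is the situation described above as a difference in timbre.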

The diagrams below show how the amplitude distribution of frequency components present in a sound changes between sounds of two different timbres:

Auditory System

The features of the ear important to the psychology of hearing are:

The pinna is the visible part of the ear. It helps collect and direct sound into the entrance to the ear canal. It also helps locate sounds in space.

The ear canal (external auditory meatus) channels sound toward the ear drum. It also protects the ear drum.
The ear drum (tympanic membrane) converts the sound pressure variations into physical movements. Small changes in air pressure above or below atmospheric pressure cause the membrane to bow inwards or outwards.
The Eustachian tube connects the inner side of the ear drum to the back of the throat. It ensures that the inner side of the membrane is also at atmospheric pressure.
The ossicular chain connects the ear drum to the oval window of the cochlea. It allows for the efficient transfer of energy from the ear drum to the cochlea. It does this by a combination of a leveraging effect and the change in size between ear drum and oval window. The ossicular chain is made up of three bones called the malleus (hammer), incus (anvil) and the stapes (stirrup).
The cochlea contains the organ of hearing. It is a spiral cavity filled with fluid, divided along its length by the cochlear partition. Vibrations entering the oval window cause fluid movement up the spiral along one side of the partition, then down the spiral along the other side of the partition and finally cause movement of the round window.

The basilar membrane forms one part of the cochlear partition, the boundary between the two fluid-filled sides of the cochlea. It is a flexible membrane that is narrow and stiff at the base of the cochlear spiral but wide and flexible at the apex. Different places along the basilar membrane are tuned to vibrate best at different frequencies, with the basal end vibrating best for high frequencies (up to 20,000Hz), and the apical end vibrating best for low frequencies (down to 20Hz).

The organ of Corti sits on the basilar membrane and moves transversely to the fluid flow as the basilar membrane vibrates in response to the sound. The movement of the organ of Corti results in a shearing motion of fine structures called stereocilia that protrude from a type of sensory cell called a hair cell.

Stereocilia are small hair-like structures that bend under the influence of movements of the organ of Corti. When they bend, they allow ions to flow between the surrounding fluid and the inside of the hair cells. This flow causes depolarisation of the cell and ultimately causes a signal to be sent to the brain along the auditory nerve.

The auditory nerve carries signals between the organ of hearing and the cochlear nucleus in the brainstem. From there, nerve connections pass signals on towards the auditory cortex.

Readings/Learning Activities

Essential

This video provides an excellent introduction to the structure and function of the ear. However it does confuse the terms "frequency" and "repetition frequency", so take care.

Background

Web tutorial on Logarithms. Provides an introduction to logarithms.
Web tutorial on Loudness. Provides a more detailed discussion about our sense of loudness.
Web tutorial on Pitch. Provides a more detailed discussion about our sense of pitch.
Hewlett and Beck, Introduction to the Science of Phonetics, Chapter 13 - The Mechanism of Hearing. [in library].

Laboratory Activities

This week's lab session consists of the following activities:

Just Noticeable Differences. This is a listening experiment in which we will measure your sensitivity to small changes in loudness, pitch and timbre. We will also calculate a class average.
Pitch & Repetition Frequency. You will use a program that measures the repetition frequency of sounds and reports the values in hertz and in musical notation.
Sound Localisation. You will hear some sounds recorded using a special binaural recording technique that preserves the features that help localise them in space.

Reflections

You can improve your learning by reflecting on your understanding. Come to the tutorial prepared to discuss the items below.

What is the difference between the loudness of a sound and the intensity of a sound?
If the period of a sound vibration is 1ms, and the speed of sound is 340ms^-1, how big in space is one wave cycle? (This is called the wavelength of the sound)
What is meant by the logarithm of a number?
What are the advantages of perceiving loudness on a logarithmic scale of pressure?
On the decibel scale, an increase in value by a factor of ×2 is equivalent to adding 6dB, and an increase in value by a factor of ×10 is equivalent to adding 20dB. What decibel value is equivalent to an increase of ×4? An increase of ×20? An increase of ×800?
Given a waveform of a periodic sound, how would you measure its repetition frequency?
Draw parallels between the ear's sensitivity to timbre and the eye's sensitivity to colour. What do you think makes up "white noise"?
What is the pinna useful for? What is the Eustachian tube useful for?
How are you able to hear sounds underwater? Why can't you hear people speaking above the surface?
Why might loud sounds damage your hearing?

Last modified: 15:50 07-Jan-2019.