UCL Division of Psychology & Language Sciences

John Wells’s phonetic blog


To see the phonetic symbols in the text, please ensure that you have installed a Unicode font that includes all the IPA symbols, for example Charis SIL (free download).


Browsers: Some versions of Internet Explorer and Safari have bugs that prevent the proper display of certain phonetic symbols. I recommend Firefox (free) or, if you prefer, Opera (also free).



John Wells

Thursday 15 March 2007

Chinese apical vowels

Mandarin Chinese has two very unusual vowels. We touched on them when considering the question of shih tzus (26-27 Jan).

Both of these vowels involve the tip or blade of the tongue, as well as the body, and can therefore be called apical or apicalized vowels. One is found in the syllables that in Hanyu Pinyin are written shi, zhi, chi. The other is found in those written si, zi, ci. (Unlike these, the syllables written xi, ji, qi have an ordinary sort of [i].)

Cheung Kwan-hin has been advising me how best to write them in IPA, for a possible future revised edition of LPD. (The symbols [ʅ,ɿ], used by sinologists, are not recognized by the IPA.) After discussion, we have finally decided to transcribe them as you see here.

If I’ve got it right, the vowel of shi, zhi, chi is basically a retroflex approximant. There is also raising of the central part of the tongue, giving an [ɨ] colour. So we write it with the symbol for a retroflex approximant, [ɻ], with a following superscript [ɨ] to show the close central unrounded coloration.

The vowel of si, zi, ci basically has an alveolar approximant articulation, so we write [z] with the lowering diacritic [˕] as a subscript, and a following superscript [ɯ] to show the accompanying close back unrounded coloration.
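For anyone who wants to type these two transcriptions, here is a minimal Python sketch of how they might be built from Unicode characters. The choice of code points is my assumption, inferred from the verbal description above (the combining form of the lowering diacritic, and the modifier letters for superscript [ɨ] and [ɯ]); it is not necessarily the encoding that will appear in any printed edition.

```python
import unicodedata

# Assumed encodings, built from the description in the text:
# retroflex approximant + superscript barred i
SHI_VOWEL = (
    "\N{LATIN SMALL LETTER TURNED R WITH HOOK}"
    "\N{MODIFIER LETTER SMALL I WITH STROKE}"
)
# z + lowering diacritic (combining, below) + superscript turned m
SI_VOWEL = (
    "z"
    "\N{COMBINING DOWN TACK BELOW}"
    "\N{MODIFIER LETTER SMALL TURNED M}"
)

# List each symbol with the code points it is made of.
for label, symbol in (("shi, zhi, chi", SHI_VOWEL), ("si, zi, ci", SI_VOWEL)):
    print(label, symbol)
    for ch in symbol:
        print(f"  U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Whether the result displays properly still depends on the font, as noted at the top of the page.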

In transcription one often has to choose between detailed phonetic specification and transcriptional convenience. For the latter the Pinyin with its implied phoneme /i/ is adequate. But for my purposes I felt it would be better to err on the side of phonetic explicitness.

Now there was this empress called 慈禧 Cíxī... I wonder if she had a shīzi dog.

Empress Dowager Cixi


Wednesday 14 March 2007

URLs: ASCII only!

I’m ashamed to say that I got caught out by a restriction on WWW filenames that I had not realized existed.

What happened was this. I received from an undergraduate in Iceland an electronic copy of her BA dissertation, the topic of which was Estuary English. She wondered whether I would like to put it on the Estuary English webpage that I look after.

I thought it was quite a good essay, particularly because it includes a phonetically transcribed interview with the footballer David Beckham. So I uploaded it onto the server for the Estuary site and placed a link to it on the EE homepage.

The student’s name is Guðlaug Hilmarsdóttir. The filename she had given her dissertation was very long, and I had decided — as is my custom — to replace it with something shorter. Usually I just take the author’s name, so I named the file guðlaug.pdf. I checked that the link to it worked properly, and thought no more about it.

But then I got a complaint from the student’s supervisor at the University of Iceland, Pétur Knútson, that people were getting ‘404 Error file not found’ when they clicked on the link.

I checked. Everything still worked perfectly well on my browser (Firefox). But then, just to be sure, I tried it on Internet Explorer. Ha! File not found. I tried it on the Opera browser. Same thing. And I realized that I had broken the rules by using the Icelandic letter eth (ð) in the filename. The only letters allowed in URLs are those of the basic ASCII set: no diacritics, no special letters. Speakers of virtually every language other than English are discriminated against.

Firefox is admirably error-tolerant, as I have mentioned before. In this case it tolerates a non-ASCII letter in the URL. The other browsers don’t. Update: Thanks to Kilian Hekhuis for explaining to me that it’s not quite as simple as that. He concludes, “This has nothing to do with FF being error-tolerant and tolerating non-ASCII letters, or IE/Opera being not, but the way the browser converts a non-ASCII letter (UTF-8 or not) and whether the server understands UTF-8.” You can read more.
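A small Python sketch may make Kilian's point concrete. It uses the standard library function urllib.parse.quote to show that the same filename turns into two different ASCII-only URLs, depending on which character encoding is assumed when the eth is converted. Which of these each browser actually sent, and which one my server expected, is something I have not verified; the sketch only illustrates the mechanism.

```python
from urllib.parse import quote, unquote

filename = "guðlaug.pdf"

# The same letter ð becomes different percent-escapes under different encodings.
as_utf8 = quote(filename, encoding="utf-8")      # ð is two bytes in UTF-8
as_latin1 = quote(filename, encoding="latin-1")  # ð is one byte in Latin-1

print(as_utf8)    # gu%C3%B0laug.pdf
print(as_latin1)  # gu%F0laug.pdf

# Decoding with the wrong assumption does not give back the original name,
# which is one way to end up with "file not found".
print(unquote(as_utf8, encoding="utf-8") == filename)    # True
print(unquote(as_latin1, encoding="utf-8") == filename)  # False
```

An all-ASCII filename sidesteps the whole question, because it comes out the same under either assumption.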

Now I know why the Saarbrücken city website is called, not www.saarbrü

I have given Guðlaug’s dissertation a new, all-ASCII, filename.

Seal of the University of Iceland

þ ð

Tuesday 13 March 2007

Txtin in dialect

Here is an anonymous text-message-style posting from the gay sex contacts section of a message board on the web. What do features of the spelling tell you about the geographical or social origins of the writer? (Turnmills is the name of a London night club.)

hi im 19 goin out 2 turnmills 2nit with my str8 curious lookin for a bit of fun 2nit.will b wid m8s so has 2 b discreet so if u r goin send me a message and a face pic den we can exchange numbers nd meeet in turnmills or something.no1 over 24 nd must have a face pic

First, some spellings that don’t tell you anything particular.

txt | standard equivalent
im | I’m
2 | to
2nit | tonight
m8s | mates
str8 | straight (= heterosexual)
b | be
u r | you are
pic | picture
nd | and (weak form!)
no1 | no one

And now some that do.

txt | standard equivalent | notes
wid | with | rather puzzling. This spelling reflects a pronunciation that has both a voiced consonant at the end of with (which rules out the Scots and the Americans) and TH Stopping (typical of the Irish and the West Indians). Cockney/Estuary would tend to have TH Fronting in this word, for which the corresponding spelling would be wiv. So this looks like a young Englishman affecting a Caribbean pronunciation (see blog, 20 Feb). Even if he’s black, it could still count as an affectation, i.e. he’s putting on an act.
den | then | more TH Stopping, but this time in initial position: common in London
neva | never | non-rhotic, therefore not Scots, Irish, or American (unless Black, New Englander, or Southern)
wana | want to | not RP of my generation! I can reduce want to to /ˈwɒntə/ but not to /ˈwɒnə/. However I think someone of a similar background to me but fifty years younger could lose the /t/. So can millions of others.

In place of standard -ing (i.e. /ɪŋ/), the writer uses -in (i.e. /ɪn/), which has strongly working-class connotations, as indeed does TH Stopping. But this doesn’t necessarily mean that he is working-class. It might also readily be used in a txt by a middle-class person who doesn’t want to seem pretentious or who seeks street cred. And the fact that the writer spells discreet correctly, and that there are no actual spelling mistakes, suggests that he’s reasonably well educated. (On the other hand, the fact that he uses no spaces after his full stops and doesn’t punctuate properly suggests he hasn’t been taught how to type or set out his work.)

I think he’s a student, probably English, from London.

I could be quite wrong. We’ll never know.


Monday 12 March 2007

Phonetic symbols in email

A correspondent who wishes to be known as SAF writes “When I type an email in my native language using IPA characters, the recipient of my email tells me the words were all encrypted. How do I type IPA on email so they do not appear encrypted?”

Although the writer did not say where he was from, the fact that his native language uses ‘IPA characters’ suggests that in all probability he is from West Africa, since that is where we most typically find the use in orthography of such letters as ɛ, ɔ, ŋ. And so it proved to be: my correspondent was from Cameroon.

In my reply (with some help from Mark Huckvale) I said “Both you and your correspondent must use a Unicode-compliant email programme. A good one is Thunderbird, available free. Then you must choose to send the message in HTML/Unicode, not as plain text. You must choose a font that contains the characters you want. Exactly the same rules apply to sending messages in a language that does not use the basic Latin script, such as Arabic, Hindi, or Chinese.”

Here you can see a partial screenshot of a sample message (‘welcome to Saarbrücken’ in German) being composed. There are phonetic symbols not only in the body of the text but also in the subject line. I used the UCL Unicode Phonetic Keyboard to input them. (The red underlining shows that the spellchecker does not know what to make of them.)
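For readers who send mail from a script rather than from Thunderbird, the same principle can be sketched with Python's standard email package. The addresses and the wording of the message are placeholders of my own; the point is only that both the body and the subject line are labelled as UTF-8, so that the letters ɛ, ɔ, ŋ survive the journey.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

SUBJECT = "Test: ɛ ɔ ŋ"                      # hypothetical subject line
BODY = "IPA letters in the body: ɛ ɔ ŋ"      # hypothetical message text

msg = EmailMessage()
msg["From"] = "sender@example.org"           # placeholder address
msg["To"] = "recipient@example.org"          # placeholder address
msg["Subject"] = SUBJECT
msg.set_content(BODY)                        # text is declared as charset utf-8

# Serialize as it would travel over the wire, then parse it back
# the way the recipient's mail program would.
raw = msg.as_bytes()
parsed = message_from_bytes(raw, policy=policy.default)

print(msg["Content-Type"])
print(parsed["Subject"])
print(parsed.get_content())
```

The recipient still needs a font that contains the characters, exactly as in the advice above.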




Alongside you see a bidirectional mixed text from my inbox, with the same Arabic word written in three ways. First is a Latin-letter version, which includes a diacritic. Then comes the Arabic-script equivalent. Although the letters were input in the order qāf-’alif-tā’, the software automatically makes them read right-to-left (the circle with the two dots above is the Arabic q, qāf, ق). The software also knows that the ’alif has to be rendered as a ligature with the qāf, not written separately. Lastly comes IPA. The software automatically switches the direction of the letters back to left-to-right.
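How does the software know which way to run? Each Unicode character carries a directional class, and the display routine works from those. Here is a small Python sketch using the standard unicodedata module. The two strings are my own stand-ins, made from the three letters named above and a Latin-letter string with a macron; they are not copied from the message in my inbox.

```python
import unicodedata

# Stored in logical (typing) order: qāf, alif, tā'.
arabic = "\u0642\u0627\u062A"
# A Latin-letter stand-in that includes a diacritic.
latin = "q\u0101t"

arabic_classes = [unicodedata.bidirectional(ch) for ch in arabic]
latin_classes = [unicodedata.bidirectional(ch) for ch in latin]

print(arabic_classes)  # ['AL', 'AL', 'AL']: Arabic letters, displayed right-to-left
print(latin_classes)   # ['L', 'L', 'L']: displayed left-to-right

# The string itself is never reversed in memory; the first character
# is still the qāf, whatever order the letters appear in on screen.
print(unicodedata.name(arabic[0]))
```

The reordering and the joining of the letters happen only at display time, which is why the same stored text can be shown correctly by any program that follows the rules.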

Everything works perfectly. Thunderbird is a delight.

Friday 9 March 2007

Transcribing English diphthongs

Michael Covarrubias writes: “You transcribe the probable American pronunciation [of Chagos] with /oʊ/ in the final syllable. Is there a reason for your preference of a two vowel diphthong in place of /ow/ or the labialized /oʷ/ for less of a falling nucleus? I note that your convention is to use the two vowel transcription. Is there an analysis that relies on (and of course supports) this system?”

Here is an expanded version of my answer:
“I use the two-vowel-symbol notation for English diphthongs because the diphthongs behave as single indivisible units. A vowel-plus-glide notation would imply the identification of the first part of the diphthong with one of the simple (non-diphthong) vowels. If the nucleus of English goat is taken as /Vw/, what is the /V/? It could be (BrE) /ɜː/, the vowel of bird, or /ɔː/ the vowel of thought, or /ɒ/ the vowel of lot, or /ʌ/ the vowel of strut, or /ə/ a schwa. In this context there is no phonemic contrast between these vowels, and no strong reason to choose one solution over the other. By treating the diphthong as indivisible we avoid facing this false choice. (This concerns its phonological analysis. We still have to choose a notation for it in transcription.)

A similar problem arises in deciding the first element of //, of //, and of other diphthongs. Furthermore, in English the distinction between diphthong and long vowel is not always clear, as for example in the vowel of square, which some write /eə/ or /ɛə/, others as /ɛː/.

The notation with a raised w is not a possible IPA notation, since [ʷ] (or [w]) denotes labialization, and you can’t labialize a labial or round an already rounded vowel. And of course the [w] element here is meant by those who use it to indicate something sequential, not a simultaneous secondary articulation (as in IPA).

In other languages, things are different. I have no problem in taking Polish diphthongs as /Vj/ etc., or Spanish diphthongs as /VV/. These are languages in which there is no difficulty in identifying the Vs of such sequences with the simple vowels of the language.”

A goat, uncompressed and indivisible

Thursday 8 March 2007

The Chagos archipelago

The Chagos islanders are in the news because of the threat by the president of Mauritius to leave the Commonwealth in protest at the UK’s ‘barbarous’ treatment of the islanders. They were forced from their homes at gunpoint in the 1960s, as an entire British colony was handed to the US so that a monitoring station could be built in the Indian Ocean. The exiles now live in the slums of Mauritius where they belong neither socially nor economically. They’re campaigning to return to the Chagos islands.

On the Today programme on BBC Radio 4 yesterday two people discussed the issue. One of them pronounced the name of the archipelago /ˈʃɑːɡɒs/, the other /ˈʃæɡɒs/.

This suggests that neither of them had consulted the BBC Pronunciation Unit. Possibly, as experts on the subject, they felt they didn’t need to. Because what the OBGP recommends (‘reflecting the experience of the BBC Pronunciation Unit’) is /ˈtʃɑːɡəʊs/.

In view of the fact that the islands were discovered by Vasco da Gama and presumably named in Portuguese, we should expect the spelling ch to correspond to /ʃ/ rather than /tʃ/ (which is what it would imply in Spanish).

The adjacent islands, Mauritius and the Seychelles, are French-speaking, which would again favour ch = /ʃ/. Until 1965 the Chagos islands were administered as part of Mauritius.

I have an awful feeling that the BBC Pronunciation Unit must have consulted the Americans, current masters of the islands, rather than the islanders themselves, Mauritians, or British colonial administrators. (Update: Catherine Sangster tells me they simply took it from the Duden Aussprachewörterbuch.) Because I can quite see that Americans (who usually know some Spanish) would in all likelihood pronounce the name /ˈtʃɑːɡoʊs/.

The on-line Merriam-Webster, however, gives /ˈtʃɑːɡəs/ with a weak vowel in the final syllable. But it still has that Spanish-style /tʃ/.


coral islet
Photo: Mark D. Spalding

Wednesday 7 March 2007

Abject haplologies

We’ve all heard library pronounced as /ˈlaɪbri/, or probably as /ˈprɒbli/. Some of us know that these are examples of haplology, defined as the omission of a repeated sound or syllable.

I remember at school, when I was in the classical sixth, the teacher wrote φιλόγος on the board instead of φιλόλογος. “Please sir,” I said, “you’ve committed a haplology.” (I was a real smart-arse as a teenager.) The teacher replied sarcastically, “I have made a simple mistake”.

If I’d been even smarter, I’d have known that the term for a written as opposed to a spoken example is a haplography.

Anyhow, I heard a new phonetic haplology the other day. It was from a medical consultant giving a talk on blood pressure. She repeatedly referred to the instrument used to measure blood pressure, the sphygmomanometer, as a /ˌsfɪɡməˈnɒmɪtə/.

A sphygmomanometer

Tuesday 6 March 2007

GOAT compression

OK, this isn’t about the maltreatment of smallholding livestock. No, it’s about the possible weakening and subsequent compression of the vowel we use in the GOAT lexical set (see my Accents of English, pp. 146-147).

In RP and similar accents this is the vowel whose strong version we transcribe /əʊ/.

It used to have a weak version that Daniel Jones transcribed [o] in the first twelve editions of EPD, as in November /noˈvembə/. When Gimson took over the editorship, he replaced this (quite rightly) with plain schwa, thus /nəˈvembə/, and nowadays we reckon that /əʊ/ indeed normally weakens to /ə/: microcosm, biosphere, proˈtest, Yellowstone, possibly also window-sill, tomorrow morning.

But this doesn’t apply prevocalically. How can we weaken the unstressed first vowel of, say, oasis? The answer is, we can’t.

However, there are one or two words in which the GOAT vowel appears to weaken prevocalically to /u/. A good example is tomorrow evening. Weakening to /u/ produces a candidate for compression, since this vowel can be compressed to /w/ before a weak vowel, so losing its syllabicity (blog, 16-17 January). One word in which this can happen is following, where /ˈfɒləʊɪŋ, ˈfɒluɪŋ/ can be compressed to /ˈfɒlwɪŋ/. This also applies in the archaic (thou) followest /ˈfɒlwɪst/, as proved by verse scansion. As usual, hymns show this well. This one was written as late as 1882.

O Love that wilt not let me go,
I rest my weary soul in thee;
I give thee back the life I owe,
That in thine ocean depths its flow
May richer, fuller be.

O light that followest all my way,
I yield my flickering torch to thee;
My heart restores its borrowed ray,
That in thy sunshine’s blaze its day
May brighter, fairer be...

A compressed goat

Monday 5 March 2007

Place names

Driving to Gatwick Airport a few days ago to meet an arriving passenger, I passed through the village of Burgh Heath. As on previous occasions when I have travelled that route, I wondered idly how it’s pronounced. Is the first word /bɜː/ or /ˈbʌrə/?

When I got back home I looked it up in the BBC Pronouncing Dictionary of British Names (G. Pointon, 1990), which says it can be either. Just not /bɜːɡ/.

I further learnt that Burgh in Norfolk is /ˈbʌrə/, but Burgh in adjoining Suffolk is /bɜːɡ/. Things are different in the north of England: Burgh-by-Sands in Cumbria is metathesized to /brʌf/, which must mean that for many locals it’s more like [brʊf].

It’s worse than -ough.

Tomorrow I have to go to Birmingham. To reach my destination the map says I have to look for the road leading to Alcester. Er... what was that? I checked with my brother, who lives not too far away, and he says it’s /ˈɔːlstə/. Then I looked in LPD and found that I agree.

And there’s no call for Americans to feel superior to the wacky British. In the States you never know what will happen with Spanish names. I remember passing through Salida, Colorado. That’s the Spanish for ‘exit’, and it was at the mouth of a canyon, so I thought that in English it would be /səˈliːdə/. But the local radio station announcers, who should know, pronounced it /səˈlaɪdə/.

Even English-derived names can be surprising. I remember driving through Placerville, California, and discovering to my surprise that it was not /ˈpleɪsɚvɪl/ but /ˈplæsɚvɪl/.

Friday 2 March 2007

Train times

“The 'train at platform \/three | ”, says the automated train announcement at the railway station, “is the four/teen | forty-/two | to \Woking.” (In Britain we do use the 24-hour clock for travel announcements.)

What’s wrong with that?

Well, that’s not how a live native speaker of English would actually say those words.

The announcement was obviously based on recordings of individual words or phrases spoken by a human being but concatenated on the fly by a computer.

The announcer or actor being recorded must have gone through all the numbers that were needed, probably using a rising tone until he got to the end of the list.

/One, | /two, | /three, | ... e/leven, | /twelve, | ...

Being a trained speaker, he must have drilled himself to say each numeral as if it were in isolation. Because he seems to have continued

thir/teen, | four/teen, | fif/teen...

where a person counting would ordinarily use contrastive stress among the teen numerals:

/thirteen, | /fourteen, | /fifteen, ...

The result is that the computer putting the message together could draw on recorded instances of

four/teen

forty-/two

but could not apply ‘stress shift’ (perhaps better termed accent shift), in the way that native speakers do when a teen numeral (or any other double-stressed item) is followed by another accented word:

'fourteen forty-'two

The irony is that if he’d let himself follow his instinct and had recorded

/thirteen, | /fourteen, | /fifteen, ...

then everything would have come out fine.

The 'train at platform \/three | is the /fourteen | forty-/two | to \Woking.

But there’d still have been a problem, because the version of each number above eleven suitable for the hours would not have been suitable for the minutes:

(hours) /thirteen, | /fourteen, | ... /twenty-one...
(minutes) thir/teen, | four/teen, | ... twenty-/one...

The only satisfactory solution would be to have made two recordings of each item, and to have built in to the software the equivalent of the stress shift rule. But unfortunately speech engineers tend not to consult phoneticians.
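Such a rule would not be hard to build in. Here is a minimal Python sketch of the idea, on the assumption that each double-stressed numeral has been recorded twice, once with the accent early and once with it late. The clip names, the function names, and my rough test for which numerals count as double-stressed are all hypothetical; times on the hour and minutes below ten are left out of account.

```python
def is_double_stressed(n: int) -> bool:
    """Rough assumption: teens and compounds such as twenty-one have two stresses."""
    return 13 <= n <= 19 or (n > 20 and n % 10 != 0)


def clip_name(n: int, followed_by_accent: bool) -> str:
    """Pick which recording of the numeral n to play (hypothetical clip names)."""
    if not is_double_stressed(n):
        return str(n)  # only one recording is needed
    # Stress shift: the accent moves to the early syllable when another
    # accented word follows; otherwise it stays on the late syllable.
    return f"{n}_early" if followed_by_accent else f"{n}_late"


def time_clips(hour: int, minute: int) -> list[str]:
    """Return the clips for a 24-hour time such as 14:42."""
    # The hour is followed by the accented minutes, so it takes the shifted form.
    # The minutes are not followed by another accent here, so they do not shift.
    return [clip_name(hour, True), clip_name(minute, False)]


print(time_clips(14, 42))  # ['14_early', '42_late']
print(time_clips(10, 15))  # ['10', '15_late']
```

The first result corresponds to /fourteen | forty-/two, which is what a live speaker would say.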

Thursday 1 March 2007


We learn something new every day. My cardiologist has recently started me on a new drug called amiodarone. He pronounces it /ˌæmiˈɒdərəʊn/, and that’s what my GP says, too.

But today I went for a blood test to check that we have correctly estimated the change in the dose of Warfarin, which I am also taking, to allow for the interaction of the two drugs. The phlebotomist calls my new drug /ˌæmiˈəʊdərəʊn/.

So should it be /ɒ/ or /əʊ/? A short o or a long one? Who cares? The spelling’s the same, which is what matters for the pharmacist who has to dispense it.

As with so many learned, scientific or technical words, the spelling is fixed while the pronunciation fluctuates. (Having boned up on heart disease, I am almost inclined to say it fibrillates.) That’s because instead of hearing other speakers and imitating what they say, we often create a pronunciation for ourselves on the basis of the spelling, using the reading rules of English, which are notorious for their uncertainty.

See yesterday’s comment on cervical.
