Computer-coding the IPA: a proposed extension of SAMPA

Summary version, not requiring an IPA character set.

John Wells
Department of Phonetics and Linguistics, University College London

What follows is a proposed keyboard-compatible coding for the entire set of IPA symbols. It covers everything on the 1993 IPA Chart, including diacritics and tone marks, and is put forward as a standard way to transmit IPA-transcribed material by e-mail and for similar purposes. It is an extension of the SAMPA standard, with which colleagues may be familiar. The most frequently used symbols are mapped onto single keystrokes in the ASCII range 33..126. Less frequently used symbols are mapped onto a single keystroke followed by a backslash, \. Diacritics (other than those already catered for in SAMPA) are mapped onto a keystroke with a preceding underscore, _. Thus the voiced velar fricative (gamma) becomes G, the voiced uvular plosive G\, and the velarization diacritic _G (so that, for example, velarized d appears as d_G). Note that upper case must be distinguished from lower case, but that there is no need to separate successive symbols by spaces: X-SAMPA symbol strings are uniquely parsable.
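
The parsing rule described above can be illustrated with a short sketch. It is not part of the proposal: the function name tokenize_xsampa is made up here, and the sketch assumes three simplifications. An underscore followed by another character always introduces a diacritic, a backslash or back-quote always attaches to the symbol before it, and the alveolar lateral click |\|\ is matched as a single unit before the general rule is tried.

```python
def tokenize_xsampa(text):
    """Split an X-SAMPA string into symbols (simplified, illustrative rule set)."""
    tokens = []
    i = 0
    n = len(text)
    while i < n:
        start = i
        if text.startswith("|\\|\\", i):
            # alveolar lateral click: four keystrokes, one symbol
            i += 4
        elif text[i] == "_" and i + 1 < n:
            # diacritic: underscore plus one keystroke, e.g. _G or _0
            i += 2
            # a diacritic may itself carry a backslash, e.g. _?\
            while i < n and text[i] == "\\":
                i += 1
        else:
            # base symbol: one keystroke ...
            i += 1
            # ... plus any trailing backslashes or back-quotes, e.g. G\ or r\`
            while i < n and text[i] in "\\`":
                i += 1
        tokens.append(text[start:i])
    return tokens


print(tokenize_xsampa("d_G"))    # velarized d
print(tokenize_xsampa("G\\G"))   # voiced uvular plosive, then voiced velar fricative
```

No spaces are needed between symbols because each rule is decided by looking only at the current keystroke and the ones that follow it.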

These proposals are fully set out with a reasoned explanation, and all the correct IPA symbols, in my 7000-word draft article "Computer-coding the IPA: a proposed extension of SAMPA". If you can't read it here (using Acrobat Reader standalone or your browser's plug-in), it is also available as a PostScript file and can be downloaded by anonymous ftp from , internet address, in directory /pub/sam, file name Log in with username ftp, password ftp. The file should be fetched in ASCII mode and sent to a PostScript printer.

Using these codes, you can for example include IPA-phonetic transcriptions of all kinds in e-mail messages or other forms of electronic exchange. Wherever an IPA character set is not available, X-SAMPA will provide a workable alternative. Any reactions from colleagues to these proposals will be very welcome. Feel free to pass this file on to anyone interested.

This summary is set out in columns. The first gives a phonetic label (since this is a simple ASCII file, I don't show IPA symbols); the second gives the proposed coding, which we can refer to as X-SAMPA (extended SAMPA). For the first part of the pulmonic consonant listing, a third column gives the Unicode value of the corresponding IPA symbol, in hexadecimal and in decimal. The listing follows the order of the Chart, and should be read in conjunction with it.

It is assumed that the reader is familiar with terms used for the classification of sound-types and with the IPA Chart and the symbols shown on it.

Note that IPA symbols belonging to the ordinary Roman lower-case alphabet (e.g. u, x) remain the same. They are not listed below.

Phonetic label				X-SAMPA			IPA Unicode (hex, dec)

Consonants (pulmonic)

retroflex plosive, voiceless		t` (` = ASCII 096)	0288, 648
retroflex plosive, voiced		d`			0256, 598
labiodental nasal			F 			0271, 625
retroflex nasal			        n` 			0273, 627
palatal nasal				J 			0272, 626
velar nasal				N 			014B, 331
uvular nasal				N\			0274, 628

bilabial trill				B\ 			0299, 665
uvular trill				R\ 			0280, 640
alveolar tap				4			027E, 638
retroflex flap				r` 			027D, 637
bilabial fricative, voiceless		p\ 			0278, 632
bilabial fricative, voiced		B 			03B2, 946
dental fricative, voiceless		T 			03B8, 952
dental fricative, voiced		D 			00F0, 240
postalveolar fricative, voiceless	S			0283, 643
postalveolar fricative, voiced	        Z 			0292, 658
retroflex fricative, voiceless	       	s` 			0282, 642
retroflex fricative, voiced		z` 			0290, 656
palatal fricative, voiceless		C 			00E7, 231
palatal fricative, voiced		j\ 			029D, 669
velar fricative, voiced	        	G 			0263, 611
uvular fricative, voiceless		X			03C7, 967
uvular fricative, voiced		R 			0281, 641
pharyngeal fricative, voiceless	    	X\ 			0127, 295
pharyngeal fricative, voiced	       	?\ 			0295, 661
glottal fricative, voiced		h\			0266, 614

alveolar lateral fricative, vl.	       	K 
alveolar lateral fricative, vd.	        K\

labiodental approximant			P (or v\) 
alveolar approximant		       	r\ 
retroflex approximant		        r\` 
velar approximant			M\

retroflex lateral approximant	       	l` 
palatal lateral approximant	        L 
velar lateral approximant		L\
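
The Unicode values given above for the plosives, nasals, trills, taps and fricatives can be used directly to turn X-SAMPA codes into IPA characters. The sketch below is illustrative only: the names PULMONIC, XSAMPA_TO_IPA and xsampa_to_ipa are made up here, the table covers only the rows that carry a Unicode value in this listing, and the input is assumed to be already split into individual symbols.

```python
# (X-SAMPA code, Unicode hex, Unicode decimal), copied from the listing above.
PULMONIC = [
    ("t`", 0x0288, 648), ("d`", 0x0256, 598),
    ("F", 0x0271, 625), ("n`", 0x0273, 627), ("J", 0x0272, 626),
    ("N", 0x014B, 331), ("N\\", 0x0274, 628),
    ("B\\", 0x0299, 665), ("R\\", 0x0280, 640),
    ("4", 0x027E, 638), ("r`", 0x027D, 637),
    ("p\\", 0x0278, 632), ("B", 0x03B2, 946),
    ("T", 0x03B8, 952), ("D", 0x00F0, 240),
    ("S", 0x0283, 643), ("Z", 0x0292, 658),
    ("s`", 0x0282, 642), ("z`", 0x0290, 656),
    ("C", 0x00E7, 231), ("j\\", 0x029D, 669),
    ("G", 0x0263, 611), ("X", 0x03C7, 967), ("R", 0x0281, 641),
    ("X\\", 0x0127, 295), ("?\\", 0x0295, 661), ("h\\", 0x0266, 614),
]

# Sanity check: the hex and decimal columns must name the same code point.
assert all(hex_value == dec_value for _, hex_value, dec_value in PULMONIC)

XSAMPA_TO_IPA = {code: chr(hex_value) for code, hex_value, _ in PULMONIC}


def xsampa_to_ipa(symbols):
    """Convert a list of X-SAMPA symbols; anything not in the table passes through."""
    return "".join(XSAMPA_TO_IPA.get(symbol, symbol) for symbol in symbols)


print(xsampa_to_ipa(["a", "N", "a"]))
```

Symbols from the ordinary Roman lower-case alphabet are deliberately absent from the table, since they remain the same in both notations.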


Clicks

bilabial				O\	(O = capital letter)
dental					|\
(post)alveolar			       	!\ 
palatoalveolar		        	=\ 
alveolar lateral			|\|\

Ejectives, implosives

ejective				_>	e.g. ejective p		p_>
implosive				_<	e.g. implosive b	b_<


Vowels

close back unrounded			M
close central unrounded 		1 
close central rounded		      	} 
lax i					I 
lax y					Y 
lax u					U

close-mid front rounded			2 
close-mid central unrounded	       	@\ 
close-mid central rounded		8 
close-mid back unrounded		7

schwa					@

open-mid front unrounded		E 
open-mid front rounded			9
open-mid central unrounded	       	3 
open-mid central rounded		3\ 
open-mid back unrounded			V 
open-mid back rounded			O

ash (ae digraph)			{ 
open schwa (turned a)	        	6

open front rounded			& 
open back unrounded	        	A 
open back rounded			Q

Other symbols

voiceless labial-velar fricative	W 
voiced labial-palatal approx.	       	H 
voiceless epiglottal fricative		H\ 
voiced epiglottal fricative		<\ 
epiglottal plosive			>\

alveolo-palatal fricative, vl. 	        s\ 
alveolo-palatal fricative, voiced	z\ 
alveolar lateral flap			l\ 
simultaneous S and x		        x\ 
tie bar					_


Suprasegmentals

primary stress				"
secondary stress			% 
long					: 
half-long				:\ 
extra-short				_X 
linking mark				-\

Tones and word accents

level extra high			_T 
level high				_H
level mid				_M 
level low				_L 
level extra low			        _B
downstep				! 
upstep			        	^	(caret, circumflex)

contour, rising			        _R 
contour, falling			_F 
contour, high rising			_H_T 
contour, low rising			_B_L 

contour, rising-falling		        _R_F 
(NB Instead of being written as diacritics with _, all prosodic marks can alternatively be placed in a separate tier, set off by < >, as recommended for the next two symbols.)
global rise				<R> 
global fall				<F>
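
Pulling the separate tier out of a transcription mechanically is simple if the tier marks are kept simple. The following sketch is an illustration under stated assumptions, not a rule of the proposal: the names TIER_MARK and split_tiers are invented here, a tier mark is assumed to be a run of letters or underscores between < and >, and a < directly preceded by an underscore is assumed to be the implosive diacritic _< rather than the start of a tier mark. The epiglottal symbols <\ and >\ are left in the segmental string because a backslash cannot occur inside a tier mark.

```python
import re

# "<" not preceded by "_", then letters/underscores, then ">"
TIER_MARK = re.compile(r"(?<!_)<([A-Za-z_]+)>")


def split_tiers(text):
    """Return (segmental string, list of prosodic tier marks)."""
    marks = TIER_MARK.findall(text)      # contents of each <...> mark, in order
    segmental = TIER_MARK.sub("", text)  # the same string with the marks removed
    return segmental, marks


print(split_tiers("jEs<R>"))
```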


Diacritics

voiceless				_0	(0 = figure), e.g. n_0
voiced					_v 
aspirated				_h 
more rounded		        	_O	(O = letter) 
less rounded				_c 
advanced				_+
retracted				_-
centralized				_" 
syllabic				=	(or _=) e.g. n= (or n_=) 
non-syllabic				_^ 
rhoticity				`

breathy voiced			        _t 
creaky voiced		        	_k
linguolabial				_N 
labialized				_w 
palatalized				'	(or _j) e.g. t' (or t_j) 
velarized				_G 
pharyngealized			       	_?\

dental					_d 
apical					_a 
laminal				        	_m
nasalized				~	(or _~) e.g. A~ (or A_~) 
nasal release				_n
lateral release			        _l 
no audible release			_}

velarized or pharyngealized	       	_e 
velarized l, alternatively		5 
raised					_r 
lowered					_o 
advanced tongue root		       	_A 
retracted tongue root		        _q
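
Three of the diacritics above have two spellings (= or _=, ' or _j, ~ or _~), so a program that compares transcriptions may want to settle on one of them. The sketch below, with the made-up name normalize_diacritics, rewrites the short forms as the underscore forms. It assumes plain string input and leaves =\ (the palatoalveolar click) untouched.

```python
import re


def normalize_diacritics(text):
    """Rewrite =, ~ and ' as their underscore equivalents _=, _~ and _j."""
    # "=" that is not already "_=" and is not the click "=\"
    text = re.sub(r"(?<!_)=(?!\\)", "_=", text)
    # "~" that is not already "_~"
    text = re.sub(r"(?<!_)~", "_~", text)
    # "'" is used only for palatalization in this listing
    text = text.replace("'", "_j")
    return text


print(normalize_diacritics("n="))
print(normalize_diacritics("t'"))
```

Choosing the underscore forms as the target keeps every diacritic in the same underscore-plus-keystroke shape, which is the general pattern of the coding.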


For queries please contact John Wells by e-mail or at

Department of Phonetics and Linguistics, 
University College London, 
Gower Street, 
London WC1E 6BT.

Tel. +44 171 380 7175

last revised 2000 May 03 (Unicode values added)