A Tutorial Introduction to SALT

7. The <LISTEN> element

A SALT <listen> element creates an object that can recognise speech spoken by the user. Script can control the object, specifying the recognition grammar and when recognition should start. The object returns a recognition structure, which can contain detailed acoustic scores and semantic mark-up as well as a word string. The semantic mark-up is discussed later in the tutorial. This section gives a basic introduction to the <listen> element; for a more complete description, refer to the SALT specification or the SASDK documentation.

7.1 XML structure

The essential XML structure of a <listen> object is as follows:

    <listen>
       <grammar attributes />      // Describes the recognition grammar
       <bind attributes  />        // Passes recognised text to another object
       <record attributes />       // Configures recording of audio
       <param attributes />        // Speech system configuration
       <audiometer attributes />   // Configures level meter display
    </listen>
    

The <audiometer> element is specific to the Microsoft Speech add-in for Internet Explorer.

7.2 Attributes

The attributes of the <listen> element are as follows:

Attribute                       Description
id=name                         Provides a unique object name for the listen object.
initialtimeout=dur              (Optional) Specifies the maximum allowable time in milliseconds between the beginning of recognition and the detection of speech.
babbletimeout=dur               (Optional) Specifies the maximum allowable time in milliseconds for the duration of an utterance.
maxtimeout=dur                  (Optional) Specifies the maximum allowable time in milliseconds between the beginning of recognition and the return of recognition results.
endsilence=dur                  (Optional) Specifies the duration of silence in milliseconds at the end of an utterance after which the speaker is assumed to have finished speaking.
reject=thresh                   (Optional) Sets the lowest acceptable recognition confidence score. Recognitions with confidence scores below this value are rejected.
xml:lang=en-US                  (Optional) Sets the input language.
mode=automatic|multiple|single  (Optional) Sets the recognition mode. Default: automatic.
onreco=func()                   (Optional) Causes script function func() to be called when the recogniser has a recognition result available.
onsilence=func()                (Optional) Causes script function func() to be called when the recognition platform detects no speech within the time specified by the initialtimeout attribute.
onspeechdetected=func()         (Optional) Causes script function func() to be called when the recognition engine detects speech.
onnoreco=func()                 (Optional) Causes script function func() to be called when the recognition engine is unable to return valid recognition results, for example when confidence is below the reject threshold.
onerror=func()                  (Optional) Causes script function func() to be called when the speech recognition platform is unable to process a recognition request, for example when there is an error in the grammar.
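
The event attributes above each name a script function. As a minimal sketch of how such handlers might look, the code below assumes a listen object with id "recColour" and a text field with id "txtColour"; both ids and all function names are hypothetical and do not come from the SALT specification. The working functions take the listen object and the field as arguments so that the logic can be exercised without a speech platform.

```javascript
// Sketch only. The ids "recColour" and "txtColour" are hypothetical; the
// handlers would be named in the listen element's event attributes, e.g.
//   <salt:listen id="recColour" initialtimeout="3000" endsilence="500"
//                onreco="onColourReco()" onnoreco="onColourNoReco()"
//                onsilence="onColourSilence()">

// A result is available: copy the recognised words to the field.
function showReco(listenObj, field) {
    field.value = listenObj.text;
}

// No valid result: report the status code to help debugging.
function showNoReco(listenObj, field) {
    field.value = "(not recognised, status " + listenObj.status + ")";
}

// No speech was detected within initialtimeout.
function showSilence(field) {
    field.value = "(no speech detected)";
}

// Page-level wrappers: these look up the page objects and delegate.
function onColourReco() {
    showReco(document.getElementById("recColour"),
             document.getElementById("txtColour"));
}
function onColourNoReco() {
    showNoReco(document.getElementById("recColour"),
               document.getElementById("txtColour"));
}
function onColourSilence() {
    showSilence(document.getElementById("txtColour"));
}
```

Keeping the page lookups in thin wrappers is a design choice rather than a SALT requirement; it lets the same handler logic serve several listen objects.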

7.3 Properties

A listen object has the following interesting properties:

Property        Description
recoresult      Holds the recognition results as an XML string in Semantic Markup Language (SML) format.
text            Contains a plain text string of the recognised words.
status          Contains a status code describing the result of the last operation. A code of 0 means success; other status codes may be useful for debugging error conditions.
recordlocation  Contains the location of the recorded audio file. Used only when recording is enabled.
recordduration  Contains the approximate length of the recorded file in milliseconds. Used only when recording is enabled.
recordsize      Contains the size of the recorded file in bytes. Used only when recording is enabled.
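
To illustrate how these properties might be read from script, here is a small sketch that turns them into a one-line summary. The function name summariseListen is hypothetical, and the code assumes that status holds a number (or numeric string) and that recordlocation is empty when no recording was made; check both assumptions against your platform's documentation.

```javascript
// Sketch: summarise the outcome of the last operation on a listen object.
function summariseListen(listenObj) {
    // A status of 0 means success; anything else is reported as a failure.
    if (Number(listenObj.status) !== 0) {
        return "failed with status " + listenObj.status;
    }
    // Assumption: recordlocation is only non-empty after a recording.
    if (listenObj.recordlocation) {
        var seconds = listenObj.recordduration / 1000;   // milliseconds to seconds
        return "recorded " + listenObj.recordsize + " bytes (about " +
               seconds.toFixed(1) + " s) at " + listenObj.recordlocation;
    }
    return "recognised \"" + listenObj.text + "\"";
}
```

A handler named in onreco could pass the listen object to this function and display the returned string in a text field.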

7.4 Methods

A listen object has the following methods that may be called from a script function.

Method    Description
Start()   Starts recognition.
Stop()    Stops recognition and forces a result to be returned.
Cancel()  Stops recognition without returning a result.
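
One way to use the three methods together is a "push to talk" control, sketched below. The helper makePushToTalk and its method names are hypothetical; the only SALT calls it relies on are Start(), Stop() and Cancel() as listed above. It keeps a flag so that Start() is not called twice and Stop() or Cancel() is not called when nothing is running.

```javascript
// Sketch: wrap a listen object in a simple push-to-talk controller.
function makePushToTalk(listenObj) {
    var listening = false;
    return {
        // Button pressed: begin recognition if not already running.
        press: function () {
            if (!listening) { listening = true; listenObj.Start(); }
        },
        // Button released: stop and ask for a result.
        release: function () {
            if (listening) { listening = false; listenObj.Stop(); }
        },
        // User gave up: stop without a result.
        abandon: function () {
            if (listening) { listening = false; listenObj.Cancel(); }
        },
        // Call from onreco/onnoreco/onsilence/onerror handlers, since
        // recognition may also end without release() being called.
        finished: function () { listening = false; },
        isListening: function () { return listening; }
    };
}
```

On a page this might be attached to a button with onmousedown and onmouseup handlers that call press() and release().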

7.5 The listen <grammar> element

The <grammar> sub-element can be used to enclose a grammar specification in W3C Speech Recognition Grammar Specification (SRGS) format. It also has the following attributes:

Attribute        Description
name=name        (Optional) Uniquely identifies the grammar element.
src=URI          (Optional) Specifies a grammar stored in another file.
xmlns=namespace  (Optional) Specifies the namespace for the inline grammar, for example "http://www.w3.org/2001/06/grammar".

The speech recognition grammar format is described in the next section.

To store grammars in an external file, use SALT mark-up like this:

    <salt:grammar src="colours.grxml" />
    

Then the file "colours.grxml" might contain:

    <grammar version="1.0" xml:lang="en-US"
     xmlns="http://www.w3.org/2001/06/grammar" root="colours">
     <rule id="colours" scope="public">
      <one-of>
       <item>black</item>
       <item>blue</item>
       <item>gray</item>
       <item>green</item>
       <item>purple</item>
       <item>red</item>
       <item>silver</item>
       <item>white</item>
       <item>yellow</item>
      </one-of>
     </rule>
    </grammar>
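
If the word list changes often, a file like the one above could be generated rather than typed by hand. The following is only a sketch: buildOneOfGrammar and escapeXml are hypothetical helper names, the output simply copies the layout of "colours.grxml", and the rule id is assumed to be a plain identifier that needs no escaping.

```javascript
// Escape the characters that are special in XML text content.
function escapeXml(s) {
    return String(s).replace(/&/g, "&amp;")
                    .replace(/</g, "&lt;")
                    .replace(/>/g, "&gt;");
}

// Build an SRGS grammar with one public rule offering a choice of words.
function buildOneOfGrammar(ruleId, words) {
    var lines = [];
    lines.push('<grammar version="1.0" xml:lang="en-US"');
    lines.push(' xmlns="http://www.w3.org/2001/06/grammar" root="' + ruleId + '">');
    lines.push(' <rule id="' + ruleId + '" scope="public">');
    lines.push('  <one-of>');
    for (var i = 0; i < words.length; i++) {
        lines.push('   <item>' + escapeXml(words[i]) + '</item>');
    }
    lines.push('  </one-of>');
    lines.push(' </rule>');
    lines.push('</grammar>');
    return lines.join("\n");
}
```

For example, buildOneOfGrammar("colours", ["black", "blue", "gray"]) returns a string with three <item> elements inside a single <one-of> block.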
    

7.6 The listen <bind> element

The <bind> sub-element can be used to send recognition results to some other object in the web page. It has the following attributes:

Attribute                 Description
targetelement=object      Identifies the document object to which the text is sent.
targetattribute=property  Specifies the name of the object property which is set to the recognition result. Default: "value".
targetmethod=method       Specifies the name of the object method which is called with the recognition result.
value=component           Specifies which part of the recognition results to send to the object. An argument of "/" sends the whole SML structure; an argument of "//" sends just the text of the word string.
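
The effect of a <bind> element can be pictured as a small piece of script. The sketch below is illustrative only and is not how a SALT platform implements binding: applyBind is a hypothetical function, "binding" is a plain object mirroring the attributes above, "value" stands for the part of the result already selected by the value attribute, and "lookup" is a function that finds a page object by name.

```javascript
// Sketch: what a bind element does, expressed as script.
function applyBind(binding, value, lookup) {
    var target = lookup(binding.targetelement);
    if (binding.targetmethod) {
        // targetmethod: call the named method with the result.
        target[binding.targetmethod](value);
    } else {
        // targetattribute: set the named property, defaulting to "value".
        target[binding.targetattribute || "value"] = value;
    }
}
```

In a page, lookup could be a function that wraps document.getElementById, so that a binding with targetelement "txtColour" would place the recognised text in that field's value property.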

7.7 The listen <record> element

The <record> sub-element specifies that an audio recording is to be made. If a listen element contains both a <record> element and a grammar, then the record element takes precedence and a recording is made instead of a recognition. The record element has the following attributes:

Attribute         Description
type=audioformat  Specifies the audio format in which the recording is made. Default: "audio/wav;codec=g711".
beep=true|false   Specifies whether the start of recording is marked with an audible beep.

Example audio recording application

The following application records an audio message and replays it.

    <html xmlns:salt="http://www.saltforum.org/2002/SALT">
    <object id="speech-add-in" CLASSID="clsid:33cbfc53-a7de-491a-90f3-0e782a7e347a">
    </object>
    <?import namespace="salt" implementation="#speech-add-in"/>
    <!-- SALT: Record audio -->
    <salt:listen id="recAudio" onreco="doSaveAudio()" onnoreco="doSaveAudio()">
     <salt:record beep="true" type="audio/x-wav" />
    </salt:listen>
    <!-- SALT: Playback audio -->
    <salt:prompt id="playAudio">
     <salt:content id="contentAudio" href="" type="audio/x-wav" />
    </salt:prompt>
    <body>
    <h1>SALT: Record and Echo Audio</h1>
    <p><input name="txtFilename" type="text" size=80 onclick="recAudio.Start()" />
    <p>Click in text field, wait for level meter, speak message.
    </body>
    <script>
    function doSaveAudio()
    {
        // set text field to name of temporary file
        var pField=document.getElementById("txtFilename");
        var pAudio=document.getElementById("recAudio");
        pField.value=pAudio.recordlocation;
    
        // set replay audio source to filename
        var pContent=document.getElementById("contentAudio");
        pContent.href=pAudio.recordlocation;
    
        // start replay
        var pPlay=document.getElementById("playAudio");
        pPlay.Start();
    }
    </script>
    </html>
    


A Tutorial Introduction to SALT © 2005 Mark Huckvale, Phonetics and Linguistics, University College London
