A Tutorial Introduction to SALT

8. Speech Recognition Grammars

How a SALT <LISTEN> object recognises speech is controlled by a recognition grammar written in a formalism called the Speech Recognition Grammar Specification (SRGS). SRGS is an XML-based markup language that specifies the word sequences the recogniser will accept. It also allows word occurrence probabilities to be used to affect the likelihood of different recognition outcomes. We give a basic introduction to SRGS below; full details can be found in the W3C Speech Recognition Grammar Specification (SRGS) or in the SASDK Documentation.

8.1 SRGS Elements

Element Description
<grammar>..</grammar> Encloses a grammar marked up in SRGS format.
<rule>..</rule> Marks a single grammar rule. One rule must be the top-level or 'root' rule.
<item>..</item> Marks a single recognised token.
<one-of>..</one-of> Marks a set of token alternatives.
<ruleref /> References another rule, either in the same grammar or in an external resource.
<tag>..</tag> Provides a means of assigning semantics to a rule.

Each of these is discussed in more detail below.

<grammar> element

The grammar element is the highest level container and indicates the name of the top-level grammar rule.

Attribute Description
root=rootrule Specifies the name of the top-level grammar rule.
xmlns=namespace Specifies the XML namespace for W3C speech recognition grammar. The XML namespace is http://www.w3.org/2001/06/grammar.

<rule> element

Each grammar must be made up from one or more rules which describe sequences or alternatives of recognised elements. One rule is identified as the root rule.

Attribute Description
id=name Specifies a unique name for the rule.
scope=private|public Specifies whether this rule is private or can be referred to by other grammars. Default: private.

<item> element

The item element marks up a recognisable element, which may be a single word or a phrase. Successive items must be recognised in the order given, unless they are enclosed in a one-of element. Attributes allow you to specify whether the item can be repeated.

Attribute Description
repeat=rep Controls how many times the item may be repeated. The repetition is specified as "min-max", i.e. "1-2" means once or twice, "0-1" means optional, and "4" means exactly four times.
repeat-prob=probability Specifies the likelihood of repetitions.
weight=probability Multiplicative weight used to bias recognition alternatives. Default: 1.0.
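
As a rough illustration of the repeat attributes, the sketch below makes "please" optional and requires exactly four digits. The grammar, its rule names (extension, digit) and the vocabulary are invented for this example, and the repeat-prob value of 0.2 is an arbitrary assumption rather than a recommended setting.

```xml
<grammar root="extension" xml:lang="en-US" version="1.0"
 xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="extension" scope="public">
    <!-- optional word: may be absent or spoken once;
         repeat-prob="0.2" is an assumed value suggesting it is usually absent -->
    <item repeat="0-1" repeat-prob="0.2">please</item>
    dial
    <!-- exactly four digits, each matched by the digit rule below -->
    <item repeat="4"><ruleref uri="#digit"/></item>
  </rule>

  <!-- hypothetical helper rule; private by default -->
  <rule id="digit">
    <one-of>
      <item>one</item>
      <item>two</item>
      <item>three</item>
    </one-of>
  </rule>
</grammar>
```

With this grammar, "dial one two three one" and "please dial two two one three" would both be accepted, while "dial one two" would not.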

<one-of> element

The one-of element marks up alternative selections. When a number of item elements occur within a one-of element, only one can be chosen within a single recognised input. See examples below.

<ruleref> element

The ruleref element allows rules to be included within one another to form a hierarchy. This allows a rule to be used more than once within one phrase, and lets rules for common components such as numbers and dates be shared between grammars. A special application of the ruleref element is to match extraneous input.

Attribute Description
uri=URI Specifies the location of the referenced rule: "#name" for a rule in the same grammar, or the URI of an external resource.
special=NULL|VOID|GARBAGE Specifies a special recognition rule. NULL=always matches, VOID=never matches, GARBAGE=ignores stretch of speech.
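
One way the GARBAGE rule might be used is to pick out a keyword from surrounding speech. This is a sketch only: the rule name answer and the vocabulary are invented here, and how well garbage matching works in practice depends on the recogniser.

```xml
<grammar root="answer" xml:lang="en-US" version="1.0"
 xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="answer" scope="public">
    <!-- skip any speech before the keyword -->
    <ruleref special="GARBAGE"/>
    <one-of>
      <item>yes</item>
      <item>no</item>
    </one-of>
    <!-- skip any speech after the keyword -->
    <ruleref special="GARBAGE"/>
  </rule>
</grammar>
```

An utterance such as "well yes I think so" is intended to match this rule, with only "yes" treated as meaningful.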

<tag> element

The tag element can be used to associate certain actions with a choice of a recognised element in a grammar rule. Typically this involves setting the value of some text within the XML structured recognition results. This allows you to "parse" the input into meaningful units during the recognition rather than having a separate set of rules for parsing the recognised word string.

The typical contents of a tag element look like this:

    <tag> $.property={}; $.property._value="desired value"; </tag>
    

In this example '$' refers to the root of the XML recognition structure. The statement $.property={}; adds a new child element called 'property', while $.property._value="desired value" stores 'desired value' in the new child element. You can then access this stored text from your JavaScript once recognition is complete.

Here is an example:

    <rule id="WeekdaySelection">
      <one-of>
        <item>Monday<tag>$.daynum={}; $.daynum._value="1";</tag></item>
        <item>Wednesday<tag>$.daynum={}; $.daynum._value="3";</tag></item>
        <item>Friday<tag>$.daynum={}; $.daynum._value="5";</tag></item>
      </one-of>
    </rule>
    

The recognised output from the LISTEN object might then look like:

    <SML confidence="0.850" text="Monday" utteranceConfidence="0.850">
      <daynum confidence="0.850">1</daynum>
    </SML>
    

8.2 Grammar Examples

This is an example of the use of weighted alternatives:

    <grammar root="PizzaSize" xml:lang="en-US" version="1.0" xmlns="http://www.w3.org/2001/06/grammar">
      <rule id="PizzaSize" scope="public">
        a
          <one-of>
            <item weight=".5">small</item>
            <item>medium</item>
            <item weight="2">large</item>
          </one-of>
        pizza
      </rule>
    </grammar>
    

This is an example of a rule being used twice within one recognised phrase using the ruleref element.

    <grammar root="buyShirt" xml:lang="en-US" version="1.0" xmlns="http://www.w3.org/2001/06/grammar">
        <rule id="buyShirt" scope="public">
            <item>
               Get me a <ruleref uri="#ruleColors" />
               shirt and a <ruleref uri="#ruleColors"/>
               tie</item>
        </rule>
    
        <rule id="ruleColors" scope="public">
             <one-of>
                <item>red</item>
                <item>white</item>
                <item>green</item>
            </one-of>
        </rule>
    </grammar>
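
The uri attribute can also point at a rule in another grammar file. The sketch below assumes a separate file colours.grxml, like the one in Section 8.3, stored alongside this grammar and containing a public rule named colours. The rule name paintIt and the phrase are invented for illustration.

```xml
<grammar root="paintIt" xml:lang="en-US" version="1.0"
 xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="paintIt" scope="public">
    paint it
    <!-- file name, then '#', then the id of a public rule in that file -->
    <ruleref uri="colours.grxml#colours"/>
  </rule>
</grammar>
```

Only rules declared with scope="public" can be referenced from another grammar in this way.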
    

8.3 Semantic Mark-up Demonstration

In this demonstration we show how to display and process the XML-formatted recognition result. We use an external grammar which associates colour names with hexadecimal values that can be used to control the background colour of a table cell.

This is the grammar, stored in a file colours.grxml:

    <grammar version="1.0" xml:lang="en-US"
     xmlns="http://www.w3.org/2001/06/grammar" root="colours"
     tag-format="semantics-ms/1.0">
     <rule id="colours" scope="public">
      <one-of>
       <item>Black<tag> $.hex={}; $.hex._value="#000000"; </tag></item>
       <item>Blue<tag> $.hex={}; $.hex._value="#0000FF"; </tag></item>
       <item>Brown<tag> $.hex={}; $.hex._value="#A52A2A"; </tag></item>
       <item>Gray<tag> $.hex={}; $.hex._value="#808080"; </tag></item>
       <item>Green<tag> $.hex={}; $.hex._value="#008000"; </tag></item>
       <item>Indigo <tag> $.hex={}; $.hex._value="#4B0082"; </tag></item>
       <item>Orange<tag> $.hex={}; $.hex._value="#FFA500"; </tag></item>
       <item>Pink<tag> $.hex={}; $.hex._value="#FFC0CB"; </tag></item>
       <item>Purple<tag> $.hex={}; $.hex._value="#800080"; </tag></item>
       <item>Red<tag> $.hex={}; $.hex._value="#FF0000"; </tag></item>
       <item>Violet<tag> $.hex={}; $.hex._value="#EE82EE"; </tag></item>
       <item>White<tag> $.hex={}; $.hex._value="#FFFFFF"; </tag></item>
       <item>Yellow<tag> $.hex={}; $.hex._value="#FFFF00"; </tag></item>
      </one-of>
     </rule>
    </grammar>
    

A larger version of the grammar, with more colours, is also available.

This is the application code. Try it out on your computer: Normal version, Debug version.

    <html xmlns:salt="http://www.saltforum.org/2002/SALT">
    <object id="speech-add-in" CLASSID="clsid:33cbfc53-a7de-491a-90f3-0e782a7e347a">
    </object>
    <?import namespace="salt" implementation="#speech-add-in"/>
    <!-- SALT: Recognise HTML Colour Names -->
    <salt:listen id="recColour" onreco="doColour()">
     <salt:grammar src="colours.grxml" />
     <salt:bind targetelement="txtDebug" value="/" />
    </salt:listen>
    <body>
    <h1><center>SALT: Recognise HTML Colours</center></h1>
    <p><center><table border=1 bgcolor="white">
     <tr>
      <td colspan=2>
       <textarea id="txtDebug" rows=5 cols=50>
        SML format recognition results appear here.
       </textarea>
      </td>
     </tr>
     <tr>
      <td align=center><input id="txtColour" name="txtColour" type="text" width="10" /></td>
      <td width=150 id="txtCell">&nbsp;&nbsp;&nbsp;&nbsp;</td>
     </tr>
    </table></center>
    <p><center><input type="button" value="Click to Speak" onclick="recColour.Start()">
    <p>Click button, wait for level meter, speak colour name.
    </body>
    <script>
    function doColour()
    {
        // set text field to colour name
        var pRecog=document.getElementById("recColour");
        var pField=document.getElementById("txtColour");
        pField.value=pRecog.text;
    
        // but set cell colour to hex value
        var pCell=document.getElementById("txtCell");
        var pNode = pRecog.recoresult.selectSingleNode("//hex");
        if (pNode != null) pCell.style.background=pNode.text;
    }
    </script>
    </html>
    

Particular aspects to note are:

  • Use of tag-format="semantics-ms/1.0" in the <grammar> element of colours.grxml to indicate that we are using semantic mark-up.
  • Use of <tag> mark-up to set the contents of a node called hex with the hexadecimal code for the colour value.
  • Use of a textarea called "txtDebug" which accepts the SML output through a salt:bind element.
  • The doColour() function, fired by the onreco= attribute of the salt:listen object, which accesses both the text and the recoresult properties of that object.
  • The use of the XML Document Object Model function selectSingleNode() to gain access to the hex value stored in the recognition result object.
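
To make the last two points concrete, the recognition result for the spoken word "Red" would be expected to have roughly the following shape, by analogy with the SML example in Section 8.1. The confidence figures are placeholders, not measured values.

```xml
<SML confidence="0.850" text="Red" utteranceConfidence="0.850">
  <!-- element created by the <tag> attached to the "Red" item -->
  <hex confidence="0.850">#FF0000</hex>
</SML>
```

Given a result like this, selectSingleNode("//hex") returns the hex element, and its text is the colour code that doColour() writes into the table cell.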



A Tutorial Introduction to SALT © 2005 Mark Huckvale, Phonetics and Linguistics, University College London
