A Tutorial Introduction to SALT

4. Simple Speech Recognition

4.1 Recognition from a fixed set of phrases

Our next example is a program that recognises the names of the countries that make up the European Union. This is an easy recognition task because the vocabulary is small and fixed, so the recogniser should rarely make mistakes.

    <html xmlns:salt="http://www.saltforum.org/2002/SALT">
    <object id="speech-add-in" CLASSID="clsid:33cbfc53-a7de-491a-90f3-0e782a7e347a">
    </object>
    <?import namespace="salt" implementation="#speech-add-in"/>
    <!-- SALT: Recognise European Union country -->
    <salt:listen id="RecogEU">
     <salt:grammar>
      <grammar version="1.0" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/06/grammar" root="EUCountry">
       <rule id="EUCountry" scope="public">
        <one-of>
         <item>Austria</item><item>Belgium</item><item>Cyprus</item>
             <item>Czech Republic</item><item>Denmark</item><item>Estonia</item>
         <item>Finland</item><item>France</item><item>Germany</item>
         <item>Greece</item><item>Hungary</item><item>Ireland</item>
         <item>Italy</item><item>Latvia</item><item>Lithuania</item>
         <item>Luxembourg</item><item>Malta</item><item>Poland</item>
         <item>Portugal</item><item>Slovakia</item><item>Slovenia</item>
         <item>Spain</item><item>Sweden</item><item>The Netherlands</item>
         <item>United Kingdom</item>
        </one-of>
       </rule>
      </grammar>
     </salt:grammar>
     <salt:bind targetelement="txtCountry" value="//" />
    </salt:listen>
    <body>
    <h1>SALT: Recognise EU Country</h1>
    <input name="txtCountry" type="text" onclick="RecogEU.Start()" />
    <p>Click in text field, wait for level meter, speak country name.
    </body>
    </html>
    

Try this out on your computer: Normal version, Debug version.

Let's look at the element tags in this example in more detail:

<salt:listen id="RecogEU">
This element sets up a recognition object called 'RecogEU' which we use to perform the recognition of the country names. Note that it is the 'Start()' method of this object that is called when you click in the input text field.
<salt:grammar>
This element introduces the recognition grammar; that is, it defines which word sequences may be recognised by the RecogEU object.
<grammar version="1.0" xml:lang="en-US" xmlns="http://www.w3.org/2001/06/grammar" root="EUCountry">
The recommended format for SALT grammars is the Speech Recognition Grammar Specification (SRGS), which is defined not by SALT but by the World Wide Web Consortium (W3C). Here we open an XML description of the recognition grammar using the 'grammar' element, referring to the grammar namespace at http://www.w3.org/2001/06/grammar. Importantly, we also specify here that the language to be recognised will be US English with xml:lang="en-US". Since a recognition grammar may be made up of many rules, this tag also specifies the 'root' of the grammar (i.e. which rule describes the topmost non-terminal node). The attribute root="EUCountry" defines the root of the grammar to be the rule that has the id 'EUCountry'.
<rule id="EUCountry" scope="public">
This element specifies one grammar rule according to the W3C standard. The name of this rule is 'EUCountry'. We also make the rule 'public', although this is not required in this application. Public rules can be referred to by other grammars; that is, we could make our EUCountry rule available to everyone on the web.
<one-of>
Our grammar is made up from a single rule, and our rule is made up of a single set of phrase alternatives demarcated by the 'one-of' element.
<item>
Each phrase to be recognised is enclosed within these 'item' tags.
<salt:bind targetelement="txtCountry" value="//" />
The SALT 'bind' tag provides a simple means to return the recognition result from the RecogEU object. Here all we do is connect the recognition result to the document element called 'txtCountry' which is the name we have given to an input text field. The 'value' attribute specifies what aspect of the recognition is to be returned. The results of recognition are actually put into a structure that may contain recognition alternatives and confidence scores. The argument "//" is shorthand for the text of the most likely recognised element.
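
Because the grammar is just a 'one-of' list with one 'item' per phrase, its text is easy to generate from a list of names, which also avoids mistyping a tag by hand. The following is only a sketch of that idea: the helper names escapeXml and buildOneOfGrammar are our own inventions for illustration and are not part of SALT or of the W3C grammar format. The sketch only builds the grammar text; the examples in this tutorial keep the grammar inline in the page.

```javascript
// Hypothetical helper (not part of SALT): escape the characters that are
// special in XML so that a phrase can be placed safely inside <item>.
function escapeXml(text) {
    return text.replace(/&/g, "&amp;")
               .replace(/</g, "&lt;")
               .replace(/>/g, "&gt;");
}

// Hypothetical helper: build the text of a one-rule grammar whose rule is
// a single <one-of> list containing one <item> per phrase.
function buildOneOfGrammar(ruleId, phrases) {
    var lines = [];
    lines.push('<grammar version="1.0" xml:lang="en-US"');
    lines.push(' xmlns="http://www.w3.org/2001/06/grammar" root="' + ruleId + '">');
    lines.push(' <rule id="' + ruleId + '" scope="public">');
    lines.push('  <one-of>');
    for (var i = 0; i < phrases.length; i++) {
        lines.push('   <item>' + escapeXml(phrases[i]) + '</item>');
    }
    lines.push('  </one-of>');
    lines.push(' </rule>');
    lines.push('</grammar>');
    return lines.join("\n");
}

// Example: a shortened version of the country list used in this section.
var grammarText = buildOneOfGrammar("EUCountry",
    ["Austria", "Belgium", "Czech Republic"]);
```

Each phrase produces exactly one matched pair of 'item' tags, so the opening and closing tags cannot get out of step.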

Here is the application running:

Note the little volume meter window. This gives the user three pieces of feedback: that the system is waiting for speech, the volume of the sounds being recorded, and the amount of time left before a recognition time-out occurs. We will see how to adjust the time-out duration in a later example.

4.2 Simple variation

In the simple variation below, we recognise the country name and report the name of the capital of the country. We also refine the grammar so that the user can say "What is the capital of ..." if they want.

    <html xmlns:salt="http://www.saltforum.org/2002/SALT">
    <object id="speech-add-in" CLASSID="clsid:33cbfc53-a7de-491a-90f3-0e782a7e347a">
    </object>
    <?import namespace="salt" implementation="#speech-add-in"/>
    <!-- SALT: Recognise European Union country -->
    <salt:listen id="RecogEU" onreco="FindCapital()" babbletimeout="4">
     <salt:grammar>
      <grammar version="1.0" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/06/grammar" root="EUCountry">
       <rule id="EUCountry" scope="public">
        <item repeat="0-1">What is the capital of</item>
        <one-of>
         <item>Austria</item><item>Belgium</item><item>Cyprus</item>
         <item>Czech Republic</item><item>Denmark</item><item>Estonia</item>
         <item>Finland</item><item>France</item><item>Germany</item>
         <item>Greece</item><item>Hungary</item><item>Ireland</item>
         <item>Italy</item><item>Latvia</item><item>Lithuania</item>
         <item>Luxembourg</item><item>Malta</item><item>Poland</item>
         <item>Portugal</item><item>Slovakia</item><item>Slovenia</item>
         <item>Spain</item><item>Sweden</item><item>The Netherlands</item>
         <item>United Kingdom</item>
        </one-of>
       </rule>
      </grammar>
     </salt:grammar>
    </salt:listen>
    <body>
    <h1>SALT: Find Capital of EU Country</h1>
    <p>The Capital of
    <input name="txtCountry" type="text" onclick="RecogEU.Start()" />
    is
    <input name="txtCapital" type="text" onclick="RecogEU.Start()" />.
    <p>Click in text field, wait for level meter, speak country name.
    </body>
    <script>
    function FindCapital()
    {
        var pRecog=document.getElementById("RecogEU");
        var pCountry=document.getElementById("txtCountry");
    
        var country = new String(pRecog.text);
        country = country.replace("What is the capital of ","");
        pCountry.value=country;
    
        var pCapital=document.getElementById("txtCapital");
        pCapital.value=LookupCapital(country);
    }
    
    function LookupCapital(country)
    {
        if (country=="Austria") return("Vienna");
        else if (country=="Belgium") return("Brussels");
        else if (country=="Cyprus") return("Nicosia");
        else if (country=="Czech Republic") return("Prague");
        else if (country=="Denmark") return("Copenhagen");
        else if (country=="Estonia") return("Tallinn");
        else if (country=="Finland") return("Helsinki");
        else if (country=="France") return("Paris");
        else if (country=="Germany") return("Berlin");
        else if (country=="Greece") return("Athens");
        else if (country=="Hungary") return("Budapest");
        else if (country=="Ireland") return("Dublin");
        else if (country=="Italy") return("Rome");
        else if (country=="Latvia") return("Riga");
        else if (country=="Lithuania") return("Vilnius");
        else if (country=="Luxembourg") return("Luxembourg");
        else if (country=="Malta") return("Valletta");
        else if (country=="Poland") return("Warsaw");
        else if (country=="Portugal") return("Lisbon");
        else if (country=="Slovakia") return("Bratislava");
        else if (country=="Slovenia") return("Ljubljana");
        else if (country=="Spain") return("Madrid");
        else if (country=="Sweden") return("Stockholm");
        else if (country=="The Netherlands") return("Amsterdam");
        else if (country=="United Kingdom") return("London");
        else return("Unknown");
    }
    </script>
    </html>
    

Try this out on your computer: Normal version, Debug version.

The particular changes of note are:

  • The babbletimeout="4" attribute on the RecogEU object changes the timeout to 4 seconds, which gives the user more time to say the phrase.
  • The addition of an optional grammar item for "What is the capital of". The repeat attribute sets the number of allowed repetitions: here 0 or 1.
  • The recognition object now calls the JavaScript function 'FindCapital()' on a successful recognition, rather than using salt:bind. The function name is specified in the 'onreco' attribute of the listen element.
  • The JavaScript function strips off any leading "What is the capital of", then sets the contents of the two input fields.
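
As a variation on the script above, the long chain of comparisons in LookupCapital() can be replaced by a lookup table, and the carrier phrase can be stripped with a pattern that ignores letter case. This is only a sketch: the names CAPITALS, stripCarrierPhrase and lookupCapitalFromTable are our own, and we are assuming (without having checked it against a recogniser) that the recognised text might come back with different capitalisation or extra spaces in the carrier phrase.

```javascript
// Capital of each country in the grammar, copied from the
// LookupCapital() listing above.
var CAPITALS = {
    "Austria": "Vienna",        "Belgium": "Brussels",
    "Cyprus": "Nicosia",        "Czech Republic": "Prague",
    "Denmark": "Copenhagen",    "Estonia": "Tallinn",
    "Finland": "Helsinki",      "France": "Paris",
    "Germany": "Berlin",        "Greece": "Athens",
    "Hungary": "Budapest",      "Ireland": "Dublin",
    "Italy": "Rome",            "Latvia": "Riga",
    "Lithuania": "Vilnius",     "Luxembourg": "Luxembourg",
    "Malta": "Valletta",        "Poland": "Warsaw",
    "Portugal": "Lisbon",       "Slovakia": "Bratislava",
    "Slovenia": "Ljubljana",    "Spain": "Madrid",
    "Sweden": "Stockholm",      "The Netherlands": "Amsterdam",
    "United Kingdom": "London"
};

// Matches the optional carrier phrase at the start of the text,
// ignoring letter case and tolerating extra surrounding spaces.
var CARRIER_PHRASE = /^\s*what is the capital of\s+/i;

// Hypothetical helper: remove the carrier phrase if it is present.
function stripCarrierPhrase(text) {
    return String(text).replace(CARRIER_PHRASE, "");
}

// Hypothetical helper: return the capital, or "Unknown" for any name
// that is not a key of the table (including inherited property names).
function lookupCapitalFromTable(country) {
    if (Object.prototype.hasOwnProperty.call(CAPITALS, country)) {
        return CAPITALS[country];
    }
    return "Unknown";
}
```

With these helpers, FindCapital() would pass the recognised text through stripCarrierPhrase() and then hand the result to lookupCapitalFromTable(). Adding a country then means adding one line to the table, alongside the matching 'item' in the grammar.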

This is how it looks:

An exercise for you

Adapt the European Capitals script so that it speaks the answer!

Next: the <PROMPT> element in detail.

A Tutorial Introduction to SALT © 2005 Mark Huckvale, Phonetics and Linguistics, University College London
