A Tutorial Introduction to SALT

1. Overview

1.1 What is SALT?

The SALT Forum, an industry collaboration, describes SALT as follows:

"The Speech Application Language Tags (SALT) 1.0 specification enables multimodal and telephony-enabled access to information, applications, and Web services from PCs, telephones, tablet PCs, and wireless personal digital assistants (PDAs). The Speech Application Language Tags extend existing mark-up languages such as HTML, XHTML, and XML. Multimodal access will enable users to interact with an application in a variety of ways: they will be able to input data using speech, a keyboard, keypad, mouse and/or stylus, and produce data as synthesized speech, audio, plain text, motion video, and/or graphics. Each of these modes will be able to be used independently or concurrently."

In other words, SALT is a kind of programming language for constructing multimodal applications. That is, it allows software designers to build applications that take keyboard, mouse and spoken input, and that can respond using text, graphics, audio and spoken output. An article by K. Wang of Microsoft describes the design aims of SALT. See Frequently Asked Questions about SALT for more background.

SALT is unusual in that it builds on and integrates with many existing web technologies rather than being a separate and independent software system. In particular, SALT is implemented as XML tags that can be embedded in web pages. These tags expose speech I/O objects that can be manipulated using server-side or client-side web scripting languages. The architecture picture below (from the SALT Forum) shows some of the components:

In this diagram, note how SALT applications can be delivered from a web server to a range of end-user devices, from telephones to PDAs to PCs. On a phone, an application might appear as voice only, while on a PC it might take a visual form and accept a mixture of keyboard, mouse and speech input. The speech synthesis and recognition might run on the local client or on the remote server.

SALT is just the specification for the mark-up language with which applications are built. Members of the SALT Forum aim to construct operational software systems that use this formalism. A list of implementations is available from the SALT Forum. Microsoft's Speech Group is the leading player in the consortium, and it has released a SALT implementation that we will use in this tutorial to create a speech-enabled web browser on a Windows PC.

1.2 How SALT works

SALT is a set of mark-up tags that can be mixed with HTML and XML tags to create web pages and applications. SALT applications run either on a speech server, where, for example, a telephony interface can be used to build interactive telephone enquiry services, or directly on client machines, where client-side scripting languages can be used to build interactive web pages.

This tutorial is centred on client-side scripting using Microsoft Internet Explorer (IE) and an ActiveX object (plug-in) called the Microsoft Speech Add-in for Internet Explorer. This add-in is a large set of software components that allow IE to produce speech output through text-to-speech and to recognise speech using simple word-based grammars. Speech output is played through the computer's sound card, and speech input is captured through the computer's microphone input.

We control the Speech Add-in by writing web pages that are displayed in IE. These web pages contain three kinds of content: normal HTML mark-up, which controls the look of the page; SALT mark-up, which configures the speech synthesis and recognition system; and JavaScript functions, which tie the speech I/O to the text and graphics on the page. Clicking interface elements on the page can cause speech output to be produced, and spoken input can be translated into changes in page content. Speech I/O can be combined with keyboard and mouse input, as well as with graphical and audio output.
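
To make the three ingredients concrete, here is a minimal sketch of how such a page might be laid out. It is an illustration under stated assumptions rather than a tested page: the element ids (hello, askCity, txtCity), the function names and the grammar file cities.grxml are hypothetical, and the add-in-specific declarations that bind the salt: prefix to the Speech Add-in are deliberately left out. The prompt and listen elements, their Start() methods, the onreco event and the text property of listen are taken from the SALT 1.0 specification.

```html
<html xmlns:salt="http://www.saltforum.org/2002/SALT">
<!-- Assumption: the declarations that load the Speech Add-in and bind it
     to the salt: prefix are omitted from this sketch. -->
<body>

  <!-- HTML mark-up: the visible part of the page -->
  <input type="button" value="Say hello" onclick="sayHello()">
  <input type="button" value="Choose a city" onclick="askCity.Start()">
  <input type="text" id="txtCity">

  <!-- SALT mark-up: speech output via text-to-speech -->
  <salt:prompt id="hello">
    Hello, and welcome to the SALT tutorial.
  </salt:prompt>

  <!-- SALT mark-up: speech input constrained by a grammar.
       cities.grxml is a hypothetical grammar file listing city names. -->
  <salt:listen id="askCity" onreco="cityHeard()">
    <salt:grammar src="cities.grxml" />
  </salt:listen>

  <!-- JavaScript: ties the speech I/O to the page content -->
  <script type="text/javascript">
    // Called when the "Say hello" button is clicked: speak the prompt.
    function sayHello() {
      hello.Start();
    }

    // Called when the recogniser matches the grammar: copy the
    // recognised words into the text box.
    function cityHeard() {
      document.getElementById("txtCity").value = askCity.text;
    }
  </script>

</body>
</html>
```

In this sketch, clicking the first button produces speech output, while clicking the second starts recognition, and a successful recognition changes the page content by filling in the text box.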

1.3 This tutorial

This tutorial is designed to get you started with SALT and client-side scripting of the Microsoft Speech Add-in for Internet Explorer. The next page shows you how to download and install the add-in. Subsequent pages take you through simple speech output and speech input, describe the capabilities of the SALT tags in more detail, and then demonstrate some more advanced applications.

The tutorial does not look at server-side operation for telephony applications.

You will need some basic knowledge of HTML and JavaScript to understand the tutorial. However, you do not need to own Microsoft Visual Studio .NET, you do not need to run a web server, and you do not need expertise in ASP programming or detailed knowledge of speech technology.

Next: software installation.

A Tutorial Introduction to SALT © 2005 Mark Huckvale, Phonetics and Linguistics, University College London
