SFS Manual for Users

5. Speech Measurement Language

The Speech Measurement Language (SML) provides a programming environment for measuring and manipulating SFS files. However, it can also be considered a programming language in its own right. The technical specification of SML is given in its manual page: SML(SFS1). This chapter gives a step-by-step introduction to the language suitable for users with some programming knowledge (of, for example, BASIC).

5.1 Basics

Format of SML Programs

SML is an interpreted programming language with a syntax loosely modelled on the 'C' language. The following SML program simply says hello:

/* hello -- 'hello world' program in SML */
init {
  print "Hello World!\n"
}

The text between '/*' and '*/' is a comment that is ignored by the interpreter. The procedure 'init' defines a section of code that is executed immediately the program is run - we will meet the two other procedures of SML in section 5.6. The limits of the init procedure are defined by the matching braces '{}'. The init procedure contains a single statement, namely 'print' which has as its action the printing of text onto the console. The print statement has a single argument, namely a text string containing the message to be printed. The text string is bounded by double quotes "..", and contains normal text characters and a special command for printing a newline character: '\n'.

To create this program it is necessary to use a text-editor to take text from the keyboard and place it in a source file, say 'hello.sml'. To execute this program it is necessary to pass the source file name to the SML interpreter sml:

% sml hello.sml
Hello World!
%

The layout of sml source statements is fairly unconstrained within a line - extra space and tab characters are ignored. Comments may appear anywhere white space can occur. Each SML statement must occur on its own line since there is no statement separator symbol in SML (unlike 'C' and Pascal which use ';'). SML statements can extend over more than one line if each statement line to be continued ends in a backslash character: '\'.

Printing Strings

The 'print' statement can take a list of arguments separated by commas, thus the print statement above could have been written:

print "Hello"," ","World!","\n"

and the output would have been identical.

As well as "\n" representing the newline character, SML has other special codes that may be embedded in a text string:

"\n" newline
"\b" backspace
"\r" carriage return
"\f" form feed
"\t" tab
"\\" backslash
"\"" double-quotes

String Expressions

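SML also provides simple mechanisms for cutting up strings and joining them together. String expressions of the type:

<string>:<length>

allow the cutting up of the string from the left (positive length) or the right (negative length); a string shorter than the requested length is padded with spaces: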

"abcdef":4 -> "abcd"
"abcdef":-4 -> "cdef"
"abcdef":7 -> "abcdef "
"abcdef":-7 -> " abcdef"

String expressions of the type:

<string>:<start>:<end>

allow the selection of sub-sequences of the string:

"abcdef":2:4 -> "bcd"
"abcdef":4:8 -> "def "

String expressions of the type:

<string> ++ <string>

allow the concatenation of strings:

"Hello " ++ "World!" -> "Hello World!"

These string operators can be combined into more complex expressions with the use of parentheses '()':

("abc" ++ "def"):3:5 -> "cde"

Note that the special codes ("\n" etc) occupy only a single character position in a string.
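
For readers who know other languages, the cutting operators can be modelled in a few lines. The following Python sketch is not SML: the helper names cut and sub are invented for illustration, and it reproduces only the behaviour shown in the examples above (space padding when the requested length exceeds the string, and 1-based inclusive positions for sub-sequences that lie inside the string).

```python
def cut(s, n):
    """Model of <string>:<n>: keep n characters from the left (n > 0)
    or from the right (n < 0), padding with spaces if s is too short."""
    if n >= 0:
        return s[:n].ljust(n)
    n = -n
    return s[-n:].rjust(n)


def sub(s, start, end):
    """Model of <string>:<start>:<end> for positions inside the string.
    Positions are 1-based and inclusive, as in the examples above."""
    return s[start - 1:end]


print(cut("abcdef", 4))           # abcd
print(cut("abcdef", -4))          # cdef
print(repr(cut("abcdef", 7)))     # 'abcdef '
print(repr(cut("abcdef", -7)))    # ' abcdef'
print(sub("abc" + "def", 3, 5))   # cde
```

Sub-sequences that run past the end of the string are deliberately left out of this model.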

5.2 Arithmetic

Numbers

SML has provision for performing accurate arithmetic. Numbers can be used in a program in any of the following formats:

1, -3, 1000000 integer-form
1.23, -123456.7 real-form
1E10, 3.5E-6 exponent-form

The following operators may be used to generate arithmetic expressions:

+ addition
- subtraction
* multiplication
/ division
% modulus (remainder after division)

Parentheses may be used to alter the order of evaluation as this example program demonstrates:

/* arith -- demonstrate simple arithmetic */
init {
  print "2+2=", 2+2, "\n"
  print "2*3+4=", 2*3+4, "\n"
  print "2*(3+4)=", 2*(3+4), "\n"
  print "small number =", 1.6E-10, "\n"
  print "large number =", 3.7E12, "\n"
}
% sml arith.sml
2+2= 4
2*3+4= 10
2*(3+4)= 14
small number = 1.6E-10
large number = 3.7E12

Number Formatting

The arguments to the print statement can be arithmetic expressions, as one can see. The format of a printed number is ordinarily 10 characters wide with 4 digits after the decimal point, although this format is changed for very big and very small numbers. The format can be controlled by the programmer using the following expressions:

<number>:<field>:<precision>
<number>:<field>

In the first expression, the number is printed in a field of characters of given width, with a given precision indicating the number of digits to print after the decimal point. The second expression prints the number as an integer, with no decimal point, in a field of given width. Thus:

2.3:6:2 -> "  2.30"
2.3:6 -> "     2"

Numerical values are held to approximately 10 figures of accuracy. If a number is too large to print in a given field width, the width is automatically extended. Thus to print an integer with no leading or trailing spaces it is sufficient to use a field width of 1:

234:1 -> "234"
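
The two formatting expressions behave much like fixed-width formatting in other languages. The Python sketch below is an analogy only, not SML: the function name fmt is invented, and truncating to an integer in the two-part form is an assumption, since the text does not say whether SML truncates or rounds.

```python
def fmt(number, field, precision=None):
    """Model of <number>:<field>:<precision> and <number>:<field>."""
    if precision is None:
        # Integer form: no decimal point. Truncation is an assumption.
        return f"{int(number):{field}d}"
    # Fixed number of digits after the decimal point, right-justified.
    return f"{number:{field}.{precision}f}"


print(repr(fmt(2.3, 6, 2)))  # '  2.30'
print(repr(fmt(2.3, 6)))     # '     2'
print(repr(fmt(234, 1)))     # '234'
```

As in SML, the field width here is a minimum: a number too large for the field simply extends it.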

Arithmetic Functions

SML contains a number of built-in functions for extended arithmetic:

abs(x) returns the absolute value of number x

acos(x) returns arc-cosine in radians of x.

asin(x) returns arc-sine in radians of x.

atan(x) returns arc-tangent in radians of x.

atan2(x,y) returns arc-tangent in radians of ratio x/y.

cos(x) returns cosine of x measured in radians.

exp(x) returns exponential of number x

log(x) returns natural logarithm of number x

log10(x) returns logarithm base 10 of x.

pow(base,x) returns the value of base taken to the power x.

random(x) returns a random number in a uniform distribution greater than or equal to zero and less than x.

sin(x) returns sine of x measured in radians.

sqrt(x) returns the square-root of number x

srandom(seed) resets the random number generator with the supplied seed value. A seed of 0 uses the current time as a seed.

tan(x) returns tangent of x measured in radians.

trunc(x) returns the integer part of number x

For example:

/* exp -- demonstrate arithmetic functions */
init {
  print "log(15)=",log(15),"\n"
  print "exp(log(15))=",exp(log(15)),"\n"
}
% sml exp.sml
log(15)= 2.7081
exp(log(15))= 15

ERROR value

Certain arithmetic requests can be syntactically correct, but meaningless. For example, what should be the result of dividing a number by 0? Or of taking the square root of -1? SML has an unusual method for dealing with these issues that provides the user with error recovery. The result of dividing a number by zero or of square-rooting a negative number is a special numerical value signifying ERROR. As we shall see in later sections, it is possible to test an arithmetic expression to see if it is equal to ERROR, and take appropriate action.

The ERROR value propagates through expressions, so that any arithmetic expression that has a single component equal to ERROR will evaluate as a whole to ERROR. Thus:

/* error -- demonstrate ERROR value */
init {
  print "10*sqrt(-1)=",10*sqrt(-1),"\n"
  print "3/0=",3/0,"\n"
}
% sml error.sml
10*sqrt(-1)= E=R=R=O=R
3/0=sml: division by zero near line 4
 E=R=R=O=R

The print statement responds to a request to print the ERROR value by printing the string "E=R=R=O=R". In the second print statement above, the division by zero causes an expression of value ERROR to be printed, but for this type of error sml also prints a run-time error message.
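
The way ERROR spreads through an expression resembles the behaviour of a floating-point "not a number" value. The Python sketch below uses that resemblance as a model; it is not how sml is known to be implemented, and the names sml_sqrt, sml_div and is_error are invented for illustration.

```python
import math

ERROR = float("nan")  # stand-in for SML's ERROR value (an assumption)


def sml_sqrt(x):
    """Square root that yields ERROR instead of raising an exception."""
    return math.sqrt(x) if x >= 0 else ERROR


def sml_div(a, b):
    """Division that yields ERROR when the divisor is zero."""
    return a / b if b != 0 else ERROR


def is_error(x):
    """True if x is the ERROR stand-in."""
    return math.isnan(x)


print(is_error(10 * sml_sqrt(-1)))  # True: ERROR propagates through *
print(is_error(sml_div(3, 0) + 1))  # True: and through +
print(is_error(10 * sml_sqrt(4)))   # False: an ordinary result
```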

5.3 Input/Variables

Variables

SML has five types of variables that can be used to hold data values inside an SML program. These types are called: var, string, stat, file, and item. The stat type variable will be introduced in section 5.8, the file variable in section 5.9, and the item variable in section 5.12. Before variables can be used, they must be declared. Variables may be declared inside a procedure, in which case they may only be used inside the procedure, or they may be declared outside a procedure, in which case they are available to all procedures. Variables of the first kind are called local, and those of the second kind global. We shall examine the difference in section 5.6.

Variable declarations are made by listing their names after a declaration of their type. Variables for holding arithmetic values are declared with the statement:

var <variable>(,<variable>)

e.g.

var value, number_of_annotations, i

Variables for holding text strings are declared with the statement:

string <variable>(,<variable>)

e.g.

string Name, SerialNo

The conventions for naming a variable are that the first character of the variable name must be alphabetic and that subsequent characters can be alphabetic, numeric, the dollar sign: $, or the underline: _. The case of alphabetic letters is important (i.e. 'Name' is different from 'name'), and the maximum name length is 80 characters. All variables are initialised: numerical variables to 0.0, and string variables to "".

Variables can be given values in three ways: by the assignment statement, by the input statement, or by a call to a function. The last will be described in section 5.10.

Assignment

The assignment statement is of the form:

<var type variable> = <arithmetic expression>

<string type variable> = <string expression>

Here is a program to demonstrate:

/* assign -- demonstrate assignment statement */
init {
  var aval
  string  sval
  aval = 10 * (3 + sqrt(2))
  sval = "good" ++ "bye"
  print "aval=",aval:10:6," sval=",sval,"\n"
}
% sml assign.sml
aval= 44.142136 sval=goodbye
%

String-type and var-type variables may be placed in the argument list of a print statement in the same way as string expressions and arithmetic expressions. In general, string-type variables can be used anywhere a string may be used, and var-type variables can be used anywhere a number may be used (the only exception to this is the switch statement described in section 5.5).

Input

The input statement accepts input from the console and decodes it into values that are placed in the variables comprising its argument list. Thus the statement:

input val1,val2

where 'val1' and 'val2' are var-type variables, will cause the program to wait for the user to type a line of text at the console and then decode the text into two numbers which are then assigned to the two values. The decoding process is very simple and expects the user to type the correct format: in this case two numbers separated by spaces. If the input statement cannot decode the input text properly, then one or more of its arguments may be set to ERROR. String variables can also be assigned using the input statement, but once again, space characters are used to delimit individual assignments (to input an entire line, space characters and all, use the statement inputline described in section 5.9).

Here is an example of input:

/* input -- demonstrate input statement */
init {
  var val1,val2
  string firstname,lastname
  print "Type two numbers:"
  input val1,val2
  print "The sum is ",val1+val2,"\n"
  print "Type your name:"
  input firstname,lastname
  print "Hello ",firstname,", I mean Dr. ",lastname,"!\n"
}
% sml input.sml
Type two numbers:12 13
The sum is 25
Type your name:Mark Huckvale
Hello Mark, I mean Dr. Huckvale!
%

The print statement can be used to prompt the user for text in the required format. Since the print statement did not end in a "\n", the input text was taken from the same line on the console as the prompt. To show what can go wrong, however, consider:

% sml input.sml
Type two numbers:Hello program
The sum is E=R=R=O=R
Type your name:Mark A. Huckvale
Hello Mark, I mean Dr. A.!
%
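
The decoding described above can be pictured as splitting the typed line at white space and converting each field in turn. The Python sketch below is a model only, not the sml implementation: the names decode_numbers and decode_strings are invented, None stands in for ERROR, and returning an empty string for a missing word is an assumption.

```python
ERROR = None  # stand-in for SML's ERROR value


def decode_numbers(line, count):
    """Split a typed line on white space and decode `count` numbers.
    Fields that are missing or not numeric become ERROR."""
    fields = line.split()
    values = []
    for i in range(count):
        try:
            values.append(float(fields[i]))
        except (IndexError, ValueError):
            values.append(ERROR)
    return values


def decode_strings(line, count):
    """Split a typed line on white space and return the first `count`
    words; extra words on the line are ignored."""
    fields = line.split()
    return [fields[i] if i < len(fields) else "" for i in range(count)]


print(decode_numbers("12 13", 2))               # [12.0, 13.0]
print(decode_numbers("Hello program", 2))       # [None, None]
print(decode_strings("Mark A. Huckvale", 2))    # ['Mark', 'A.']
```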

At this point we clearly need some mechanism for testing expressions inside the program so that we can take alternative courses of action. For this we require the if statement.

If

The if statement has the general form:

if (<conditional expression>) {
  <statements if expression was true>
}
else {
  <statements if expression was false>
}

A conditional expression is a number of comparison operations combined with the logical operators && (and), || (or), and ! (not). The comparison operations are only defined for arithmetic expressions, and they are == (equal), != (not equal), < (less than), > (greater than), <= (less or equal), >= (greater or equal). We shall see how conditional expressions are created in the following examples.

To test a simple variable, we might use:

if (year >= 0) {
  age = "A.D."
}
else {
  year = -year
  age = "B.C."
}

Here the var-type variable year is compared with 0, and the assignment statements are chosen according to whether the comparison was true or false (age is a string-type variable). Simple comparisons can be combined as in:

if ((val > 0) && ((val%2)==0)) {
  print "Positive, even number\n"
}
else {
  print "Not a positive even number\n"
}

It is good practice to enclose each comparison in its own set of parentheses before combining them with the logical conjunctions.

If layout

The general form of the if statement can be simplified under certain circumstances. If the else portion of the statement is not required, it may be left out:

if (invoice_total > 0) {
  print "The amount of",invoice_total," is now due\n"
}

If the set of statements to be executed consists of only a single statement, the braces may be dropped and the statement placed on the same line as the 'if' or the 'else'. Note, however, that the 'else' must also be written on the same line as the 'if' when no braces are used for the first set of statements. Thus, the following are all equivalent:

if (i > 10) {
  j=1
}
else {
  j=2
}

if (i > 10) {
  j=1
}
else j=2

if (i > 10) j=1 else {
  j=2
}

if (i > 10) j=1 else j=2

But the following is an error:

if (i > 10) j=1
else j=2

SML follows the 'C' conventions for attaching trailing 'else' statements to preceding 'if's, namely to attach it to the last 'if' that has no 'else'. Thus in the following code:

if (i >= 0) if (i > 5) print "A" else print "B"

nothing would be printed if i was negative.

ERROR Conditions

The evaluation of a conditional expression has peculiar characteristics when one of its components is equal to ERROR. ERROR values propagate through conditional expressions, ensuring that the whole expression is equal to ERROR. Under these circumstances, neither the TRUE block of statements nor the FALSE block is executed. That is, a conditional expression value of ERROR is treated differently from a value of FALSE. Care must be taken that comparisons in a conditional expression are not made on ERROR values. It is possible to test for the ERROR condition by using the value itself as a conditional expression. Thus:

if (val) print "ok" else print "error"

would print "ok" if val held a normal number, and "error" if it was set to ERROR. Compare this with:

if (val < 0) print "negative" else print "positive"

which would print neither "positive" nor "negative" if val was set to ERROR.
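
This three-way outcome (true, false, or ERROR) can be modelled as follows. The Python sketch is illustrative only and is not SML: None stands in for ERROR, and the names less_than, sml_if, is_ok and sign are invented.

```python
ERROR = None  # stand-in for SML's ERROR value


def less_than(a, b):
    """Comparison that yields ERROR if either operand is ERROR."""
    if a is ERROR or b is ERROR:
        return ERROR
    return a < b


def sml_if(condition, if_true, if_false):
    """Run one branch, or neither when the condition is ERROR."""
    if condition is ERROR:
        return None
    return if_true() if condition else if_false()


def is_ok(val):
    """Model of using the value itself as the condition: a test for ERROR."""
    return val is not ERROR


def sign(val):
    """Model of: if (val < 0) print "negative" else print "positive"."""
    return sml_if(less_than(val, 0), lambda: "negative", lambda: "positive")


print(sign(-2))      # negative
print(sign(3))       # positive
print(sign(ERROR))   # None: neither branch was executed
```

A program that must always take one branch or the other would test the value with something like is_ok first, and only then make the comparison.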

5.4 Arrays/Loops

Variable Arrays

The var and string statements used for declaring simple variables can also be used to declare one- or two-dimensional arrays of variables. The form of an array declaration is one of:

<type name> <variable name> [ <end index> ]
<type name> <variable name> [ <start index> : <end index> ]
<type name> <variable name> [ <numrows> , <numcols> ]

For example:

var primes[1000]
var frequencies[0:5000]
var table[10,20]
string names[1:5]
var year[1800:1988]

If no start index is specified then it defaults to 1. Two dimensional arrays are always indexed from 1.

Individual elements of the arrays are accessed by indexing the array name with an arithmetic expression that must evaluate to within the declared range (inclusive). For example "primes[100]" or "table[3,4]". An index outside the declared range causes a run-time error and returns the contents of the first element of the array. Thus using the examples above, the statement:

print "year[1988]=",year[1988]," year[10]=",year[10],"\n"

might print:

year[1988]= 0 year[10]=sml: subscript out of range error near line 6
0

Within a program, the lowest available index of an array can be found using the function lobound(). Similarly, the highest index can be found using the function hibound(). Thus, using the declaration above:

lobound(names) == 1
hibound(names) == 5

For two dimensional arrays, lobound() returns 1, while hibound() returns numrows*numcols. Two dimensional arrays can also be accessed as if they were one-dimensional and arranged across columns and then down rows.
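
One reading of "across columns and then down rows" is that the elements of each row are stored next to one another, one row after the next. Under that assumption, which the text does not spell out, the one-dimensional index of an element can be computed as in this Python sketch (not SML; the name flat_index is invented):

```python
def flat_index(row, col, numcols):
    """1-based one-dimensional index of element [row, col], assuming
    elements are stored across the columns of a row, then down the rows."""
    return (row - 1) * numcols + col


# For the declaration table[10,20] used above:
print(flat_index(1, 1, 20))    # 1, the value of lobound(table)
print(flat_index(3, 4, 20))    # 44
print(flat_index(10, 20, 20))  # 200, i.e. numrows*numcols
```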

While

The SML looping constructs are often used in association with arrays. There are two forms: the while statement and the for statement. The general form of the while statement is:

while ( <conditional expression> ) {
  <statements to be executed while expression is true>
}

For example:

powersoftwo=0
while (val >= 2) {
  val = val / 2
  powersoftwo = powersoftwo + 1
}

In this case, the statements in braces are executed repeatedly until 'val' has a value less than 2. If there is only one statement in the while loop, the braces can be dropped, and the statement placed immediately after the conditional expression:

while ((i < hibound(vector)) && (vector[i] != 0)) i=i+1

For

The form of the for statement is at first sight complicated:

for (<statementA>;<conditional expression>;<statementB>) {
  <statements to be executed while expression is true>
}

For example:

for (i=0;i < 10;i=i+1) {
  print i,i*i,"\n"
}

This prints a table of integers and their squares for the range 0 to 9 inclusive. However, any for statement can be replaced by an equivalent while statement:

<statementA>
while (<conditional expression>) {
  <statements to be executed while expression is true>
  <statementB>
}

Hence the best way to view the for statement is as a convenient shorthand for a common form of while statement. The while equivalent of the for example above would be:

i=0
while (i < 10) {
  print i,i*i,"\n"
  i=i+1
}

The for statement can also be written on a single line if only a single statement is to be executed repeatedly. To total an array one might use:

sum=0
for (i=lobound(array);i<=hibound(array);i=i+1) sum=sum+array[i]

5.5 Switch/Pattern Matching

String Functions

SML contains a number of functions for manipulating strings over and above the string operators for cutting and joining strings.

Function	Description

strlen(str)	Returns the number of characters in the string.

strlen("abc") == 3

ascii(str)	Returns the character code for the first character in the string.

ascii("ABC") == 65
ascii("") == 0

char(expr)	Returns the one-character string for the given ASCII code.

char(65) == "A"
char(0) == ""

val(str)	Returns the numeric value of a string.

val("31.6") == 31.6
val("hello") == ERROR

compare(str,str)	Compares two strings. Returns 0 if the two strings are the same, -1 if the first string is alphabetically earlier than the second, and 1 if the first string is alphabetically later than the second.

/* sort an array of strings */
init {
  string names[0:1000]
  var i,j,num
  /* initialise names[1] to names[num] */
  /* insertion sort */
  for (i=2;i<=num;i=i+1) {
    j=i
    names[0]=names[j]
    while (compare(names[j-1],names[0]) > 0) {
      names[j]=names[j-1]
      j=j-1
    }
    names[j]=names[0]
  }
}

entry(str,str_array)	Searches the array of strings for the first entry that matches the given string exactly. Returns the array index, or ERROR if not found.

index(pattern,target)	Matches a given pattern against a target string (using the string matching language described below), returning the index of the start of the match if found, or ERROR if not.

Pattern Matching

String matching in SML is performed using regular expressions, an unfortunately complicated but powerful technique for comparing a string with a general pattern rather than with a particular character sequence. We shall use the index() function to demonstrate the basic characteristics of regular expressions.

A regular expression is simply a special form of string expression where some characters have special meaning. In particular the characters '.' '*' '+' '[' ']' '^' '$' '(' ')' '{' '}' have special meaning. All of the other characters act as normal - so that the letter 'a' in a regular expression means that the pattern to be matched must contain an 'a'. If you want to use any of the above special characters in a pattern, e.g. a pattern that contains a dollar sign, you must escape the character with a backslash character, '\$'.

Now for some pattern matching rules:

a string of ordinary characters in the pattern will match an exactly equivalent string anywhere along the length of the target:

index("cde","abcdef") == 3
index("hello","Hello Mark") == ERROR

the character '.' matches any single character:

index("c.e","abcdef") == 3

the character '^' matches the beginning of the target:

index("^M","Mark") == 1
index("^M","KLMNOP") == ERROR

the character '$' matches the end of the target:

index("ly$","slowly") == 5
index("ly$","lying") == ERROR

square brackets contain alternative characters at a single position in the target:

index("[ns]o[ru]th","north") == 1
index("[ns]o[ru]th","south") == 1

any number of alternative characters can be placed in the brackets, and for convenience the form [a-e] means [abcde]:

index("[0-9]\.[0-9]","312.65") == 3

definitions of a character match for a single position in the target can be extended to match one or more repetitions with '+' and extended to zero or more repetitions with '*':

index("a+b","aaaaabcdef") == 1
index("a*b","aaaaabcdef") == 1
index("a+b","bcdef") == ERROR
index("a*b","bcdef") == 1
index(".*","anything at all") == 1
index("[0-9]+","3215") == 1

definitions of a character match for a single position in the target can be extended to match between m and n times inclusively with '{m,n}':

index("^.{2,4}$","the") == 1
index("^.{2,4}$","there") == ERROR
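
Readers with access to another regular expression implementation can experiment with the rules above before writing SML. The Python sketch below is not SML: it assumes that Python's re module treats these simple patterns in the same way, defines a stand-in for index() that returns a 1-based position, and uses None in place of ERROR.

```python
import re


def index(pattern, target):
    """Return the 1-based position of the first match, or None for ERROR."""
    m = re.search(pattern, target)
    return m.start() + 1 if m else None


print(index("cde", "abcdef"))             # 3
print(index("hello", "Hello Mark"))       # None
print(index("ly$", "slowly"))             # 5
print(index(r"[0-9]\.[0-9]", "312.65"))   # 3
print(index("a*b", "bcdef"))              # 1
print(index("^.{2,4}$", "there"))         # None
```

Dialects of regular expression differ in their less common features, so agreement on these examples does not guarantee agreement on every pattern.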

Pattern Alternatives

A number of regular expressions can be combined into a single string using the special string concatenation operator, ">>". Such a sequence of patterns is treated as alternative matches - so that if (and only if) the first pattern does not match, the second pattern is used, and so on. To create a text string containing ">>", one can either include the digraph in the string or concatenate sub-strings:

index("a">>"b">>"c","lunch") == 4

index("a>>b>>c","lunch") == 4

index("a>>b">>"c","lunch") == 4

index("[Mm]ark">>"[Jj]ill","jill") == 1
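
The order in which alternatives are tried matters: a later pattern is consulted only when every earlier one has failed. The Python sketch below models this rule; it is not SML, the name index_alt is invented, None stands in for ERROR, and Python's re module is assumed to behave like SML's matcher for these simple patterns.

```python
import re


def index_alt(patterns, target):
    """Try each ">>"-separated pattern in turn; the first one that
    matches decides the result. Returns a 1-based position or None."""
    for pattern in patterns.split(">>"):
        m = re.search(pattern, target)
        if m:
            return m.start() + 1
    return None


print(index_alt("a>>b>>c", "lunch"))          # 4
print(index_alt("[Mm]ark>>[Jj]ill", "jill"))  # 1
print(index_alt("c>>l", "lunch"))             # 4: "c" is tried before "l"
```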

Switch

A range of actions can be selected according to the contents of a variable using a chain of if statements:

if (val==1) {
  /* do first action */
}
else if ((val>=2) && (val<10)) {
  /* do different action */
}
else {
  /* default action */
}

However, a more convenient method is to use the switch statement. The switch statement takes either an arithmetic expression or a string expression and executes one block of statements as a function of the value of the expression. The general form of the switch statement is:

switch (<expression>) {
  <case definition>: {
    <statements to be executed if expression fits case>
  }
  <case definition>: {
    < etc >
  }
}

The allowed forms of the case definitions depend on whether an arithmetic or string expression is used in the switch. For arithmetic expressions, the allowed forms are:

case <number>
range <number>:<number>
default

The case form introduces a single numerical value, the range form introduces a range of values (inclusive lower bound, exclusive upper bound), and the default form introduces the statements to be executed if no other match is found. For string expressions, the allowed forms are:

case <string constant>
pattern <string constant>
default

The case form introduces a single string value, the pattern form introduces a regular expression match, and the default form introduces the statements to be executed if no other match is found. Hence the if chain example above might have been written:

switch (val) {
  case 1: /* do first action */
  range 2:10: /* do different action */
  default: /* do default action */
}
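
The point most easily overlooked here is that the upper bound of a range is exclusive. The Python sketch below restates the same three cases; it is not SML, and the function name classify and the returned labels are invented for illustration.

```python
def classify(val):
    """Python rendering of: case 1 / range 2:10 / default."""
    if val == 1:
        return "first action"
    if 2 <= val < 10:  # range 2:10 excludes the upper bound
        return "different action"
    return "default action"


print(classify(1))    # first action
print(classify(9.5))  # different action
print(classify(10))   # default action
```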

An example based on a string expression:

switch (reply) {
  case "Hello": {
    print "Hello\n"
  }
  pattern "[Gg]oodbye":\
    print "Goodbye\n"
  default:\
    print "What ?\n"
}

Notes on the switch statement: (i) cases are searched in the order specified, so that the default case must come last, (ii) there is currently no mechanism for having more than one case label for a single set of statements, (iii) the values in the case definitions must be constants.

5.6 Procedures

Procedural Structure

In this section we will introduce the major procedural devices for processing SFS data files within SML. The original design for SML foresaw a report generation language in which it would be possible to write a specification for a set of measurements to be made on a number of SFS data files and to print the results. The structure of this specification was:

Initialise control variables.
For each file to be processed:
  make measurements on a file.
Print summary of results.

This report-writing structure is reflected in the procedural structure of an SML program:

init {
  /* initialise control variables */
}
main {
  /* make measurements on a file */
}
summary {
  /* print summary of results */
}

An SML program can contain a maximum of three procedures, called 'init', 'main' and 'summary'. The 'init' procedure is executed once, before any other processing is done. The 'main' procedure is executed once for each SFS data file supplied to the SML interpreter sml. The 'summary' procedure is executed once after all the files have been processed with 'main', and immediately before the program terminates. All of the procedures are optional, but at least one must be present for the program to do anything!
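
The order of execution just described can be summarised as a small driver loop. The Python sketch below is a model of that description only, not of how sml is implemented; the function run and its parameters are invented for illustration.

```python
def run(files, init=None, main=None, summary=None):
    """Call init once, main once per file, then summary once.
    Any of the three procedures may be omitted."""
    if init:
        init()
    for count, name in enumerate(files, start=1):
        if main:
            main(name, count)
    if summary:
        summary(len(files))


log = []
run(["data1", "data2"],
    init=lambda: log.append("init"),
    main=lambda name, count: log.append(f"main {name}"),
    summary=lambda total: log.append(f"summary {total}"))
print(log)  # ['init', 'main data1', 'main data2', 'summary 2']
```

With an empty list of files, init and summary still run but main is never called.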

Processing Data Files

The names of the SFS data files to be processed are passed on the command line, either as individual files or as the name of directories to be searched. Consider this program:

/* files -- demonstrate execution of SML procedures */
init {
  print "Doing Initialisation.\n"
}
main {
  print "Processing '",$filename,"'.\n"
}
summary {
  print "All done. Processed ",$filecount:1," files.\n"
}
% sml files.sml data1 data2 data3
Doing Initialisation.
Processing 'data1'.
Processing 'data2'.
Processing 'data3'.
All done. Processed 3 files.
%

But note:

% sml files.sml
Doing Initialisation.
All done. Processed 0 files.
%

Built-in Variables

Introduced in this program are the variables '$filename' and '$filecount'. These contain the name of the current file and the number of the current file respectively. They are two of a number of built-in variables that are automatically updated as the program executes. The full set is:

var $filecount	counts files examined
string $filename	holds current filename
string $date	holds current date and time
string $head_username	owner of current file
string $head_source	source of file
string $head_dbase	name of database
string $head_speaker	name of speaker
string $head_session	description of recording session
string $head_sessdate	date of recording session
string $head_token	name of token
string $head_rep	repetition of token

The '$head_..' variables are read from the main header of the current file (and are only as reliable as the entries in the main header).

In this example, a database is searched to find files that do not have the main header 'speaker' field initialised, and also to print a list of all speakers:

/* speaker -- find speakers */
/* global data */
string names[1:100]
var    namecount
/* main processing of files */
main {
  if (compare($head_speaker,"")==0) {
    /* speaker field not initialised */
    print $filename,": speaker name not set\n"
  }
  else {
    /* add speaker to list */
    if (entry($head_speaker,names)) {
      /* already in list - do nothing */
    }
    else {
      /* add name to list */
      namecount=namecount+1
      names[namecount]=$head_speaker
    }
  }
}
/* summary processing */
summary {
  var i
  /* print table of speakers found */
  print "Speakers found:\n"
  for (i=1;i<=namecount;i=i+1) print names[i],"\n"
}

% sml speaker.sml /speech/database
/speech/database/digits/one.2: speaker name not set
/speech/database/sentences/vase.1: speaker name not set
Speakers found:
mb
jh
ea
jw
%

In this example, the name of a directory containing SFS data files is passed to the SML interpreter rather than the names of the files. When sml sees a directory name, it recursively searches the directory and any sub-directories for SFS files.

Variable Scope

The previous example demonstrates the concept of global and local variables. The variables 'names' and 'namecount' had to be available for use in both the 'main' and the 'summary' procedures. To achieve this, their declaration was made outside and before the main and summary procedures. That is, their scope was global to all procedures. The variable i, however, is used only in the summary procedure: it was declared within the summary procedure and cannot be accessed by any other procedure. Its scope is said to be local.

Local variables in different procedures can have the same name without interfering with each other. A local variable can have the same name as a global variable, but access to the global variable is lost within the procedure. Variables can have the same names as built-in variables or functions, but access to these will be lost locally or globally as appropriate.

Escapes

The SML abort statement causes the program to terminate with a given message when it is executed. It can be used to halt execution when a non-recoverable error is found, for example:

summary {
  if ($filecount==0) abort("no data files supplied\n")
  /* ... summary processing */
}

The SML statement break is less severe than abort: it simply causes execution within a loop to be terminated or, if not in a loop, causes the current procedure to be terminated. In the case of the 'init' and 'summary' procedures, a break outside a loop stops program execution; within the main procedure, a break outside a loop causes main to skip to the next file to be processed:

/* break -- demonstrate 'break' statement */
var namecount
string names[1:100]
init {
  string name
  /* get table of speakers */
  print "Input speaker name:"
  input name
  while (compare(name,name)) { /* loop until ERROR */
    if (compare(name,"")==0) break /* loop until NULL */
    namecount=namecount+1
    names[namecount]=name
    print "Input speaker name:"
    input name
  }
  if (namecount==0) break /* no names entered */
}
main {
  /* check speaker against table */
  if (!entry($head_speaker,names)) break
  print $filename," is not on list\n"
}
% sml break.sml /speech/database
Input speaker name:jb
Input speaker name:fred
Input speaker name:
/speech/database/digits/one.jb is not on list
%

5.7 Annotations

Annotation Functions

SML has a number of built-in functions that are able to extract information from annotation data sets stored in SFS files. These functions are a subset of the SML functions for making measurements on data sets which will be listed in the next section.

The functions may be used within the 'main' procedure of an SML program. They take parameters, such as a pattern to match against an annotation string, then locate an annotation data set in the current file, and return some values, such as a location in time.

The annotation functions are:

time(s)	Takes a pattern match string, s, and returns the time at which the first matching annotation is located.

print "/t/ located at ",time("t")," seconds\n"

length(s)	Takes a pattern match string, s, and returns the duration of the first matching annotation.

print "duration of /a/ vowel =",length("a")," seconds\n"

match(s)	Takes a pattern match string, s, and returns the full text of the first matching annotation.

print "first annotation is ",match("."),"\n"

numberof(s)	Takes a pattern match string, s, and returns the number of annotations in the whole data set that match.

print "number of vowels =", numberof("[aeiou]"),"\n"

timen(s,n)	Takes a pattern match string, s, and an instance n, and returns the time at which the nth matching annotation is located.

print "1st syllable length =", timen("\$",2)-timen("\$",1),"\n"

lengthn(s,n)	Takes a pattern match string, s, and an instance n, and returns the duration of the nth matching annotation.

print "last stop length =", lengthn("[ptkbdg]",numberof("[ptkbdg]")),"\n"

matchn(s,n)	Takes a pattern match string, s, and an instance n, and returns the full text of the nth matching annotation.

/* list all annotations */
for (i=1;i<=numberof(".");i=i+1) {
  print timen(".",i)," : ",matchn(".",i),"\n"
}

Annotation Matching

The pattern match expressions used by the annotation functions differ slightly from the expressions used by the function 'index' and by the switch 'pattern' statement described in section 5.5.

The annotation functions automatically insert an initial "^" character into all annotation pattern matches. This means that all matches start from the beginning of an annotation. Thus the call 'match("s")' will only return an annotation that begins with 's'. However, if an exact match to an annotation 's' is required, then the pattern must be given as "s$".
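
The effect of the implied "^" can be illustrated outside SML. The following Python sketch is only an analogy: it assumes that SML patterns behave like Python regular expressions for these simple cases, and the helper name annotation_matches and the list of labels are invented for the illustration.

```python
import re

def annotation_matches(pattern, label):
    """Mimic the implied leading '^': the pattern must match
    starting at the first character of the annotation label."""
    return re.match(pattern, label) is not None

# Hypothetical annotation labels
labels = ["s", "st", "sh", "as"]

# "s" matches every label that begins with 's'
print([lab for lab in labels if annotation_matches("s", lab)])

# "s$" matches only the label that is exactly 's'
print([lab for lab in labels if annotation_matches("s$", lab)])
```

Note that 'as' is never matched by "s", because the match may not start part-way through the label.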

There is a simple mechanism for matching a sequence of annotations. The special string concatenation operator: ">>" can be used to string together a number of individual matches. When called with a sequence of matches in this way, the 'time' functions return the time of a matching sequence of annotations, the 'length' functions return the duration of the matching sequence, and the 'match' functions return a concatenation of matched annotations.

/* tactics -- count syllable-initial consonant sequences */
string smatch[0:3]    /* sequence matches */
var scount[0:3]       /* sequence counts */
init {
  string syl,con,vow
  syl = "\$"                        /* match syllable boundary marker */
  con = "[bcdfghjklmnpqrstvwxyz]"   /* match consonants */
  vow = "[aeiou]"                   /* match vowels */
  smatch[0] = syl >> vow
  smatch[1] = syl >> con >> vow
  smatch[2] = syl >> con >> con >> vow
  smatch[3] = syl >> con >> con >> con >> vow
}
main {
  var i
  for (i=0;i<=3;i=i+1) {
    /* get counts for each pattern */
    scount[i] = scount[i] + numberof(smatch[i])
  }
}
summary {
  var i
  for (i=0;i<=3;i=i+1) {
    /* print table of counts */
    print "Size = ",i:1
    print " Number = ",scount[i]:1,"\n"
  }
}

With

Detailed examination of an SFS file with many annotations can be cumbersome if only the annotation matching functions are used. For example, while it is straightforward to count the number of syllables, it is less easy to investigate syllable n further. Consider the task of printing the durations of 'p' annotations in the context of 's _ l'. It is easy to identify the location and duration of the whole consonant sequence:

loc = time("s>>p>>l")

dur = length("s>>p>>l")

but the command

pdur = lengthn("p",n)

can only be used if we know that the 'p' annotation matched is the one that was found between 's' and 'l'. We could establish this by testing its location in time (between 'loc' and 'loc+dur') but this is very messy. Instead, SML provides the with statements.

The with statements provide a procedural mechanism for reducing the range of the file over which annotation matching takes place. Each can be viewed as a window on the file. Within the domain of a with statement, all annotation matching performed using time(), length(), etc. takes place only within the window. The three forms of the with statement allow you to specify the window with an annotation match, the nth match of an annotation, or by specific times:

with (<annotation match>) {
  <statements to be limited>
}
withn (<annotation match>,<count>) {
  <statements to be limited>
}
within (<start time>,<end time>) {
  <statements to be limited>
}

Thus, the 'p' duration measurements could be made with:

/* with -- demonstrate 'with' statement */
string context
string target
init {
  context = "s" >> "." >> "l"
  target = "p"
}
main {
  var i,num,dur
  num = numberof(context)
  for (i=1;i<=num;i=i+1) withn(context,i) {
    dur = length(target)
    if (dur) print "/p/ duration in /spl/ =",dur,"\n"
  }
}
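
The windowing idea behind this program can be sketched outside SML. The Python below is an illustration under stated assumptions, not a description of how the interpreter works: the annotation track is invented, times are held in whole milliseconds, a sequence match is taken to mean consecutive annotations, and an annotation is counted as inside a window if its start time falls within it. The function names are hypothetical.

```python
import re

# Hypothetical annotation track: (start in ms, duration in ms, label)
ANNOTS = [
    (100, 80, "s"), (180, 50, "p"), (230, 70, "l"), (300, 120, "a"),
    (420, 60, "p"), (480, 90, "i"),
    (570, 70, "s"), (640, 40, "p"), (680, 60, "l"), (740, 100, "e"),
]

def sequence_windows(annots, patterns):
    """Return (start, end) for every run of consecutive annotations
    whose labels match the given patterns in order."""
    windows = []
    for i in range(len(annots) - len(patterns) + 1):
        run = annots[i:i + len(patterns)]
        if all(re.match(p, a[2]) for p, a in zip(patterns, run)):
            windows.append((run[0][0], run[-1][0] + run[-1][1]))
    return windows

def durations_within(annots, pattern, window):
    """Durations of annotations matching pattern that start inside window."""
    start, end = window
    return [dur for (t, dur, label) in annots
            if start <= t < end and re.match(pattern, label)]

# Equivalent in spirit to: withn(context,i) { dur = length(target) }
found = []
for window in sequence_windows(ANNOTS, ["s", ".", "l"]):
    found.extend(durations_within(ANNOTS, "p", window))
print(found)
```

The 'p' at 420 ms is not reported because it does not lie inside any 's . l' window, which is the selection that would be awkward to make with lengthn() alone.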

5.8 SFS measurement

Built-in measurement functions

So far, the only measurements we have made on SFS files have been durations based on the location of annotations. Of course SFS files can contain a great deal of other information that we would like to be able to measure: a fundamental frequency at a given time, or the amplitude of the first formant, or the energy at 500Hz.

SML contains a number of built-in functions for obtaining values from data sets stored in SFS files. They all take a time, and some take other parameters, to locate a value in a data set and return it to the program. For the functions to operate, the data sets must be present in the files. That is, SML does not perform any processing on the file; it only retrieves the results of previous processing. Later in this section we shall see how to check files to see if particular data sets are present. In general, the built-in measurement functions simply return ERROR if the required measurement cannot be made.

The built-in measurement functions are:

Name	Description
co(entry,t)	returns the value of the requested parameter number, at given time from CO item. Parameters are numbered sequentially from 0, with 0=position, 1=size, 2=flag, 3=mix, 4=gain. Parameters 5 and onward are the data values.
energy(f,t)	returns energy in dB from coefficient item at frequency f and time t. This function needs parameters minf and maxf to be set in the coefficient item and for frequencies to be allocated linearly
f1(t), a1(t), b1(t)	returns frequency, amplitude and bandwidth of formant 1 from synthesizer control item at time t.
f2(t), a2(t), b2(t)	returns frequency, amplitude and bandwidth of formant 2 from synthesizer control item at time t.
f3(t), a3(t), b3(t)	returns frequency, amplitude and bandwidth of formant 3 from synthesizer control item at time t.
f4(t), a4(t), b4(t)	returns frequency, amplitude and bandwidth of formant 4 from synthesizer control item at time t.
fm(entry,t)	returns the value of the given entry in a raw formant estimates frame at time t. The 'entry' parameter indexes into the record as if it was an array, thus entry==4 would return the 'npeaks' value.
sp(t)	returns the sample value at time t from a speech item.
fx(t)	returns the fundamental frequency in Hz at time t from a fundamental frequency item.
lp(attr,t)	returns the value of the given attribute at time t from a phonetic item.
sy(entry,t)	returns the value of the given entry in a synthesizer control item at time t. The 'entry' parameter indexes into the frame as if it was an array, thus entry==3 would return the 'F1' value.
tr(t)	returns the value of a parameter track item at time t.
tx(t)	returns the excitation period in seconds from an excitation item at time t.
pc(entry,t)	returns the value of the requested parameter number, at given time from PC item. Parameters are numbered sequentially from 0, with 0=position, 1=size, 2=flag, 3=mix, 4=gain. Parameters 5 and onward are the data values.
vu(entry,t)	returns the value of the requested parameter number, at given time from VU item. Parameters are numbered sequentially from 0, with 0=position, 1=size, 2=flag, 3=mix, 4=gain.

In this example, measurements are made of fundamental frequency from TX, FX, LP and SY items:

/* freq -- demonstrate item measurement */
main {
  var t
  t = time("a")   /* locate vowel */
  print "TX data gives ",1.0/tx(t)," Hz\n"
  print "FX data gives ",fx(t),"Hz\n"
  print "LP data gives ",lp("PITCH",t),"Hz\n"
  print "SY data gives ",sy(0,t),"Hz\n"
}

Item Selection

In the same way that SFS processing programs need an item selection switch (-i item) on the command line to identify which of a number of possible input items is the one to be processed, SML requires a mechanism for selecting which of a number of possible items is the one to be measured with the measurement functions above. SML has two mechanisms for item selection: command-line defaults and item selection functions.

When a measurement function such as tx() is called, SML locates the currently selected item in the file. If no item is selected, SML searches the file for the first item that will satisfy the request (here a TX item) and makes that the current selection. The choice of item when there is no current selection can be altered by command-line switches to sml using the same scheme as other SFS programs, namely '-i item'. Thus if the program above was executed with:

% sml freq.sml data1

then the first TX item in the file 'data1' would be chosen as the source of measurements made with tx(). If, however, the command was:

% sml -itx freq.sml data1

then the last TX item in the file would be chosen (see section 2.3). The other formats for item selection are also supported.

The choice of item can also be affected inside a program by calling one of the following item selection functions.

Name	Description
select(item)	makes the specified item the current selection. Returns the number of frames in the item (if found) or ERROR (if not). Note that select() will override the command-line defaults only if a sub-type number is given. That is, given the command line
	% sml -itx.03 freq.sml data1
	select(3.01) will override the command-line choice, but select(3) will not.
selectitem(item)	performs the same function as select(), but returns the item number of the selected data set rather than the number of frames in the data set:
	print "Using annotation item ",selectitem(AN),"\n"
selectmatch(spec)	selects the first item that matches the given specification, returning the item number or ERROR. The format of the specification string is the same as the argument to the '-i' command line switch. selectmatch() always overrides command-line defaults.
	selectmatch("tx^HQtx")
selectnext(item)	selects the next item of the same type after the specified item number. Returns the next item number or ERROR. Always overrides command-line defaults.
	it = selectnext(3)
	while (it) {
	  print "TX item: ",it:4:2,"\n"
	  it = selectnext(it)
	}

Once any of these functions has been called, the item remains selected until another item selection function is called for the same item type, or until the 'main' procedure moves on to the next file, or until the 'system()' function is requested (see section 5.9).

To avoid having to remember all the item numbers, the following built-in constants are provided:

SP, LX, TX, FX, AN, LP, SY, DI, CO, FM, PC, TR

Of use when selecting items for measurement is the function:

history(itemtype) which returns as a string the history field for the currently selected item of the given type.

/* items -- demonstrate item selection functions */
main {
  /* find parameters used in TX calculation */
  var it,offset,threshold
  string hist
  /* loop through all TX items processed by tx program */
  it = selectnext(TX)
  while (it) {
    hist = history(TX)
    if (index("^tx\(.*\)$",hist)) {
      offset = index("thresh=",hist)
      threshold = val(hist:offset+7:strlen(hist))
      print $filename,": item ",it:4:2
      print " threshold=",threshold,"\n"
    }
    it = selectnext(it)
  }
}
% summary -atx txfile
3. TX 3.01 200 frames from tx(2.01;thresh=4.0)
5. TX 3.02 150 frames from tx(2.01;thresh=6.0)
% sml items.sml txfile
txfile: item 3.01 threshold= 4
txfile: item 3.02 threshold= 6
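
The string handling in this program can be checked by hand outside SML. The Python sketch below mirrors the three steps (test the history against the pattern, find "thresh=", convert the text that follows to a number). It assumes that index() reports the position at which the match starts and that val() converts a leading number and ignores the trailing ')'. The function name threshold_from_history is invented for the illustration.

```python
import re

def threshold_from_history(hist):
    """Return the thresh= value from a tx() history string, or None."""
    # Only accept histories of the form tx(...)
    if re.search(r"^tx\(.*\)$", hist) is None:
        return None
    offset = hist.find("thresh=")
    if offset < 0:
        return None
    # Skip the 7 characters of "thresh=" and read the leading number
    tail = hist[offset + len("thresh="):]
    number = re.match(r"[-+]?\d+(\.\d+)?", tail)
    return float(number.group()) if number else None

for hist in ["tx(2.01;thresh=4.0)", "tx(2.01;thresh=6.0)", "fx(1.01)"]:
    print(hist, "->", threshold_from_history(hist))
```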

Data Events

The measurement functions return values in seconds, Hertz and dB regardless of the actual coding used in the data sets. They also accept arguments in these units. Thus the programmer can make measurements on data sets without knowing details of sampling rate, or the size of spectra. There are occasions, however, when it is necessary to know the timing of the individual frames of data within a data set: to be able to copy out information only once from each frame, say. SML provides a function next() which, given an item type and a time, returns the time of the next frame of data in the currently selected item of that type that occurs after the given time. (In fact it returns a time 1 microsecond after that event, so that the returned value can be reliably used in measurement functions).

Thus to run through all the individual TX entries in a data set, one might use a piece of code such as:

var t
t = next(TX,-1)
while (t) {
  print "tx value = ",tx(t),"\n"
  t = next(TX,t)
}

To determine the sampling frequency of a speech item, one might use:

var t0,t1
t0 = next(SP,-1)
t1 = next(SP,t0)
print "sampling rate = ",1.0/(t1-t0),"Hz\n"
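
The arithmetic here can be checked outside SML. In the Python sketch below the frame times and the 16000 Hz rate are invented, and next_event is a hypothetical stand-in for next() that uses None to signal that no further frame exists. It shows that the 1 microsecond offset added to each returned time cancels when two successive times are subtracted, so the rate estimate is unaffected.

```python
# Hypothetical frame times for a speech item sampled at 16000 Hz
FRAME_TIMES = [k / 16000.0 for k in range(5)]

def next_event(frame_times, t, offset=1e-6):
    """Return the time of the first frame after t, nudged on by
    1 microsecond, or None when there are no more frames."""
    for frame_time in frame_times:
        if frame_time > t:
            return frame_time + offset
    return None

t0 = next_event(FRAME_TIMES, -1)   # first frame
t1 = next_event(FRAME_TIMES, t0)   # second frame
rate = 1.0 / (t1 - t0)
print("sampling rate = %.1f Hz" % rate)
```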

Stat variables

Thus far we have only used two types of SML variable: var and string. In this section we introduce a third type: stat, which makes the collection of information about distributions of measurements of data sets considerably easier.

A stat variable actually contains six var values. These values are initially set to zero but change automatically as the stat variable is updated with arithmetic values. The internal values of a stat variable are:

count	number of updates
sum	sum of updates
sumsq	sum of squared updates
mean	mean of updates
variance	variance of updates
stddev	standard deviation of updates

Stat variables are updated with a special form of the assignment statement:

<stat variable> += <arithmetic expression>

The individual fields of a stat variable may be accessed as ordinary arithmetic values using the syntax:

<stat variable>.<stat field name>

where the stat field name is one of 'count', 'sum', etc from the list above.

A piece of code to find the average fundamental frequency between two markers ('start' and 'end') might look like:

var t
stat fxval
t = next(FX,time("start"))
while (t < time("end")) {
  fxval += fx(t)
  t = next(FX,t)
}
print "mean FX=",fxval.mean," +/-",fxval.stddev," Hz\n"

The updating process for the generation of the variance and standard deviation values from the record of values follows the usual formulae for estimating the parameters of a population from a sample:

stat.mean= stat.sum / stat.count
stat.variance= (stat.sumsq - (stat.sum)²/stat.count)/(stat.count-1)
stat.stddev= sqrt(stat.variance)
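
These formulae can be verified outside SML. The following Python sketch keeps only count, sum and sumsq, as a stat variable is described as doing, and derives the other fields from them. The class name Stat, the sample values, and the choice to report a variance of zero when fewer than two updates have been made are assumptions made for the illustration. The result is compared with the sample variance from Python's statistics module.

```python
import math
import statistics

class Stat:
    """Running statistics held as count, sum and sum of squares."""

    def __init__(self):
        self.count = 0
        self.sum = 0.0
        self.sumsq = 0.0

    def update(self, x):
        # Corresponds to: <stat variable> += <arithmetic expression>
        self.count += 1
        self.sum += x
        self.sumsq += x * x

    @property
    def mean(self):
        return self.sum / self.count if self.count else 0.0

    @property
    def variance(self):
        if self.count < 2:
            return 0.0  # assumption: undefined case reported as zero
        v = (self.sumsq - self.sum ** 2 / self.count) / (self.count - 1)
        return max(v, 0.0)  # guard against tiny negative rounding errors

    @property
    def stddev(self):
        return math.sqrt(self.variance)

samples = [110.0, 120.0, 125.0, 118.0]  # hypothetical FX values in Hz
fxval = Stat()
for fx in samples:
    fxval.update(fx)

print("mean FX=", fxval.mean, "+/-", fxval.stddev, "Hz")
print("library variance:", statistics.variance(samples))
```

The division by (count-1) rather than count is what makes this an estimate of the population variance from a sample.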

5.9 File I/O

SML has built in facilities for manipulating text files. This allows programs that analyse data files to record results for later processing, or to read from a file details of what processing should take place.

File Channels

We have already met the two statements used to read from and write to files: they are simply input and print. The forms of these statements that we have seen are special cases of forms able to read and write using a file channel. In the cases described so far the file channel has been implicit and connected to the terminal.

File channels are values held by a fourth type of SML variable, the file type. There are also three pre-defined file channels:

stdin    standard input channel
stdout   standard output channel
stderr   standard error channel

These channels are always open and are usually used to communicate with the user's terminal. They may, however, be redirected to files or other processes using the standard mechanisms of the Unix shell.

The declaration of a file variable must be made outside any procedure; that is, all file variables are global. This is to prevent the duplication of file channels to a single file, and to allow file channels to be closed properly at the end of the program. File channels can be opened by the init procedure, used by the main procedure, and closed by the summary procedure.

The opening and closing of file channels is performed using the following functions:

openin(<file variable>,<file name>) opens a file channel for reading

openout(<file variable>,<file name>) opens a file channel for writing

openappend(<file variable>,<file name>) opens a file channel for appending

close(<file variable>) closes a file channel

Each function returns a normal arithmetic value on success, and ERROR on failure. File channels only need to be closed explicitly in an SML program if the file variable is to be re-used.

File I/O statements

The general form of the print and input statements is:

print # <file channel> <print list>

input # <file channel> <input list>

In addition, there is a second input statement we have not met before, namely inputline:

inputline # <file channel> <string variable>

This statement reads an entire line from the file channel, removes the final "\n", and stores it in the given string variable.

When no file channel is given, the '#' symbol should be dropped and the standard defaults are used:

print "Message"   ->   print # stdout "Message"

input answer      ->   input # stdin answer

inputline line    ->   inputline # stdin line

Errors discovered while executing the input statement are indicated by setting one or more of the variables in the input list to ERROR.

The following program copies one text file to another, eliminating consecutive duplicate lines (each line is compared only with the line before it):

/* unique -- SML program to demonstrate file I/O */
file in,out
init {
  string  line,last
  /* open file channels */
  openin(in,"ipfile")
  openout(out,"opfile")
  /* get first line, check ok */
  inputline # in last
  if (!compare(last,last)) break
  print # out last,"\n"
  /* scan rest of input file */
  inputline # in line
  while (compare(line,line)) {
    if (compare(line,last)!=0) print # out line,"\n"
    last = line
    inputline # in line
  }
  close(in)
  close(out)
}
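
Because the program compares each line only with the one read immediately before it, a repeated line is dropped only when the repeats are adjacent. The Python sketch below reproduces the same comparison on an in-memory list so that the behaviour can be checked; the function name and the sample data are invented for the illustration.

```python
def drop_repeated_lines(lines):
    """Copy lines, skipping any line equal to the one just before it."""
    out = []
    last = None
    for line in lines:
        if line != last:
            out.append(line)
        last = line
    return out

# 'a' and 'b' each survive twice because their repeats are not all adjacent
print(drop_repeated_lines(["a", "a", "b", "a", "b", "b"]))
```

To remove every duplicate regardless of position, the input would first need to be sorted.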

File channels can also be opened onto other processes using the openin() and openout() functions. If the first character of the file name provided in an openout() call is '|', then the file channel opened is actually a pipe to the process detailed by the rest of the filename string. Similarly, if the last character of the file name provided in an openin() call is '|', then the file channel opened is actually a pipe from the process detailed in the file name string.

The following piece of code might be used to plot part of a waveform (but see built-in plotting functions of SML in section 5.11):

/* wave - demonstrate use of pipes */
file wavein,plotout
main {
  var count
  string samp
  openin(wavein,"splist -isp. "++$filename++" |")
  openout(plotout,"| plot -g | dig")
  inputline # wavein samp
  while (compare(samp,samp)) {
    if (count < 1000) print # plotout samp,"\n"
    count=count+1
    inputline # wavein samp
  }
  close(wavein)
  close(plotout)
}

The program splist prints out the sample values from items of type SP, LX, TX and TR. The programs plot(1G) and dig produce an x-y graph when supplied with a list of y values.

The only manipulation of file variables that SML allows is assignment:

<file variable> = <file channel>

for example:

file op
init {
  string ans
  print "print to screen ?"
  input ans
  if (index("^[Yy]",ans)) op=stdout else openout(op,"listfile")
}

System calls

SML provides a facility for executing other programs as execution proceeds. This can provide a convenient mechanism for running speech processing programs on SFS files while they are being analysed.

The function system() takes a single string as argument, and passes the string to a shell for execution. The function returns a normal arithmetic value if the shell exits with zero status, and ERROR otherwise. The following piece of code generates a TX item from an LX item if no TX is present in the file:

if (select(TX)) {
  print "TX found\n"
}
else if (!select(LX)) {
  print "No LX or TX found\n"
}
else if (system("tx "++$filename)) {
  print "TX calculated\n"
}
else {
  print "unable to calculate TX\n"
}

Since the system() call may change the contents of the data file being processed, its use has an important side effect: all item selections are cancelled. Thus if the program is measuring a data set other than the default set when a call to system() is made, that data set must be reselected after the call. That is, you must manually call the 'select' functions after a call to system().

5.10 User Functions

We have seen that SML programs are built around the three procedures init, main, and summary using calls to a large number of built-in functions. In this section we shall see how the programmer can write new functions that can be called from within the three procedures: these are user-defined functions.

Function Declarations

The text of a new function must be placed in the source of an SML program before the text of any procedure, and before the text of any function that calls it. That is, the definition of a function must precede any call to it.

The general format of a function declaration is:

function <type> <function name> ( <dummy argument list> )
<address-type dummy argument declarations>
<static variable declarations>
{
  <value-type dummy argument declarations>
  <local variable declarations>

  <function text>
}

The components of the format are as follows:

<type>
    type of value returned by function, may be var or string only.
<function name>
    name given to function.
<dummy argument list>
    list of named parameters required by function.
<address-type dummy argument declarations>
    declarations of variables passed to the function
    that are used to return values to the calling routine.
<value-type dummy argument declarations>
    declarations of variables passed to the function
    that are copies of values provided by the calling function.
<static variable declarations>
    declarations of variables local to the function
    that retain their value from one call to the next.
<local variable declarations>
    declarations of variables local to the function
    that are initialised on each call.
<function text>
    sequence of statements making up the processing
    provided by the function.

We shall explain these components with the help of a sequence of example functions:

/* function to return square of a number */
function var square(val)
{
  var val
  val = val * val
  return(val)
}

This function, 'square', takes a single var argument and returns the square of the value. The type of the function is var, the name of the function is square, the value-type dummy argument is val, and the statement return causes the function to terminate at that point, returning the supplied value to the calling routine. Thus this function might be called with:

for (i=0;i<10;i=i+1) print i,square(i),"\n"

or even

print square(1.414),"\n"

The following function takes three arguments and checks to see whether the second is outside a range delimited by the first and the third. If the value is outside the range, the supplied value is modified and returned:

/* function to check number range */
function var range(lo,val,hi)
var val
{
  var lo,hi
  if (val < lo) val = lo
  if (val > hi) val = hi
}

This 'function' does not use its return value at all (in fact a default value is returned). Instead it modifies the value of one of the arguments presented to it. We might call it with:

for (t=0;t<1;t=t+0.01) {
  fxtable[i]=fx(t)
  range(40,fxtable[i],800)
  i=i+1
}

Since changes to values passed to the 'range()' function should have effect outside the function (unlike val in the first example), the interpreter must be informed that the variable val should be passed by address and not by value. This is done through the use of the address-type dummy variable declaration. This implies that function arguments declared as address-type must be provided using a variable and not an expression. Thus for the function range, the following call is an error:

range(40,fxtable[i]+100,800)

The most common use for address-type arguments is for arrays, where it is very inefficient (but not prohibited) to copy all the values of an array at a function call. The following function returns the sum of all values in any array:

/* function to sum array */
function var total(array)
var array[]
{
  var i,lo,hi,tot
  lo = lobound(array)
  hi = hibound(array)
  for (i=lo;i<=hi;i=i+1) tot = tot + array[i]
  return(tot)
}

Note that declarations of dummy arguments indicating arrays use '[]' to show that an array of unknown size must be supplied in the call. The variables i, lo, hi and tot are local variables of the function.

SML functions support recursion, that is functions are able to call themselves:

/* factorial function: factorial(n) = n*(n-1)*(n-2)*..*2*1 */
function var factorial(val)
{
  var val
  if (val <= 1) return(1) else return(val * factorial(val-1))
}

Static local variables may be used to keep a record of values from one call of the function to the next:

/* function to return pseudo-random letter sequence */
function string randomletter()
var hold
{
  var letter
  hold = (1013 * (hold + 997)) % 1000
  letter = 1 + hold % 26
  return("abcdefghijklmnopqrstuvwxyz":letter:letter)
}
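
The sequence produced by this recurrence can be traced outside SML. The Python sketch below assumes that the static variable hold starts at zero on the first call and that the arithmetic is exact on whole numbers; the closure make_randomletter is simply a way of keeping hold alive between calls, in the way the static declaration does.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def make_randomletter():
    """Build a function that remembers 'hold' from one call to the next."""
    hold = 0  # assumption: static variables start at zero

    def randomletter():
        nonlocal hold
        hold = (1013 * (hold + 997)) % 1000
        letter = 1 + hold % 26          # 1..26, as in the SML version
        return ALPHABET[letter - 1]     # Python strings are indexed from 0

    return randomletter

randomletter = make_randomletter()
print("".join(randomletter() for _ in range(4)))
```

Since hold can only take 1000 different values, the letter sequence must eventually repeat, which is why it is described as pseudo-random.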

Library Functions

Clearly, some functions may be useful in more than one program, and SML provides a mechanism by which a function declaration may be shared across a number of programs. The library statement indicates that the text of a function is not stored in the current text file but may be found in a file in one of a standard list of directories (similar to the #include statement in 'C'). For the statement:

library func

the SML interpreter attempts to locate the following files in the following order:

./func.s
~/sml/func.s
/usr/lib/sml/func.s

Text for the function is then read from this file before compilation continues with the main source file. More than one function can be declared in a library file, and library files may be nested.

5.11 Graph Plotting

SML provides some basic graph-plotting functions for producing x-y plots, scatterplots, bar charts and histograms from values held in program arrays. These graphs are produced using the Device Independent Graphics library (as used by the SFS utilities Ds and dig amongst others) so that graphs can be displayed on most terminals, stored in files, edited and printed.

The function that actually performs the plotting is called plot():

plot(<file channel>,<graph number>,<y data>,<y count>)

This function takes an array of values, <y data>, together with the number of points to be plotted, <y count>, and plots a graph on the file, device or pipe indicated by <file channel>. Each output screen can be divided into a number of areas, numbered left-to-right, top-to-bottom, and selected by <graph number>.

Thus the following piece of code plots the simplest x-y graph:

var i,y[1:100]
for (i=1;i<=100;i=i+1) y[i]=exp(i/100)
plot(stdout,1,y,100)

All of the other functions associated with graph plotting simply affect the detail or the format of the graph produced by plot().

The function plotparam() provides control over a number of aspects of the graph produced by plot(). It must be called prior to plot() and its effects remain until a subsequent call to plotparam(). The function takes a string in the form "<parameter>=<value>", for example:

plotparam("box=no")

(which prevents the graph being drawn within an outline box). Other possible parameters and values are:

"vertical=<num>" number of graphs vertically

"horizontal=<num>" number of graphs horizontally

"type=<point/line/bar/hist>" type of graph:

point = scattergram
line = x-y plot
bar = bar chart
hist = histogram

"char=<character>" mark every point with character

"npoint=<num>" mark every <num>th point

"box=<yes/no>" plot graph in a box

"xzero=<yes/no>" x axis extends to zero

"yzero=<yes/no>" y axis extends to zero

"equal=<yes/no>" makes axes equal

"xpos=<top/bottom>" x axis drawn at top/bottom of graph

"ypos=<left/right>" y axis drawn at left/right of graph

"xscale=<yes/no>" draw x axis scale

"yscale=<yes/no>" draw y axis scale

"xtitle=<string>" specify x axis label

"xlabel=<top/bottom>" x label drawn at top/bottom of graph

"ytitle=<string>" specify y axis label

"ylabel=<left/right>" y label drawn at left/right of graph

"title=<string>" specify graph title

There are also the following graph plotting functions:

plotxdata(xdata,flag): specifies x co-ordinates for each of the values in the ydata array. If flag is zero, xdata contains an entry for every point in ydata. If flag is 1, xdata[1] = smallest-x, xdata[2] = largest-x. If flag is 2, xdata[1] = smallest-x, xdata[2] = x increment. The default setting is: flag = 2, xdata[1] = 0, xdata[2] = 1.
plotaxes(file,graphno,xmin,xmax,ymin,ymax): draws a specified set of axes prior to drawing any graph (normally axes are calculated automatically by plot).
plottitle(file,string): adds an overall title to a page of graphs.
plotclear(file): clears the screen, or starts a new set of graphs.
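
The three settings of the plotxdata() flag can be pictured as three ways of generating the x co-ordinate for each y value. The Python sketch below is an interpretation, not a description of the plotting library: in particular, it assumes that with flag 1 the points are spaced evenly between the smallest and largest x. The function name x_coordinates is invented, and the list positions 0 and 1 stand for xdata[1] and xdata[2] in the description above.

```python
def x_coordinates(xdata, flag, npoints):
    """Return the x co-ordinate assumed for each of npoints y values."""
    if flag == 0:
        # one x entry supplied for every point
        return list(xdata[:npoints])
    if flag == 1:
        # smallest and largest x supplied; even spacing is an assumption
        lo, hi = xdata[0], xdata[1]
        if npoints == 1:
            return [lo]
        step = (hi - lo) / (npoints - 1)
        return [lo + i * step for i in range(npoints)]
    if flag == 2:
        # smallest x and x increment supplied
        start, inc = xdata[0], xdata[1]
        return [start + i * inc for i in range(npoints)]
    raise ValueError("flag must be 0, 1 or 2")

print(x_coordinates([0, 1], 2, 4))          # the default setting
print(x_coordinates([0, 10], 1, 5))
print(x_coordinates([50, 100, 150], 0, 3))
```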

In the following example, SFS files containing some formants and some spectral coefficients are used to generate a graph showing a cross-sectional spectrum at a given time and also the location of the formants.

/* graph - SML program to draw spectrum and formant positions */
/* input files need CO item (from e.g. spectran -c)
                and SY item (from e.g. fmanal & fmtrack) */
/* graphics output channel */
file gop
/* function to plot spectrum cross section */
function var plotcross(t)
{
  var t
  var f,xdata[0:100],ydata[0:100],ynum
  for (f=50;f<=5000;f=f+50) {
    xdata[ynum]=f
    ydata[ynum]=energy(f,t)
    if (ydata[ynum]) ynum=ynum+1
  }
  plotparam("type=line")
  plotparam("char=")
  plotxdata(xdata,0)
  plot(gop,1,ydata,ynum)
}
/* function to plot positions of formants */
function var plotformants(t)
{
  var t
  var i,xdata[0:4],ydata[0:4],ynum
  for (i=1;i<=4;i=i+1) {
    xdata[ynum] = sy(3*i,t)        /* formant frequencies */
    ydata[ynum] = sy(3*i+1,t)/10   /* formant amplitudes */
    if (xdata[ynum]) ynum=ynum+1
  }
  plotparam("type=point")
  plotparam("char=F")
  plotxdata(xdata,0)
  plot(gop,1,ydata,ynum)
}
/* initialisation */
init {
  string ans
  /* where to send graphs */
  print "Send graphs to printer ? (Y/N) "
  input ans
  if (index("^[yY]",ans)) openout(gop,"|dig -p") else gop=stdout
  /* init basic parameters of graph */
  plotparam("xtitle=Frequency (Hz)")
  plotparam("ytitle=Amplitude (dB)")
}
/* main processing */
main {
  if ($filecount > 1) plotclear(gop)
  plottitle(gop,$filename++": spectrum + formants")
  plotaxes(gop,1,0,5000,0,60)
  plotcross(time("marker"))
  plotformants(time("marker"))
}

5.12 Scripted Access to SFS Data Sets

From version 4, SML supports a scripting interface to individual SFS data sets in files. The functions allow whole data sets to be loaded and analysed, new data sets to be created, and data sets to be written to SFS files. Whole data sets are referenced through a new type of variable called an 'item' variable. These variables must be declared with global scope (similar to file variables), for example:

item spitem
item fxitem

Loading and Saving Items

The following functions are available to load, save and create new items:

sfsgetitem(item,filename,itemstring)

This function reads an item specified by 'itemstring' in file 'filename' into item variable 'item'. For example:

sfsgetitem(spitem,"myfile.sfs","1.01")

It returns 0 on success, and ERROR on error.

sfsdupitem(item1,item2)

This function copies the item stored in variable 'item2' into variable 'item1'. A by-product of the copy is that the history string is reset to refer to the current script. This allows the item to be saved directly back into the file without overwriting the original data set. Returns 0 on success, ERROR on error.

sfsnewitem(item,datatype,frameduration,offset,framesize,numframes)

This function creates a new empty item in variable 'item' of type 'datatype' (SP, LX, TX, FX, etc). The time interval associated with each frame is set to 'frameduration' and the overall time offset of the item is set to 'offset'. Each frame is made up of 'framesize' basic elements, and room is reserved for 'numframes' frames of data. Although 'numframes' cannot be dynamically expanded, it is not necessary for all of the frames allocated by sfsnewitem() to be written to a file with sfsputitem(). The function sets the history to a default value based on the name of the script and the type of the output item. For example:

sfsnewitem(fxitem,FX,0.01,0.0,1,1000)

Returns 0 on success, ERROR on error.

sfsputitem(item,filename,numframes)

Stores the first 'numframes' frames of data in the data set referred to by 'item' into the file 'filename'. Take care: when a data set is saved to an SFS file, any existing data set with the same history string is replaced. Returns 0 on success, ERROR on error.

Reading/Writing Item Headers

The following functions provide access to the data set parameters stored in the item header:

sfsgetparam(item,param): Gets the value of a numerical parameter with name 'param' from the data set header referred to by 'item'. Available parameters are: "numframes", "frameduration", "offset", "framesize", and "itemno". Returns value of parameter or ERROR.
sfsgetparamstring(item,param): Gets the value of a string parameter with name 'param' from the data set header referred to by 'item'. Available parameters are: "history", "params", "processdate", and "itemno". Returns string value of parameter or ERROR.
sfssetparamstring(item,param,value): Sets the value of a string parameter with name 'param' in the data set header referred to by 'item'. Available parameters are: "history" and "params". Returns ERROR on invalid parameters.

Reading/Writing Data Values

The following functions provide access to data values stored in the items. SFS items are made up from a sequence of frames, where each frame can hold a vector of values. For some data types, each frame also has a number of other descriptive fields. See the Programmers Manual for more detail.

sfsgetdata(item,frameno,index): Returns a value from the data set referred to by 'item'. The value is taken at offset 'index' in frame number 'frameno'. Frame numbers range from 0 to sfsgetparam(item,"numframes")-1. Index numbers range from 0 to sfsgetparam(item,"framesize")-1. Returns value from data set or ERROR.
sfsgetstring(item,frameno): Returns a string value from frame 'frameno' of the data set referred to by 'item'. Frame numbers range from 0 to sfsgetparam(item,"numframes")-1. Use this to load annotation labels. Returns the string value or the ERROR string.
sfsgetfield(item,frameno,field): Returns a value from the frame header for structured data types. The frame is referred to by number 'frameno', and the field is referred to by number 'field'. For example, for CO data sets, field 0 is the frame position, field 1 is the frame length, field 2 is the voicing flag, field 3 is the voicing mixture, and field 4 is the frame gain. Returns a value from a frame header or ERROR.
sfsgetarray(item,start,count,array): Loads a section of any 1-dimensional item into an array. Data is copied from the waveform or track referred to by 'item' starting at offset 'start' for 'count' samples into array 'array'. The destination array should be big enough to take the number of samples, otherwise only enough samples to fill the array are transferred. Returns the number of values copied or ERROR.
sfssetdata(item,frameno,index,value): Stores the numerical expression 'value' into the data set referred to by 'item' at frame number 'frameno' and frame offset 'index'. See sfsgetdata() for an explanation of frame and index numbering. Returns value stored or ERROR.
sfssetfield(item,frameno,field,value): Stores the numerical expression 'value' into the frame header of frame number 'frameno' of data set 'item' at field position 'field'. See sfsgetfield() for details about field numbering. Returns value stored or ERROR.
sfssetstring(item,frameno,string): Stores a string expression into frame 'frameno' of the data set referred to by 'item'. Use this function to set annotation labels. Returns 0 on success, or ERROR.

Processing Data Sets through SFS programs

The following function allows items to be processed by named SFS programs. To run a program, be sure to add the SFS/Program directory to the executables search PATH.

sfsprocessitem(item1,progname,item2,rettype)

Processes the data set referred to by 'item2' using the program and arguments in 'progname' and optionally loads a resultant data set of type 'rettype' into output item variable 'item1'. This function first saves item2 to a temporary file and runs the specified program on it. If 'rettype' is not an empty string then it is used to select the item to be loaded back into item1. For example, to filter a speech signal in item ipitem into item opitem, use:

sfsprocessitem(opitem,"genfilt -l1000",ipitem,"sp")

Example Script

Here is a complete example which calculates a spectral representation from a speech item and stores the result in a coefficients item.

/* spblock - example of block processing of speech */

item sp;		/* input speech item */
item co;		/* output spectral coefficients */
var window[0:10000];	/* input window */
var mag[0:10000];	/* spectral magnitudes */
var phase[0:10000];	/* spectral phases */

main {
	var	numf;
	var	fsize;
	var	fdur;
	var	i,j,f;
	var	xsize,cnt;

	/* load speech item from current file */
	sfsgetitem(sp,$filename,"sp.");

	/* get processing parameters */
	numf = sfsgetparam(sp,"numframes");
	fdur = sfsgetparam(sp,"frameduration");
	fsize = 0.025/fdur;	/* 25ms window */
	xsize = 16;		/* FFT size */
	while (xsize < fsize) xsize = xsize*2;
	xsize = xsize/2;

	/* make up a coefficients item */
	sfsnewitem(co,CO,fdur,0,xsize,1+2*numf/fsize);

	/* process in blocks */
	f=0;
	for (i=0;(i+fsize)<numf;i=i+fsize/2) {
		/* get window */
		sfsgetarray(sp,i,fsize,window);

		/* perform FFT */
		cnt=fft(window,fsize,mag,phase);
		if (cnt!=xsize) abort("size error");

		/* store in frame */
		sfssetfield(co,f,0,i);
		sfssetfield(co,f,1,fsize);
		sfssetfield(co,f,2,0);
		sfssetfield(co,f,3,0);
		sfssetfield(co,f,4,0);
		for (j=0;j<xsize;j=j+1) {
			sfssetdata(co,f,j,20*log10(mag[j]));
		}
		f=f+1;
	}

	/* save spectral coefficients back to file */
	sfsputitem(co,$filename,f);
}
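
The window, FFT and frame-count arithmetic in the script can be checked outside SML. The following is a minimal sketch in Python (not SML), assuming a 16 kHz speech item one second long. Those values, the function name block_plan, and the rounding to whole numbers are illustrative assumptions and not part of SFS.

```python
# Python model of the block-size arithmetic in the spblock script.
# Sampling rate, item length and integer rounding are assumptions.

def block_plan(numf, fdur, window_seconds=0.025):
    """Return (fsize, xsize, allocated, used) for the spblock arithmetic."""
    fsize = round(window_seconds / fdur)   # samples in one 25ms window
    xsize = 16
    while xsize < fsize:                   # first power of two (>= 16) not below fsize
        xsize *= 2
    xsize //= 2                            # halved, as in the script
    allocated = 1 + (2 * numf) // fsize    # frames reserved by sfsnewitem()
    used = 0
    i = 0
    while i + fsize < numf:                # same loop condition as the script
        used += 1
        i += fsize // 2                    # successive windows overlap by half
    return fsize, xsize, allocated, used


# One second of speech sampled at 16 kHz (assumed values).
fsize, xsize, allocated, used = block_plan(16000, 1.0 / 16000)
print(fsize, xsize, allocated, used)
```

For this case the script would reserve 81 frames and fill 78 of them, which is why sfsputitem() is given the count f of frames actually written rather than the allocated size.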

5.13 Miscellaneous

Clear

The clear() statement re-initialises to zero any variable or array. It is the only way to reset the contents of a stat variable.

Stopwatch

The built-in stopwatch() function allows timing of processing. When stopwatch() is called with the argument 0.0 an internal timer is reset to zero. When stopwatch() is called with a non-zero argument it returns the time in seconds since it was reset.

Fourier Transform

The built-in fft() function transforms a real vector into two arrays representing the magnitudes and phases of its discrete Fourier transform.

The syntax of the call is fft(inarray,length,magarray,phasearray) where 'inarray' holds a section of waveform of 'length' samples, and 'magarray' and 'phasearray' are arrays to hold the returned spectral component magnitudes and phases. The function returns the number of spectral components represented in the output arrays. This number is always an integer power of two less than or equal to 'length'. See the example in section 5.12.
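
For readers unfamiliar with the magnitude and phase representation, the idea can be sketched in Python (not SML) with a plain discrete Fourier transform. This is not the SML implementation: the scaling, any padding, and the choice to return the first half of the components are assumptions made for illustration, and dft_mag_phase is a hypothetical name.

```python
import cmath
import math


def dft_mag_phase(samples):
    """Return (magnitudes, phases) for the first half of the DFT components."""
    n = len(samples)
    mags, phases = [], []
    for k in range(n // 2):
        # Correlate the input with a complex sinusoid of k cycles per window.
        total = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        mags.append(abs(total))            # size of component k
        phases.append(cmath.phase(total))  # angle of component k in radians
    return mags, phases


# One cycle of a cosine across an 8-sample window.
wave = [math.cos(2 * math.pi * t / 8) for t in range(8)]
mags, phases = dft_mag_phase(wave)
```

Here the energy falls in component 1, with a phase close to zero; the remaining magnitudes are zero to within rounding error.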

Debugging

SML provides a trace facility to aid in the debugging of programs. When the statements traceon and traceoff are included in a program, every statement executed between the two is listed to the terminal. The listing comprises the line numbers and source lines of the program, and after each statement the result of any arithmetic evaluation that took place.

Tracing can be enabled for the whole program by using the sml switch '-t'.

File searching

When the SML interpreter is executing a program, it executes the main procedure once for each SFS file found on the command line. If a file identified on the command line is not an SFS file, the interpreter reports a warning and carries on with the next file. The reporting of such warnings can be suppressed with the '-n' flag to sml. If an entry on the command line is found to be a directory name, the interpreter recursively searches the directory for files or sub-directories.
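
The way command-line entries expand into a list of files can be pictured with the following Python sketch (not SML). The function name expand_arguments and the sorted traversal order are assumptions made for illustration. The test for whether a file is an SFS file is left out because its details are not described here.

```python
import os


def expand_arguments(args):
    """Yield every file named in args, descending into directories."""
    for entry in args:
        if os.path.isdir(entry):
            # A directory: visit each file and sub-directory inside it.
            for name in sorted(os.listdir(entry)):
                yield from expand_arguments([os.path.join(entry, name)])
        else:
            # A plain file: candidate for one run of the main procedure.
            yield entry
```

Each file produced this way would then be checked and, if it is an SFS file, processed by one execution of the main procedure.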