SML

NAME

sml - speech measurement language interpreter

SYNOPSIS

sml (-I) (-s|-S) (-c|-C) (-t|-T) (-r) (-f) (-i item) (-d dic) program (datafile(s))

DESCRIPTION

sml is the interpreter for a programming language called SML, which is described below. sml takes the source of an SML program, compiles it to intermediate code and executes it against a given list of database files (which may be empty). The name of a directory containing datafiles may be used instead of a datafile, in which case the contents of the directory are extracted recursively.

Options and their meanings are:

-I Identify interpreter and exit.

-s|-S Dump symbol table after compilation. "-s"=part, "-S"=full.

-c|-C Dump intermediate code after compilation. "-c" dumps source and code, "-C" dumps code only.

-t|-T Trace mode. Detail program execution. "-t" traces execution of source. "-T" traces execution of source and intermediate code.

-r Report run-time statistics after execution of program.

-f Disable file check reporting. File checking is still carried out, but the names of non-datafiles are not reported.

-i item Select input item numbers. These become initial defaults, which may be overridden by "select" (see below).

-d dic Use file dic instead of the default phonetic dictionary.

LEXICAL ITEMS

Lexical items in SML are reserved words, functions, variables, numbers, strings or punctuation. Reserved words are those used in the syntax of the language, described below. Functions are either user-defined or built in to the language, and are described in two sections below. Variable and function names are sequences of alphanumeric characters of any length, with a leading alphabetic character. Case is significant in variable and function names. Built-in names may be used as variables, but access to the function is then lost. Numbers may be written as a digit string in standard integer, floating point or exponential format (e.g. -2, 0.0023, 1.223E-1). Strings are sequences of printable characters enclosed in double quotes. The usual escape sequences may be used to obtain newline, tab, double quote, etc. within a string.

OPERATORS

Arithmetic operators are: "+", "-", "*", "/" and "%" (modulus) with usual precedence.

Boolean operators are "==", "!=", "<", "<=", ">=", "&&", "||" and "!".

String operators are "++" (concatenate), ">>" (special concatenate) and, for editing strings, <string>:<width> and <string>:<start>:<end>. Thus:

     "abc" ++ "def" -> "abcdef"
     "abc" >> "def" -> "abc>>def"
     "abcdef":2     -> "ab"
     "abcdef":8     -> "abcdef  "
     "abcdef":-2    -> "ef"
     "abcdef":-8    -> "  abcdef"
     "abcdef":2:4   -> "bcd"
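The editing rules implied by the examples above can be modelled in a few lines of Python. This is a sketch reconstructed only from those examples, not the interpreter's code. It assumes 1-based inclusive positions and space padding after truncation.

```python
# Python model of the SML string operators, reconstructed from the examples
# in this manual page. Assumptions: positions are 1-based and inclusive, and
# ":<width>" truncates to the width and then pads with spaces (on the right
# for a positive width, on the left for a negative one).

def concat(a: str, b: str) -> str:
    """The "++" operator: plain concatenation."""
    return a + b


def special_concat(a: str, b: str) -> str:
    """The ">>" operator: joins with a literal ">>" (used in annotation patterns)."""
    return a + ">>" + b


def width(s: str, w: int) -> str:
    """<string>:<width>; a negative width keeps and right-aligns the tail."""
    if w >= 0:
        return s[:w].ljust(w)   # keep the first w characters, pad on the right
    n = -w
    return s[-n:].rjust(n)      # keep the last n characters, pad on the left


def substring(s: str, start: int, end: int) -> str:
    """<string>:<start>:<end>, with 1-based inclusive positions."""
    return s[start - 1:end]
```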

Parentheses may be used to change order of arithmetic, boolean or string operator evaluation.

EXPRESSION EVALUATION

Every simple variable or arithmetic expression evaluates to a floating point number or to a special ERROR value. ERROR values propagate through expressions, so that an entire expression evaluates to ERROR if any of its components evaluates to ERROR. Arithmetic functions return ERROR when they cannot satisfy a given request (e.g. sqrt(-1)). Arithmetic functions can be used as statements, in which case they cause a run-time error if they return ERROR.

Conditional expressions return one of three values: TRUE, FALSE or ERROR. A conditional expression evaluates to ERROR if any of its components evaluates to ERROR. An arithmetic expression may be used as a conditional expression, in which case it evaluates to FALSE if the arithmetic expression evaluates to ERROR, and TRUE otherwise.
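The following Python sketch models these evaluation rules, using None for ERROR. The helper names are hypothetical. The model follows the text literally: an arithmetic condition is TRUE for any non-ERROR value, including 0.0, and an if/else whose condition is ERROR runs neither branch.

```python
import math
from typing import Optional

ERROR = None                 # model SML's ERROR value as None
Num = Optional[float]


def add(a: Num, b: Num) -> Num:
    """Arithmetic: ERROR in any operand propagates to the result."""
    if a is ERROR or b is ERROR:
        return ERROR
    return a + b


def sml_sqrt(x: Num) -> Num:
    """A built-in that cannot satisfy the request returns ERROR."""
    if x is ERROR or x < 0:
        return ERROR
    return math.sqrt(x)


def less(a: Num, b: Num):
    """Conditional: True, False, or ERROR if any component is ERROR."""
    if a is ERROR or b is ERROR:
        return ERROR
    return a < b


def as_condition(x: Num) -> bool:
    """Arithmetic expression used as a condition: FALSE if ERROR, else TRUE."""
    return x is not ERROR


def run_if_else(cond, then_branch, else_branch):
    """if/else: neither branch runs when the condition is ERROR."""
    if cond is ERROR:
        return None
    return then_branch() if cond else else_branch()
```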

String functions return the string "E=R=R=O=R" in case of a fault. Functions that take string arguments and return numeric values return ERROR if supplied with the error string.

PROCEDURAL STRUCTURE

An SML program consists of up to three procedures, the first called "init", the second "main" and the third "summary". A typical schema is:

		Global_declarations
		Function_declarations
		Global_declarations
		init {
			Local_declarations
			Statements
		}
		Global_declarations
		main {
			Local_declarations
			Statements
		}
		Global_declarations
		summary {
			Local_declarations
			Statements
		}

The statements in the "init" procedure are executed once, before any database file is accessed. The statements in the "main" procedure are executed once per database file on the command line (zero times if no files given). The statements in "summary" are executed once, after all the database files have been accessed. Variable declarations may be made outside the procedures, in which case they are available to subsequent procedures, or they may be made within a procedure, in which case they are only available for use within that procedure. Before any of the three procedures, functions may be defined. Functions accept parameters by value or by address and may utilise both static and automatic storage for local variables.

VARIABLE DECLARATION

There are four kinds of variables: var or simple variables, stat or statistics variables, string or character string variables, and file or file i/o variables. Statistics variables have six "internal" fields that are automatically updated during use, see list below. These internal values are accessed by <variable name>.<field name>, but may not be directly assigned to. Variables and arrays of variables are declared as in the following example:

		var incount,targets[1:10],vals[1000],table[5,10]
		stat tardist,dists[1:100],stab[100],stable[10,20]
		string sys,names[5:20],addr[12],nametab[100,2]
		file op

Array declarations of the form [lo:hi] set the low and high array bounds; a declaration of the form [num] is the same as [1:num]. Declarations of the form [nrow,ncol] create a 2D array with rows 1:nrow and columns 1:ncol. Elements of 2D arrays are accessed as table[r,c]; however, 2D arrays can also be accessed with a one-dimensional index in the range 1:nrow*ncol. The upper and lower bounds of arrays are fixed. Arrays of file variables and local file variables are not allowed. File variables must be associated with external files or pipelines using the functions "openin", "openout" or "openappend" before they may be used in print or input statements. Functions may only return var or string values. All variables are automatically initialised to zero or "".
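The bounds rules can be illustrated with a small Python model. The class and method names are hypothetical. The manual does not say how a 2D array is laid out for one-dimensional access, so the row-major order used here is an assumption.

```python
class SmlArray:
    """Hypothetical model of an SML var array with fixed bounds.

    [lo:hi] gives bounds lo..hi, [n] gives 1..n, and [nrow,ncol] gives a 2D
    array with 1-based rows and columns. Row-major flattening for 1D access
    to a 2D array is an ASSUMPTION.
    """

    def __init__(self, lo: int, hi: int, ncol: int = 0):
        self.lo, self.hi, self.ncol = lo, hi, ncol
        self.data = [0.0] * (hi - lo + 1)    # all variables start at zero

    @classmethod
    def two_d(cls, nrow: int, ncol: int) -> "SmlArray":
        return cls(1, nrow * ncol, ncol)

    def _flat(self, i: int, j=None) -> int:
        if j is not None:
            if not self.ncol or not 1 <= j <= self.ncol:
                raise IndexError("column out of range")
            i = (i - 1) * self.ncol + j      # assumed row-major layout
        if not self.lo <= i <= self.hi:
            raise IndexError("index out of bounds")
        return i - self.lo

    def get(self, i: int, j=None) -> float:
        return self.data[self._flat(i, j)]

    def set(self, value: float, i: int, j=None) -> None:
        self.data[self._flat(i, j)] = value
```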

STATEMENTS

Assignment statements are of the form:

		var_variable    =  expression
		stat_variable   += expression
		string_variable =  string_expression

In the first case, the expression is evaluated and assigned to the variable on the left-hand-side. In the second case, the internal fields of the statistics variable are updated with the value of the expression (no change if evaluates to ERROR). In the third case, the string expression replaces the existing contents of the string variable.
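The update performed by "stat_variable += expression" can be sketched in Python as follows. The six fields match the list under BUILT IN VARIABLES. The manual does not say whether "variance" uses an n or n-1 divisor, so the n-1 (sample) divisor here is an assumption. The model also assumes mean and variance are ERROR (None) when too few values have been added.

```python
import math


class Stat:
    """Python model of an SML 'stat' variable.

    add(x) plays the role of 'stat += x'. An ERROR value (None) leaves the
    variable unchanged. ASSUMPTION: variance uses the n - 1 divisor.
    """

    def __init__(self):
        self.count = 0
        self.sum = 0.0
        self.sumsq = 0.0

    def add(self, x):
        if x is None:            # ERROR: no change
            return self
        self.count += 1
        self.sum += x
        self.sumsq += x * x
        return self

    @property
    def mean(self):
        return self.sum / self.count if self.count else None

    @property
    def variance(self):
        if self.count < 2:
            return None
        return (self.sumsq - self.sum * self.sum / self.count) / (self.count - 1)

    @property
    def stddev(self):
        v = self.variance
        return math.sqrt(v) if v is not None else None
```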

Conditional statements are of the form:


		if (conditional_expression) statement
    or
		if (conditional_expression) statement else statement

Where "statement" may be a compound statement (statements in braces). In the first case, the statement is executed if the conditional expression evaluates to TRUE. In the second case, the first statement is executed if the conditional expression evaluates to TRUE, the second if it evaluates to FALSE, and neither if it evaluates to ERROR.

While statements are of the form:


		while (conditional_expression) statement

Where "statement" is executed for as long as the conditional expression evaluates to TRUE.

For statements are of the form:


		for (statement-A;conditional_expression;statement-B) statement-C

This is equivalent to

		statement-A
		while (conditional_expression) {
			statement-C
			statement-B
		}

Switch statements are of the form:


		switch (expression) {
			case_matches
		}

where "expression" can be arithmetic or string, and where "case_matches" are from:

		case NUMBER : statement
    or
		case STRING : statement
    or
		pattern STRING : statement
    or
		range NUMBER : NUMBER : statement
    or
		default : statement

Matches are always performed in the order specified - thus the default case must always come last. The "pattern" type matches with a regular expression. The "range" type matches with a numeric range (inclusive lower bound, exclusive upper bound). The "statement" block is executed for the first match found.
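The matching order can be modelled in Python as a first-match scan over the case list. The tuple encoding and the `switch` helper are hypothetical. Python's `re` syntax stands in for SML's regular expressions. As described under PATTERN MATCHING, "pattern" cases are not anchored, so "^" is needed to anchor them.

```python
import re


def switch(value, cases):
    """Return the action of the first matching case, or None.

    Each case is a tuple whose last element is the action:
      ("case", constant, action)     exact equality (number or string)
      ("pattern", regex, action)     unanchored regular expression search
      ("range", lo, hi, action)      lo <= value < hi
      ("default", action)            always matches
    """
    for case in cases:
        kind, action = case[0], case[-1]
        if kind == "case" and value == case[1]:
            return action
        if kind == "pattern" and isinstance(value, str) and re.search(case[1], value):
            return action
        if kind == "range" and not isinstance(value, str) and case[1] <= value < case[2]:
            return action
        if kind == "default":
            return action
    return None
```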

With statements are of the form:


		with (pattern_string) statement
    or
		withn (pattern_string , count) statement
    or
		within (start_time , end_time) statement

These are procedural devices that restrict the time range of the pattern matching functions (described below) within "statement". The first form restricts matching to the time domain of the annotation(s) matching a given pattern. The second form restricts matching to the time domain of the count-th match of the pattern. The third form restricts matching to the interval between two given times. See the description of pattern matching below. If the time range evaluates to ERROR then "statement" is not executed.

There is a loop "break" statement of the form


		break

which, if executed within a "for" or "while" loop, immediately exits that loop and continues at the next outer level. If executed outside any loop, "break" terminates the program (inside "init" and "summary") or skips the current data file (inside "main").

Print statements are of the form:


		print print_list
    or
		print # file_variable print_list

The first type sends output to the standard output. "print_list" is a sequence of expressions and strings separated by commas. Printable items have a default field width which may be changed using:

		<expression> : <field_width>
    or
		<expression> : <field_width> : <fraction_length>
    or
		<string> : <field_width>
    or
		<string> : <start> : <end>

For strings, the field width manipulations are really string editing operations.

Input statements are of the form:


		input input_list
    or
		input # file_variable input_list
    or
		inputline string_variable
    or
		inputline # file_variable string_variable

Where "input_list" is a sequence of var variables separated by commas or a single string variable. Numbers are terminated in the input by whitespace, strings by [RETURN].

There are two built in procedures:

"clear(variable)" resets variables or arrays to zero, empty or null, as appropriate.

"abort(string_expression)" halts program execution with the given message.

For debugging purposes there are the statements traceon and traceoff, which enable source-line tracing of program execution for the lines between them.

USER-DEFINED FUNCTIONS

Users may define functions at the beginning of each program (before init). The structure of a function definition is:

		function type name (parameters)
		address_parameters
		static_local_variables
		{
			value_parameters
			automatic_local_variables
			statements
		}

Where "type" is one of var or string, and "name" is the unique name given to the function. The "parameters" are a list of dummy identifiers separated by commas. The type and access method of each parameter are given in the "address parameters" and "value parameters" sections. The first section declares the parameters that are accessed by address (i.e. changes made by the function have effect outside the function). The second section declares the parameters that are accessed by value (i.e. they are copies of the contents of the calling parameters). These sections also give the parameters their types. Arrays are indicated by placing "[]" after the array name. File variables are always accessed by address. Static local variables may only be accessed from within the function, but retain their value from one call of the function to the next. Automatic local variables are instantiated once per call of the function, even if the function calls itself. Thus a complete function declaration might look like:

		function var anal(ar,lo,hi,mn,dv)
		var ar[],mn,dv
		{
			var lo,hi,i
			stat st
			for (i=lo;i<=hi;i=i+1) st += ar[i]
			mn = st.mean
			dv = st.stddev
			return(st.count)
		}

The return statement returns the bracketed expression to the calling function and terminates execution of the subroutine. The implementation of functions incorporates type and access method checking and full recursion.

Functions may also be stored in external files and included into the source being compiled. The mechanism also provides for 'library' routines that will be supplied for general use in a central repository. External functions are declared using:


		library <function_name>

     e.g.

		library anovar

The interpreter attempts to open a file containing the sources of these functions using the following strategy: for a function called <func>, it attempts to open files "./<func>.s", "~/sml/<func>.s" and "/usr/lib/sml/<func>.s" in that order. SML source is then read from these files instead of from the main program source file. Library files may be nested up to 5 deep.
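The search order can be expressed as a short Python function. This sketch is for illustration only. The `cwd` and `home` parameters are hypothetical hooks that make the lookup testable, and the 5-deep nesting limit is not modelled.

```python
import os
from typing import Optional


def find_library(func: str, cwd: str = ".", home: Optional[str] = None) -> Optional[str]:
    """Return the first existing source file for library routine `func`.

    Search order, as given in the manual:
      ./<func>.s, ~/sml/<func>.s, /usr/lib/sml/<func>.s
    """
    home = home if home is not None else os.path.expanduser("~")
    candidates = [
        os.path.join(cwd, func + ".s"),
        os.path.join(home, "sml", func + ".s"),
        os.path.join("/usr/lib/sml", func + ".s"),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```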

BUILT IN VARIABLES

"system variables"
$filecount          number of the file currently being examined
$filename           name of the file currently being examined
$date               string variable holding current date and time

"main header strings"
$head_username      owner of current file
$head_source        source of file
$head_dbase         name of database
$head_speaker       name of speaker
$head_session       description of session
$head_sessdate      date of session
$head_token         name of token
$head_rep           repetition of token

"field names of STAT variables"
count               number of assignments
sum                 sum of assignments
sumsq               sum of squared assignments
mean                mean of assignments
variance            variance of assignments
stddev              standard deviation of assignments

"file variables"
stdin               standard input
stdout              standard output
stderr              standard error

"built in constants" SP, LX, TX, FX, AN, LP, SY, DI, CO, FM, PC, TR, ERROR

BUILT IN ARITHMETIC FUNCTIONS

ANNOTATION

"length(pattern)" returns length of pattern match to annotation(s) in seconds or ERROR.

"lengthn(pattern, n)" returns length of nth pattern match to annotation(s) in seconds or ERROR.

"numberof(pattern)" returns number of pattern matches to annotation(s).

"time(pattern)" returns time at which pattern matches annotation(s) in seconds or ERROR.

"timen(pattern, n)" returns time of nth pattern match to annotation(s) in seconds or ERROR.

ARITHMETIC

"abs(expression)" returns absolute value of expression.

"acos(expression)" returns arc-cosine in radians of expression.

"asin(expression)" returns arc-sine in radians of expression.

"atan(expression)" returns arc-tangent in radians of expression.

"atan2(expression,expression)" returns arc-tangent in radians of ratio of two expressions.

"cos(angle)" returns cosine of angle in radians.

"exp(expression)" returns exponential of expression.

"fft(inarray,length,magarray,phasearray)" calculates the discrete Fourier transform of the vector in inarray of given length, returning the magnitudes and phases in the supplied arrays. Returns the number of magnitudes and phases stored, which is always a power of two.
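The following Python sketch models the shape of fft()'s output. The zero-padding rule and the "half the padded size" count are inferred from the spblock example later in this page, which compares fft()'s return value with half of a power-of-two size. The sketch uses a plain O(N^2) DFT for clarity, not a real FFT. The scaling (none) is an assumption.

```python
import cmath
import math


def fft_sketch(inarray, length):
    """Zero-pad to a power of two N >= length, take a plain DFT, and return
    (N/2, magnitudes, phases). Unnormalised; scaling is an ASSUMPTION."""
    n = 1
    while n < length:
        n *= 2
    x = list(inarray[:length]) + [0.0] * (n - length)
    mags, phases = [], []
    for k in range(n // 2):                      # keep the lower half of the spectrum
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(s))
        phases.append(cmath.phase(s))            # phase in radians
    return n // 2, mags, phases
```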

"log(expression)" returns natural logarithm of expression.

"log10(expression)" returns logarithm base 10 of expression.

"hibound(array_name)" returns upper bound of array.

"lobound(array_name)" returns lower bound of array.

"pow(base,exponent)" returns the value of base taken to the power exponent.

"random(expression)" returns a random number between zero and the value of the expression (always less than expression).

"sin(angle)" returns sine of angle in radians.

"sqrt(expression)" returns square root.

"srandom(expression)" resets the random number generator with the supplied seed value. A value of 0 causes the current time to be used as a seed.

"tan(angle)" returns tangent of angle in radians.

"trunc(expression)" returns integer part of expression.

FILES

"close(file_variable)" closes file variable. Returns ERROR on failure.

"openappend(file_variable, filename)" opens given file on given file variable for appending by printing. Returns ERROR on failure.

"openin(file_variable, filename)" opens given filename or pipeline on given file variable for input. Pipelines are distinguished from files by ending with "|". Returns ERROR on failure.

"openout(file_variable, filename)" opens given filename or pipeline on given file variable for printing. Pipelines are distinguished from files by beginning with "|". Returns ERROR on failure.
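The two pipeline conventions (a trailing "|" for input, a leading "|" for output) can be sketched in Python. The function names are hypothetical. `open_input` uses the standard `subprocess` module to show what reading from a pipeline means. It does not describe how sml itself opens pipelines.

```python
import subprocess


def classify_input(name: str):
    """openin: a name ending in "|" is a command whose output is read."""
    if name.endswith("|"):
        return ("pipe", name[:-1].strip())
    return ("file", name)


def classify_output(name: str):
    """openout: a name beginning with "|" is a command that receives output."""
    if name.startswith("|"):
        return ("pipe", name[1:].strip())
    return ("file", name)


def open_input(name: str):
    """Open a file or an input pipeline for reading text."""
    kind, target = classify_input(name)
    if kind == "pipe":
        return subprocess.Popen(target, shell=True, stdout=subprocess.PIPE, text=True).stdout
    return open(target)
```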

ITEMS

"next(item,time)" returns the time of the next "event" in the item after the time specified, or ERROR. Can be used to determine the sampling interval for fixed-frame items, or individual frame lengths for variable-frame items. Called with a negative time, will return the time of the first frame in the item.
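Here is a sketch of how next() can yield the sampling interval. It models an item as a sorted Python list of frame times, which is an assumption made for illustration. The real function reads an SFS item. None stands for ERROR.

```python
import bisect


def next_event(frame_times, t):
    """Time of the first frame strictly after t, or None (ERROR).
    A negative t returns the time of the first frame."""
    if not frame_times:
        return None
    if t < 0:
        return frame_times[0]
    i = bisect.bisect_right(frame_times, t)   # first frame time > t
    return frame_times[i] if i < len(frame_times) else None


def sampling_interval(frame_times):
    """Interval between the first two frames, as next() allows."""
    first = next_event(frame_times, -1)
    second = next_event(frame_times, first) if first is not None else None
    if second is None:
        return None
    return second - first
```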

"select(item)" selects the designated item as input for subsequent operations. Returns the number of frames in the selected item or ERROR. This routine does not override command line item selections if item number is given as type only.

"selectitem(item)" selects the designated item in the same way as "select()", but returns the selected item number for the data set rather than the number of frames.

"selectmatch(itemspec)" selects the first item that matches the specification supplied and returns the item number or ERROR. The item specification is a string expression in the format allowed by itspec(SFS3). Thus a call of selectmatch("TX") will return the number of the last TX item. Overrides command line item selections.

"selectnext(item)" selects the next item of the same type as input for subsequent operations. Returns the item number or ERROR if no more items found. A call of selectnext(TX) will return the number of the first TX item (e.g. 3.01). Overrides command line item selections.

MEASUREMENT

"co(parameter,time)" returns the value of the requested parameter number, at given time from CO item. Parameters are numbered sequentially from 0, with 0=position, 1=size, 2=flag, 3=mix, 4=gain. Parameters 5 and onward are the data values.

"energy(frequency,time)" returns the energy in dB from a coefficient item at the specified frequency and time.

"f1(time), a1(time), b1(time)" return frequency, amplitude and bandwidth of formant 1 from SY item at given time.

"f2(time), a2(time), b2(time)" return frequency, amplitude and bandwidth of formant 2 from SY item at given time.

"f3(time), a3(time), b3(time)" return frequency, amplitude and bandwidth of formant 3 from SY item at given time.

"f4(time), a4(time), b4(time)" return frequency, amplitude and bandwidth of formant 4 from SY item at given time.

"fm(entry, time)" returns the value of the requested parameter number, at given time from FM item. Parameters are numbered sequentially from 0, with 0=position, 1=size, 2=flag, 3=gain, 4=npeaks. Parameters 5 and onward are the data values. These are stored in triplets of frequency, amplitude and bandwidth.

"fx(time)" returns fundamental frequency in Hertz at given time from FX item.

"lp(attribute_name, time)" returns value of given attribute at given time from LP item.

"pc(parameter,time)" returns the value of the requested parameter number, at given time from PC item. Parameters are numbered sequentially from 0, with 0=position, 1=size, 2=flag, 3=mix, 4=gain. Parameters 5 and onward are the data values.

"sp(time)" returns speech waveform value at given time from SP item.

"sy(parameter, time)" returns value of requested parameter number, at given time from SY item. Parameters are numbered sequentially from 0, with 0=FX, 3=F1, 4=A1, 5=B1, 6=F2, etc.
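The SY numbering above (F1=3, A1=4, B1=5, F2=6) suggests that formant n occupies parameters 3n, 3n+1 and 3n+2. The Python helper below encodes that extrapolation. It is an assumption beyond the values listed, so check it against your SFS documentation before relying on it.

```python
def sy_param(kind: str, n: int) -> int:
    """Parameter number for formant n (1..4) in an SY item.

    kind is "F" (frequency), "A" (amplitude) or "B" (bandwidth).
    Extrapolates the pattern F1=3, A1=4, B1=5, F2=6 given in the manual.
    """
    if not 1 <= n <= 4:
        raise ValueError("formant number must be 1..4")
    offsets = {"F": 0, "A": 1, "B": 2}
    return 3 * n + offsets[kind]
```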

"tr(time)" returns parameter track value at given time from TR item.

"tx(time)" returns excitation period in seconds at given time from TX item.

"vu(parameter,time)" returns the value of the requested parameter number, at given time from VU item. Parameters are numbered sequentially from 0, with 0=position, 1=size, 2=flag, 3=mix, 4=gain.

STRING

"ascii(string_expression)" returns the ASCII value of the first character of the string, or ERROR for the ERROR string.

"compare(string_expression, string_expression)" returns 0 if strings same, returns -1 if first sorts earlier than second, returns 1 if second sorts earlier than first.

"entry(string_expression,string_array)" returns the index of the entry in the string array at which the string expression may be found - or ERROR if not in array.

"index(regular_expression, string_expression)" returns index of match of full regular expression into string or ERROR. The special concatenation operator ">>" may be used as an OR between regular expressions in the call to index.

"strlen(string_expression)" returns the length of the string expression, zero for null strings, and ERROR for the error string.

"val(string_expression)" returns arithmetic value of string or ERROR.
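The following Python sketch models four of these string functions. Three details are assumptions: the sort order (character-code order, like C's strcmp), the lower bound passed to entry(), and the strictness of val() (here any unparsable text gives ERROR).

```python
ERROR = None                    # numeric ERROR
ERROR_STRING = "E=R=R=O=R"      # string ERROR value from this manual


def compare(a: str, b: str) -> int:
    """0 if equal, -1 if a sorts first, 1 if b sorts first."""
    return (a > b) - (a < b)


def entry(s: str, array, lo: int = 1):
    """Index of s in a string array whose lower bound is lo, or ERROR."""
    for i, v in enumerate(array):
        if v == s:
            return lo + i
    return ERROR


def strlen(s: str):
    """Length of s, or ERROR for the error string."""
    return ERROR if s == ERROR_STRING else len(s)


def val(s: str):
    """Numeric value of s, or ERROR if it does not parse."""
    try:
        return float(s)
    except ValueError:
        return ERROR
```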

UTILITY

"stopwatch(expression)" if expression is 0.0, resets the stopwatch. Otherwise, returns the elapsed time in seconds since the last reset (or since system boot time).

"system(string_expression)" executes the string expression as if it were a command line.

BUILT IN STRING FUNCTIONS

ANNOTATION

"match(pattern)" returns string holding text of annotation(s) matched by pattern, or ERROR string.

"matchn(pattern, n)" returns string holding text of nth match to annotation(s) by pattern, or ERROR string.

ITEM

"history(itemtype)" returns the history field of the currently selected item of the given type, or ERROR string.

STRING

"istr(expression)" returns string equivalent of integer part of expression, or ERROR string.

"str(expression,field,precision)" returns string expression of real number with given field width and decimal precision, or ERROR string.

"char(expression)" returns a string containing the single character with the ASCII value of the expression, or ERROR string if outside ASCII range.
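These conversions correspond roughly to the Python functions below. The sketch assumes three things: "integer part" means truncation toward zero (as with trunc()), str() right-justifies like C's "%*.*f", and the ASCII range is 0 to 127.

```python
import math

ERROR_STRING = "E=R=R=O=R"   # string ERROR value from this manual


def istr(x):
    """String of the integer part of x (truncated toward zero)."""
    return ERROR_STRING if x is None else str(math.trunc(x))


def sml_str(x, field, precision):
    """x formatted in `field` columns with `precision` decimals."""
    return ERROR_STRING if x is None else "%*.*f" % (field, precision, x)


def char(x):
    """Single-character string with ASCII code x."""
    if x is None or not 0 <= x <= 127:
        return ERROR_STRING
    return chr(int(x))
```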

GRAPH PLOTTING

Facilities for producing simple x-y plots are provided by a number of simple procedures. The graph plotting is implemented using Device-Independent Graphics (see DIG(3)) so that graphs may be piped to printing devices, stored in files and edited. Appropriate graphics commands for the executing terminal are determined automatically. The functions are:

"plot(file,graphno,ydata,ynum)" produces a simple line graph of an array of numbers, where "file" is a file variable determining the output device/channel for the graph - use "stdout" for the current terminal; "graphno" is the number of the current graph, facilities are provided for producing a page containing more than one graph (see below) in which case graphs are numbered left-to-right and top-to-bottom starting at 1; "ydata" is an array of var containing the y axis values, and "ynum" is the number of y values. This routine is all you need to create a graph - the following functions merely alter the default format. To plot more than one line on the same graph simply repeat the graph number. To plot different graphs with the same graph number, call plotclear() between plots.

"plottitle(file,string)" adds an overall title to a page of graphs, where "file" is the graphing output device/channel and "string" is the title.

"plotaxes(file,graphno,xmin,xmax,ymin,ymax)" draws a specific set of axes, to over-ride the default axes calculation of plot(). "file" is the output device/channel, "graphno" is the number of the graph to be plotted, and the other values set the minimum and maximum values for the x and y axes.

"plotclear(file)" clears the screen or selects a new output page for a new set of graphs. "file" is the output device/channel.

"plotxdata(xdata,flag)" specifies the x co-ordinates to be used for plotting the y-data in plot(). There are three possible settings for "flag":

flag = 0, xdata[] contains all the 
          x-coordinates required.
flag = 1, xdata[1] is xmin, 
          xdata[2] is xmax.
flag = 2, xdata[1] is xmin, 
          xdata[2] is stepsize on x axis.
The default setting is: xdata[1]=0, xdata[2]=1, flag=2.
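The three flag settings can be modelled as a Python function that produces the x value for each of the ynum y values. For flag 1, the manual does not state the spacing, so even spacing with both endpoints included is an assumption. The list is 0-based here, so SML's xdata[1] corresponds to xdata[0].

```python
def x_coordinates(xdata, flag, ynum):
    """x values that plot() would pair with ynum y values.

    flag 0: xdata holds every x value.
    flag 1: xdata[0]=xmin, xdata[1]=xmax (even spacing ASSUMED, endpoints included).
    flag 2: xdata[0]=xmin, xdata[1]=step.
    """
    if flag == 0:
        return list(xdata[:ynum])
    xmin, second = xdata[0], xdata[1]
    if flag == 1:
        step = (second - xmin) / (ynum - 1) if ynum > 1 else 0.0
    elif flag == 2:
        step = second
    else:
        raise ValueError("flag must be 0, 1 or 2")
    return [xmin + i * step for i in range(ynum)]
```

With the default setting (0, 1, flag 2), the y values are plotted at x = 0, 1, 2, and so on.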

"plotparam(string)" provides control over the many optional graph formats. "string" is a setting of a plotting parameter in the format "<parameter>=<value>", for example "box=no" or "horizontal=2". Values are maintained for all future graphs in the program unless explicitly reset. The available parameters are:

vertical=<num>      number of graphs 
                    vertically
horizontal=<num>    number of graphs 
                    horizontally
type=<point/line/hist/histend/histdiv/bar>
                    type of graph
char=<character>    character to plot 
                    points with
box=<yes/no>        plot graph in box
xzero=<yes/no>      x axis goes to zero
yzero=<yes/no>      y axis goes to zero
xpos=<top/bottom>   x axis at top/bottom
ypos=<left/right>   y axis at left/right
xlabel=<top/bottom> x label at top/bottom
ylabel=<left/right> y label at left/right
xscale=<yes/no>     draw x scale
yscale=<yes/no>     draw y scale
xlog=<yes/no>       make x axis on 
                    logarithmic scale
ylog=<yes/no>       make y axis on 
                    logarithmic scale
equal=<yes/no>      make axes equal
xtitle=<string>     specify x axis label
ytitle=<string>     specify y axis label
title=<string>      specify graph title
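The "<parameter>=<value>" convention can be validated against the table above with a small Python parser. The function and the dictionary of rules are hypothetical, and so is the treatment of whitespace (none is stripped). Settings accumulate in one dictionary, mirroring the way values persist for all future graphs.

```python
YES_NO = {"yes", "no"}
ALLOWED = {
    "vertical": int, "horizontal": int,
    "type": {"point", "line", "hist", "histend", "histdiv", "bar"},
    "char": str,
    "box": YES_NO, "xzero": YES_NO, "yzero": YES_NO,
    "xpos": {"top", "bottom"}, "ypos": {"left", "right"},
    "xlabel": {"top", "bottom"}, "ylabel": {"left", "right"},
    "xscale": YES_NO, "yscale": YES_NO,
    "xlog": YES_NO, "ylog": YES_NO, "equal": YES_NO,
    "xtitle": str, "ytitle": str, "title": str,
}


def apply_plotparam(settings: dict, setting: str) -> dict:
    """Parse '<parameter>=<value>' and store it; earlier values persist."""
    name, sep, value = setting.partition("=")
    if not sep or name not in ALLOWED:
        raise ValueError("unknown plot parameter: " + setting)
    rule = ALLOWED[name]
    if rule is int:
        settings[name] = int(value)
    elif isinstance(rule, set):
        if value not in rule:
            raise ValueError("bad value for %s: %s" % (name, value))
        settings[name] = value
    else:                                    # free text
        if name == "char" and len(value) != 1:
            raise ValueError("char needs a single character")
        settings[name] = value
    return settings
```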

"plotwait(file)" Waits for a key-press or button-press before continuing. Useful if you want to pause between different plots. Does nothing if output is not directed to the screen. On some systems there is interaction between this and normal keyboard input, so only use one or the other.

"plotend(file)" Closes down graphics after plotting. This is done automatically at the end of the program anyway, so it is only needed by programs that print text output after displaying graphs.

SCRIPTING SUPPORT

From version 4, SML supports a scripting interface to individual SFS data sets in files. The functions allow whole data sets to be loaded and analysed, to create new data sets, and to write data sets to SFS files. Whole data sets are referenced through a new type of variable called an 'item' variable. These variables must be declared with global scope (similar to file variables), for example
	item spitem
	item fxitem

The following functions are available to manipulate item variables:

sfsgetitem(item,filename,itemstring) Reads an item specified by 'itemstring' in file 'filename' into item variable 'item'. For example: sfsgetitem(spitem,"myfile.sfs","1.01"). Returns 0 on success, ERROR on error.

sfsdupitem(item1,item2) Makes a copy of item stored in variable 'item2' into variable 'item1'. A byproduct of the copy is that the history string is reset to refer to the current script. This allows the item to be saved directly back into the file without overwriting the original data set. Returns 0 on success, ERROR on error.

sfsnewitem(item,datatype,frameduration,offset,framesize,numframes) Creates a new empty item in variable 'item' of type 'datatype' (SP, LX, TX, FX, etc.). The time interval associated with each frame is set to 'frameduration', and the overall time offset of the item is set to 'offset'. Each frame is made up of 'framesize' basic elements, and room is reserved for 'numframes' frames of data. Although 'numframes' cannot be dynamically expanded, not all of the frames allocated by sfsnewitem() need to be written to a file with sfsputitem(). The function sets the history to a default value based on the name of the script and the type of the output item. For example: sfsnewitem(fxitem,FX,0.01,0.0,1,1000). Returns 0 on success, ERROR on error.

sfsputitem(item,filename,numframes) Stores the first 'numframes' frames of data in the data set referred to by 'item' into the file 'filename'. Note that when a data set is saved to an SFS file, any existing data set with the same history string is replaced. Returns 0 on success, ERROR on error.

sfsgetparam(item,param) Gets the value of a numerical parameter with name 'param' from the data set header referred to by 'item'. Available parameters are: "numframes", "frameduration", "offset", "framesize", and "itemno". Returns value of parameter or ERROR.

sfsgetparamstring(item,param) Gets the value of a string parameter with name 'param' from the data set header referred to by 'item'. Available parameters are: "history", "params", "processdate", and "itemno". Returns string value of parameter or ERROR.

sfsgetdata(item,frameno,index) Returns a value from the data set referred to by 'item'. The value is taken at offset 'index' in frame number 'frameno'. Frame numbers range from 0 to sfsgetparam(item,"numframes")-1. Index numbers range from 0 to sfsgetparam(item,"framesize")-1. Returns value from data set or ERROR.

sfsgetstring(item,frameno) Returns a string value from frame 'frameno' of the data set referred to by 'item'. Frame numbers range from 0 to sfsgetparam(item,"numframes")-1. Use this to load annotation labels. Returns the string value or the ERROR string.

sfsgetfield(item,frameno,field) Returns a value from the frame header for structured data types. The frame is referred to by number 'frameno', and the field by number 'field'. For example, for CO data sets, field 0 is the frame position, field 1 is the frame length, field 2 is the voicing flag, field 3 is the voicing mixture and field 4 is the frame gain. Returns a value from a frame header or ERROR.

sfsgetarray(item,start,count,array) Loads a section of any 1-dimensional item into an array. Data is copied from the waveform or track referred to by 'item' starting at offset 'start' for 'count' samples into array 'array'. The destination array should be big enough to take the number of samples, otherwise only enough samples to fill the array are transferred. Returns the number of values copied or ERROR.

sfssetdata(item,frameno,index,value) Stores a particular numerical expression 'value' into a data set referred to by 'item' at frame number 'frameno' at frame offset 'index'. See sfsgetdata() for an explanation of frame and index numbering. Returns value stored or ERROR.
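The frame and index numbering used by sfsgetdata(), sfssetdata() and sfsgetarray() can be illustrated with a hypothetical in-memory Python model. This is not the SFS file format. Two details are assumptions: "1-dimensional item" is read as one value per frame, and the destination array is filled from its first element.

```python
class ItemModel:
    """Hypothetical in-memory stand-in for an SML 'item' variable.

    Frames are numbered 0..numframes-1 and indexes 0..framesize-1.
    None stands for SML's ERROR value.
    """

    def __init__(self, numframes: int, framesize: int):
        self.numframes = numframes
        self.framesize = framesize
        self.data = [[0.0] * framesize for _ in range(numframes)]

    def _valid(self, frameno: int, index: int) -> bool:
        return 0 <= frameno < self.numframes and 0 <= index < self.framesize

    def getdata(self, frameno, index):
        """Like sfsgetdata(): value at (frameno, index), or ERROR."""
        if not self._valid(frameno, index):
            return None
        return self.data[frameno][index]

    def setdata(self, frameno, index, value):
        """Like sfssetdata(): store value and return it, or ERROR."""
        if value is None or not self._valid(frameno, index):
            return None
        self.data[frameno][index] = value
        return value

    def getarray(self, start, count, dest):
        """Like sfsgetarray() for a one-value-per-frame item: copy at most
        count samples (and no more than dest can hold) starting at frame
        'start'. Returns the number copied, or ERROR."""
        if self.framesize != 1 or not 0 <= start < self.numframes:
            return None
        n = max(0, min(count, len(dest), self.numframes - start))
        for k in range(n):
            dest[k] = self.data[start + k][0]
        return n
```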

sfssetfield(item,frameno,field,value) Stores a particular numerical expression 'value' into the frame header of a frame number 'frameno' of data set 'item' at field position 'field'. See sfsgetfield() for details about field numbering. Returns value stored or ERROR.

sfssetstring(item,frameno,string) Stores a string expression into frame 'frameno' of the data set referred to by 'item'. Use this function to set annotation labels. Returns 0 on success, or ERROR.

sfsprocessitem(item1,progname,item2,rettype) Processes the data set referred to by 'item2' using the program and arguments in 'progname' and optionally loads a resultant data set of type 'rettype' into output item variable 'item1'. This function first saves item2 to a temporary file and runs the specified program on it. If 'rettype' is not an empty string then it is used to select the item to be loaded back in to item1. For example, to filter a speech signal in item ipitem to item opitem: use sfsprocessitem(opitem,"genfilt -l1000",ipitem,"sp").

EXAMPLE SCRIPT

/* spblock - example of block processing of speech */

item sp;		/* input speech item */
item co;		/* output spectral coefficients */
var window[0:10000];	/* input window */
var mag[0:10000];	/* spectral magnitudes */
var phase[0:10000];	/* spectral phases */

main {
	var	numf;
	var	fsize;
	var	fdur;
	var	i,j,f;
	var	xsize,cnt;

	/* load speech item from current file */
	sfsgetitem(sp,$filename,"sp.");

	/* get processing parameters */
	numf = sfsgetparam(sp,"numframes");
	fdur = sfsgetparam(sp,"frameduration");
	fsize = 0.025/fdur;	/* 25ms window */
	xsize = 16;		/* FFT size */			
	while (xsize < fsize) xsize = xsize*2;
	xsize = xsize/2;

	/* make up a coefficients item */
	sfsnewitem(co,CO,fdur,0,xsize,1+2*numf/fsize);

	/* process in blocks */
	f=0;
	for (i=0;(i+fsize)<numf;i=i+fsize/2) {
		/* get window */
		sfsgetarray(sp,i,fsize,window);

		/* perform FFT */
		cnt=fft(window,fsize,mag,phase);
		if (cnt!=xsize) abort("size error");

		/* store in frame */
		sfssetfield(co,f,0,i);
		sfssetfield(co,f,1,fsize);
		sfssetfield(co,f,2,0);
		sfssetfield(co,f,3,0);
		sfssetfield(co,f,4,0);
		for (j=0;j<xsize;j=j+1) {
			sfssetdata(co,f,j,20*log10(mag[j]));
		}
		f=f+1;
	}

	/* save spectral coefficients back to file */	
	sfsputitem(co,$filename,f);
}
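The sizing arithmetic in spblock is easy to check outside SML. The Python sketch below mirrors three pieces of the script: the FFT output size, the window start positions (50% overlap) and the number of frames reserved by sfsnewitem(). It takes the window length fsize (in samples) as given. In the script fsize is 0.025/fdur and need not be a whole number.

```python
def block_plan(numf, fsize):
    """Mirror the sizing logic of spblock for a window of fsize samples."""
    xsize = 16                          # FFT size search starts at 16
    while xsize < fsize:
        xsize *= 2                      # smallest power of two >= fsize
    xsize //= 2                         # number of values fft() is expected to return
    starts = []
    i = 0
    while i + fsize < numf:             # same test as the for loop
        starts.append(i)
        i += fsize / 2                  # 50% overlap between windows
    reserved = 1 + 2 * numf / fsize     # frames reserved by sfsnewitem()
    return xsize, starts, reserved
```

For example, 1000 samples with a 250-sample window give an expected fft() count of 128 and six windows, within the nine frames reserved.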

PATTERN MATCHING

Pattern matching in SML is performed using "regular expressions": strings of characters that define a possible match to a simple string. In functions like "time" and "match" and in the procedural statement "with", patterns are matched against annotations in the current database file. A sequence of annotations may be matched by using the special string concatenation operator ">>" between regular expressions. Thus the pattern "[ptk]" matches any annotation starting with "p", "t" or "k", but the pattern "s">>"p">>"l" matches three annotations which must begin with "s", "p" and "l" respectively. The same effect occurs when ">>" is embedded in the string: "s>>p>>l".

The rules for forming regular expressions are as follows:

"." matches any single character.

"$" matches the end of the annotation.

"[]" enclose alternative characters (can use "-" to mean "through", as in [a-f] meaning [abcdef]).

"+" repeat the previous character match one or more times.

"*" repeat the previous character match zero or more times.

"{m}" repeat the previous character match exactly m times.

"{m,}" repeat the previous character match m or more times.

"{m,u}" repeat the previous character match at least m times and at most u times.

"()" use to group operand of "+", "*" or "{}" operators.

Characters other than the special characters above must match the annotation exactly. To match an annotation containing one of the special characters, precede it with "\".

By default, all matches start at the beginning of the annotation string. NOTE that the function "index" and the "pattern" case of a switch statement take a full regular expression that is not so constrained. Use "^" to anchor a match to the beginning of the string in "index" or "pattern".
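The following Python sketch models annotation matching. It splits the pattern on ">>" and requires each part to match consecutive annotation labels from the start of each label. It uses Python's `re.match`, which anchors at the start, and Python regex syntax stands in for SML's. The label list and the `match_sequence` helper are hypothetical. Timing information is ignored.

```python
import re


def match_sequence(pattern, labels):
    """Index of the first annotation at which the whole ">>" sequence
    matches, or None. Each part is anchored at the start of its label."""
    parts = pattern.split(">>")
    for start in range(len(labels) - len(parts) + 1):
        if all(re.match(p, labels[start + k]) for k, p in enumerate(parts)):
            return start
    return None
```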

EXAMPLE PROGRAM


/* example -- find mean slopes fitted to fx contours */
/* function to do line fitting */
function var fit(f,num)
var f[]
{
	var	num,i
	stat	x,y,xy
	for (i=1;i<=num;i=i+1) {
		x += i
		y += f[i]
		xy += i*f[i]
	}
	return((num*xy.sum-x.sum*y.sum)/(num*x.sumsq-x.sum*x.sum))
}
/* global data */
var	freq[1:1000]	/* fundamental frequency contour */
var	timestep	/* time axis of slope */
string	annotation	/* annotation to match */
stat	slopes		/* statistics of slopes found */
/* initialise */
init {
	timestep=0.01
	annotation="\$NUC" >> "[aeiou]"
}
/* main processing */
main {
	var	i,j,num,end
	if (select(AN) && select(FX)) {
		for (i=1;timen(annotation,i);i=i+1) withn(annotation,i) {
			num=0
			end=time(annotation) + length(annotation)
			for (j=time(".");j<end;j=j+timestep) {
				num=num+1
				freq[num]=fx(j)
			}
			slopes += fit(freq,num)/timestep
		}
	}
}
/* summary processing */
summary {
	print "Number of matches=",slopes.count:1,"\n"
	print "Mean slope= ",slopes.mean:1
	print " +/- ",slopes.stddev:1:1," Hz/s\n"
}
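The fit() function above is an ordinary least-squares slope of f[i] against i. Here is a Python equivalent for checking values offline. It assumes that a zero denominator (fewer than two points) yields ERROR in SML, modelled as None. As in the main procedure, dividing the result by timestep converts the slope per sample into Hz/s.

```python
def fit(f):
    """Least-squares slope of f[1..num] against 1..num (per sample)."""
    num = len(f)
    x = range(1, num + 1)
    sx, sy = sum(x), sum(f)
    sxx = sum(i * i for i in x)
    sxy = sum(i * v for i, v in zip(x, f))
    denom = num * sxx - sx * sx
    if denom == 0:
        return None                  # modelled as SML's ERROR
    return (num * sxy - sx * sy) / denom
```

For example, a contour of 100, 102 and 104 Hz sampled every 0.01 s gives a slope of 2 per sample, i.e. 200 Hz/s.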

FILES

$SFSBASE/data/phon.dic - default phonetic dictionary

VERSION/AUTHOR

4.2 - Mark Huckvale
Fri Jul 09 14:54:50 2004