SFS Manual for Users
4. SFS Data Sets
In the following sections we shall look in a little more detail at the different item types, describing what information is stored for each type, how types are typically converted to other types, and how instances of a data type may be automatically given a text label that can be more meaningful than the processing history.
Types in detail
In the table below, the standard data format for each item type is described. Component parts of data records named in parentheses, e.g. (posn), refer to labels used by sdump when listing the contents of a data set. More technical descriptions of the data records are given in section 4 of the Programmer's Manual.
This section gives a brief guide to the types of data set interconversion possible using the SFS data types. For most of the major types, the table below gives names to some potential interconversion processes. Example program names are given, but these do not form a definitive list, nor are they a set of recommendations.
There are, of course, many other possible interconversions and other existing programs for the interconversions shown.
Text labels & 'slook'
Each data set is produced by a processing program, and each processing program generates a processing history for each data set it produces. The expanded history of a data set is a construction of all of the processing histories of all the data sets on the processing path. Thus an FX item may have the history:
Item 4.01 fx(3.01)
That is, it was generated by the program fx operating upon item number 3.01. Item 3.01 is a TX item and it might have the history:
Item 3.01 HQtx(2.01)
That is, it was generated by the program HQtx operating upon item 2.01. This was an LX item which, let us say, had the history:
Item 2.01 inwd(freq=12800)
The expanded histories are formed by replacing each item number in these histories with the history string of the item it refers to. Thus for the items above the expanded histories are:
Item 3.01 HQtx(inwd(freq=12800))
Item 4.01 fx(HQtx(inwd(freq=12800)))
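The substitution can be sketched in a few lines of Python. This is an illustration only, not the SFS implementation: the table 'histories' and the function 'expand' are hypothetical names, and the sketch assumes that an item number always has the form of digits, a dot, and two further digits.

```python
import re

# Hypothetical in-memory table: item number -> unexpanded processing history.
histories = {
    "2.01": "inwd(freq=12800)",
    "3.01": "HQtx(2.01)",
    "4.01": "fx(3.01)",
}

# Assumed shape of an item number inside a history string, e.g. "3.01".
ITEM_REF = re.compile(r"\b\d+\.\d\d\b")


def expand(item, table):
    """Return the history of 'item' with every item number replaced,
    recursively, by the expanded history of the item it refers to."""
    def substitute(m):
        ref = m.group(0)
        # Numbers that are not known items are left as they are.
        return expand(ref, table) if ref in table else ref

    return ITEM_REF.sub(substitute, table[item])


print("Item 3.01", expand("3.01", histories))
print("Item 4.01", expand("4.01", histories))
```

A parameter value that happens to look like an item number would be misread by this sketch, which is one reason to treat it as a model of the idea and nothing more.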
To examine the expanded histories for items in a given file, use the '-l' switch (for 'long') on the program summary:
% summary -l testfile
1. SPEECH (1.01) 16640 frames from inwd/SP(freq=12820,linked)
2. LX (2.01) 16640 frames from inwd/LX(freq=12820,linked)
3. TX (3.01) 75 frames from tx(inwd/LX(freq=12820,linked); thresh=1,height=4)
The expanded histories give a full account of the processing performed on each data set independently from other items in the file, but they are difficult to read. SFS contains a mechanism called text labelling for generating text descriptions of data sets from the expanded history. These text labels are useful in providing titles for graphs as well as for keeping track of the data sets in a particularly complex file. Text labels are used by Ds and pick to provide simple descriptions of data sets. To list the text labels for the items in a file, use the program slook:
% slook testfile
1. SPEECH (SP.01) 16640 frames of natural speech
2. LX (LX.01) 16640 frames of natural lx
3. TX (TX.01) 75 frames of tx from lx
The text labelling mechanism can be customised by the user to incorporate new programs or to give different amounts of technical information in the label. For example, the text label for item SP.01 in the above example could include the sampling frequency. The rest of this section deals with changing the default set of text labels.
Text Label Customisation
The mapping from expanded history to text descriptions is performed using one or more files of pattern matching information; these are called label files. The system label file, $SFSBASE/data/labels, provides simple text descriptions for the most common speech processing programs used with SFS. It can be supplemented with label files of your own, or of your work group, which can take precedence over the system label file. These files are searched for text descriptions of expanded histories by SFS routines built into programs such as Ds and slook.
The format of each line in a label file is a pattern-match string followed by a text-replacement string, optionally followed by a number of item history codes. The fields in the line are separated by ':'. The item history codes are short-hand tags for the pattern-match string that can be used to locate items in the file using the standard '-i item' convention in command lines (see itspec).
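As a rough illustration of this line format, the following Python sketch splits one label file line into its fields. The function name 'parse_label_line' is hypothetical, and the history codes 'code1' and 'code2' in the usage lines are placeholders rather than real SFS codes. The sketch assumes that ':' never occurs inside a field.

```python
def parse_label_line(line):
    """Split a label file line into (pattern, text, history_codes).

    Assumes the layout pattern:text[:code...] with ':' as the only
    field separator; a trailing ':' simply yields no codes.
    """
    fields = line.rstrip("\n").split(":")
    if len(fields) < 2:
        raise ValueError("expected at least a pattern and a text field")
    pattern, text = fields[0], fields[1]
    codes = [f for f in fields[2:] if f]  # drop empty trailing fields
    return pattern, text, codes


print(parse_label_line("tx(*):Tx from Lx:"))
print(parse_label_line("tx(*):Tx from Lx:code1:code2"))
```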
The pattern-match strings are constructed in the format described by histmatch, which allows the use of '*' to match zero or more characters, and '?' to match a single character. The patterns '%%', '%0' and '%1' match any substring that contains matching parentheses; the last two constructions allow the matching substrings to be returned for use in the text description (see below). To create a text description from an expanded history, the pattern matches in the label file are tested against the history string in turn until a match is found. If no '%' type matches are found, the text description of the history is simply the text string following the first pattern match. If '%' type pattern matches are used, the substrings located may themselves be searched in the label file to generate text description sub-strings. These operations may be seen in the examples below.
Given the expanded history:
the label file line
tx(*):Tx from Lx:
would generate the text label
Tx from Lx
While the label file line:
tx(*;thresh=%0):Tx with threshold %s:
would generate the text label:
Tx with threshold 4
That is, the matching sub-string of the '%' type match, namely '4' is substituted in the text description at the point indicated by the marker '%s'. A maximum of two substrings may be extracted from the expanded history using '%0' and '%1'. The use of '%%' is indicated by the following example; given the expanded history:
the label file line:
proc1(*;thresh=%0,*):proc1 with threshold %s:
would not be guaranteed to locate the correct match to the pattern '%0' since there are two possible matches. Since the pattern '%%' is guaranteed to match a substring containing matching parentheses, the label file line
proc1(%%;thresh=%0,*):proc1 with threshold %s:
would produce the required text label
proc1 with threshold 2
The recursive matching of substrings is demonstrated in the following example. Given the expanded history:
scopy(inwd/LX(freq=12800))
and the label file lines
inwd/LX(freq=%0):%sHz Lx:
scopy(%0):copy of %s:
the text label would be
copy of 12800Hz Lx
The first match is to the second line, matching '%0' with 'inwd/LX(freq=12800)'. This is then matched again to derive the label '12800Hz Lx' which is substituted into the text label for the first match.
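The matching rules used in these examples can be modelled with a small Python sketch. It is not the histmatch code and makes several assumptions: the functions 'balanced', 'match' and 'label' are hypothetical names, wildcards try the shortest candidate first, successive '%s' markers take the '%0' and '%1' substrings in that order, and a substring for which no pattern matches is used unchanged. The 'tx' and 'proc1' histories in the usage lines are invented for illustration; the 'scopy' history is the one implied by the explanation above.

```python
def balanced(s):
    """True if every parenthesis in s is matched within s."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def match(pattern, text, caps=None):
    """Return a dict of '%0'/'%1' captures if pattern matches the whole
    text, else None. Candidates are tried shortest first (an assumption)."""
    caps = {} if caps is None else caps
    if not pattern:
        return caps if not text else None
    if pattern[0] == "%" and len(pattern) > 1 and pattern[1] in "%01":
        key, rest = pattern[1], pattern[2:]
        for end in range(len(text) + 1):
            sub = text[:end]
            if balanced(sub):
                new = caps if key == "%" else {**caps, key: sub}
                result = match(rest, text[end:], new)
                if result is not None:
                    return result
        return None
    if pattern[0] == "*":
        for end in range(len(text) + 1):
            result = match(pattern[1:], text[end:], caps)
            if result is not None:
                return result
        return None
    if text and (pattern[0] == "?" or pattern[0] == text[0]):
        return match(pattern[1:], text[1:], caps)
    return None


def label(history, rules):
    """Build a text label from the first matching (pattern, text) rule,
    labelling captured substrings recursively."""
    for pattern, text in rules:
        caps = match(pattern, history)
        if caps is None:
            continue
        for key in sorted(caps):
            sub = caps[key]
            # Guard against a capture equal to the whole history.
            sub_label = label(sub, rules) if sub != history else sub
            text = text.replace("%s", sub_label, 1)
        return text
    return history  # no rule matched: use the string unchanged


rules = [("inwd/LX(freq=%0)", "%sHz Lx"), ("scopy(%0)", "copy of %s")]
print(label("scopy(inwd/LX(freq=12800))", rules))

# Invented histories, for illustration only.
print(match("tx(*;thresh=%0)", "tx(inwd/LX(freq=12800);thresh=4)"))
nested = "proc1(proc2(inwd/LX(freq=12800);thresh=1,height=4);thresh=2,height=3)"
print(match("proc1(*;thresh=%0,*)", nested))
print(match("proc1(%%;thresh=%0,*)", nested))
```

With the invented nested history, the '*' form in this sketch stops at the inner 'thresh' while the '%%' form skips the whole parenthesised sub-history and reaches the outer one, which is the ambiguity the 'proc1' example describes.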
The main body of a label file consists of pattern match lines as described above, and detailed in LABELS. However, to speed up access to these files, indexing information must be added to them. This indexing of label files must be repeated after every change to the file. If it is not, the label searching routines print a warning such as 'label file out of date'. The indexing program is called prolab and is run simply by prolab filename.
The environment variable SFSLABEL controls the selection of label files used in the matching process. If SFSLABEL is not defined, only the system label file is used. If SFSLABEL is defined, it is taken to be a list of label files separated by ':' in the same manner as PATH and SFSPATH. The label files are searched in the order in the variable, and normally the system label file will appear last in the list. In the following example, a user's own label file is created and used to pre-empt the text label attached by the system label file.
% echo $SFSLABEL
'SFSLABEL' undefined
% summary file
1. SPEECH (1.01) 16640 frames from inwd/SP(freq=12820,linked)
2. LX (2.01) 16640 frames from inwd/LX(freq=12820,linked)
% slook file
1. SPEECH (SP.01) 16640 frames of natural speech
2. LX (LX.01) 16640 frames of natural lx
% cat /usr/mark/.labels
inwd/SP(freq=%0,*):natural speech at %sHz:
% prolab /usr/mark/.labels
% setenv SFSLABEL /usr/mark/.labels:/usr/sfs/data/labels
% slook file
1. SPEECH (SP.01) 16640 frames of natural speech at 12800Hz
2. LX (LX.01) 16640 frames of natural lx
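The way SFSLABEL selects label files can be sketched in Python as follows. The function name 'label_files' is hypothetical, the fallback directory /usr/sfs is taken from the session above rather than from any SFS default, and treating an empty SFSLABEL like an undefined one is an assumption.

```python
import os


def label_files(environ=None):
    """Return the label files to search, in search order.

    An undefined (or, by assumption, empty) SFSLABEL selects only the
    system label file; otherwise SFSLABEL is split on ':' like PATH.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("SFSLABEL")
    if not value:
        base = environ.get("SFSBASE", "/usr/sfs")  # assumed fallback
        return [base + "/data/labels"]
    return [path for path in value.split(":") if path]


print(label_files({"SFSBASE": "/usr/sfs"}))
print(label_files({"SFSLABEL": "/usr/mark/.labels:/usr/sfs/data/labels"}))
```

Because the files are searched in list order, a pattern in the user's file is found before any pattern for the same history in the system file.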
© 2000 Mark Huckvale, University College London