SFS Manual for Users

2. SFS Files

2.1 File construction

SFS files are distinct from other file types; they are created and processed by SFS utilities and processing programs that have been written using the SFS routine library. Programs written to access SFS files can discriminate between them and other file types.

To the user an SFS file is simply a name in a directory, but internally the files have a well-defined structure that is manipulated by SFS utilities and speech processing programs. An SFS file consists of a Main Header and a sequence of Items. The Main Header records information about the source of the file itself (who created it, where and when, for example) and (optionally) information about the source of the data in the file (what it is, who spoke it, where, when). The Items are individual data sets (a piece of speech, some formants, a fundamental frequency trace, etc) prefixed with a record of the source of the data set - a structure called the Item Header. The Item Header records information about how the data set was created (by which program, using which parameters) and the format of the data set itself (what type of data, how much, etc).

2.2 Main Header

An 'empty' SFS file contains only a main header. Empty SFS files can be created with the header editor hed which allows the user to enter into the main header any particular information about the source of the file that needs to be kept with the file. The contents of the main header can be displayed using the '-h' switch to the program sdump, as seen in section 1.1.

Within the main header there are two types of information: that which is automatically provided by hed, and that which is provided by the user:

Automatically set information:

Header Name and Version: These are internal SFS fields set to keep track of changes in the layout of database files.

File creation date: Automatically set to the date and time at which the file was created.

File identifier: Automatically set to a unique code, tying the file back to a particular computer at a particular site. This is stored in the file $SFSBASE/dbase.id if this file exists and is writable.

Creator's username: Automatically set to the name of the user that created the file.

Alterable information:

Source of recording: Initially set to the site name, this field should be used to identify the institution where the recording was made: e.g. 'ucl', 'sru', 'npl', etc.

Name of database: Initially set to 'temp', this field should be used to identify recordings that form part of a larger organisation of recordings collected for a specific purpose.

Speaker identifier: Initially set to be the same as the creator's username, this field should contain information identifying the speaker of the token, where this is applicable. Theoretically, it is illegal to hold any information about an individual on a computer system unless the system is registered with the Data Protection Agency. It may be sufficient to use a coding scheme so that individuals cannot be identified from the database alone.

Session code: Where a database has been recorded by the same speaker on different occasions, use the session code field to identify the session. There is no initial value.

Session date: This field can be used to identify different sessions, or simply to specify the date of the recording. The preferred format for dates is DD-MMM-YY HH:MM, for example '09-NOV-85 11:05'. There is no initial value.
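The preferred date format can be checked mechanically before it is entered. The following Python sketch is illustrative only and is not part of SFS: the function name parse_session_date, the optional time part (suggested by the date-only value '10-JUN-86' in the sdump example of section 2.5) and the default century of 1900 are all assumptions.

```python
import re
from datetime import datetime

# Month abbreviations as used in the manual's example ('09-NOV-85 11:05').
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# Assumption: the HH:MM part may be omitted, as in '10-JUN-86'.
_DATE_RE = re.compile(r"^(\d{2})-([A-Za-z]{3})-(\d{2})(?: (\d{2}):(\d{2}))?$")

def parse_session_date(text, century=1900):
    """Parse 'DD-MMM-YY[ HH:MM]'; raise ValueError if the text does not fit."""
    m = _DATE_RE.match(text.strip())
    if m is None:
        raise ValueError("not in DD-MMM-YY HH:MM form: %r" % text)
    day, mon, year, hour, minute = m.groups()
    if mon.upper() not in MONTHS:
        raise ValueError("unknown month: %r" % mon)
    month = MONTHS.index(mon.upper()) + 1
    # Assumption: two-digit years belong to the given century.
    return datetime(century + int(year), month, int(day),
                    int(hour or 0), int(minute or 0))
```

A call such as parse_session_date('09-NOV-85 11:05') returns a datetime value, while a malformed entry raises ValueError.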

Name of token: This field is used to identify tokens in printouts, displays and statistical analyses of the data. A suitable entry might be a transcription of the token either orthographically or in terms of a machine-readable phonetic alphabet. It might also be used to identify a standard passage, e.g. 'Arthur the rat'. There is no initial value.

Token repetition code: This field should be used to discriminate separate instances of the same token recorded during the same session.

Recording environment: This field is reserved for later exploitation of details about the equipment used in the recording. It is intended that a coding system for microphone, amplifier, tape deck, etc be devised, and that this field will be filled from menu selections.

General comments: Comments about the recording or the token can be entered in this field. Examples might be tape counter readings, or more speaker details, or special characteristics of this token. The initial value is 'default header'; this should be changed or deleted when the header is edited.

The contents of the main header fields in an SFS file can be modified with the Header Editor program hed. Executing hed with a file enters a simple form-filling dialogue for modifying fields in the main header. For example:

% hed newfile
new file
SPEECH FILE HEADER EDITOR
Fill in the details below.
Type <details><RETURN> to enter details.
Type <RETURN> to skip question.
Type !<number> to jump to question.
Type <SPACE><RETURN> to delete an answer.
1) source of recording [uc1] : _

The user then has the option of entering a new value for the field and typing [RETURN], or of accepting the default value (shown in brackets) by typing [RETURN] only, e.g.

1) source of recording [uc1] : gec 
2) name of database [temp] : _

Replies are checked against maximum lengths allowed. Each question is numbered, and you can jump to a question by typing '!<number>', e.g.

2) name of database [temp] : !1
1) source of recording [gec] : _

To delete an existing entry, type [SPACE] and [RETURN], e.g.

1) source of recording [gec] : __ 
2) name of database [temp] : !1
1) source of recording [] : _

To jump to the end of the questions, type '!!', e.g.

1) source of recording [] : !! 
Is the data correct (y/n/q) ? y

To finish, type 'y' to save changes, 'n' to continue editing, or 'q' to quit, leaving the existing header unchanged.

The program hed also operates in a command-line mode in which the contents of a number of fields may be set in an SFS file in a single operation. More information about the operation of hed may be found in the manual page.

2.3 Item numbering

SFS files are made up of a main header - described in the previous section - and a number of items: data sets with headers. In the SFS organisation, each item is allocated a number, which uniquely identifies it within a file. The number has two parts: a major type and a minor subtype. The major type identifies the general contents of the data set: SPEECH, LX, TX, FX, ANNOTATION, etc; the subtype records multiple repetitions of the major type in the file.

Understanding item numbering is crucial to operating SFS utilities and speech processing programs. The operation of a given program is usually specified in terms of the input and output items (e.g. a fundamental frequency estimation program is said to transform a SPEECH item to an FX item). The processing of data is recorded using item numbers. The selection of items for display or processing is performed using item numbers.

The first part of an item number is the major type, expressed either as a number or as a short mnemonic. The most popular item types are described in the table below:

No.	Long name	Short name	Description
1	SPEECH	SP	Speech pressure waveform
2	LX	LX	Laryngograph waveform
3	TX	TX	Pitch period location
4	FX	FX	Fundamental frequency trace
5	ANNOT	AN	Annotations
6	PHONETIC	LP	(Lower-) Phonetic features
7	SYNTH	SY	Synthesizer control data
9	DISPLAY	DI	Grey-level display
11	COEFF	CO	Spectral coefficients
12	FORMANT	FM	Formant estimates
14	LPC	PC	Linear Prediction coefficients
16	TRACK	TR	Parameter tracks

The item sub-type is usually a two-digit number (with a leading zero if required) identifying multiple instances of each major type in a file. Thus the first speech item will have sub-type '01', the second '02', etc.

The format of the item number is thus <major type>.<sub-type>, where the major type is either the type number or the short name for the type. Thus the first speech item in a file can be referred to as either '1.01' or 'SP.01'. The third set of formant estimates will have item number '12.03' or 'FM.03', the tenth set of annotations: '5.10' or 'AN.10'. The short name for a type may be entered in upper or lower case.
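The item number format can be illustrated with a short Python sketch. It is not SFS code: the names ITEM_TYPES, parse_item_number and format_item_number are hypothetical, and the table holds only the types listed above.

```python
# Major types copied from the table above: number -> (long name, short name).
ITEM_TYPES = {
    1: ("SPEECH", "SP"), 2: ("LX", "LX"), 3: ("TX", "TX"), 4: ("FX", "FX"),
    5: ("ANNOT", "AN"), 6: ("PHONETIC", "LP"), 7: ("SYNTH", "SY"),
    9: ("DISPLAY", "DI"), 11: ("COEFF", "CO"), 12: ("FORMANT", "FM"),
    14: ("LPC", "PC"), 16: ("TRACK", "TR"),
}
SHORT_TO_NUMBER = {short: num for num, (_, short) in ITEM_TYPES.items()}

def parse_item_number(text):
    """Split '<major type>.<sub-type>' into (major number, sub-type number)."""
    major, sep, sub = text.partition(".")
    if not sep or not sub.isdigit():
        raise ValueError("expected <major type>.<sub-type>: %r" % text)
    if major.isdigit():
        number = int(major)
    else:
        # Short names may be given in upper or lower case.
        number = SHORT_TO_NUMBER.get(major.upper())
    if number not in ITEM_TYPES:
        raise ValueError("unknown major type: %r" % major)
    return number, int(sub)

def format_item_number(number, sub, short=False):
    """Render an item number as e.g. '12.03' or, with short=True, 'FM.03'."""
    major = ITEM_TYPES[number][1] if short else str(number)
    return "%s.%02d" % (major, sub)
```

With these definitions 'SP.01', 'sp.01' and '1.01' all parse to the same pair, which is the equivalence described above.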

All speech processing programs use the convention that input data sets may be selected by the command-line switch '-i<item number>'. Thus for a file containing two speech items, numbered 1.01 and 1.02, the command line:

% replay -i 1.01 file

will replay item number 1.01, while

% replay -i 1.02 file

will replay the second. Other formats of the item number are also acceptable, for example:

% replay -i 1.01 file
% replay -i SP.01 file
% replay -i sp.01 file

are all equivalent. In addition, there are simple item numbers for the last and first item of a given type in the file.

% replay -i 1 file
% replay -i sp file

will replay the last speech item in the file, while:

% replay -i 1. file
% replay -i sp. file

will replay the first speech item.
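The three forms of selector can be summarised in a Python sketch. This is only an illustration of the rules stated above, not the SFS implementation: resolve_selector is a hypothetical name, and the items in a file are represented here as a list of (major type, sub-type) pairs in file order.

```python
# Short names from the item type table in this section.
SHORT_NAMES = {"SP": 1, "LX": 2, "TX": 3, "FX": 4, "AN": 5, "LP": 6, "SY": 7,
               "DI": 9, "CO": 11, "FM": 12, "PC": 14, "TR": 16}

def resolve_selector(selector, items):
    """Pick one (major, sub) pair from `items`, a list in file order.

    'sp.01' or '1.01' -> that exact item
    'sp'    or '1'    -> the last item of that type
    'sp.'   or '1.'   -> the first item of that type
    Returns None when nothing matches.
    """
    major_text, dot, sub_text = selector.partition(".")
    if major_text.isdigit():
        major = int(major_text)
    else:
        major = SHORT_NAMES[major_text.upper()]
    same_type = [item for item in items if item[0] == major]
    if sub_text:
        # An explicit sub-type names exactly one item.
        wanted = (major, int(sub_text))
        return wanted if wanted in same_type else None
    if not same_type:
        return None
    # A trailing '.' asks for the first item, a bare type for the last.
    return same_type[0] if dot else same_type[-1]
```

For a file holding items 1.01, 11.01 and 1.02 in that order, the selector 'sp' resolves to 1.02 and 'sp.' to 1.01.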

There are also methods for specifying items by their processing history (see section 2.4) and by item history codes (see section 4.3).

2.4 Reading a 'summary'

Investigation of the contents of an SFS file may be made by the programs sdump and summary. In this section we shall look at the operation of the latter.

The program summary gives a list of the data sets present in a given file, ordered by time of creation, for example:

% summary file
1. SPEECH (1.01) 8960 frames from scopy(file=/usr/sfs/demo/life, 
item=1.02,history=agc(1.01))
2. COEFF -(11.01) 532 frames from spectran(1.01;window=8, overlap=6)
3. DISPLAY (9.01) 532 frames from dicode(11.01;dbr=50.00, nump=128)

Each line in the summary relates to a single item and is composed of 6 fields:

Item index: record of the index of the item in the file. Not to be confused (!) with the item number. Most useful when only some of the items are summarised (see below).

Item type name: text description of major item type.

Deletion flag: the presence of a '-' after the item type name indicates that the data set has been deleted (see section 3.3).

Item number: standard format item number, as 'd.dd' or 'dd.dd', where d=digit.

Data set length: in SFS units called frames. Each frame is associated with an interval of time. Thus a speech waveform has one frame per sample containing a single amplitude value, whereas a spectrogram may have one frame every 5ms or so containing a number of energies.

Processing history: a text record of the processing that created the data set, generated by the program that performed the processing. The history is recorded in a standard format according to the pattern:

<program name> ( <input item number list> ; <parameter list> )

Here the input item number list is a sequence of item numbers separated by commas, and the parameter list is a sequence of program parameter assignments, separated by commas. If the data set has no antecedents in the file, the item number list is left out - see the first entry in the summary above. If the processing program has no parameters, the parameter list is left out.
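The six fields of a summary line can be picked apart with a regular expression. The Python sketch below is an illustration only: the pattern is inferred from the example lines in this section, so the exact spacing is an assumption, and parse_summary_line is a hypothetical helper rather than an SFS routine.

```python
import re

# Layout inferred from the example lines in this section; real summary output
# may use different spacing, so treat this pattern as an assumption.
SUMMARY_LINE = re.compile(
    r"^(?P<index>\d+)\.\s+"                  # item index within the file
    r"(?P<type>\S+)\s+"                      # item type name
    r"(?P<deleted>-?)"                       # '-' marks a deleted data set
    r"\((?P<number>\d{1,2}\.\d{2})\)\s+"     # item number, d.dd or dd.dd
    r"(?P<frames>\d+)\s+frames\s+from\s+"    # data set length in frames
    r"(?P<history>.+)$"                      # processing history
)

def parse_summary_line(line):
    """Return the fields of one summary line as a dict, or None if no match."""
    m = SUMMARY_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "index": int(m.group("index")),
        "type": m.group("type"),
        "deleted": m.group("deleted") == "-",
        "item_number": m.group("number"),
        "frames": int(m.group("frames")),
        "history": m.group("history"),
    }
```

Note that the item index and the item number come back as separate fields, which keeps the two from being confused.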

Thus a summary line:

3. SPEECH (1.02) 22000 frames from agc(1.01)

records details of the third item in the file. It is the second speech item in the file, is 22000 samples long, and was created from item 1.01 in the same file by the program agc. Similarly:

4. COEFF (11.01) 110 frames from spectran(1.02;window=10,overlap=5)

describes a set of spectral coefficients generated by the program spectran from item 1.02. There are 110 spectra in the item, and the parameter settings 'window=' and 'overlap=' record the settings of the analysis parameters of the program spectran (in this case analysis window size and overlap length in ms).
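The history pattern can also be decomposed by program. The following Python sketch is a hedged illustration, not the SFS parser: parse_history and split_top_level are hypothetical names, and the sketch classifies each comma-separated piece as an input item number if it looks like 'd.dd' or 'dd.dd', and as a parameter otherwise. Separators inside nested parentheses are ignored so that a parameter value such as 'history=agc(1.01)' stays in one piece.

```python
import re

ITEM_NUMBER = re.compile(r"^\d{1,2}\.\d{2}$")

def split_top_level(text, sep):
    """Split on `sep`, ignoring separators nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]

def parse_history(history):
    """Return (program name, input item numbers, parameter dict)."""
    program, paren, rest = history.strip().partition("(")
    if not paren or not rest.endswith(")"):
        raise ValueError("expected <program>(...): %r" % history)
    inputs, params = [], {}
    for section in split_top_level(rest[:-1], ";"):
        for piece in split_top_level(section, ","):
            if ITEM_NUMBER.match(piece):
                inputs.append(piece)
            else:
                name, _, value = piece.partition("=")
                params[name.strip()] = value.strip()
    return program, inputs, params
```

Parameter values are kept as text, since the manual does not say how each program interprets them.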

The processing history string plays an important part in the SFS file organisation. The history string records how a data set was generated and from which input items. Just as all items in a file must have distinct item numbers, it is an operational constraint on SFS files that every item must have a distinct processing history string. For more details see section 3.3 (remove) and section 3.10 in the SFS programmer's manual (SFS file update).

The default action for summary is to list all of the items in the file. However, a sub-set of the items can be requested through the use of command-line switches. Thus individual items may be requested using the '-i<itemnumber>' switch:

% summary -i1.01 -itx file

This would summarise item 1.01 and the last TX item in the file. All items of a given type can be requested using the '-a<item type>' switch:

% summary -asp file

This would summarise all speech items in the file. Multiple '-i' and '-a' switches can be intermixed, e.g.:

% summary -isp -isp.03 -atx file

Although many of the SFS utility programs support the '-a' switch, this is not in general true for speech processing programs (since they usually only process a single item at a time - see respective manual pages). However in summary and all programs that support the '-i<item number>' convention for input item selection, items may also be selected by a match on the processing history string. The format for this is:

-i <item type> ^ <history match>

and summary also allows

-a <item type> ^ <history match>

For example, '-isp^agc' would match the first speech item which has a processing history starting with agc.

The history match also allows the wild-card characters '?' = match a single character, and '*' = match zero or more characters. For example:

% summary -asp^genfilt*lowfreq=1000?^ file

which would match speech items generated by genfilt(UCL1) that had a processing parameter lowfreq set to 1000. In this example the '?' matches the final ')' (which the Unix shell will not accept by itself) and the second '^' matches the end of the history. For more information see itemno and histmatch, in the SFS programmer's manual.
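The matching rules described above can be sketched in Python by translating a history match into a regular expression. This is an interpretation of the text, not the SFS histmatch routine: it assumes that a pattern matches from the start of the history, that a trailing '^' anchors the end of the history, and that without it a prefix match is enough. The names history_pattern_to_regex and history_matches are hypothetical.

```python
import re

def history_pattern_to_regex(pattern):
    """Translate a history match (without the leading '^') into a regex."""
    # Assumption: a trailing '^' means "the history must end here".
    anchored_end = pattern.endswith("^")
    if anchored_end:
        pattern = pattern[:-1]
    pieces = []
    for ch in pattern:
        if ch == "?":
            pieces.append(".")       # exactly one character
        elif ch == "*":
            pieces.append(".*")      # zero or more characters
        else:
            pieces.append(re.escape(ch))
    tail = r"\Z" if anchored_end else ""
    return re.compile("^" + "".join(pieces) + tail)

def history_matches(pattern, history):
    """True if `history` satisfies the history match `pattern`."""
    return history_pattern_to_regex(pattern).match(history) is not None
```

Under these assumptions the pattern 'genfilt*lowfreq=1000?^' accepts a history ending in 'lowfreq=1000)' but rejects one in which further parameters follow.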

2.5 Reading a 'dump'

The program sdump provides detailed access to the contents of an SFS file. In this section we shall look at some of the information produced by operating sdump on a file.

A dump of the main header can be produced with:

% sdump -h file
Main
File id: uc1-586 created Wed Nov 19 17:21:39 1986 by mark
Database: demonstration    Speaker: jv
Session: 1st recording     Session Date: 10-JUN-86
Repetition: 1              Environment: 
Token: A:bA:
Comment:

The program lists the field names and field values from the main header, as described in section 2.2. The File id field shows the unique file code, the creation date and the name of the user that created the file.

sdump is most often used to investigate the contents of the data sets in the file. The default action for sdump is to summarise the item headers and data sets for all of the items in the given file. The items required to be dumped may be selected using the '-i' and '-a' switches as described in the previous section for summary.

A dump of a speech item might look like:

% sdump -isp. file
Item 1
Data Type: 1.01 SPEECH (speech alone)
History: inwd/SP(freq=12793)
Parameters: 
Process Date: Wed Nov 19 18:15:52 1986
Format: 2 byte integer
Frame size: 1            Frame count: 4608
Total Length: 9216 bytes Frame Duration : 7.8e-05
Window size: 0           Overlap: 0
Offset: 0                Comment:
Data starts:
2 0 -5 0 2 0 0 2 5 0
2 0 -2 0 -2 5 5 5 5 0
5 2 -7 -5 2 5 -5 -5 0 0
0 0 5 5 5 2 2 2 5 0
0 5 0 2 5 0 -5 2 5 5

Here, the Data Type field records the item number, the item type name and a text description of the item (see section 4.3). The History field records the processing history. The Parameters field records any special information about the style, format or labelling of the data set (minimum and maximum frequencies for coefficient items, for example). The Process Date field records the time at which the processing was performed (often the time processing was started rather than when it was completed).

The Format field records the basic construction of individual data elements in the file. Thus the speech data above is recorded as a sequence of two-byte integers (numbers from -32768 to +32767). For more complex data sets, such as coefficients, the Format field records that items are structured or stored in floating point. Descriptions of the different data set types and their internal formats are given in the Programmer's Manual section 2.4.

The individual data elements in a data set are then organised into frames. For speech data, a frame consists of a single data element: the sample value. For spectral coefficients the frame contains a number of elements containing the energies and other control information. The Frame Size field records how many individual elements make up each frame. The Frame Count field records the total number of frames in the data set. The Total Length field records the overall length of the data set (in bytes), i.e. for speech data, the total length = 2 x frame count.
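The relationship between these three fields can be written out as a small Python sketch. The manual states only the speech case (total length = 2 x frame count); the general product used here assumes that every element in a frame has the same size, which may not hold for structured items. The function name total_length_bytes is hypothetical.

```python
def total_length_bytes(frame_count, frame_size, bytes_per_element):
    """Length of a data set in bytes, assuming uniform element sizes.

    frame_count       -- the Frame Count field
    frame_size        -- the Frame Size field (elements per frame)
    bytes_per_element -- 2 for the '2 byte integer' format shown above
    """
    return frame_count * frame_size * bytes_per_element

# The speech item dumped above: 4608 frames of one 2-byte sample each.
speech_bytes = total_length_bytes(4608, 1, 2)
```

For the speech item shown in the dump, this reproduces the Total Length field of 9216 bytes.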

Each frame is associated with an interval of time at a particular location in time. The location and the interval are specified using a number of item header fields (and using numbers held on data records for some item types). The following paragraphs summarise how this information is stored.

The Frame Duration field holds the duration (in seconds) of the basic time interval for the data set. For waveforms this will be the sampling interval (1/sampling rate); for structured items this will be the basic unit in which times are measured. The location in time of a speech sample is simply derived from the sample number and the sample interval. The location in time and the duration of each frame in a structured data set (e.g. spectral coefficients) are stored on the individual data records as two integer values (normally given the names posn and size). These values record times as multiples of the basic time interval held in the Frame Duration field. Thus if a COEFF data set had a frame duration of 0.0001 seconds then a data record with posn=200 and size=100 would be located 0.02 seconds from the start of the data set and would extend for 0.01 seconds.
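The timing arithmetic in this paragraph is sketched below in Python. The function names are hypothetical, and counting samples from zero is an assumption that the manual does not state.

```python
def sampling_interval(sampling_rate_hz):
    """Frame Duration of a waveform: 1 / sampling rate, in seconds."""
    return 1.0 / sampling_rate_hz

def sample_time(sample_number, frame_duration):
    """Time of a waveform sample from the start of its data set.

    Assumption: samples are numbered from zero.
    """
    return sample_number * frame_duration

def frame_times(posn, size, frame_duration):
    """Return (start, length) in seconds for a structured data record.

    posn and size are integer multiples of the Frame Duration field.
    """
    return posn * frame_duration, size * frame_duration

# The COEFF example from the text: posn=200, size=100, 0.0001 s units.
start, length = frame_times(200, 100, 0.0001)
```

Here start and length come out as approximately 0.02 and 0.01 seconds, and sampling_interval(12793) gives roughly the 7.8e-05 shown in the speech dump above.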

Each data set may also be offset in time from time zero, a notional absolute time associated with each file. The Offset field records any difference in time (measured in seconds) between the start of this data set and the file time zero. Typically the first item in a file will have a zero offset, and items processed from it may have non-zero offsets. A positive offset means that the start of the data set is later than time zero, a negative offset means the start is earlier than time zero. A typical cause of non-zero offsets is use of analysis windows, to synchronise the analysis to the centre of the window. The program slink described in section 3.2 can be used to create data sets that are parts of other data sets (a portion of a waveform for example). slink uses the offset time to record the difference between the start of the extracted portion and the start of the original.
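Placing a time measured within a data set on the file's common time axis then amounts to adding the offset. The Python sketch below illustrates this sign convention only; the function names are hypothetical and the sketch does not reproduce how SFS programs handle offsets internally.

```python
def time_from_zero(time_in_item, offset):
    """Convert a time measured from the start of a data set (seconds)
    into a time measured from the file's time zero.

    A positive offset means the data set starts later than time zero.
    """
    return offset + time_in_item

def time_in_item(time_from_file_zero, offset):
    """Inverse conversion: from file time back to time within the data set."""
    return time_from_file_zero - offset
```

For example, a frame 0.02 seconds into a data set whose Offset is 0.005 seconds lies at about 0.025 seconds from time zero.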

The Window Size and Overlap fields record information about data sets produced by fixed-frame analysis - that is, analysis where the frame interval is of fixed duration and each frame starts at regularly-spaced instants of time. Under these conditions, the duration of each frame (in units of Frame Duration) is recorded in the Window Size field, and the overlap between one frame and the next (in units of Frame Duration) is recorded in the Overlap field. For analysis that is not regularly spaced, sdump simply reports 'Synchronous', and the timing information is recorded in the individual data records.

After the information in the Item Header, sdump summarises the contents of the data set itself. In the example at the beginning of this section, the numbers represent the sample values from a stored waveform. Each data set type is displayed in a particular format. Because many data sets are rather long, the default action of sdump is to print only the first few frames. When the data is being summarised in this way, sdump prints 'Data starts:' before the data set. A command-line option '-f' requests a full listing of the data set.
