SFS Manual for Programmers

2. Data Structures

SFS data structures comprise the filing system headers (main, item and link) described in sections 2.1, 2.2 and 2.3, and the speech data structures (one for each item type) described in section 2.4. The 'C' source of all structures may be found in the file $SFSBASE/include/sfs.h.

2.1 Main Header

The SFS main header occupies the start of every SFS file. It records information about the source of the file and the source of the data in the file. Some fields are set automatically when the file is created; others are available for editing by the user. See the description of the utility program hed(SFS1) in User Manual Section 2.2.

The 'C' language definition of the SFS main header is as follows:

/* definition of speech database main header */
struct main_header {
  /* file creation information */
  char hname[4];      /* header name & version */
  long creatdate;     /* file creation date */
  char username[20];  /* username of creator */
  struct file_id fid; /* unique file identifier */
  /* database information */
  char source[20];    /* source of recording */
  char dbase[20];     /* name of database */
  char speaker[20];   /* speaker name */
  char session[20];   /* session code */
  char sessdate[20];  /* session date */
  char token[160];    /* name of spoken token */
  char rep[8];        /* token repetition code */
  /* recording information, etc */
  char environ[8];    /* recording conditions */
  char archive[20];   /* archiving details */
  char comment[80];   /* general comments */
  char spare[96];     /* space for expansion */
  long machine;       /* machine type */
};
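The listing above uses long, which is 8 bytes on LP64 systems such as 64-bit Linux, so compiling it unchanged there gives a different layout. If each long is 4 bytes, the field sizes sum to exactly 512 bytes. The sketch below assumes that 4-byte on-disk layout (an inference from the field sizes, not something this manual states) and uses int32_t from <stdint.h>. The struct names are hypothetical. The C11 _Static_assert checks make the layout assumption explicit at compile time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-width versions of file_id and main_header.
   Assumption: each 'long' in the listing is 4 bytes on disk. */
struct sfs_file_id32 {
    char site[4];
    int32_t num;
};

struct sfs_main_header32 {
    char hname[4];
    int32_t creatdate;
    char username[20];
    struct sfs_file_id32 fid;
    char source[20];
    char dbase[20];
    char speaker[20];
    char session[20];
    char sessdate[20];
    char token[160];
    char rep[8];
    char environ[8];
    char archive[20];
    char comment[80];
    char spare[96];
    int32_t machine;
};

/* Every 32-bit field already falls on a 4-byte boundary, so no padding
   is inserted and the header occupies exactly 512 bytes. */
_Static_assert(offsetof(struct sfs_main_header32, fid) == 28, "fid position");
_Static_assert(offsetof(struct sfs_main_header32, machine) == 508, "machine position");
_Static_assert(sizeof(struct sfs_main_header32) == 512, "main header size");
```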

These fields will be described in turn:

hname: This field contains the header name and version; it is used by the SFS software to identify SFS files and to allow for future enhancements. It currently contains the characters 'U', 'C', '2' and '\0'. Set by sfsopen().

creatdate: The encoded date and time of the file creation. Set by sfsopen().

username: The username of the person who created the file. Set by sfsopen().

fid: This is a simple structure having the form:

    struct file_id {
      char site[4];
      long num;
    };

It records a three-letter site-name code and a unique number. This field is set by sfsopen() when the file is created. The site name and the last file number issued are kept in the file $SFSBASE/dbase.id.

source: This is set by the user to record the institution which provided the contents of the file.

dbase: This is set by the user to record the name of the database from which the file has been taken.

speaker: This is set by the user to record the name of the speaker, if appropriate.

session: This is set by the user to record the name of the recording session.

sessdate: This is set by the user to record the date of the recording session.

token: This is set by the user to give the data in the file a suitable identifying name, which is used in displays and reports.

rep: This is set by the user to differentiate repetitions of tokens.

environ: This field is reserved for later use to record information about the recording conditions.

archive: This field is reserved for later use to record archiving details.

comment: This field is set by the user to record any particular comments about the data in the file.

spare: This field reserves space for future main header fields.

machine: This field records the machine format for the information in the main header. Currently this may be one of:

    0x00000000 68000 hi-lo byte order, e.g. Motorola/Sun
    0x01010101 lo-hi byte order, e.g. Intel

This field affects the decoding of binary variables. It defines the expected byte order for long integers. The conversion of foreign header fields is performed by sfsopen().

This field is set by sfsopen() on header creation. The current machine type is available from the constant SFSMACHINE.
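A minimal sketch of the kind of byte-order conversion sfsopen() performs on foreign long fields, assuming the host is either purely big-endian or purely little-endian. Both machine codes read the same in either byte order, so the machine field can be compared before any other field is converted. MACHINE_HILO, MACHINE_LOHI, host_machine() and header_long() are hypothetical names (host_machine() stands in for SFSMACHINE); this is not the SFS library's own code.

```c
#include <stdint.h>
#include <string.h>

#define MACHINE_HILO 0x00000000u  /* 68000 hi-lo order, e.g. Motorola/Sun */
#define MACHINE_LOHI 0x01010101u  /* lo-hi order, e.g. Intel */

/* Machine code of the running host: on a lo-hi host the first byte of
   a 16-bit 1 is 0x01. Stands in for SFSMACHINE in this sketch. */
static uint32_t host_machine(void)
{
    uint16_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first ? MACHINE_LOHI : MACHINE_HILO;
}

/* Reverse the four bytes of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Convert a long field read verbatim from a header whose machine field
   is file_machine into the host's byte order. */
static int32_t header_long(int32_t raw, uint32_t file_machine)
{
    if (file_machine == host_machine())
        return raw;
    return (int32_t)swap32((uint32_t)raw);
}
```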

2.2 Item Header

The SFS item header is used as a prefix for every data set in a file. It records information about the source and format of the data.

The 'C' language definition of the SFS item header is as follows:

/* definition of speech database item header */
struct item_header {
  char history[256];    /* processing history */
  char params[128];     /* special processing parameters */
  long processdate;     /* processing date */
  long datatype;        /* data type: speech, lx, etc */
  long subtype;         /* data sub-type */
  long floating;        /* fixed or floating data */
  long datasize;        /* data item size (bytes) */
  long framesize;       /* no. items in time frame */
  long numframes;       /* no. frames in data */
  long length;          /* overall length (bytes) */
  double frameduration; /* time interval duration (s) */
  long datapresent;     /* data availability */
  double offset;        /* offset to file zero */
  char comment[20];     /* data set comment */
  long windowsize;      /* analysis window in samples */
  long overlap;         /* analysis overlap in samples */
  long lxsync;          /* variable frame durations flag */
  char spare[40];       /* space for expansion */
  long machine;         /* machine type */
};
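The item header fields also sum to 512 bytes if each long is 4 bytes, but here the two double fields matter: offset falls at byte 428, which is not a multiple of 8, so a compiler that aligns double to 8 bytes (as is usual on x86-64) inserts padding and the struct no longer matches. The sketch below assumes a 4-byte-aligned on-disk layout with no padding (an inference from the field sizes) and uses #pragma pack, a compiler extension supported by GCC, Clang and MSVC. The struct name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-width item header. int32_t replaces 'long', and
   pack(4) caps alignment at 4 bytes so the doubles add no padding. */
#pragma pack(push, 4)
struct sfs_item_header32 {
    char history[256];
    char params[128];
    int32_t processdate;
    int32_t datatype;
    int32_t subtype;
    int32_t floating;
    int32_t datasize;
    int32_t framesize;
    int32_t numframes;
    int32_t length;
    double frameduration;
    int32_t datapresent;
    double offset;          /* byte 428: not 8-byte aligned */
    char comment[20];
    int32_t windowsize;
    int32_t overlap;
    int32_t lxsync;
    char spare[40];
    int32_t machine;
};
#pragma pack(pop)

_Static_assert(offsetof(struct sfs_item_header32, length) == 412, "length position");
_Static_assert(offsetof(struct sfs_item_header32, offset) == 428, "offset position");
_Static_assert(sizeof(struct sfs_item_header32) == 512, "item header size");
```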

These fields will be described in turn. Where information is given about setting fields with values, this refers to the creation of new data sets.

history: This field is intended to record the processing history of the data set that follows. Thus this string needs to record: (i) the name of the program that generated the data set, (ii) which output item (for programs that produce more than one), (iii) which input items, and (iv) the setting of any flags or parameters that the user can modify. The required format for this information is:

progname/outtype(input items;parameter list)

but fields may be dropped if they are empty:

progname(input items;parameter list)
progname(input items)
progname(parameter list)
progname

The format for each field is as follows:

outtype: uppercase item type mnemonic (SP, LX, etc)

input items: comma separated, each written as datatype.subtype in numeric format %d.%02d

parameter list: comma separated, tag format <parameter>=<value>

For example:

testsig
inwd/SP(freq=12800)
tx(2.01)
HQanal/FM(1.02,3.01;window=20,step=10)
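The history format can be produced with a single snprintf call. In this sketch make_history() and make_item_ref() are hypothetical helpers, not SFS library routines; a NULL or empty part is dropped following the rules above.

```c
#include <stdio.h>
#include <string.h>

/* True if s holds a non-empty string. */
static int is_set(const char *s)
{
    return s != NULL && strlen(s) > 0;
}

/* Hypothetical helper: writes "progname/outtype(inputs;params)" into buf,
   dropping any part that is NULL or empty. Returns the snprintf result. */
static int make_history(char *buf, size_t n, const char *prog,
                        const char *outtype, const char *inputs,
                        const char *params)
{
    int has_out = is_set(outtype);
    int has_in = is_set(inputs);
    int has_par = is_set(params);
    int paren = has_in || has_par;

    return snprintf(buf, n, "%s%s%s%s%s%s%s%s",
                    prog,
                    has_out ? "/" : "", has_out ? outtype : "",
                    paren ? "(" : "",
                    has_in ? inputs : "",
                    (has_in && has_par) ? ";" : "",
                    has_par ? params : "",
                    paren ? ")" : "");
}

/* Format one input item as datatype.subtype, e.g. 2.01. */
static int make_item_ref(char *buf, size_t n, int datatype, int subtype)
{
    return snprintf(buf, n, "%d.%02d", datatype, subtype);
}
```

For instance, make_history(buf, sizeof buf, "HQanal", "FM", "1.02,3.01", "window=20,step=10") yields the last example above, and make_history(buf, sizeof buf, "tx", NULL, "2.01", NULL) yields tx(2.01).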

params: This field details parameters of the data set not catered for in the standard set of data description fields (below). Such details do not include processing parameters that the user may change, since these belong in the history field. For example, the params field may record the maximum and minimum frequency of the values in a stored spectrum. The format of the params field is the same as for the history parameter list, namely comma separated tag format: <parameter>=<value>.

See the descriptions of data set structures in section 2.4 to find any mandatory parameters for a data set.
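Reading a value back from the params field means scanning the comma separated <parameter>=<value> list. A sketch follows, with sfs_param() a hypothetical helper rather than an SFS library call. Entries without '=' (such as the bare sfsformat tag used by DISPLAY items) are skipped, and the tag must match a whole name, so looking up f does not match freq=12800.

```c
#include <string.h>

/* Hypothetical helper: copies the value of tag from a comma separated
   "<parameter>=<value>" list into out (truncated to n-1 characters).
   Returns 1 if the tag is found, 0 if not or if out has no room. */
static int sfs_param(const char *params, const char *tag, char *out, size_t n)
{
    size_t taglen = strlen(tag);
    const char *p = params;

    if (n == 0)
        return 0;
    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t toklen = end ? (size_t)(end - p) : strlen(p);

        /* Match "tag=" at the start of this entry only. */
        if (toklen > taglen && strncmp(p, tag, taglen) == 0 && p[taglen] == '=') {
            size_t vlen = toklen - taglen - 1;
            if (vlen >= n)
                vlen = n - 1;
            memcpy(out, p + taglen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        if (end == NULL)
            break;
        p = end + 1;   /* move to the next entry */
    }
    return 0;
}
```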

processdate: The encoded date and time of the generation of the data set. Set automatically by sfsheader().

datatype: The major item type. Speech=1, Lx=2, etc. Currently defined values are:

    Number  Mnemonic  Name
    1       SP        Speech pressure waveform
    2       LX        Laryngograph waveform
    3       TX        Fundamental period markers
    4       FX        Fundamental frequency trace
    5       AN        Annotations
    6       LP        Lower phonetic records
    7       SY        Synthesizer control data
    8       WD        Word chart
    9       DI        Grey-level display
    10      VU        Voicing
    11      CO        Energy coefficients
    12      FM        Formant estimates
    13      EN        Energy
    14      PC        Linear prediction coefficients
    15      MM        Markov model
    16      TR        Acoustic parameter trace
    17                GEC Reserved
    18      GE        Articulatory model geometry
    19      AE        Articulatory model aerodynamics
    20      IP        Articulatory model source parameters
    21      SC        Articulatory model source
    22      PH        Physiological data
    23      RP        Rational polynomial filter
    24      RR        Rational rootlist filter
    25      UG        Glottal flow
    26      XM        Excitation model
    27      NA        Articulatory model nose
    28      CA        Calibration
    29      AR        Area (e.g. glottal)

The manifest constant MAXDATATYPE records the number of the largest datatype assigned a function.

When remove(SFS1) leaves an item stub (see User Manual Section 3.3), it negates the value of the datatype field. Set this field with sfsheader().

subtype: This field is used to count repetitions of a data set type in a file. Programs should not assume that data sets have any particular subtype number. This field is generated by sfsupdate() to be one more than the largest subtype value for items of the given type in a file once any duplicate items have been removed. It is not necessary to set values in this field.

floating: Records primary format of data set:

    1  Floating point values
    0  Integer values
    -1 Structured records

The floating field should be set with sfsheader().

datasize: Size of primary unit of data set. Record size of each element in unstructured data sets, or size of data buffer elements in structured data sets. Thus:

    1 sizeof(char)  e.g. AN, DI
    2 sizeof(short) e.g. SP, LX, FX, SY
    4 sizeof(long)  e.g. TX
    4 sizeof(float) e.g. CO, PC, etc.

This field was intended to describe unstructured types. Its use is clumsy for structured types, where the only really important value is the product datasize * framesize, which must equal the length of each structure. Since many structures have a fixed part and a variable part (see e.g. co_rec in section 2.4), these requirements mean that the length of the fixed part must be an integer multiple of the size of the elements of the variable part. In practice this is not too much of a limitation.

The datasize field should be set with sfsheader().

framesize: Number of primary units per time frame. A frame of data is associated with a single interval of time. Thus waveforms have a framesize equal to 1. Synthesizer control data, which has a number of short integer values per unit time, would have a framesize equal to the number of values. Structured records must be a multiple of datasize bytes in length, and framesize records this multiple.

Since framesize records the number of data elements in an entire frame, it includes both the fixed part and the variable part of those structured items that have both. Commonly, however, it is necessary to calculate the number of values in the variable part given the framesize, and vice versa: the value of framesize given the number of values. The SFS global array sfsstruct[] records, for each defined data structure, the size in bytes of its fixed part. Using this we can calculate the number of values in the variable part of a structure:

numvalues = (item.framesize * item.datasize 
  - sfsstruct[item.datatype]) / item.datasize

This is available as the macro SFSRECSIZE(&item) defined in sfs.h. Similarly the framesize value is:

framesize = (sfsstruct[item.datatype] + numvalues 
  * item.datasize) / item.datasize
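The two formulas can be written as small functions. In this sketch fixed_bytes stands in for sfsstruct[item.datatype] and the function names are hypothetical. The example figure of 20 bytes for the co_rec fixed part (the longs posn, size and flag plus the floats mix and gain) assumes 4-byte longs and floats. The third function checks the requirement, noted under datasize, that the fixed part be a whole number of datasize units.

```c
#include <stdint.h>

/* Number of values in the variable part, given framesize. */
static int32_t sfs_numvalues(int32_t framesize, int32_t datasize, int32_t fixed_bytes)
{
    return (framesize * datasize - fixed_bytes) / datasize;
}

/* framesize needed to hold numvalues values in the variable part. */
static int32_t sfs_framesize(int32_t numvalues, int32_t datasize, int32_t fixed_bytes)
{
    return (fixed_bytes + numvalues * datasize) / datasize;
}

/* The fixed part must be an exact multiple of datasize bytes. */
static int sfs_fixed_part_ok(int32_t fixed_bytes, int32_t datasize)
{
    return datasize > 0 && fixed_bytes % datasize == 0;
}
```

With datasize 4 and 19 coefficients, sfs_framesize(19, 4, 20) gives 24, and sfs_numvalues(24, 4, 20) gives back 19.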

Annotation records are the only variable-length records (frames) currently in use. For variable length records, each record is preceded in the file by a single byte length field (this is performed by sfswrite() and need not concern the programmer). To indicate variable length records (single-byte leader) framesize should be set to -1.

The framesize field should be set with sfsheader().

numframes: This field records the total number of frames in the data set. It is set by calls to sfswrite().

length: This field records the total length of the data set in bytes. It is set by calls to sfswrite(). When remove(SFS1) creates an item stub (see section 1.3.3) this field is set to zero.

The length field is primarily used to provide the offset to the start of the next item header. That is, the following loop traverses an SFS file:

    fread(&head,sizeof(struct main_header),1,ip);
    while (fread(&item,sizeof(struct item_header), 1, ip)==1)
      fseek(ip,item.length,SEEK_CUR);

provided that the machine type of all the headers matches the current system. The equivalent loop using SFS routines always works:

    fid = sfsopen(filename,"r",&head);
    while (sfsnextitem(fid,&item)) /* loop */;
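The raw fread() loop also depends on the compiler laying out struct item_header exactly as on disk. A layout-independent sketch reads each header as a block of bytes and decodes only the length field. It assumes 512-byte headers with 4-byte longs and no padding, which puts length at byte 412 and machine at byte 508 of the item header (arithmetic from the listing above, not a documented constant). Like the raw loop, it simply skips length bytes after every item header. count_items() and get32() are hypothetical; SFS programs should use sfsnextitem().

```c
#include <stdint.h>
#include <stdio.h>

#define SFS_HDR_SIZE     512  /* assumed on-disk header size */
#define ITEM_LENGTH_OFF  412  /* assumed byte position of 'length' */
#define ITEM_MACHINE_OFF 508  /* assumed byte position of 'machine' */

/* Decode a 32-bit integer stored hi-lo (lohi == 0) or lo-hi (lohi != 0). */
static int32_t get32(const unsigned char *p, int lohi)
{
    uint32_t v = lohi
        ? (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24
        : (uint32_t)p[3] | (uint32_t)p[2] << 8 | (uint32_t)p[1] << 16 | (uint32_t)p[0] << 24;
    return (int32_t)v;
}

/* Count the item headers in an open SFS file; -1 on read or seek error. */
static long count_items(FILE *ip)
{
    unsigned char hdr[SFS_HDR_SIZE];
    long n = 0;

    if (fread(hdr, SFS_HDR_SIZE, 1, ip) != 1)   /* skip the main header */
        return -1;
    while (fread(hdr, SFS_HDR_SIZE, 1, ip) == 1) {
        /* machine bytes are 0x01 for lo-hi, 0x00 for hi-lo */
        int lohi = hdr[ITEM_MACHINE_OFF] == 0x01;
        int32_t length = get32(hdr + ITEM_LENGTH_OFF, lohi);
        if (fseek(ip, (long)length, SEEK_CUR) != 0)
            return -1;
        n++;
    }
    return n;
}
```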

frameduration: This records the basic time interval in which the location and duration of frames are recorded. For unstructured items the frameduration value is the duration of a frame. For structured items, the location and duration of a frame is commonly recorded on each data record independently. These values are held as long integers on the records, and the frameduration field records the duration of each unit of time that these integers count.

TX data sets are unusual in that they are a simple sequence of long integers, each integer recording the interval between one fundamental period and the next in units of frameduration.

The frameduration field should be set using sfsheader().

datapresent: This is a flag recording where the data set that corresponds to this item may be found. Its current values are:

    0  data set not present
    1  data set follows in this file
    2  data set pointed to by link header following in this file

The datapresent flag is set automatically by sfsheader(), sfsupdate() and sfswritelink().

offset: This records any relative offset between data sets in a file. Each file has a notional time zero, usually defined as the start of the first frame of the first data set in the file. Other data sets may have an offset with respect to time zero, which should be recorded in this field using sfsheader().

Thus the absolute time for a frame is its relative location + offset:

time_of_frame = item.offset + frame_position * item.frameduration

The frame position in one item (item1) given the frame position in a different item (item2) is given by:

frame_position_1 = ((item2.offset + frame_position_2 *
  item2.frameduration) - item1.offset) / item1.frameduration
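The two formulas translate directly into C. The function names are hypothetical, and the example values (a 10 kHz waveform with frameduration 0.0001 s and zero offset, and a trace sampled every 10 ms starting 5 ms into the file) are illustrative.

```c
/* Absolute time in seconds of frame position posn in an item. */
static double frame_time(double offset, double frameduration, double posn)
{
    return offset + posn * frameduration;
}

/* Frame position in item 1 corresponding to frame position posn2 in item 2. */
static double map_frame(double offset1, double frameduration1,
                        double offset2, double frameduration2, double posn2)
{
    return (frame_time(offset2, frameduration2, posn2) - offset1) / frameduration1;
}
```

map_frame(0.0, 0.0001, 0.005, 0.01, 10) is approximately 1050: frame 10 of the trace lies at 0.105 s, which is sample 1050 of the waveform. The result is a double, so callers must decide whether to round or truncate it to a frame index.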

comment: Reserved for later use.

windowsize & overlap: For structured items where the position and duration of each frame may be found by simple calculation, certain processing may operate more efficiently. For such fixed-frame analysis, record in these fields the duration of each frame (in units of frameduration) and the overlap of one frame with the next (0 <= overlap < windowsize). Set the values with sfsheader().
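A minimal sketch of that simple calculation, assuming the first frame starts at position 0 relative to the item offset and each frame advances by windowsize - overlap. Both function names are hypothetical.

```c
/* Start of fixed frame i, in units of frameduration, assuming frame 0
   starts at 0 and frames advance by windowsize - overlap. */
static long fixed_frame_start(long i, long windowsize, long overlap)
{
    return i * (windowsize - overlap);
}

/* Absolute start time in seconds of fixed frame i. */
static double fixed_frame_time(double offset, double frameduration,
                               long i, long windowsize, long overlap)
{
    return offset + (double)fixed_frame_start(i, windowsize, overlap) * frameduration;
}
```

With windowsize 200 and overlap 100, frame 3 starts at 300 and extends for 200 units.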

lxsync: If the windowsize and overlap fields are not appropriate for the data set (variable frame analysis), set this flag to 1 using sfsheader().

spare: This field reserves space for future item header fields.

machine: This field records the machine format for the information in the item header. Currently this may be one of:

    0x00000000 68000 hi-lo byte order, e.g. Motorola/Sun
    0x01010101 lo-hi byte order, e.g. Intel

This field affects the decoding of binary variables. It defines the expected byte order for integers and floats. The bit pattern for floating point numbers is IEEE 754 (ANSI/IEEE Std 754-1985), as used in 68000/Unix and 8086/MS-DOS. The conversion of foreign item header fields is performed by sfsitem() and sfsnextitem(). The conversion of foreign data sets is performed automatically by sfsread().

This field is set automatically by sfsheader(). The current machine type is available from the constant SFSMACHINE.

2.3 Link Header

The SFS link header is used with an item header to create a data item in a file when the data itself is stored elsewhere. The presence of a link header redirects sfsread() from the source file to the actual location of the data. Link headers are typically set up by the program slink (see User Manual Section 3.2).

The 'C' language definition of the SFS link header is as follows:

/* structure of link item header */
struct link_header {
  char filename[128]; /* linked file name */
  char filepath[128]; /* file access path */
  long filetype;      /* file format code */
  long datatype;      /* SFS file datatype */
  long subtype;       /* SFS file subtype */
  long offset;        /* byte offset into data */
  long multiplex;     /* data multiplexing size */
  long linkdate;      /* date/time of link */
  long swab;          /* swap byte flag */
  long dcoffset;      /* DC offset (subtracted) */
  long shift;         /* sample bit shift */
  char spare[216];    /* space for expansion */
  long machine;       /* SFSMACHINE for file */
};

These fields are described in turn:

filename: Absolute pathname of source file for data set.

filepath: Node name for networked files.

filetype: Data file format code of the source file, one of:

    0  Binary
    1  SFS File

datatype: For SFS files, the datatype number of source data set.

subtype: For SFS files, the subtype number of source data set.

offset: Byte offset into data set of first data frame.

multiplex: (Waveforms only) Multi-channel data indicator:

    0  One channel (non-multiplexed)
    1  Two channels
    n  (n+1) channels

linkdate: Date and time at which link was set up.

swab: (Waveforms only) Short integers in incorrect byte order. If this flag is set, short integers read from the source file are automatically byte-swapped by sfsread().

dcoffset: (Waveforms only) Sample value offset to zero. This value is automatically added to all sample values by sfsread() (after any byte swapping).

shift: (Waveforms only) Binary shift required on sample values. This value is used to perform binary shifts on sample values: +ve values = shift left, -ve values = shift right. The shifting is performed by sfsread() after any DC offset or byte swapping. The shift field is also used to flag when the link points to 8 bit samples packed one sample/byte.
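A sketch of the per-sample processing described for waveform links, applied in the stated order: byte swap, then DC offset, then shift. Two points are assumptions. First, multiplexed channels are taken to be interleaved sample by sample. Second, the header listing comments dcoffset as "subtracted" while the description above says it is added; this sketch follows the description, and the sign should be checked against sfsread(). The packed 8-bit case flagged through shift is not handled. mux_index() and link_adjust() are hypothetical names.

```c
#include <stdint.h>

/* Index of sample i of channel ch when (multiplex + 1) channels are
   interleaved sample by sample (assumed layout). */
static long mux_index(long i, long multiplex, long ch)
{
    return i * (multiplex + 1) + ch;
}

/* Apply swab, then dcoffset, then shift to one raw 16-bit sample. */
static int32_t link_adjust(uint16_t raw, int swab, int32_t dcoffset, int shift)
{
    int32_t v;

    if (swab)
        raw = (uint16_t)((raw >> 8) | (raw << 8));
    /* Reinterpret the 16 bits as a two's-complement value. */
    v = raw >= 0x8000u ? (int32_t)raw - 0x10000 : (int32_t)raw;
    v += dcoffset;                    /* sign follows the prose; see above */
    if (shift > 0) {
        v *= (int32_t)1 << shift;     /* multiply: left-shifting negatives is undefined */
    } else if (shift < 0) {
        int n = -shift;               /* floor division, like an arithmetic shift */
        v = v >= 0 ? v >> n : -((-v + ((int32_t)1 << n) - 1) >> n);
    }
    return v;
}
```

For two channels (multiplex 1), sample 3 of channel 1 is at index 7 of the interleaved data.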

spare: This field reserves space for future link header fields.

machine: This field records the machine format for the information in the link header. Currently this may be one of:

    0x00000000 68000 hi-lo byte order, e.g. Motorola/Sun
    0x01010101 lo-hi byte order, e.g. Intel

This field affects the decoding of binary variables. It defines the expected byte order for long integers in the header. The conversion of foreign link headers is performed by sfsread().

2.4 Data Set Structures

In this section, the 'C' language definitions of the SFS data structures are given, along with a short description of what individual fields are used to hold.

1. SPEECH - Speech pressure waveform

short sp[];

Notes: Set bits=12 in params field for 12-bit data right aligned in word.

2. LX - Laryngograph waveform

short lx[];

3. TX - Fundamental periods

long tx[];

Notes: Each value holds one fundamental period; accumulate the values to obtain time.
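Accumulating TX values gives the time of each period boundary, and the reciprocal of a period gives an instantaneous Fx. This sketch assumes the first period starts at the item offset and reports the time at which each period ends; tx_to_times() and tx_to_fx() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Time in seconds at the end of each of n periods (units of frameduration). */
static void tx_to_times(const int32_t *tx, size_t n, double frameduration,
                        double offset, double *times)
{
    int64_t acc = 0;   /* running sum of periods */

    for (size_t i = 0; i < n; i++) {
        acc += tx[i];
        times[i] = offset + (double)acc * frameduration;
    }
}

/* Fundamental frequency in Hz for a single period. */
static double tx_to_fx(int32_t period, double frameduration)
{
    return 1.0 / ((double)period * frameduration);
}
```

With frameduration 0.0001 s, a period of 100 corresponds to 10 ms, or 100 Hz.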

4. FX - Fundamental frequency

short fx[];

5. ANNOTATION - Event labels

/* structure of an annotation record */
#define MAXANLABEL 247   /* including final '\0' */
struct an_rec {
  long posn;             /* frame number */
  long size;             /* width in frames */
  char *label;           /* annotation label */
} an[];

6. PHONETIC - Phonetic segments

/* structure of a lower phonetic record */
struct lp_rec {
  char name[8];         /* symbol name */
  long length1;         /* duration (ms) 1 */
  long pitch1;          /* pitch (Hz) 1 */
  long length2;         /* duration (ms) 2 */
  long pitch2;          /* pitch (Hz) 2 */
  unsigned char *alist; /* attribute list */
} lp[];

Notes: length1 and pitch1 hold synthetic prosody, i.e. calculated from rule. length2 and pitch2 hold natural prosody, i.e. measured. The location and values of attributes are held in the phonetic dictionary: $SFSBASE/data/phon.dic.

7. SYNTH - Synthesizer control data

short sy[][];

Notes: Locations for Holmes Parallel Formant Synthesizer:

    0   FX Fundamental frequency Hz
    1   AH Aspiration amplitude dB/10
    2   MS Mark-Space Ratio 0-63(ignored)
    3   F1 Frequency Formant 1 Hz
    4   A1 Amplitude Formant 1 dB/10
    5   B1 Bandwidth Formant 1 Hz(ignored)
    6   F2 Frequency Formant 2 Hz
    7   A2 Amplitude Formant 2 dB/10
    8   B2 Bandwidth Formant 2 Hz(ignored)
    9   F3 Frequency Formant 3 Hz
    10  A3 Amplitude Formant 3 dB/10
    11  B3 Bandwidth Formant 3 Hz(ignored)
    12  F4 Frequency Formant 4 Hz(ignored)
    13  A4 Amplitude Formant 4 dB/10
    14  B4 Bandwidth Formant 4 Hz(ignored)
    15  FN Frequency Nasal Formant Hz
    16  AN Amplitude Nasal Formant+A1 dB/10
    17  BN Bandwidth Nasal Formant Hz(ignored)
    18  VR Voicing Ratio 0-248
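Each SY frame is a row of framesize short values, so a single parameter can be picked out by index. A sketch follows, with the enum names and sy_get() hypothetical; it follows the Holmes locations listed above and assumes frames are stored one after another.

```c
#include <stddef.h>

/* Hypothetical names for the Holmes synthesizer parameter locations. */
enum sy_param {
    SY_FX, SY_AH, SY_MS, SY_F1, SY_A1, SY_B1, SY_F2, SY_A2, SY_B2,
    SY_F3, SY_A3, SY_B3, SY_F4, SY_A4, SY_B4, SY_FN, SY_AN, SY_BN, SY_VR
};

/* Value of parameter p in the given frame; framesize shorts per frame. */
static short sy_get(const short *sy, size_t framesize, size_t frame, enum sy_param p)
{
    return sy[frame * framesize + (size_t)p];
}
```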

Values marked 'ignored' are currently set for an entire synthesis in the synthesizer parameter file (or in EPROM); see soft(UCL1).

8. WORD - Syntactic constituents/arcs in chart

/* structure of a word record */
#define MAXWDLEN 32
struct  wd_rec {
  long  start;            /* start node */
  long  end;              /* end node */
  float score;            /* word score */
  char  symbol[MAXWDLEN]; /* word symbol */
  char  *attr;            /* word attribute list */
} wd[];

9. DISPLAY - Grey-level display

/* structure of a display item */
struct di_rec {
  long posn;    /* start sample */
  long size;    /* window size */
  char *pixel;  /* pixel list */
} di[];

Notes: SFS format DISPLAY items have the string 'sfsformat' in the params field. The params field should also contain minimum (minf=) and maximum (maxf=) frequency settings to set the range of the frequency axis, or the definitions of a set of labels for each row of data (labels=val1|val2|..|valn).

10. VOICE - Voicing

/* structure of a voicing item */
struct vu_rec {
  long posn;    /* starting sample */
  long size;    /* analysis window size */
  long flag;    /* 0=unvoiced, 1=voiced */
  float mix;    /* excitation mix */
  float gain;   /* overall gain on frame */
  float *data;  /* energy */
} vu[];

Notes: The length of the variable part of the voicing structure is usually zero.

11. COEFF - Spectral coefficients

/* structure of a coefficients item */
struct co_rec {
  long posn;     /* starting sample */
  long size;     /* analysis window size */
  long flag;     /* 0=unvoiced, 1=voiced */
  float mix;     /* excitation mix */
  float gain;    /* overall gain on frame */
  float *data;   /* energies */
} co[];

Notes: The item params field records the minimum (minf=) and maximum (maxf=) frequency, or a set of labels, see DISPLAY type.

12. FORMANT - Formant estimates

/* structure of a raw formant estimates item */
struct fm_rec_array {
  float freq;    /* formant frequency */
  float amp;     /* formant amplitude */
  float band;    /* formant bandwidth */
};
struct fm_rec {
  long posn;     /* starting sample number */
  long size;     /* analysis window size */
  long flag;     /* 0=unvoiced, 1=voiced */
  float gain;    /* overall gain for frame */
  long npeaks;   /* number of valid formants */
  struct fm_rec_array *formant;  /* formant structure */
} fm[];

Notes: Within a data set, every fm_rec record has the same size, with space for a fixed number of peak estimates. The number of valid peaks in any given record is given by the value npeaks.

13. ENERGY - Spectral energy

/* structure of an energy item */
struct en_rec {
  long posn;     /* starting sample */
  long size;     /* analysis window size */
  long flag;     /* 0=unvoiced, 1=voiced */
  float mix;     /* excitation mix */
  float gain;    /* overall gain on frame */
  float *data;   /* energies */
} en[];

Notes: The length of the variable part of an energy record is usually 1.

14. LPC - Linear predictor coefficients

/* structure of an lpc item */
struct pc_rec {
  long posn;     /* starting sample */
  long size;     /* analysis window size */
  long flag;     /* 0=unvoiced, 1=voiced */
  float mix;     /* excitation mix */
  float gain;    /* overall gain on frame */
  float *data;   /* predictor coefficients */
} pc[];

Notes: Currently, the values on the record are polynomial coefficients only.

16. TRACK - Parameter Tracks

float tr[];

18. GEOM - Articulatory Model Geometry

#define ALT_RECL 51 /* # sections in vocal tract */
struct ge_rec {
  short areas[ALT_RECL];  /* vocal tract areas 0.01 mm2 */
  short ag;               /* average glottal area 0.01 mm2 */
  short av;               /* velo-pharyngeal port 0.01 mm2 */
  float q;                /* glottal fold tension */
  float plu;              /* lung air pressure */
  float vc;               /* volume of the main cavity */
  float ac;               /* area of the main constriction */
} ge[];

19. AERO - Articulatory Model Aerodynamics

struct ae_rec {
  float psg;   /* subglottal pressure */
  float pc;    /* supraglottal pressure */
  float pdiff; /* pressure drop across glottis */
  float ug;    /* glottal air flow */
  float uv;    /* nasal air flow */
  float uc;    /* oral air flow */
  float vlu;   /* lung volume */
} ae[];

20. SRCPARM - Articulatory Model Source Parameters

struct ip_rec {
  float voia;   /* voice amplitude envelope */
  float voif;   /* Fx */
  float frica;  /* frication amplitude */
  float frcvpa; /* frication amplitude at velum */
  float aspa;   /* aspiration amplitude */
  float tcr;    /* closed ratio of glottis */
  float td;     /* closing time for glottis */
  float k;      /* asymmetry factor for Fant wave */
} ip[];

21. SOURCE - Articulatory Model Source

struct sc_rec {
  float voice;      /* glottal air flow */
  float frication;  /* main turbulence source */
  float fricvp;     /* turbulence source at velum */
  float aspiration; /* turbulence at the glottis */
  float dpc;        /* differential pressure plos. */
  short agc;        /* larynx synch. glottal area */
} sc[];

23. RPOLY - Rational polynomial filter

struct rp_rec {
  long posn;    /* sample number of window start */
  long size;    /* size of analysis window */
  float gain;   /* gain of filter */
  long delay;   /* delay in samples */
  long nzero;   /* number of zeros */
  float *data;  /* coeffs begin with top z**-1 */
} rp[];

24. RROOT - Rational rootlist filter

struct rr_rec {
  long posn;    /* sample number of window start */
  long size;    /* size of analysis window */
  float gain;   /* gain of filter */
  long delay;   /* delay in samples */
  long nzero;   /* number of zeros */
  long nzpair;  /* number of zero complex pairs */
  long nppair;  /* number of pole complex pairs */
  float *data;  /* complex zeros, real zeros, 
                   complex poles, real poles */
} rr[];

26. EXCITE - Excitation model

struct xm_rec {
  double start;  /* opening time in sec for cycle */
  double length; /* length in seconds */
  float amp;     /* amplitude of excitation */
  float mix;     /* fraction of total power in v+ */
  float *data;   /* model-dependent parameters */
} xm[];

Notes: Item params field contains 'model=xxxxx' to identify model.

27. NASAL - Nose description

struct na_rec {
  char id[80];           /* name of this nose */
  short areas[ALT_RECL]; /* cross-sections in 0.01mm2 */
  int len;               /* length of the nose in 0.5cm */
} na;

28. CALIBRATION - Calibration data

struct ca_rec {
  char units[80];   /* Physical units */
  float slope1;     /* Slope through 0 and C of G */
  float resid1; 
  float const2;     /* Straight line fit */
  float slope2;
  float resid2;
  float coeff3[3];  /* Second order coefficients */
  float resid3;
  float min,max;    /* Values min, max */
} ca;


© 2000 Mark Huckvale University College London