SPEECH FILING SYSTEM V3.3

FREQUENTLY ASKED QUESTIONS

July 1998


CONTENTS

Installation
1.    What platforms does SFS run on?
2.    What graphics display devices are supported?
3.    What graphics printing devices are supported?
4.    What graphics file formats are supported?
5.    What Analogue to Digital Converters are supported?
6.    What Digital to Analogue Converters are supported?
7.    What audio file formats are supported?
8.    How do I obtain SFS?
9.    What other software do I need to run SFS on DOS?
10.   What environment variables do I need to set?

Common Problems
11.   Why doesn't replay work?
12.   Why doesn't display work?
13.   Why doesn't printing work?
14.   What does 'labels file out of date' mean?

Introduction to Items
15.   What do all these '-i' switches mean?

How To ...
16.   How can I display/print spectrograms?
17.   How can I view part of a large file?
18.   How can I decide what to display?
19.   How can I annotate a signal file?
20.   How can I get a fundamental frequency trace?
21.   How can I get an energy trace?
22.   How can I get a set of formant estimates?
23.   How can I filter the signal?
24.   How can I change the sampling rate?
25.   How can I change the speed/pitch of the signal?
26.   How can I import data from some other source?
27.   How can I export a signal or annotations?



INSTALLATION

1.    What platforms does SFS run on?

      Binaries are available for MSDOS, WIN32, SPARC/SunOS,
      SPARC/Solaris and X86/Linux.  The DOS version works under
      MSDOS directly, and also in a DOS box under Windows 3.1 or
      Windows 95.  The WIN32 version runs in a console under
      Windows 95/98 or Windows NT.  The SunOS version is compiled
      on a SPARC-10 running SunOS 4.1.3.  The Solaris version is
      compiled on an UltraSPARC running Solaris 2.5.

      SFS has been ported to other Unix systems: DEC Alpha,
      Hewlett Packard, Masscomp and Linux.  However these must be
      compiled from the sources provided.

      In each case, the use of the GNU 'C' compiler is
      encouraged.  SFS was originally developed with a K&R C
      compiler, but over time has moved to ANSI-C using
      conditional compilation.

2.    What graphics display devices are supported?

      On DOS systems, SFS uses a graphics library called GRX,
      that was developed by Csaba Biegl.  This supports most
      Super-VGA cards.  It may be found with the DJGPP
      distribution.

      With the WIN32 version, SFS uses the Windows API for
      graphics.

      On Unix systems, SFS uses the X-Windows library.

      There is some support for graphics terminals operating over
      serial lines, but this is not encouraged.  UCL have
      developed a graphical telnet program for PCs with NFS
      (contact the author).  SFS will run on an X-terminal or PC
      X-terminal emulator with 256 simultaneous colours.

3.    What graphics printing devices are supported?

      Postscript and Epson Stylus Pro Colour printers.  The WIN32
      version will also print to supported Windows printers.

4.    What graphics file formats are supported?

      On DOS, Encapsulated postscript, WordPerfect graphics files
      and GIF files.

      On Unix, Encapsulated postscript and GIF files.

5.    What Analogue to Digital Converters are supported?

      On DOS, SoundBlaster-16 and Laryngograph Ltd PCLX.

      On WIN32, the use of the multimedia driver means that any
      sound card is supported.  However, a 16-bit card with
      continuously variable sampling rates is recommended.

      On Unix, Sun 8-bit and Sun DBRI 16-bit interface.

6.    What Digital to Analogue Converters are supported?

      On DOS, SoundBlaster-8, SoundBlaster-16, Laryngograph PCLX,
      UCL's own expansion bus replay card.

      On WIN32, the use of the multimedia driver means that any
      sound card is supported.  However, a 16-bit card with
      continuously variable sampling rates is recommended.

      On Unix:
      -    Sun DBRI 16-bit and Sun 8-bit.
      -    Network replay using the Vista Exceed X-window
           emulator and the UCL telnet program (contact author).
      -    Output to a shell script for use with AudioFile or
           other systems.
      -    Linux dsp device.

7.    What audio file formats are supported?

      SFS maintains its own file format for data.  It needs this
      because it maintains a processing history of each data set;
      this allows a user to keep track of the origin and
      processing of any piece of data.  SFS also tries to keep
      data sets together in a single file, to try and make the
      user interface simpler.  This means that the SFS file
      format must allow multiple copies of multiple types of data
      in a single file; and this precludes the use of other file
      formats.

      To deal with other data file formats, SFS provides
      utilities for importing and exporting data.  For importing
      signals, it is often unnecessary to make a new physical
      copy of the signal;  instead, a command 'slink' simply
      records in an SFS file the instructions for how and where
      to access the data in its original format.  This makes
      access to large read-only databases of data very
      convenient. 

      SFS can link to or read speech signals in the following
      file formats:
           binary files
           WAV format (RIFF format)
           VOC format
           AU format
           ILS format
           AIFF format
           HTK format
           common label file formats

      SFS can write speech signals to files in the following file
      formats (using *list programs for different data types):
           binary files
           WAV format (RIFF format)
           VOC format
           AU format
           AIFF format
           ILS format
           ESPS format
           HTK files (waveform, coefficient and annotations)
           common label file formats

      SFS can also read and write many data sets from/to a
      textual representation.

8.    How do I obtain SFS?

      The easiest way is to obtain an execute-only package of
      binaries for one of the platforms listed below.  These may
      be downloaded from

          https://www.phon.ucl.ac.uk/downloads/sfs/

      Look for files

      msdos1/sfs3ddbn.zip        version 3.dd binaries (non DPMI)
      msdos2/sfs3ddbn.zip        version 3.dd binaries (DPMI)
      win32/sfs3ddbn.zip         version 3.dd binaries (WIN32)
      sunos/sfs3ddbn.tar.gz      version 3.dd binaries (SunOS 4.1.x)
      solaris/sfs3ddbn.tar.gz    version 3.dd binaries (Solaris 2.5)
      linux/sfs3ddbn.tar.gz      version 3.dd binaries (Linux)

      For other Unix systems, you must configure and compile the
      sources from files:

      unix/sfs3dds1.tar.gz       version 3.dd sources part 1
      unix/sfs3dds2.tar.gz       version 3.dd sources part 2
      etc

      The sources are in logical components; in general the
      higher numbered components are the more esoteric.  I
      suggest you just start with the first.

      For MSDOS systems, look for the sources:

      msdos2/sfs3dds1.zip        version 3.dd sources part 1
      msdos2/sfs3dds2.zip        version 3.dd sources part 2
      etc

      A set of demonstration files may be found in:

      demo/sfsdemo.zip           demo for MSDOS
      demo/sfsdemo.tar.gz        demo for Unix

9.    What other software do I need to run SFS on DOS?

      SFS is compiled using D J Delorie's port of GNU C,
      commonly known as the djgpp compiler.  This is a flat
      memory model 32-bit compiler that runs in protected mode.
      Version 1 generates code that will run on straight DOS
      only; version 2 generates code that will run in a DOS box
      under Windows (3.1 or 95).

      To launch protected mode programs, version 1 of djgpp
      supplies a run-time environment called GO32.EXE.  A recent
      version of this is included in the SFS binary distribution. 
      Later versions are available with the djgpp package (which
      may be downloaded from the SimTel archive in
      SimTel/vendors/djgpp).  Version 2 of DJGPP uses DPMI
      services directly without need for GO32.EXE.  SFS runs
      under version 1 or version 2 of DJGPP.

      To support SVGA graphics, SFS uses a graphics library
      called GRX written by Csaba Biegl.  To operate the graphics
      card, this library uses a driver routine which must be
      compatible with your hardware.  Fortunately, most modern
      hardware supports the VESA standard for SVGA modes.  The
      GRX driver for VESA compatible cards is included in the SFS
      binary distribution.  If your card does not support VESA
      modes, then you will need to get a different driver from
      the GRX distribution.

      To operate in a DOS box under Windows 3.1 or Windows 95,
      you need to use version 2 of the GRX library.

10.   What environment variables do I need to set?

      SFS requires the user to set environment variables to allow
      it to (i) find its home directory, (ii) identify the
      graphics display device, (iii) identify the graphics
      printing device, (iv) identify the digital-to-analogue
      converter device, and (v) identify the analogue-to-digital
      converter device.

      Variable   Example Settings     Meaning
      --------   ----------------     -------
      SFSBASE    /app/sfs             Installation directory is
                                      /app/sfs

      GTERM      xterm                X-Windows
                 svga-256             800x600x256 colour
                 xvga-256             1024x768x256 colour

      GPRINT     printer              postscript printer on stdprn
                 LPT1                 postscript printer on LPT1
                 winprint             Windows printer
                 eps                  EPS file output

      DAC        sun16                Sun DBRI 16-bit
                 sb16                 SoundBlaster 16-bit
                 win32                Windows multimedia

      ADC        sun16                Sun DBRI 16-bit
                 sb16                 SoundBlaster 16-bit
                 win32                Windows multimedia
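
      As a minimal sketch, the settings in the table might be made
      in a Bourne-style shell on a Unix system as shown below.  The
      values are the example settings from the table, not
      recommendations; substitute the ones that match your own
      installation and hardware.

```shell
# Example settings for a Unix installation, taken from the table above.
# Adjust each value to match your own installation and hardware.
SFSBASE=/app/sfs    # installation directory
GTERM=xterm         # display through X-Windows
GPRINT=eps          # graphics output to an EPS file
DAC=sun16           # replay through the Sun DBRI 16-bit interface
ADC=sun16           # record through the Sun DBRI 16-bit interface
export SFSBASE GTERM GPRINT DAC ADC
```

      Under csh the equivalent form is 'setenv GTERM xterm', and
      under DOS it is 'set GTERM=svga-256'.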



COMMON PROBLEMS

11.   Why doesn't replay work?

      Replay is the most difficult thing to get right in an
      installation.  If you have had replay working once, then
      the most common cause of failure is that the DAC
      environment variable is not set.  The 'replay' program
      should report an error if it can't determine the replay
      device.

      Other suggestions:
      -    The replay device is not supported (see SFSCONFG.h)
      -    The replay device is not compiled in (see SFSCONFG.h)
      -    Your machine doesn't have a working replay card. 
           Surprisingly, a common problem.  SFS needs to control
           DAC hardware on the machine the program is actually
           executing on.
      -    Volume control turned down, speaker not connected,
           signal level is too low.
      -    For SoundBlaster, the BLASTER variable is not set, or
           interrupt > 8 being used.

      The complete set of settings for DAC may be found in the
      manual page for replay.

12.   Why doesn't display work?

      Mostly because the GTERM environment variable is not set. 
      Look in the file $(SFSBASE)/data/digmap to find the list of
      supported settings.  Graphics devices also need to be
      compiled in, see SFSCONFG.h.

      If you get a message saying that the output is being
      redirected into a 'metafile' then this is because SFS
      cannot determine the type of the graphics device.

      Try 'set GTERM=vga-16' for DOS and 'setenv GTERM xterm' on
      Unix.

      The complete set of settings for GTERM may be found in the
      file $SFSBASE/data/digmap.  Check that this text file is in
      Unix format for Unix machines, and in DOS format for DOS
      machines - formats have been confused in the past.

13.   Why doesn't printing work?

      Mostly because the GPRINT environment variable is not set. 
      Look in the file $(SFSBASE)/data/digmap to find a list of
      supported settings.  Graphics devices also need to be
      compiled in, see SFSCONFG.h.

      Details of how to set up printing under Unix may be found
      in the installation notes.
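
      Since the answers to questions 11 to 13 all start with an
      unset environment variable, a quick first check on Unix might
      look like the sketch below.  The function name check_sfs_env
      is hypothetical and is not part of SFS; it only tests that
      the variables are set, not that their values are supported.

```shell
# Report which of the SFS environment variables are unset or empty.
# Prints one line per missing variable; returns non-zero if any is missing.
check_sfs_env() {
    missing=0
    for name in SFSBASE GTERM GPRINT DAC ADC; do
        # Look up the variable whose name is held in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            missing=1
        fi
    done
    return $missing
}
```

      Run check_sfs_env in the same shell from which you start
      replay, Es or the printing programs.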

14.   What does 'labels file out of date' mean?

      SFS uses a text file to convert processing histories into
      English descriptions.  By default this is
      $(SFSBASE)/data/labels.  This file is indexed to make it
      fast to access.  When it is installed on a new machine, its
      date may be updated and SFS thinks that the file has been
      changed but not re-indexed.

      Solution: run the prolab program on the labels file:

           prolab $SFSBASE/data/labels

      (On DOS, write %SFSBASE% in place of $SFSBASE.)

      The labels file is described in the User Manual.  Check
      that this text file is in Unix format for Unix machines,
      and in DOS format for DOS machines - formats have been
      confused in the past.
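
      One way to test which format a text file is in, on a Unix
      machine, is to look for carriage-return characters.  The
      sketch below assumes a POSIX shell and grep; the function
      name has_dos_line_endings is hypothetical and is not part of
      SFS.

```shell
# Succeed (exit status 0) if the named file contains carriage-return
# characters, which indicates DOS-format line endings.
has_dos_line_endings() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}
```

      For example, 'has_dos_line_endings $SFSBASE/data/labels'
      followed by 'echo $?' prints 0 if the labels file is in DOS
      format.  The same check applies to the digmap file.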



INTRODUCTION TO ITEMS

15.   What do all these '-i' switches mean?

      An SFS file can contain many different data sets; it can
      contain multiple speech signals, annotations, formant or
      fundamental frequency data, etc.  SFS uses this grouping of
      data to maintain a 'processing history', a record of the
      antecedents of each data set (or 'item').  To refer to a
      particular piece of data within an SFS file, every SFS
      program understands a common 'item numbering', and the '-i'
      switches specify the item number to the program.

      Item numbers are made up from two components: a major data
      type code and a simple count code.  The most common major
      types are listed below:

      Major type      Mnemonic   Description
      ----------      --------   -----------
      1               SP         Speech pressure waveform
      2               LX         Laryngograph waveform
      3               TX         Larynx period data
      4               FX         Fundamental frequency data
      5               AN         Annotations
      7               SY         Synthesizer control data
      9               DI         Grey-level display data
      10              CO         Spectral coefficients
      12              FM         Formant estimates
      16              TR         Parameter tracks

      The count code simply records the index number of the data
      type in the file.  If there are two speech items then they
      will have count codes of 1 and 2.

      An item number then, consists of a major type, a period and
      a count code; e.g. 1.01 or 10.05, corresponding to the
      first speech item in the file and the fifth coefficient
      item.  Since numbers are hard to remember, the major type
      numbers may also be replaced by the two-letter mnemonics in
      the table above; e.g. sp.01 or co.05.  Note that the use of
      a leading zero for the count code is optional.
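
      The relationship between mnemonics and numeric item numbers
      can be sketched in shell as follows.  The function names
      major_type and item_number are hypothetical and are not part
      of SFS; the mapping is copied from the table above and covers
      only the types listed there.

```shell
# Translate a two-letter mnemonic into its major type number,
# using the table above.  Unknown mnemonics give a non-zero status.
major_type() {
    case "$1" in
        sp) echo 1 ;;
        lx) echo 2 ;;
        tx) echo 3 ;;
        fx) echo 4 ;;
        an) echo 5 ;;
        sy) echo 7 ;;
        di) echo 9 ;;
        co) echo 10 ;;
        fm) echo 12 ;;
        tr) echo 16 ;;
        *)  return 1 ;;
    esac
}

# Build a numeric item number from a mnemonic and a count code.
# Pass the count without a leading zero; it is padded to two digits.
item_number() {
    major=$(major_type "$1") || return 1
    printf '%s.%02d\n' "$major" "$2"
}
```

      For example, 'item_number sp 1' prints 1.01 and
      'item_number co 5' prints 10.05.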

      A given SFS program, then, that processes a single data set
      needs to be able to identify which data set from a given
      file to use as input.  If there is only one data set in the
      file of the appropriate type for the program, then the
      program uses that automatically.  If there is more than one
      data set of the input type, the program will usually select
      the last item of the appropriate type.  However, if this is
      not what you want, you need to tell the program which item
      to process using the '-i' switch.

      Take as an example that you want to compare a piece of
      speech low-pass filtered at 2000Hz with the same speech
      high-pass filtered at 2000Hz.  The file starts with a
      single speech item, numbered 1.01.  This is then processed
      by genfilt:

           genfilt -l 2000 file.sfs

      This generates an item 1.02 in the file.  However, the
      command

           genfilt -h 2000 file.sfs

      will not generate the second filtered signal as you might
      have wished.  Genfilt in this instance will take as its
      input item 1.02, which (we know) has been filtered already. 
      Instead we need the command:

           genfilt -i1.01 -h 2000 file.sfs

      This processes the original signal, as required.

      There is a shorthand for the first and the last items of a
      given type.  The first item in the file of a given type
      may be selected by using an item number made up from the
      major type followed by a period only; the last item may be
      selected by using the major type only.  Thus 'sp.' refers
      to the first speech item, and 'sp' refers to the last.



HOW TO ...

16.   How can I display/print spectrograms?

      The main display program Es has the capability of
      calculating, displaying and printing spectrograms as you
      work.  To start up Es with display of a speech waveform and
      a spectrogram, use:

           Es -isp -gsp file.sfs

      Es has menu options to produce a hard-copy of the signal
      displayed on the screen.

      The program sprint will also print spectrograms directly to
      the printer.  The programs espect and esform will display
      and print spectrograms with spectral cross-sections.

17.   How can I view part of a large file?

      Use the '-s' and '-e' switches on Es to specify the initial
      starting and ending times displayed.  Since Es attempts to
      read an entire data set into memory before displaying it,
      it is necessary to specify the initial times for very large
      files.  It is still possible to scroll forwards and
      backwards in time, but impossible to zoom out to longer
      times than initially specified.

      For example:

           Es -s100 -e130 bigfile.sfs

18.   How can I decide what to display?

      Items may be selected for display using the '-i' and '-a'
      switches to Es or Ds.  But a simpler method is to use the
      tree display in Es - on menu options FILE - TREE.  Each box
      on the tree represents an item, and you can select the
      items you want to display.  You can jump straight to the
      tree display with the '-t' option on Es.  For example:

           Es -t manyitems.sfs

19.   How can I annotate a signal file?

      Use the '-l name' switch to Es.  The string 'name' is a
      name given to the set of annotations.  SFS files may
      contain a number of annotation sets and this name
      differentiates them.

      Es will then display an annotation area at the bottom of
      the display, and have a new ANNO menu button.  To add an
      annotation (i) position the left cursor, (ii) select ANNO,
      (iii) type in a string on the keyboard, (iv) key RETURN.

      Look at the Es manual page for more details.

20.   How can I get a fundamental frequency trace?

      The programs fxac and fxcep provide autocorrelation and
      cepstral methods for fundamental frequency estimation from
      speech signals.  They have a default set of parameters that
      work pretty well on a clean signal.

21.   How can I get an energy trace?

      The program envelope generates an energy envelope from a
      speech signal and stores it as a track (TR) item.

22.   How can I get a set of formant estimates?

      The program fmanal provides a set of formant estimates from
      a speech signal.  This has a default set of parameters that
      work pretty well for clean 10kHz sampled speech signals.

23.   How can I filter the signal?

      The program genfilt provides general-purpose low-pass,
      high-pass, band-pass and band-stop filters using recursive
      digital filter designs.

      Low-pass at 100Hz:
           genfilt -l 100 file.sfs

      High-pass at 2000Hz:
           genfilt -h 2000 file.sfs

      Band-pass between 300 and 3500Hz:
           genfilt -h 300 -l 3500 file.sfs

      Band-stop between 3000 and 4000Hz:
           genfilt -l 3000 -h 4000 file.sfs
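
      In the examples above, a band-pass filter has the '-h' cutoff
      below the '-l' cutoff, and a band-stop filter has them the
      other way round.  That reading is inferred from the examples
      only; check the genfilt manual page.  Under that assumption,
      a small helper can build the switches so the two cases are
      not confused.  The function name band_switches is
      hypothetical and is not part of SFS.

```shell
# Print the genfilt switches for a band-pass or band-stop filter,
# following the pattern of the examples above.
#   band_switches pass LOW_EDGE HIGH_EDGE
#   band_switches stop LOW_EDGE HIGH_EDGE
band_switches() {
    if [ "$2" -ge "$3" ]; then
        echo "low edge must be below high edge" >&2
        return 1
    fi
    case "$1" in
        pass) echo "-h $2 -l $3" ;;
        stop) echo "-l $2 -h $3" ;;
        *)    return 1 ;;
    esac
}
```

      Usage: genfilt $(band_switches pass 300 3500) file.sfs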

24.   How can I change the sampling rate?

      The program resamp provides a general purpose
      interpolation/decimation facility for changing sampling
      rates by small integer ratios.

      For example:
           resamp -f 44100 file.sfs
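
      To see what ratio a given change of rate implies, the new and
      old rates can be reduced to lowest terms.  The sketch below
      does this with Euclid's algorithm in shell arithmetic.  The
      function name rate_ratio is hypothetical and is not part of
      SFS, and how large a ratio resamp accepts is not stated here;
      see its manual page.

```shell
# Print the ratio NEW/OLD of two sampling rates in lowest terms.
#   rate_ratio OLD_RATE NEW_RATE
rate_ratio() {
    a=$1
    b=$2
    # Euclid's algorithm: on exit, a holds the greatest common divisor.
    while [ "$b" -ne 0 ]; do
        t=$((a % b))
        a=$b
        b=$t
    done
    echo "$(($2 / a))/$(($1 / a))"
}
```

      For example, 'rate_ratio 10000 20000' prints 2/1, while
      'rate_ratio 20000 44100' prints 441/200.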

25.   How can I change the speed/pitch of the signal?

      The program respeed provides a general purpose retiming
      facility for speeding-up or slowing down speech without
      changing the pitch.

      The program repitch provides a special purpose method for
      changing speed AND pitch, but requires a set of pitch epoch
      annotations (try pp and txan).

26.   How can I import data from some other source?

      To import into an SFS file, the empty SFS file must be
      created first - this is your opportunity to identify the
      speaker, source and utterance to the system.  Use the 'hed'
      program to create an empty SFS file:

           hed newfile.sfs

      and answer the questions, or

           hed -n newfile.sfs

      for the truly lazy.

      Speech signals may be imported from almost any format.  The
      key is the slink program, which creates a pointer in an SFS
      file indicating where and how the data may be read.
      The data itself is not copied by slink into the SFS file;
      to do this use slink followed by scopy.

      For example, to link into a binary file with 16-bit samples
      in natural byte order at 20000 samples/sec:

           slink -isp -f20000 ipfile.dat opfile.sfs

      Or to link to a monophonic .WAV file:

           slink -isp -tWAV ipfile.wav opfile.sfs

      Look at the manual page for slink for all options and
      formats.
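
      To link a whole directory of recordings, the two steps above
      can be wrapped in a loop.  This is a sketch for a Unix shell
      using only the hed and slink commands shown above.  It
      assumes monophonic files whose names end in lowercase '.wav',
      and that 'hed -n' needs no further input.  The function name
      link_wavs is hypothetical and is not part of SFS.

```shell
# Create an SFS file for each .wav file named on the command line
# and link the audio into it as a speech item.
link_wavs() {
    for wav in "$@"; do
        sfs="${wav%.wav}.sfs"      # a.wav becomes a.sfs
        hed -n "$sfs" || return 1
        slink -isp -tWAV "$wav" "$sfs" || return 1
    done
}
```

      Usage: link_wavs *.wav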

27.   How can I export a signal or annotations?

      The program splist allows the export of signals in binary
      files and other formats.

      The program sfs2wav creates Windows compatible .WAV files
      and supports multiple channels.

      The program anlist creates text representations of
      annotations.

      The programs sylist, colist, fmlist, trlist, etc export
      other data sets.

      Refer to the manual pages for these programs for details of
      export formats supported.



This software is copyright University College London 1987-1998. 
No part of the software may be sold, but copies may be made and
the software modified and distributed free of charge providing
the copyright of University College London continues to be
demonstrated.

This software bears no warranty or guarantee of any kind.

UCL and Mark Huckvale are unable to support this software. While
bug-fixes are welcome, requests for help may be ignored.

Mark Huckvale
Phonetics and Linguistics
University College London
Gower Street
London WC1E 6BT
SFS@pals.ucl.ac.uk