SPEECH FILING SYSTEM V3.3
FREQUENTLY ASKED QUESTIONS
July 1998
CONTENTS
Installation
1. What platforms does SFS run on?
2. What graphics display devices are supported?
3. What graphics printing devices are supported?
4. What graphics file formats are supported?
5. What Analogue to Digital Converters are supported?
6. What Digital to Analogue Converters are supported?
7. What audio file formats are supported?
8. How do I obtain SFS?
9. What other software do I need to run SFS on DOS?
10. What environment variables do I need to set?
Common Problems
11. Why doesn't replay work?
12. Why doesn't display work?
13. Why doesn't printing work?
14. What does 'labels file out of date' mean?
Introduction to Items
15. What do all these '-i' switches mean?
How To ...
16. How can I display/print spectrograms?
17. How can I view part of a large file?
18. How can I decide what to display?
19. How can I annotate a signal file?
20. How can I get a fundamental frequency trace?
21. How can I get an energy trace?
22. How can I get a set of formant estimates?
23. How can I filter the signal?
24. How can I change the sampling rate?
25. How can I change the speed/pitch of the signal?
26. How can I import data from some other source?
27. How can I export a signal or annotations?
INSTALLATION
1. What platforms does SFS run on?
Binaries are available for MSDOS, WIN32, SPARC/SunOS,
SPARC/Solaris, X86/Linux. The DOS version works under
MSDOS directly, and also in a DOS box under Windows 3.1 or
Windows 95. The WIN32 version runs in a console under
Windows 95/98 or Windows NT. The SunOS version is compiled
on a SPARC-10 running SunOS4.1.3. The Solaris version is
compiled on an UltraSparc running Solaris 2.5.
SFS has been ported to other Unix systems: DEC Alpha,
Hewlett Packard, Masscomp and Linux. However these must be
compiled from the sources provided.
In each case, the use of the GNU 'C' compiler is
encouraged. SFS was originally developed with a K&R C
compiler, but over time has moved to ANSI-C using
conditional compilation.
2. What graphics display devices are supported?
On DOS systems, SFS uses a graphics library called GRX,
developed by Csaba Biegl. This supports most
Super-VGA cards. It may be found with the DJGPP
distribution.
With the WIN32 version, SFS uses the Windows API for
graphics.
On Unix systems, SFS uses the X-Windows library.
There is some support for graphics terminals operating over
serial lines, but this is not encouraged. UCL have
developed a graphical telnet program for PCs with NFS
(contact the author). SFS will run on an X-terminal or PC
X-terminal emulator with 256 simultaneous colours.
3. What graphics printing devices are supported?
PostScript and Epson Stylus Pro Colour printers. The WIN32
version will also print to supported Windows printers.
4. What graphics file formats are supported?
On DOS, Encapsulated PostScript, WordPerfect graphics files
and GIF files.
On Unix, Encapsulated PostScript and GIF files.
5. What Analogue to Digital Converters are supported?
On DOS, SoundBlaster-16 and Laryngograph Ltd PCLX.
On WIN32, the use of the multimedia driver means that any
sound card is supported. However, a 16-bit card with
continuously variable sampling rates is recommended.
On Unix, Sun 8-bit and Sun DBRI 16-bit interface.
6. What Digital to Analogue Converters are supported?
On DOS, SoundBlaster-8, SoundBlaster-16, Laryngograph PCLX,
UCL's own expansion bus replay card.
On WIN32, the use of the multimedia driver means that any
sound card is supported. However, a 16-bit card with
continuously variable sampling rates is recommended.
On Unix, Sun DBRI 16-bit and Sun 8-bit; network replay
using the Vista Exceed X-window emulator and the UCL telnet
program (contact the author); output to a shell script for
use with AudioFile or other systems; and the Linux dsp device.
7. What audio file formats are supported?
SFS maintains its own file format for data. It needs this
because it maintains a processing history of each data set;
this allows a user to keep track of the origin and
processing of any piece of data. SFS also tries to keep
data sets together in a single file, to try and make the
user interface simpler. This means that the SFS file
format must allow multiple copies of multiple types of data
in a single file; and this precludes the use of other file
formats.
To deal with other data file formats, SFS provides
utilities for importing and exporting data. For importing
signals, it is often unnecessary to make a new physical
copy of the signal; instead, a command 'slink' simply
records in an SFS file the instructions for how and where
to access the data in its original format. This makes
access to large read-only databases of data very
convenient.
SFS can link to or read speech signals in the following
file formats:
binary files
WAV format (RIFF format)
VOC format
AU format
ILS format
AIFF format
HTK format
common label file formats
SFS can write speech signals to files in the following file
formats (using *list programs for different data types):
binary files
WAV format (RIFF format)
VOC format
AU format
AIFF format
ILS format
ESPS format
HTK files (waveform, coefficient and annotations)
common label file formats
SFS can also read and write many data sets from/to a
textual representation.
8. How do I obtain SFS?
The easiest way is to obtain an execute-only package of
binaries for IBMPC/MSDOS or SPARC/SunOS. These may be downloaded
from
https://www.phon.ucl.ac.uk/downloads/sfs/
Look for files
msdos1/sfs3ddbn.zip        version 3.dd binaries (non DPMI)
msdos2/sfs3ddbn.zip        version 3.dd binaries (DPMI)
win32/sfs3ddbn.zip         version 3.dd binaries (WIN32)
sunos/sfs3ddbn.tar.gz      version 3.dd binaries (SunOS 4.1.x)
solaris/sfs3ddbn.tar.gz    version 3.dd binaries (Solaris 2.5)
linux/sfs3ddbn.tar.gz      version 3.dd binaries (Linux)
For other Unix systems, you must configure and compile the
sources from files:
unix/sfs3dds1.tar.gz       version 3.dd sources part 1
unix/sfs3dds2.tar.gz       version 3.dd sources part 2
etc
The sources are in logical components; in general the
higher numbered components are the more esoteric. I
suggest you just start with the first.
For MSDOS systems, look for the sources:
msdos2/sfs3dds1.zip        version 3.dd sources part 1
msdos2/sfs3dds2.zip        version 3.dd sources part 2
etc
A set of demonstration files may be found in:
demo/sfsdemo.zip           demo for MSDOS
demo/sfsdemo.tar.gz        demo for Unix
9. What other software do I need to run SFS on DOS?
SFS is compiled using the port of GNU C, by D J Delorie;
commonly known as the djgpp compiler. This is a flat
memory model 32-bit compiler that runs in protected mode.
Version 1 generates code that will run on straight DOS
only, version 2 will run in a DOS box under Windows (3.1 or
95).
To launch protected mode programs, version 1 of djgpp
supplies a run-time environment called GO32.EXE. A recent
version of this is included in the SFS binary distribution.
Later versions are available with the djgpp package (which
may be downloaded from the SimTel archive in
SimTel/vendors/djgpp). Version 2 of DJGPP uses DPMI
services directly without need for GO32.EXE. SFS runs
under version 1 or version 2 of DJGPP.
To support SVGA graphics, SFS uses a graphics library
called GRX written by Csaba Biegl. To operate the graphics
card, this library uses a driver routine which must be
compatible with your hardware. Fortunately, most modern
hardware supports the VESA standard for SVGA modes. The
GRX driver for VESA compatible cards is included in the SFS
binary distribution. If your card does not support VESA
modes, then you will need to get a different driver from
the GRX distribution.
To operate in a DOS box under Windows 3.1 or Windows 95,
you need to use version 2 of the GRX library.
10. What environment variables do I need to set?
SFS requires the user to set environment variables to allow
it to (i) find its home directory, (ii) identify the
graphics display device, (iii) identify the graphics
printing device, (iv) identify the digital-to-analogue
converter device, and (v) identify the analogue-to-digital
converter device.
Variable  Example Settings  Meaning
--------  ----------------  -------
SFSBASE   /app/sfs          Installation directory is /app/sfs
GTERM     xterm             X-Windows
          svga-256          800x600x256 colour
          xvga-256          1024x768x256 colour
GPRINT    printer           PostScript printer on stdprn
          LPT1              PostScript printer on LPT1
          winprint          Windows printer
          eps               EPS file output
DAC       sun16             Sun DBRI 16-bit
          sb16              SoundBlaster 16-bit
          win32             Windows multimedia
ADC       sun16             Sun DBRI 16-bit
          sb16              SoundBlaster 16-bit
          win32             Windows multimedia
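As a minimal sketch, the settings for a Unix workstation might
be placed in a Bourne-shell startup file as follows. The values
are taken from the table above, but this particular combination
(X display, PostScript printer, Sun DBRI audio) is only an
assumed example. C-shell users would use 'setenv NAME value',
and DOS users 'set NAME=value', instead.

```sh
# Example SFS environment for a Unix machine (Bourne shell syntax).
# Values come from the table of settings; adjust them to match
# your own installation and hardware.
SFSBASE=/app/sfs      # installation directory
GTERM=xterm           # X-Windows display
GPRINT=printer        # PostScript printer
DAC=sun16             # Sun DBRI 16-bit replay
ADC=sun16             # Sun DBRI 16-bit acquisition
export SFSBASE GTERM GPRINT DAC ADC
```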
COMMON PROBLEMS
11. Why doesn't replay work?
Replay is the most difficult thing to get right in an
installation. If you have had replay working once, then
the most common cause of failure is that the DAC
environment variable is not set. The 'replay' program
should report an error if it can't determine the replay
device.
Other suggestions:
- The replay device is not supported (see SFSCONFG.h)
- The replay device is not compiled in (see SFSCONFG.h)
- Your machine doesn't have a working replay card.
Surprisingly, a common problem. SFS needs to control
DAC hardware on the machine the program is actually
executing on.
- Volume control turned down, speaker not connected,
signal level is too low.
- For SoundBlaster, the BLASTER variable is not set, or
interrupt > 8 being used.
The complete set of settings for DAC may be found in the
manual page for replay.
12. Why doesn't display work?
Mostly because the GTERM environment variable is not set.
Look in the file $(SFSBASE)/data/digmap to find the list of
supported settings. Graphics devices also need to be
compiled in, see SFSCONFG.h.
If you get a message saying that the output is being
redirected into a 'metafile' then this is because SFS
cannot determine the type of the graphics device.
Try 'set GTERM=vga-16' for DOS and 'setenv GTERM xterm' on
Unix.
The complete set of settings for GTERM may be found in the
file $(SFSBASE)/data/digmap. Check that this text file is in
Unix format for Unix machines, and in DOS format for DOS
machines - formats have been confused in the past.
13. Why doesn't printing work?
Mostly because the GPRINT environment variable is not set.
Look in the file $(SFSBASE)/data/digmap to find a list of
supported settings. Graphics devices also need to be
compiled in, see SFSCONFG.h.
Details of how to set up printing under Unix may be found
in the installation notes.
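Since questions 11 to 13 share the same first cause, a quick
check of the environment is often worthwhile before looking
further. The function below is a hypothetical helper, not part
of SFS; it only reports which of the variables from question 10
are unset or empty.

```sh
# Report which SFS environment variables are unset or empty.
# Returns 0 if all of them are set, 1 otherwise.
check_sfs_env() {
    missing=0
    for name in SFSBASE GTERM GPRINT DAC ADC; do
        # Fetch the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            missing=1
        fi
    done
    return $missing
}

# Typical use:
#   check_sfs_env || echo "fix the settings above and try again"
```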
14. What does 'labels file out of date' mean?
SFS uses a text file to convert processing histories into
English descriptions. By default this is
$(SFSBASE)/data/labels. This file is indexed to make it
fast to access. When it is installed on a new machine, its
date may be updated and SFS thinks that the file has been
changed but not re-indexed.
Solution: run the prolab program on the labels file:
prolab $(SFSBASE)/data/labels
The labels file is described in the User Manual. Check
that this text file is in Unix format for Unix machines,
and in DOS format for DOS machines - formats have been
confused in the past.
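One way to check the text format on a Unix machine is to look
for carriage-return characters, which appear at the end of
every line of a DOS-format file. The helper below is a sketch
using only standard Unix tools; its name is made up for this
example and it is not an SFS program.

```sh
# Succeeds (exit status 0) if the named file contains carriage
# returns, i.e. it looks like a DOS-format text file.
# Returns 1 for a Unix-format file and 2 if the file is unreadable.
has_dos_line_endings() {
    [ -r "$1" ] || return 2
    # Delete every CR and compare the result with the original;
    # if they are identical then no CR was present.
    if tr -d '\r' < "$1" | cmp -s - "$1"; then
        return 1
    fi
    return 0
}

# Typical use:
#   has_dos_line_endings "$SFSBASE/data/labels" && echo "DOS format"
```

The same check applies to the digmap file mentioned in question
12.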
INTRODUCTION TO ITEMS
15. What do all these '-i' switches mean?
An SFS file can contain many different data sets; it can
contain multiple speech signals, annotations, formant or
fundamental frequency data, etc. SFS uses this grouping of
data to maintain a 'processing history', a record of the
antecedents of each data set (or 'item'). To refer to a
particular piece of data within an SFS file, every SFS
program understands a common 'item numbering', and the '-i'
switches specify the item number to the program.
Item numbers are made up from two components: a major data
type code and a simple count code. The most common major
types are listed below:
Major type  Mnemonic  Description
----------  --------  -----------
 1          SP        Speech pressure waveform
 2          LX        Laryngograph waveform
 3          TX        Larynx period data
 4          FX        Fundamental frequency data
 5          AN        Annotations
 7          SY        Synthesizer control data
 9          DI        Grey-level display data
10          CO        Spectral coefficients
12          FM        Formant estimates
16          TR        Parameter tracks
The count code simply records the index number of the data
type in the file. If there are two speech items then they
will have count codes of 1 and 2.
An item number then, consists of a major type, a period and
a count code; e.g. 1.01 or 10.05, corresponding to the
first speech item in the file and the fifth coefficient
item. Since numbers are hard to remember, the major type
numbers may also be replaced by the two-letter mnemonics in
the table above; e.g. sp.01 or co.05. Note that the use of
a leading zero for the count code is optional.
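To make the equivalence of the different spellings concrete,
the sketch below converts an item number to a single numeric
form. It is an illustration only, not how SFS itself parses
item numbers. The function name is invented, only the mnemonics
from the table above are known to it, and accepting upper-case
mnemonics is a choice made for this example.

```sh
# Convert an item number such as "sp.01", "co.5" or "10.05" to
# the numeric form "major.count" with a two-digit count.
# Prints nothing and returns 1 if the input is not recognised.
canonical_item() {
    case "$1" in *.*) ;; *) return 1 ;; esac
    major=${1%%.*}          # text before the first period
    count=${1#*.}           # text after the first period

    # Replace a two-letter mnemonic by its major type number.
    case "$(printf '%s' "$major" | tr '[:upper:]' '[:lower:]')" in
        sp) major=1 ;;   lx) major=2 ;;   tx) major=3 ;;
        fx) major=4 ;;   an) major=5 ;;   sy) major=7 ;;
        di) major=9 ;;   co) major=10 ;;  fm) major=12 ;;
        tr) major=16 ;;
    esac

    # Both parts must now be non-empty decimal numbers.
    case "$major" in ''|*[!0-9]*) return 1 ;; esac
    case "$count" in ''|*[!0-9]*) return 1 ;; esac

    # Strip leading zeros so printf does not read them as octal.
    major=$(printf '%s' "$major" | sed 's/^0*//')
    count=$(printf '%s' "$count" | sed 's/^0*//')
    printf '%d.%02d\n' "${major:-0}" "${count:-0}"
}

canonical_item sp.01    # prints 1.01
canonical_item co.5     # prints 10.05
```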
A given SFS program, then, that processes a single data set
needs to be able to identify which data set from a given
file to use as input. If there is only one data set in the
file of the appropriate type for the program, then the
program uses that automatically. If there is more than one
data set of the input type, the program will usually select
the last item of the appropriate type. However if this is
not what you want, you need to tell the program which item
to process using the '-i' switch.
Take as an example that you want to compare a piece of
speech low-pass filtered at 2000Hz with the same speech
high-pass filtered at 2000Hz. The file starts with a single
speech item, numbered 1.01. This is first processed by
genfilt:
genfilt -l 2000 file.sfs
which generates an item 1.02 in the file. However, the
command
genfilt -h 2000 file.sfs
will not generate the second filtered signal as you might
have wished. Genfilt in this instance will take as its
input item 1.02, the last speech item, which (we know) has
been filtered already. Instead we need the command
genfilt -i1.01 -h 2000 file.sfs
which processes the original signal, as required.
There exists a short hand for the first and the last items
of a given type. The first item in the file of a given
type may be selected by using an item number made up from
the major type followed by a period only, the last item may
be selected by using the major type only. Thus 'sp.'
refers to the first speech item, 'sp' refers to the last.
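The short-hand rule can be sketched as a small shell function
that picks the first or last item of a type from a list of the
items in a file. This is purely illustrative: the function name
is invented, the list of items is supplied by hand rather than
read from an SFS file, and only the SP and AN mnemonics are
mapped to keep the example short.

```sh
# Resolve the short-hand forms "sp." (first speech item) and
# "sp" (last speech item) against a list of items.
# Usage: resolve_shorthand SPEC ITEM...
# Items are given in numeric form in file order, e.g. 1.01 1.02 5.01.
resolve_shorthand() {
    spec=$1
    shift
    case "$spec" in
        *.) want=first; type=${spec%.} ;;   # trailing period: first item
        *)  want=last;  type=$spec ;;       # type only: last item
    esac
    case "$type" in
        sp) type=1 ;;
        an) type=5 ;;
    esac
    found=
    for item in "$@"; do
        if [ "${item%%.*}" = "$type" ]; then
            found=$item
            if [ "$want" = first ]; then
                break
            fi
        fi
    done
    if [ -z "$found" ]; then
        return 1
    fi
    echo "$found"
}

resolve_shorthand sp. 1.01 1.02 5.01    # prints 1.01
resolve_shorthand sp  1.01 1.02 5.01    # prints 1.02
```

In the genfilt example above, 'sp.' would therefore still select
the original signal after the low-pass filtered item 1.02 has
been added.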
HOW TO ...
16. How can I display/print spectrograms?
The main display program Es has the capability of
calculating, displaying and printing spectrograms as you
work. To start up Es with display of a speech waveform and
a spectrogram, use:
Es -isp -gsp file.sfs
Es has menu options to produce a hard-copy of the signal
displayed on the screen.
The program sprint will also print spectrograms directly to
the printer. The programs espect and esform will display
and print spectrograms with spectral cross-sections.
17. How can I view part of a large file?
Use the '-s' and '-e' switches on Es to specify the initial
starting and ending times displayed. Since Es attempts to
read an entire data set into memory before displaying it,
it is necessary to specify the initial times for very large
files. It is still possible to scroll forwards and
backwards in time, but impossible to zoom out to longer
times than initially specified.
For example:
Es -s100 -e130 bigfile.sfs
18. How can I decide what to display?
Items may be selected for display using the '-i' and '-a'
switches to Es or Ds. But a simpler method is to use the
tree display in Es - on menu options FILE - TREE. Each box
on the tree represents an item, and you can select the
items you want to display. You can jump straight to the
tree display with the '-t' option on Es. For example:
Es -t manyitems.sfs
19. How can I annotate a signal file?
Use the '-l name' switch to Es. The string 'name' is a
name given to the set of annotations. SFS files may
contain a number of annotation sets and this name
differentiates them.
Es will then display an annotation area at the bottom of
the display, and have a new ANNO menu button. To add an
annotation (i) position the left cursor, (ii) select ANNO,
(iii) type in a string on the keyboard, (iv) key RETURN.
Look at the Es manual page for more details.
20. How can I get a fundamental frequency trace?
The programs fxac and fxcep provide autocorrelation and
cepstral methods for fundamental frequency estimation from
speech signals. They have a default set of parameters that
work pretty well on a clean signal.
21. How can I get an energy trace?
The program envelope provides a method for generating a
TRACK item from a speech signal.
22. How can I get a set of formant estimates?
The program fmanal provides a set of formant estimates from
a speech signal. This has a default set of parameters that
work pretty well for clean 10kHz sampled speech signals.
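For a clean recording, the default parameters mentioned in
questions 20 and 22 can be applied to several files in one go.
The sketch below assumes that fxac and fmanal each accept an
SFS file name as their only argument when the defaults are
wanted; that calling form is inferred from the other examples
in this FAQ and should be checked against the manual pages. The
function name is invented for this example.

```sh
# Add fundamental frequency and formant items to each SFS file
# given as an argument, relying on default parameters.
# Stops at the first program that reports a failure.
analyse_defaults() {
    for f in "$@"; do
        fxac "$f"   || return 1    # autocorrelation Fx estimate
        fmanal "$f" || return 1    # formant estimates
    done
}

# Typical use:
#   analyse_defaults file1.sfs file2.sfs
```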
23. How can I filter the signal?
The program genfilt provides general-purpose low-pass,
high-pass, band-pass and band-stop filters using recursive
digital filter designs.
Low-pass at 100Hz:
genfilt -l 100 file.sfs
High-pass at 2000Hz:
genfilt -h 2000 file.sfs
Band-pass between 300 and 3500Hz:
genfilt -h 300 -l 3500 file.sfs
Band-stop between 3000 and 4000Hz:
genfilt -l 3000 -h 4000 file.sfs
24. How can I change the sampling rate?
The program resamp provides a general purpose
interpolation/decimation facility for changing sampling
rates by small integer ratios.
For example:
resamp -f 44100 file.sfs
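To see whether a conversion really is a small integer ratio,
the interpolation/decimation factors can be worked out by
dividing both rates by their greatest common divisor. The
helper below is a sketch of that arithmetic only; it is not
part of SFS and says nothing about which ratios resamp will
accept.

```sh
# Print the ratio "interpolate/decimate" needed to convert from
# one sampling rate to another, reduced to lowest terms.
# Usage: resample_ratio FROM_RATE TO_RATE   (rates in samples/sec)
resample_ratio() {
    from=$1
    to=$2
    # Euclid's algorithm for the greatest common divisor.
    a=$from
    b=$to
    while [ "$b" -ne 0 ]; do
        t=$((a % b))
        a=$b
        b=$t
    done
    echo "$((to / a))/$((from / a))"
}

resample_ratio 22050 44100    # prints 2/1
resample_ratio 20000 44100    # prints 441/200
```

The second case shows a conversion that is not a small ratio.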
25. How can I change the speed/pitch of the signal?
The program respeed provides a general purpose retiming
facility for speeding-up or slowing down speech without
changing the pitch.
The program repitch provides a special purpose method for
changing speed AND pitch, but requires a set of pitch epoch
annotations (try pp and txan).
26. How can I import data from some other source?
To import into an SFS file, the empty SFS file must be
created first - this is your opportunity to identify the
speaker, source and utterance to the system. Use the 'hed'
program to create an empty SFS file:
hed newfile.sfs
and answer the questions, or
hed -n newfile.sfs
for the truly lazy.
Speech signals may be imported from almost any format. The
key is the slink program which creates a pointer in an SFS
file which indicates where and how data may be read in.
The data itself is not copied by slink into the SFS file;
to do this use slink followed by scopy.
For example, to link into a binary file with 16-bit samples
in natural byte order at 20000 samples/sec:
slink -isp -f20000 ipfile.dat opfile.sfs
Or to link to a monophonic .WAV file:
slink -isp -tWAV ipfile.wav opfile.sfs
Look at the manual page for slink for all options and
formats.
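The two steps can be combined into a small shell function for
importing monophonic .WAV files. This is a sketch using only
the hed and slink invocations shown above; the function name
is invented, and since 'hed -n' skips the questions, the
speaker and utterance details are left unrecorded.

```sh
# Create an empty SFS file and link a mono .WAV file into it as
# a speech item.
# Usage: import_wav INPUT.wav OUTPUT.sfs
import_wav() {
    wav=$1
    sfs=$2
    hed -n "$sfs" || return 1           # create the empty SFS file
    slink -isp -tWAV "$wav" "$sfs"      # record a link to the signal
}

# Typical use:
#   import_wav ipfile.wav opfile.sfs
```

Remember that the signal is only linked, so the .WAV file must
remain in place.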
27. How can I export a signal or annotations?
The program splist allows the export of signals in binary
files and other formats.
The program sfs2wav creates Windows compatible .WAV files
and supports multiple channels.
The program anlist creates text representations of
annotations.
The programs sylist, colist, fmlist, trlist, etc export
other data sets.
Refer to the manual pages for these programs for details of
export formats supported.
This software is copyright University College London 1987-1998.
No part of the software may be sold, but copies may be made and
the software modified and distributed free of charge providing
the copyright of University College London continues to be
demonstrated.
This software bears no warranty or guarantee of any kind.
UCL and Mark Huckvale are unable to support this software. While
bug-fixes are welcome, requests for help may be ignored.
Mark Huckvale
Phonetics and Linguistics
University College London
Gower Street
London WC1E 6BT
SFS@pals.ucl.ac.uk