SPEECH FILING SYSTEM V3.30
FREQUENTLY ASKED QUESTIONS
July 1998
-------------------------------------------------------------

CONTENTS

Installation
 1. What platforms does SFS run on?
 2. What graphics display devices are supported?
 3. What graphics printing devices are supported?
 4. What graphics file formats are supported?
 5. What Analogue to Digital Converters are supported?
 6. What Digital to Analogue Converters are supported?
 7. What audio file formats are supported?
 8. How do I obtain SFS?
 9. What other software do I need to run SFS on DOS?
10. What environment variables do I need to set?

Common Problems
11. Why doesn't replay work?
12. Why doesn't display work?
13. Why doesn't printing work?
14. What does 'labels file out of date' mean?

Introduction to Items
15. What do all these '-i' switches mean?

How To ...
16. How can I display/print spectrograms?
17. How can I view part of a large file?
18. How can I decide what to display?
19. How can I annotate a signal file?
20. How can I get a fundamental frequency trace?
21. How can I get an energy trace?
22. How can I get a set of formant estimates?
23. How can I filter the signal?
24. How can I change the sampling rate?
25. How can I change the speed/pitch of the signal?
26. How can I import data from some other source?
27. How can I export a signal or annotations?

----------------------------------------------------------------

INSTALLATION

1. What platforms does SFS run on?

Binaries are available for MSDOS, WIN32, SPARC/SunOS, SPARC/Solaris
and X86/Linux.

The DOS version works under MSDOS directly, and also in a DOS box
under Windows 3.1 or Windows 95. The WIN32 version runs in a console
under Windows 95/98 or Windows NT. The SunOS version is compiled on a
SPARC-10 running SunOS 4.1.3. The Solaris version is compiled on an
UltraSparc running Solaris 2.5.

SFS has been ported to other Unix systems: DEC Alpha, Hewlett
Packard, and Masscomp. However, these must be compiled from the
sources provided.
In each case, the use of the GNU 'C' compiler is encouraged. SFS was
originally developed with a K&R C compiler, but over time has moved
to ANSI-C using conditional compilation.

2. What graphics display devices are supported?

On DOS systems, SFS uses a graphics library called GRX, developed by
Csaba Biegl. This supports most Super-VGA cards. It may be found with
the DJGPP distribution.

With the WIN32 version, SFS uses the Windows API for graphics.

On Unix systems, SFS uses the X-Windows library. There is some
support for graphics terminals operating over serial lines, but this
is not encouraged. UCL have developed a graphical telnet program for
PCs with NFS (contact the author). SFS will run on an X-terminal or
PC X-terminal emulator with 256 simultaneous colours.

3. What graphics printing devices are supported?

PostScript and Epson Stylus Pro Colour printers. The WIN32 version
will also print to supported Windows printers.

4. What graphics file formats are supported?

On DOS, Encapsulated PostScript, WordPerfect graphics files and GIF
files.

On Unix, Encapsulated PostScript and GIF files.

5. What Analogue to Digital Converters are supported?

On DOS, SoundBlaster-16 and Laryngograph Ltd PCLX.

On WIN32, the use of the multimedia driver means that any sound card
is supported. However, a 16-bit card with continuously variable
sampling rates is recommended.

On Unix, Sun 8-bit and Sun DBRI 16-bit interface.

6. What Digital to Analogue Converters are supported?

On DOS, SoundBlaster-8, SoundBlaster-16, Laryngograph PCLX, and UCL's
own expansion bus replay card.

On WIN32, the use of the multimedia driver means that any sound card
is supported. However, a 16-bit card with continuously variable
sampling rates is recommended.

On Unix:

    Sun DBRI 16-bit, and Sun 8-bit
    Network replay using Vista Exceed X-window emulator and UCL
        telnet program (contact author)
    Output to a shell script for use with AudioFile or other systems
    Linux dsp device

7. What audio file formats are supported?

SFS maintains its own file format for data. It needs this because it
maintains a processing history of each data set; this allows a user
to keep track of the origin and processing of any piece of data. SFS
also tries to keep data sets together in a single file, to make the
user interface simpler. This means that the SFS file format must
allow multiple copies of multiple types of data in a single file, and
this precludes the use of other file formats.

To deal with other data file formats, SFS provides utilities for
importing and exporting data. For importing signals, it is often
unnecessary to make a new physical copy of the signal; instead, the
command 'slink' simply records in an SFS file the instructions for
how and where to access the data in its original format. This makes
access to large read-only databases of data very convenient.

SFS can link to or read speech signals in the following file formats:

    binary files
    WAV format (RIFF format)
    VOC format
    AU format
    ILS format
    AIFF format
    HTK format
    common label file formats

SFS can write speech signals to files in the following file formats
(using *list programs for different data types):

    binary files
    WAV format (RIFF format)
    VOC format
    AU format
    AIFF format
    ILS format
    ESPS format
    HTK files (waveform, coefficient and annotations)
    common label file formats

SFS can also read and write many data sets from/to a textual
representation.

8. How do I obtain SFS?

The easiest way is to obtain an execute-only package of binaries.
These may be downloaded by FTP from

    ftp://pitch.phon.ucl.ac.uk/pub/sfs

Look for the files:

    msdos1/sfs3ddbn.zip        version 3.dd binaries (non DPMI)
    msdos2/sfs3ddbn.zip        version 3.dd binaries (DPMI)
    win32/sfs3ddbn.zip         version 3.dd binaries (WIN32)
    sunos/sfs3ddbn.tar.gz      version 3.dd binaries (SunOS 4.1.x)
    solaris/sfs3ddbn.tar.gz    version 3.dd binaries (Solaris 2.5)
    linux/sfs3ddbn.tar.gz      version 3.dd binaries (Linux)

For other Unix systems, you must configure and compile the sources
from the files:

    unix/sfs3dds1.tar.gz       version 3.dd sources part 1
    unix/sfs3dds2.tar.gz       version 3.dd sources part 2
    etc

The sources are in logical components; in general the higher numbered
components are the more esoteric. I suggest you just start with the
first two.

For MSDOS systems, look for the sources:

    msdos2/sfs3dds1.zip        version 3.dd sources part 1
    msdos2/sfs3dds2.zip        version 3.dd sources part 2
    etc

A set of demonstration files may be found in:

    demo/sfsdemo.zip           demo for MSDOS
    demo/sfsdemo.tar.gz        demo for Unix

9. What other software do I need to run SFS on DOS?

SFS is compiled using D J Delorie's port of GNU C, commonly known as
the djgpp compiler. This is a flat memory model 32-bit compiler that
runs in protected mode. Version 1 generates code that will run on
straight DOS only; version 2 generates code that will run in a DOS
box under Windows (3.1 or 95).

To launch protected mode programs, version 1 of djgpp supplies a
run-time environment called GO32.EXE. A recent version of this is
included in the SFS binary distribution. Later versions are available
with the djgpp package (which may be downloaded from the SimTel
archive in SimTel/vendors/djgpp). Version 2 of DJGPP uses DPMI
services directly without need for GO32.EXE. SFS runs under version 1
or version 2 of DJGPP.

To support SVGA graphics, SFS uses a graphics library called GRX
written by Csaba Biegl. To operate the graphics card, this library
uses a driver routine which must be compatible with your hardware.
Fortunately, most modern hardware supports the VESA standard for SVGA
modes. The GRX driver for VESA compatible cards is included in the
SFS binary distribution. If your card does not support VESA modes,
then you will need to get a different driver from the GRX
distribution. To operate in a DOS box under Windows 3.1 or Windows
95, you need to use version 2 of the GRX library.

10. What environment variables do I need to set?

SFS requires the user to set environment variables to allow it to
(i) find its home directory, (ii) identify the graphics display
device, (iii) identify the graphics printing device, (iv) identify
the digital-to-analogue converter device, and (v) identify the
analogue-to-digital converter device.

Variable  Example Settings  Meaning
--------  ----------------  -------
SFSBASE   /app/sfs          Installation directory is /app/sfs
GTERM     xterm             X-Windows
          svga-256          800x600x256 colour
          xvga-256          1024x768x256 colour
GPRINT    printer           PostScript printer on stdprn
          LPT1              PostScript printer on LPT1
          winprint          Windows printer
          eps               EPS file output
DAC       sun16             Sun DBRI 16-bit
          sb16              SoundBlaster 16-bit
          win32             Windows multimedia
ADC       sun16             Sun DBRI 16-bit
          sb16              SoundBlaster 16-bit
          win32             Windows multimedia

--------------------------------------------------------------

COMMON PROBLEMS

11. Why doesn't replay work?

Replay is the most difficult thing to get right in an installation.
If you have had replay working once, then the most common cause of
failure is that the DAC environment variable is not set. The 'replay'
program should report an error if it can't determine the replay
device.

Other suggestions:

  - The replay device is not supported (see SFSCONFG.h).
  - The replay device is not compiled in (see SFSCONFG.h).
  - Your machine doesn't have a working replay card. Surprisingly, a
    common problem. SFS needs to control DAC hardware on the machine
    the program is actually executing on.
  - The volume control is turned down, the speaker is not connected,
    or the signal level is too low.
  - For SoundBlaster, the BLASTER variable is not set, or an
    interrupt greater than 8 is being used.

The complete set of settings for DAC may be found in the manual page
for replay.

12. Why doesn't display work?

Mostly because the GTERM environment variable is not set. The
complete set of supported settings for GTERM may be found in the file
$(SFSBASE)/data/digmap. Graphics devices also need to be compiled in,
see SFSCONFG.h.

If you get a message saying that the output is being redirected into
a 'metafile' then this is because SFS cannot determine the type of
the graphics device. Try 'set GTERM=vga-16' for DOS and
'setenv GTERM xterm' on Unix.

Check that the digmap text file is in Unix format for Unix machines,
and in DOS format for DOS machines - formats have been confused in
the past.

13. Why doesn't printing work?

Mostly because the GPRINT environment variable is not set. Look in
the file $(SFSBASE)/data/digmap to find a list of supported settings.
Graphics devices also need to be compiled in, see SFSCONFG.h. Details
of how to set up printing under Unix may be found in the installation
notes.

14. What does 'labels file out of date' mean?

SFS uses a text file to convert processing histories into English
descriptions. By default this is $(SFSBASE)/data/labels. This file is
indexed to make it fast to access. When it is installed on a new
machine, its date may be updated and SFS thinks that the file has
been changed but not re-indexed. Solution: run the prolab program on
the labels file:

    prolab $(SFSBASE)/data/labels

The labels file is described in the User Manual. Check that this text
file is in Unix format for Unix machines, and in DOS format for DOS
machines - formats have been confused in the past.

----------------------------------------------------------------

INTRODUCTION TO ITEMS

15. What do all these '-i' switches mean?

An SFS file can contain many different data sets; it can contain
multiple speech signals, annotations, formant or fundamental
frequency data, etc. SFS uses this grouping of data to maintain a
'processing history', a record of the antecedents of each data set
(or 'item'). To refer to a particular piece of data within an SFS
file, every SFS program understands a common 'item numbering', and
the '-i' switches specify the item number to the program.

Item numbers are made up of two components: a major data type code
and a simple count code. The most common major types are listed
below:

Major type  Mnemonic  Description
----------  --------  -----------
     1      SP        Speech pressure waveform
     2      LX        Laryngograph waveform
     3      TX        Larynx period data
     4      FX        Fundamental frequency data
     5      AN        Annotations
     7      SY        Synthesizer control data
     9      DI        Grey-level display data
    10      CO        Spectral coefficients
    12      FM        Formant estimates
    16      TR        Parameter tracks

The count code simply records the index number of the data type in
the file. If there are two speech items then they will have count
codes of 1 and 2.

An item number, then, consists of a major type, a period and a count
code; e.g. 1.01 or 10.05, corresponding to the first speech item in
the file and the fifth coefficient item. Since numbers are hard to
remember, the major type numbers may also be replaced by the
two-letter mnemonics in the table above; e.g. sp.01 or co.05. Note
that the use of a leading zero for the count code is optional.

A given SFS program that processes a single data set needs to be able
to identify which data set from a given file to use as input. If
there is only one data set in the file of the appropriate type for
the program, then the program uses that automatically. If there is
more than one data set of the input type, the program will usually
select the last item of the appropriate type. However, if this is not
what you want, you need to tell the program which item to process
using the '-i' switch.
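
The numbering scheme described above can be sketched in a few lines
of Python. This is an illustration only, not how SFS programs
themselves read '-i' arguments: the names MAJOR_TYPES and parse_item
are invented for the example, the table holds only the major types
listed above, and only the full 'type.count' form is handled.

```python
# Major type numbers and mnemonics, copied from the table above.
# MAJOR_TYPES and parse_item are hypothetical names for this sketch.
MAJOR_TYPES = {
    "SP": 1, "LX": 2, "TX": 3, "FX": 4, "AN": 5,
    "SY": 7, "DI": 9, "CO": 10, "FM": 12, "TR": 16,
}


def parse_item(text):
    """Split an item number such as '1.01' or 'co.05' into (major, count)."""
    major_text, sep, count_text = text.partition(".")
    if not sep or not count_text.isdigit():
        raise ValueError("expected <type>.<count>, got %r" % text)
    if major_text.isdigit():
        major = int(major_text)
    else:
        # Mnemonics are looked up case-insensitively; an unknown
        # mnemonic raises KeyError.
        major = MAJOR_TYPES[major_text.upper()]
    # int() ignores a leading zero, so '01' and '1' give the same count.
    return major, int(count_text)


print(parse_item("1.01"))    # first speech item
print(parse_item("co.05"))   # fifth coefficient item
print(parse_item("sp.1") == parse_item("1.01"))
```

The last line prints True: the mnemonic and numeric forms, with or
without the leading zero, name the same item.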

Take as an example that you want to compare a piece of speech
low-pass filtered at 2000Hz with the same speech high-pass filtered
at 2000Hz. The file starts with a single speech item, numbered 1.01.
This is then processed by genfilt:

    genfilt -l 2000 file.sfs

which generates an item 1.02 in the file. However, the command

    genfilt -h 2000 file.sfs

will not generate the second filtered signal as you might have
wished. Genfilt in this instance will take as its input item 1.02,
which (we know) has been filtered already. Instead we need the
command:

    genfilt -i1.01 -h 2000 file.sfs

which processes the original signal as we required.

There is a shorthand for the first and the last items of a given
type. The first item in the file of a given type may be selected by
using an item number made up from the major type followed by a period
only; the last item may be selected by using the major type only.
Thus 'sp.' refers to the first speech item, 'sp' refers to the last.

-------------------------------------------------------------

HOW TO ...

16. How can I display/print spectrograms?

The main display program Es has the capability of calculating,
displaying and printing spectrograms as you work. To start up Es with
display of a speech waveform and a spectrogram, use:

    Es -isp -gsp file.sfs

Es has menu options to produce a hard-copy of the signal displayed on
the screen. The program sprint will also print spectrograms directly
to the printer. The programs espect and esform will display and print
spectrograms with spectral cross-sections.

17. How can I view part of a large file?

Use the '-s' and '-e' switches on Es to specify the initial starting
and ending times displayed. Since Es attempts to read an entire data
set into memory before displaying it, it is necessary to specify the
initial times for very large files. It is still possible to scroll
forwards and backwards in time, but impossible to zoom out to longer
times than initially specified.
For example:

    Es -s100 -e130 bigfile.sfs

18. How can I decide what to display?

Items may be selected for display using the '-i' and '-a' switches to
Es or Ds. But a simpler method is to use the tree display in Es - on
menu options FILE - TREE. Each box on the tree represents an item,
and you can select the items you want to display. You can jump
straight to the tree display with the '-t' option on Es. For example:

    Es -t manyitems.sfs

19. How can I annotate a signal file?

Use the '-l name' switch to Es. The string 'name' is a name given to
the set of annotations. SFS files may contain a number of annotation
sets and this name differentiates them. Es will then display an
annotation area at the bottom of the display, and have a new ANNO
menu button. To add an annotation (i) position the left cursor,
(ii) select ANNO, (iii) type in a string on the keyboard, (iv) key
RETURN. Look at the Es manual page for more details.

20. How can I get a fundamental frequency trace?

The programs fxac and fxcep provide autocorrelation and cepstral
methods for fundamental frequency estimation from speech signals.
They have a default set of parameters that work pretty well on a
clean signal.

21. How can I get an energy trace?

The program envelope provides a method for generating a TRACK item
from a speech signal.

22. How can I get a set of formant estimates?

The program fmanal provides a set of formant estimates from a speech
signal. This has a default set of parameters that work pretty well
for clean 10kHz sampled speech signals.

23. How can I filter the signal?

The program genfilt provides general-purpose low-pass, high-pass,
band-pass and band-stop filters using recursive digital filter
designs.

Low-pass at 100Hz:

    genfilt -l 100 file.sfs

High-pass at 2000Hz:

    genfilt -h 2000 file.sfs

Band-pass between 300 and 3500Hz:

    genfilt -h 300 -l 3500 file.sfs

Band-stop between 3000 and 4000Hz:

    genfilt -l 3000 -h 4000 file.sfs

24. How can I change the sampling rate?

The program resamp provides a general purpose interpolation/
decimation facility for changing sampling rates by small integer
ratios. For example:

    resamp -f 44100 file.sfs

25. How can I change the speed/pitch of the signal?

The program respeed provides a general purpose retiming facility for
speeding up or slowing down speech without changing the pitch. The
program repitch provides a special purpose method for changing speed
AND pitch, but requires a set of pitch epoch annotations (try program
pp if you don't have a Laryngograph).

26. How can I import data from some other source?

To import into an SFS file, the empty SFS file must be created
first - this is your opportunity to identify the speaker, source and
utterance to the system. Use the 'hed' program to create an empty SFS
file:

    hed newfile.sfs

and answer the questions, or

    hed -n newfile.sfs

for the truly lazy.

Speech signals may be imported from almost any format. The key is the
slink program, which creates a pointer in an SFS file indicating
where and how the data may be read in. The data itself is not copied
by slink into the SFS file; to do this use slink followed by scopy.
For example, to link to a binary file with 16-bit samples in natural
byte order at 20000 samples/sec:

    slink -isp -f20000 ipfile.dat opfile.sfs

Or to link to a monophonic .WAV file:

    slink -isp -tWAV ipfile.wav opfile.sfs

Look at the manual page for slink for all options and formats.

27. How can I export a signal or annotations, etc?

The program splist allows the export of signals in binary files and
other formats. The program sfs2wav creates Windows compatible .WAV
files and supports multiple channels. The program anlist creates text
representations of annotations. The programs sylist, colist, fmlist,
trlist, etc export other data sets. Refer to the manual pages for
these programs for details of export formats supported.
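
For batch imports, the two steps from question 26 (create an empty
file with hed, then link the signal with slink) could be assembled by
a small script. The sketch below is only an illustration: the
function name import_commands is invented, it uses nothing beyond the
hed and slink command lines quoted in question 26, and it assumes
that the file created by hed is the same file slink is then pointed
at. It builds the command lines but does not run them.

```python
def import_commands(source, sfs_file, sample_rate=None):
    """Return the hed and slink command lines from question 26.

    With sample_rate set, the source is treated as a raw 16-bit binary
    file at that rate; otherwise it is treated as a monophonic .WAV
    file. import_commands is a hypothetical helper, not part of SFS.
    """
    # 'hed -n' is the form the FAQ gives as the alternative to
    # answering hed's questions.
    create = ["hed", "-n", sfs_file]
    if sample_rate is None:
        link = ["slink", "-isp", "-tWAV", source, sfs_file]
    else:
        link = ["slink", "-isp", "-f%d" % sample_rate, source, sfs_file]
    return [create, link]


for command in import_commands("ipfile.wav", "opfile.sfs"):
    print(" ".join(command))
```

On a machine with SFS installed, each argument list could be passed
to subprocess.run(command, check=True) instead of being printed.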

--------------------------------------------------------------

This software is copyright University College London 1987-1997. No
part of the software may be sold, but copies may be made and the
software modified and distributed free of charge providing the
copyright of University College London continues to be demonstrated.

This software bears no warranty or guarantee of any kind.

UCL and Mark Huckvale are unable to support this software. While
bug-fixes are welcome, requests for help may be ignored.

Mark Huckvale
Phonetics and Linguistics
University College London
Gower Street
London WC1E 6BT

SFS@phon.ucl.ac.uk