Speech Processing by Computer
LAB 5
FORMANT AND FUNDAMENTAL FREQUENCY ANALYSIS
This lab session compares two methods of formant analysis and two methods of fundamental frequency analysis, applied to a continuous utterance. For formant frequency estimation, fixed-frame LPC analysis is compared with pitch-synchronous LPC analysis. For fundamental frequency analysis, the time-domain autocorrelation method is compared with the cepstrum method.
1. Time-domain fundamental frequency analysis: autocorrelation
(i) Acquire an utterance of about a dozen words at 20,000 samples/second, with a simultaneous Laryngograph (Lx) input. The Laryngograph signal will provide our 'reference' fundamental frequency trace.
(ii) Use the 'HQtx' program to convert Lx to Tx (larynx excitation periods), and the 'fx' program to convert Tx to Fx (fundamental frequency) for display.
(iii) Use the 'fxac' program to estimate a fundamental frequency trace by the autocorrelation method, and compare it with the reference trace. In which areas does the autocorrelation method fail? Use the option to store the autocorrelation function for display, and study the areas where the method failed.
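The lab programs are not documented here, so the following Python/NumPy sketch only illustrates the ideas behind steps (ii) and (iii) under stated assumptions. `lx_to_tx` marks a closure at each sharp positive-going slope of the Lx waveform (an assumption about signal polarity, and probably much cruder than 'HQtx'), `tx_to_fx` turns the resulting periods into a fixed-rate Fx trace, and `fx_autocorr` is a basic frame-by-frame autocorrelation estimator. All function names, frame sizes, search ranges and the voicing threshold are hypothetical choices, not the settings used by 'fxac'.

```python
import numpy as np


def lx_to_tx(lx, fs, min_period_s=0.002, rel_threshold=0.3):
    """Return sample indices of assumed vocal-fold closures in an Lx waveform.

    Assumption: each closure appears as a sharp positive-going slope in Lx.
    Negate lx first if your recording has the opposite polarity.
    """
    slope = np.diff(lx)
    thr = rel_threshold * slope.max()
    mid = slope[1:-1]
    # local maxima of the slope that exceed the threshold
    cand = np.where((mid > thr) & (mid >= slope[:-2]) & (mid > slope[2:]))[0] + 1
    epochs = []
    for c in cand:  # enforce a minimum spacing between successive closures
        if not epochs or c - epochs[-1] >= min_period_s * fs:
            epochs.append(int(c))
    return np.array(epochs, dtype=int)


def tx_to_fx(epochs, fs, step_s=0.01, max_period_s=0.02):
    """Sample 1/period on a fixed grid; 0 where no larynx cycle covers the time."""
    periods = np.diff(epochs) / fs            # Tx as periods in seconds
    starts = epochs[:-1] / fs
    times = np.arange(0.0, epochs[-1] / fs, step_s)
    idx = np.searchsorted(starts, times, side="right") - 1
    inside = idx >= 0
    valid = np.zeros(len(times), dtype=bool)
    valid[inside] = periods[idx[inside]] <= max_period_s  # long gaps = unvoiced
    fx = np.zeros(len(times))
    fx[valid] = 1.0 / periods[idx[valid]]
    return times, fx


def fx_autocorr(x, fs, frame_s=0.04, step_s=0.01, fmin=60.0, fmax=400.0, voicing=0.3):
    """Frame-by-frame F0 from the peak of the short-time autocorrelation."""
    n = int(round(frame_s * fs))
    step = int(round(step_s * fs))
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    win = np.hanning(n)
    times, f0 = [], []
    for start in range(0, len(x) - n + 1, step):
        frame = x[start:start + n] * win
        spec = np.fft.rfft(frame, 2 * n)      # zero-pad to avoid circular wrap-around
        r = np.fft.irfft(np.abs(spec) ** 2, 2 * n)[:n]
        if r[0] <= 0.0:                       # silent frame
            f0.append(0.0)
        else:
            lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
            # voiced only if the normalised autocorrelation peak is high enough
            f0.append(fs / lag if r[lag] / r[0] > voicing else 0.0)
        times.append((start + n / 2) / fs)
    return np.array(times), np.array(f0)
```

To compare the two traces, both can be interpolated onto a common time grid (for example with `np.interp`); octave errors then show up as frame-by-frame ratios near 2 or 0.5.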
2. Frequency-domain fundamental frequency analysis: cepstrum
(i) Use the same utterance as in section 1.
(ii) Use the 'fxcep' program to estimate a fundamental frequency trace using the cepstrum method, and compare it with the reference trace. In which areas does the cepstrum method fail? Use the option to store the cepstrum function for display, and study the areas where the method failed.
(iii) Compare the results of the two
methods. Which appears to be superior
for your utterance?
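A comparable sketch of cepstral pitch estimation follows, again with hypothetical names and parameters rather than those of 'fxcep'. Each Hamming-windowed frame is turned into a log-magnitude spectrum, an inverse FFT of that gives the real cepstrum, and the largest cepstral peak inside the allowed quefrency range (measured in samples) is taken as the pitch period. The voicing threshold of 0.1 is an arbitrary assumption.

```python
import numpy as np


def fx_cepstrum(x, fs, frame_s=0.04, step_s=0.01, fmin=60.0, fmax=400.0,
                threshold=0.1, nfft=4096):
    """Frame-by-frame F0 from the largest real-cepstrum peak in the pitch range.

    nfft should be well above both the frame length and 2 * fs / fmin.
    """
    n = int(round(frame_s * fs))
    step = int(round(step_s * fs))
    q_min, q_max = int(fs / fmax), int(fs / fmin)   # quefrency limits in samples
    win = np.hamming(n)
    times, f0 = [], []
    for start in range(0, len(x) - n + 1, step):
        frame = x[start:start + n] * win
        # small floor keeps log() finite for silent frames
        log_mag = np.log(np.abs(np.fft.rfft(frame, nfft)) + 1e-10)
        ceps = np.fft.irfft(log_mag, nfft)           # index = quefrency in samples
        q = q_min + int(np.argmax(ceps[q_min:q_max + 1]))
        f0.append(fs / q if ceps[q] > threshold else 0.0)
        times.append((start + n / 2) / fs)
    return np.array(times), np.array(f0)
```

Widening the search range (a lower `fmin`) admits quefrencies at twice the true period, which can make period-doubling errors more likely.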
3. Fixed-frame formant analysis
(i) Use the same utterance as in section 1.
(ii) Downsample the utterance to 10,000 samples/second using the 'resamp' program.
(iii) Run the 'fmanal' program with a window size of 20 ms and a step size of 10 ms to obtain a set of formant estimates. Study the results. Where does the program produce good estimates, and where does it fail? When it fails, what are the common consequences?
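The two steps of this section can be sketched as follows, assuming nothing about how 'resamp' or 'fmanal' work internally. `downsample_by_2` low-pass filters with a Hamming-windowed sinc FIR and keeps every second sample. The formant part uses autocorrelation-method LPC (solving the Toeplitz normal equations directly), with pre-emphasis of 0.97, a Hamming window, an order of fs/1000 + 2, and candidate formants taken from the roots of A(z); the 90 Hz and 400 Hz limits used to discard roots are hypothetical choices.

```python
import numpy as np


def downsample_by_2(x, numtaps=101, cutoff=0.225):
    """Halve the sampling rate: windowed-sinc low-pass, then keep every 2nd sample.

    cutoff is in cycles per input sample; 0.225 is 4.5 kHz at 20 kHz,
    just below the new Nyquist frequency of 5 kHz.
    """
    m = np.arange(numtaps) - (numtaps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * m) * np.hamming(numtaps)
    h /= h.sum()                                    # unity gain at DC
    return np.convolve(x, h, mode="same")[::2]      # odd numtaps keeps it centred


def lpc(frame, order):
    """Coefficients [1, a1, ..., ap] of A(z) by the autocorrelation method."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[idx], -r[1:])             # Toeplitz normal equations
    return np.concatenate(([1.0], a))


def lpc_formants(frame, fs, order, min_freq=90.0, max_bw=400.0):
    """Candidate formant frequencies (Hz) from the roots of A(z)."""
    if not np.any(frame):
        return np.array([])
    roots = np.roots(lpc(frame, order))
    roots = roots[np.imag(roots) > 0]               # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi       # bandwidth estimate in Hz
    return np.sort(freqs[(freqs > min_freq) & (bws < max_bw)])


def fixed_frame_formants(x, fs, win_s=0.020, step_s=0.010, preemph=0.97):
    """One array of formant candidates per 20 ms frame, frames 10 ms apart."""
    order = int(fs / 1000) + 2                      # rule of thumb: 12 at 10 kHz
    n, step = int(round(win_s * fs)), int(round(step_s * fs))
    xp = np.append(x[0], x[1:] - preemph * x[:-1])  # pre-emphasis
    win = np.hamming(n)
    return [lpc_formants(xp[s:s + n] * win, fs, order)
            for s in range(0, len(xp) - n + 1, step)]
```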
4. Pitch-synchronous formant analysis
(i) Use the same downsampled utterance as in section 3.
(ii) Run the 'fmanal' program with the '-t0' switch to force pitch-synchronous analysis. Compare the two sets of formant estimates. Which appears to be superior for your utterance? In what circumstances are the pitch-synchronous estimates better or worse than the fixed-frame estimates?
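Pitch-synchronous analysis can be sketched by running the same LPC root-finding on one analysis frame per larynx period instead of on fixed 20 ms frames. The framing here (one frame from each assumed closure to the next, with a rectangular window) is an illustrative assumption; what '-t0' actually selects is not stated in this handout. The closure indices must be expressed at the downsampled rate, for example by halving indices obtained at 20,000 samples/second.

```python
import numpy as np


def lpc(frame, order):
    """Coefficients [1, a1, ..., ap] of A(z) by the autocorrelation method."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    return np.concatenate(([1.0], np.linalg.solve(r[idx], -r[1:])))


def lpc_formants(frame, fs, order, min_freq=90.0, max_bw=400.0):
    """Candidate formant frequencies (Hz) from the roots of A(z)."""
    if not np.any(frame):
        return np.array([])
    roots = np.roots(lpc(frame, order))
    roots = roots[np.imag(roots) > 0]               # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi
    return np.sort(freqs[(freqs > min_freq) & (bws < max_bw)])


def pitch_synchronous_formants(x, fs, epochs, preemph=0.97, max_period_s=0.02):
    """(time, formant candidates) for each larynx period between closures.

    epochs: closure sample indices at the same sampling rate as x
    (a hypothetical input, e.g. derived from the Tx data).
    """
    order = int(fs / 1000) + 2
    xp = np.append(x[0], x[1:] - preemph * x[:-1])  # pre-emphasis
    results = []
    for start, end in zip(epochs[:-1], epochs[1:]):
        if (end - start) / fs > max_period_s or end - start <= order:
            continue                                # long gaps treated as unvoiced
        # rectangular window: the frame already starts at a closure
        results.append((start / fs, lpc_formants(xp[start:end], fs, order)))
    return results
```

Because each frame starts at a closure, the windowed segment contains a single excitation, which is one reason pitch-synchronous estimates can behave differently from fixed-frame ones at high fundamental frequencies.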
5. Formant tracking
(i) Use the 'fmtrack' program to track the formant estimates produced in sections 3 and 4, in combination with the best fundamental frequency trace. Compare the two sets of tracked formants: which seems superior?
(ii) Resynthesize the utterance from the better set of formant tracks using the 'soft' (software formant synthesizer) program. How does the resynthesized utterance compare with the original?
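Both steps of this section can be illustrated with a deliberately simple sketch; 'fmtrack' and 'soft' are very likely more elaborate. `track_formants` is a hypothetical greedy tracker: in each voiced frame (Fx > 0) it chooses the ordered set of three candidates that stays closest to the previous frame's values, with a weaker pull towards assumed neutral-vowel targets of 500, 1500 and 2500 Hz. `synthesize` drives a cascade of three Klatt-style second-order resonators with an impulse train at Fx; the fixed bandwidths and the absence of a noise source for unvoiced frames are simplifying assumptions.

```python
import numpy as np
from itertools import combinations

NOMINAL = np.array([500.0, 1500.0, 2500.0])   # hypothetical neutral-vowel F1-F3


def track_formants(candidates, fx, nominal=NOMINAL, weight=0.5):
    """Greedy tracker: pick the ordered F1-F3 choice closest to the last frame.

    candidates: one array of formant candidates (Hz) per frame.
    fx: fundamental frequency per frame; frames with fx <= 0 are left as NaN.
    """
    tracks = np.full((len(candidates), len(nominal)), np.nan)
    prev = nominal.copy()
    for t, (cands, f0) in enumerate(zip(candidates, fx)):
        cands = np.sort(np.asarray(cands, dtype=float))
        if f0 <= 0 or len(cands) < len(nominal):
            continue
        best, best_cost = None, np.inf
        for combo in combinations(cands, len(nominal)):  # ascending tuples
            f = np.array(combo)
            cost = np.abs(f - prev).sum() + weight * np.abs(f - nominal).sum()
            if cost < best_cost:
                best, best_cost = f, cost
        tracks[t] = best
        prev = best
    return tracks


def klatt_resonator(f, bw, fs):
    """(A, B, C) for y[n] = A x[n] + B y[n-1] + C y[n-2], unity gain at DC."""
    c = -np.exp(-2 * np.pi * bw / fs)
    b = 2 * np.exp(-np.pi * bw / fs) * np.cos(2 * np.pi * f / fs)
    return 1.0 - b - c, b, c


def synthesize(fx, tracks, fs=10000, step_s=0.010, bws=(80.0, 100.0, 120.0)):
    """Impulse train at Fx through cascaded resonators; silence when unvoiced."""
    hop = int(round(step_s * fs))
    out = np.zeros(len(fx) * hop)
    state = np.zeros((len(bws), 2))               # y[n-1], y[n-2] per resonator
    formants, phase = NOMINAL.copy(), 0.0
    for t, f0 in enumerate(fx):
        voiced = f0 > 0 and not np.any(np.isnan(tracks[t]))
        if voiced:
            formants = np.asarray(tracks[t], dtype=float)
        coeffs = [klatt_resonator(f, bw, fs) for f, bw in zip(formants, bws)]
        for i in range(t * hop, (t + 1) * hop):
            e = 0.0
            if voiced:
                phase += f0 / fs
                if phase >= 1.0:                  # one pulse per period
                    phase -= 1.0
                    e = 1.0
            y = e
            for k, (a, b, c) in enumerate(coeffs):
                y_new = a * y + b * state[k, 0] + c * state[k, 1]
                state[k, 1], state[k, 0] = state[k, 0], y_new
                y = y_new
            out[i] = y
    return out
```

A greedy frame-by-frame choice cannot recover once it locks onto a spurious candidate; a dynamic-programming search over whole tracks is the usual way to avoid this, at the cost of more computation.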