Speech Processing by Computer

LAB 5

FORMANT AND FUNDAMENTAL FREQUENCY ANALYSIS

This lab session compares two methods of formant analysis and two methods of fundamental frequency analysis on a continuous utterance. For formant frequency estimation, fixed-frame LPC analysis is compared with pitch-synchronous LPC analysis. For fundamental frequency estimation, the time-domain autocorrelation method is compared with the frequency-domain cepstrum method.

1. Time-domain fundamental frequency analysis: autocorrelation

(i) Acquire an utterance of about a dozen words at 20000 samples/second with a simultaneous Laryngograph (Lx) input. The Lx signal will be used to generate our 'reference' fundamental frequency trace.

(ii) Use the 'HQtx' program to convert Lx to Tx, then the 'fx' program to convert Tx to Fx for display.

(iii) Use the 'fxac' program to estimate a fundamental frequency trace by the autocorrelation method. Compare the two traces. In which areas does the autocorrelation method fail? Use the option to store the autocorrelation function for display and study the areas where the method failed.
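
To make the autocorrelation method concrete, the following is a minimal sketch of a per-frame estimator. It is not the 'fxac' implementation; the function name, the 50-400 Hz search range and the voicing threshold of 0.3 are all assumptions chosen for illustration. The frame is a synthetic two-harmonic signal at 125 Hz, not real speech.

```python
import math

def autocorr_f0(frame, fs, fmin=50.0, fmax=400.0, threshold=0.3):
    """Estimate F0 in Hz for one frame; return 0.0 if judged unvoiced.

    fmin, fmax and threshold are illustrative assumptions.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]            # remove any DC offset
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return 0.0                           # silent frame
    lag_min = int(fs / fmax)                 # shortest period searched
    lag_max = min(int(fs / fmin), n - 1)     # longest period searched
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalised by the zero-lag energy
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_r < threshold:
        return 0.0                           # no convincing periodicity
    return fs / best_lag

# Synthetic 40 ms frame at 20000 samples/second with a 125 Hz fundamental
fs = 20000
frame = [math.sin(2 * math.pi * 125.0 * t / fs)
         + 0.5 * math.sin(2 * math.pi * 250.0 * t / fs)
         for t in range(800)]
estimate = autocorr_f0(frame, fs)
print("estimated F0 (Hz):", estimate)
```

When studying failures, note that a peak at twice the true lag gives an estimate one octave too low, and that the simple threshold here is only one of many possible voicing decisions.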

2. Frequency-domain fundamental frequency analysis: cepstrum

(i) Use the same utterance as 1.

(ii) Use the 'fxcep' program to estimate a fundamental frequency trace using the cepstrum method. Compare the trace to the reference trace. In which areas does the cepstrum method fail? Use the option to store the cepstrum function for display and study the areas where the method failed.

(iii) Compare the results of the two methods. Which appears to be superior for your utterance?
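
The cepstrum method can be sketched in the same way: take the log magnitude spectrum of a windowed frame, transform it again, and look for a peak at the quefrency of the pitch period. This is an illustrative sketch, not the 'fxcep' implementation. The helper names, the 60-400 Hz search range and the test signal (an impulse train through a one-pole filter) are assumptions, and no voicing decision is made.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def cepstrum_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Return (F0 estimate in Hz, real cepstrum) for one frame."""
    n = len(frame)
    # Hamming window to reduce spectral leakage
    windowed = [v * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, v in enumerate(frame)]
    spectrum = fft(windowed)
    # Small constant guards against log(0)
    log_mag = [math.log(abs(v) + 1e-12) for v in spectrum]
    # log_mag is real and even, so a forward FFT scaled by 1/n
    # gives the same result as the inverse transform.
    cepstrum = [v.real / n for v in fft(log_mag)]
    q_min = int(fs / fmax)                   # shortest period, in samples
    q_max = int(fs / fmin)                   # longest period, in samples
    peak = max(range(q_min, q_max + 1), key=lambda q: cepstrum[q])
    return fs / peak, cepstrum

# Synthetic frame: impulses every 160 samples (125 Hz at 20000 samples/second)
# passed through a one-pole filter to give a crude spectral tilt.
fs = 20000
frame = []
y = 0.0
for t in range(1024):
    x = 1.0 if t % 160 == 0 else 0.0
    y = x + 0.9 * y
    frame.append(y)
f0, ceps = cepstrum_f0(frame, fs)
print("estimated F0 (Hz):", f0)
```

The returned cepstrum corresponds to the stored function mentioned in 2(ii): plotting it for frames where the estimate is wrong shows whether the pitch peak was weak or a competing peak was chosen.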

3. Fixed-frame formant analysis

(i) Use the same utterance as 1.

(ii) Use the 'resamp' program to downsample the utterance to 10000 samples/second.

(iii) Run the 'fmanal' program with a window size of 20ms and a step size of 10ms to get a set of formant estimates. Study the results. Where does the program produce good estimates and where does it fail? When it does fail, what are the common consequences?
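
As background to the fixed-frame analysis, the sketch below shows one common way of getting formant estimates from a frame: autocorrelation LPC solved with the Levinson-Durbin recursion, followed by peak picking on the LPC spectral envelope. It is not the 'fmanal' algorithm, which may differ (for example by solving for polynomial roots). All function names are hypothetical. The test frame is the impulse response of a known two-resonance filter (500 Hz and 1500 Hz, 150 Hz bandwidths), so the expected answer is known; for real speech a Hamming window and a higher order (a common rule of thumb is about 12 at 10000 samples/second) would be more usual.

```python
import cmath
import math

def autocorrelation(x, max_lag):
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def levinson(r, order):
    """Solve the LPC normal equations; returns [1, a1, ..., ap]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                       # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)                 # updated prediction error
    return a

def lpc(frame, order, window=True):
    n = len(frame)
    if window:                               # Hamming window
        frame = [v * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                 for i, v in enumerate(frame)]
    return levinson(autocorrelation(frame, order), order)

def envelope_peaks(a, fs, n_points=512):
    """Frequencies (Hz) of local maxima of the envelope 1/|A(e^jw)|."""
    mags = []
    for i in range(n_points):
        w = math.pi * i / n_points
        denom = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        mags.append(1.0 / abs(denom))
    step = fs / 2.0 / n_points
    return [i * step for i in range(1, n_points - 1)
            if mags[i - 1] < mags[i] >= mags[i + 1]]

def resonance(freq, bw, fs):
    """Denominator polynomial of a two-pole resonance."""
    r = math.exp(-math.pi * bw / fs)
    return [1.0, -2.0 * r * math.cos(2.0 * math.pi * freq / fs), r * r]

def convolve(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pv in enumerate(p):
        for j, qv in enumerate(q):
            out[i + j] += pv * qv
    return out

# 20 ms synthetic frame at 10000 samples/second with known resonances
fs = 10000
true_a = convolve(resonance(500.0, 150.0, fs), resonance(1500.0, 150.0, fs))
frame = []
for n in range(200):
    x = 1.0 if n == 0 else 0.0
    y = x - sum(true_a[k] * frame[n - k] for k in range(1, 5) if n - k >= 0)
    frame.append(y)
est_a = lpc(frame, order=4, window=False)
formants = envelope_peaks(est_a, fs)
print("formant estimates (Hz):", formants)
```

Peak picking misses formants that merge into a single envelope peak, which is one of the failure patterns worth looking for in the 'fmanal' output.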

4. Pitch-synchronous formant analysis

(i) Use the same downsampled utterance as 3.

(ii) Run the 'fmanal' program with the '-t0' switch to force pitch-synchronous analysis. Compare the two sets of formant estimates. Which appears to be superior for your utterance? In what circumstances are the pitch-synchronous estimates better or worse than the fixed-frame estimates?
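
The difference between the two analyses lies in how the frames are positioned. The sketch below contrasts the two framing schemes under stated assumptions: pitch marks are taken to be sample indices of successive period boundaries (as might be derived from Tx), and each pitch-synchronous frame spans one period. How 'fmanal' actually places its frames with '-t0' is not specified here, and the function names are hypothetical.

```python
def fixed_frames(n_samples, fs, window_ms=20.0, step_ms=10.0):
    """(start, end) sample ranges for fixed-length overlapping frames."""
    win = int(round(fs * window_ms / 1000.0))
    step = int(round(fs * step_ms / 1000.0))
    return [(s, s + win) for s in range(0, n_samples - win + 1, step)]

def pitch_synchronous_frames(epochs, periods_per_frame=1):
    """(start, end) ranges aligned to pitch marks (assumed sample indices)."""
    return [(epochs[i], epochs[i + periods_per_frame])
            for i in range(len(epochs) - periods_per_frame)]

# 100 ms of signal at 10000 samples/second with a steady 125 Hz pitch
fs = 10000
epochs = list(range(0, 1000, 80))            # one mark every 80 samples
fixed = fixed_frames(1000, fs)
synchronous = pitch_synchronous_frames(epochs)
print(len(fixed), "fixed frames,", len(synchronous), "pitch-synchronous frames")
```

A fixed 20 ms frame covers a varying number of pitch periods at arbitrary phase, whereas a pitch-synchronous frame always starts at a period boundary but exists only where pitch marks are available, that is, in voiced regions.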

5. Formant tracking

(i) Track the formant estimates produced in 3. and 4. using the 'fmtrack' program in combination with the best fundamental frequency trace. Compare the two sets of tracked formants. Which seems superior?

(ii) Resynthesize the utterance from the better formant tracks using the 'soft' (software formant synthesizer) program. How does it compare with the original?
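
To show what resynthesis from formant tracks involves, here is a minimal sketch of a cascade formant synthesizer built from second-order digital resonators. It is not the 'soft' program: the function names, the fixed bandwidths, the unit-impulse voiced source and the treatment of unvoiced frames as silence are all simplifying assumptions.

```python
import math

def resonator_coeffs(freq, bandwidth, fs):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2], unity gain at DC."""
    c = -math.exp(-2.0 * math.pi * bandwidth / fs)
    b = 2.0 * math.exp(-math.pi * bandwidth / fs) * math.cos(2.0 * math.pi * freq / fs)
    a = 1.0 - b - c
    return a, b, c

def synthesize(f0_track, formant_track, bandwidths, fs, frame_len):
    """Synthesize samples from per-frame F0 (Hz, 0 = unvoiced) and formants (Hz)."""
    state = [[0.0, 0.0] for _ in bandwidths]     # y[n-1], y[n-2] per resonator
    out = []
    next_pulse = 0.0
    for f0, formants in zip(f0_track, formant_track):
        coeffs = [resonator_coeffs(f, bw, fs)
                  for f, bw in zip(formants, bandwidths)]
        for _ in range(frame_len):
            t = len(out)
            if f0 > 0 and t >= next_pulse:
                x = 1.0                          # one impulse per pitch period
                next_pulse = t + fs / f0
            else:
                x = 0.0                          # assumption: unvoiced = silence
            # Pass the source through the resonators in cascade
            for (a, b, c), s in zip(coeffs, state):
                y = a * x + b * s[0] + c * s[1]
                s[1], s[0] = s[0], y
                x = y
            out.append(x)
    return out

# Five 10 ms frames at 10000 samples/second with steady tracks
fs = 10000
bandwidths = [80.0, 100.0, 120.0]                # assumed fixed bandwidths (Hz)
wave = synthesize([125.0] * 5, [[500.0, 1500.0, 2500.0]] * 5,
                  bandwidths, fs, frame_len=100)
print(len(wave), "samples synthesized")
```

Because only the formant frequencies and fundamental frequency drive the output, differences from the original utterance point to what those tracks fail to capture, such as frication, bandwidth variation and voice quality.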