Speech Processing by Computer
LAB 8
SIGNAL GENERATION FOR SYNTHESIS
In this lab session we will experiment with three different signal generation methods and compare the results. We will use methods for prosody manipulation to change the pitch and timing of natural speech. We will use a diphone concatenation system to produce synthetic copies of a natural utterance. We will use a formant synthesis by rule system to produce a second set of synthetic copies. We will then compare the different versions by listening to them.
Control file format
The prosody manipulation program repros, the MBROLA diphone synthesis system mbrsynth and the formant synthesis system phosynth all use the same control file format. Control files are plain text files that can be created with Windows Notepad. The format of this file is as follows:
each line describes one phonetic segment
a line has three parts separated by spaces:
(i) the name of the phone
(ii) the duration of the phone in milliseconds
(iii) the required Fx contour through the segment (this part may be left out, e.g. for silence)
the phone names are in SAMPA format:
Consonants | p, b, t, d, k, g, tS, dZ, f, v, T, D, s, z, S, Z, h, m, n, N, l, r, w, j, 5 (=dark-l) |
Vowels | i:, I, e, {, V, A:, Q, O:, U, u:, @, 3:, aI, OI, eI, @U, aU, e@, I@, U@ |
Silence | _ |
the Fx contour is specified as a list of pairs of numbers; each pair consists of
(i) the position within the segment, as a % of its duration, at which the Fx is to be set
(ii) the Fx value at that point in Hz.
Thus the following control file will say "Mark":
_ 100
m 125 0 125
A: 500 0 120 100 100
k 250
_ 100
Read this as: silence for 100ms; [m] lasts 125ms, starting at 125Hz; [A:] lasts 500ms, starting at 120Hz and ending at 100Hz; [k] lasts 250ms; end with silence of 100ms.
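To make the format concrete, here is a minimal Python sketch of a reader for such control files. It is not part of SFS, repros, mbrsynth or phosynth: the names Segment and parse_control_file are invented for illustration, and skipping lines that begin with ';' is an assumption borrowed from MBROLA-style comment lines rather than something this handout specifies.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    phone: str                 # SAMPA phone name, "_" for silence
    duration_ms: float         # segment duration in milliseconds
    fx_points: list = field(default_factory=list)  # (percent, Hz) pairs


def parse_control_file(text):
    """Parse control-file text into a list of Segment objects."""
    segments = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and (assumed) ';' comment lines.
        if not parts or parts[0].startswith(";"):
            continue
        if len(parts) < 2:
            raise ValueError(f"missing duration in line: {line!r}")
        phone, duration = parts[0], float(parts[1])
        values = [float(v) for v in parts[2:]]
        if len(values) % 2 != 0:
            raise ValueError(f"odd number of Fx values in line: {line!r}")
        # Pair up (percent of segment duration, Fx in Hz).
        points = list(zip(values[0::2], values[1::2]))
        segments.append(Segment(phone, duration, points))
    return segments


# The "Mark" example from the text above.
MARK = """\
_ 100
m 125 0 125
A: 500 0 120 100 100
k 250
_ 100
"""

segments = parse_control_file(MARK)
total_ms = sum(s.duration_ms for s in segments)
for s in segments:
    print(s.phone, s.duration_ms, s.fx_points)
print("total duration (ms):", total_ms)
```

Summing the durations is a quick sanity check: for "Mark" the five segments add up to 1075ms.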
1. Record a natural utterance
a. Design a short sentence (4/5 words max) with a mix of segments and an interesting prosody
b. Record Speech and Lx at 16,000 samples/sec
c. Generate Tx and Fx
d. Save to file SENT.SFS
2. Annotate natural utterance
a. add SAMPA annotations to SENT.SFS
b. remember to annotate silence as _
c. let Mark check your annotations for correctness
3. Generate .PHO control files
a. choose Tools/Annotations/Export/Export as MBROLA
b. save control file as NN.PHO
c. open NN.PHO in Notepad
d. change the durations of segments to the values specified in the duration table at the end of this handout.
e. save as SN.PHO
f. open NN.PHO
g. edit the pitch contour so that the only specifications left are 0 150 on the first segment and 100 100 on the last segment. This will cause a uniform fall in pitch from 150Hz to 100Hz over the utterance.
h. save as NS.PHO
i. open SN.PHO
j. change the pitch contour again as in step g
k. save as SS.PHO
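The pitch edit in steps g and j can also be scripted instead of done by hand in Notepad. The following Python sketch, assuming the control file layout described under "Control file format", drops every pitch specification and then puts 0 150 on the first segment and 100 100 on the last. flatten_pitch is a hypothetical helper written for this illustration, not an SFS tool.

```python
def flatten_pitch(text, start_hz=150, end_hz=100):
    """Return control-file text with a single linear fall in Fx.

    Every line keeps only its phone name and duration; the first
    segment then gets "0 <start_hz>" and the last "100 <end_hz>".
    """
    # Keep just the first two fields (phone, duration) of each line.
    rows = [line.split()[:2] for line in text.splitlines() if line.strip()]
    if not rows:
        return ""
    rows[0] += ["0", str(start_hz)]     # Fx at 0% of the first segment
    rows[-1] += ["100", str(end_hz)]    # Fx at 100% of the last segment
    return "\n".join(" ".join(row) for row in rows) + "\n"


# The "Mark" control file used as a stand-in for NN.PHO.
NN = """\
_ 100
m 125 0 125
A: 500 0 120 100 100
k 250
_ 100
"""

NS = flatten_pitch(NN)
print(NS)
```

Applying the same function to the contents of SN.PHO would give the text for SS.PHO. Reading and writing the actual files is left out so that the sketch does not overwrite anything by accident.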
4. Generate variants of natural utterance
a. select item 1.01 and choose Tools/Speech/Process/Prosody Change. Use the SN.PHO control file.
b. select item 1.01 and choose Tools/Speech/Process/Prosody Change. Use the NS.PHO control file.
c. select item 1.01 and choose Tools/Speech/Process/Prosody Change. Use the SS.PHO control file.
5. Generate diphone versions
a. choose Tools/Generate/MBROLA synthesis. Use the NN.PHO control file and the en1 database of diphones.
b. choose Tools/Generate/MBROLA synthesis. Use the SN.PHO control file and the en1 database of diphones.
c. choose Tools/Generate/MBROLA synthesis. Use the NS.PHO control file and the en1 database of diphones.
d. choose Tools/Generate/MBROLA synthesis. Use the SS.PHO control file and the en1 database of diphones.
6. Generate formant versions
a. choose Tools/Generate/Synthesis by rule. Use the NN.PHO control file.
b. choose Tools/Synthesis Data/Synthesize speech to make a speech signal.
c. choose Tools/Generate/Synthesis by rule. Use the SN.PHO control file.
d. choose Tools/Synthesis Data/Synthesize speech to make a speech signal.
e. choose Tools/Generate/Synthesis by rule. Use the NS.PHO control file.
f. choose Tools/Synthesis Data/Synthesize speech to make a speech signal.
g. choose Tools/Generate/Synthesis by rule. Use the SS.PHO control file.
h. choose Tools/Synthesis Data/Synthesize speech to make a speech signal.
7. Compare versions
a. you should have 12 different speech items:
1.01 | Natural Speech | Natural Durations | Natural Pitch |
1.02 | | Synthetic Durations | Natural Pitch |
1.03 | | Natural Durations | Synthetic Pitch |
1.04 | | Synthetic Durations | Synthetic Pitch |
1.05 | Diphone Synthesis | Natural Durations | Natural Pitch |
1.06 | | Synthetic Durations | Natural Pitch |
1.07 | | Natural Durations | Synthetic Pitch |
1.08 | | Synthetic Durations | Synthetic Pitch |
1.09 | Formant Synthesis | Natural Durations | Natural Pitch |
1.10 | | Synthetic Durations | Natural Pitch |
1.11 | | Natural Durations | Synthetic Pitch |
1.12 | | Synthetic Durations | Synthetic Pitch |
b. which is more important, natural durations or natural pitch? Compare items 1.02 & 1.03, 1.06 & 1.07, 1.10 & 1.11.
c. does good prosody compensate for poor voice quality? Compare items 1.05 & 1.04, 1.09 & 1.08.
Segment durations for the synthetic-duration control files (step 3d):
Phone | Duration (ms) |
& | 140 |
3: | 240 |
5 | 80 |
@ | 90 |
@U | 230 |
A: | 220 |
D | 100 |
I | 100 |
I@ | 210 |
N | 130 |
O: | 210 |
OI | 240 |
Q | 140 |
S | 180 |
T | 160 |
U | 100 |
U@ | 230 |
V | 155 |
Z | 70 |
_ | 100 |
aI | 220 |
aU | 240 |
b | 115 |
d | 75 |
dZ | 170 |
e | 125 |
e@ | 270 |
eI | 230 |
f | 130 |
g | 90 |
h | 160 |
i: | 140 |
j | 110 |
k | 140 |
l | 80 |
l~ | 80 |
m | 110 |
n | 130 |
p | 130 |
r | 80 |
s | 125 |
t | 130 |
tS | 210 |
u: | 155 |
v | 85 |
w | 80 |
z | 140 |
{ | 140 |
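The duration substitution in step 3d can be sketched in Python as a simple table lookup. This is an illustrative helper only: TABLE_MS and apply_table_durations are hypothetical names, and the values are copied from the table above. Because Fx positions are given as a percentage of segment duration, the pitch specifications can be carried over unchanged when the duration changes.

```python
# Fixed segment durations in ms, copied from the table above.
TABLE_MS = {
    "&": 140, "3:": 240, "5": 80, "@": 90, "@U": 230, "A:": 220,
    "D": 100, "I": 100, "I@": 210, "N": 130, "O:": 210, "OI": 240,
    "Q": 140, "S": 180, "T": 160, "U": 100, "U@": 230, "V": 155,
    "Z": 70, "_": 100, "aI": 220, "aU": 240, "b": 115, "d": 75,
    "dZ": 170, "e": 125, "e@": 270, "eI": 230, "f": 130, "g": 90,
    "h": 160, "i:": 140, "j": 110, "k": 140, "l": 80, "l~": 80,
    "m": 110, "n": 130, "p": 130, "r": 80, "s": 125, "t": 130,
    "tS": 210, "u:": 155, "v": 85, "w": 80, "z": 140, "{": 140,
}


def apply_table_durations(text, table=TABLE_MS):
    """Replace each segment's duration with the table value for its phone."""
    out = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        phone = parts[0]
        if phone not in table:
            raise KeyError(f"no table duration for phone {phone!r}")
        # Keep the phone and any Fx pairs; swap only the duration field.
        out.append(" ".join([phone, str(table[phone])] + parts[2:]))
    return "\n".join(out) + "\n"


# The "Mark" control file used as a stand-in for NN.PHO.
NN = """\
_ 100
m 125 0 125
A: 500 0 120 100 100
k 250
_ 100
"""

SN = apply_table_durations(NN)
print(SN)
```

Raising an error for an unknown phone, rather than silently keeping the natural duration, is a deliberate choice here so that annotation typos show up early.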