We’re going to be working this week with the same data set we worked with last week:
load("/shared/groups/jrole001/pals0047/data/aurora_data.Rda")
Let’s kick things off with an easy warm-up exercise. Start by plotting the fundamental frequency values over time:
plot(time, f0)
Using the techniques from this week’s lecture, determine the time point of the f0 peak (i.e. maximum f0), in milliseconds.
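One possible approach is sketched below, not necessarily the intended solution. So that the block runs on its own, f0 is rebuilt from the values printed for s1, s2, and s3 further down this worksheet, and time is assumed to run from 0 to 0.49 s in 0.01 s steps. Both are reconstructions; with the real data loaded you would skip those two assignments.

```r
# Reconstructed for illustration (assumption): the f0 values printed in this
# worksheet, and a time axis of 50 samples from 0 to 0.49 s in 0.01 s steps.
f0 <- c(129, 130, 129, 129, 129, 131, 132, 132, 131,
        129, 127, 126, 125, 124, 126, 129, 131, 132, 133, 133, 133, 135, 136,
        138, 141, 143, 145, 146, 146,
        146, 145, 143, 140, 138, 134, 129, 125, 117, 110, 103, 98, 92, 90, 87,
        88, 87, 86, 85, 85, 81)
time <- (0:49) / 100

peak_index <- which.max(f0)                # index of the FIRST occurrence of the maximum
peak_ms <- round(time[peak_index] * 1000)  # convert seconds to milliseconds
peak_ms
```

Note that which.max() returns only the first index when the maximum value occurs more than once, which matters here because the highest f0 value is repeated across neighbouring samples.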
Now that we’re warmed up, we’re going to introduce a concept that we’ll explore in detail next week: a data frame.
The following code will create a data frame that contains information about the individual phones (NB: @ is the SAMPA convention for the schwa vowel), the syllable number within the three-syllable word, and the time points (in milliseconds) of the start and end of each phone segment:
segments <- data.frame(
phone = c("o","r","o","r","@"),
syllable = c(1,2,2,3,3),
start = c(0,84,166,285,384),
end = c(84,166,285,384,490)
)
Now let’s see what this newly created data frame looks like. As we’ll see next week, this data frame is formed of 5 rows and 4 columns:
segments
## phone syllable start end
## 1 o 1 0 84
## 2 r 2 84 166
## 3 o 2 166 285
## 4 r 3 285 384
## 5 @ 3 384 490
Time to think: why are most of the time points exactly the same in the “start” and “end” columns, but offset by exactly one row? What is the relationship between these two columns?
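If you would like to check your thinking in code, here is a small sketch. It uses dim(), nrow(), and the $ column accessor, which are standard R but belong to next week’s material, so treat this as a preview rather than something you are expected to know yet.

```r
segments <- data.frame(
  phone    = c("o", "r", "o", "r", "@"),
  syllable = c(1, 2, 2, 3, 3),
  start    = c(0, 84, 166, 285, 384),
  end      = c(84, 166, 285, 384, 490)
)

dim(segments)   # number of rows, then number of columns

n <- nrow(segments)
# Compare each phone's end time with the start time of the phone that follows it:
# drop the last "end" value and the first "start" value so the rows line up
segments$end[-n] == segments$start[-1]

# Duration of each phone in milliseconds
segments$end - segments$start
```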
Given your answer to Exercise 1, determine the answers to the following questions:
We saw in this week’s lecture how we can use Boolean expressions and operators directly to slice and extract data that meet the relevant Boolean conditions.
Using this technique, create three different variables that contain the F0 values associated with the three syllables of the word. Name your three variables s1, s2, and s3.
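# The segment boundaries are given in milliseconds, but time is in seconds (e.g. 84 ms = 0.084 s)
s1 <- f0[time >= 0 & time < 0.084]
s2 <- f0[time >= 0.084 & time < 0.285]
s3 <- f0[time >= 0.285 & time <= 0.490]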
If you did the slicing correctly, your three variables should contain the following data (NB: the ; separator allows you to place different commands on the same line of code):
s1; s2; s3
## [1] 129 130 129 129 129 131 132 132 131
## [1] 129 127 126 125 124 126 129 131 132 133 133 133 135 136 138 141 143 145 146
## [20] 146
## [1] 146 145 143 140 138 134 129 125 117 110 103 98 92 90 87 88 87 86 85
## [20] 85 81
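The boundaries used for the slicing do not have to be typed in by hand. The sketch below reads them from the segments data frame instead. The helper name slice_syllable() is made up for this example, and f0 and time are reconstructed from the printed output and the 0.01 s sampling step (an assumption, so that the block runs without the data file).

```r
# Reconstructed for illustration (assumption): printed f0 values, 0.01 s time steps
f0 <- c(129, 130, 129, 129, 129, 131, 132, 132, 131,
        129, 127, 126, 125, 124, 126, 129, 131, 132, 133, 133, 133, 135, 136,
        138, 141, 143, 145, 146, 146,
        146, 145, 143, 140, 138, 134, 129, 125, 117, 110, 103, 98, 92, 90, 87,
        88, 87, 86, 85, 85, 81)
time <- (0:49) / 100

segments <- data.frame(
  phone    = c("o", "r", "o", "r", "@"),
  syllable = c(1, 2, 2, 3, 3),
  start    = c(0, 84, 166, 285, 384),
  end      = c(84, 166, 285, 384, 490)
)

# Hypothetical helper: return the f0 samples belonging to one syllable
slice_syllable <- function(syl) {
  rows <- segments$syllable == syl
  t_start <- min(segments$start[rows]) / 1000   # ms -> s
  t_end   <- max(segments$end[rows]) / 1000     # ms -> s
  if (syl == max(segments$syllable)) {
    # last syllable: include the final sample at the right edge
    f0[time >= t_start & time <= t_end]
  } else {
    # other syllables: exclude the right edge so no sample is counted twice
    f0[time >= t_start & time < t_end]
  }
}

s1 <- slice_syllable(1)
s2 <- slice_syllable(2)
s3 <- slice_syllable(3)
```

The right edge is treated differently for the last syllable for the same reason the hand-written version uses <= only for s3: every sample should end up in exactly one syllable.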
After creating the three separate variables, you will be able to answer the following questions:
Hint #1: use the length() function to determine the number of samples.
Hint #2: the increment of the time values is 0.01 seconds, so, for example, the duration of data with 3 samples would be exactly 0.03 seconds (or 30 ms).
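The two hints can be combined as in the following sketch. As before, f0 and time are reconstructed from the printed output and the stated 0.01 s increment (an assumption so the block is self-contained), and the variable names n_samples, step_ms, and duration_ms are just illustrative.

```r
# Reconstructed for illustration (assumption): printed f0 values, 0.01 s time steps
f0 <- c(129, 130, 129, 129, 129, 131, 132, 132, 131,
        129, 127, 126, 125, 124, 126, 129, 131, 132, 133, 133, 133, 135, 136,
        138, 141, 143, 145, 146, 146,
        146, 145, 143, 140, 138, 134, 129, 125, 117, 110, 103, 98, 92, 90, 87,
        88, 87, 86, 85, 85, 81)
time <- (0:49) / 100

s1 <- f0[time >= 0 & time < 0.084]
s2 <- f0[time >= 0.084 & time < 0.285]
s3 <- f0[time >= 0.285 & time <= 0.490]

# Hint #1: number of samples in each syllable
n_samples <- c(s1 = length(s1), s2 = length(s2), s3 = length(s3))

# Hint #2: each sample spans 0.01 s, i.e. 10 ms
step_ms <- 10
duration_ms <- n_samples * step_ms
duration_ms
```

These durations are measured in whole samples, so they can differ by a few milliseconds from the segment boundaries listed in the data frame.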
For this last exercise, you’ll be combining many of the techniques you learned this week, plus learning a new one, in order to find the location of peaks within a signal. You’ll be working with the F1 data, so let’s go ahead and plot it:
plot(time, f1)
Your goal for this exercise is to determine, algorithmically, the time points associated with three different peaks (two minima, and one maximum), indicated here by the solid red dots:
plot(time, f1)
points(0.12, f1[time==0.12], col='red', pch=16)
points(0.22, f1[time==0.22], col='red', pch=16)
points(0.32, f1[time==0.32], col='red', pch=16)
The first point is the easiest, because we can see that it is the minimum (i.e. smallest value) across the entire range of F1 values. Because of this, we can simply use which.min() to find its time point:
time[which.min(f1)]
## [1] 0.12
That’s easy enough. But what about the second point, the peak that occurs around 0.2 seconds? If we use the which.max() function to index across the entire data range, we get a value that isn’t close at all:
time[which.max(f1)]
## [1] 0.49
The problem here is that the functions which.max() and which.min() will find the index of the absolute maximum and minimum across the entire range of data. This is called the global maximum (or global minimum).
However, the second point indicated in red is what’s called a
local maximum: it is a maximum relative to its
neighboring values, even though it may not be the maximum of
all of the values.
What this means is that you will need to reduce the range of data that you put into the which.max() function, by first slicing the data using a Boolean operation. In other words, you will first create a local subset of data, and then find the global maximum within that local range. Make sense?
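To make the global/local distinction concrete, here is a toy sketch with a made-up signal x (not the F1 data) that has two bumps, the second one taller. The names x, idx, and first_half are illustrative only.

```r
# Made-up signal (assumption, not the aurora data): two bumps, the second is taller
x <- c(1, 3, 5, 4, 2, 1, 4, 8, 9, 6)

which.max(x)   # index of the GLOBAL maximum (the taller, second bump)

# Restrict the search to the first five samples with a Boolean slice
idx <- seq_along(x)
first_half <- x[idx >= 1 & idx <= 5]

which.max(first_half)   # index of the maximum WITHIN the slice (the first bump)
```

Because this slice happens to start at the first sample, the index within the slice is the same as the index in the full signal. That will not be true in general.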
An added complication is that once you slice the data, the range of indices changes, which you will need to account for when determining your final answers by creating an offset value. What do I mean by this? Let’s take a look at an example using a ruler:
If we consider each 0.5 cm tick on this ruler as an index, then the 3 cm mark (denoted in red) is the 7th tick. In other words, the 3 cm mark has an index of 7.
Now let’s “zoom in” to only the range of values between 3 and 7 cm (inclusive of 3 and 7). In Boolean terms, this is the range of tick marks that are greater than or equal to 3 and less than or equal to 7:
Now that we have “zoomed in” on the data, the index for the 3 cm tick mark is no longer 7… it is 1! However, the “real world” value has not changed: the tick mark still corresponds to 3 cm. To account for this difference, we need to be aware of how much data was excluded on the left edge, i.e. how many FALSE Boolean values were on the left edge (the values that were less than 3). This is an offset that will need to be added to the new index, 1:
offset <- 6
index <- 1
index + offset
## [1] 7
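The ruler example can also be reproduced directly in R. The sketch below assumes a 10 cm ruler with a tick every 0.5 cm (the ruler’s full length is an assumption, and it does not affect the result), and it counts the excluded left-edge ticks with sum(), which is just one simple way of getting the offset.

```r
ticks <- seq(0, 10, by = 0.5)   # assumed ruler: 0 to 10 cm in 0.5 cm steps

which(ticks == 3)               # index of the 3 cm mark on the full ruler: 7

in_range <- ticks >= 3 & ticks <= 7
zoomed <- ticks[in_range]       # "zoom in" to 3 cm through 7 cm

which(zoomed == 3)              # index of the 3 cm mark after zooming: 1

offset <- sum(ticks < 3)        # ticks excluded on the left edge: 6
which(zoomed == 3) + offset     # back to the original index: 7
```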
Let’s take a look at an example of how we can approach this problem using the first minimum, which we have already found:
time[which.min(f1)]
## [1] 0.12
If we look at the plot above, we know that this minimum is somewhere between 0.05 s and 0.15 s. So let’s use that range to create a Boolean expression:
time >= 0.05 & time <= 0.15
## [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [13] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE
There are exactly 5 FALSE Boolean values on the left edge of the evaluation, so the offset is 5. We don’t have to manually count this offset, however. We can instead use the which() function:
which(time >= 0.05 & time <= 0.15)
## [1] 6 7 8 9 10 11 12 13 14 15 16
The output of the which() function tells us which indices give a TRUE value for the expression. So, given the above expression, the indices 6 through 16 fall within the range of 0.05 s to 0.15 s. What this means is that our offset value is 1 less than the first TRUE index. We can therefore determine the offset in an algorithmic way by following this logic:
matches <- which(time >= 0.05 & time <= 0.15)
offset <- matches[1] - 1
Now that we have the offset, we can add it to the index of the local minimum within the selected range:
peak <- which.min(f1[time >= 0.05 & time <= 0.15])
peak <- peak + offset
Our result should now match what we found previously:
time[peak]
## [1] 0.12
time[which.min(f1)]
## [1] 0.12
Using these techniques, determine the time points (in milliseconds) of the local maximum near 0.2 s and the local minimum near 0.3 s. Remember that time is stored in seconds, so multiply by 1000 to convert to milliseconds.
# local maximum near 0.2 s
t1 <- 0.15
t2 <- 0.25
matches <- which(time >= t1 & time <= t2)
offset <- matches[1] - 1
peak <- which.max(f1[time >= t1 & time <= t2])
peak <- peak + offset
time[peak]
## [1] 0.22
# local minimum near 0.3s
t1 <- 0.25
t2 <- 0.4
matches <- which(time >= t1 & time <= t2)
offset <- matches[1] - 1
peak <- which.min(f1[time >= t1 & time <= t2])
peak <- peak + offset
time[peak]
## [1] 0.32
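Since the same four steps are repeated for every peak, they could be wrapped in a small function. The following is a sketch only: the function name local_peak_time() and its arguments are made up for this example, and f1 here is a synthetic contour (not the aurora data) built to have a similar shape, with a global minimum at 0.12 s, a local maximum at 0.22 s, a local minimum at 0.32 s, and a global maximum at the final sample.

```r
# Synthetic F1-like contour (assumption, NOT the aurora data)
k <- 0:49
time <- k / 100
f1 <- ifelse(k <= 12, 300 + 10 * (12 - k),
      ifelse(k <= 22, 300 + 15 * (k - 12),
      ifelse(k <= 32, 450 - 10 * (k - 22),
                      350 + 10 * (k - 32))))

# Hypothetical helper: time of the local peak of x between t1 and t2 (inclusive).
# which_fun is which.max for a local maximum or which.min for a local minimum.
local_peak_time <- function(x, t, t1, t2, which_fun = which.max) {
  matches <- which(t >= t1 & t <= t2)    # global indices inside the window
  offset <- matches[1] - 1               # samples excluded on the left edge
  peak <- which_fun(x[matches]) + offset # local index shifted back to a global index
  t[peak]
}

# Results converted from seconds to milliseconds
round(local_peak_time(f1, time, 0.15, 0.25, which.max) * 1000)
round(local_peak_time(f1, time, 0.25, 0.40, which.min) * 1000)
```

Because matches already holds the global indices, matches[which_fun(x[matches])] would give the same index without computing an offset at all. The offset version is kept here because it mirrors the steps described above.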