Yesterday I posted about the normalization functions in the tidynorm R package. In order to implement formant track normalization, I had to also put together code for working with the Discrete Cosine Transform (DCT), which in and of itself can be handy to work with.
The DCT
I’ve posted about the DCT before, but to put it briefly, the DCT tries to re-describe an input signal in terms of weighted and summed cosine functions. The DCT basis looks like this:
plotting code
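library(tidyverse)
library(tidynorm)
# theme_no_x() below looks like a custom theme helper defined elsewhere in the post
dct_mat <- dct_basis(100, k = 5)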
as_tibble(
dct_mat,
.name_repair = "unique"
) |>
mutate(
x = row_number()
) |>
pivot_longer(
starts_with("...")
) |>
ggplot(
aes(x, value, color = name)
) +
geom_line(
linewidth = 1
) +
guides(color = "none") +
labs(y = NULL) +
theme_no_x()->p
p
p+theme_dark()
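For a sense of what goes into a basis like the one plotted above, here is a minimal base R sketch. `cos_basis()` is a hypothetical helper, not tidynorm's `dct_basis()`, and its unscaled cosine columns are an assumption, so the values may differ from tidynorm's by a constant factor per column.

```r
# Hypothetical helper (not tidynorm's dct_basis()): an unscaled DCT-II
# cosine basis with n rows (time points) and k columns (frequencies).
cos_basis <- function(n, k) {
  j <- seq_len(n) - 1        # time index 0..n-1
  freq <- seq_len(k) - 1     # frequency index 0..k-1
  cos(pi * outer(2 * j + 1, freq) / (2 * n))
}

basis <- cos_basis(100, 5)
dim(basis)

# The columns are mutually orthogonal, so the cross-product
# matrix is diagonal (up to floating point error).
gram <- crossprod(basis)
round(gram, 8)
```

The orthogonality of the columns is what makes each coefficient independent of how many other cosine terms are included.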
If we grab one vowel’s formant track and fit a linear model using these functions as predictors, the coefficients will equal the DCT coefficients.
plotting code
one_ay |>
ggplot(
aes(t, F1)
)+
geom_point(color = ptol_red, size = 2)->
p
p
p + theme_dark()
# 5 dct coefficients
# for a formant track with
# 20 measurement points
dct_mat <- dct_basis(20, k = 5)
dct_mod <- lm(one_ay$F1 ~ dct_mat - 1)
dct_direct <- tidynorm::dct(one_ay$F1)[1:5]
cbind(
coef(dct_mod),
dct_direct
)
                     dct_direct
dct_mat1 602.3486557 602.3486557
dct_mat2  97.7676452  97.7676452
dct_mat3  -0.4687751  -0.4687751
dct_mat4  -7.1061819  -7.1061819
dct_mat5 -19.4956181 -19.4956181
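The reason the regression and the direct transform agree can be sketched in base R. This uses a made-up track and a hypothetical `cos_basis()` helper with unscaled cosine columns (an assumption, not tidynorm's exact scaling): because the columns are orthogonal, each least-squares coefficient is just the projection of the track onto one column.

```r
# Hypothetical helper: unscaled DCT-II cosine basis (an assumption,
# not necessarily tidynorm's scaling).
cos_basis <- function(n, k) {
  j <- seq_len(n) - 1
  freq <- seq_len(k) - 1
  cos(pi * outer(2 * j + 1, freq) / (2 * n))
}

set.seed(1)
n <- 20
# A made-up "formant track" with 20 measurement points
track <- 500 + 100 * sin(seq(0, pi, length.out = n)) + rnorm(n, sd = 10)
basis <- cos_basis(n, 5)

# Coefficients from a no-intercept linear model
coef_lm <- unname(coef(lm(track ~ basis - 1)))

# Coefficients from projecting the track onto each basis column
coef_proj <- as.vector(crossprod(basis, track) / colSums(basis^2))

all.equal(coef_lm, coef_proj)
```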
Using the DCT to Smooth
A cool thing about the DCT is that it can be used to smooth formant tracks. We can see that smoothing effect if we plot the inverse DCT of the coefficients we just got.
In tidynorm, we can get these smoothed formant tracks with reframe_with_dct_smooth().
# grabbing a sample of
# a few vowel tracks
set.seed(2025-06)
speaker_tracks |>
filter(
speaker == "s01",
plt_vclass == "ay0"
) |>
filter(
id %in% sample(unique(id), 5)
) ->
ay_tracks
# smoothing happens here
ay_tracks |>
reframe_with_dct_smooth(
F1:F3,
.token_id_col = id,
.time_col = t
) ->
ay_smooths
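To see what this kind of smoothing amounts to, here is a base R sketch on a made-up track. The idea is to keep only the first few DCT coefficients and invert. `cos_basis()` is a hypothetical helper with an assumed scaling, and I'm not claiming this is how reframe_with_dct_smooth() is implemented internally.

```r
# Hypothetical helper: unscaled DCT-II cosine basis (assumed scaling).
cos_basis <- function(n, k) {
  j <- seq_len(n) - 1
  freq <- seq_len(k) - 1
  cos(pi * outer(2 * j + 1, freq) / (2 * n))
}

set.seed(2)
n <- 20
# A made-up noisy track
track <- 600 - 150 * seq(0, 1, length.out = n)^2 + rnorm(n, sd = 15)

# With a full basis (n columns), the coefficients reproduce the track exactly
full <- cos_basis(n, n)
coefs <- as.vector(crossprod(full, track) / colSums(full^2))
recon_full <- as.vector(full %*% coefs)
all.equal(recon_full, track)

# Keeping only the first 5 coefficients drops the high-frequency wiggles
smooth <- as.vector(full[, 1:5] %*% coefs[1:5])
```

The truncated reconstruction is the same thing as the fitted values from regressing the track on the first five basis columns.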
Averaging formant tracks
Something that’s really handy about DCT coefficients is they let you average over formant tracks of vowel tokens that are all different lengths. The process goes:
- Get the DCT coefficients for each token with reframe_with_dct().
- Average over each vowel class and DCT parameter with dplyr::summarise().
- Then, convert everything back into formant tracks with reframe_with_idct().
# Grabbing a subset of
# vowel classes
speaker_tracks |>
filter(
speaker == "s03",
str_detect(plt_vclass, "y")
) ->
y_vowels
# Step 1: Reframe with DCT
y_vowels |>
reframe_with_dct(
F1:F3,
.by = speaker,
.token_id_col = id,
.time_col = t
) ->
y_vowel_dct
# Step 2: average over vowel class
# and DCT parameter
y_vowel_dct |>
summarise(
.by = c(speaker, plt_vclass, .param),
across(
F1:F3,
mean
)
) ->
y_vowel_dct_mean
# Step 3: Convert back to tracks
# with the inverse DCT
y_vowel_dct_mean |>
reframe_with_idct(
F1:F3,
.by = speaker,
.token_id_col = plt_vclass,
.param_col = .param
) ->
y_vowel_averages
plotting code
y_vowel_averages |>
ggplot(
aes(F2, F1)
)+
geom_textpath(
aes(
color = plt_vclass,
label = plt_vclass
),
arrow = arrow(
type = "closed",
angle = 25,
length = unit(0.25, "cm")
),
linewidth = 1
)+
guides(color = "none")+
scale_x_reverse()+
scale_y_reverse()->
y_p
y_p
y_p + theme_dark()
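The three steps above also work as a base R sketch, which shows why token length stops mattering: every token is reduced to the same number of coefficients. The tokens are made up, and `cos_basis()`, `dct_coefs()` and `idct_track()` are hypothetical helpers with an assumed scaling, not tidynorm functions.

```r
# Hypothetical helpers (assumed scaling, not tidynorm's implementation)
cos_basis <- function(n, k) {
  j <- seq_len(n) - 1
  freq <- seq_len(k) - 1
  cos(pi * outer(2 * j + 1, freq) / (2 * n))
}
dct_coefs <- function(x, k = 5) {
  b <- cos_basis(length(x), k)
  as.vector(crossprod(b, x) / colSums(b^2))
}
idct_track <- function(coefs, n) {
  as.vector(cos_basis(n, length(coefs)) %*% coefs)
}

# Three made-up tokens with 12, 20 and 35 measurement points
shape <- function(n) 400 + 200 * sin(seq(0, pi, length.out = n))
tokens <- list(shape(12), shape(20) + 30, shape(35) - 30)

# Step 1: 5 coefficients per token, whatever its length
coef_mat <- sapply(tokens, dct_coefs)

# Step 2: average each coefficient across tokens
mean_coefs <- rowMeans(coef_mat)

# Step 3: invert the averaged coefficients at any resolution we like
avg_track <- idct_track(mean_coefs, n = 50)
```

With this scaling, the first coefficient is each token's mean, so the first averaged coefficient is the mean of the token means.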
Regressions with DCTs
A really cool thing about DCTs is that you can use the coefficients as outcome measures in a regression and get nuanced results from very simple models. For example, let’s fit a model looking at the effect of voicing on the F1 of /ay/ (“ay” vs. “ay0”).
Step 1: get the data subset
speaker_tracks |>
filter(
speaker == "s03",
str_detect(plt_vclass, "ay")
) ->
ays
Step 2: get the DCTs
ays |>
reframe_with_dct(
F1:F3,
.token_id_col = id,
.time_col = t
)->
ay_dcts
Step 3: pivot wider
We need each DCT coefficient in its own column for this, so we’ll pivot wider.
ay_dcts |>
pivot_wider(
names_from = .param,
values_from = F1:F3
) ->
ay_dct_wide
row | speaker | id | vowel | plt_vclass | word | .n | F1_0 | F1_1 | F1_2 | F1_3 | F1_4 | F2_0 | F2_1 | F2_2 | F2_3 | F2_4 | F3_0 | F3_1 | F3_2 | F3_3 | F3_4 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | s03 | 0 | AY | ay | I | 20 | 508.0 | 2.6 | −27.2 | −7.5 | −15.3 | 908.8 | −60.2 | 3.8 | −1.4 | −5.9 | 1,556.1 | 60.0 | −58.8 | 78.7 | −52.5 |
2 | s03 | 15 | AY | ay | KIND | 20 | 442.6 | 118.2 | 14.6 | 36.3 | 11.8 | 849.9 | 134.7 | −82.7 | 124.9 | 6.3 | 1,535.4 | 87.3 | −97.2 | −37.3 | 41.2 |
3 | s03 | 28 | AY | ay | MY | 20 | 430.2 | −47.1 | −67.3 | −22.7 | −21.5 | 867.3 | −141.6 | 14.8 | 11.6 | 23.9 | 1,615.5 | −5.0 | −7.0 | −14.6 | 2.7 |
4 | s03 | 43 | AY | ay | I | 20 | 404.3 | −23.0 | −66.9 | 0.7 | −17.2 | 906.7 | −76.2 | 48.9 | 34.0 | −0.7 | 1,544.2 | −34.8 | 35.9 | 53.4 | 87.7 |
5 | s03 | 55 | AY | ay | I | 20 | 391.2 | 34.5 | −26.8 | −6.6 | −16.5 | 1,032.7 | −132.0 | 29.2 | −5.0 | 1.3 | 1,587.5 | −11.5 | 29.8 | 26.3 | −23.8 |
6..45 | |||||||||||||||||||||
46 | s03 | 674 | AY | ay0 | LIFE | 20 | 414.8 | 19.9 | −40.2 | 22.9 | −24.4 | 850.8 | −219.5 | 107.1 | −32.5 | −4.3 | 1,573.9 | −88.6 | 145.4 | −83.3 | 49.9 |
Step 4: Fit the model
I’ll fit this with a “simple” lm(). This isn’t one of the fancy GAMs you’ve heard about.
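The model-fitting code doesn't appear at this point in the post. Since the next step calls tidy(ay_model) and uses a response column, ay_model is presumably a multivariate lm() with the F1 DCT coefficients bound together as the outcome. Here is a sketch under that assumption, with simulated data (`sim_wide`, made-up values) standing in for ay_dct_wide.

```r
set.seed(3)
n_tok <- 40

# Simulated stand-in for ay_dct_wide: one row per token,
# one column per F1 DCT coefficient (all values are made up)
sim_wide <- data.frame(
  plt_vclass = rep(c("ay", "ay0"), each = n_tok / 2),
  F1_0 = rnorm(n_tok, 450, 40),
  F1_1 = rnorm(n_tok, 20, 30),
  F1_2 = rnorm(n_tok, -30, 20),
  F1_3 = rnorm(n_tok, 0, 15),
  F1_4 = rnorm(n_tok, -10, 10)
)

# cbind() on the left-hand side makes this a multivariate linear model:
# one regression per DCT coefficient, fit in a single call
sim_model <- lm(
  cbind(F1_0, F1_1, F1_2, F1_3, F1_4) ~ plt_vclass,
  data = sim_wide
)

# One row per model term, one column per DCT coefficient
coef(sim_model)
```

With the real data, the analogous call would presumably swap in ay_dct_wide for the simulated data frame.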
Step 5: Interpreting the model
Things get a little weird here, but we can apply the inverse DCT to the model parameters to visualize them. Getting confidence intervals takes a few more steps.
library(broom)
# get a dataframe
# of the model coefficients
tidy(ay_model) |>
# apply idct to each model term
reframe_with_idct(
estimate,
.token_id_col = term,
.param_col = response,
.n = 50
) |>
# plotting
ggplot(
aes(
.time/50, estimate
)
) +
geom_line(
color = ptol_red,
linewidth = 2
) +
facet_wrap(~term, scales = "free_y") ->
model_plot
We can interpret the curve in the Intercept facet like we normally do: it’s the predicted F1 formant track for the reference level. The curve in the “plt_vclassay0” facet is the difference curve, or how much pre-voiceless /ay/ is predicted to differ from that reference.
To get a visualization of the uncertainty we’ll have to sample from a multivariate normal.
library(mvtnorm)
library(ggdist)
Sigma <- vcov(ay_model)
mu_df <- tidy(ay_model)
rmvnorm(
1000,
mean = mu_df$estimate,
sigma = Sigma
) |>
t() |>
as_tibble(
.name_repair = "unique_quiet",
) |>
mutate(
response = mu_df$response,
term = mu_df$term
) |>
pivot_longer(
starts_with("..."),
names_to = "sample",
values_to = "estimate"
) |>
reframe_with_idct(
estimate,
.by = sample,
.token_id_col = term,
.param_col = response,
.n = 50
) |>
ggplot(
aes(
.time, estimate
)
)+
geom_hline(
data = tibble(term = "plt_vclassay0", estimate = 0),
aes(
yintercept = estimate
)
)+
stat_lineribbon()+
facet_wrap(~term, scales = "free_y")-> ci_plot
ci_plot
ci_plot + theme_dark()
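The sampling logic above can also be sketched with only base R, which makes the moving parts a bit more visible. Everything here is an assumption for illustration: the data are simulated, `cos_basis()` is a hypothetical helper with its own scaling, and the draws come from a Cholesky factor rather than mvtnorm.

```r
# Hypothetical helper: unscaled DCT-II cosine basis (assumed scaling)
cos_basis <- function(n, k) {
  j <- seq_len(n) - 1
  freq <- seq_len(k) - 1
  cos(pi * outer(2 * j + 1, freq) / (2 * n))
}

set.seed(4)
n_tok <- 40
# Simulated DCT coefficients, one row per token (made-up values)
sim_wide <- data.frame(
  plt_vclass = rep(c("ay", "ay0"), each = n_tok / 2),
  F1_0 = rnorm(n_tok, 450, 40), F1_1 = rnorm(n_tok, 20, 30),
  F1_2 = rnorm(n_tok, -30, 20), F1_3 = rnorm(n_tok, 0, 15),
  F1_4 = rnorm(n_tok, -10, 10)
)
m <- lm(cbind(F1_0, F1_1, F1_2, F1_3, F1_4) ~ plt_vclass, data = sim_wide)

# Coefficients as one long vector (terms vary fastest within each
# response), matching the row/column order of vcov()
mu <- as.vector(coef(m))
Sigma <- vcov(m)

# 1000 multivariate normal draws: standard normals times chol(Sigma), plus mu
n_draw <- 1000
z <- matrix(rnorm(n_draw * length(mu)), nrow = n_draw)
draws <- sweep(z %*% chol(Sigma), 2, mu, "+")

# Pick out the columns belonging to the difference term
term_of_col <- rep(rownames(coef(m)), times = ncol(coef(m)))
diff_draws <- draws[, term_of_col == "plt_vclassay0"]

# Invert each draw into a 50-point difference curve, then take
# pointwise 95% quantiles across draws
curves <- diff_draws %*% t(cos_basis(50, 5))
band <- apply(curves, 2, quantile, probs = c(0.025, 0.975))
```

Sampling whole coefficient vectors, rather than each coefficient separately, keeps the correlations between DCT coefficients in the resulting band.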
Getting the rate and acceleration
If you’re still here, you might be interested to know that you can also get the first and second derivatives of the inverse DCT. tidynorm has two functions for this (idct_rate() and idct_accel()), and reframe_with_idct() also takes optional .rate and .accel arguments.
y_vowel_dct_mean |>
# let's look at just one
# vowel
filter(
plt_vclass == "ay"
) |>
# reframe with rate and accel
reframe_with_idct(
F1,
.token_id_col = plt_vclass,
.param_col = .param,
.rate = TRUE,
.accel = TRUE,
.n = 100
) ->
formant_derivatives
plot rendering
formant_derivatives |>
pivot_longer(
starts_with("F1")
) |>
ggplot(
aes(
.time, value
)
)+
geom_line(color = ptol_red, linewidth = 1)+
facet_wrap(~name, scales = "free_y")->
deriv_plot
deriv_plot
deriv_plot + theme_dark()
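The derivatives are available because each cosine term can be differentiated analytically. Here's a base R sketch with made-up coefficients. It assumes unscaled cosines over proportional time from 0 to 1, so the units and scaling may not match what idct_rate() and idct_accel() return.

```r
# Made-up DCT coefficients (assumed scaling: unscaled cosine terms)
coefs <- c(500, 80, -20, -5, 3)
k <- seq_along(coefs) - 1          # frequency index 0..4

# Proportional time points at the DCT sample midpoints
n <- 100
t <- (2 * (seq_len(n) - 1) + 1) / (2 * n)

# Track: sum over k of c_k * cos(pi * k * t)
track <- as.vector(cos(pi * outer(t, k)) %*% coefs)

# First derivative: sum over k of -c_k * (pi * k) * sin(pi * k * t)
rate <- as.vector(-sin(pi * outer(t, k)) %*% (pi * k * coefs))

# Second derivative: sum over k of -c_k * (pi * k)^2 * cos(pi * k * t)
accel <- as.vector(-cos(pi * outer(t, k)) %*% ((pi * k)^2 * coefs))

# Sanity check against finite differences of the track
fd <- diff(track) / diff(t)
mid <- (rate[-1] + rate[-n]) / 2
max(abs(fd - mid))
```

Because the derivatives come from the coefficients rather than from differencing noisy measurements, they are as smooth as the reconstructed track itself.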
Summing up
There are a lot of cool and interesting use cases for DCT coefficients! Expect to see more about them from me!
Citation
@online{fruehwald2025,
author = {Fruehwald, Josef},
title = {Doing Cool Things with the {Discrete} {Cosine} {Transform} in
Tidynorm},
series = {Væl Space},
date = {2025-06-17},
url = {https://jofrhwld.github.io/blog/posts/2025/06/2025-06-17_dct-in-tidynorm/},
langid = {en}
}