Doing cool things with the Discrete Cosine Transform in tidynorm

DCT coefficients are really useful!

Author: Josef Fruehwald

Published: June 17, 2025

Yesterday I posted about the normalization functions in the tidynorm R package. In order to implement formant track normalization, I had to also put together code for working with the Discrete Cosine Transform (DCT), which in and of itself can be handy to work with.

The DCT

I’ve posted about the DCT before, but to put it briefly, the DCT tries to re-describe an input signal in terms of weighted and summed cosine functions. The DCT basis looks like this:

# plotting code
dct_mat <- dct_basis(100, k = 5)

as_tibble(
  dct_mat, 
  .name_repair = "unique"
) |> 
  mutate(
    x = row_number()
  ) |> 
  pivot_longer(
    starts_with("...")
  ) |> 
  ggplot(
    aes(x, value, color = name)
  ) +
    geom_line(
      linewidth = 1
    ) +
  guides(color = "none") +
  labs(y = NULL) +
  theme_no_x()->p

p
p+theme_dark()
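
To make the basis a little more concrete, here is a minimal base R sketch that builds a set of cosine functions like the ones plotted above. The helper `cosine_basis()` is hypothetical and not part of tidynorm, and its scaling (plain, unscaled cosines) is an assumption that may differ from what `dct_basis()` returns.

```r
# Hypothetical helper (not from tidynorm): an unscaled DCT-II
# style cosine basis with n time points and k basis functions.
cosine_basis <- function(n, k) {
  j <- seq_len(n) - 1 # time index: 0, 1, ..., n - 1
  f <- seq_len(k) - 1 # basis index: 0, 1, ..., k - 1
  # each column is cos(pi * f * (2j + 1) / (2n)) for one value of f
  outer(j, f, function(j, f) cos(pi * f * (2 * j + 1) / (2 * n)))
}

basis <- cosine_basis(100, k = 5)

# 100 rows (time points) by 5 columns (basis functions);
# the first column is flat, and each later column adds a half cycle
dim(basis)

# the columns are orthogonal: off-diagonal cross products are ~0
round(crossprod(basis), 6)
```

The orthogonality is what makes the coefficients easy to work with: each one can be estimated without reference to the others.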

If we grab one vowel’s formant track and fit a linear model using these functions as predictors, the coefficients will equal the DCT coefficients.

speaker_tracks |> 
  filter(
    speaker == "s01",
    plt_vclass == "ay"
  ) |> 
  filter(id == first(id)) ->
  one_ay
# plotting code
one_ay |> 
  ggplot(
    aes(t, F1)
  )+
  geom_point(color = ptol_red, size = 2)->
  p

p
p + theme_dark()

# 5 dct coefficients
# for a formant track with
# 20 measurement points
dct_mat <- dct_basis(20, k = 5)

dct_mod <- lm(one_ay$F1 ~ dct_mat - 1)

dct_direct <- tidynorm::dct(one_ay$F1)[1:5]

cbind(
  coef(dct_mod),
  dct_direct
)
                      dct_direct
dct_mat1 602.3486557 602.3486557
dct_mat2  97.7676452  97.7676452
dct_mat3  -0.4687751  -0.4687751
dct_mat4  -7.1061819  -7.1061819
dct_mat5 -19.4956181 -19.4956181
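
One way to see why the regression and the direct transform agree: the cosine predictors are orthogonal, so the least squares solution reduces to a separate projection onto each column. Below is a rough sketch of that on simulated data. The `cosine_basis()` helper and the simulated track are made up for illustration, this is not tidynorm's implementation, and the coefficients may be scaled differently from the output above.

```r
# Hypothetical helper (not from tidynorm): unscaled cosine basis
cosine_basis <- function(n, k) {
  j <- seq_len(n) - 1
  f <- seq_len(k) - 1
  outer(j, f, function(j, f) cos(pi * f * (2 * j + 1) / (2 * n)))
}

# a simulated "formant track" with 20 measurement points
set.seed(1)
n <- 20
time_prop <- (seq_len(n) - 0.5) / n
track <- 600 + 100 * cos(pi * time_prop) + rnorm(n, sd = 10)

basis <- cosine_basis(n, k = 5)

# least squares coefficients, in the same style as the lm() call above
ls_coef <- unname(coef(lm(track ~ basis - 1)))

# direct projection: inner product with each column,
# divided by that column's squared length
proj_coef <- as.vector(crossprod(basis, track)) / colSums(basis^2)

# the two columns should match
cbind(ls_coef, proj_coef)
```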

Using the DCT to Smooth

A cool thing about the DCT is that it can be used to smooth formant tracks. We can see that smoothing effect if we plot the inverse DCT of the coefficients we just got.

# plotting code
one_ay |> 
  mutate(
    F1_s = idct(
      dct_direct, n = n()
      )
  ) |> 
  ggplot(
    aes(
      t
    )
  ) +
  geom_point(
    aes(y = F1, color = "original")
  ) +
  geom_line(
    aes(y = F1_s, color = "dct smooth"),
    linewidth = 1
  )+
  labs(
    color = NULL
  ) -> p

p
p + theme_dark()
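
The smoothing comes from dropping the higher-order coefficients before inverting. Here is a minimal base R sketch of that idea on simulated data. The `cosine_basis()` helper is hypothetical, and the scaling is an assumption that may not match `dct()` and `idct()` in tidynorm.

```r
# Hypothetical helper (not from tidynorm): unscaled cosine basis
cosine_basis <- function(n, k) {
  j <- seq_len(n) - 1
  f <- seq_len(k) - 1
  outer(j, f, function(j, f) cos(pi * f * (2 * j + 1) / (2 * n)))
}

# a simulated noisy track with 20 measurement points
set.seed(1)
n <- 20
time_prop <- (seq_len(n) - 0.5) / n
track <- 600 + 100 * cos(pi * time_prop) + rnorm(n, sd = 10)

# all n coefficients: projection onto each cosine
full_basis <- cosine_basis(n, k = n)
all_coef <- as.vector(crossprod(full_basis, track)) / colSums(full_basis^2)

# inverting with every coefficient returns the original track
full_recon <- as.vector(full_basis %*% all_coef)

# inverting with only the first 5 gives a smoothed track
smooth <- as.vector(full_basis[, 1:5] %*% all_coef[1:5])

cbind(track, full_recon, smooth)
```

Keeping more coefficients follows the data more closely, and keeping fewer gives a smoother curve.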

In tidynorm, we can get these smoothed formant tracks with reframe_with_dct_smooth().

# grabbing a sample of
# a few vowel tracks
set.seed(2025-06)
speaker_tracks |> 
  filter(
    speaker == "s01",
    plt_vclass == "ay0"
  ) |> 
  filter(
    id %in% sample(unique(id), 5)
  ) ->
  ay_tracks

# smoothing happens here
ay_tracks |> 
  reframe_with_dct_smooth(
    F1:F3,
    .token_id_col = id,
    .time_col = t
  ) ->
  ay_smooths
# plotting code
ay_tracks |> 
  ggplot(
    aes(F2, F1)
  ) +
  geom_path(
    aes(color = factor(id)),
    arrow = arrow(
      type = "closed",
      angle = 25,
      length = unit(0.5, "cm")
    ),
    linewidth = 1
  ) +
  guides(
    color = "none"
  ) +
  scale_y_reverse() +
  scale_x_reverse() ->
  track_p

track_p %+% ay_smooths ->
  smooth_p

Averaging formant tracks

Something that’s really handy about DCT coefficients is they let you average over formant tracks of vowel tokens that are all different lengths. The process goes:

  1. Get the DCT coefficients for each token with reframe_with_dct().
  2. Average over each vowel class and dct parameter with dplyr::summarise().
  3. Then, convert everything back into formant-tracks with reframe_with_idct().
# Grabbing a subset of
# vowel classes
speaker_tracks |> 
  filter(
    speaker == "s03",
    str_detect(plt_vclass, "y")
  ) ->
  y_vowels

# Step 1: Reframe with DCT
y_vowels |> 
  reframe_with_dct(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  ) -> 
  y_vowel_dct
  
# Step 2: average over vowel class
#   and DCT parameter
y_vowel_dct |> 
  summarise(
    .by = c(speaker, plt_vclass, .param),
    across(
      F1:F3,
      mean
    )
  ) ->
  y_vowel_dct_mean
  
# Step 3: Convert back to tracks
# with the inverse DCT
y_vowel_dct_mean |> 
  reframe_with_idct(
    F1:F3,
    .by = speaker,
    .token_id_col = plt_vclass,
    .param_col = .param
  ) ->
  y_vowel_averages
# plotting code
y_vowel_averages |> 
  ggplot(
    aes(F2, F1)
  )+
  geom_textpath(
    aes(
      color = plt_vclass,
      label = plt_vclass
    ),
    arrow = arrow(
      type = "closed",
      angle = 25,
      length = unit(0.25, "cm")
    ),
    linewidth = 1
  )+
  guides(color = "none")+
  scale_x_reverse()+
  scale_y_reverse()->
  y_p

y_p
y_p + theme_dark()
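
The three steps above can also be sketched in base R, to show why tokens of different lengths end up comparable: each one is reduced to the same number of coefficients. The tokens, `cosine_basis()`, `track_coefs()` and `make_token()` are all made up for this illustration, and the scaling is an assumption that may differ from tidynorm's.

```r
# Hypothetical helper (not from tidynorm): unscaled cosine basis
cosine_basis <- function(n, k) {
  j <- seq_len(n) - 1
  f <- seq_len(k) - 1
  outer(j, f, function(j, f) cos(pi * f * (2 * j + 1) / (2 * n)))
}

# Hypothetical helper: the first k coefficients of one track
track_coefs <- function(x, k = 5) {
  basis <- cosine_basis(length(x), k)
  as.vector(crossprod(basis, x)) / colSums(basis^2)
}

# Hypothetical helper: a simulated token with n measurement points
make_token <- function(n, movement) {
  time_prop <- (seq_len(n) - 0.5) / n
  600 + movement * cos(pi * time_prop)
}

# three tokens with different lengths and amounts of movement
tokens <- list(
  make_token(12, 60),
  make_token(20, 80),
  make_token(31, 100)
)

# Step 1: every token becomes 5 coefficients, whatever its length
coef_mat <- sapply(tokens, track_coefs) # 5 rows, one column per token

# Step 2: average each coefficient across tokens
mean_coef <- rowMeans(coef_mat)

# Step 3: invert the averaged coefficients at a common length
avg_track <- as.vector(cosine_basis(50, 5) %*% mean_coef)
```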

Regressions with DCTs

A really cool thing about DCTs is that you can use them as outcome measures in a regression, and get some nuanced results with some very simple models. For example, let’s fit a model looking at the effect of voicing on the F1 of /ay/ (“ay” vs. pre-voiceless “ay0”).

Step 1: get the data subset

speaker_tracks |> 
  filter(
    speaker == "s03",
    str_detect(plt_vclass, "ay")
  ) ->
  ays

Step 2: get the DCTs

ays |> 
  reframe_with_dct(
    F1:F3,
    .token_id_col = id,
    .time_col = t
  )->
  ay_dcts

Step 3: pivot wider

We need each DCT coefficient in its own column for this, so we’ll pivot wider.

ay_dcts |> 
  pivot_wider(
    names_from = .param,
    values_from = F1:F3
  ) ->
  ay_dct_wide
row    speaker   id  vowel  plt_vclass  word  .n   F1_0   F1_1   F1_2   F1_3   F1_4     F2_0    F2_1   F2_2   F2_3   F2_4     F3_0   F3_1   F3_2   F3_3   F3_4
1      s03        0  AY     ay          I     20  508.0    2.6  −27.2   −7.5  −15.3    908.8   −60.2    3.8   −1.4   −5.9  1,556.1   60.0  −58.8   78.7  −52.5
2      s03       15  AY     ay          KIND  20  442.6  118.2   14.6   36.3   11.8    849.9   134.7  −82.7  124.9    6.3  1,535.4   87.3  −97.2  −37.3   41.2
3      s03       28  AY     ay          MY    20  430.2  −47.1  −67.3  −22.7  −21.5    867.3  −141.6   14.8   11.6   23.9  1,615.5   −5.0   −7.0  −14.6    2.7
4      s03       43  AY     ay          I     20  404.3  −23.0  −66.9    0.7  −17.2    906.7   −76.2   48.9   34.0   −0.7  1,544.2  −34.8   35.9   53.4   87.7
5      s03       55  AY     ay          I     20  391.2   34.5  −26.8   −6.6  −16.5  1,032.7  −132.0   29.2   −5.0    1.3  1,587.5  −11.5   29.8   26.3  −23.8
6..45
46     s03      674  AY     ay0         LIFE  20  414.8   19.9  −40.2   22.9  −24.4    850.8  −219.5  107.1  −32.5   −4.3  1,573.9  −88.6  145.4  −83.3   49.9

Step 4: Fit the model

I’ll fit this with a “simple” lm(). This isn’t one of the fancy GAMs you’ve heard about.

ay_model <- lm(
  cbind(F1_0, F1_1, F1_2, F1_3, F1_4) ~ plt_vclass,
  data = ay_dct_wide
)

Step 5: Interpreting the model

Things get a little weird here, but we can apply the inverse DCT to the model parameters to visualize them. Getting confidence intervals takes a few more steps.

library(broom)

# get a dataframe 
# of the model coefficients
tidy(ay_model) |> 
  
# apply idct to each model term
  reframe_with_idct(
    estimate,
    .token_id_col = term,
    .param_col = response,
    .n = 50
  ) |> 

# plotting
  ggplot(
    aes(
      .time/50, estimate
    )
  ) +
  geom_line(
    color = ptol_red,
    linewidth = 2
  ) +
  facet_wrap(~term, scales = "free_y") ->
  model_plot
# plot rendering
model_plot
model_plot + theme_dark()

We can interpret the curve in the Intercept facet like we normally do: It’s the predicted F1 formant track for the reference level. The curve in the “plt_vclassay0” facet is the difference curve, or how much pre-voiceless /ay/ is predicted to differ from the reference level at each point in time.
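
Because the inverse DCT is a linear operation, applying it to the coefficient table of a multivariate model gives one curve per model term. A rough base R sketch with simulated coefficients follows. The data, the group labels and the `cosine_basis()` helper are all made up for illustration, and the scaling is an assumption that may differ from tidynorm's.

```r
# Hypothetical helper (not from tidynorm): unscaled cosine basis
cosine_basis <- function(n, k) {
  j <- seq_len(n) - 1
  f <- seq_len(k) - 1
  outer(j, f, function(j, f) cos(pi * f * (2 * j + 1) / (2 * n)))
}

# simulated DCT coefficients: 30 tokens by 5 coefficients,
# split into two made-up vowel classes
set.seed(1)
vclass <- factor(rep(c("ay", "ay0"), each = 15))
coefs <- matrix(rnorm(30 * 5, sd = 10), nrow = 30)
coefs[, 1] <- coefs[, 1] + 600
coefs[vclass == "ay0", 2] <- coefs[vclass == "ay0", 2] + 50
colnames(coefs) <- paste0("F1_", 0:4)

# multivariate lm: a matrix outcome gives a matrix of estimates,
# 2 rows (model terms) by 5 columns (DCT coefficients)
model <- lm(coefs ~ vclass)
coef(model)

# one curve per model term, 50 points each
basis <- cosine_basis(50, 5)
term_curves <- basis %*% t(coef(model))
head(term_curves)
```

In this sketch, the intercept curve is the average track of the reference class, and adding the second column to it gives the average track of the other class.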

To get a visualization of the uncertainty we’ll have to sample from a multivariate normal.

library(mvtnorm)
library(ggdist)
Sigma <- vcov(ay_model)
mu_df <- tidy(ay_model)

rmvnorm(
  1000, 
  mean = mu_df$estimate, 
  sigma = Sigma
) |> 
  t() |> 
  as_tibble(
    .name_repair = "unique_quiet",
  ) |> 
  mutate(
    response = mu_df$response,
    term = mu_df$term
  ) |> 
  pivot_longer(
    starts_with("..."),
    names_to = "sample",
    values_to = "estimate"
  ) |> 
  reframe_with_idct(
    estimate,
    .by = sample,
    .token_id_col = term,
    .param_col = response,
    .n = 50
  ) |> 
  ggplot(
    aes(
      .time, estimate
    )
  )+
  geom_hline(
    data =  tibble(term = "plt_vclassay0", estimate = 0),
    aes(
      yintercept = estimate
    )
  )+
  stat_lineribbon()+
  facet_wrap(~term, scales = "free_y")-> ci_plot

ci_plot
ci_plot + theme_dark()
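
Sampling is one way to get these intervals. Since the inverse DCT is linear, a pointwise standard error for each curve can also be worked out directly from the covariance matrix. This is a different technique from the sampling approach above, sketched here in base R on simulated data. The data and the `cosine_basis()` helper are hypothetical, and the sketch assumes that `vcov()` on a multivariate `lm()` labels its rows in the form `response:term`.

```r
# Hypothetical helper (not from tidynorm): unscaled cosine basis
cosine_basis <- function(n, k) {
  j <- seq_len(n) - 1
  f <- seq_len(k) - 1
  outer(j, f, function(j, f) cos(pi * f * (2 * j + 1) / (2 * n)))
}

# simulated DCT coefficients for two made-up vowel classes
set.seed(1)
vclass <- factor(rep(c("ay", "ay0"), each = 15))
coefs <- matrix(rnorm(30 * 5, sd = 10), nrow = 30)
coefs[, 1] <- coefs[, 1] + 600
colnames(coefs) <- paste0("F1_", 0:4)

model <- lm(coefs ~ vclass)
basis <- cosine_basis(50, 5)
Sigma <- vcov(model)

# rows and columns of Sigma that belong to the intercept,
# one per DCT coefficient (assumes "response:term" row names)
is_int <- grepl("(Intercept)", rownames(Sigma), fixed = TRUE)
Sigma_int <- Sigma[is_int, is_int]

# the covariance of (basis %*% coefficients) is
# basis %*% Sigma %*% t(basis); only its diagonal is needed
int_curve <- as.vector(basis %*% coef(model)["(Intercept)", ])
int_se <- sqrt(rowSums((basis %*% Sigma_int) * basis))

# approximate 95% pointwise band around the intercept curve
band <- cbind(
  lower = int_curve - 1.96 * int_se,
  upper = int_curve + 1.96 * int_se
)
```

A band built this way is pointwise and relies on a normal approximation, so it will not necessarily match the ribbons drawn from the samples.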

Getting the rate and acceleration

If you’re still here, you might be interested to know that you can also get the first and second derivatives of the inverse DCT. tidynorm has two functions for this (idct_rate() and idct_accel()), and reframe_with_idct() also takes the optional arguments .rate and .accel.

y_vowel_dct_mean |> 
  # let's look at just one
  # vowel
  filter(
    plt_vclass == "ay"
  ) |> 
  # reframe with rate and accel
  reframe_with_idct(
    F1,
    .token_id_col = plt_vclass,
    .param_col = .param,
    .rate = TRUE,
    .accel = TRUE,
    .n = 100
  ) ->
  formant_derivatives
# plot rendering
formant_derivatives |> 
  pivot_longer(
    starts_with("F1")
  ) |> 
  ggplot(
    aes(
      .time, value
    )
  )+
  geom_line(color = ptol_red, linewidth = 1)+
  facet_wrap(~name, scales = "free_y")->
  deriv_plot

deriv_plot
deriv_plot + theme_dark()
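
The derivatives are available in closed form because each term of the inverse DCT is a cosine, and the derivative of a cosine is a scaled sine. Here is a minimal base R sketch that treats time as a proportion running from 0 to 1. The coefficients and function names are made up, and the units (change per whole vowel duration) are an assumption that may differ from what `idct_rate()` and `idct_accel()` return.

```r
# made-up coefficients for the cosines cos(pi * k * t), k = 0..4
coefs <- c(600, 80, -20, 10, -5)
k <- seq_along(coefs) - 1

# hypothetical helpers: the curve and its first two derivatives
# at proportional times t between 0 and 1
curve_at <- function(t) {
  as.vector(cos(pi * outer(t, k)) %*% coefs)
}
rate_at <- function(t) {
  # d/dt cos(pi * k * t) = -pi * k * sin(pi * k * t)
  as.vector(-sin(pi * outer(t, k)) %*% (coefs * pi * k))
}
accel_at <- function(t) {
  # d2/dt2 cos(pi * k * t) = -(pi * k)^2 * cos(pi * k * t)
  as.vector(-cos(pi * outer(t, k)) %*% (coefs * (pi * k)^2))
}

time_prop <- (seq_len(100) - 0.5) / 100
derivs <- data.frame(
  time = time_prop,
  F1 = curve_at(time_prop),
  F1_rate = rate_at(time_prop),
  F1_accel = accel_at(time_prop)
)
head(derivs)
```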

Summing up

There are a lot of cool and interesting use cases for DCT coefficients! Expect to see more about them from me!

Reuse

CC-BY-SA 4.0

Citation

BibTeX citation:
@online{fruehwald2025,
  author = {Fruehwald, Josef},
  title = {Doing Cool Things with the {Discrete} {Cosine} {Transform} in
    Tidynorm},
  series = {Væl Space},
  date = {2025-06-17},
  url = {https://jofrhwld.github.io/blog/posts/2025/06/2025-06-17_dct-in-tidynorm/},
  langid = {en}
}
For attribution, please cite this work as:
Fruehwald, Josef. 2025. “Doing Cool Things with the Discrete Cosine Transform in Tidynorm.” Væl Space. June 17, 2025. https://jofrhwld.github.io/blog/posts/2025/06/2025-06-17_dct-in-tidynorm/.