Doing cool things with the Discrete Cosine Transform in tidynorm

Yesterday I posted about the normalization functions in the tidynorm R package. To implement formant track normalization, I also had to put together code for working with the Discrete Cosine Transform (DCT), which can be handy in its own right.

library(tidynorm)
library(tidyverse)
library(geomtextpath)
library(gt)

source(here::here("_defaults.R"))

The DCT

I’ve posted about the DCT before, but to put it briefly, the DCT re-describes an input signal as a weighted sum of cosine functions. The DCT basis looks like this:

plotting code

dct_mat <- dct_basis(100, k = 5)

as_tibble(
  dct_mat, 
  .name_repair = "unique"
) |> 
  mutate(
    x = row_number()
  ) |> 
  pivot_longer(
    starts_with("...")
  ) |> 
  ggplot(
    aes(x, value, color = name)
  ) +
    geom_line(
      linewidth = 1
    ) +
  guides(color = "none") +
  labs(y = NULL) +
  theme_no_x()->p

p
p+theme_dark()
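
If you're curious what goes into a basis like this, here's a minimal base R sketch. `cos_basis()` is a hypothetical helper made up for illustration, and its scaling is an assumption: `dct_basis()` in tidynorm may well scale its columns differently. The point is just that each column is a cosine at a different frequency, and that the columns are mutually orthogonal.

```r
# Unscaled DCT-II style cosine basis (illustrative assumption, not
# necessarily tidynorm's scaling): one row per time point, one column
# per coefficient.
cos_basis <- function(n, k) {
  j <- seq_len(n) - 1 # time index 0, ..., n - 1
  m <- seq_len(k) - 1 # coefficient index 0, ..., k - 1
  cos(pi * outer(2 * j + 1, m) / (2 * n))
}

B <- cos_basis(100, 5)
dim(B)

# Orthogonal columns mean the cross-product matrix is diagonal
# (up to floating point error)
gram <- crossprod(B)
round(gram, 10)
```

Orthogonality is what makes the regression trick in the next bit work: with orthogonal predictors, each coefficient can be estimated independently of the others.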

If we grab one vowel’s formant track and fit a linear model using these functions as predictors, the coefficients will equal the DCT coefficients.

speaker_tracks |> 
  filter(
    speaker == "s01",
    plt_vclass == "ay"
  ) |> 
  filter(id == first(id)) ->
  one_ay

plotting code

one_ay |> 
  ggplot(
    aes(t, F1)
  )+
  geom_point(color = ptol_red, size = 2)->
  p

p
p + theme_dark()

# 5 dct coefficients
# for a formant track with
# 20 measurement points
dct_mat <- dct_basis(20, k = 5)

dct_mod <- lm(one_ay$F1 ~ dct_mat - 1)

dct_direct <- tidynorm::dct(one_ay$F1)[1:5]

cbind(
  coef(dct_mod),
  dct_direct
)

                      dct_direct
dct_mat1 602.3486557 602.3486557
dct_mat2  97.7676452  97.7676452
dct_mat3  -0.4687751  -0.4687751
dct_mat4  -7.1061819  -7.1061819
dct_mat5 -19.4956181 -19.4956181
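
The same equivalence can be checked without any real data. This is a sketch under assumptions: the track is simulated, `cos_basis()` is a hypothetical helper with an unscaled cosine basis (not necessarily what `dct_basis()` returns), and the "direct" coefficients are just the projection of the track onto each basis column.

```r
# Hypothetical unscaled cosine basis, for illustration only
cos_basis <- function(n, k) {
  j <- seq_len(n) - 1
  m <- seq_len(k) - 1
  cos(pi * outer(2 * j + 1, m) / (2 * n))
}

set.seed(1)
n <- 20
# A made-up, noisy "formant track" (not real data)
track <- 600 + 100 * cos(pi * (seq_len(n) - 0.5) / n) + rnorm(n, sd = 10)

B <- cos_basis(n, 5)

# Regression coefficients, with no separate intercept because the
# first basis column is already constant
coef_lm <- unname(coef(lm(track ~ B - 1)))

# Direct projection of the track onto each basis column
coef_direct <- drop(crossprod(B, track)) / colSums(B^2)

cbind(coef_lm, coef_direct)
```

The two columns agree because the basis columns are orthogonal, so least squares reduces to one projection per column.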

Using the DCT to Smooth

A cool thing about the DCT is that it can be used to smooth formant tracks: keeping only the first few coefficients drops the fast wiggles and keeps the overall shape. We can see that smoothing effect if we plot the inverse DCT of the five coefficients we just got.

plotting code

one_ay |> 
  mutate(
    F1_s = idct(
      dct_direct, n = n()
      )
  ) |> 
  ggplot(
    aes(
      t
    )
  ) +
  geom_point(
    aes(y = F1, color = "original")
  ) +
  geom_line(
    aes(y = F1_s, color = "dct smooth"),
    linewidth = 1
  )+
  labs(
    color = NULL
  ) -> p

p
p + theme_dark()
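
To see why truncating the coefficients smooths, here's a base R sketch on simulated data. Everything in it is an assumption for illustration: the track is made up, `cos_basis()` is a hypothetical unscaled basis, and roughness is measured as the sum of squared second differences, which is just one convenient choice.

```r
# Hypothetical unscaled cosine basis, for illustration only
cos_basis <- function(n, k) {
  j <- seq_len(n) - 1
  m <- seq_len(k) - 1
  cos(pi * outer(2 * j + 1, m) / (2 * n))
}

set.seed(2)
n <- 20
u <- (seq_len(n) - 0.5) / n
# Made-up smooth trajectory plus measurement noise
track <- 600 + 100 * cos(pi * u) + rnorm(n, sd = 15)

# With as many coefficients as time points, the transform is lossless
B_full <- cos_basis(n, n)
coefs <- drop(crossprod(B_full, track)) / colSums(B_full^2)
recon_full <- drop(B_full %*% coefs)
all.equal(recon_full, track)

# Keeping only the first 5 coefficients discards the fast wiggles
recon_5 <- drop(B_full[, 1:5] %*% coefs[1:5])

# Roughness before and after
rough_raw <- sum(diff(track, differences = 2)^2)
rough_smooth <- sum(diff(recon_5, differences = 2)^2)
c(raw = rough_raw, smooth = rough_smooth)
```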

In tidynorm we can get these smoothed formant tracks with reframe_with_dct_smooth().

# grabbing a sample of
# a few vowel tracks
# (2025-06 is evaluated as arithmetic, so the seed is 2019)
set.seed(2025-06)
speaker_tracks |> 
  filter(
    speaker == "s01",
    plt_vclass == "ay0"
  ) |> 
  filter(
    id %in% sample(unique(id), 5)
  ) ->
  ay_tracks

# smoothing happens here
ay_tracks |> 
  reframe_with_dct_smooth(
    F1:F3,
    .token_id_col = id,
    .time_col = t
  ) ->
  ay_smooths

plotting code

ay_tracks |> 
  ggplot(
    aes(F2, F1)
  ) +
  geom_path(
    aes(color = factor(id)),
    arrow = arrow(
      type = "closed",
      angle = 25,
      length = unit(0.5, "cm")
    ),
    linewidth = 1
  ) +
  guides(
    color = "none"
  ) +
  scale_y_reverse() +
  scale_x_reverse() ->
  track_p

track_p %+% ay_smooths ->
  smooth_p

Averaging formant tracks

Something that’s really handy about DCT coefficients is that they let you average over formant tracks of vowel tokens that are all different lengths. The process goes:

1. Get the DCT coefficients for each token with reframe_with_dct().
2. Average over each vowel class and DCT parameter with dplyr::summarise().
3. Convert everything back into formant tracks with reframe_with_idct().

# Grabbing a subset of
# vowel classes
speaker_tracks |> 
  filter(
    speaker == "s03",
    str_detect(plt_vclass, "y")
  ) ->
  y_vowels

# Step 1: Reframe with DCT
y_vowels |> 
  reframe_with_dct(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  ) -> 
  y_vowel_dct
  
# Step 2: average over vowel class
#   and DCT parameter
y_vowel_dct |> 
  summarise(
    .by = c(speaker, plt_vclass, .param),
    across(
      F1:F3,
      mean
    )
  ) ->
  y_vowel_dct_mean
  
# Step 3: Convert back to tracks
# with the inverse DCT
y_vowel_dct_mean |> 
  reframe_with_idct(
    F1:F3,
    .by = speaker,
    .token_id_col = plt_vclass,
    .param_col = .param
  ) ->
  y_vowel_averages

plotting code

y_vowel_averages |> 
  ggplot(
    aes(F2, F1)
  )+
  geom_textpath(
    aes(
      color = plt_vclass,
      label = plt_vclass
    ),
    arrow = arrow(
      type = "closed",
      angle = 25,
      length = unit(0.25, "cm")
    ),
    linewidth = 1
  )+
  guides(color = "none")+
  scale_x_reverse()+
  scale_y_reverse()->
  y_p

y_p
y_p + theme_dark()
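
The reason this works for tokens of different lengths is that every token ends up as the same number of coefficients, no matter how many measurement points it started with. Here's a base R sketch of the three steps with two made-up tokens. The helpers `cos_basis()`, `dct_k()` and `idct_n()` are hypothetical stand-ins with an assumed unscaled basis, not the tidynorm functions.

```r
# Hypothetical helpers, for illustration only
cos_basis <- function(n, k) {
  j <- seq_len(n) - 1
  m <- seq_len(k) - 1
  cos(pi * outer(2 * j + 1, m) / (2 * n))
}
dct_k <- function(x, k) {
  B <- cos_basis(length(x), k)
  drop(crossprod(B, x)) / colSums(B^2)
}
idct_n <- function(coefs, n) {
  drop(cos_basis(n, length(coefs)) %*% coefs)
}

# Two made-up tokens with different numbers of measurement points
short_token <- 500 + 80 * cos(pi * (seq_len(12) - 0.5) / 12)
long_token <- 540 + 60 * cos(pi * (seq_len(30) - 0.5) / 30)

# Step 1: each token becomes 5 coefficients, whatever its length
coef_mat <- rbind(dct_k(short_token, 5), dct_k(long_token, 5))
dim(coef_mat)

# Step 2: average within each coefficient
mean_coefs <- colMeans(coef_mat)

# Step 3: back to a track, at whatever length we like
mean_track <- idct_n(mean_coefs, 20)
round(mean_coefs, 6)
```

Because the two toy tokens are built from exactly the first two basis functions, the averaged coefficients come out as 520 and 70, with the rest at zero.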

Regressions with DCTs

A really cool thing about DCTs is that you can use the coefficients as outcome measures in a regression, and get some nuanced results with some very simple models. For example, let’s fit a model looking at the effect of voicing on the F1 of /ay/ (“ay” vs. “ay0”).

Step 1: get the data subset

speaker_tracks |> 
  filter(
    speaker == "s03",
    str_detect(plt_vclass, "ay")
  ) ->
  ays

Step 2: get the DCTs

ays |> 
  reframe_with_dct(
    F1:F3,
    .token_id_col = id,
    .time_col = t
  )->
  ay_dcts

Step 3: pivot wider

We need each DCT coefficient in its own column for this, so we’ll pivot wider.

ay_dcts |> 
  pivot_wider(
    names_from = .param,
    values_from = F1:F3
  ) ->
  ay_dct_wide

The wide data

	speaker	id	vowel	plt_vclass	word	.n	F1_0	F1_1	F1_2	F1_3	F1_4	F2_0	F2_1	F2_2	F2_3	F2_4	F3_0	F3_1	F3_2	F3_3	F3_4
1	s03	0	AY	ay	I	20	508.0	2.6	−27.2	−7.5	−15.3	908.8	−60.2	3.8	−1.4	−5.9	1,556.1	60.0	−58.8	78.7	−52.5
2	s03	15	AY	ay	KIND	20	442.6	118.2	14.6	36.3	11.8	849.9	134.7	−82.7	124.9	6.3	1,535.4	87.3	−97.2	−37.3	41.2
3	s03	28	AY	ay	MY	20	430.2	−47.1	−67.3	−22.7	−21.5	867.3	−141.6	14.8	11.6	23.9	1,615.5	−5.0	−7.0	−14.6	2.7
4	s03	43	AY	ay	I	20	404.3	−23.0	−66.9	0.7	−17.2	906.7	−76.2	48.9	34.0	−0.7	1,544.2	−34.8	35.9	53.4	87.7
5	s03	55	AY	ay	I	20	391.2	34.5	−26.8	−6.6	−16.5	1,032.7	−132.0	29.2	−5.0	1.3	1,587.5	−11.5	29.8	26.3	−23.8
6..45
46	s03	674	AY	ay0	LIFE	20	414.8	19.9	−40.2	22.9	−24.4	850.8	−219.5	107.1	−32.5	−4.3	1,573.9	−88.6	145.4	−83.3	49.9

Step 4: Fit the model

I’ll fit this with a “simple” lm(). This isn’t one of the fancy GAMs you’ve heard about.

ay_model <- lm(
  cbind(F1_0, F1_1, F1_2, F1_3, F1_4) ~ plt_vclass,
  data = ay_dct_wide
)

Step 5: Interpreting the model

Things get a little weird here, but we can apply the inverse DCT to the model parameters to visualize them. Getting confidence intervals takes a few more steps.

library(broom)

# get a dataframe 
# of the model coefficients
tidy(ay_model) |> 
  
# apply idct to each model term
  reframe_with_idct(
    estimate,
    .token_id_col = term,
    .param_col = response,
    .n = 50
  ) |> 

# plotting
  ggplot(
    aes(
      .time/50, estimate
    )
  ) +
  geom_line(
    color = ptol_red,
    linewidth = 2
  ) +
  facet_wrap(~term, scales = "free_y") ->
  model_plot

plot rendering

model_plot
model_plot + theme_dark()

We can interpret the curve in the Intercept facet like we normally do: It’s the predicted F1 formant track for the reference level (“ay”). The curve in the “plt_vclassay0” facet is the difference curve, or how different pre-voiceless /ay/ is predicted to be from the reference level at each point in time.
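
Applying the inverse DCT to model parameters is legitimate because the inverse DCT is a linear operation: the curve for a sum of coefficients is the sum of the curves. Here's a base R sketch with simulated coefficients. The numbers, the two-level `group` factor, and the unscaled `cos_basis()` helper are all assumptions for illustration.

```r
# Hypothetical unscaled cosine basis, for illustration only
cos_basis <- function(n, k) {
  j <- seq_len(n) - 1
  m <- seq_len(k) - 1
  cos(pi * outer(2 * j + 1, m) / (2 * n))
}

set.seed(3)
n_tok <- 40
group <- rep(c("a", "b"), each = n_tok / 2)

# Simulated coefficients for 3 DCT parameters (made-up numbers)
Y <- cbind(
  c0 = rnorm(n_tok, mean = ifelse(group == "a", 600, 560), sd = 20),
  c1 = rnorm(n_tok, mean = ifelse(group == "a", 80, 50), sd = 10),
  c2 = rnorm(n_tok, mean = -20, sd = 10)
)

# A matrix on the left-hand side fits one regression per column
fit <- lm(Y ~ group)
coef_mat <- coef(fit) # rows: model terms, columns: DCT parameters

# Each model term becomes a curve over 50 time points
B <- cos_basis(50, 3)
curves <- B %*% t(coef_mat)
colnames(curves)

# Intercept curve + difference curve = curve for group b's mean coefficients
b_curve <- drop(B %*% colMeans(Y[group == "b", ]))
all.equal(b_curve, unname(curves[, 1] + curves[, 2]))
```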

Getting CIs

To get a visualization of the uncertainty, we’ll have to sample from a multivariate normal distribution defined by the model’s coefficient estimates and their covariance matrix.

library(mvtnorm)
library(ggdist)
Sigma <- vcov(ay_model)
mu_df <- tidy(ay_model)

rmvnorm(
  1000, 
  mean = mu_df$estimate, 
  sigma = Sigma
) |> 
  t() |> 
  as_tibble(
    .name_repair = "unique_quiet",
  ) |> 
  mutate(
    response = mu_df$response,
    term = mu_df$term
  ) |> 
  pivot_longer(
    starts_with("..."),
    names_to = "sample",
    values_to = "estimate"
  ) |> 
  reframe_with_idct(
    estimate,
    .by = sample,
    .token_id_col = term,
    .param_col = response,
    .n = 50
  ) |> 
  ggplot(
    aes(
      .time, estimate
    )
  )+
  geom_hline(
    data =  tibble(term = "plt_vclassay0", estimate = 0),
    aes(
      yintercept = estimate
    )
  )+
  stat_lineribbon()+
  facet_wrap(~term, scales = "free_y")-> ci_plot

ci_plot
ci_plot + theme_dark()
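
The sampling logic can also be sketched in base R, without mvtnorm or ggdist. To be clear about what's swapped in: the estimates and covariance matrix below are made up, the draws come from a hand-rolled Cholesky approach rather than `rmvnorm()`, the interval is a plain pointwise quantile rather than `stat_lineribbon()`, and `cos_basis()` is a hypothetical unscaled basis.

```r
# Hypothetical unscaled cosine basis, for illustration only
cos_basis <- function(n, k) {
  j <- seq_len(n) - 1
  m <- seq_len(k) - 1
  cos(pi * outer(2 * j + 1, m) / (2 * n))
}

set.seed(4)
# Made-up coefficient estimates and covariance for 3 DCT parameters
mu <- c(600, 80, -20)
Sigma <- diag(c(25, 9, 4))
Sigma[1, 2] <- Sigma[2, 1] <- 5

# Draw samples from N(mu, Sigma) using the Cholesky factor of Sigma
n_samp <- 2000
z <- matrix(rnorm(n_samp * length(mu)), nrow = n_samp)
draws <- sweep(z %*% chol(Sigma), 2, mu, "+")

# Turn every sampled coefficient vector into a curve
B <- cos_basis(50, 3)
curve_draws <- B %*% t(draws) # 50 time points x 2000 samples

# Pointwise 95% interval at each time point
ci <- apply(curve_draws, 1, quantile, probs = c(0.025, 0.975))
mean_curve <- drop(B %*% mu)
dim(ci)
```

Sampling whole coefficient vectors, rather than each coefficient separately, keeps the correlations between coefficients intact when they get converted to curves.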

Getting the rate and acceleration

If you’re still here, you might be interested to know that you can also get the first and second derivatives of the inverse DCT. tidynorm has two functions for this (idct_rate() and idct_accel()), but there are also optional arguments .rate and .accel in reframe_with_idct().

y_vowel_dct_mean |> 
  # let's look at just one
  # vowel
  filter(
    plt_vclass == "ay"
  ) |> 
  # reframe with rate and accel
  reframe_with_idct(
    F1,
    .token_id_col = plt_vclass,
    .param_col = .param,
    .rate = TRUE,
    .accel = TRUE,
    .n = 100
  ) ->
  formant_derivatives

plot rendering

formant_derivatives |> 
  pivot_longer(
    starts_with("F1")
  ) |> 
  ggplot(
    aes(
      .time, value
    )
  )+
  geom_line(color = ptol_red, linewidth = 1)+
  facet_wrap(~name, scales = "free_y")->
  deriv_plot

deriv_plot
deriv_plot + theme_dark()
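
These derivatives are available in closed form because each basis function is a cosine, and the derivative of a cosine is a scaled sine. Here's a base R sketch that checks the analytic rate and acceleration against finite differences. The coefficients are made up, time is treated as a proportion from 0 to 1, and the unscaled basis is an assumption, so the units and scaling of idct_rate() and idct_accel() may differ.

```r
coefs <- c(600, 80, -20) # made-up DCT coefficients
k <- seq_along(coefs) - 1 # 0, 1, 2
n <- 100
u <- (seq_len(n) - 0.5) / n # proportional time in (0, 1)

# Track, first derivative, and second derivative with respect to u
track <- drop(cos(pi * outer(u, k)) %*% coefs)
rate <- drop(sin(pi * outer(u, k)) %*% (-pi * k * coefs))
accel <- drop(cos(pi * outer(u, k)) %*% (-(pi * k)^2 * coefs))

# Sanity check against central finite differences
du <- 1 / n
inner <- 2:(n - 1)
rate_fd <- (track[inner + 1] - track[inner - 1]) / (2 * du)
accel_fd <- (track[inner + 1] - 2 * track[inner] + track[inner - 1]) / du^2

max(abs(rate_fd - rate[inner]))
max(abs(accel_fd - accel[inner]))
```

Both differences should be small relative to the size of the derivatives themselves, and they shrink further as `n` grows.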

Summing up

There are a lot of cool and interesting use cases for DCT coefficients! Expect to see more about them from me!

Reuse

CC-BY-SA 4.0

Citation

BibTeX citation:

@online{fruehwald2025,
  author = {Fruehwald, Josef},
  title = {Doing Cool Things with the {Discrete} {Cosine} {Transform} in
    Tidynorm},
  series = {Væl Space},
  date = {2025-06-17},
  url = {https://jofrhwld.github.io/blog/posts/2025/06/2025-06-17_dct-in-tidynorm/},
  langid = {en}
}

For attribution, please cite this work as:

Fruehwald, Josef. 2025. “Doing Cool Things with the Discrete Cosine Transform in Tidynorm.” Væl Space. June 17, 2025. https://jofrhwld.github.io/blog/posts/2025/06/2025-06-17_dct-in-tidynorm/.