Making a spectrogram in R

Josef Fruehwald

doi:10.59350/0d3a6-7nk34

I might flesh this out as a more detailed tutorial for LingMethodsHub, but for now this is going to be a rough-around-the-edges post about making spectrograms in R. My goal will be to get as close as possible to recreating a spectrogram like you might get from Praat.

Figure 1: A spectrogram of the vowel /i/ generated by praat.

Pre-processing.

To keep things simple, I grabbed vowel audio clip from the Wikipedia IPA vowel chat with audio (audio info and license).

Explanation of a hacky thing I had to do.

I know tuneR package has a readWave() function, and I couldn’t figure out how to read in an Oog file, so step 1 was converting the Oog to a wav file. Since I’m writing this in a quarto notebook, I thought I should be able to drop in a ```{sh} code chunk, but it seems like doesn’t have access to my PATH. Long story short, that’s why I’ve got this kind of goofy R code chunk with system() and the full path to sox.

library(glue)

source(here::here("_defaults.R"))

ogg_file <- "assets/Close_front_unrounded_vowel.ogg"
wav_file <- "assets/Close_front_unrounded_vowel.wav"
system(glue("/opt/homebrew/bin/sox {ogg_file} {wav_file}"))

Loading the audio file

The seewave package, which I’m using to make the spectrogram, takes the sound objects created by tuneR, so that’s what I’ll use for reading in the audio file.

library(tuneR)
library(seewave)

i_wav <- readWave("assets/Close_front_unrounded_vowel.wav")

To get a sense of what information is in the wav file, you can use str()

str(i_wav)

Formal class 'Wave' [package "tuneR"] with 6 slots
  ..@ left     : int [1:26524] -2 16 42 24 33 53 56 68 51 55 ...
  ..@ right    : num(0) 
  ..@ stereo   : logi FALSE
  ..@ samp.rate: int 44100
  ..@ bit      : int 16
  ..@ pcm      : logi TRUE

Since I’m going to be zooming in to 0 to 5,000 Hz on the spectrogram, I’ll downsample the audio to 10000.

i_wav_d <- downsample(i_wav, 10000)

Computing the spectrogram

The function to compute the spectrogram is seewave::spectro(). Its argument names are formatted in a way I find a bit grating. A lot of them are compressed down to single characters or other abbreviations that require having the docs constantly open.

Anyway, the arguments that seem most important are:

wl: window length in samples
wn: window function, defaulting to Hanning
ovlp: Window overlap, in percentage. That is, a 25% overlap between analysis windows should be passed to ovlp as 25.

Praat defaults

Let’s have a look at the Praat spectrogram defaults

A screenshot of the default spectrogram settings for Praat. — Figure 2: Praat defaults for spectrograms

Here’s a quick illustration of what these defaults correspond to. It takes an analysis window that’s 0.005 seconds long, and moves it over time by 0.002 second increments. Also, the data coming into the analysis window is weighted by a Gaussian distribution.

library(tidyverse)

Plot Code

win_len <- 0.005
time_step <- 0.002
tibble(
  center = seq(win_len/2, 0.02, by = time_step),
  left_edge = center - (win_len/2),
  right_edge = center + (win_len/2),
  win_num = seq_along(center)
) -> window_fig
window_fig |> 
  ggplot(aes(x = center, y = win_num))+
    geom_pointrange(
      aes(
        xmin = left_edge,
        xmax = right_edge
      ),
      size = 2,
      linewidth = 2
    )+
  labs(x = "time",
       y = NULL,
       title = "Overlapping spectrogram window illustration")+
  theme(axis.text.y = element_blank(),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank())

A figure with point ranges illustrating the overlap of spectrogram analysis windows. Each successive window overlaps by slightly more than 50 — Figure 3: An illustration of overlapping analysis windows that produce a spectrogram

The seewave::spectro() function defines this same relationship, except instead of the time step or window hop length, we need to define by what % the windows overlap. We also need to express how wide the windows are in terms of audio sample, rather than in terms of time, but that just requires multiplying the desired time width by the sampling rate.

win_len <- 0.005 * i_wav_d@samp.rate
hop_len <- 0.002 * i_wav_d@samp.rate
overlap <- ((win_len - hop_len) / win_len) * 100

The one thing that I can’t recreate for now is the Gaussian window function. seewave doesn’t have it implemented, so I’ll just stick with its default (Hamming)

Computing the spectrogram

Now, it’s a pretty straightforward call to spectro().

spect <-
  i_wav_d |>
  spectro(
    # window length, in terms of samples
    wl = win_len,
    # window overlap
    ovlp = overlap,
    # don't plot the result
    plot = F
    )

The spect object is a list with three named items

$time: a vector corresponding to the time domain
$freq: a vector corresponding to the frequency domain
$amp: a matrix of amplitudes across the time and frequency domains

glimpse(spect)

List of 3
 $ time: num [1:299] 0 0.00202 0.00404 0.00606 0.00807 ...
 $ freq: num [1:25] 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 ...
 $ amp : num [1:25, 1:299] -35.1 -39.5 -51.3 -49.9 -51.7 ...

Tidying up

In order to make a plot of the spectrogram in ggplot2, we need to do some tidying up. I’ll go about this by setting the row and column names of the spect$amp matrix to the frequency and time domain values, converting it to a data frame, then doing some pivoting.

# set the colnames and rownames
colnames(spect$amp) <- spect$time
rownames(spect$amp) <- spect$freq

spect_df <-
  spect$amp |>
  # coerce the row names to a column
  as_tibble(rownames = "freq") |>
  # pivot to long format
  pivot_longer(
    # all columns except freq
    -freq, 
    names_to = "time", 
    values_to = "amp"
  ) |>
  # since they were names before,
  # freq and time need conversion to numeric
  mutate(
    freq = as.numeric(freq),
    time = as.numeric(time)
  )

summary(spect_df)

      freq          time             amp        
 Min.   :0.0   Min.   :0.0000   Min.   :-89.94  
 1st Qu.:1.2   1st Qu.:0.1494   1st Qu.:-45.17  
 Median :2.4   Median :0.3008   Median :-35.85  
 Mean   :2.4   Mean   :0.3008   Mean   :-35.43  
 3rd Qu.:3.6   3rd Qu.:0.4521   3rd Qu.:-26.69  
 Max.   :4.8   Max.   :0.6015   Max.   :  0.00

“Dynamic Range”

Frequency data is represented in terms of kHz, which I’ll leave alone for now. One last thing we need to re-recreate from Praat is the “dynamic range”. All values below some cut off (by default, 50 below the maximum) are plotted with the same color. We can do that with some data frame operations here.

dyn = -50
spect_df_floor <- 
  spect_df |> 
  mutate(
    amp_floor = case_when(
      amp < dyn ~ dyn,
      TRUE ~ amp  
    )
  )

Plotting the spectrogram

Now what’s left is to plot the thing. I’ll load the khroma package in order to get some nice color scales.

library(khroma)

Basic raster plot

As a first step, we can plot the time by frequency data as a raster plot, with little rectangles for each position filled in with their amplitude.

spect_df_floor |> 
  ggplot(aes(time, freq))+
    geom_raster(aes(fill = amp_floor))+
    guides(fill = "none")+
    labs(
      x = "time (s)",
      y = "frequency (kHz)",
      title = "spectrogram raster plot"
    )

A spectrogram, drawn as a raster plot — Figure 4: A raster spectrogram plot

Spectrogram contour plot

To get closer to the Praat output, though, we need to make a contour plot instead. Here’s where I’m getting a bit stymied. I wind up with these weird diagonal “shadows” on the right and left hand side of the spectrogram, which I think are a result of how the stat_contour() is being computed and plotted, rather than anything to do with the actual spectrogram.

spect_df_floor |> 
  ggplot(aes(time, freq))+
    stat_contour(
      aes(
        z = amp_floor,
        fill = after_stat(level)
      ),
      geom = "polygon",
      bins = 300
    )+
    scale_fill_batlow()+
    guides(fill = "none")+
    labs(
      x = "time (s)",
      y = "frequency (kHz)",
      title = "spectrogram contour plot"
    )

A spectrogram, drawn with contours — Figure 5: A contour spectrogram plot

One way around this I’ve found is to compute the spectrogram on the higher sampling rate audio, and then zoom into the frequency range you want.

re-running the spectrogram

win_len <- 0.005 * i_wav@samp.rate
hop_len <- 0.002 * i_wav@samp.rate
overlap <- ((win_len - hop_len) / win_len) * 100

spect2 <-
  i_wav |>
  spectro(
    # window length, in terms of samples
    wl = win_len,
    # window overlap
    ovlp = overlap,
    # don't plot the result
    plot = F
    )

# set the colnames and rownames
colnames(spect2$amp) <- spect2$time
rownames(spect2$amp) <- spect2$freq

spect2_df <-
  spect2$amp |>
  # coerce the row names to a column
  as_tibble(rownames = "freq") |>
  # pivot to long format
  pivot_longer(
    # all columns except freq
    -freq, 
    names_to = "time", 
    values_to = "amp"
  ) |>
  # since they were names before,
  # freq and time need conversion to numeric
  mutate(
    freq = as.numeric(freq),
    time = as.numeric(time)
  )

dyn = -50
spect2_df_floor <- 
  spect2_df |> 
  mutate(
    amp_floor = case_when(
      amp < dyn ~ dyn,
      TRUE ~ amp  
    )
  )

spect2_df_floor |> 
  ggplot(aes(time, freq))+
    stat_contour(
      aes(
        z = amp_floor,
        fill = after_stat(level)
      ),
      geom = "polygon",
      bins = 300
    )+
    scale_fill_batlow()+
    guides(fill = "none")+
    labs(
      x = "time (s)",
      y = "frequency (kHz)",
      title = "spectrogram contour plot"
    )+
    coord_cartesian(ylim = c(0, 5))

I think this might not be necessary if I had a better handle on stat_contour() but for now, it does the trick!

Update

Due to popular request, another attempt at showing the degree of window overlap.

This is fantastic, jo, thx! One quick q, I’m wondering if there might be a better way to visualize overlapping windows. The image maybe implies some y dimension (windows increasing in some vertical space). I’m just putting myself in a 1st yrs shoes in my speech class

— Chandan (@GutStrings) January 22, 2023

I’ve folded the code here, just because it’s medium gnarly. The short version is you can get the weights for the window functions from seewave::ftwindow(), and then I used geom_area() to plot it.

the data and plotting code

window_fig |> 
  group_by(win_num) |> 
  nest() |> 
  mutate(
    dists = map(
      data, 
      \(df){
        tibble(
          time = seq(df$left_edge, df$right_edge, len = 512),
          weight = ftwindow(512)
        )
      }
    )
  ) |> 
  unnest(dists) -> window_weights

window_weights |> 
  ggplot(aes(time, weight, group = win_num)) +
    geom_area(
      aes(
        group = win_num, 
        fill = win_num,
        color = win_num
      ), 
      position = "identity", 
      alpha = 0.5)+
    scale_fill_hawaii(guide = "none")+
    scale_color_hawaii(guide = "none")+
  labs(title = "Hamming window functions",
       subtitle = "window length = 0.005, time step =0.002")

A sequence of Hamming window functions across the time domain, each overlapping with the previous by 60% — Figure 7: Hamming window functions

Reuse

CC-BY 4.0

Citation

BibTeX citation:

@online{fruehwald2023,
  author = {Fruehwald, Josef},
  title = {Making a Spectrogram in {R}},
  series = {Væl Space},
  date = {2023-01-22},
  url = {https://jofrhwld.github.io/blog/posts/2023/01/2023-01-22_r-spect/},
  doi = {10.59350/0d3a6-7nk34},
  langid = {en}
}

For attribution, please cite this work as:

Fruehwald, Josef. 2023. “Making a Spectrogram in R.” Væl Space. January 22, 2023. https://doi.org/10.59350/0d3a6-7nk34.