Forced Alignment and Vowel Extraction for Sociophonetics

Josef Fruehwald

March 13, 2014

Introduction

Outline

  1. The Benefit of Automation
  2. The tools we’ve built
    • FAVE Align
    • FAVE Extract
  3. Some of the results.

FAVE

Thank you, National Science Foundation!

The Benefit of Automation

FUD

  • Fear
  • Uncertainty
  • Doubt

“It’ll make mistakes!”

People make mistakes.


F2 = 15543?

  library(dplyr)
  # Flag implausibly high F2 values (above 5000 Hz)
  tels_df %>% filter(F2 > 5000) %>%
              select(F1, F2, VClass, Word)
##    F1    F2 VClass  Word
## 1 996 15543     ay whine

F2 < F1?

  library(dplyr)
  # F2 should never be lower than F1; flag tokens where it is
  tels_df %>% filter(F2 < F1, !is.na(F1)) %>%
              select(F1, F2, VClass, Word)
##    F1  F2 VClass Word
## 1 822 761     oh  all

What’s this little hat?


It’s all low vowels?


Mistakes

  • Hand Measurements ≠ Error Free

“It’s a black box!”

People are black boxes

https://en.wikipedia.org/wiki/File:PhrenologyPix.jpg

FAVE

https://github.com/JoFrhwld/FAVE

“It’s like…”

too much

Well, so is Praat!

[Figure: Praat]

Don’t stop looking at and listening to your data!

Positive Benefits

  • Consistency
  • Replicability

When humans format data by hand

Diane Altwasser, 28, Calgary, AB  TS 663
Darcy Janzen (m), 36, Calgary, AB  TS 658
John Kistler, 47, m,ColoradoSprings, CO TS 147

When humans curate data by hand

AB:Calgary:DAltwasser.txt:       text/plain; charset=us-ascii
...
AR:LittleRock:MKemp.pln:         text/x-c++; charset=iso-8859-1
...
AZ:Tucson:JBrunekant.pln:        text/plain; charset=iso-8859-1
...
IL:Chicago:JWojcik.pln:          text/x-c; charset=us-ascii
...
IL:Chicago:KReynen.pln:          text/x-c++; charset=us-ascii

Type, charset                     Count
text/plain charset=iso-8859-1        66
text/plain charset=us-ascii         368
text/x-c charset=us-ascii             4
text/x-c++ charset=iso-8859-1         2
text/x-c++ charset=us-ascii           2
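A tally like the one above is easy to reproduce once the `file` output is in R. The sketch below is a minimal illustration, assuming each line has the `STATE:City:File:   type; charset=...` layout shown on the slide; `file_lines` holds only the five example lines, so each type is counted once here rather than reproducing the corpus totals.

```r
# The five example lines from the slide (the full listing is elided there).
file_lines <- c(
  "AB:Calgary:DAltwasser.txt:       text/plain; charset=us-ascii",
  "AR:LittleRock:MKemp.pln:         text/x-c++; charset=iso-8859-1",
  "AZ:Tucson:JBrunekant.pln:        text/plain; charset=iso-8859-1",
  "IL:Chicago:JWojcik.pln:          text/x-c; charset=us-ascii",
  "IL:Chicago:KReynen.pln:          text/x-c++; charset=us-ascii"
)

# Strip the "STATE:City:File:" prefix, leaving the type/charset description.
mime <- trimws(sub("^[^:]+:[^:]+:[^:]+:", "", file_lines))
# Drop the semicolon so labels match the summary table's wording.
mime <- sub(";", "", mime, fixed = TRUE)

# Count files per type/charset combination.
counts <- table(mime)
counts
```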

Forced Alignment

Hidden Markov Models

[Figure: defeat]
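What the aligner computes can be sketched with a small Viterbi search. This is a minimal illustration, not P2FA's implementation: it assumes one HMM state per phone, fixed transition probabilities, and made-up per-frame log-likelihoods for "defeat" (transcribed D IH F IY T, CMU style). The function name `viterbi_align` is hypothetical.

```r
# Toy Viterbi forced alignment: the phone sequence is known in advance, so the
# only question is where each phone starts and ends. One HMM state per phone
# and equal stay/move transition probabilities are simplifying assumptions.
# loglik: frames x phones matrix of log-likelihoods, columns in phone order.
viterbi_align <- function(loglik, log_stay = log(0.5), log_move = log(0.5)) {
  n_frames <- nrow(loglik)
  n_phones <- ncol(loglik)
  stopifnot(n_frames >= n_phones)
  score <- matrix(-Inf, n_frames, n_phones)        # best path score to (t, j)
  back <- matrix(NA_integer_, n_frames, n_phones)  # phone index at frame t - 1
  score[1, 1] <- loglik[1, 1]                      # must start in first phone
  for (t in seq_len(n_frames)[-1]) {
    for (j in seq_len(n_phones)) {
      stay <- score[t - 1, j] + log_stay
      move <- if (j > 1) score[t - 1, j - 1] + log_move else -Inf
      if (stay >= move) {
        score[t, j] <- stay + loglik[t, j]
        back[t, j] <- j
      } else {
        score[t, j] <- move + loglik[t, j]
        back[t, j] <- j - 1L
      }
    }
  }
  # The path must end in the last phone; follow back-pointers to frame 1.
  path <- integer(n_frames)
  path[n_frames] <- n_phones
  for (t in rev(seq_len(n_frames)[-1])) path[t - 1] <- back[t, path[t]]
  path
}

# Hypothetical example: "defeat" as D IH F IY T over 10 frames, with made-up
# scores that favour the correct phone in each frame.
phones <- c("D", "IH", "F", "IY", "T")
truth <- rep(phones, each = 2)
loglik <- outer(truth, phones, function(a, b) ifelse(a == b, log(0.9), log(0.1)))
path <- viterbi_align(loglik)
phones[path]
```

Because the phone sequence is fixed in advance, the search only decides where each phone starts and ends; that constraint is what makes forced alignment far easier than open-ended recognition.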

The Result

[Figure: alignment]

Training & Use

  • A lot (20+ hours) of hand-aligned data.

FAVE-align

P2FA

[Figure: P2FA]

Specs

  • 25.5 hours of training data
  • Monophone model
  • 10 ms granularity
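The 10 ms granularity means the aligner labels one frame every 10 ms, so every boundary lands on a multiple of 0.01 s. A minimal sketch of turning frame labels into phone intervals, assuming a fixed 0.01 s step and a first frame starting at 0 s (`frames_to_intervals` and the frame labels are hypothetical):

```r
# Convert a frame-by-frame phone labelling into (phone, start, end) intervals,
# assuming a fixed frame step (default 10 ms) and that frame 1 starts at 0 s.
frames_to_intervals <- function(labels, step = 0.01) {
  runs <- rle(labels)                    # consecutive runs of the same label
  ends <- cumsum(runs$lengths) * step    # end time of each run, in seconds
  starts <- c(0, head(ends, -1))         # each run starts where the last ended
  data.frame(phone = runs$values, start = starts, end = ends)
}

# Hypothetical frame labels for "defeat" (D IH F IY T).
labels <- c(rep("D", 4), rep("IH", 7), rep("F", 9), rep("IY", 12), rep("T", 6))
intervals <- frames_to_intervals(labels)
intervals
```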

Specs

Accuracy, from Yuan & Liberman (2008)

[Figure: P2FA alignment errors]

Specs

MacKenzie & Turton compared FAVE to other aligners on British English.

  Aligner  Median onset  Median offset  Mean onset  Mean offset  Max onset  Max offset
  FAVE            0.009          0.009       0.019        0.021      0.583       0.588
  PLA             0.015          0.019       0.267        0.252     55.473      55.488
  SPPAS           0.150          0.155       0.504        0.480     68.903      67.408
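Summaries like these come from comparing each automatically placed boundary with a hand-placed one. The sketch below assumes the table reports absolute onset and offset displacements (treated here as seconds); the function name `boundary_error_summary` and the four boundary pairs are made up for illustration.

```r
# Absolute displacement of automatic vs. hand-placed boundaries, summarized
# as median, mean and max, separately for segment onsets and offsets.
boundary_error_summary <- function(auto_on, auto_off, hand_on, hand_off) {
  on_err <- abs(auto_on - hand_on)
  off_err <- abs(auto_off - hand_off)
  summ <- function(e) c(Median = median(e), Mean = mean(e), Max = max(e))
  rbind(Onset = summ(on_err), Offset = summ(off_err))
}

# Hypothetical boundary times (seconds) for four segments.
hand_on  <- c(0.100, 0.250, 0.410, 0.600)
hand_off <- c(0.250, 0.410, 0.600, 0.720)
auto_on  <- c(0.110, 0.240, 0.410, 0.630)
auto_off <- c(0.240, 0.410, 0.630, 0.720)

res <- boundary_error_summary(auto_on, auto_off, hand_on, hand_off)
res
```

The max column is the one that separates the aligners in the table: a single badly misplaced boundary barely moves the median but dominates the maximum.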

FAVE-Align

[Figure: transcription]

FAVE-Align

Using Forced Alignment

Variation in the dictionary

No

car        walking        both
K AA R     W AO K IH NG   B OW TH
K AA       W AO K IH N    B OW F

Requires special training of the forced aligner.

Maybe?

either       going to
AY DH ER     G OW IH NG T UW
IY DH ER     G AA N AH
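Adding variants to the dictionary multiplies the transcriptions an aligner must choose among. A minimal sketch using only the variants on the slide; the list layout and `expand_variants` are hypothetical, and an HTK-style aligner would, roughly, keep whichever candidate its Viterbi pass scores best.

```r
# Pronunciation variants from the slide, stored as a toy dictionary in which
# one entry may list several ARPABET transcriptions ("going to" is one entry).
dict <- list(
  "either"   = c("AY DH ER", "IY DH ER"),
  "going to" = c("G OW IH NG T UW", "G AA N AH")
)

# Every combination of variants is one candidate transcription of the phrase;
# the number of candidates is the product of the variant counts per entry.
expand_variants <- function(entries, dict) {
  grid <- expand.grid(dict[entries], stringsAsFactors = FALSE)
  apply(grid, 1, paste, collapse = " ")
}

candidates <- expand_variants(c("either", "going to"), dict)
candidates
```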

Using Forced Alignment

Yuan, J., Liberman, M., “Automatic detection of ‘g-dropping’ in American English using forced alignment,” Proceedings of the 2011 IEEE Automatic Speech Recognition and Understanding Workshop, pp. 490-493.

Yuan, J., Liberman, M., “Investigating /l/ variation in English through forced alignment,” Proceedings of Interspeech 2009, pp. 2215-2218.

FAVE-Extract

Researcher degrees of freedom

  • Where to measure.
  • Multiple LPC parameter settings.
  • Whether or not to measure the vowel at all.

Formant Estimation

[Figures: 3, 4, 6]
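The LPC parameter settings listed above matter because the number of formant candidates depends on the prediction order. A minimal sketch of LPC formant estimation in base R, using the autocorrelation method with the Levinson-Durbin recursion (Praat's formant tracker uses Burg's method instead); the function names and the synthetic test signal are assumptions for illustration.

```r
# LPC formant estimation for one analysis frame: autocorrelation method plus
# the Levinson-Durbin recursion. Pre-emphasis and windowing, which real
# trackers apply first, are omitted for brevity.
lpc_coefs <- function(x, p) {
  n <- length(x)
  # autocorrelation r[k + 1] = sum_t x[t] * x[t + k] for lags k = 0..p
  r <- sapply(0:p, function(k) sum(x[1:(n - k)] * x[(1 + k):n]))
  a <- numeric(p)          # predictor: x[t] ~ sum_j a[j] * x[t - j]
  err <- r[1]              # prediction error power at order 0
  for (i in seq_len(p)) {
    acc <- r[i + 1]
    if (i > 1) acc <- acc - sum(a[1:(i - 1)] * r[i:2])
    k <- acc / err         # reflection coefficient for order i
    if (i > 1) a[1:(i - 1)] <- a[1:(i - 1)] - k * a[(i - 1):1]
    a[i] <- k
    err <- err * (1 - k^2)
  }
  a
}

# Each complex pole pair of the LPC filter is one formant candidate.
lpc_formants <- function(x, fs, p) {
  a <- lpc_coefs(x, p)
  # roots of z^p - a[1] z^(p-1) - ... - a[p]; polyroot() takes increasing powers
  z <- polyroot(c(-rev(a), 1))
  z <- z[Im(z) > 0]        # keep one root per conjugate pair
  freq <- Arg(z) * fs / (2 * pi)
  bw <- -log(Mod(z)) * fs / pi
  keep <- order(freq)
  data.frame(freq = freq[keep], bw = bw[keep])
}

# Synthetic check: noise driving a 2-pole resonator at 500 Hz (pole radius
# 0.95, fs = 10 kHz), so the true bandwidth is -log(0.95) * fs / pi, ~163 Hz.
set.seed(1)
fs <- 10000
rad <- 0.95
theta <- 2 * pi * 500 / fs
x <- as.numeric(stats::filter(rnorm(20000),
                              c(2 * rad * cos(theta), -rad^2),
                              method = "recursive"))
est <- lpc_formants(x, fs, p = 2)
est
```

On real speech, raising `p` allows up to `p / 2` pole pairs, so the order directly controls how many peaks compete to be called F1 and F2.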

Formant Estimation

[Figure: researcher degrees of freedom]

Formant Estimation

https://en.wikipedia.org/wiki/File:HAL9000.svg

Automating Formant Estimation

  • The bad errors are very, very bad.
  • Some differences are small, and experts may disagree about them.

[Figures: 4, 5]

FAVE-extract

[Figure: ANAE]

Extraction Example

Jean’s Vowel Space

[Figure: Jean’s vowel space]

Jean’s /iy/

[Figure: candidate measurements for Jean’s /iy/]

Jean’s /iy/ tokens


What is a reasonable /iy/?

According to the ANAE:

[Figure: ANAE prior distribution]

Formant Estimation

  • Use F1, F2, log bandwidth of F1, and log bandwidth of F2.
  • For each potential measurement, calculate the distance from the ANAE distribution.
  • Mahalanobis distance
  • The closest is the winner.
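The selection rule above can be sketched with R's `stats::mahalanobis`, which returns squared distances. The prior mean, the diagonal covariance, and the three candidates are made-up numbers standing in for the ANAE distribution and one token's measurements; `pick_candidate` is a hypothetical helper.

```r
# Hypothetical prior for /iy/: mean and covariance of F1, F2, log(B1), log(B2).
# Made-up numbers; a diagonal covariance is a simplification.
prior_mean <- c(F1 = 350, F2 = 2300, logB1 = log(80), logB2 = log(150))
prior_cov <- diag(c(60, 250, 0.5, 0.5)^2)
dimnames(prior_cov) <- list(names(prior_mean), names(prior_mean))

# Candidate measurements for one token (rows = candidates), e.g. from
# different LPC settings. Also made up.
candidates <- rbind(
  c(F1 = 380, F2 = 2250, logB1 = log(90),  logB2 = log(200)),
  c(F1 = 380, F2 = 3100, logB1 = log(90),  logB2 = log(400)),
  c(F1 = 700, F2 = 2250, logB1 = log(300), logB2 = log(200))
)

# The candidate with the smallest (squared) Mahalanobis distance wins.
pick_candidate <- function(candidates, center, cov) {
  d2 <- mahalanobis(candidates, center = center, cov = cov)
  which.min(d2)
}

winner <- pick_candidate(candidates, prior_mean, prior_cov)
candidates[winner, ]
```

Mahalanobis distance scales each dimension by its spread, so a 50 Hz miss in F2 (which varies a lot) counts for less than a 50 Hz miss in F1.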

Result

[Figure: example 1]

[Figure: example 2]

Result

[Figure: first-pass winners]

Have we only found what we “expected”?

[Figure: not just what we expected]

Have we only found what we “expected”?

[Figure: not just what we expected (2)]

Step 2 – Re-estimation

  • Take the winners from the first step.
  • Re-estimate the speaker’s distribution of F1, F2, log bandwidth of F1, log bandwidth of F2, and log duration.
  • Go through all the candidates again, and choose the one closest to the speaker’s own distribution.
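The re-estimation step can be sketched the same way, now using the speaker's own first-pass winners as the reference distribution. This is a minimal illustration for a single vowel class, assuming first-pass picks are already available; `reestimate_picks` and the synthetic tokens (one plausible candidate plus two with a grossly wrong F1 or F2) are hypothetical.

```r
# Second pass for one vowel class of one speaker.
# cand_list: one matrix per token (rows = candidates; columns = F1, F2,
#   log B1, log B2, log duration). first_pick: first-pass winner per token.
reestimate_picks <- function(cand_list, first_pick) {
  # Stack the first-pass winners and estimate the speaker's own distribution.
  winners <- do.call(rbind, Map(function(m, i) m[i, ], cand_list, first_pick))
  center <- colMeans(winners)
  covm <- cov(winners)  # needs more tokens than columns to be invertible
  # Re-choose, for every token, the candidate closest to that distribution.
  vapply(cand_list,
         function(m) which.min(mahalanobis(m, center, covm)),
         integer(1))
}

# Synthetic tokens: row 1 is plausible; rows 2 and 3 have F2 or F1 far off.
set.seed(42)
make_token <- function() {
  good <- c(F1 = rnorm(1, 350, 40), F2 = rnorm(1, 2300, 80),
            logB1 = rnorm(1, 4.4, 0.3), logB2 = rnorm(1, 5.0, 0.3),
            logDur = rnorm(1, -2.3, 0.3))
  bad_f2 <- good; bad_f2["F2"] <- bad_f2["F2"] + 1500  # implausibly high F2
  bad_f1 <- good; bad_f1["F1"] <- bad_f1["F1"] + 800   # implausibly high F1
  rbind(good, bad_f2, bad_f1)
}
tokens <- replicate(30, make_token(), simplify = FALSE)
second_pick <- reestimate_picks(tokens, rep(1L, 30))
table(second_pick)
```

Here every first-pass pick is already the plausible candidate, so the second pass should confirm it. Sparse vowel classes with fewer tokens than measurement columns would need some fallback, such as keeping the first-pass pick.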

Re-estimation

[Figure: re-estimation]

Re-estimation

  Stayed the same   Changed in re-estimation
              182                         42

We did something!

[Figure: worth it]

FAVE-extract for other dialects

Results

Language in Motion!