Filled Pauses as a um…
Sociolinguistic Variable

Josef Fruehwald

May 1, 2015

Filled Pauses

The Variable

In this talk, I’ll be focusing on the two filled pauses “UH” and “UM”.

  • completely glossing over orthographic variation
    • <uh, um> ~ <er, erm>
  • completely glossing over variation in vowel quality

A Change in Progress

Language Attitudes


Q: Why can’t I start my posts with the word “um,” be a snotty jerk, or present my views as God’s TV gospel?
A: Don’t start your posts with “um” or “uh” or words like that because nine times out of ten, those words precede a snotty correction directed at another poster…

Language Attitudes

There don’t seem to be any attitudes surrounding one versus the other.

A Non-Conventional

It’s difficult to place variation in UH/UM in terms of this early back and forth:

Linguistic theory is concerned with an ideal speaker-listener … unaffected by such grammatically irrelevant conditions as … errors (random or characteristic). (Chomsky 1965)

Deviations from a homogeneous system are not all errorlike vagaries of performance. (Weinreich, Labov & Herzog 1968)


A chance to revisit some basic ideas:

  • Is this a language change? An age-linked trend?
  • What is changing: how speakers communicate a message, or the messages they’re communicating?
  • How is sociolinguistic variation incorporated into the linguistic architecture?

The Data

UH       UM      Total
19,123   6,391   25,514
  • Extracted from the Philadelphia Neighborhood Corpus alignments.
speakers   transcribed audio   words       stressed vowels   date of birth range
395        230.9 hours         1,415,677   743,461           1889-1998

Language Change

A change in English:

An apparent-time trend towards higher UM usage has been observed in:

  • English (Wieling et al, forthcoming)
    • The Switchboard Corpus
    • The Fisher Corpus
    • The British National Corpus
    • (The HCRC Map Task Corpus)

A change in Germanic

  • German
    • Forschungs- und Lehrkorpus Gesprochenes Deutsch
  • Dutch
    • Corpus Gesproken Nederlands
  • Norwegian
    • Nordic Dialect Corpus and Syntax Database
  • Danish and Faroese
    • Faroese Danish Corpus Hamburg

(Wieling et al, forthcoming)

Language Change,
or Lifespan?

Even in the PNC, age and date of birth are massively collinear:

Language Change,
or Lifespan?

Labov, Rosenfelder & Fruehwald (2013)

Fit 3 models

  1. outcome ~ age
  2. outcome ~ year of interview
  3. outcome ~ date of birth

Model with largest \(r^2\) wins.
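
The comparison above can be sketched in a few lines. This is a minimal Python illustration, not the published analysis: the observations are invented toy values, and `r_squared` is a hypothetical helper that computes \(r^2\) for a one-predictor least-squares fit.

```python
from statistics import mean

def r_squared(x, y):
    """r^2 of a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Invented toy observations: (date of birth, year of interview, outcome).
toy = [
    (1900, 1975, 0.10), (1920, 1975, 0.15), (1940, 1980, 0.22),
    (1950, 1990, 0.30), (1960, 1985, 0.33), (1970, 2005, 0.41),
    (1980, 2000, 0.47), (1990, 2010, 0.52),
]
dob = [d for d, _, _ in toy]
year = [y for _, y, _ in toy]
age = [y - d for d, y, _ in toy]  # age is fully determined by the other two
outcome = [o for _, _, o in toy]

# One model per candidate predictor; the largest r^2 "wins".
fits = {
    "age": r_squared(age, outcome),
    "year of interview": r_squared(year, outcome),
    "date of birth": r_squared(dob, outcome),
}
best = max(fits, key=fits.get)
print(best, fits)
```

On these toy values date of birth comes out on top, but only because the toy outcome was built to track date of birth.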

Language Change,
or Lifespan?

Zellou & Tamminga (2014)

Sub-sample all of the data to create:

  1. A trend sample by restricting speakers’ ages (age <= 25)
  2. A cohort sample by restricting speakers’ dates of birth
    (1940 <= dob <= 1949)
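
The two sub-samples can be sketched as simple filters. This is a minimal Python illustration under assumed names: `Speaker`, `trend_sample`, and `cohort_sample` are hypothetical, and the five speakers are invented.

```python
from dataclasses import dataclass

@dataclass
class Speaker:
    speaker_id: str
    dob: int             # year of birth
    interview_year: int  # year the interview was recorded

    @property
    def age(self) -> int:
        return self.interview_year - self.dob

def trend_sample(speakers, max_age=25):
    """Hold age roughly constant: keep only speakers aged <= max_age."""
    return [s for s in speakers if s.age <= max_age]

def cohort_sample(speakers, first=1940, last=1949):
    """Hold generation constant: keep only speakers born first..last."""
    return [s for s in speakers if first <= s.dob <= last]

# Invented speakers, for illustration only.
speakers = [
    Speaker("s1", 1945, 1975),
    Speaker("s2", 1948, 2005),
    Speaker("s3", 1955, 1978),
    Speaker("s4", 1985, 2008),
    Speaker("s5", 1930, 1990),
]
trend = trend_sample(speakers)    # same age, different interview years
cohort = cohort_sample(speakers)  # same birth decade, different ages
print([s.speaker_id for s in trend], [s.speaker_id for s in cohort])
```

Any change across interview years in the trend sample cannot be an age effect, and any change across ages in the cohort sample cannot be a generational one.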

A modelling approach

Fit a 2 dimensional tensor-product smooth:

gam(outcome ~ te(dob, year))

Produces a 2 dimensional surface

A modelling approach

  • 2 dimensional tensor product
    • (date of birth, year of interview)
  • logistic link
  • random intercepts for speakers

Um Model Results

“Apparent Time”

Generational Change

The shift towards more UM seems to be an inter-generational change, with very little within-cohort shift.

What is really changing here?

General Filled Pause

Not flat, but not exactly informative

What are Filled Pauses?

Clark & Fox Tree (2002) argue that speakers utilize UH and UM as signals for the kind of processing delay they’re experiencing.

  • Short Delay:
    • Accompanied by a brief pause
    • Speakers more likely to choose UH
  • Long Delay:
    • Accompanied by a long pause
    • Speakers more likely to choose UM


UM is likely to be comorbid with other features of greater processing difficulty.

Planning Difficulty


  • More likely early in prosodic units (Clark & Fox Tree, 2002, Hawkins, 1971)
  • Discourse-new NPs more likely to be disfluent than discourse-given NPs (Arnold et al, 2003)


  • Increased eye-gaze fixation to discourse-new referents when preceded by a filled pause (Arnold et al, 2003)
  • Mitigated N400 of unpredictable NPs when preceded by filled pause (Corley et al, 2007)

What is a Signal?

But this does not prove that they are used by speakers to signal, for example, that there will be a delay in the speech stream due to uncertainty, except in the sense that smoke signals fire. (Corley & Stewart, 2008)

Usage Change
or Message Change

If UH and UM “mean” different things, then are we actually observing a change in the frequency of those meanings?

D’Arcy (2012) argues that an increase in the rate of reporting thought contributed to a diversification of the quotative system.

Usage Change
or Message Change

UM is increasing even in fully connected speech contexts, but at a faster rate than in silent pause contexts.

The Communicative

Usage Change vs Message Change

A change of usage within a stable communicative context will have a different quantitative profile from a change in the messages being signaled with a stable usage system.

Changing Messages

Silence Duration


A model excluding speakers’ date of birth has the lowest AIC & BIC, and including it is not significant according to a likelihood-ratio test.

## Data: pause_for_mod
## Models:
## dur_mod3: log2dur ~ sex + (1 | idstring)
## dur_mod2: log2dur ~ decade + sex + (1 | idstring)
## dur_mod1: log2dur ~ decade * sex + (1 | idstring)
##          Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)
## dur_mod3  4 70354 70385 -35173    70346                         
## dur_mod2  5 70354 70393 -35172    70344 2.1684      1     0.1409
## dur_mod1  6 70355 70402 -35172    70343 0.7363      1     0.3908
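
The Pr(>Chisq) column can be checked by hand. This is a minimal Python sketch, assuming only that each comparison adds one parameter (Chi Df = 1), in which case the upper tail of the chi-square distribution reduces to a complementary error function. `lrt_p_value_1df` is a hypothetical helper name.

```python
import math

def lrt_p_value_1df(chisq):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chisq / 2.0))

# Chisq values copied from the table above.
p_decade = lrt_p_value_1df(2.1684)       # dur_mod3 vs dur_mod2 (adds decade)
p_interaction = lrt_p_value_1df(0.7363)  # dur_mod2 vs dur_mod1 (adds decade:sex)
print(round(p_decade, 4), round(p_interaction, 4))
```

Both values land at roughly 0.14 and 0.39, matching the table: neither added term is a significant improvement.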

Changing Messages

The evidence is equivocal, at best, for the hypothesis that the messages speakers are sending, or their speech planning difficulties, are changing over time.

Changing Usage

Silence Duration by Filled Pause


The model with the full dob\(\times\)word\(\times\)gender interaction has the lowest AIC, and is favored by the likelihood ratio tests.

The model without any date of birth predictor has the lowest BIC.

## Data: um_for_mod
## Models:
## um_mod3: log2dur ~ word * sex + (1 | idstring)
## um_mod2: log2dur ~ decade * sex + word * sex + (1 | idstring)
## um_mod1: log2dur ~ decade * word * sex + (1 | idstring)
##         Df   AIC   BIC logLik deviance   Chisq Chi Df Pr(>Chisq)   
## um_mod3  6 69918 69965 -34953    69906                             
## um_mod2  8 69910 69972 -34947    69894 12.4935      2   0.001937 **
## um_mod1 10 69905 69983 -34943    69885  8.5948      2   0.013604 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
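
Why AIC and BIC disagree here comes down to penalty size: AIC charges 2 per parameter, BIC charges log(n) per parameter. This is a minimal Python sketch using the (rounded) Df and deviance columns from the table above. The number of observations n is not shown in the output, so the sketch only computes the break-even n rather than assuming one.

```python
import math

# model name -> (Df, deviance), copied from the table above (rounded values).
models = {"um_mod3": (6, 69906), "um_mod2": (8, 69894), "um_mod1": (10, 69885)}

# AIC = deviance + 2 * number of parameters.
aic = {name: dev + 2 * df for name, (df, dev) in models.items()}
best_by_aic = min(aic, key=aic.get)

# Deviance gained per extra parameter, smallest vs. fullest model.
df_small, dev_small = models["um_mod3"]
df_full, dev_full = models["um_mod1"]
gain_per_param = (dev_small - dev_full) / (df_full - df_small)

# BIC prefers the full model only while log(n) < gain_per_param.
n_break_even = math.exp(gain_per_param)
print(aic, best_by_aic, gain_per_param, round(n_break_even))
```

The full model buys about 5.25 units of deviance per extra parameter: more than AIC's 2, but less than log(n) for any n above roughly 190 observations, which is why BIC sides with the model without date of birth.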

Silence Predictive Power

Under the changing messages hypothesis, the predictive power of the duration of following silences should be stable.

Under the changing usage hypothesis, its predictive power should decrease as speakers start using “um” more often irrespective of the context.

What is Changing?

  • The evidence is equivocal at best that what is being communicated (or speakers’ speech planning difficulty) is changing over time.
  • There is better evidence that speakers’ usage choices are changing within a fixed context.

UM’s Meaning

  • [[UM]] = long processing delay.
  • [[-ing]] = gerunds, nouns, & formality

UM is behaving just like any sociolinguistic variable. Its usage is favored under certain contexts, but both variants are fundamentally interchangeable in the current state of the change.

The Upshot

The frequency with which speakers use UM or UH as a filled pause is a non-trivial but arbitrary aspect of their knowledge of their language.

Speakers must be able to track frequencies of UM vs UH, even though there’s evidence that they can’t accurately and explicitly report back when they’ve heard them (Lickley 1995; Lickley & Bard 1996).

Subjects were able to detect filled pauses 55.2% of the time.

Place in the Language

Variation and Grammar

The Classic TD Deletion Rule (Labov, Cohen, Robins & Lewis, 1968):

\[ \left\{\begin{array}{cc} \text{t}\\ \text{d} \end{array}\right\} \rightarrow \langle\emptyset\rangle/ \left[\begin{array}{rl} \alpha & \text{consonantal}\\ \zeta & \text{obstruent} \end{array}\right]~ \gamma(+) \delta(+)~ \begin{array}{c} \text{__}\\ \epsilon ~\text{voice} \end{array}~ \beta(\text{V}) \]

Variation and Grammar

Maximum Entropy Grammar (weights for NYC, from Coetzee & Pater (2011)):

wɛst ɛnd     *Ct (140.4)   Max-Pre-V (80.3)   Max-Final (79.3)   Max (59.6)   H        p
wɛst ɛnd     -1                                                               -140.4   0.38
wɛs ɛnd                    -1                                    -1           -139.9   0.62
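
The H and p columns follow from the weights: harmony is the negated weighted sum of violations, and probability is the normalized exponential of harmony. This is a minimal Python sketch. The assignment of violation marks to constraints is my reading of the tableau, inferred from the harmony values (140.4 is *Ct alone; 139.9 = 80.3 + 59.6, i.e. Max-Pre-V plus Max).

```python
import math

# Constraint weights for NYC, as given in the tableau above.
weights = {"*Ct": 140.4, "Max-Pre-V": 80.3, "Max-Final": 79.3, "Max": 59.6}

# Violation counts per candidate (inferred from the harmony column).
violations = {
    "wɛst ɛnd": {"*Ct": 1},
    "wɛs ɛnd": {"Max-Pre-V": 1, "Max": 1},
}

def harmony(viols):
    """Negated weighted sum of constraint violations."""
    return -sum(weights[c] * n for c, n in viols.items())

H = {cand: harmony(v) for cand, v in violations.items()}

# Softmax over harmonies; subtracting the max keeps exp() well-scaled.
h_max = max(H.values())
exp_h = {cand: math.exp(h - h_max) for cand, h in H.items()}
Z = sum(exp_h.values())
p = {cand: e / Z for cand, e in exp_h.items()}
print({c: round(h, 1) for c, h in H.items()}, {c: round(v, 2) for c, v in p.items()})
```

Only the harmony difference matters: a gap of 0.5 gives the retained variant about 0.38 and the deleted variant about 0.62, as in the tableau.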

Variation and Grammar

Do we want a grammar for filled pauses?

Even if we don’t, we can make a good guess as to what UM/UH’s relationship to grammar is.


Pak (2014, forthcoming) explored acquisition of determiner variation.

(THE):[ðə] ~ [ði]/ __V


Pak (2014): Adults used [ði] 90% of the time pre-vocalically.

PNC the uh/m sequences:

word   full        n
UH     0.9044536   375
UM     0.7857143   79

Following filled pauses behave a lot like any following vowel with respect to their effect on definite determiners.


The indefinite determiner is a different story:

word   an   ei   ə    total
UH     0    74   24   98
UM     0    29   4    33

Pak (p.c.) found one example of “an um” from a 3-year-old speaker.


There are only 8 tokens of “Det[-def] uh/m” where the intended following word could be reasonably coded as vowel initial. 2 had just the complement:

  1. There used to be [ə ə] entry way there.
  2. He worked as [ei ʌm ʌm] accident investigator.

6 repeated the determiner in its “an” form.

  1. He was [ə ə ə: ən] Onondaga, from Canada.
  2. Her husband is [ei ə: ən] officer of the guards in a federal penitentiary.
  3. I had [ə ə: ən] appendix attack.
  4. I was [ei ə: ən] insurance man though.
  5. In fact there’s [ei ə: ən] automobile parts house there.
  6. My husband is [ei ə: ən] engineering contractor.


Pak’s analysis of these facts works like so:

  1. D[-def] VIs
    • D[-def]\(\leftrightarrow\)æn/__V
    • D[-def]\(\leftrightarrow\)ei
  2. D[+def] VIs
    • D[+def]\(\leftrightarrow\)ði


For [ei um], we have a DP without any complement:

  • Det[-def] apple \(\rightarrow\) an apple
  • Det[-def] … \(\rightarrow\) a …

But by the time unstressed vowel reduction applies in the phonology, UH or UM is present, blocking it.

Variation in Grammar

If you’re the type of person who must encode variation in the grammar, perhaps you could have an UM insertion rule when you try to linearize something that’s empty due to lexical access failure.

Variation outwith Grammar

There are a growing number of examples of variable phenomena that are sensitive to features that probably shouldn’t be crammed into the narrow grammar.

  • MacKenzie (2013): Auxiliaries appear to count the number of words in the preceding DP. A DP with \(n+1\) words has a lower probability of contraction than one with \(n\).
  • Tamminga (2014): TD Deletion and ING are sensitive to the variant that was used last time the variable came up, and how many seconds ago it was.
  • This Talk: Filled Pauses (a seemingly paralinguistic speech planning phenomenon) look like a perfectly behaved sociolinguistic variable.

Variation outwith Grammar

Preston (2004): A sociocultural selection device:


Labov et al (2011): A sociolinguistic monitor.

Variation outwith Grammar

Some challenges for moving forward

  • Properly apportioning evidence for variation explained within narrow grammar vs. within a performance system.
  • Properly constraining the performance system so it’s not arbitrarily powerful.

Benefits to moving forward

  • A good theory of the socio-cultural selection device will keep grammatical theory cleaner, so we don’t need to go positing something like a PF UM-insertion rule if we don’t need to.

Wrapping up

  • The ideology-free nature of UH~UM variation has to some degree allowed people to really see language change for what it is in the public discussion about this research.
  • Its unconventional nature allows us sociolinguists to re-see some unsettled issues in variation theory.