You cannot email this data to a colleague. You can’t even download it on your computer. This is data on an unprecedented impossibly mind boggling massive scale. - Kenneth Benoit (2015)
Not sociolinguistics yet.
Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. - Wikipedia
It is data made useful to us for analysis - Hilary Mason
Sample sizes are never large. If N is too small to get a sufficiently-precise estimate, you need to get more data (or make more assumptions). But once N is “large enough,” you can start subdividing the data to learn more […]. N is never enough because if it were “enough” you’d already be on to the next problem for which you need more data. - Andrew Gelman
My alternate title
Another alternate title
The Facebook Contagion Experiment
Adam D. I. Kramer et al. PNAS 2014;111:8788-8790
For the same effect size, but with different Ns, the headlines might be:
Facebook’s unethical experiment has no apparent effect on users’ emotions.
Facebook is using mind control!
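To make the contrast between the two headlines concrete, here is a minimal sketch of how one fixed standardized effect size moves from "no apparent effect" to "highly significant" purely as N grows. The effect size `d = 0.02`, the group sizes, and the choice of a two-sample z-test with equal groups are all illustrative assumptions, not figures taken from the study.

```python
import math

def two_sided_p(d, n_per_group):
    """Two-sided p-value for a standardized mean difference d between
    two equal-sized groups, using a normal (z-test) approximation."""
    # The standard error of the difference in standardized means is sqrt(2 / n).
    z = d * math.sqrt(n_per_group / 2)
    # Two-sided tail probability of a standard normal beyond |z|.
    return math.erfc(abs(z) / math.sqrt(2))

d = 0.02  # hypothetical, very small effect size
for n in (1_000, 100_000, 1_000_000):
    print(f"n per group = {n:>9,}  d = {d}  p = {two_sided_p(d, n):.3g}")
```

The effect is identical in every row; only the p-value changes. Whether an effect of that size matters is a question the p-value cannot answer.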
Expectations about how big an effect ought to be can only be provided by an articulated theory.
As a diphthong, /ay/ has a lot of ground to cover. Its nucleus raises before voiceless consonants because
The rate of change across phonetic contexts ought to be proportional to the phonetic pressure driving the change in that context.
Based on 10,000 samples from the posterior of the model
precursor ~ TD * context * decade +
(TD * context | Speaker) + (1|Word)
Neither precursor model accounts for the behavior of both t-flaps and d-flaps.
How much does the patterning of the message and signal together reduce uncertainty about either in isolation?
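One way to quantify this question is mutual information, which measures in bits how much knowing one variable reduces uncertainty about the other. The sketch below assumes that is the intended quantity; the function name `mutual_information` and the toy (message, signal) pairs are hypothetical and only illustrate the calculation from paired observations.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Mutual information (in bits) between the two members of each
    (message, signal) pair, estimated from observed frequencies."""
    n = len(pairs)
    joint = Counter(pairs)
    msg_counts = Counter(m for m, _ in pairs)
    sig_counts = Counter(s for _, s in pairs)
    mi = 0.0
    for (m, s), c in joint.items():
        p_joint = c / n
        # p(m, s) / (p(m) * p(s)) simplifies to c * n / (count(m) * count(s)).
        ratio = c * n / (msg_counts[m] * sig_counts[s])
        mi += p_joint * math.log2(ratio)
    return mi

# Toy data: the signal fully determines the message (1 bit shared) ...
dependent = [("t", "flap"), ("d", "stop")] * 50
# ... versus the signal telling us nothing about the message (0 bits).
independent = [("t", "flap"), ("t", "stop"), ("d", "flap"), ("d", "stop")] * 25

print(mutual_information(dependent))
print(mutual_information(independent))
```

Frequency-based estimates like this are biased upward in small samples, so any real application would need a correction or a comparison against shuffled data.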
“Big Data” or “Big for Sociolinguistics Data” is going to allow us to investigate some phenomena we’ve always been interested in, at a level of detail that wasn’t possible before. If we’re creative, we might be able to investigate phenomena that we hadn’t thought could be investigated.
In the almost total absence of large-scale, questionnaire-supported observations which would have to be extended or repeated over generations of speakers in a community, such a picture can be only guesswork. - Hoenigswald (1960)
It could be observed only by means of an enormous mass of mechanical records, reaching through several generations of speakers. - Bloomfield (1933)
We need to stay on our theory building game. Our theories need to make quantitative predictions about what we’ll observe in our big data. Without that, we risk devolving into a field of superficial and insightless observation.