library("languageVariationAndChangeData")
  library("dplyr")
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
  library("magrittr")
  library("ggplot2")
  library("knitr")

  opts_chunk$set(dev = "svg", fig.width = 8/1.25, fig.height = 5/1.25)

Intro

Plotting principles

We are here to learn the basics of ggplot2. ggplot2 will be useful for producing complex graphics relatively simply. It won’t be of any use for figuring out what is a sensible, useful, or accurate plot. To get a good handle on those, I’d advise simply reading a lot about data visualization.

Grammar of Graphics

ggplot2 is meant to be an implementation of the Grammar of Graphics, hence the “gg” in ggplot. The basic notion is that there is a grammar to the composition of graphical components in statistical graphics. By directly controlling that grammar, you can generate a large set of carefully constructed graphics from a relatively small set of operations. As Wickham (2010), the author of ggplot2, said,

A good grammar will allow us to gain insight into the composition of complicated graphics, and reveal unexpected connections between seemingly different graphics.

A good example of an unexpected connection would be that pie charts are just filled bar charts…

pie <- ggplot(mtcars, aes(x = factor(1), fill = factor(cyl))) +
       geom_bar(width = 1, position = "fill", color = "black")
pie

…in polar coordinates.

pie + coord_polar(theta = "y")

ggplot2 Materials

There are quite a few ggplot2 materials out there to guide you when you’re not sure what to do next. First and foremost are the internal help pages for every piece of ggplot2 we’ll cover here. They tend to be pretty useful.

?geom_jitter

ggplot2 Basic Concepts

There are a few basic concepts to wrap your mind around for using ggplot2. First, we construct plots out of layers. Every component of the graph, from the underlying data it’s plotting, to the coordinate system it’s plotted on, to the statistical summaries overlaid on top, to the axis labels, is a layer in the plot. The consequence of this is that your use of ggplot2 will probably involve iteratively adding layer upon layer until you’re pleased with the results.
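To make the idea of iterative layering concrete, here is a minimal sketch that builds a plot one piece at a time. It uses R’s built-in mtcars data purely for illustration; the choice of variables and the object name p are my own assumptions, not part of the data we’ll work with later.

```r
library(ggplot2)

# Start with the data and the aesthetic mappings. On its own this draws
# only an empty panel with axes, because no geometry has been added yet.
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# Add a layer of points for the raw data.
p <- p + geom_point()

# Overlay a statistical summary: a linear regression line with its band.
p <- p + geom_smooth(method = "lm", formula = y ~ x)

# Add axis labels.
p <- p + labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

p
```

Because each step just adds to the stored plot object, you can stop at any point, print p, and decide what to add or take away next.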

Next, the graphical properties which encode the data you’re presenting are the aesthetics of the plot. These include things like

  • x position
  • y position
  • size of elements
  • shape of elements
  • color of elements
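As a rough sketch of how these aesthetics are declared, each one can be mapped to a column of the data inside aes(). The example below again assumes the built-in mtcars data, and the particular pairing of variables to aesthetics is arbitrary.

```r
library(ggplot2)

# Each argument to aes() ties one graphical property to one variable.
p_aes <- ggplot(mtcars, aes(x = wt,               # x position
                            y = mpg,              # y position
                            size = hp,            # size of elements
                            shape = factor(am),   # shape of elements
                            color = factor(cyl))) + # color of elements
  geom_point()

p_aes
```

Discrete aesthetics like shape want a categorical variable, which is why am and cyl are wrapped in factor() here.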

The actual graphical elements utilized in a plot are the geometries, like

  • points
  • lines
  • line segments
  • bars
  • text

Some of these geometries have their own specific aesthetic settings. For example,

  • points
    • point shape
  • text
    • text labels
  • lines
    • line weight
    • line type
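Here is a small sketch combining three geometries, each with one of its own specific settings. The data frame toy is made up for illustration only, and the linewidth argument assumes ggplot2 3.4.0 or later (earlier versions used size for line weight).

```r
library(ggplot2)

# A tiny invented data frame, purely for illustration.
toy <- data.frame(
  x    = c(1, 2, 3, 4),
  y    = c(2, 3, 5, 4),
  name = c("a", "b", "c", "d")
)

p_geoms <- ggplot(toy, aes(x = x, y = y)) +
  # lines: line type and line weight (linewidth assumes ggplot2 >= 3.4.0)
  geom_line(linetype = "dashed", linewidth = 1) +
  # points: point shape (17 is a filled triangle)
  geom_point(shape = 17, size = 3) +
  # text: the label aesthetic is mapped to a column of the data
  geom_text(aes(label = name), vjust = -1)

p_geoms
```

Note the difference between the two styles: linetype and shape are set to fixed values outside aes(), while label is mapped to data inside aes().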

You’ll also frequently want to plot statistics overlaid on top of, or instead of, the raw data. Some of these include

  • Smoothing and regression lines
  • One and two dimensional binning
  • Means and medians with confidence intervals
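The statistics listed above might be sketched as follows, one small plot each. This again assumes the built-in mtcars data, and the interval in the last plot is only an approximate normal-theory 95% interval (mean ± 1.96 standard errors), chosen because mean_se() ships with ggplot2 and needs no extra packages.

```r
library(ggplot2)

# Smoothing line overlaid on the raw data.
p_smooth <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# One-dimensional binning: a histogram.
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

# Two-dimensional binning: counts in rectangular cells.
p_bin2d <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_bin2d(bins = 5)

# Means with an approximate 95% interval, drawn instead of the raw data.
p_mean <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  stat_summary(fun.data = mean_se, fun.args = list(mult = 1.96),
               geom = "pointrange")

p_mean
```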

The aesthetics, geometries and statistics constitute the most important layers of a plot, but for fine-tuning a plot for publication, there are a number of other things you’ll want to adjust. The most common of these are the scales, which encompass things like

  • A logarithmic x or y axis
  • Customized color scales
  • Customized point shapes, or linetypes
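A minimal sketch of all three kinds of scale adjustment on one plot is given below. It assumes the built-in mtcars data; the palette, shape codes, and legend titles are arbitrary choices for illustration.

```r
library(ggplot2)

p_scales <- ggplot(mtcars, aes(x = hp, y = mpg,
                               color = factor(cyl),
                               shape = factor(am))) +
  geom_point(size = 3) +
  # a logarithmic x axis
  scale_x_log10() +
  # a customized color scale (a ColorBrewer palette)
  scale_color_brewer(palette = "Dark2", name = "Cylinders") +
  # customized point shapes: 1 is an open circle, 16 a filled circle
  scale_shape_manual(values = c(1, 16), name = "Transmission",
                     labels = c("automatic", "manual"))

p_scales
```

Scales change how data values are translated into positions, colors, and shapes; they leave the underlying data and the aesthetic mappings untouched.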

The following sections are devoted to some of these basic elements in ggplot2.

Using ggplot2

We’ll be constructing plots with ggplot2 by building up “layers”. The layering of plot elements on top of each other is perhaps the most powerful aspect of the ggplot2 system. It means that relatively complex plots are built up of modular parts, which you can iteratively add or remove. For example, take this figure, which plots the relationship between vowel duration and F1 for 394 tokens of the lexical item “I”.

  I_jean <- read.delim("http://bit.ly/avml_ggplot2_data")