`ggplot2`

```
library("languageVariationAndChangeData")
library("dplyr")
```

```
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
```

```
library("magrittr")
library("ggplot2")
library("knitr")
opts_chunk$set(dev = "svg", fig.width = 8/1.25, fig.height = 5/1.25)
```

We are here to learn the basics of `ggplot2`. `ggplot2` *will* be useful for producing complex graphics relatively simply. It won’t be of any use for figuring out what is a sensible, useful, or accurate plot. To get a good handle on those, I’d advise simply reading a lot about data visualization.

`ggplot2` is meant to be an implementation of the Grammar of Graphics, hence *gg*plot. The basic notion is that there is a grammar to the composition of graphical components in statistical graphics. By directly controlling that grammar, you can generate a large set of carefully constructed graphics from a relatively small set of operations. As Wickham (2010), the author of `ggplot2`, said,

> A good grammar will allow us to gain insight into the composition of complicated graphics, and reveal unexpected connections between seemingly different graphics.

A good example of an unexpected connection would be that pie charts are just filled bar charts…

```
# A single stacked bar, filled to 100% and split by number of cylinders
pie <- ggplot(mtcars, aes(x = factor(1), fill = factor(cyl))) +
  geom_bar(width = 1, position = "fill", color = "black")
pie
```

…in polar coordinates.

`pie + coord_polar(theta = "y")`

`ggplot2` Materials

There are quite a few `ggplot2` materials out there to guide you when you’re not sure what to do next. First and foremost are the internal help pages for every piece of `ggplot2` we’ll cover here. They tend to be pretty useful.

`?geom_jitter`

`ggplot2` Basic Concepts

There are a few basic concepts to wrap your mind around for using `ggplot2`. First, we construct plots out of **layers**. Every component of the graph, from the underlying data it’s plotting, to the coordinate system it’s plotted on, to the statistical summaries overlaid on top, to the axis labels, is a layer in the plot. The consequence of this is that your use of `ggplot2` will probably involve iterative addition of layer upon layer until you’re pleased with the results.
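
As a minimal sketch of this layer-by-layer workflow, the example below builds a scatterplot from R’s built-in `mtcars` data one layer at a time. The dataset, the variables, and the axis labels are choices made here for illustration; they are not part of the tutorial’s own data.

```r
library("ggplot2")

# Start with the data and the aesthetic mappings.
# On its own this sets up the axes but draws no data.
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# Add one layer at a time, checking the result as you go.
p <- p + geom_point()                  # the raw data as points
p <- p + geom_smooth(method = "lm")    # a linear regression line on top
p <- p + xlab("Weight (1000 lbs)") +   # axis labels
  ylab("Miles per gallon")
p
```

Because each step only adds to `p`, you can drop or swap any one of these lines without touching the others.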

Next, the graphical properties which encode the data you’re presenting are the **aesthetics** of the plot. These include things like

- x position
- y position
- size of elements
- shape of elements
- color of elements
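
To make the list above concrete, here is a small sketch that maps one variable to each of these aesthetics inside `aes()`. It assumes the built-in `mtcars` data, and the particular variable-to-aesthetic pairings are arbitrary choices for illustration (a real plot would rarely use all five at once).

```r
library("ggplot2")

# Each argument to aes() ties a column of the data to a graphical property.
aes_plot <- ggplot(mtcars, aes(x = wt,                # x position
                               y = mpg,               # y position
                               size = hp,             # size of elements
                               shape = factor(am),    # shape of elements
                               color = factor(cyl))) + # color of elements
  geom_point()
aes_plot
```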

The actual graphical elements utilized in a plot are the **geometries**, like

- points
- lines
- line segments
- bars
- text
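
As a rough illustration, the same data and aesthetic mappings can be drawn with each of these geometries simply by changing the `geom_*()` layer. The sketch below assumes R’s built-in `pressure` data, chosen here only because it is small and ships with R.

```r
library("ggplot2")

# Shared data and mappings; only the geometry changes below.
p_base <- ggplot(pressure, aes(x = temperature, y = pressure))

p_points   <- p_base + geom_point()                     # points
p_lines    <- p_base + geom_line()                      # lines
p_segments <- p_base +                                  # line segments
  geom_segment(aes(xend = temperature, yend = 0))       #   dropped to y = 0
p_bars     <- p_base + geom_col()                       # bars
p_text     <- p_base + geom_text(aes(label = pressure)) # text

p_points
```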

Some of these geometries have their own specific aesthetic settings. For example,

- points
  - point shape
- text
  - text labels
- lines
  - line weight
  - line type
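
A short sketch of these geometry-specific aesthetics follows. It assumes the built-in `mtcars` data and the `economics_long` data that ships with `ggplot2`; both are stand-ins chosen for illustration. Line weight is left out because the name of that aesthetic differs between `ggplot2` versions (`size` in older releases, `linewidth` from 3.4.0 onward).

```r
library("ggplot2")

# Points: shape varies with the number of cylinders.
pts <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(shape = factor(cyl)))

# Text: each car's name is drawn at its (x, y) position.
txt <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_text(aes(label = rownames(mtcars)))

# Lines: one line per economic series, distinguished by line type.
lns <- ggplot(economics_long, aes(x = date, y = value01)) +
  geom_line(aes(linetype = variable))

pts
```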

You’ll also frequently want to plot **statistics** overlaid on top of, or instead of, the raw data. Some of these include

- Smoothing and regression lines
- One- and two-dimensional binning
- Means and medians with confidence intervals
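
Here is a hedged sketch of one statistic of each kind, using the built-in `mtcars` and `faithful` data as stand-ins. The interval in the last plot is a normal-approximation interval (mean ± 1.96 standard errors) built from `ggplot2`’s own `mean_se()`; that is an assumption made here to avoid extra package dependencies, not necessarily the interval you would want in practice.

```r
library("ggplot2")

# A regression line overlaid on the raw data.
smooth_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm")

# One-dimensional binning: a histogram.
hist_plot <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)

# Two-dimensional binning: counts in rectangular bins.
bin2d_plot <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_bin2d()

# Group means with an approximate 95% interval, instead of the raw data.
mean_plot <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  stat_summary(fun.data = mean_se, fun.args = list(mult = 1.96))

mean_plot
```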

The **aesthetics**, **geometries** and **statistics** constitute the most important **layers** of a plot, but for fine tuning a plot for publication, there are a number of other things you’ll want to adjust. The most common of these are the **scales**, which encompass things like

- A logarithmic x or y axis
- Customized color scales
- Customized point shapes, or linetypes
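
As an illustration, each of these adjustments is added to a plot as its own `scale_*()` layer. The sketch below assumes the built-in `mtcars` data; the specific colors and shape codes are arbitrary choices.

```r
library("ggplot2")

scale_plot <- ggplot(mtcars, aes(x = disp, y = mpg,
                                 color = factor(cyl),
                                 shape = factor(am))) +
  geom_point() +
  # A logarithmic x axis
  scale_x_log10() +
  # A customized color scale, one color per level of cyl
  scale_color_manual(values = c("4" = "steelblue",
                                "6" = "darkorange",
                                "8" = "firebrick")) +
  # Customized point shapes, one shape code per level of am
  scale_shape_manual(values = c("0" = 1, "1" = 17))
scale_plot
```

Note that the scales change how the data are displayed, not the data themselves: `disp` is still stored on its original scale in `mtcars`.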

The following sections are devoted to some of these basic elements in `ggplot2`.

`ggplot2`

We’ll be constructing plots with `ggplot2` by building up “layers”. The layering of plot elements on top of each other is perhaps the most powerful aspect of the `ggplot2` system. It means that relatively complex plots are built up of modular parts, which you can iteratively add or remove. For example, take this figure, which plots the relationship between vowel duration and F1 for 394 tokens of the lexical item “I”.

`I_jean <- read.delim("http://bit.ly/avml_ggplot2_data")`