library("magrittr")
library("ggplot2")
library("knitr")
opts_chunk$set(dev = "svg", fig.width = 8/1.25, fig.height = 5/1.25)
We are here to learn the basics of ggplot2. ggplot2 will be useful for producing complex graphics relatively simply. It won’t be of any use for figuring out what is a sensible, useful, or accurate plot. To get a good handle on those, I’d advise simply reading a lot about data visualization.
ggplot2 is meant to be an implementation of the Grammar of Graphics, hence ggplot. The basic notion is that there is a grammar to the composition of graphical components in statistical graphics. By directly controlling that grammar, you can generate a large set of carefully constructed graphics from a relatively small set of operations. As Wickham (2010), the author of ggplot2, put it:
“A good grammar will allow us to gain insight into the composition of complicated graphics, and reveal unexpected connections between seemingly different graphics.”
A good example of an unexpected connection would be that pie charts are just filled bar charts…
pie <- ggplot(mtcars, aes(x = factor(1), fill = factor(cyl))) +
  geom_bar(width = 1, position = "fill", color = "black")
pie
…in polar coordinates.
pie + coord_polar(theta = "y")
There are quite a few ggplot2 materials out there to guide you when you’re not sure what to do next. First and foremost are the internal help pages for every piece of ggplot2 we’ll cover here. They tend to be pretty useful.
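As a minimal sketch of how those help pages can be reached from the R console, assuming ggplot2 is installed: the choice of `geom_bar` as the topic and the variable names `bar_help` and `geoms` are only illustrative.

```r
library("ggplot2")

# `?geom_bar` at the console is shorthand for help("geom_bar").
# Assigning the result looks the page up without displaying it.
bar_help <- help("geom_bar", package = "ggplot2")

# Printing the object displays the help page.
print(bar_help)

# List every attached function whose name starts with "geom_",
# which is a quick way to find candidate topics to look up.
geoms <- apropos("^geom_")
head(geoms)
```

From an interactive session, typing `?geom_bar` directly is usually the quickest route.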
There are a few basic concepts to wrap your mind around for using ggplot2. First, we construct plots out of layers. Every component of the graph, from the underlying data it’s plotting, to the coordinate system it’s plotted on, to the statistical summaries overlaid on top, to the axis labels, is a layer in the plot. The consequence of this is that your use of ggplot2 will probably involve iterative addition of layer upon layer until you’re pleased with the results.
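The iterative workflow described above might look something like the following sketch. It uses the built-in `mtcars` data rather than any data from this tutorial, and the object names (`p_base`, `p_points`, and so on) are just illustrative.

```r
library("ggplot2")

# Start with the data and the aesthetic mappings; this alone draws empty axes.
p_base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Add a layer of points for the raw data.
p_points <- p_base + geom_point()

# Add a statistical summary on top: a linear fit with its confidence band.
p_smooth <- p_points + geom_smooth(method = "lm", formula = y ~ x)

# Add axis labels last.
p_final <- p_smooth + labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
p_final
```

Because each step is stored in its own object, you can print any intermediate plot to see what that addition contributed, and drop a step without rewriting the rest.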
Next, the graphical properties which encode the data you’re presenting are the aesthetics of the plot. These include things like
The actual graphical elements utilized in a plot are the geometries, like
Some of these geometries have their own specific aesthetic settings. For example,
You’ll also frequently want to plot statistics overlaid on top of, or instead of, the raw data. Some of these include
The aesthetics, geometries and statistics constitute the most important layers of a plot, but for fine-tuning a plot for publication, there are a number of other things you’ll want to adjust. The most common of these are the scales, which encompass things like
The following sections are devoted to some of these basic elements in ggplot2.
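Before going through them one at a time, here is a minimal sketch that uses an aesthetic mapping, a geometry, a statistic, and a scale together. It is based on the built-in `mtcars` data, and the particular choices (`stat_smooth` with a linear fit, the "Dark2" palette, the axis and legend names) are assumptions made for illustration, not the plots this tutorial builds later.

```r
library("ggplot2")

p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +  # aesthetics
  geom_point(size = 2) +                                          # geometry
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +       # statistic
  scale_color_brewer(name = "Cylinders", palette = "Dark2") +     # color scale
  scale_x_continuous(name = "Weight (1000 lbs)")                  # x-axis scale
p
```

The `color` aesthetic is set once in `ggplot()`, so both the points and the fitted lines inherit it, and a single color scale controls how both are drawn.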
We’ll be constructing plots with ggplot2 by building up “layers”. The layering of plot elements on top of each other is perhaps the most powerful aspect of the ggplot2 system. It means that relatively complex plots are built up from modular parts, which you can iteratively add or remove. For example, take this figure, which plots the relationship between vowel duration and F1 for 394 tokens of the lexical item “I”.
I_jean <- read.delim("http://bit.ly/avml_ggplot2_data")