```{r}
1+1
```
[1] 2
Josef Fruehwald
January 17, 2023
To run R code in a Quarto notebook, you need to insert a “code chunk”. In visual editor mode, you can do that by typing a forward slash (/) and then starting to type “R Code Chunk”. In source editor mode, you need a line containing ```{r} (three “backticks” followed by “r” in curly braces), then a few blank lines, followed by a line containing ```.
To actually run the code, you can either click on the green play button or press the appropriate hotkey for your system (COMMAND+RETURN on a Mac, CTRL+ENTER on Windows).
1 + 4
[1] 5
1 - 4
[1] -3
5 * 4
[1] 20
5 / 4
[1] 1.25
5 ^ 4
[1] 625
Honestly, instead of gambling on how R may or may not interpret PEMDAS, just add parentheses ( ) around every operation in the order you want it to happen.
(5 ^ (2 * 2)) / 6
[1] 104.1667
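As a small illustration of why the parentheses are worth it, here is a sketch of one precedence rule that surprises people: in R, exponentiation is applied before a leading minus sign. The variable names are just for this example.

```r
# Exponentiation binds more tightly than the leading minus,
# so R reads this as -(2 ^ 2).
no_parens <- -2 ^ 2
no_parens

# With parentheses, the negative number itself is squared.
with_parens <- (-2) ^ 2
with_parens
```

The two results differ only in sign, which is exactly the kind of thing that is easy to miss in a longer calculation.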
To assign values to a variable in R, you can use either <- or ->. Most style guides shun ->, but I actually wind up using it a lot.
my_variable <- 4 * 5
print(my_variable)
[1] 20
my_variable / 2
[1] 10
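For comparison, here is a minimal sketch of the same assignment written in both directions. The variable names left_assigned and right_assigned are made up for this example.

```r
# Leftward assignment: the variable name comes first.
left_assigned <- 4 * 5

# Rightward assignment: the variable name comes last.
4 * 5 -> right_assigned

# Both variables now hold the same value.
identical(left_assigned, right_assigned)
```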
When using a number in R, we can only use digits and dots (.). If we try to enter “one hundred thousand” with a comma separator, we’ll get an error. We also can’t use any percent signs (%) or currency symbols ($, £, €).
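One way to see this without stopping the notebook is to ask R whether a piece of text can be read as R code at all. This is only a sketch: parses_ok() is a hypothetical helper written for this example, built on base R's parse() and tryCatch().

```r
# Hypothetical helper: TRUE if R can read the text as code, FALSE if not.
# parse() only reads the text; it does not run it.
parses_ok <- function(txt) {
  tryCatch(
    {
      parse(text = txt)
      TRUE
    },
    error = function(e) FALSE
  )
}

parses_ok("100000")   # digits only
parses_ok("100,000")  # comma separator
parses_ok("50%")      # percent sign
parses_ok("$5")       # currency symbol
```

Only the first of these is readable as a number; the other three are rejected before R even tries to evaluate them.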
When we type in text without any quotes, R will assume it’s a variable or function that’s already been defined and go looking for it.
large <- 100000
large
[1] 1e+05
If the variable hasn’t been created already, we’ll get an error.
small
Error in eval(expr, envir, enclos): object 'small' not found
If we enter text inside of quotation marks, either single quotes ' or double quotes ", R will instead treat the text as a value that we could, for example, assign to a variable, or just print out.
"small"
[1] "small"
tiny_synonym <- "small"
tiny_synonym
[1] "small"
You will often get confused about this and get the Error: object '' not found message. Even if you do this for 15 years, you will still sometimes enter plain text when you meant to put it in quotes, and put text in quotes you meant to enter without. It’s always annoying, but doesn’t mean you’re bad at doing this.
There are two specialized values that you could call “True/False”, “Logical”, or “Boolean” values:
# Full names
TRUE
[1] TRUE
FALSE
[1] FALSE
# Short Forms
T
[1] TRUE
F
[1] FALSE
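One caveat about the short forms, sketched below: TRUE and FALSE are reserved words, but T and F are ordinary variables that happen to start out holding those values, so they can be overwritten. The example does the overwriting inside local() so nothing leaks into the rest of the notebook; the name shadowed is made up for this example.

```r
# Inside local(), T is reassigned like any other variable.
shadowed <- local({
  T <- 0
  isTRUE(T)
})
shadowed

# Outside local(), T still has its usual value.
isTRUE(T)
```

This is one reason many people prefer to spell out TRUE and FALSE in full.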
These are often created using logical comparisons
large <- 100000
medium <- 600
large < medium
[1] FALSE
short_word <- "to"
nchar(short_word) == 2
[1] TRUE
When you have a missing value, that’s given a special NA value.
numbers <- c(1, NA, 5)
numbers
[1] 1 NA 5
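Missing values tend to spread through calculations, so here is a short sketch of how NA behaves with a few base R functions, using the same numbers vector.

```r
numbers <- c(1, NA, 5)

# Which values are missing?
is.na(numbers)

# Any sum that includes a missing value is itself missing.
sum(numbers)

# na.rm = TRUE drops the missing values before summing.
sum(numbers, na.rm = TRUE)
```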
Vectors are basically one-dimensional lists of values.[^1] You can have numeric, character or logical vectors in R, but you can’t mix types. One way to create vectors is with the c() (for concatenate) function. There needs to be a comma , between every value that you add to a vector.
digital_words <- c(
  "-dle",
  "BFFR",
  "chief twit",
  "chronically online",
  "crypto rug pull",
  "touch grass",
  "-verse"
)
print(digital_words)
[1] "-dle" "BFFR" "chief twit"
[4] "chronically online" "crypto rug pull" "touch grass"
[7] "-verse"
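About not being able to mix types: if you try anyway, R doesn’t throw an error. It quietly converts everything to one type. A minimal sketch, with a made-up vector called mixed:

```r
# A number, a piece of text, and a logical value in one call to c()
mixed <- c(1, "two", TRUE)

# Everything has been converted to text.
mixed
class(mixed)
```

If a column of numbers ever shows up wrapped in quotation marks, a stray piece of text somewhere in the vector is a likely cause.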
You can also create vectors of sequential numbers with the : operator.
1:10
[1] 1 2 3 4 5 6 7 8 9 10
There are a lot of functions for creating vectors.
seq(from = 1, to = 5, length.out = 10)
[1] 1.000000 1.444444 1.888889 2.333333 2.777778 3.222222 3.666667 4.111111
[9] 4.555556 5.000000
seq_along(digital_words)
[1] 1 2 3 4 5 6 7
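The next few examples use a vector called digital_word_votes, which isn’t created in any of the code shown here. A minimal sketch of how it could be defined, assuming the vote counts are the ones printed in the data frame further down:

```r
# Vote counts, in the same order as digital_words
# (values taken from the votes column of the data frame shown later)
digital_word_votes <- c(84, 14, 4, 30, 8, 64, 8)

# One count per word
length(digital_word_votes)
```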
You can do arithmetic on a whole vector of numbers. The vector digital_word_votes holds how many votes each word got. We can get the sum like so:
total_votes <- sum(digital_word_votes)
total_votes
[1] 212
Then, we can convert those vote counts to proportions by dividing them by the total.
digital_word_votes / total_votes
[1] 0.39622642 0.06603774 0.01886792 0.14150943 0.03773585 0.30188679 0.03773585
And we can convert that to percentages by multiplying by 100.
(digital_word_votes / total_votes) * 100
[1] 39.622642 6.603774 1.886792 14.150943 3.773585 30.188679 3.773585
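Those percentages carry more decimal places than anyone needs. As a sketch, round() works on the whole vector at once, just like the arithmetic above. The vote counts are repeated so the chunk runs on its own, and vote_percent is a name made up for this example.

```r
digital_word_votes <- c(84, 14, 4, 30, 8, 64, 8)

# Percent of the total for every word, in one step
vote_percent <- (digital_word_votes / sum(digital_word_votes)) * 100

# Round every value to one decimal place
round(vote_percent, 1)
```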
If you’ve never programmed before, this part will make sense, and if you have programmed before, this part may be confusing, because R starts counting positions at 1 rather than 0.
If you have a vector, and you want to get the first value from it, you put square brackets [] after the variable name, and put 1 inside.
print(digital_words)
[1] "-dle" "BFFR" "chief twit"
[4] "chronically online" "crypto rug pull" "touch grass"
[7] "-verse"
digital_words[1]
[1] "-dle"
If you want a range of values from a vector, you can give it a vector of numeric indices.
digital_words[2:5]
[1] "BFFR" "chief twit" "chronically online"
[4] "crypto rug pull"
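The indices don’t have to be a continuous range. Here is a sketch of a few other things you can put inside the square brackets, with the vector repeated so the chunk runs on its own.

```r
digital_words <- c(
  "-dle", "BFFR", "chief twit", "chronically online",
  "crypto rug pull", "touch grass", "-verse"
)

# Any vector of positions works, not just a range
digital_words[c(1, 7)]

# A negative index drops that position instead of selecting it
digital_words[-1]

# length() gives the position of the last value
digital_words[length(digital_words)]
```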
Also really useful is the ability to do logical indexing. For example, if we wanted to see which digital words got ten or fewer votes, we can do:
digital_word_votes <= 10
[1] FALSE FALSE TRUE FALSE TRUE FALSE TRUE
We can use this sequence of TRUE and FALSE values to get the actual words from the digital_words vector.
digital_words[digital_word_votes <= 10]
[1] "chief twit" "crypto rug pull" "-verse"
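Two small extensions of logical indexing, sketched with the same data (repeated so the chunk runs on its own): which() turns the TRUE values into positions, and & combines two conditions. The names low_vote_positions and mid_vote_words are made up for this example.

```r
digital_words <- c(
  "-dle", "BFFR", "chief twit", "chronically online",
  "crypto rug pull", "touch grass", "-verse"
)
digital_word_votes <- c(84, 14, 4, 30, 8, 64, 8)

# Positions of the words with ten or fewer votes
low_vote_positions <- which(digital_word_votes <= 10)
low_vote_positions

# Words with more than 10 but fewer than 50 votes
mid_vote_words <- digital_words[digital_word_votes > 10 & digital_word_votes < 50]
mid_vote_words
```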
The most common kind of data structure we’re going to be working with is the data frame. Data frames are two-dimensional structures with rows and columns. The values within each column all need to be of the same data type.
word_df <- data.frame(
  type = "digital",
  word = digital_words,
  votes = digital_word_votes
)
print(word_df)
     type               word votes
1 digital               -dle    84
2 digital               BFFR    14
3 digital         chief twit     4
4 digital chronically online    30
5 digital    crypto rug pull     8
6 digital        touch grass    64
7 digital             -verse     8
To navigate data frames, there are a few handy functions. First, in RStudio you can launch a viewer with View():
View(word_df)
Keeping things inside the Quarto notebook, other useful functions are summary(), nrow(), ncol() and colnames().
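Here is a quick sketch of those functions applied to the word data. The data frame is rebuilt inside the chunk so it runs on its own, with the vote counts taken from the printed table above.

```r
word_df <- data.frame(
  type = "digital",
  word = c(
    "-dle", "BFFR", "chief twit", "chronically online",
    "crypto rug pull", "touch grass", "-verse"
  ),
  votes = c(84, 14, 4, 30, 8, 64, 8)
)

# Number of rows and number of columns
nrow(word_df)
ncol(word_df)

# Column names
colnames(word_df)

# A per-column overview (minimum, mean, maximum and so on for numeric columns)
summary(word_df)
```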
To get all of the data from a single column of a data frame, we can put $ after the data frame variable name, then the name of the column.
word_df$word
[1] "-dle" "BFFR" "chief twit"
[4] "chronically online" "crypto rug pull" "touch grass"
[7] "-verse"
We’re going to have more interesting ways to get specific rows from a data frame later on in the course, but for now, if you want to subset just the rows that have 10 or fewer votes, you can use subset().
subset(word_df, votes <= 10)
     type            word votes
3 digital      chief twit     4
5 digital crypto rug pull     8
7 digital          -verse     8
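For what it’s worth, subset() is doing something we’ve already seen: logical indexing, applied to rows. Below is a sketch of the same selection written with square brackets, where the condition goes before the comma (rows) and the part after the comma (columns) is left empty. The data frame is rebuilt so the chunk runs on its own.

```r
word_df <- data.frame(
  type = "digital",
  word = c(
    "-dle", "BFFR", "chief twit", "chronically online",
    "crypto rug pull", "touch grass", "-verse"
  ),
  votes = c(84, 14, 4, 30, 8, 64, 8)
)

# Using subset()
by_subset <- subset(word_df, votes <= 10)

# Using square brackets: [rows, columns]
by_brackets <- word_df[word_df$votes <= 10, ]

# Both approaches pick out the same rows
all.equal(by_subset, by_brackets)
```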
The “pipe” (|>) is going to play a big role in our R workflow. What it does is take whatever is on its left-hand side and insert it as the first argument to the function on its right-hand side. Here’s a preview.
word_df |>
  subset(votes <= 10)
     type            word votes
3 digital      chief twit     4
5 digital crypto rug pull     8
7 digital          -verse     8
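To make the “first argument” idea concrete, here is a sketch comparing piped and un-piped versions of the same call. It assumes R 4.1 or newer, which is when the |> pipe was added; the variable names are made up for this example.

```r
votes <- c(84, 14, 4, 30, 8, 64, 8)

# These two lines do the same thing:
# votes becomes the first argument to head().
nested <- head(votes, 3)
piped <- votes |> head(3)
identical(nested, piped)

# Pipes can be chained, reading left to right:
# sort the votes, then keep the two smallest.
smallest_two <- votes |> sort() |> head(2)
smallest_two
```

The chained version reads in the order the steps happen, which is most of the appeal.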
Packages get installed once with install.packages():
# Only needs to be run once ever, or when updating
install.packages("tidyverse")
But they need to be loaded every time with library():
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0 ✔ purrr 1.0.1
✔ tibble 3.1.8 ✔ dplyr 1.1.0
✔ tidyr 1.3.0 ✔ stringr 1.5.0
✔ readr 2.1.3 ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
If you try to load a package that you haven’t installed yet, you’ll get this error:
library(fake_library)
Error in library(fake_library): there is no package called 'fake_library'
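If you’d rather check whether a package is installed before trying to load it, one option is base R’s requireNamespace(), which returns TRUE or FALSE instead of stopping with an error. This is just a sketch; fake_library is the same made-up package name as above.

```r
# Hypothetical package name, used only for illustration
package_name <- "fake_library"

# TRUE if the package is installed, FALSE if it isn't
is_installed <- requireNamespace(package_name, quietly = TRUE)
is_installed

if (!is_installed) {
  message("Package '", package_name, "' is not installed yet.")
}
```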
[^1]: The reason vectors aren’t called “lists” is that there’s another kind of data object called a list, which has different properties.