Basics of R Syntax

R basics
Author

Josef Fruehwald

Published

January 17, 2023

Running R Code in a Quarto Notebook

To run R code in a Quarto notebook, you need to insert a “code chunk”. In visual editor mode, you can do that by typing a forward slash (/) and then typing “R Code Chunk”. In source editor mode, you need a line with ```{r} (three “backticks” followed by “r” in curly braces), then one or more lines for your code, followed by a line with another ``` to close the chunk.

```{r}
1+1
```
[1] 2

To actually run the code, you can either click the green play button or press the appropriate hotkey for your system (COMMAND+RETURN on a Mac, CTRL+ENTER on Windows).

Mathematical Operations

Addition

1 + 4
[1] 5

Subtraction

1 - 4
[1] -3

Multiplication

5 * 4
[1] 20

Division

5 / 4
[1] 1.25

Exponentiation

5 ^ 4
[1] 625

Orders of Operation

Honestly, instead of gambling on how R may or may not interpret PEMDAS, just add parentheses ( ) around every operation in the order you want it to happen.

(5 ^ (2 * 2)) / 6
[1] 104.1667
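To see why the parentheses are worth it, here's a small sketch of what R does when you leave them out. These follow R's standard precedence rules (documented on the ?Syntax help page): ^ happens before * and /, ^ groups from right to left, and a leading minus sign is applied after ^.

```r
# ^ is evaluated before *, so this is (5 ^ 2) * 2
5 ^ 2 * 2      # [1] 50
5 ^ (2 * 2)    # [1] 625

# ^ groups from right to left: 2 ^ (3 ^ 2), not (2 ^ 3) ^ 2
2 ^ 3 ^ 2      # [1] 512
(2 ^ 3) ^ 2    # [1] 64

# A leading minus sign is applied after ^, so this is -(2 ^ 2)
-2 ^ 2         # [1] -4
(-2) ^ 2       # [1] 4
```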

Assignment

To assign values to a variable in R, you can use either <- or ->. Most style guides shun ->, but I actually wind up using it a lot.

my_variable <- 4 * 5
print(my_variable)
[1] 20
my_variable / 2
[1] 10
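Here's a minimal sketch of the two arrows side by side. my_other_variable is just a made-up name for illustration. (R also accepts = for assignment, though most style guides prefer <-.)

```r
# Leftward assignment: the variable name goes on the left
my_variable <- 4 * 5

# Rightward assignment: same result, with the name on the right
# (my_other_variable is a made-up name for this example)
4 * 5 -> my_other_variable

my_variable == my_other_variable   # [1] TRUE
```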

Data Types

Numeric

When using a number in R, we can only use digits and dots (.). If we try to enter “one hundred thousand” with a comma separator, we’ll get an error.

big_number <- 100,000
Error: <text>:1:18: unexpected ','
1: big_number <- 100,
                     ^

We also can’t use any percent signs (%) or currency symbols (such as $ or £).
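Here's a sketch of how you could write those values instead. Leave out the comma, or use scientific notation, where 1e5 means 1 × 10^5. R sometimes uses this notation when it prints big numbers. Percentages are usually stored as proportions. The variable names here are just for illustration.

```r
# No comma separators: just the digits
big_number <- 100000

# Scientific notation: 1e5 means 1 × 10^5
also_big <- 1e5
big_number == also_big     # [1] TRUE

# 15% stored as a proportion, then used in a calculation
fifteen_percent <- 0.15
fifteen_percent * 200      # [1] 30
```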

Characters

When we type in text without any quotes, R will assume it’s a variable or function that’s already been defined and go looking for it.

large <- 100000
large
[1] 1e+05

If the variable hasn’t been created already, we’ll get an error.

small
Error in eval(expr, envir, enclos): object 'small' not found

If we enter text inside of quotation marks, either single quotes ' or double quotes ", R will instead treat the text as a value that we could, for example, assign to a variable, or just print out.

"small"
[1] "small"
tiny_synonym <- "small"
tiny_synonym
[1] "small"
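A quick sketch to check that the two kinds of quotes give you the same thing, and why you might pick one over the other. contraction is just an example name.

```r
# Single and double quotes create the same kind of value
identical('small', "small")   # [1] TRUE

# Double quotes are handy when the text itself contains an apostrophe
contraction <- "don't"
nchar(contraction)            # [1] 5

# Without quotes, R looks up the variable; with quotes, it's just text
tiny_synonym <- "small"
tiny_synonym                  # [1] "small"
"tiny_synonym"                # [1] "tiny_synonym"
```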
Common Error

You will often mix these up and get an Error: object '...' not found message. Even if you do this for 15 years, you will still sometimes enter plain text when you meant to put it in quotes, and put text in quotes that you meant to enter without them. It’s always annoying, but it doesn’t mean you’re bad at this.

Logical

R has two special values that you could call “True/False”, “Logical”, or “Boolean” values:

# Full names
TRUE
[1] TRUE
FALSE
[1] FALSE
# Short Forms
T
[1] TRUE
F
[1] FALSE

These are often created using logical comparisons:

large  <- 100000
medium <-    600

large < medium
[1] FALSE
short_word <- "to"

nchar(short_word) == 2
[1] TRUE
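For reference, here's a sketch of the other comparison operators, reusing the values from above. Checking equality takes two equals signs (==). At the top level, a single = would assign instead.

```r
large  <- 100000
medium <-    600

large > medium      # [1] TRUE
large >= medium     # [1] TRUE
large == medium     # [1] FALSE  (two equals signs compare)
large != medium     # [1] TRUE   (!= means "not equal")

# Comparisons work on text too
short_word <- "to"
short_word == "to"  # [1] TRUE
```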

NA

When a value is missing, R represents it with the special value NA.

numbers <- c(1, NA, 5)
numbers
[1]  1 NA  5
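Here's a small sketch of how NA behaves in practice. Most calculations that touch an NA give back NA. Even comparing something to NA gives NA, so is.na() is the way to check for missing values.

```r
numbers <- c(1, NA, 5)

# == doesn't work for finding missing values...
numbers == NA                # [1] NA NA NA

# ...but is.na() does
is.na(numbers)               # [1] FALSE  TRUE FALSE

# A sum with an NA in it is NA...
sum(numbers)                 # [1] NA

# ...unless you tell R to drop missing values first
sum(numbers, na.rm = TRUE)   # [1] 6
```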

Vectors

Vectors are basically 1-dimensional lists of values.¹ You can have numeric, character, or logical vectors in R, but all the values in one vector have to be the same type. If you try to mix types, R will convert them all to a single type. One way to create vectors is with the c() (for concatenate) function. There needs to be a comma , between every value that you add to a vector.

digital_words <- c(
  "-dle",
  "BFFR",
  "chief twit",
  "chronically online",
  "crypto rug pull",
  "touch grass",
  "-verse"
)
print(digital_words)
[1] "-dle"               "BFFR"               "chief twit"        
[4] "chronically online" "crypto rug pull"    "touch grass"       
[7] "-verse"            
digital_word_votes <- c(
  84,
  14,
  4,
  30,
  8,
  64,
  8
)
print(digital_word_votes)
[1] 84 14  4 30  8 64  8
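If you do mix types inside c(), R doesn't give an error. It quietly converts (“coerces”) everything to one type. Here's a small sketch of what that looks like. mixed is just an example name.

```r
# Mixing numbers and text: everything becomes text
mixed <- c(1, "two", 3)
mixed                  # [1] "1"   "two" "3"
class(mixed)           # [1] "character"

# Mixing logical and numeric values: TRUE becomes 1, FALSE becomes 0
c(TRUE, FALSE, 10)     # [1]  1  0 10

# A vector of only numbers stays numeric
class(c(84, 14, 4))    # [1] "numeric"
```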

You can also create vectors of sequential whole numbers with the : operator.

1:10
 [1]  1  2  3  4  5  6  7  8  9 10
More vector-creating possibilities

There are a lot of functions for creating vectors.

seq(from = 1, to = 5, length.out = 10)
 [1] 1.000000 1.444444 1.888889 2.333333 2.777778 3.222222 3.666667 4.111111
 [9] 4.555556 5.000000
seq_along(digital_words)
[1] 1 2 3 4 5 6 7
rep(c("a", "b"), times = 3)
[1] "a" "b" "a" "b" "a" "b"
rep(c("a", "b"), each = 3)
[1] "a" "a" "a" "b" "b" "b"
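A few more variations on seq() and rep(), as a sketch. The by argument steps by a fixed amount. length.out repeats a pattern until the vector is a given length. seq_len(n) counts from 1 to n. Unlike 1:n, it gives an empty vector when n is 0.

```r
# Step by a fixed amount instead of asking for a number of values
seq(from = 0, to = 100, by = 25)   # [1]   0  25  50  75 100

# Repeat a pattern until a target length is reached
rep(c("a", "b"), length.out = 5)   # [1] "a" "b" "a" "b" "a"

# Count from 1 to n
seq_len(4)                         # [1] 1 2 3 4
seq_len(0)                         # integer(0)
1:0                                # [1] 1 0
```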

Vector Arithmetic

You can do arithmetic on a whole vector of numbers. digital_word_votes is a vector of how many votes each word got. We can get the sum like so:

total_votes <- sum(digital_word_votes)
total_votes
[1] 212

Then, we can convert those vote counts to proportions by dividing them by the total.

digital_word_votes / total_votes
[1] 0.39622642 0.06603774 0.01886792 0.14150943 0.03773585 0.30188679 0.03773585

And we can convert that to percentages by multiplying by 100.

(digital_word_votes / total_votes) * 100
[1] 39.622642  6.603774  1.886792 14.150943  3.773585 30.188679  3.773585
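Here are two follow-ups, as a sketch. round() tidies the percentages to one decimal place. Arithmetic between two vectors of the same length also works element by element.

```r
digital_word_votes <- c(84, 14, 4, 30, 8, 64, 8)
total_votes <- sum(digital_word_votes)

# Round the percentages to one decimal place
round((digital_word_votes / total_votes) * 100, 1)
# [1] 39.6  6.6  1.9 14.2  3.8 30.2  3.8

# Two vectors of the same length: first with first, second with second, ...
c(1, 2, 3) + c(10, 20, 30)   # [1] 11 22 33
```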

Indexing

If you’ve never programmed before, this part will make sense. If you have programmed before, it may be confusing, because R starts counting positions at 1, not 0.

If you have a vector, and you want to get the first value from it, you put square brackets [] after the variable name, and put 1 inside.

print(digital_words)
[1] "-dle"               "BFFR"               "chief twit"        
[4] "chronically online" "crypto rug pull"    "touch grass"       
[7] "-verse"            
digital_words[1]
[1] "-dle"

If you want a range of values from a vector, you can give it a vector of numeric indices.

digital_words[2:5]
[1] "BFFR"               "chief twit"         "chronically online"
[4] "crypto rug pull"   
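Here are a few more ways to pick values out by position, as a sketch. This is another place where people coming from other languages trip up. In Python, for example, an index of -1 means “the last value”. In R, a negative index means “everything except this position”.

```r
digital_words <- c("-dle", "BFFR", "chief twit", "chronically online",
                   "crypto rug pull", "touch grass", "-verse")

# Indices don't have to be next to each other
digital_words[c(1, 7)]                 # [1] "-dle"   "-verse"

# The last value: use length() to find its position
digital_words[length(digital_words)]   # [1] "-verse"

# A negative index drops that position (here, everything except "-dle")
digital_words[-1]
```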

Logical Indexing

Also really useful is logical indexing. For example, if we want to see which digital words got ten or fewer votes, we can start with a comparison:

digital_word_votes <= 10
[1] FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE

We can use this sequence of TRUE and FALSE values to get the actual words from the digital_words vector.

digital_words[digital_word_votes <= 10]
[1] "chief twit"      "crypto rug pull" "-verse"         
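The same TRUE/FALSE vector is useful for a few other things, sketched here. sum() counts the TRUEs, because TRUE counts as 1 and FALSE as 0. which() gives their positions. & (and) and | (or) combine conditions.

```r
digital_words <- c("-dle", "BFFR", "chief twit", "chronically online",
                   "crypto rug pull", "touch grass", "-verse")
digital_word_votes <- c(84, 14, 4, 30, 8, 64, 8)

# How many words got ten or fewer votes?
sum(digital_word_votes <= 10)     # [1] 3

# At which positions?
which(digital_word_votes <= 10)   # [1] 3 5 7

# More than 10 AND fewer than 50 votes
digital_words[digital_word_votes > 10 & digital_word_votes < 50]
# [1] "BFFR"               "chronically online"
```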

Data Frames

The most common kind of data structure we’re going to be working with is the data frame. Data frames are two-dimensional structures with rows and columns. All the values within a single column need to be the same type.

word_df <- data.frame(
  type = "digital",
  word = digital_words,
  votes = digital_word_votes  
)
print(word_df)
     type               word votes
1 digital               -dle    84
2 digital               BFFR    14
3 digital         chief twit     4
4 digital chronically online    30
5 digital    crypto rug pull     8
6 digital        touch grass    64
7 digital             -verse     8
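Notice that type = "digital" was a single value, but it shows up on every row. data.frame() repeats a length-1 value to fill the column. Here's a sketch of a few functions for checking a data frame's shape and column names.

```r
word_df <- data.frame(
  type = "digital",   # one value, repeated down every row
  word = c("-dle", "BFFR", "chief twit", "chronically online",
           "crypto rug pull", "touch grass", "-verse"),
  votes = c(84, 14, 4, 30, 8, 64, 8)
)

nrow(word_df)    # [1] 7
ncol(word_df)    # [1] 3
names(word_df)   # [1] "type"  "word"  "votes"
```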

Indexing Data Frames

To get all of the data from a single column of a data frame, we can put $ after the data frame variable name, then the name of the column.

word_df$word
[1] "-dle"               "BFFR"               "chief twit"        
[4] "chronically online" "crypto rug pull"    "touch grass"       
[7] "-verse"            
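A column pulled out with $ is an ordinary vector, so everything from the vector sections above works on it. Double square brackets with the column name in quotes get the same vector. A sketch:

```r
word_df <- data.frame(
  type = "digital",
  word = c("-dle", "BFFR", "chief twit", "chronically online",
           "crypto rug pull", "touch grass", "-verse"),
  votes = c(84, 14, 4, 30, 8, 64, 8)
)

# [[ ]] with a quoted column name gives the same vector as $
identical(word_df[["word"]], word_df$word)   # [1] TRUE

# Vector tools work on columns
sum(word_df$votes)                           # [1] 212
word_df$word[word_df$votes > 50]             # [1] "-dle"        "touch grass"
```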

We’re going to cover other, more interesting ways to get specific rows from a data frame later on in the course. For now, if you want to keep just the rows that have 10 or fewer votes, you can use subset().

subset(word_df, votes <= 10)
     type            word votes
3 digital      chief twit     4
5 digital crypto rug pull     8
7 digital          -verse     8
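For comparison, here's a sketch of the same rows pulled out with base R's square brackets. For data frames these take [rows, columns]. Leaving the columns part blank keeps every column. The result should print the same as subset()'s, original row numbers included.

```r
word_df <- data.frame(
  type = "digital",
  word = c("-dle", "BFFR", "chief twit", "chronically online",
           "crypto rug pull", "touch grass", "-verse"),
  votes = c(84, 14, 4, 30, 8, 64, 8)
)

# Rows where votes <= 10, all columns
few_votes <- word_df[word_df$votes <= 10, ]
few_votes
#      type            word votes
# 3 digital      chief twit     4
# 5 digital crypto rug pull     8
# 7 digital          -verse     8
```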
Pipe Preview

The “pipe” (|>) is going to play a big role in our R workflow. It takes whatever is on its left-hand side and inserts it as the first argument to the function on its right-hand side. Here’s a preview.

word_df |> 
  subset(votes <= 10)
     type            word votes
3 digital      chief twit     4
5 digital crypto rug pull     8
7 digital          -verse     8
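To make “inserts it as the first argument” concrete, here's a sketch where the piped and un-piped versions do the same thing, followed by a short chain that reads left to right. The native |> pipe needs R 4.1.0 or newer.

```r
word_df <- data.frame(
  type = "digital",
  word = c("-dle", "BFFR", "chief twit", "chronically online",
           "crypto rug pull", "touch grass", "-verse"),
  votes = c(84, 14, 4, 30, 8, 64, 8)
)

# x |> f(y) runs as f(x, y), so these two give the same result
without_pipe <- subset(word_df, votes <= 10)
with_pipe    <- word_df |> subset(votes <= 10)
identical(without_pipe, with_pipe)   # [1] TRUE

# Pipes can be chained: subset the rows, then count them
word_df |>
  subset(votes <= 10) |>
  nrow()                             # [1] 3
```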

Packages

Packages get installed once with install.packages():

# Only needs to be run once ever, or when updating
install.packages("tidyverse")

But they need to be loaded in every new R session with library():

# Needs to be run every time
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0     ✔ purrr   1.0.1
✔ tibble  3.1.8     ✔ dplyr   1.1.0
✔ tidyr   1.3.0     ✔ stringr 1.5.0
✔ readr   2.1.3     ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

If you try to load a package that you haven’t installed yet, you’ll get this error:

library(fake_library)
Error in library(fake_library): there is no package called 'fake_library'
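If you'd rather check for a package without getting an error, base R's requireNamespace() returns TRUE or FALSE instead. Here's a sketch. load_or_install() is a hypothetical helper I'm making up for illustration, not a function from any package. It installs a package only when it's missing, then loads it.

```r
# TRUE/FALSE instead of an error
requireNamespace("stats", quietly = TRUE)          # [1] TRUE (stats ships with R)
requireNamespace("fake_library", quietly = TRUE)   # [1] FALSE

# Hypothetical helper: install only if missing, then load
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() take the name as a string
  library(pkg, character.only = TRUE)
}
```

You would then call it like load_or_install("tidyverse"). For a course, though, running install.packages() once by hand and then library() every session is simpler.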

Footnotes

  1. The reason they aren’t called “lists” is that there’s another kind of data object called a list, which has different properties.

Reuse

CC-BY-SA 4.0