Survey Responses

Author: Josef Fruehwald

Published: April 13, 2023

Setup

Package install

Loading required package: here
here() starts at /Users/joseffruehwald/Documents/AS500_site
Loading required package: fs
Loading required package: janitor

Attaching package: 'janitor'
The following objects are masked from 'package:stats':

    chisq.test, fisher.test
if(!require(quanteda.textplots)){
  install.packages("quanteda.textplots")
  library(quanteda.textplots)
}
Loading required package: quanteda.textplots
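
The here, fs, and janitor messages above presumably come from the same install-or-load pattern shown for quanteda.textplots; a minimal sketch under that assumption:

if(!require(here)){
  install.packages("here")
  library(here)
}
if(!require(fs)){
  install.packages("fs")
  library(fs)
}
if(!require(janitor)){
  install.packages("janitor")
  library(janitor)
}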

Package load

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ tibble  3.2.1     ✔ dplyr   1.1.1
✔ tidyr   1.3.0     ✔ stringr 1.5.0
✔ readr   2.1.3     ✔ forcats 0.5.2
✔ purrr   1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Package version: 3.3.0
Unicode version: 14.0
ICU version: 71.1
Parallel computing: 10 of 10 threads used.
See https://quanteda.io for tutorials and examples.
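
The attach messages above imply a load chunk along these lines; quanteda.textstats is an assumption here, but the textstat_collocations() and textstat_keyness() calls later on require it:

library(tidyverse)
library(quanteda)
library(quanteda.textstats)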

Data download

Source: ottlex.org

Downloading

dir_create(here("data", "zipfiles"))
download.file(
  "https://www.ottlex.org/s/OTT-2022-Raw-Data.zip",
  destfile = here("data", "zipfiles", "ott.zip")
  )
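
Not part of the original chunk, but since fs is already loaded, a guard like this can skip the download when the zip is already on disk:

# hypothetical guard: only download if the zip file isn't already present
if (!file_exists(here("data", "zipfiles", "ott.zip"))) {
  download.file(
    "https://www.ottlex.org/s/OTT-2022-Raw-Data.zip",
    destfile = here("data", "zipfiles", "ott.zip")
  )
}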

Unzipping

dir_create(here("data", "ott"))
unzip(
  zipfile = here("data", "zipfiles", "ott.zip"),
  junkpaths = TRUE,  # ignore the zip's internal folder structure
  exdir = here("data", "ott")
)
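
A quick check that the extraction produced the expected file, using dir_ls() from the already-loaded fs package:

# list the extracted files; should include "OTT Raw Data-Raw Data Grid.csv"
dir_ls(here("data", "ott"))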

Reading in

ott_data <- read_csv(here("data", "ott", "OTT Raw Data-Raw Data Grid.csv")) |> 
  clean_names()
Rows: 2413 Columns: 27
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (26): Neighborhood Cluster, Neighborhood MC, Neighborhood - OR, Neighbor...
dbl  (1): OTT Raw Data Response

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
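
As the message suggests, the column-type printout can be silenced on later renders; a sketch of the same read with show_col_types = FALSE:

ott_data <- read_csv(
  here("data", "ott", "OTT Raw Data-Raw Data Grid.csv"),
  show_col_types = FALSE
) |> 
  clean_names()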

Data Processing

ott_data |> 
  select(
    ott_raw_data_response,
    ends_with("_or"),
    gender, 
    age_range,
    race,
    highest_education_level,
    zip_code,
    likely_council_district
    ) ->
  ott_open_response
ott_open_response |> 
  pivot_longer(
    cols = ends_with("_or"),
    names_to = "question",
    values_to = "response"
  ) |> 
  mutate(
    question = str_remove(question, "_or"),
    response = str_squish(response)
  ) |> 
  filter(str_length(response) > 0) ->
  ott_tidied
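
A quick sanity check, not in the original, on how many non-empty responses each open-ended question received:

# non-empty responses per question, most-answered first
ott_tidied |> 
  count(question, sort = TRUE)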

To Corpus

ott_tidied |> 
  mutate(document = str_c(ott_raw_data_response, question, sep = "_")) |> 
  corpus(docid_field = "document", text_field = "response") ->
  ott_corpus
summary(ott_corpus, n = 6)
Corpus consisting of 10949 documents, showing 6 documents:

              Text Types Tokens Sentences ott_raw_data_response gender
    1_neighborhood    29     32         1                     1 Female
     1_environment    33     40         2                     1 Female
 1_jobs_prosperity    25     30         1                     1 Female
  1_transportation    55     71         3                     1 Female
      1_ur_balance    39     45         1                     1 Female
             1_lex    41     57         1                     1 Female
 age_range  race   highest_education_level zip_code likely_council_district
     40-50 White More than Master's Degree    40502                       3
     40-50 White More than Master's Degree    40502                       3
     40-50 White More than Master's Degree    40502                       3
     40-50 White More than Master's Degree    40502                       3
     40-50 White More than Master's Degree    40502                       3
     40-50 White More than Master's Degree    40502                       3
        question
    neighborhood
     environment
 jobs_prosperity
  transportation
      ur_balance
             lex
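
The demographic columns travel with the corpus as document variables, so they can be pulled back out with quanteda's standard accessors; for example:

ndoc(ott_corpus)                       # 10949 documents
head(docvars(ott_corpus, "question"))  # per-document question label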

Content analysis

Two-word collocations

ott_corpus |> 
  textstat_collocations(size = 2) |> 
  slice(1:20) |> 
  rmarkdown::paged_table()
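
textstat_collocations() also takes a min_count argument; a variant sketch that drops rare word pairs before ranking (min_count = 5 is an arbitrary choice here):

ott_corpus |> 
  textstat_collocations(size = 2, min_count = 5) |> 
  slice(1:20) |> 
  rmarkdown::paged_table()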
ott_corpus |> 
  tokens(remove_punct = TRUE) |> 
  tokens_compound(
    # keep "urban service(s) boundary" together as a single token
    phrase(c("urban service boundary", "urban services boundary"))
  ) |> 
  tokens_tolower() |> 
  tokens_remove(pattern = stopwords()) ->
  ott_tokens
ott_tokens |> 
  dfm() ->
  ott_dfm
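
Before comparing groups, the most frequent terms overall can be checked with topfeatures(), a standard quanteda helper not used in the original:

# 20 most frequent terms after stopword removal
topfeatures(ott_dfm, n = 20)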
ott_dfm |> 
  dfm_group(groups = ott_dfm$question) |> 
  textstat_keyness(target = "transportation") |> 
  textplot_keyness()

ott_dfm |> 
  dfm_group(groups = ott_dfm$question) |> 
  textstat_keyness(target = "neighborhood") |> 
  textplot_keyness()

ott_dfm |> 
  dfm_group(groups = ott_dfm$highest_education_level) |> 
  textstat_keyness(target = "More than Master's Degree") |> 
  textplot_keyness()

ott_dfm |> 
  dfm_group(groups = ott_dfm$highest_education_level) |> 
  textstat_keyness(target = "High School Diploma or equivalent") |> 
  textplot_keyness()
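
Each textstat_keyness() call returns a data frame of chi-squared keyness scores, so the numbers behind the plots can also be inspected directly; for example:

ott_dfm |> 
  dfm_group(groups = ott_dfm$question) |> 
  textstat_keyness(target = "transportation") |> 
  head(10)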

Clustering
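
The attach message below implies rainette was installed and loaded with the same pattern as the other packages; a sketch under that assumption:

if(!require(rainette)){
  install.packages("rainette")
  library(rainette)
}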


Attaching package: 'rainette'
The following object is masked from 'package:stats':

    cutree
clust <- rainette(ott_dfm)
Warning in rainette(ott_dfm): some documents don't have any term, they won't be
assigned to any cluster.
  Clustering...
  Done.
rainette_explor(clust, ott_dfm, ott_corpus)
rainette_plot(clust, ott_dfm)
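
To take the clustering further, rainette's cutree method (the one masking stats::cutree in the message above) can assign each document a cluster label; a sketch, with k = 5 as an arbitrary choice:

# cluster membership per document; empty documents come back as NA
groups <- cutree(clust, k = 5)
table(groups, useNA = "ifany")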