Indexing

council <- c("Aberdeenshire", "Angus", "Argyll and Bute", "City of Aberdeen",
"City of Dundee", "City of Edinburgh", "City of Glasgow", "Clackmannanshire", "Dumfries and Galloway", "East Ayrshire", "East Dunbartonshire",
"East Lothian", "East Renfrewshire", "Falkirk", "Fife", "Highland", "Inverclyde", "Midlothian", "Moray", "Na h-Eileanan Siar (Western Isles)", "North Ayrshire", "North Lanarkshire", "Orkney Islands", "Perth and Kinross", "Renfrewshire", "Scottish Borders", "Shetland Islands", "South Ayrshire", "South Lanarkshire", "Stirling", "West Dunbartonshire", "West Lothian")
population <- c(245800, 110600, 89200, 217100, 144300, 486100, 592800, 50600, 148200, 120200, 104600, 97500, 89500, 153300, 365000, 221600,
                79800, 81100, 87700, 26200, 135200, 326400, 20100, 147800, 170300,
                112900, 22400, 111400, 311900, 89900, 90600, 172100)

The most basic way to get the councils that match each population profile is to use logical indexing.

council[population > 150000]
 [1] "Aberdeenshire"     "City of Aberdeen"  "City of Edinburgh" "City of Glasgow"   "Falkirk"          
 [6] "Fife"              "Highland"          "North Lanarkshire" "Renfrewshire"      "South Lanarkshire"
[11] "West Lothian"     
council[population > 30000 & population < 150000]
 [1] "Angus"                 "Argyll and Bute"       "City of Dundee"        "Clackmannanshire"     
 [5] "Dumfries and Galloway" "East Ayrshire"         "East Dunbartonshire"   "East Lothian"         
 [9] "East Renfrewshire"     "Inverclyde"            "Midlothian"            "Moray"                
[13] "North Ayrshire"        "Perth and Kinross"     "Scottish Borders"      "South Ayrshire"       
[17] "Stirling"              "West Dunbartonshire"  
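
To see what logical indexing is doing, it can help to look at the logical vector on its own. The following is a minimal sketch using small stand-in vectors (`pop_demo` and `name_demo` are names made up for this illustration, holding three of the figures from above):

```r
# Stand-in vectors for illustration only (three of the councils above)
pop_demo <- c(110600, 365000, 20100)
name_demo <- c("Angus", "Fife", "Orkney Islands")

# The comparison returns one TRUE/FALSE per element
is_large <- pop_demo > 150000
is_large             # FALSE  TRUE FALSE

# Indexing with it keeps only the positions that are TRUE
name_demo[is_large]  # "Fife"

# TRUE counts as 1, so sum() gives the number of matches
sum(is_large)        # 1
```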

I saw a few exercises that tried the following, which works in this case only by accident.

council[population %in% 30000:150000]
 [1] "Angus"                 "Argyll and Bute"       "City of Dundee"        "Clackmannanshire"     
 [5] "Dumfries and Galloway" "East Ayrshire"         "East Dunbartonshire"   "East Lothian"         
 [9] "East Renfrewshire"     "Inverclyde"            "Midlothian"            "Moray"                
[13] "North Ayrshire"        "Perth and Kinross"     "Scottish Borders"      "South Ayrshire"       
[17] "Stirling"              "West Dunbartonshire"  

This works in two steps. First, 30000:150000 creates a vector of every integer from 30000 to 150000, inclusive. Then %in% checks, for each value of population, whether it appears in that vector. It gives the right answer here only because every population figure is a whole number and none is exactly 30000 or 150000. In general it will not work: values that are not integers never match, and unlike > and <, the range includes both endpoints.
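
Here is a small sketch of how the two approaches can disagree. The vector `x` is made up for illustration; it contains two non-integer values and one value sitting exactly on the upper boundary.

```r
# Made-up values: two non-integers, one ordinary value, one on the boundary
x <- c(29999.5, 30000.5, 100000, 150000)

# %in% only matches values that are exactly equal to an integer in the range
in_range_match <- x %in% 30000:150000
in_range_match    # FALSE FALSE  TRUE  TRUE

# The comparison handles non-integers and excludes the endpoints
in_range_compare <- x > 30000 & x < 150000
in_range_compare  # FALSE  TRUE  TRUE FALSE
```

The two results differ for 30000.5 (inside the range, but not an integer) and for 150000 (an endpoint).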

Another thing that I saw a lot of was people not being sure how to join together the two logical statements population > 30000 and population < 150000. Using a comma (,) gives an error:

council[population > 30000 , population < 150000]
Error in council[population > 30000, population < 150000] : 
  incorrect number of dimensions

This is because when you put a comma inside square brackets, like [ , ], R interprets the positions on either side of the comma as referring to rows and columns: [rows , columns]. But council doesn’t have rows and columns; it is just one long vector. To require that both logical statements are true, combine them with the ampersand &.

council[population > 30000 & population < 150000]
 [1] "Angus"                 "Argyll and Bute"       "City of Dundee"        "Clackmannanshire"     
 [5] "Dumfries and Galloway" "East Ayrshire"         "East Dunbartonshire"   "East Lothian"         
 [9] "East Renfrewshire"     "Inverclyde"            "Midlothian"            "Moray"                
[13] "North Ayrshire"        "Perth and Kinross"     "Scottish Borders"      "South Ayrshire"       
[17] "Stirling"              "West Dunbartonshire"  
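
For completeness, & is not the only logical operator: | means "or" and ! means "not". A minimal sketch with stand-in vectors (`pop_demo` and `name_demo` are made up for this illustration, using four of the figures above):

```r
# Stand-in vectors for illustration only
pop_demo <- c(110600, 365000, 20100, 592800)
name_demo <- c("Angus", "Fife", "Orkney Islands", "City of Glasgow")

# & keeps elements where BOTH conditions are TRUE
both <- name_demo[pop_demo > 30000 & pop_demo < 150000]
both     # "Angus"

# | keeps elements where AT LEAST ONE condition is TRUE
either <- name_demo[pop_demo < 30000 | pop_demo > 400000]
either   # "Orkney Islands"  "City of Glasgow"

# ! flips TRUE and FALSE, so this keeps everything the line above dropped
neither <- name_demo[!(pop_demo < 30000 | pop_demo > 400000)]
neither  # "Angus" "Fife"
```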

I also saw a lot of people taking the two vectors, putting them into a data frame, and subsetting the data frame like so:

council_df <- data.frame(Council = council, Population = population)
subset(council_df, Population > 150000)

This is a good instinct to have, since most of the data analysis techniques we’ll learn in the rest of the course depend on data frames.
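
Data frames do have rows and columns, so this is where the [rows , columns] notation applies. The sketch below uses a small stand-in data frame (`demo_df` is a made-up name with three of the councils from above) to show the square-bracket version next to subset().

```r
# Stand-in data frame for illustration only
demo_df <- data.frame(
  Council = c("Angus", "Fife", "Orkney Islands"),
  Population = c(110600, 365000, 20100)
)

# subset() looks up Population inside the data frame for us
with_subset <- subset(demo_df, Population > 150000)

# With square brackets, the logical test goes in the rows position,
# and the columns position is left empty to keep every column
with_brackets <- demo_df[demo_df$Population > 150000, ]

with_subset    # the Fife row
with_brackets  # the Fife row
```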

SVLR Data

Almost everyone was able to load the data and subset it successfully. However, many people struggled to correctly calculate the proportion of Scottish respondents who said tide/tied were different.

summary(svlr)
      Response         Wordpair   Scottish Edinburgh Gender       Age       
 different:318   tide/tied :395   ?: 17    ?: 30     *:  1   Min.   :10.00  
 same     :296   toad/towed:219   n:240    n:392     f:341   1st Qu.:20.00  
                                  y:357    y:192     m:272   Median :30.00  
                                                             Mean   :33.06  
                                                             3rd Qu.:40.00  
                                                             Max.   :80.00  

As you can see from the summary, there is data in this data frame for both the tide/tied and toad/towed word pairs, as well as for Scottish, Non-Scottish, and “It’s complicated” respondents.

nrow(svlr)
[1] 614

There are a total of 614 data points in this data set.

In order to find out how many Scottish people said that tide/tied were the same, we need to subset based on all three of Response, Wordpair and Scottish.

nrow(subset(svlr, Scottish == "y" & Wordpair == "tide/tied" & Response == "same"))
[1] 62

If we left out the Scottish == "y" condition, then we’d be mixing in data from people who did not say they were Scottish. If we left out the Wordpair == "tide/tied" condition, we’d be mixing in data from the toad/towed word pair.

To get the number of Scottish people who said that tide/tied were different, we need to subset based on all three of these columns again.

nrow(subset(svlr, Scottish == "y" & Wordpair == "tide/tied" & Response == "different"))
[1] 184

Getting the proportion of Scottish respondents who thought tide/tied were different is now pretty straightforward: divide the number of "different" responses by the total number of Scottish tide/tied responses.

184/(62 + 184)
[1] 0.7479675
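
Typing the counts in by hand makes it easy to slip up, so it can be safer to let R compute the proportion from the data. Since the svlr file isn't reproduced here, the sketch below builds a stand-in data frame (`svlr_scottish_toy`, a made-up name) from the counts for Scottish respondents that table() reports (184, 62, 44 and 67); with the real data you would subset svlr instead.

```r
# Stand-in for the Scottish rows of svlr, rebuilt from the reported counts
cell_counts <- c(184, 62, 44, 67)
svlr_scottish_toy <- data.frame(
  Response = rep(c("different", "same", "different", "same"),
                 times = cell_counts),
  Wordpair = rep(c("tide/tied", "tide/tied", "toad/towed", "toad/towed"),
                 times = cell_counts)
)

# Keep only the tide/tied rows
tide <- subset(svlr_scottish_toy, Wordpair == "tide/tied")

# Count each response, then divide
n_diff <- sum(tide$Response == "different")
n_same <- sum(tide$Response == "same")
prop_diff <- n_diff / (n_diff + n_same)

# Shortcut: the mean of a logical vector is the proportion of TRUEs
prop_diff_mean <- mean(tide$Response == "different")
```

Both `prop_diff` and `prop_diff_mean` work out to 184/246, or about 0.748.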

Doing it with table()

One thing we haven’t managed to cover yet in the practicals is the table() function. Let’s re-answer these questions with table(). First, we’ll just look at the Scottish data.

svlr_scottish <- subset(svlr, Scottish == "y")

Now, we can create a 2x2 table of counts for the Response and Wordpair columns as follows:

table(Response = svlr_scottish$Response, Wordpair = svlr_scottish$Wordpair)
           Wordpair
Response    tide/tied toad/towed
  different       184         44
  same             62         67
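
From a table of counts like this, one way to get proportions is prop.table(). The sketch below re-enters the four counts by hand as a table (`counts` and `col_props` are names made up for this illustration); with the real data you could pass the result of table() straight to prop.table().

```r
# The 2x2 table of counts, re-entered by hand for illustration.
# matrix() fills column by column, so the first two numbers are tide/tied.
counts <- as.table(matrix(
  c(184, 62, 44, 67),
  nrow = 2,
  dimnames = list(Response = c("different", "same"),
                  Wordpair = c("tide/tied", "toad/towed"))
))

# margin = 2 divides each cell by its column total,
# so each word pair's column sums to 1
col_props <- prop.table(counts, margin = 2)

# Proportion of Scottish respondents who said tide/tied were different
col_props["different", "tide/tied"]  # 184/246, about 0.748
```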