The way to learn R is by using it, so jump into some practice problems. You can use the “practice with datasets” code copied below, or download swirl and play around with that. Or both!
To install swirl, first install R if you haven’t already (see instructions in the slides above), and open it. In the command line, type
install.packages("swirl")
and hit Enter. You need a working internet connection. Once R has installed the package, you also need to load it. Type
library(swirl)
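and hit Enter. Once the package loads, swirl will greet you; if the lessons don’t start on their own, type swirl() and hit Enter. From there, swirl will take over and start giving you instructions (and peppy feedback!) to take you through the basics of R. Have fun!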
This is some code for you to work through to practice using datasets in R.
Download the relevant datasets from Dropbox:
https://www.dropbox.com/s/cw5hrw62exn3lvx/rclub1_data1.txt
https://www.dropbox.com/s/zvw58hhel8bu3gh/rclub1_data2.sav
Make a note of where these data files get saved when you download them (your downloads folder, maybe? Or the desktop?). You’ll need to know where they are to be able to access them from R.
Note: this is meant to provide practice moving data in and out of R. Suggested functions are provided, but you need to fill in the arguments, etc. For help on how to use a function, enter ?function.name in the console (e.g. ?read.table).
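If the full help page feels like a lot, here is a minimal sketch of two quicker ways to peek at a function's arguments. Both args() and formals() are base R; read.table is just the example from the note above, and read_table_defaults is a name made up for this demo.

```r
# list the arguments and their defaults without opening the full help page
args(read.table)

# pull out the default value of a single argument
# (read_table_defaults is just a name chosen for this demo)
read_table_defaults <- formals(read.table)
read_table_defaults$header   # is the first row treated as column names by default?
read_table_defaults$sep      # what separates columns by default?
```

Comparing the defaults against what your file actually looks like tells you which arguments you need to fill in yourself.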
To save your work, I recommend copying your code into an R script and working from there. In RStudio, go to File > New File > R Script. In R, go to File > New Document. In either case, this will open up a blank text file. You can copy-paste this code there, and then fill in your answers as you work through the problems. Any line of text that begins with # is a comment.
First, check out the file: open rclub1_data1.txt in a text editor or Excel. You’ll see it’s tab-delimited. Now we want to bring it into R so we can use it. Before we tell R which file to open, we need to know what folder R is currently set to look in (the working directory, or wd). If you’re trying unsuccessfully to open a file, the first thing you should check is that your wd is correct.
getwd()
## [1] "/Users/TARDIS/Dropbox/RClub/rclub_code"
# if necessary, change wd so it matches the folder where you've got your data files
setwd("/Users/TARDIS/Dropbox/RClub")
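If you'd rather not change the working directory, you can also check for a file and read it by its full path. This is a sketch under stated assumptions: the folder rclub_demo and the file demo_data.txt are made up and created in a temporary directory, so the code runs without the R Club downloads.

```r
# hypothetical folder and file, created in a temporary directory just for this demo
data_dir <- file.path(tempdir(), "rclub_demo")
dir.create(data_dir, showWarnings = FALSE)
demo_file <- file.path(data_dir, "demo_data.txt")
writeLines(c("id\tscore", "1\t10", "2\t12"), demo_file)

# see which files R can find in that folder
list.files(data_dir)

# check for one specific file before trying to read it
file.exists(demo_file)

# read it using the full path, without touching the working directory
demo <- read.table(demo_file, header = TRUE, sep = "\t")
```

file.path() builds the path with the right separators for your operating system, which saves some typos compared with pasting strings together by hand.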
# we want to read in the data file, so first you need to learn about the appropriate function (read.table) so you know how to structure the command. You'll want to do this for each new function you use.
?read.table
# now we can tell R to open the file, and save it as an object called df (short for dataframe). you could name it whatever you want, though. the name of the file ("rclub1_data1.txt") must be in quotes, so R knows to look for a string matching that rather than to look for an existing object with that name.
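# stringsAsFactors = TRUE reads text columns in as factors (the default before R 4.0), which the steps below rely on.
df <- read.table("rclub1_data1.txt", header = TRUE, sep = "\t", stringsAsFactors = TRUE)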
# look at the first part of your dataframe, to eyeball the data.
head(df)
##   gender HowSmartTheyAre HowManyPointsTheyGot DidTheyEatBreakfast
## 1      M              95                    3                   Y
## 2      F              85                    3                   Y
## 3      F             114                    1                   Y
## 4      M             108                    2                   Y
## 5      M              98                    3                   Y
## 6      F              86                    3                   Y
# check the last chunk of the dataset, too, just because you're curious.
tail(df)
##    gender HowSmartTheyAre HowManyPointsTheyGot DidTheyEatBreakfast
## 45      F              87                    5                   N
## 46      M             108                    2                   N
## 47      F              91                    3                   N
## 48      M              95                    4                   N
## 49      F             112                    1                   N
## 50      F             104                    1                   N
# actually, let's look at it in a new window.
View(df)
# get basic information about your dataframe (dimensions, variable types, etc.)
dim(df)
## [1] 50 4
str(df)
## 'data.frame': 50 obs. of 4 variables:
## $ gender : Factor w/ 2 levels "F","M": 2 1 1 2 2 1 2 1 2 1 ...
## $ HowSmartTheyAre : int 95 85 114 108 98 86 85 107 100 97 ...
## $ HowManyPointsTheyGot: int 3 3 1 2 3 3 3 5 2 3 ...
## $ DidTheyEatBreakfast : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 1 1 ...
# what are the variable names (column names)?
colnames(df)
## [1] "gender" "HowSmartTheyAre" "HowManyPointsTheyGot"
## [4] "DidTheyEatBreakfast"
# you don't like these column names. rename several of the variables.
colnames(df) <- c("gender", "IQ", "Score", "Breakfast?")
# rename just one variable: change "gender" to "male".
colnames(df)[1] <- "male"
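Renaming by position works, but it breaks quietly if the column order ever changes. Here is a small sketch of renaming by matching the old name instead; the data frame toy and its values are made up for illustration.

```r
# toy data frame with made-up columns (not the R Club data)
toy <- data.frame(gender = c("M", "F"), IQ = c(100, 110))

# rename by matching the old name, so it works wherever the column sits
colnames(toy)[colnames(toy) == "gender"] <- "male"

colnames(toy)
```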
# change the coding scheme for "male" from M/F to Y/N. Remember that "male" is currently a factor variable. Use str() and View() to check how the coding scheme is applied when you change the levels of the variable.
levels(df$male) # check what they are originally first
## [1] "F" "M"
levels(df$male) <- c("N","Y") # change them to what you want
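To see why the order of the new levels matters, here is a sketch on a made-up factor called sex. The new labels are matched to the old levels by position, so it's worth checking levels() first.

```r
# toy factor with made-up values
sex <- factor(c("M", "F", "F", "M"))
levels(sex)            # levels are sorted alphabetically: "F" then "M"

# the replacement levels are matched by position: "F" becomes "N", "M" becomes "Y"
levels(sex) <- c("N", "Y")
as.character(sex)
```

If you had written c("Y", "N") instead, every female would be recoded as male and vice versa, with no warning.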
# you notice a typo (the first subject should actually be female, not male). damn RAs and their sloppy data entry! edit just that cell.
df$male[1] <- "N"
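Editing a single cell of a factor only works if the new value is already one of its levels. A sketch on made-up data (the vectors male, male_bad and male_ok are just for illustration):

```r
# toy factor with made-up values; its levels are "N" and "Y"
male <- factor(c("Y", "N", "N"))

# assigning an existing level works
male[1] <- "N"

# assigning a value that isn't a level stores NA instead.
# R normally warns about the invalid factor level; suppressWarnings() just keeps this demo quiet
male_bad <- male
suppressWarnings(male_bad[2] <- "no")
is.na(male_bad[2])

# to use a new value, add it to the levels first
male_ok <- male
levels(male_ok) <- c(levels(male_ok), "unknown")
male_ok[2] <- "unknown"
```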
# save this dataframe in csv format, so you can open it in other software:
# write.csv()
# open that csv file back in R, because you're fickle. (note that you can also just use read.table to read csv files - they're actually the same function, just with different defaults)
# read.csv()
# and now save it again, but this time as tab-delimited. or try using other delimiters, if you like (e.g. spaces).
# write.table()
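If you get stuck on the three steps above, here is one possible sketch of the same round trip on a made-up data frame. The object toy and the file names toy.csv and toy.txt are hypothetical, and the files go to a temporary folder so nothing lands in your working directory.

```r
# toy data frame with made-up values (not the R Club data)
toy <- data.frame(male = c("N", "Y", "N"), IQ = c(95, 85, 114))

# hypothetical file names, written to a temporary folder
csv_file <- file.path(tempdir(), "toy.csv")
tab_file <- file.path(tempdir(), "toy.txt")

# row.names = FALSE keeps R from writing the row numbers as an extra column
write.csv(toy, csv_file, row.names = FALSE)
toy_csv <- read.csv(csv_file)

# same data, tab-delimited; quote = FALSE leaves the strings unquoted
write.table(toy, tab_file, sep = "\t", row.names = FALSE, quote = FALSE)
toy_tab <- read.table(tab_file, header = TRUE, sep = "\t")
```

For your own df, swap in the object name and whatever file names you like.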
# open the file rclub1_data2.sav (an SPSS file). first, learn about read.spss:
?read.spss
## No documentation for 'read.spss' in specified packages and libraries:
## you could try '??read.spss'
# if R can't find the function, use two ?'s to conduct a broader search:
??read.spss
# read.spss is in the package "foreign". do you have the foreign package attached already?
sessionInfo()
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.4 evaluate_0.5.5 formatR_0.10 htmltools_0.2.4
## [5] knitr_1.6 rmarkdown_0.3.10 stringr_0.6.2 tools_3.0.2
# if not, load the foreign library.
library(foreign)
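sessionInfo() prints a lot; if you only want a yes/no answer, something like the following sketch can help. requireNamespace() and search() are base R, while is_attached is a helper name invented here for illustration.

```r
# is the package installed at all? (requireNamespace returns TRUE or FALSE)
foreign_installed <- requireNamespace("foreign", quietly = TRUE)

# is it attached, i.e. has library() already been called on it?
# attached packages show up in search() as "package:<name>"
# (is_attached is a hypothetical helper, not a built-in function)
is_attached <- function(pkg) paste0("package:", pkg) %in% search()

is_attached("stats")     # stats is one of the base packages attached by default
is_attached("foreign")   # TRUE only after library(foreign) has been run
```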
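# now try ?read.spss again, then open the file rclub1_data2.sav. by default read.spss returns a list, so set to.data.frame = TRUE to get a dataframe.
df <- read.spss("rclub1_data2.sav", to.data.frame = TRUE)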
## Warning: rclub1_data2.sav: Unrecognized record type 7, subtype 18
## encountered in system file
## re-encoding from latin1
# check out the dataframe to see what variables you have, how many cases you have, etc. You can use the same functions you used before.
# add a new column on the end with average test score by taking the mean of each person's scores for the three tests
df$TestAve <- (df$Test1 + df$Test2 + df$Test3)/3
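Adding the columns and dividing by three gives NA for anyone with a missing score. If that's not what you want, rowMeans() is one alternative. A sketch on made-up scores: the column names Test1 to Test3 mirror the ones above, but the values and the TestAveAvailable column are invented for illustration.

```r
# toy data with made-up scores; Test1-Test3 mirror the column names used above
toy <- data.frame(Test1 = c(80, 90, 70),
                  Test2 = c(85, NA, 75),
                  Test3 = c(90, 70, 80))

# adding and dividing gives NA for anyone with a missing score
toy$TestAve <- (toy$Test1 + toy$Test2 + toy$Test3) / 3

# rowMeans with na.rm = TRUE averages whatever scores are present instead
toy$TestAveAvailable <- rowMeans(toy[, c("Test1", "Test2", "Test3")], na.rm = TRUE)

toy
```

Whether an average over two tests should count the same as an average over three is a judgment call about your data, not something R can decide for you.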