ANOVA + Contrasts in R
The linked Dropbox file has code and data files for doing contrasts and ANOVA in R.
https://www.dropbox.com/sh/132z6stjuaapn4c/AAB8TZoNIck5FH395vRpDYPPa?dl=0
R Club 11.3.15 ANOVA Contrasts
Jessica Kosie
November 3, 2015
# set the working directory
setwd("/Users/jessicakosie/Dropbox/RClub_11.3.15")
# clear the environment
rm(list=ls())
Planned Contrasts in R
# read in Data Set #1
data <- read.csv("RClub_DataSet1_11.3.15.csv", header = TRUE)
# get descriptives
library(psych)
## Warning: package 'psych' was built under R version 3.2.2
describeBy(data$score, data$animal, mat=TRUE)
## item group1 vars n mean sd median trimmed mad min max
## 11 1 Cougar 1 10 3.0 1.1547005 3.0 3.000 1.4826 1 5
## 12 2 Dog 1 10 9.1 0.9944289 9.0 9.250 1.4826 7 10
## 13 3 HouseCat 1 10 1.6 0.6992059 1.5 1.500 0.7413 1 3
## 14 4 Wolf 1 10 6.9 1.1972190 7.0 6.875 1.4826 5 9
## range skew kurtosis se
## 11 4 0.0000000 -0.9750000 0.3651484
## 12 3 -0.7809801 -0.5931107 0.3144660
## 13 2 0.5616761 -1.0787603 0.2211083
## 14 4 0.1678308 -1.1806760 0.3785939
# run an omnibus ANOVA
model <- aov(score ~ animal, data = data)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## animal 3 358.9 119.63 112.7 <2e-16 ***
## Residuals 36 38.2 1.06
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# look at the levels of our factor
levels(data$animal)
## [1] "Cougar" "Dog" "HouseCat" "Wolf"
# tell R which groups to compare
c1 <- c(.5, -.5, .5, -.5) # canines vs. felines
c2 <- c(1, 0, -1, 0) # cougars vs. housecats
c3 <- c(0, 1, 0, -1) # dogs vs. wolves
# combined the above 3 lines into a matrix
mat <- cbind(c1,c2,c3)
# tell R that the matrix gives the contrasts you want
contrasts(data$animal) <- mat
# these lines give you your results
model1 <- aov(score ~ animal, data = data)
# can just look at the omnibus ANOVA (same as above)
summary(model1)
## Df Sum Sq Mean Sq F value Pr(>F)
## animal 3 358.9 119.63 112.7 <2e-16 ***
## Residuals 36 38.2 1.06
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Make sure to use summary.aov here or 'split' might not work
summary.aov(model1, split=list(animal=list("Canines vs. Felines"=1, "Cougars vs House Cats" = 2, "Wolves vs Dogs"=3)))
## Df Sum Sq Mean Sq F value Pr(>F)
## animal 3 358.9 119.6 112.743 < 2e-16 ***
## animal: Canines vs. Felines 1 324.9 324.9 306.188 < 2e-16 ***
## animal: Cougars vs House Cats 1 9.8 9.8 9.236 0.0044 **
## animal: Wolves vs Dogs 1 24.2 24.2 22.806 2.98e-05 ***
## Residuals 36 38.2 1.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Polynomial Contrasts
Plot the means
Before we run an ANOVA, let’s obtain a plot to get an idea of how the sample means differ.
Check out this useful tutorial: http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/
# clear the environment
rm(list=ls())
# read in Data Set #2
data <- read.csv("RClub_DataSet2_11.3.15.csv", header = TRUE)
# Lets take a look at the data - do we see a linear trend?
# Since we're not plotting the data themselves but rather means, first we need to calculate that, so we can feed it into the plot.
# To do that, we'll use a few functions from the dplyr package. Check out this tutorial: http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.2.2
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
groups <- group_by(data, dose) # this just prepares it for us to calculate eveyrthing within each condition
plot.data <- summarise(groups,
mean = mean(score, na.rm=TRUE),
sd = sd(score, na.rm=TRUE),
n = n(),
se=sd/sqrt(n),
ci = qt(0.975, df=n-1)*se)
plot.data # take a peek
## Source: local data frame [3 x 6]
##
## dose mean sd n se ci
## (fctr) (dbl) (dbl) (int) (dbl) (dbl)
## 1 15ml 3.55 0.4972145 10 0.1572330 0.3556858
## 2 25ml 4.05 0.3689324 10 0.1166667 0.2639183
## 3 35ml 4.35 0.5296750 10 0.1674979 0.3789066
# ready to plot!
# line graph
library(ggplot2)
##
## Attaching package: 'ggplot2'
##
## The following object is masked from 'package:psych':
##
## %+%
ggplot(plot.data, aes(x=dose, y=mean, group = factor(1))) +
geom_line() +
geom_point() +
geom_errorbar(aes(ymin=mean-ci, ymax=mean+ci), width=.1) +
ggtitle("Figure 1. Mean Score by Dose")
Polynomial Contrasts
For additional info on setting up contrasts (beyond the scope of this lab), check out the ever useful UCLA stats walkthrough
# run an ANOVA
results <- aov(score ~ dose, data=data)
summary.aov(results)
## Df Sum Sq Mean Sq F value Pr(>F)
## dose 2 3.267 1.6333 7.381 0.00277 **
## Residuals 27 5.975 0.2213
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Run a Polynomial Contrast
# Here's one way - tell R which groups to compare. These are the orthogonal polynomial contrasts.
c1 <- c(-1, 0, 1)
c2 <- c(0.5, -1, 0.5)
# combined the above 2 lines into a matrix
mat <- cbind(c1,c2)
# tell R that the matrix gives the contrasts you want
contrasts(data$dose) <- mat
# these lines give you your results
model1 <- aov(score ~ dose, data = data)
summary.aov(model1, split=list(dose=list("Linear"=1, "Quadratic" = 2)))
## Df Sum Sq Mean Sq F value Pr(>F)
## dose 2 3.267 1.633 7.381 0.002773 **
## dose: Linear 1 3.200 3.200 14.460 0.000744 ***
## dose: Quadratic 1 0.067 0.067 0.301 0.587607
## Residuals 27 5.975 0.221
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Try it using built-in contr.poly as well:
# we tell R to get a polynomial contrast matrix for 3 conditions
contrasts(data$dose) <- contr.poly(3)
# these lines give you your results
model2 <- aov(score ~ dose, data = data)
summary.aov(model2, split=list(dose=list("Linear"=1, "Quadratic" = 2)))
## Df Sum Sq Mean Sq F value Pr(>F)
## dose 2 3.267 1.633 7.381 0.002773 **
## dose: Linear 1 3.200 3.200 14.460 0.000744 ***
## dose: Quadratic 1 0.067 0.067 0.301 0.587607
## Residuals 27 5.975 0.221
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Run a TukeyHSD Test
TukeyTest <- TukeyHSD(results)
TukeyTest
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = score ~ dose, data = data)
##
## $dose
## diff lwr upr p adj
## 25ml-15ml 0.5 -0.02161703 1.021617 0.0621938
## 35ml-15ml 0.8 0.27838297 1.321617 0.0020820
## 35ml-25ml 0.3 -0.22161703 0.821617 0.3420861
Factorial ANOVA
# clear the environment
rm(list=ls())
# read in Data Set #3
data <- read.csv("RClub_DataSet3_11.3.15.csv", header = TRUE)
First, get a plot of your means, so you can see what’s going on.
# I'm choosing to plot with instruction type across the x-axis and grouped by age. It would also be fine to do it the other way around.
library(dplyr)
groups <- group_by(data, instructions, age) # this just prepares it for us to calculate everything within each combination of instructions and age
plot.data <- summarise(groups,
mean = mean(score, na.rm=TRUE),
sd = sd(score, na.rm=TRUE),
n = n(),
se=sd/sqrt(n),
ci = qt(0.975,df=n-1)*se)
plot.data # take a peek
## Source: local data frame [6 x 7]
## Groups: instructions [?]
##
## instructions age mean sd n se ci
## (fctr) (fctr) (dbl) (dbl) (int) (dbl) (dbl)
## 1 both old 21.16667 4.262237 6 1.7400511 4.472944
## 2 both young 26.16667 4.167333 6 1.7013067 4.373348
## 3 verbal old 23.50000 3.209361 6 1.3102163 3.368018
## 4 verbal young 23.00000 2.366432 6 0.9660918 2.483418
## 5 written old 13.66667 3.502380 6 1.4298407 3.675523
## 6 written young 22.00000 4.774935 6 1.9493589 5.010986
# ready to plot!
# bar graph
# check this out: http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/
library(ggplot2)
ggplot(plot.data, aes(x=instructions, y=mean, fill = age )) +
geom_bar(stat="identity", position=position_dodge()) +
geom_errorbar(aes(ymin=mean-ci, ymax=mean+ci), width=.2, position=position_dodge(.9)) +
ggtitle("Figure 1. Mean score \nby instruction type and age")
### Run a 2 X 3 ANOVA on the data.
# The best way to run this is actually with the lm() command, not aov(). It stands for "linear model". ANOVAs, regressions, t-tests, etc. are all examples of the general linear model, so you can use this one command to do pretty much any of them in R.
# aov() works, and it will generate exactly the same source table for you (the math is all identical), but lm() gives you more useful output.
model <- lm(score ~ instructions*age , data=data)
# When you specify an interaction with *, R automatically assumes you want the main effects as well.
# So "instructions*age" is shorthand for "instructions + age + instructions*age".
anova(model) # to get a source table
## Analysis of Variance Table
##
## Response: score
## Df Sum Sq Mean Sq F value Pr(>F)
## instructions 2 254.17 127.083 8.8150 0.0009741 ***
## age 1 164.69 164.694 11.4239 0.0020278 **
## instructions:age 2 119.39 59.694 4.1407 0.0258236 *
## Residuals 30 432.50 14.417
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# We can discuss calculating simple effects in a later R Club
// add bootstrap table styles to pandoc tables $(document).ready(function () { $('tr.header').parent('thead').parent('table').addClass('table table-condensed'); });
I found this post really helpful (using the Planned Comparisons section), but I need to do a Type II ANOVA with Anova(). Is there a way to split my list while performing a Type II ANOVA? The summary.aov() is for Type I. Running Anova() rather than summary.aov() does not split the list.
Super helpful vignette here, thank you! Initially, I used the aov() method as shown and produced my SS/F-value/significance for my linear, quadratic and cubic trends. However, when I try to replicate this using lm() (as suggested), my overall model F-statistic and linear trend significance remains the same, however the p-values for the quadratic and cubric trends no longer match what I received with aov().
Is this because with lm() I am “forcing” quadratic/cubic trends into a linear function? Or is something else going on in the backround? When I run aov() withour splitting the list, a message below the output says that “estimated effects may be unbalanced”. Any insight is HUGELY appreciated, thank you!
My code here..
# Create a polynomial contrast matrix for 5 groups, then assign it to the classification variable
contrasts(SCI$sf1) <- contr.poly(n=5)
# run anova, save
anova.sf1 <- aov(ICECAP_A ~ sf1, data = SCI)
# note: this output says that the 'estimated effects may be unbalanced', which may be due to the aov() function?
summary.aov(anova.sf1, split=list (sf1=list ("Linear"=1 , "Quadratic" = 2, "Cubic" = 3)))
# Try to get teh same result using lm() instead of aov(), but it produces a different significance for the quadratic…
lm.sf1 <-lm(ICECAP_A ~ sf1, data = SCI)
summary(lm.sf1)
Thank you, this is really useful. Is there a way to know the effect size for each tested orthogonal contrast?